
    Function multi-versioning

    Posted by MaskRay on 2023-02-06 00:35:46

    GCC supports some function attributes for function multi-versioning: a way for a function to have multiple implementations, each using a different set of ISA extensions. A function attribute specifies different requirements of ISA extensions. The generated program decodes the CPU model and features at run-time, and picks the most restrictive implementation which is satisfied by the CPU, assuming that the most restrictive implementation has the best performance.

    __attribute__((target(...)))

    __attribute__((target(...))) has been available for a long time, even before attributes for function multi-versioning were introduced. Here are some links to relevant documentation.

    • https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#:~:text=target%20(
    • Attributes in Clang#target
    • https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html
    • https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html

    Usually we use different function names for different implementations and define a dispatch function. This approach is like a manual ifunc.

    extern int flags;

    static __attribute__((target("default"))) int foo_default(int a) { return a & a-1; }
    static __attribute__((target("arch=x86-64-v2"))) int foo_v2(int a) { return a & a-1; }
    static __attribute__((target("arch=x86-64-v3"))) int foo_v3(int a) { return a & a-1; }

    int foo(int a) {
      if (flags & 2) return foo_v3(a);
      if (flags & 1) return foo_v2(a);
      return foo_default(a);
    }

    The function bodies are duplicated. We can define a [[gnu::always_inline]] function shared by the different implementations.

    // The shared body; it is inlined into each wrapper and compiled for that wrapper's target.
    __attribute__((always_inline)) static inline int foo_impl(int a) { return a & a-1; }
    static __attribute__((target("default"))) int foo_default(int a) { return foo_impl(a); }
    static __attribute__((target("arch=x86-64-v2"))) int foo_v2(int a) { return foo_impl(a); }
    static __attribute__((target("arch=x86-64-v3"))) int foo_v3(int a) { return foo_impl(a); }
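
    The dispatch function above reads an external flags variable that the example leaves undefined. As a minimal sketch, assuming an x86 target and GCC or Clang, it could be filled in with the __builtin_cpu_init and __builtin_cpu_supports builtins. The helper name detect_flags is hypothetical, and testing only sse4.2 and avx2 is a simplification: the x86-64-v2 and x86-64-v3 levels require more features than these two.

```c
/* Hypothetical helper: compute the bits that foo() tests in `flags`.
   Bit 0 stands in for x86-64-v2 and bit 1 for x86-64-v3 (simplified:
   the real levels require more features than sse4.2 and avx2). */
int detect_flags(void) {
    int f = 0;
#if defined(__x86_64__) || defined(__i386__)
    /* Initialize the CPU feature data used by the queries below. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.2"))
        f |= 1;
    if (__builtin_cpu_supports("avx2"))
        f |= 2;
#endif
    /* On other architectures no bit is set, so foo() uses foo_default. */
    return f;
}
```

    A program could then define int flags; and assign flags = detect_flags(); early in main, before the first call to foo.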

    Let's check the behavior with external linkage. In C++ mode, GCC and Clang emit two symbols _Z3foov and _Z3foov.sse4.2 for the following program:

    __attribute__((target("default"))) int foo(void) { return 0; }
    __attribute__((target("sse4.2"))) int foo(void) { return 1; }

    In C mode, GCC reports error: redefinition of ‘foo’. Clang emits two symbols foo and foo.sse4.2.

    With more than one declaration, the compiler merges the attributes.

    int foo(void);
    __attribute__((target("avx2"))) int foo(void) { return 0; }

    //---

    __attribute__((target("avx2"))) int foo(void);
    int foo(void) { return 0; }

    __attribute__((target_clones(...)))

    This is the first attribute that GCC introduced to make function multi-versioning convenient. Since GCC 6, we can just define one function with the attribute specifying all supported targets.

    // b.c
    __attribute__((target_clones("default","arch=x86-64-v2","arch=x86-64-v3")))
    int foo(int a) { return a & a-1; }

    int foo_plus_1(int a) { return foo(a) + 1; }

    See the GCC doc (Common Function Attributes) and Attributes in Clang#target_clones. Clang only supports some basic forms, not arch=.

    For the above function, GCC emits three implementations foo.default, foo.arch_x86_64_v2, and foo.arch_x86_64_v3. foo is a dispatch function which selects one of the implementations. This is implemented as a GNU indirect function (ifunc). The ifunc resolver is called once by rtld at the relocation resolving phase. The resolver references a function and a variable defined in the runtime (libgcc).

            .section        .text.foo.resolver,"axG",@progbits,foo.resolver,comdat
            .p2align 4
            .weak   foo.resolver
            .type   foo.resolver, @function
    foo.resolver:
            subq    $8, %rsp
            call    __cpu_indicator_init@PLT
            movq    __cpu_features2@GOTPCREL(%rip), %rax
            movl    8(%rax), %eax
            testb   $2, %al
            je      .L8
            leaq    foo.arch_x86_64_v3(%rip), %rax
    .L7:
            addq    $8, %rsp
            ret
    .L8:
            testb   $1, %al
            leaq    foo.arch_x86_64_v2(%rip), %rdx
            leaq    foo.default(%rip), %rax
            cmovne  %rdx, %rax
            jmp     .L7
            .size   foo.resolver, .-foo.resolver
            .globl  foo
            .type   foo, @gnu_indirect_function
            .set    foo,foo.resolver
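
    What the compiler generates can also be written by hand. The following is a minimal sketch using the GNU ifunc attribute, assuming x86-64 with an ELF toolchain and a libc that supports ifunc (such as glibc). The names my_foo, my_foo_default, my_foo_avx2, and my_foo_resolver are hypothetical, and avx2 stands in for the arch= levels.

```c
#if defined(__x86_64__) && defined(__ELF__)
/* Two implementations with identical source; the compiler is allowed to
   use AVX2 instructions when generating code for the second one. */
static int my_foo_default(int a) { return a & (a - 1); }
__attribute__((target("avx2")))
static int my_foo_avx2(int a) { return a & (a - 1); }

/* Hand-written resolver: the dynamic loader calls it while processing
   relocations and binds my_foo to the returned implementation. */
static int (*my_foo_resolver(void))(int) {
    __builtin_cpu_init();  /* the resolver may run before constructors */
    return __builtin_cpu_supports("avx2") ? my_foo_avx2 : my_foo_default;
}

/* my_foo becomes an indirect function symbol backed by the resolver. */
int my_foo(int a) __attribute__((ifunc("my_foo_resolver")));
#else
/* Fallback for targets outside the assumptions above: no dispatch. */
int my_foo(int a) { return a & (a - 1); }
#endif
```

    Callers use my_foo like an ordinary function; for instance my_foo(12) clears the lowest set bit and yields 8 whichever implementation was selected.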

    As an ifunc, foo defeats interprocedural optimizations. We can see that foo_plus_1 does not inline foo.

    The attribute can apply to a non-definition declaration. foo.default, foo.arch_x86_64_v2, and foo.arch_x86_64_v3 are undefined symbols while the ifunc symbol (GCC: foo, Clang: foo.ifunc) and foo.resolver remain as definitions.

    // a.c
    __attribute__((target_clones("default","arch=x86-64-v2","arch=x86-64-v3")))
    int foo(int a);
    int main(void) { foo(0); }

    In llvm-project, compiler-rt provides an alternative implementation of this runtime.

    x86

    The runtime executes cpuid, extracts information about the x86 family model and available CPU features, and stores them into __cpu_model and __cpu_features2. The resolver decodes the information and selects the best implementation.
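
    As a rough illustration of the kind of query the runtime performs, here is a minimal sketch that reads two feature bits with the <cpuid.h> helpers shipped with GCC and Clang. The function name raw_cpu_features and its bit layout are hypothetical and unrelated to the actual layout of __cpu_features2. The real runtime does more work; for example, treating AVX2 as usable also involves checking that the OS has enabled the AVX register state, which this sketch omits.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical bit layout: bit 0 = SSE4.2 reported, bit 1 = AVX2 reported.
   This only reads raw CPUID bits; it does not check OS support. */
unsigned raw_cpu_features(void) {
    unsigned features = 0;
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    /* CPUID leaf 1: ECX bit 20 reports SSE4.2. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 20)))
        features |= 1u;
    /* CPUID leaf 7, subleaf 0: EBX bit 5 reports AVX2. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) && (ebx & (1u << 5)))
        features |= 2u;
#endif
    /* On non-x86 targets the sketch reports no features. */
    return features;
}
```

    A resolver written against this sketch would test features & 2u before features & 1u, mirroring the order of the checks in the generated foo.resolver.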

    AArch64

    The support is missing/incomplete as of GCC 12 and Clang 16.0.

    __attribute__((target_clones("sha2+memtag2", "fcma+sve2-pmull128")))
    void foo() {}

    (compiler-rt/lib/builtins/cpu_model.c defines some symbols like __aarch64_have_lse_atomics. GCC commit)

    __attribute__((cpu_dispatch(...))) and __attribute__((cpu_specific(...)))

    • Attributes in Clang#cpu_dispatch

    These attributes are supported by the Intel C++ Compiler and were later ported to Clang. GCC doesn't support the two attributes. They feel like legacy features and are a subset of target_clones.

    As with target_clones, the declaration and definition can be in different translation units, but they use different attributes: cpu_dispatch on the declaration and cpu_specific on the definition.

    echo '__attribute__((cpu_dispatch(ivybridge, atom, sandybridge))) void foo(void); int main(void) { foo(); }' > a.c
    echo '__attribute__((cpu_specific(ivybridge, atom, sandybridge))) void foo(void) {}' > b.c
    clang a.c b.c -o a

    __attribute__((target_version(...)))

    Arm C Language Extensions introduced a new GNU attribute target_version.

    • https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
    • Attributes in Clang#target_version

    int __attribute__((target_version("default"))) tv(void) { return 0; }
    int __attribute__((target_version("fp16+simd"))) tv(void) { return 1; }

    The semantics are not very clear in the latest Clang. GCC does not support the attribute as of 2023-02.


