GCC supports some function attributes for function multi-versioning: a way for a function to have multiple implementations, each using a different set of ISA extensions. A function attribute specifies the ISA extensions that each implementation requires. The generated program decodes the CPU model and features at run-time and picks the most restrictive implementation that the CPU satisfies, assuming that the most restrictive implementation has the best performance.
__attribute__((target(...)))
__attribute__((target(...))) has been available for a long time, even before attributes for function multi-versioning were introduced. Here are some links to relevant documentation.
Usually we use different function names for different implementations and define a dispatch function. This approach is like a manual ifunc.
extern int flags;
static __attribute__((target("default"))) int foo_default(int a) { return a & a-1; }
static __attribute__((target("arch=x86-64-v2"))) int foo_v2(int a) { return a & a-1; }
static __attribute__((target("arch=x86-64-v3"))) int foo_v3(int a) { return a & a-1; }
int foo(int a) {
  if (flags & 2) return foo_v3(a);
  if (flags & 1) return foo_v2(a);
  return foo_default(a);
}
The function bodies are duplicated. We can define a [[gnu::always_inline]] function shared by the different implementations.
__attribute__((always_inline)) static inline int foo_impl(int a) { return a & a-1; }
static __attribute__((target("default"))) int foo_default(int a) { return foo_impl(a); }
static __attribute__((target("arch=x86-64-v2"))) int foo_v2(int a) { return foo_impl(a); }
static __attribute__((target("arch=x86-64-v3"))) int foo_v3(int a) { return foo_impl(a); }
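The dispatcher reads flags without showing where the value comes from. One possibility, sketched here under assumptions, is to fill it once at start-up with the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports. The names compute_flags and init_flags are hypothetical, and the tested features (sse4.2 and popcnt for bit 0, avx2 for bit 1) are only stand-ins for the x86-64-v2 and x86-64-v3 levels, which require more features than that.

```c
#include <assert.h>

int flags; /* read by the dispatcher: bit 0 ~ x86-64-v2, bit 1 ~ x86-64-v3 */

/* Hypothetical helper. The feature subsets are stand-ins: the real
   micro-architecture levels require more features than are tested here. */
static int compute_flags(void) {
  int f = 0;
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  /* Harmless if the runtime has already initialised its CPU data. */
  __builtin_cpu_init();
  if (__builtin_cpu_supports("sse4.2") && __builtin_cpu_supports("popcnt"))
    f |= 1;
  if ((f & 1) && __builtin_cpu_supports("avx2"))
    f |= 2;
#endif
  assert(!(f & 2) || (f & 1)); /* the v3 bit implies the v2 bit */
  return f;
}

/* Runs before main, so the first call to foo already sees the final value. */
__attribute__((constructor)) static void init_flags(void) {
  flags = compute_flags();
}
```

On targets other than x86 the helper leaves flags at 0, so the dispatcher always falls through to foo_default.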
Let's check the behavior with external linkage. In C++ mode, GCC and Clang emit two symbols, _Z3foov and _Z3foov.sse4.2, for the following program:
__attribute__((target("default"))) int foo(void) { return 0; }
__attribute__((target("sse4.2"))) int foo(void) { return 1; }
In C mode, GCC reports error: redefinition of ‘foo’. Clang emits two symbols, foo and foo.sse4.2.
With more than one declaration, the compiler merges the attributes.
int foo(void);
__attribute__((target("avx2"))) int foo(void) { return 0; }
//---
__attribute__((target("avx2"))) int foo(void);
int foo(void) { return 0; }
__attribute__((target_clones(...)))
This is the first attribute that GCC introduced to make function multi-versioning convenient. Since GCC 6, we can just define one function with the attribute specifying all supported targets.
// b.c
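The b.c listing is abbreviated here. A minimal sketch consistent with the description that follows, assuming the same function body as the earlier examples and the same attribute as the a.c declaration further down; foo_plus_1 is the caller mentioned later in this section. The preprocessor guard is an addition, not part of the original file: it keeps the sketch compiling where arch= clones are unavailable, and the GCC 12 threshold is a conservative assumption.

```c
// b.c (sketch under the assumptions stated above)
#if defined(__x86_64__) && defined(__linux__) && !defined(__clang__) && \
    defined(__GNUC__) && __GNUC__ >= 12
// GCC emits one clone per listed target plus an ifunc-based dispatcher.
__attribute__((target_clones("default", "arch=x86-64-v2", "arch=x86-64-v3")))
#endif
int foo(int a) { return a & (a - 1); }

// Calls foo through the dispatcher when the attribute is active.
int foo_plus_1(int a) { return foo(a) + 1; }
```

Compiling this with gcc -O2 -S should make it possible to inspect the clones and the resolver in the assembly output.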
See the GCC doc (Common Function Attributes) and Attributes in Clang#target_clones. Clang only supports some basic forms, not arch=.
For the above function, GCC emits three implementations: foo.default, foo.arch_x86_64_v2, and foo.arch_x86_64_v3. foo is a dispatch function which selects one of the implementations. This is implemented as a GNU indirect function (ifunc). The ifunc resolver is called once by rtld at the relocation resolving phase. The resolver references a function and a variable defined in the runtime (libgcc).
.section .text.foo.resolver,"axG",@progbits,foo.resolver,comdat
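What the compiler generates can be approximated by hand with the ifunc attribute. The sketch below is not GCC's actual output: the names foo_resolver, foo_default, and foo_avx2 are made up, only one non-default variant is shown, and it assumes x86-64 Linux with a libc that supports ifunc (such as glibc). Other targets fall back to a plain function.

```c
static int foo_default(int a) { return a & (a - 1); }

#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)
__attribute__((target("avx2"))) static int foo_avx2(int a) {
  return a & (a - 1);
}

/* The dynamic loader calls the resolver once while processing relocations
   and binds foo to whichever function it returns. */
static int (*foo_resolver(void))(int) {
  __builtin_cpu_init(); /* the resolver may run before any constructor */
  return __builtin_cpu_supports("avx2") ? foo_avx2 : foo_default;
}

/* foo has no body of its own: its address is whatever the resolver returns. */
int foo(int a) __attribute__((ifunc("foo_resolver")));
#else
/* Fallback without ifunc support: an ordinary function. */
int foo(int a) { return foo_default(a); }
#endif
```

Callers use foo like any other function. The selection cost is paid once at load time, and each later call goes through the usual indirection for a dynamically resolved symbol.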
As an ifunc, foo defeats interprocedural optimizations. We can see that foo_plus_1 does not inline foo.
The attribute can apply to a non-definition declaration. foo.default, foo.arch_x86_64_v2, and foo.arch_x86_64_v3 are undefined symbols while (GCC: foo, Clang: foo.ifunc) and foo.resolver remain as definitions.
// a.c
__attribute__((target_clones("default","arch=x86-64-v2","arch=x86-64-v3")))
int foo(int a);
int main(void) { foo(0); }
In llvm-project, compiler-rt provides an alternative implementation.
The runtime executes cpuid, extracts information about the x86 family model and available CPU features, and stores them into __cpu_model and __cpu_features2. The resolver decodes the information and selects the best implementation.
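To make the cpuid step concrete, here is a rough sketch of reading a single feature bit with the cpuid.h header shipped by GCC and Clang. It is not the libgcc or compiler-rt code: the function name cpuid_reports_avx2 is hypothetical, and a real runtime also checks through XGETBV that the operating system saves the AVX register state, which is omitted here.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: returns 1 if CPUID leaf 7, sub-leaf 0 reports AVX2
   (EBX bit 5), otherwise 0. The OS-support check is deliberately left out. */
int cpuid_reports_avx2(void) {
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
    return 0; /* leaf 7 is not available on this CPU */
  return (ebx >> 5) & 1;
#else
  return 0; /* not an x86 target */
#endif
}
```

A runtime would collect many such bits into its feature words once, so the resolver only has to test precomputed flags.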
On AArch64, the support is missing/incomplete as of GCC 12 and Clang 16.0.
__attribute__((target_clones("sha2+memtag2", "fcma+sve2-pmull128")))
(compiler-rt/lib/builtins/cpu_model.c defines some symbols like __aarch64_have_lse_atomics. GCC commit)
__attribute__((cpu_dispatch(...))) and __attribute__((cpu_specific(...)))
Supported by Intel C++ Compiler and later ported to Clang. GCC doesn't support the two attributes. They feel like legacy and are a subset of target_clones.
As with target_clones, the declaration and definition can be in different translation units, but the two use different attributes.
echo '__attribute__((cpu_dispatch(ivybridge, atom, sandybridge))) void foo(void); int main(void) { foo(); }' > a.c
__attribute__((target_version(...)))
Arm C Language Extensions introduced a new GNU attribute target_version.
int __attribute__((target_version("default"))) tv(void) { return 0; }
The semantics are not very clear in the latest Clang. GCC does not support the attribute as of 2023-02.