The Linux kernel is written in C, but it also leverages extensionsprovided by GCC. In 2022, it moved from GCC/Clang-std=gnu89
to -std=gnu11
. This articleexplores my notes on how these GNU extensions are utilized within thekernel.
Statementexpressions are commonly used in macros.
Some macros use this extension to restart a for loop in a macro'sreplacement list.
1 | // include/linux/wait.h |
include/linux/instruction_pointer.h
utilizes the featureto obtain the current instruction pointer:
1 |
Bytecode interpreters often leverage this extension. BPF, forinstance, utilizes this extension. We can also see this extension indrm_exec_retry_on_contention
for the Direct RenderingManager.
When it comes to Position-Dependent Code (PDC), a switch statementwith a default label marked as __builtin_unreachable
can bejust as efficient as using labels as values. However, forPosition-Independent Code (PIC), absolute addresses offer a slightperformance edge, although at the cost of requiring more dynamicrelocations.
1 | static void *tbl[] = {&&do_ret, &&do_inc, &&do_dec}; |
typeof
and__auto_type
typeof
is used in numerous code.__auto_type
has a few occurrences.
C23 standardizes typeof
and auto
.
x ?: y
is used in numerous code.
The C standard (at least C11 and C23) specifies that:
If the member declaration list does not contain any named members,either directly or via an anonymous structure or anonymous union, thebehavior is undefined.
The empty structure extension enables a dummy structure when aconfiguration option disables the functionality.
1 |
|
This extension (case low ... high:
), frequently used inthe kernel, allows us to specify a range of consecutive values in asingle case label within a switch statement.
glibc 2.3.4 introduced _FORTIFY_SOURCE
in 2004 to catchsecurity errors due to misuse of some C library functions (primarilybuffer overflow). The implementation leverages inline functions and__builtin_object_size
.
Linux kernel introduced CONFIG_FORTIFY_SOURCE
in 2017-07.Like the userspace, the implementation relies on optimizations likeinlining and constant folding.
For example, include/linux/string.h
combines thisfeature with BUILD_BUG_ON
(__attribute__((error(...)))
) to provide compile-timeerrors:
1 |
The Clang implementation utilizes__attribute__((pass_object_size(type)))
and__attribute__((overloadable(...)))
.
Clang introduced__builtin_dynamic_object_size
in 2019 to possiblyevaluate the size dynamically. The feature was ported to GCC in 2021 andis used by the kernel.
#pragma GCC diagnostic
is occasionally used to disable a local diagnostic.
include/linux/hidden.h
uses #pragma GCC visibility("hidden")
to force direct access (instead of GOT indirection) for external symbolsfor -fpie/-fpic
.
bpf uses #pragma GCC poison
to forbid some undesired identifiers.
Structure-layoutpragmas are commonly used.
The Linux kernel uses inline assembly for a few reasons
GCC provides built-in functions to generate machine code instructionsdesigned for certain tasks like popcount/clz/ctz. The compiler has moreinformation about the function's purpose and leverages internaloptimizations tailored for these tasks.
__builtin_choose_expr
This is analogous to ? :
but the condition is a constantexpression and the return type is not altered by promotion rules. InGCC, __builtin_choose_expr
is not available in C++.
1 | // include/linux/math.h |
__builtin_constant_p
__builtin_constant_p
identifies whether an expressioncan be evaluated to a constant. This capability unlocks several codepatterns.
Conditional static assertions:
1 | BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1); |
__builtin_constant_p
decides whetherMAYBE_BUILD_BUG_ON
expands to a compile-time static assertmechanism (__attribute__((error(...)))
) or a runtimemechanism. 1
2
3
4
5
6
7
Alternative code paths:
Sometimes, the constant expression case might enable a more efficientcode path.
1 |
Constant Folding Optimizations:
By knowing expressions are constant, the compiler can performconstant folding optimizations. This means pre-calculating theexpression's value during compilation and replacing it with the actualconstant in the code. This leads to potentially smaller and faster codeas the calculation doesn't need to be done at runtime. However, I noticethat __builtin_constant_p
is often abused. This is one ofthe source why the kernel cannot be built with -O0
.
__builtin_expect
Used by likely
and unlikely
macros to givethe compiler hints for optimization.
1 | // include/linux/compiler.h |
__builtin_frame_address
Used by stack trace utilities.
__builtin_offsetof
Used to implement offsetof
.
__builtin_*_overflow
Used by include/linux/overflow.h
for overflowchecking.
__builtin_prefetch
Used as the default implementation of prefetch
andprefetchw
.
__builtin_unreachable
Used by include/linux/compiler.h:unreachable
to informunreachability for optimization or diagnostics suppressing purposes.
__builtin_types_compatible_p
This is often used to emulate C11 static_assert
1 |
|
or implement C++ function overloading, e.g.
1 | // fs/bcachefs/util.h |
include/linux/jump_label.h
combines__builtin_types_compatible_p
and dead code elimination toemulate C++17 if constexpr
.____wrong_branch_error
is not defined. If the unexpectedcode path is taken, there will be a linker error.
1 |
See perfmutex: Add thread safety annotations.
tools/include/linux/compiler.h
defines macros for manycommonly used attributes. Many of them are commonly used by otherprojects and probably not worth describing.
tools/testing/selftests
has a few__attribute__((constructor))
for initialization.
__attribute__((error(...)))
is used with inlining anddead code elimination to provide static assertion. 1
2
3
4
5
6
7
8
9
10
11
12// include/linux/compiler_types.h
__attribute__((naked))
(__naked
) is used byarm and bpf code.
__attribute__((section(...)))
is used to place functionsin the specified section, e.g. .init.text
.
__attribute__((weak))
is used to allow a genericfunction to be replaced by an arch-specific implementation. The weakreference feature is used by some__start_<sectionname>
/__stop_<sectionname>
encapsulation symbols.
#define __noendbr __attribute__((nocf_check))
is usedto disable Intel CET for some special functions.
__attribute__((cleanup(...)))
runs a function when thevariable goes out of scope. include/linux/cleanup.h
definesDEFINE_FREE
and __free
based on thisfeature.
1 | // include/linux/slab.h |
__attribute__((aligned(...)))
and__attribute__((packed))
are often to control structurelayouts.
-fno-delete-null-pointer-checks
Assume that programs can safely dereference null pointers, and codeor data element may reside at address zero.
-fno-strict-aliasing
The C11 standard (section 6.5) defines strict aliasing rules.
An object shall have its stored value accessed only by an lvalueexpression that has one of the following types:
— a type compatible with the effective type of the object, — aqualified version of a type compatible with the effective type of theobject, — a type that is the signed or unsigned type corresponding tothe effective type of the object, — a type that is the signed orunsigned type corresponding to a qualified version of the effective typeof the object, — an aggregate or union type that includes one of theaforementioned types among its members (including, recursively, a memberof a subaggregate or contained union), or — a character type.
Compilers leverage these aliasing rules for optimizations, which canbe disabled with -fno-strict-aliasing
.
Linus Torvalds has expressed reservations about strict aliasing. Hereis a discussionfrom 2018.
-fno-strict-overflow
In GCC, this option is identical to-fwrapv -fwrapv-pointer
: make signed integer overflowdefined using wraparound semantics. While avoiding undefined behaviors,unexpected arithmetic overflow bugs might be lurking.
[RFC]Mitigating unexpected arithmetic overflow is a 2024-05 thread aboutdetecting and mitigating these issues.