UNDER CONSTRUCTION
Control-flow integrity (CFI) refers to techniques which prevent control-flow hijacking attacks. This article describes some compiler/hardware features with a focus on llvm-project implementations.
CFI is commonly divided into forward-edge CFI (e.g. indirect calls) and backward-edge CFI (e.g. function returns). As I understand it, exception handling and symbol interposition are not categorized.
Let's start with backward-edge CFI. Fine-grained technologies check that a return address refers to a possible caller. This is unimplemented and the additional guarantee is possibly less useful.
Coarse-grained technologies just check that return addresses have not been tampered with. Return addresses are typically stored in a memory region called the "stack", along with function arguments, local variables, and register save areas. Stack smashing is an attack that overwrites the return address to hijack the control flow. The name was popularized by Aleph One (Elias Levy)'s paper Smashing The Stack For Fun And Profit.
StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks (1998) detects hijacked return addresses on the stack.
GCC 4.1 implemented -fstack-protector and -fstack-protector-all. More variants were added over the years, e.g. -fstack-protector-strong in GCC 4.9.
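To make the canary idea concrete, here is a minimal sketch in C. It is a toy model, not what GCC emits: the frame layout, the `guard` value, and the `run_frame` helper are assumptions made up for illustration. The point is only that a linear overflow of the buffer has to cross the canary before it reaches the return address.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of a stack frame: the canary sits between the buffer and the
   saved return address, so a linear overflow hits the canary first. */
struct frame {
  char buf[8];
  uintptr_t canary;
  uintptr_t ret_addr;
};

/* Hypothetical guard value; a real implementation picks it randomly at startup. */
static const uintptr_t guard = 0x5a17c0de;

/* Copies n bytes starting at buf, like an unchecked copy would, then runs the
   epilogue check. Returns 1 if the canary is intact, 0 if smashing is detected. */
int run_frame(const char *input, size_t n) {
  struct frame f = {{0}, guard, 0x401000};
  if (n > sizeof f) n = sizeof f;  /* keep the model itself in bounds */
  memcpy(&f, input, n);            /* bytes past buf spill into the canary */
  return f.canary == guard;
}
```

A 4-byte input leaves the canary alone, while an input longer than `buf` overwrites it and the check fails before the (also corrupted) return address would be used.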
OpenBSD introduced Retguard in 2017. See RETGUARD and stack canaries. It is more effective than StackGuard, at the cost of more expensive function prologues/epilogues.
For each instrumented function, a cookie is allocated from a pool of 4000 entries. In the prologue, return_address ^ cookie is pushed next to the return address. The epilogue pops the XOR value and the return address and verifies that they match ((value ^ cookie) == return_address).
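The prologue/epilogue check described above can be sketched in C as follows. This is only a model of the arithmetic: the `rg_frame` struct and the `rg_prologue`/`rg_epilogue` names are hypothetical, and the real instrumentation is emitted as machine code rather than as function calls.

```c
#include <stdint.h>

/* Toy frame: Retguard keeps return_address ^ cookie next to the return address. */
struct rg_frame {
  uintptr_t xored;    /* return_address ^ cookie, written by the prologue */
  uintptr_t ret_addr; /* the return address slot an attacker may overwrite */
};

/* Prologue: record the XOR of the return address and the per-function cookie. */
void rg_prologue(struct rg_frame *f, uintptr_t ret_addr, uintptr_t cookie) {
  f->ret_addr = ret_addr;
  f->xored = ret_addr ^ cookie;
}

/* Epilogue: returns 1 if (value ^ cookie) == return_address, else 0.
   The real epilogue would trap instead of returning 0. */
int rg_epilogue(const struct rg_frame *f, uintptr_t cookie) {
  return (f->xored ^ cookie) == f->ret_addr;
}
```

An attacker who overwrites only the return address slot, without knowing the cookie, cannot produce a matching XOR value, so the epilogue check fails.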
Not encrypting the return address directly is important to preserve return address prediction for the CPU. The two int3 instructions are to disrupt ROP gadgets which may form from je ...; retq (02 cc cc c3 is addb %ah, %cl; int3; retq). If a static branch predictor exists (likely not-taken for a forward branch) the initial prediction is likely wrong.
// From https://www.openbsd.org/papers/eurobsdcon2018-rop.pdf
Code-Pointer Integrity (2014) proposed an instrumentation which was merged into LLVM in 2015. Use clang -fsanitize=safe-stack.
The pass moves some stack objects into a separate stack (normally referenced by a thread-local variable __safestack_unsafe_stack_ptr or via a function call __safestack_pointer_address). These are the objects that cannot be proven free of stack smashing (the analysis mainly relies on ScalarEvolution).
As an example, the local variable a below is moved to the unsafe stack as there is a risk that bar may have out-of-bounds accesses.
void bar(int *);
foo: # @foo
compiler-rt/lib/safestack/ provides the runtime.
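For readers who want to try this, here is a small self-contained C file in the spirit of the example above. It is an illustrative sketch rather than the article's original listing: `b` and the body of `bar` are additions, and `bar` is defined here only so the file links. In a realistic experiment `bar` would live in another translation unit so the compiler cannot see its body.

```c
/* Defined here only so the example links; treat it as an opaque external
   function when reasoning about which stack each variable lands on. */
void bar(int *p) { *p = 42; }

int foo(void) {
  int a;      /* address escapes to bar: a candidate for the unsafe stack */
  int b = 1;  /* never address-taken: can stay on the regular (safe) stack */
  bar(&a);
  return a + b;
}
```

Compiling with clang -fsanitize=safe-stack -S and comparing the output against a build without the option should show how accesses to `a` are lowered differently from accesses to `b`, although the exact result depends on the optimization level and on whether `bar` gets inlined.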
If the return address on the regular stack and its copy on the shadow stack do not match, it indicates that the return address was altered.
-fsanitize=shadow-call-stack
See https://clang.llvm.org/docs/ShadowCallStack.html. The instrumentation stores the return address in a shadow stack during a function call. Upon return, the return address is popped from the shadow stack. The return address is also stored on the regular stack for return address prediction and compatibility with unwinders, but is otherwise unused.
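The push/pop behavior can be modeled with a few lines of C. This is a sketch under stated assumptions: the array, `scs_top`, and the `scs_push`/`scs_pop` helpers are hypothetical names, there are no bounds checks, and the real instrumentation is a pair of load/store instructions using a reserved register rather than function calls.

```c
#include <stdint.h>

#define SCS_DEPTH 64
static uintptr_t shadow[SCS_DEPTH]; /* models the separate shadow stack */
static int scs_top;                 /* models the shadow stack pointer register */

/* Prologue: save the return address on the shadow stack. */
void scs_push(uintptr_t ret_addr) { shadow[scs_top++] = ret_addr; }

/* Epilogue: the return address is taken from the shadow stack. The copy on
   the regular stack is ignored, so corrupting it does not redirect control. */
uintptr_t scs_pop(uintptr_t regular_stack_copy) {
  (void)regular_stack_copy;
  return shadow[--scs_top];
}
```

Note that this scheme does not compare the two copies. It simply never trusts the one on the regular stack.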
To use this for AArch64, run clang --target=aarch64-unknown-linux-gnu -fsanitize=shadow-call-stack -ffixed-x18. This is implemented for RISC-V as well. Interestingly, you can still use -ffixed-x18, as x18 (aka s2) is a callee-saved register.
GCC 12.0 ported the feature.
In the Linux kernel, select CONFIG_SHADOW_CALL_STACK to use this technology.
A hardware shadow stack is part of Intel Control-flow Enforcement Technology (CET). It is supported by Intel's 11th Gen and AMD Zen 3.
A RET instruction pops the return address from both the regular stack and the shadow stack, and compares them. A control protection exception (#CP) is raised in case of a mismatch.
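Unlike the software scheme, the hardware compares the two copies. A minimal C model of that comparison is shown below. The enum and the `cet_ret` function are hypothetical names for illustration, and the model ignores everything else a real RET does.

```c
#include <stdint.h>

enum ret_result { RET_OK, RET_CP_FAULT };

/* Models RET with shadow stacks enabled: both copies are popped and compared.
   A mismatch corresponds to a control protection exception (#CP). */
enum ret_result cet_ret(uintptr_t from_regular_stack, uintptr_t from_shadow_stack) {
  return from_regular_stack == from_shadow_stack ? RET_OK : RET_CP_FAULT;
}
```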
On Windows the technology is used by Hardware-enforced Stack Protection. In the MSVC linker, /cetcompat marks an executable image as compatible with Control-flow Enforcement Technology (CET) Shadow Stack.
Armv8.3 Pointer Authentication provides instructions to sign a pointer with a 64-bit context value (usually zero, X16, or SP) and a 128-bit secret key. The instructions are allocated from the HINT space for compatibility with older CPUs.
A major use case is to sign/authenticate return addresses with PACIASP/AUTIASP.
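The sign/authenticate round trip can be sketched in C. Everything here is a simplification for illustration: `toy_mac` is a made-up mixing function standing in for the real cipher and is not cryptographically sound, the 48-bit address width and 16-bit code width are assumptions, and `pac_sign`/`pac_auth` are hypothetical names rather than an actual API.

```c
#include <stdint.h>

#define VA_MASK 0x0000ffffffffffffULL /* assumption: 48-bit virtual addresses */

/* Toy keyed hash standing in for the real cipher; NOT cryptographically sound. */
static uint64_t toy_mac(uint64_t ptr, uint64_t ctx, const uint64_t key[2]) {
  uint64_t h = (ptr ^ key[0]) * 0x9e3779b97f4a7c15ULL;
  h ^= (ctx ^ key[1]) + (h >> 29);
  h *= 0xbf58476d1ce4e5b9ULL;
  return h >> 48; /* keep 16 bits so the code fits in the unused upper bits */
}

/* Sign: place the authentication code in the upper bits of the pointer. */
uint64_t pac_sign(uint64_t ptr, uint64_t ctx, const uint64_t key[2]) {
  return (ptr & VA_MASK) | (toy_mac(ptr & VA_MASK, ctx, key) << 48);
}

/* Authenticate: returns 1 and stores the stripped pointer on success,
   returns 0 if the code does not match the pointer, context, and key. */
int pac_auth(uint64_t signed_ptr, uint64_t ctx, const uint64_t key[2], uint64_t *out) {
  uint64_t ptr = signed_ptr & VA_MASK;
  if (pac_sign(ptr, ctx, key) != signed_ptr) return 0;
  *out = ptr;
  return 1;
}
```

For return addresses, the pointer is the link register and the context is SP, so a signed return address copied into a frame at a different stack depth would be expected to fail authentication.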
ld.lld added support in https://reviews.llvm.org/D62609 (with substantial changes afterwards). If -z pac-plt is specified, autia1716 is used for a PLT entry. A relocatable file with .note.gnu.property and the GNU_PROPERTY_AARCH64_FEATURE_1_PAC bit cleared gets a warning.
Now let's discuss forward-edge CFI technologies.
-fsanitize=cfi
See https://clang.llvm.org/docs/ControlFlowIntegrity.html. Clang has implemented a number of CFI schemes under this umbrella option. They all rely on link-time optimizations.
Windows 8.1 introduced Control Flow Guard. It was implemented in llvm-project in 2019. Use clang-cl /guard:cf or clang --target=x86_64-pc-windows-gnu -mguard=cf.
The compiler instruments indirect calls to call a global function pointer (___guard_check_icall_fptr or __guard_dispatch_icall_fptr) and records valid indirect call targets in special sections (.gfids$y, .giats$y, .gljmp$y). An instrumented file with applicable functions defines the @feat.00 symbol with at least one bit of 0x4800.
The linker combines the sections, marks additional symbols (e.g. /entry), creates address tables (__guard_fids_table, __guard_iat_table, __guard_longjmp_table (unless /guard:nolongjmp), __guard_eh_cont_table).
At run-time, the global function pointer refers to a function which verifies that an indirect call target is valid.
leaq target(%rip), %rax
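The run-time check can be modeled in C as a lookup in a table of valid targets followed by the call. This is a sketch, not the Windows implementation: `valid_targets`, `guard_check_icall`, and `checked_call` are hypothetical names, the linear scan stands in for whatever lookup structure the loader actually uses, and a failed check here returns -1 where a real check would be expected to terminate the process.

```c
#include <stddef.h>

typedef int (*fn_t)(int);

int twice(int x) { return 2 * x; }
int unlisted(int x) { return x + 1; }

/* Models the linker-built table of valid indirect call targets. */
static const fn_t valid_targets[] = { twice };

/* Models the function behind the guard check pointer: 1 if target is valid. */
int guard_check_icall(fn_t target) {
  for (size_t i = 0; i < sizeof valid_targets / sizeof valid_targets[0]; i++)
    if (valid_targets[i] == target) return 1;
  return 0;
}

/* An instrumented indirect call: check first, then call. Returns -1 when the
   target is rejected (assumption made so the model can report the failure). */
int checked_call(fn_t target, int arg) {
  return guard_check_icall(target) ? target(arg) : -1;
}
```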
-fsanitize=kcfi
Introduced to llvm-project in 2022-11 (milestone: 16.0.0). I was glad to be a reviewer.
Indirect Branch Tracking (IBT) is part of Intel Control-flow Enforcement Technology. When enabled, the CPU ensures that every indirect branch lands on a special instruction (endbr32 or endbr64), otherwise a control-protection (#CP) exception is raised.
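The landing-pad rule can be illustrated with a small C model that inspects the bytes at a branch target. The function name and its parameters are hypothetical. The byte sequences are the encodings of endbr64 (f3 0f 1e fa) and endbr32 (f3 0f 1e fb). Real hardware tracks this with an internal state machine rather than a helper like this one.

```c
#include <stdint.h>
#include <string.h>

static const uint8_t ENDBR64[4] = {0xf3, 0x0f, 0x1e, 0xfa};
static const uint8_t ENDBR32[4] = {0xf3, 0x0f, 0x1e, 0xfb};

/* Returns 1 if the bytes at code[target] begin with endbr64 (or endbr32 when
   is64 is 0). A return value of 0 models the #CP exception. */
int ibt_target_ok(const uint8_t *code, size_t len, size_t target, int is64) {
  if (target > len || len - target < 4) return 0;
  return memcmp(code + target, is64 ? ENDBR64 : ENDBR32, 4) == 0;
}
```

In this model, an indirect branch to the start of a function that begins with endbr64 passes, while a branch into the middle of the function is rejected.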
ld.lld added support in 2022-01:
endbr
If the .note.gnu.property sections of all input relocatable files have set the GNU_PROPERTY_X86_FEATURE_1_SHSTK bit, or -z shstk is specified, the output will have the bit.