A control-flow graph (CFG) is a graph representation of all paths that might be traversed through a program during its execution. Control-flow integrity (CFI) refers to a security policy dictating that program execution must follow a control-flow graph. This article describes some features that compilers and hardware can use to enforce CFI, with a focus on llvm-project implementations.
CFI schemes are typically divided into forward-edge (e.g. indirect calls) and backward-edge (mainly function returns). As far as I understand, exception handling and symbol interposition are not included in these categories.
Let's start with backward-edge CFI. Fine-grained schemes check whether a return address refers to a possible caller in the control-flow graph. However, this is a very difficult problem, and the additional guarantee may not be that useful.
Coarse-grained schemes, on the other hand, only check that return addresses have not been tampered with. Return addresses are typically stored in a memory region called the "stack" along with function arguments, local variables, and register save areas. Stack smashing is an attack that overwrites the return address to hijack the control flow. The name was made popular by Aleph One (Elias Levy)'s paper Smashing The Stack For Fun And Profit.
StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks (1998) introduced an approach to detect tampered return addresses on the stack.
GCC 4.1 implemented -fstack-protector and -fstack-protector-all. Additional variants have been added over the years, such as -fstack-protector-strong for GCC 4.9 and -fstack-protector-explicit in 2015-01.
The idea is that an attack overwriting the return address will likely overwrite the canary value as well. If the attacker doesn't know the canary value, returning from the function will crash.
The canary is stored either in a libc global variable (__stack_chk_guard) or a field in the thread control block (e.g. %fs:40 on x86-64). Every architecture may have a preference on the location. In a glibc port, only one of -mstack-protector-guard=global and -mstack-protector-guard=tls is supported (BZ#26817). (Changing this requires symbol versioning wrestling and may not be necessary as we can move to a more fine-grained scheme.) Under the current thread stack allocation scheme, the thread control block is allocated next to the thread stack. This means that a large out-of-bounds stack write can potentially overwrite the canary.
1 | void foo(const char *a) { |
1 | # x86-64 assembly |
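The canary check can be modeled in plain C. This is only a sketch under stated assumptions: struct frame, guard, and call_with_input are hypothetical names, the fixed guard value stands in for the randomized __stack_chk_guard, and the "overflow" is confined to a struct so that the program stays well-defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a protected frame: the canary sits between the
   local buffer and the saved return address. */
struct frame {
  char buf[8];
  uint64_t canary;
  uint64_t return_address;
};

/* Stand-in for __stack_chk_guard; a real libc randomizes it at startup. */
static const uint64_t guard = UINT64_C(0x5a17c0de00ff1234);

/* Copies len bytes to the start of the frame without checking the size of
   buf, then performs the epilogue check. Returns 1 if the canary is intact
   and 0 where instrumented code would call __stack_chk_fail. */
static int call_with_input(const void *input, size_t len) {
  struct frame f;
  assert(len <= sizeof f);      /* keep the simulated overflow inside the model */
  f.canary = guard;             /* prologue: store the canary */
  f.return_address = 0x401000;  /* arbitrary illustrative value */
  memcpy(&f, input, len);       /* "unchecked" copy starting at f.buf */
  return f.canary == guard;     /* epilogue: compare before returning */
}
```

An 8-byte input leaves the canary alone, while a 24-byte input that reaches the return address necessarily overwrites the canary first.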
GCC source code lets either libssp or libc provide __stack_chk_guard (see TARGET_LIBC_PROVIDES_SSP). It seems that libssp is obsolete on modern systems.
musl sets ((char *)&__stack_chk_guard)[1] = 0;:
[...] struct pthread) will fail.
There are two related function attributes: stack_protect and no_stack_protector. Strangely, stack_protect is not named stack_protector.
-mstack-protector-guard-symbol= can change the global variable name from __stack_chk_guard to something else. The option was introduced for the Linux kernel. See arch/*/include/asm/stackprotector.h for how the canary is initialized in the Linux kernel. It is at least per-cpu. Some ports support per-task stack canary (search config STACKPROTECTOR_PER_TASK).
OpenBSD introduced Retguard in 2017 to improve security. Details can be found in RETGUARD and stack canaries. Retguard is more fine-grained than StackGuard, but it has more expensive function prologues and epilogues.
For each instrumented function, a cookie is allocated from a pool of 4000 entries. In the prologue, return_address ^ cookie is pushed next to the return address (similar to the XOR Random Canary). The epilogue pops the XOR value and the return address and verifies that they match ((value ^ cookie) == return_address).
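The prologue and epilogue arithmetic is small enough to sketch in C. The function names are hypothetical and the cookie is passed in explicitly; in Retguard it is a per-function value picked from the pool.

```c
#include <assert.h>
#include <stdint.h>

/* Prologue: compute the value that is pushed next to the return address. */
static uint64_t retguard_prologue(uint64_t return_address, uint64_t cookie) {
  assert(cookie != 0); /* a zero cookie would leave the address unmasked */
  return return_address ^ cookie;
}

/* Epilogue: returns 1 if (value ^ cookie) == return_address, and 0 where the
   instrumented function would trap instead of returning. */
static int retguard_epilogue(uint64_t saved_value, uint64_t return_address,
                             uint64_t cookie) {
  return (saved_value ^ cookie) == return_address;
}
```

An attacker who overwrites both stack slots with the same address still fails the check unless the cookie is known.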
Not encrypting the return address directly is important to preserve return address prediction for the CPU. The two int3 instructions are used to disrupt ROP gadgets that may form from je ...; retq (02 cc cc c3 is addb %ah, %cl; int3; retq). (ROP gadgets removal notes that gadget removal may not be useful.) If a static branch predictor exists (likely not-taken for a forward branch) the initial prediction is likely wrong.
1 | // From https://www.openbsd.org/papers/eurobsdcon2018-rop.pdf |
Code-Pointer Integrity (2014) proposed stack object instrumentation, which was merged into LLVM in 2015. To use it, we can run clang -fsanitize=safe-stack. In a link action, the driver option links in the runtime compiler-rt/lib/safestack.
The pass moves some stack objects into a separate stack, which is normally referenced by a thread-local variable __safestack_unsafe_stack_ptr or via a function call __safestack_pointer_address. These are the objects whose accesses cannot be proven free of stack smashing; the analysis mainly relies on ScalarEvolution.
As an example, the local variable a below is moved to the unsafe stack as there is a risk that bar may have out-of-bounds accesses.
1 | void bar(int *); |
1 | foo: # @foo |
This technique is very old. Stack Shield (1999) is an early scheme based on assembly instrumentation.
During a function call, the return address is stored in a shadow stack. The normal stack may contain a copy, or (as a variant) not at all. Upon return, an entry is popped from the shadow stack. It is either used as the return address or (as a variant) compared with the normal return address.
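The "compare" variant can be sketched as a small C model. The type and function names are hypothetical, and the fixed depth is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 64

struct shadow_stack {
  uint64_t slots[SHADOW_DEPTH];
  size_t top;
};

/* On call: record the return address on the shadow stack. */
static void shadow_push(struct shadow_stack *s, uint64_t return_address) {
  assert(s->top < SHADOW_DEPTH);
  s->slots[s->top++] = return_address;
}

/* On return: pop the shadow entry and compare it with the copy found on the
   normal stack. Returns 1 on a match and 0 where a real scheme would trap. */
static int shadow_pop_and_check(struct shadow_stack *s,
                                uint64_t normal_stack_copy) {
  assert(s->top > 0);
  return s->slots[--s->top] == normal_stack_copy;
}
```

In the other variant the popped value would simply be used as the return address, so a corrupted copy on the normal stack has no effect.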
Below we describe some userspace and hardware-assisted schemes.
-fsanitize=shadow-call-stack
See https://clang.llvm.org/docs/ShadowCallStack.html. During a function call, the instrumentation stores the return address in a shadow stack. When returning from the function, the return address is popped from the shadow stack. Additionally, the return address is stored on the regular stack for return address prediction and compatibility with unwinders, but it is otherwise unused.
To use this for AArch64, run clang --target=aarch64-unknown-linux-gnu -fsanitize=shadow-call-stack -ffixed-x18.
    str     x30, [x18], #8          // push the return address to the shadow stack
    sub     sp, sp, #32
    stp     x29, x30, [sp, #16]     // the normal stack contains a copy as well
    ...
    ldp     x29, x30, [sp, #16]
    add     sp, sp, #32
    ldr     x30, [x18, #-8]!        // pop the return address from the shadow stack
    ret
GCC ported the feature in 2022-02 (milestone: 12.0).
This is implemented for RISC-V as well. Interestingly, you can still use -ffixed-x18, as x18 (aka s2) is a callee-saved register. When using RVC, the push operation on the shadow stack uses 6 bytes. In the future, x3 (gp) is another choice and reserving it as a platform register will be helpful.
1 | // clang --target=riscv64-unknown-linux-gnu -fsanitize=shadow-call-stack -ffixed-x18 |
In the Linux kernel, select CONFIG_SHADOW_CALL_STACK to use this scheme.
Supported by Intel's 11th Gen and AMD Zen 3.
A RET instruction pops the return address from both the regular stack and the shadow stack, and compares them. A control protection exception (#CP) is raised in case of a mismatch.
If all relocatable files with .note.gnu.property have set the GNU_PROPERTY_X86_FEATURE_1_SHSTK bit, or -z shstk is specified, the output will have the bit.
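The linker rule amounts to an AND over the inputs plus an override. The sketch below assumes each input file has already been reduced to one feature word (0 for a file that lacks the bit); combine_features and the forced parameter are hypothetical names, while the two bit values are the ones the x86 psABI assigns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FEATURE_1_IBT   0x1u /* GNU_PROPERTY_X86_FEATURE_1_IBT */
#define FEATURE_1_SHSTK 0x2u /* GNU_PROPERTY_X86_FEATURE_1_SHSTK */

/* A feature bit survives only if every input has it; bits in `forced` model
   a linker option that sets the bit regardless of the inputs. */
static uint32_t combine_features(const uint32_t *inputs, size_t n,
                                 uint32_t forced) {
  assert(inputs != NULL || n == 0);
  uint32_t out = n ? UINT32_MAX : 0;
  for (size_t i = 0; i < n; i++)
    out &= inputs[i];
  return out | forced;
}
```

A single input without the bit is enough to drop it from the output, which is why one uninstrumented object can silently disable the protection for the whole link.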
On Windows the scheme is branded as Hardware-enforced Stack Protection. In the MSVC linker, /cetcompat marks an executable image as compatible with Control-flow Enforcement Technology (CET) Shadow Stack.
setjmp/longjmp need to save and restore the shadow stack pointer.
By storing only return addresses, the shadow stack allows for an efficient stack unwinding scheme. As a result, the frame pointer is no longer necessary, as we no longer need a frame chain to trace through return addresses.
However, in many implementations, accessing the shadow stack from a different process can be challenging. This can make out-of-process unwinding difficult.
Also available as Armv8.1-M Pointer Authentication.
Instructions are provided to sign a pointer with a 64-bit user-chosen context value (usually zero, X16, or SP) and a 128-bit secret key. The computed Pointer Authentication Code is stored in the unused high bits of the pointer. The instructions are allocated from the HINT space for compatibility with older CPUs.
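The sign/authenticate round trip can be illustrated with a toy model. Everything here is an assumption made for illustration: toy_mac is an ad-hoc mixing function rather than the cipher used by real hardware, the PAC is assumed to occupy the top 16 bits above a 48-bit address, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VA_BITS 48 /* assumed virtual address width */
#define VA_MASK ((UINT64_C(1) << VA_BITS) - 1)

struct pac_key { uint64_t hi, lo; }; /* 128-bit secret key */

/* Toy keyed mix of pointer and context; NOT the real PAC algorithm. */
static uint64_t toy_mac(uint64_t addr, uint64_t context, struct pac_key k) {
  uint64_t x = (addr ^ k.lo) * UINT64_C(0x9e3779b97f4a7c15);
  x ^= x >> 29;
  x = (x ^ context ^ k.hi) * UINT64_C(0xbf58476d1ce4e5b9);
  x ^= x >> 32;
  return x;
}

/* Sign: place the code in the unused high bits of the pointer. */
static uint64_t pac_sign(uint64_t ptr, uint64_t context, struct pac_key k) {
  uint64_t addr = ptr & VA_MASK;
  return addr | (toy_mac(addr, context, k) << VA_BITS);
}

/* Authenticate: recompute the code. On a match, store the stripped pointer
   in *out and return 1; otherwise return 0 without touching *out. */
static int pac_auth(uint64_t signed_ptr, uint64_t context, struct pac_key k,
                    uint64_t *out) {
  assert(out != NULL);
  uint64_t addr = signed_ptr & VA_MASK;
  if (pac_sign(addr, context, k) != signed_ptr)
    return 0;
  *out = addr;
  return 1;
}
```

With only 16 code bits in this model, a forged pointer still has a small chance of passing, which is one reason the real scheme treats the key as a secret the attacker must not learn.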
A major use case is to sign/authenticate the return address. paciasp is inserted at the start of the function prologue to sign the to-be-saved LR (X30). (It serves as an implicit bti c as well.) autiasp is inserted at the end of the function epilogue to authenticate the loaded LR.
1 | paciasp # instrumented |
ld.lld added support in https://reviews.llvm.org/D62609 (with substantial changes afterwards). If -z pac-plt is specified, autia1716 is used for a PLT entry; a relocatable file with .note.gnu.property and the GNU_PROPERTY_AARCH64_FEATURE_1_PAC bit cleared gets a warning.
Let's discuss forward-edge CFI schemes.
Ideally, for an indirect call, we would want to enforce that the target is among the targets in the control-flow graph. However, this is difficult to implement efficiently, so many schemes only ensure that the function signature matches. For example, in the code void f(void (*indirect)(void)) { indirect(); }, if indirect actually refers to a function with arguments, the CFI scheme will flag this indirect call.
In C, checking that the argument and return types match suffices. Many schemes use type hashes. In C++, language constructs such as virtual functions and pointers to member functions add more restrictions to what can be indirectly called. -fsanitize=cfi has implemented many C++ specific checks that are missing from many other CFI schemes.
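A type-hash check can be sketched in C by pairing each function with a hash of its prototype. The struct, the hash constants, and the function names below are all hypothetical; a real scheme has the compiler compute the hash and place it where the call site can find it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*generic_fn)(void);

/* A function entry paired with a hash of its prototype. */
struct typed_fn {
  uint32_t type_hash;
  generic_fn entry;
};

/* Made-up hash values standing in for compiler-computed prototype hashes. */
enum { HASH_VOID_VOID = 0x1a2b3c4d, HASH_VOID_INT = 0x5e6f7081 };

static int calls;
static void takes_nothing(void) { calls++; }
static void takes_int(int x) { calls += x; }

static const struct typed_fn fn_nothing = {HASH_VOID_VOID, takes_nothing};
static const struct typed_fn fn_int = {HASH_VOID_INT, (generic_fn)takes_int};

/* Instrumented call site for void (*)(void): the call happens only if the
   target's hash matches. Returns 1 if the call was made and 0 where a real
   scheme would trap. */
static int checked_call_void_void(const struct typed_fn *target) {
  assert(target != NULL);
  if (target->type_hash != HASH_VOID_VOID)
    return 0;
  target->entry();
  return 1;
}
```

The mismatching target is rejected before the call, so the incompatible function is never entered through the wrong type.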
pax-future.txt
https://pax.grsecurity.net/docs/pax-future.txt (c.2) describes a scheme where
A mismatch indicates that the callee does not have the intended prototype. The program should terminate.
1 | callee |
The epilogue assumes that a call site hash is present, so an instrumented function cannot be called by a non-instrumented call site.
-fsanitize=cfi
See https://clang.llvm.org/docs/ControlFlowIntegrity.html. Clang has implemented the traditional indirect function call check and many C++ specific checks under this umbrella option. These checks all rely on Type Metadata and link-time optimizations: LTO collects functions with the same signature, enabling efficient type checks.
The underlying LLVM IR technique is shared with virtual function call optimization (devirtualization).
1 | struct A { void f(); }; |
1 | struct A { virtual void f() {} }; |
For virtual calls, the related virtual tables are placed together in the object file. This requires LTO. After loading the dynamic type of an object, -fsanitize=cfi can efficiently check whether the type matches an address point of possible virtual tables.
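Once the candidate virtual tables are adjacent, the membership test can reduce to a range-and-alignment check. The sketch below covers only the simplest assumed layout, where every valid address point is evenly spaced at a power-of-two stride; vptr_is_valid and its parameters are hypothetical names, and other layouts need a bit vector lookup instead.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right; valid for n in [0, 63]. */
static uint64_t rotr64(uint64_t x, unsigned n) {
  return (x >> n) | (x << ((64 - n) & 63));
}

/* Returns 1 if vptr is one of `count` address points that start at `first`
   and are spaced 2^align_log2 bytes apart, and 0 otherwise. */
static int vptr_is_valid(uint64_t vptr, uint64_t first, unsigned align_log2,
                         uint64_t count) {
  assert(align_log2 < 64);
  uint64_t diff = vptr - first; /* wraps to a huge value if vptr < first */
  /* A misaligned diff rotates its low bits into the high bits, so one
     unsigned comparison rejects out-of-range and misaligned pointers. */
  return rotr64(diff, align_log2) < count;
}
```

The single comparison is what makes the check cheap enough to run before every virtual call.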
1 | .section.data.rel.ro._ZTV1D,"aGw",@progbits,_ZTV1D,comdat |
Cross-DSO CFI requires a runtime (see compiler-rt/lib/cfi).
Control Flow Guard was introduced in Windows 8.1 Preview. It was implemented in llvm-project in 2019. To use it, we can run clang-cl /guard:cf or clang --target=x86_64-pc-windows-gnu -mguard=cf.
The compiler instruments indirect calls to call a global function pointer (___guard_check_icall_fptr or __guard_dispatch_icall_fptr) and records valid indirect call targets in special sections: .gfids$y (address-taken functions), .giats$y (address-taken IAT entries), .gljmp$y (longjmp targets), and .gehcont$y (ehcont targets). An instrumented file with applicable functions defines the @feat.00 symbol with at least one bit of 0x4800.
The linker combines the sections, marks additional symbols (e.g. /entry), creates address tables (__guard_fids_table, __guard_iat_table, __guard_longjmp_table (unless /guard:nolongjmp), __guard_eh_cont_table).
At run-time, the global function pointer refers to a function that verifies that an indirect call target is valid.
1 | leaq target(%rip), %rax |
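Conceptually, the check function answers "is this address a recorded call target?". The sketch below assumes a sorted array of valid targets and uses a binary search; this is a stand-in chosen for clarity, not a description of how the Windows runtime stores the information, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison for bsearch over uint64_t addresses. */
static int cmp_u64(const void *a, const void *b) {
  uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
  return (x > y) - (x < y);
}

/* Returns 1 if target appears in the sorted table of valid call targets,
   and 0 where the real check would terminate the process. */
static int guard_check_icall(uint64_t target, const uint64_t *table,
                             size_t n) {
  assert(table != NULL || n == 0);
  if (n == 0)
    return 0;
  return bsearch(&target, table, n, sizeof *table, cmp_u64) != NULL;
}
```

Note that this check is coarse: any recorded function is an acceptable target for any call site, regardless of its prototype.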
This is an improved Control Flow Guard that is similar to pax-future.txt (c.2). To use it, run cl.exe /guard:xfg. On x86-64, the instrumentation calls the function pointer __guard_xfg_dispatch_icall_fptr instead of __guard_dispatch_icall_fptr. It also provides the prototype hash as an argument. The runtime checks whether the prototype hash matches the callee.
1 | void bar(int a) {} |
1 | gxfg$y SEGMENT |
-fsanitize=kcfi
Introduced to llvm-project in 2022-11 (milestone: 16.0.0; I was glad to be a reviewer). GCC feature request.
The instrumentation does the following:
[...] .kcfi_traps. The Linux kernel uses the section to check whether a trap is caused by KCFI.
[...] __kcfi_typeid_<func> if the function has a C identifier name and is address-taken.
1 | __cfi_bar: |
The large number of NOPs is to leave room for FineIBT patching at run-time.
An instrumented indirect call site cannot call an uninstrumented target. This property appears to be satisfied in the Linux kernel.
See cfi: Switch to -fsanitize=kcfi for the Linux kernel implementation.
This is a hybrid scheme that combines instrumentation and hardware features. It uses both an indirect branch target indicator (ENDBR) and a prototype hash check. For an analysis, see https://dustri.org/b/paper-notes-fineibt.html.
The compromise testb 0x11, fs:0x48 can be disabled at run-time. fs:0x48 utilizes an unused field tcbhead_t:unused_vgetcpu_cache in sysdeps/x86_64/nptl/tls.h. This allows an attacker to disable FineIBT by zeroing this byte.
x86/ibt: Implement FineIBT is the Linux kernel implementation which patches -fsanitize=kcfi output into FineIBT.
This is part of Intel Control-flow Enforcement Technology. When enabled, the CPU guarantees that every indirect branch lands on a special instruction (ENDBR, endbr32 for x86-32 and endbr64 for x86-64). In case of a violation, a control-protection (#CP) exception will be raised. A jump/call instruction with the notrack prefix skips the check.
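The hardware behavior can be modeled as a two-state tracker walked over a trace of executed instructions. The enum values and first_cp_fault are hypothetical names for this sketch, and the instruction set is reduced to the four kinds that matter here.

```c
#include <assert.h>
#include <stddef.h>

enum insn { INSN_OTHER, INSN_ENDBR, INSN_INDIRECT_JMP, INSN_NOTRACK_JMP };
enum tracker { IDLE, WAIT_FOR_ENDBRANCH };

/* Walks instructions in execution order. Returns the index of the first
   instruction that would raise #CP, or -1 if the trace is clean. */
static int first_cp_fault(const enum insn *trace, size_t n) {
  assert(trace != NULL || n == 0);
  enum tracker state = IDLE;
  for (size_t i = 0; i < n; i++) {
    /* After a tracked indirect branch, the next instruction must be ENDBR. */
    if (state == WAIT_FOR_ENDBRANCH && trace[i] != INSN_ENDBR)
      return (int)i;
    /* Only a tracked indirect branch arms the tracker; notrack does not. */
    state = trace[i] == INSN_INDIRECT_JMP ? WAIT_FOR_ENDBRANCH : IDLE;
  }
  return -1;
}
```

In the IDLE state an ENDBR is simply a no-op, which is why instrumented code still runs unchanged when the feature is disabled.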
With -fcf-protection={branch,full}, the compiler inserts ENDBR at the start of a basic block which may be reached indirectly. This is conservative:
[...] endbr as it may be reached via PLT
__attribute__((nocf_check)) can disable ENDBR insertion for a function.
ld.lld added support in 2020-01:
[...] endbr
If all relocatable files with .note.gnu.property (which contains NT_GNU_PROPERTY_TYPE_0 notes) have set the GNU_PROPERTY_X86_FEATURE_1_IBT bit, or -z force-ibt is specified, the output will have the bit and synthesize ENDBR compatible PLT entries.
The GNU ld implementation uses a second PLT scheme (.plt.sec). I was sad about it, but in the end I decided to follow suit for ld.lld. Some tools (e.g. objdump) use heuristics to detect foo@plt and using an alternative form will require them to adapt. mold nevertheless settled on its own scheme.
swapcontext may return with an indirect branch. This function can be annotated with __attribute__((indirect_return)) so that the call site will be followed by an endbr.
glibc can be built with --enable-cet. By default it uses the NT_GNU_PROPERTY_TYPE_0 notes to decide whether to enable CET. If the executable or any shared object doesn't have the GNU_PROPERTY_X86_FEATURE_1_IBT bit, rtld will call arch_prctl with ARCH_CET_DISABLE to disable IBT. SHSTK is similar. A dlopen'ed shared object triggers the check as well.
As of 2023-01, the Linux kernel support is work in progress. It does not support ARCH_CET_STATUS/ARCH_CET_DISABLE yet.
qemu does not support Indirect Branch Tracking yet.
The current scheme as well as Arm BTI conservatively adds an indirect branch indicator to every non-internal-linkage function, which increases the attack surface. Alternative CET ABI mentioned that we can use NOTRACK in PLT entries, and use new relocation types to indicate address-significant functions. Supporting dlsym/dlclose needs special tweaking.
Technically the address significance part can be achieved with the existing SHT_LLVM_ADDRSIG, so we can avoid introducing a new relocation type for each architecture. I am somewhat unhappy that SHT_LLVM_ADDRSIG optimizes for size and does not make binary manipulation convenient (objcopy and ld -r set sh_link=0 of SHT_LLVM_ADDRSIG and invalidate the section). See Explain GNU style linker options. Since we just deal with x86, introducing new x86-32/x86-64 relocation types is not bad.
When LTO is concerned, this imposes more difficulties as LLVMCodeGen does not know (https://reviews.llvm.org/D140363):
[...] VisibleToRegularObj)
The two properties address the two problems:
[...] F.hasAddressTaken() doesn't indicate that its address is not taken by an ELF relocatable file.
[...] F.hasAddressTaken() for the definition in one module and true F.hasAddressTaken() for a reference in another module.
Both properties need additional bookkeeping between LLVMLTO and LLVMCodeGen. The first property additionally needs the linker to communicate information to LLVMLTO.
If we make ThinLTO properly combine the address-taken property (close to !GV.use_empty() && !GV.hasAtLeastLocalUnnamedAddr()), and provide VisibleToRegularObj to LLVMCodeGen, we can use the following condition to decide whether ENDBR is needed with an appropriate code model:
AddressTaken || (!F.hasLocalLinkage() && (VisibleToRegularObj || !F.hasHiddenVisibility()))
(Note: some large x86-64 executables are facing relocation overflow pressure. Range extension thunks may be a future direction. If NOTRACK is not used, we will need to conservatively mark local linkage functions.)
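The ENDBR condition can be spelled out as a plain predicate to make the cases easier to enumerate. struct func_info and needs_endbr are hypothetical names; each field mirrors one term of the condition and is assumed to have been computed already.

```c
#include <stdbool.h>

/* One flag per term of the condition. */
struct func_info {
  bool address_taken;          /* AddressTaken, combined across modules */
  bool local_linkage;          /* F.hasLocalLinkage() */
  bool hidden_visibility;      /* F.hasHiddenVisibility() */
  bool visible_to_regular_obj; /* VisibleToRegularObj, reported by the linker */
};

/* Mirrors: AddressTaken || (!local && (VisibleToRegularObj || !hidden)). */
static bool needs_endbr(struct func_info f) {
  return f.address_taken ||
         (!f.local_linkage &&
          (f.visible_to_regular_obj || !f.hidden_visibility));
}
```

Under this predicate, the functions that can drop ENDBR are local ones whose address is never taken and hidden ones that no regular object references.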
This is similar to Intel Indirect Branch Tracking. When enabled, the CPU ensures that every indirect branch lands on a special instruction, otherwise a Branch Target exception is raised. The most common landing instructions are bti {c,j,jc} with different branch instruction compatibility. paciasp and pacibsp are implicitly bti c.
This feature is more fine-grained: every memory page may set a bit VM_ARM64_BTI (set with the mmap flag PROT_BTI) to indicate that indirect branches into it must land on a BTI instruction. This allows uninstrumented code to be mapped simply by skipping PROT_BTI.
The value of -mbranch-protection= can be none (no hardening), standard (bti plus pac-ret with a non-leaf protection scope), or +-separated bti and pac-ret[+b-key,+leaf]. If bti is enabled, the compiler inserts bti {c,j,jc} at the start of a basic block which may be reached indirectly. For a non-internal-linkage function, its entry may be reached by a PLT or range extension thunk, so it is conservatively marked as needing bti c.
ld.lld added support in https://reviews.llvm.org/D62609 (with substantial changes afterwards). If all relocatable files with .note.gnu.property have set the GNU_PROPERTY_AARCH64_FEATURE_1_BTI bit, or -z force-bti is specified, the output will have the bit and synthesize BTI compatible PLT entries.
qemu has supported Branch Target Identification since 2019.
clang -Wcast-function-type warns when a function pointer is cast to an incompatible function pointer. Calling such a cast function pointer likely leads to -fsanitize=cfi/-fsanitize=kcfi runtime errors. For practical reasons, GCC made a choice to allow some common violations.
Clang 16.0.0 makes -Wcast-function-type stricter and warns about more cases. -Wno-cast-function-type-strict restores the previous state that ignores many cases including some ABI equivalent cases.
Forward-edge CFI: -fsanitize=cfi, -fsanitize=kcfi, FineIBT
Links: