
    Control-flow integrity

    MaskRay, published 2023-03-31

    A control-flow graph (CFG) is a graph representation of all paths that might be traversed through a program during its execution. Control-flow integrity (CFI) refers to a security policy dictating that program execution must follow a control-flow graph. This article describes some features that compilers and hardware can use to enforce CFI, with a focus on llvm-project implementations.

    CFI schemes are typically divided into forward-edge (e.g. indirect calls) and backward-edge (mainly function returns). As far as I understand, exception handling and symbol interposition are not included in these categories.

    Let's start with backward-edge CFI. Fine-grained schemes check whether a return address refers to a possible caller in the control-flow graph. However, this is a very difficult problem, and the additional guarantee may not be that useful.

    Coarse-grained schemes, on the other hand, simply check that return addresses have not been tampered with. Return addresses are typically stored in a memory region called the "stack" along with function arguments, local variables, and register save areas. Stack smashing is an attack that overwrites the return address to hijack the control flow. The name was made popular by Aleph One (Elias Levy)'s paper Smashing The Stack For Fun And Profit.

    StackGuard/Stack Smashing Protector

    StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks (1998) introduced an approach to detect tampered return addresses on the stack.

    GCC 4.1 implemented -fstack-protector and -fstack-protector-all. Additional variants have been added over the years, such as -fstack-protector-strong for GCC 4.9 and -fstack-protector-explicit in 2015-01.

    • In the prologue, the canary is loaded from a secret location and placed before the return address on the stack.
    • In the epilogue, the canary is loaded again and compared with the entry before the return address.

    The idea is that an attack overwriting the return address will likely overwrite the canary value as well. If the attacker doesn't know the canary value, returning from the function will crash.

    The canary is stored either in a libc global variable (__stack_chk_guard) or a field in the thread control block (e.g. %fs:40 on x86-64). Every architecture may have a preference on the location. In a glibc port, only one of -mstack-protector-guard=global and -mstack-protector-guard=tls is supported (BZ#26817). (Changing this requires symbol versioning wrestling and may not be necessary, as we can move to a more fine-grained scheme.) Under the current thread stack allocation scheme, the thread control block is allocated next to the thread stack. This means that a large out-of-bounds stack write can potentially overwrite the canary.

    #include <stdio.h>
    #include <string.h>

    void foo(const char *a) {
      char s[10];
      strcpy(s, a);
      puts(s);
    }
    # x86-64 assembly
    movq %fs:40, %rax # load the canary from the thread control block
    movq %rax, 24(%rsp) # place the value before the return address
    ...
    movq %fs:40, %rax # load the canary again
    cmpq 24(%rsp), %rax # compare it with the entry before the return address
    jne .LBB0_2 # fail if mismatching
    <epilogue>
    .LBB0_2:
    callq __stack_chk_fail@PLT

    # AArch64 assembly
    adrp x8, __stack_chk_guard
    ldr x8, [x8, :lo12:__stack_chk_guard] # load the canary from a global variable
    stur x8, [x29, #-8] # place the value before the saved frame pointer/return address
    ...
    adrp x8, __stack_chk_guard
    ldur x9, [x29, #-8] # load the entry before the saved frame pointer/return address
    ldr x8, [x8, :lo12:__stack_chk_guard] # load the canary again
    cmp x8, x9
    b.ne .LBB0_2 # fail if mismatching
    <epilogue>
    .LBB0_2:
    bl __stack_chk_fail

    GCC source code lets either libssp or libc provide __stack_chk_guard (see TARGET_LIBC_PROVIDES_SSP). libssp seems obsolete on modern systems.

    musl sets ((char *)&__stack_chk_guard)[1] = 0;:

    • The NUL byte serves as a terminator to sabotage string reads as an information leak.
    • A string overflow which attempts to preserve the canary while overwriting bytes after the canary (an implementation detail of struct pthread) will fail.
    • A one-byte overwrite is still detectable.
    • The set of possible values is decreased, but this seems acceptable for 64-bit systems.
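
    As a minimal sketch (not musl's actual code; the getrandom call and the omitted error handling are simplifications), a canary with an embedded NUL byte could be initialized like this:

    #include <stdint.h>
    #include <sys/random.h>

    uintptr_t __stack_chk_guard;

    static void init_ssp(void) {
      // Fill the canary with random bytes, then sacrifice one byte of
      // entropy to embed a NUL terminator (as musl does on 64-bit).
      getrandom(&__stack_chk_guard, sizeof __stack_chk_guard, 0);
      ((char *)&__stack_chk_guard)[1] = 0;
    }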

    There are two related function attributes: stack_protect and no_stack_protector. Strangely, stack_protect is not named stack_protector.
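
    A small illustration of the two attributes (a sketch; with -fstack-protector-explicit only functions carrying stack_protect are instrumented, while no_stack_protector opts a function out even under -fstack-protector-all):

    #include <string.h>

    // Instrumented when built with -fstack-protector-explicit.
    __attribute__((stack_protect)) void copy_checked(char *dst, const char *src) {
      char tmp[16];
      strcpy(tmp, src);
      strcpy(dst, tmp);
    }

    // Never instrumented, even under -fstack-protector-all.
    __attribute__((no_stack_protector)) void copy_unchecked(char *dst, const char *src) {
      char tmp[16];
      strcpy(tmp, src);
      strcpy(dst, tmp);
    }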

    -mstack-protector-guard-symbol= can change the global variable name from __stack_chk_guard to something else. The option was introduced for the Linux kernel. See arch/*/include/asm/stackprotector.h for how the canary is initialized in the Linux kernel. It is at least per-cpu. Some ports support a per-task stack canary (search config STACKPROTECTOR_PER_TASK).

    Retguard

    OpenBSD introduced Retguard in 2017 to improve security. Details can be found in RETGUARD and stack canaries. Retguard is more fine-grained than StackGuard, but it has more expensive function prologues and epilogues.

    For each instrumented function, a cookie is allocated from a pool of 4000 entries. In the prologue, return_address ^ cookie is pushed next to the return address (similar to the XOR Random Canary). The epilogue pops the XOR value and the return address and verifies that they match ((value ^ cookie) == return_address).

    Not encrypting the return address directly is important to preserve return address prediction for the CPU. The two int3 instructions are used to disrupt ROP gadgets that may form from je ...; retq (02 cc cc c3 is addb %ah, %cl; int3; retq). (ROP gadgets removal notes that gadget removal may not be useful.) If a static branch predictor exists (likely not-taken for a forward branch), the initial prediction is likely wrong.

    // From https://www.openbsd.org/papers/eurobsdcon2018-rop.pdf
    // prologue
    ffffffff819ff700: 4c 8b 1d 61 21 24 00 mov 2367841(%rip),%r11 # <__retguard_2759>
    ffffffff819ff707: 4c 33 1c 24 xor (%rsp),%r11
    ffffffff819ff70b: 55 push %rbp
    ffffffff819ff70c: 48 89 e5 mov %rsp,%rbp
    ffffffff819ff70f: 41 53 push %r11

    // epilogue
    ffffffff8115a457: 41 5b pop %r11
    ffffffff8115a459: 5d pop %rbp
    ffffffff8115a45a: 4c 33 1c 24 xor (%rsp),%r11
    ffffffff8115a45e: 4c 3b 1d 03 74 ae 00 cmp 11432963(%rip),%r11 # <__retguard_2759>
    ffffffff8115a465: 74 02 je ffffffff8115a469
    ffffffff8115a467: cc int3
    ffffffff8115a468: cc int3
    ffffffff8115a469: c3 retq

    SafeStack

    Code-Pointer Integrity (2014) proposed stack object instrumentation, which was merged into LLVM in 2015. To use it, we can run clang -fsanitize=safe-stack. In a link action, the driver option links in the runtime compiler-rt/lib/safestack.

    The pass moves some stack objects into a separate stack, which is normally referenced by a thread-local variable __safestack_unsafe_stack_ptr or via a function call __safestack_pointer_address. The moved objects are those that are not guaranteed to be free of stack smashing, determined mainly via ScalarEvolution.

    As an example, the local variable a below is moved to the unsafe stack as there is a risk that bar may have out-of-bounds accesses.

    void bar(int *);
    void foo() {
      int a;
      bar(&a);
    }
    foo:                                    # @foo
    # %bb.0: # %entry
    pushq %r14
    pushq %rbx
    pushq %rax
    movq __safestack_unsafe_stack_ptr@GOTTPOFF(%rip), %rbx
    movq %fs:(%rbx), %r14
    leaq -16(%r14), %rax
    movq %rax, %fs:(%rbx)
    leaq -4(%r14), %rdi
    callq bar@PLT
    movq %r14, %fs:(%rbx)
    addq $8, %rsp
    popq %rbx
    popq %r14
    retq

    Shadow stack

    This technique is very old. Stack Shield (1999) is an early scheme based on assembly instrumentation.

    During a function call, the return address is stored in a shadow stack. The normal stack may contain a copy, or (as a variant) not at all. Upon return, an entry is popped from the shadow stack. It is either used as the return address or (as a variant) compared with the normal return address.
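
    Conceptually (a sketch only; real schemes instrument prologues and epilogues in codegen, and the names below are made up), the "compare" variant behaves like:

    #include <stdint.h>
    #include <stdlib.h>

    // Hypothetical per-thread shadow stack that holds only return addresses.
    static _Thread_local uintptr_t shadow[1024];
    static _Thread_local size_t top;

    // What an instrumented prologue does: push the return address.
    static inline void shadow_push(uintptr_t retaddr) { shadow[top++] = retaddr; }

    // What an instrumented epilogue does in the compare variant: pop and
    // verify against the return address found on the normal stack.
    static inline void shadow_check(uintptr_t retaddr_on_stack) {
      if (shadow[--top] != retaddr_on_stack)
        abort();
    }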

    Below we describe some userspace and hardware-assisted schemes.

    Userspace

    -fsanitize=shadow-call-stack

    See https://clang.llvm.org/docs/ShadowCallStack.html. During a function call, the instrumentation stores the return address in a shadow stack. When returning from the function, the return address is popped from the shadow stack. Additionally, the return address is stored on the regular stack for return address prediction and compatibility with unwinders, but it is otherwise unused.

    To use this for AArch64, run clang --target=aarch64-unknown-linux-gnu -fsanitize=shadow-call-stack -ffixed-x18.

    str x30, [x18], #8      // push the return address to the shadow stack
    sub sp, sp, #32
    stp x29, x30, [sp, #16] // the normal stack contains a copy as well
    ...
    ldp x29, x30, [sp, #16]
    add sp, sp, #32
    ldr x30, [x18, #-8]!    // pop the return address from the shadow stack
    ret

    GCC ported the feature in 2022-02 (milestone: 12.0).

    This is implemented for RISC-V as well. Interestingly, you can still use -ffixed-x18, as x18 (aka s2) is a callee-saved register. When using RVC, the push operation on the shadow stack uses 6 bytes. In the future, x3 (gp) is another choice, and reserving it as a platform register will be helpful.

    // clang --target=riscv64-unknown-linux-gnu -fsanitize=shadow-call-stack -ffixed-x18
    sd ra, 0(s2) // push the return address to the shadow stack
    addi s2, s2, 8
    addi sp, sp, -32
    sd ra, 24(sp) // the normal stack contains a copy as well
    ...
    addi sp, sp, 32
    ld ra, -8(s2) // pop the return address from the shadow stack
    addi s2, s2, -8
    ret

    In the Linux kernel, select CONFIG_SHADOW_CALL_STACK to use this scheme.

    Hardware-assisted

    Intel Control-flow Enforcement Technology and AMD Shadow Stack

    Supported by Intel's 11th Gen and AMD Zen 3.

    A RET instruction pops the return address from both the regular stack and the shadow stack, and compares them. A control protection exception (#CP) is raised in case of a mismatch.

    If all relocatable files with .note.gnu.property have set the GNU_PROPERTY_X86_FEATURE_1_SHSTK bit, or -z shstk is specified, the output will have the bit.

    On Windows the scheme is branded as Hardware-enforced Stack Protection. In the MSVC linker, /cetcompat marks an executable image as compatible with Control-flow Enforcement Technology (CET) Shadow Stack.

    setjmp/longjmp need to save and restore the shadow stack pointer.
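
    For illustration (a sketch using the CET intrinsics from <immintrin.h>, compiled with -mshstk; glibc's actual setjmp/longjmp code is written in assembly and differs), a setjmp-like primitive could record the shadow stack pointer so that a longjmp-like primitive can later unwind the shadow stack with INCSSP:

    #include <immintrin.h>   // _get_ssp, _incsspq; requires -mshstk
    #include <stdint.h>

    // Hypothetical context record; real jmp_buf layouts are ABI-specific.
    struct ctx { uint64_t ssp; /* plus registers, SP, return address, ... */ };

    void ctx_save(struct ctx *c) {
      // _get_ssp() reads the current shadow stack pointer (0 if SHSTK is off).
      c->ssp = _get_ssp();
    }

    void ctx_unwind_shadow_stack(const struct ctx *c) {
      uint64_t cur = _get_ssp();
      if (!cur || c->ssp <= cur)
        return;
      // Pop the intervening entries; INCSSPQ consumes at most 255 at a time.
      uint64_t entries = (c->ssp - cur) / 8;
      while (entries > 255) { _incsspq(255); entries -= 255; }
      if (entries) _incsspq(entries);
    }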

    Stack unwinding

    By storing only return addresses, the shadow stack allows for an efficient stack unwinding scheme. As a result, the frame pointer is no longer necessary, as we no longer need a frame chain to trace through return addresses.

    However, in many implementations, accessing the shadow stack from a different process can be challenging. This can make out-of-process unwinding difficult.

    Armv8.3 Pointer Authentication

    Also available as Armv8.1-M Pointer Authentication.

    Instructions are provided to sign a pointer with a 64-bit user-chosen context value (usually zero, X16, or SP) and a 128-bit secret key. The computed Pointer Authentication Code is stored in the unused high bits of the pointer. The instructions are allocated from the HINT space for compatibility with older CPUs.

    A major use case is to sign/authenticate the return address. paciasp is inserted at the start of the function prologue to sign the to-be-saved LR (X30). (It serves as an implicit bti c as well.) autiasp is inserted in the function epilogue to authenticate the loaded LR.

    paciasp                        # instrumented
    sub sp, sp, #0x20
    stp x29, x30, [sp, #0x10]
    ...
    ldp x29, x30, [sp, #0x10]
    add sp, sp, #0x20
    autiasp # instrumented
    ret

    ld.lld added support in https://reviews.llvm.org/D62609 (with substantial changes afterwards). If -z pac-plt is specified, autia1716 is used for a PLT entry; a relocatable file with .note.gnu.property and the GNU_PROPERTY_AARCH64_FEATURE_1_PAC bit cleared gets a warning.


    Let's discuss forward-edge CFI schemes.

    Ideally, for an indirect call, we would want to enforce that the target is among the targets in the control-flow graph. However, this is difficult to implement efficiently, so many schemes only ensure that the function signature matches. For example, in the code void f(void (*indirect)(void)) { indirect(); }, if indirect actually refers to a function with arguments, the CFI scheme will flag this indirect call.

    In C, checking that the argument and return types match suffices. Many schemes use type hashes. In C++, language constructs such as virtual functions and pointers to member functions add more restrictions to what can be indirectly called. -fsanitize=cfi has implemented many C++ specific checks that are missing from many other CFI schemes.
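
    As a small sketch of the C case (the command line is one way to enable the check; -fsanitize=cfi-icall needs -flto and, in the non-cross-DSO mode, -fvisibility=hidden):

    // clang -flto -fvisibility=hidden -fsanitize=cfi-icall mismatch.c
    #include <stdio.h>

    static void takes_int(int x) { printf("%d\n", x); }

    int main(void) {
      // The pointer type says "no arguments" but the target's prototype is
      // void(int); calling through it trips the cfi-icall check.
      void (*indirect)(void) = (void (*)(void))takes_int;
      indirect();
    }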

    pax-future.txt

    https://pax.grsecurity.net/docs/pax-future.txt (c.2) describes a scheme where

    • a call site provides a hash indicating the intended function prototype
    • the callee's epilogue checks whether the hash matches its own prototype

    In case of a mismatch, it indicates that the callee does not have the intended prototype. The program should terminate.

    callee
    epilogue: mov register,[esp]
    cmp [register+1],MAGIC
    jnz .1
    retn
    .1: jmp esp

    caller:
    call callee
    test eax,MAGIC

    The epilogue assumes that a call site hash is present, so an instrumented function cannot be called by a non-instrumented call site.

    -fsanitize=cfi

    See https://clang.llvm.org/docs/ControlFlowIntegrity.html. Clang has implemented the traditional indirect function call check and many C++ specific checks under this umbrella option. These checks all rely on Type Metadata and link-time optimizations: LTO collects functions with the same signature, enabling efficient type checks.

    The underlying LLVM IR technique is shared with virtual function call optimization (devirtualization).

    struct A { void f(); };
    struct B { void f(); };
    void A::f() {}
    void B::f() {}
    static void (A::*fptr)();

    int main(int argc, char **argv) {
      A *a = new A;
      if (argv[1]) fptr = (void (A::*)())&B::f; // caught by -fsanitize=cfi-mfcall
      else fptr = &A::f;                        // good
      (a->*fptr)();
      delete a;
    }
    14
    struct A { virtual void f() {} };
    struct B : A { void g() {} virtual ~B() {} };
    struct C : A { void g() {} };
    struct D { virtual void f() {} };
    void af(A *a) { a->f(); }
    void bg(B *b) { b->g(); }
    int main(int argc, char **argv) {
      B b;
      C c;
      D d;
      af(reinterpret_cast<A *>(&d)); // caught by -fsanitize=cfi-vcall
      bg(&b);                        // good
      bg(reinterpret_cast<B *>(&c)); // caught by -fsanitize=cfi-nvcall
    }

    For virtual calls, the related virtual tables are placed together in the object file. This requires LTO. After loading the dynamic type of an object, -fsanitize=cfi can efficiently check whether the type matches an address point of possible virtual tables.

    .section .data.rel.ro._ZTV1D,"aGw",@progbits,_ZTV1D,comdat
    .p2align 3, 0x0
    _ZTV1D:
    .quad 0; .quad _ZTI1D; .quad _ZN1D1fEv

    .section .data.rel.ro..L__unnamed_1,"aw",@progbits
    .p2align 3, 0x0
    .L__unnamed_1:
    .quad 0; .quad _ZTI1B; .quad _ZN1A1fEv; .quad _ZN1BD2Ev; .quad _ZN1BD0Ev
    .zero 24
    .quad 0; .quad _ZTI1A; .quad _ZN1A1fEv
    .zero 8
    .quad 0; .quad _ZTI1C; .quad _ZN1A1fEv
    ...

    movq (%rdi), %rax
    leaq __typeid__ZTS1A_global_addr(%rip), %rdx
    movq %rax, %rcx
    subq %rdx, %rcx
    rorq $__typeid__ZTS1A_align, %rcx
    movq %rcx, %rdx
    subq $__typeid__ZTS1A_size_m1@ABS8, %rdx
    ja .LBB0_2
    jmp .LBB0_1
    .LBB0_1:
    movl $1, %edx
    shll %cl, %edx
    movl $__typeid__ZTS1A_inline_bits, %ecx
    andl %edx, %ecx
    cmpl $0, %ecx
    jne .LBB0_3
    .LBB0_2: # %trap
    ud1l 2(%eax), %eax
    .LBB0_3: # %cont
    callq *(%rax)

    Cross-DSO CFI requires a runtime (see compiler-rt/lib/cfi).

    Control Flow Guard

    Control Flow Guard was introduced in Windows 8.1 Preview. It was implemented in llvm-project in 2019. To use it, we can run clang-cl /guard:cf or clang --target=x86_64-pc-windows-gnu -mguard=cf.

    The compiler instruments indirect calls to call a global function pointer (___guard_check_icall_fptr or __guard_dispatch_icall_fptr) and records valid indirect call targets in special sections: .gfids$y (address-taken functions), .giats$y (address-taken IAT entries), .gljmp$y (longjmp targets), and .gehcont$y (ehcont targets). An instrumented file with applicable functions defines the @feat.00 symbol with at least one bit of 0x4800.

    The linker combines the sections, marks additional symbols (e.g. /entry), and creates address tables (__guard_fids_table, __guard_iat_table, __guard_longjmp_table (unless /guard:nolongjmp), __guard_eh_cont_table).

    At run-time, the global function pointer refers to a function that verifies that an indirect call target is valid.

    leaq   target(%rip), %rax
    callq *%rax

    =>

    leaq target(%rip), %rax
    callq *__guard_dispatch_icall_fptr(%rip)

    eXtended Flow Guard

    This is an improved Control Flow Guard that is similar to pax-future.txt (c.2). To use it, run cl.exe /guard:xfg. On x86-64, the instrumentation calls the function pointer __guard_xfg_dispatch_icall_fptr instead of __guard_dispatch_icall_fptr, and passes the prototype hash as an argument (in r10 below). The runtime checks whether the prototype hash matches the callee.

    void bar(int a) {}
    void foo(void (*f)(int)) { f(42); }
    gxfg$y  SEGMENT
    __guard_xfg?foo@@YAXP6AXH@Z@Z DD SymXIndex: FLAT:?foo@@YAXP6AXH@Z@Z
    DD 00H
    DQ ba49be2b36da9170H
    gxfg$y ENDS
    ; COMDAT gxfg$y
    gxfg$y SEGMENT
    __guard_xfg?bar@@YAXH@Z DD SymXIndex: FLAT:?bar@@YAXH@Z
    DD 00H
    DQ c6c1864950d77370H
    gxfg$y ENDS

    ?bar@@YAXH@Z PROC ; bar, COMDAT
    ret 0
    ?bar@@YAXH@Z ENDP

    ?foo@@YAXP6AXH@Z@Z PROC ; foo, COMDAT
    mov rax, rcx
    mov r10, -4124868134247632016 ; c6c1864950d77370H
    mov ecx, 42 ; 0000002aH
    rex_jmp QWORD PTR __guard_xfg_dispatch_icall_fptr
    ?foo@@YAXP6AXH@Z@Z ENDP ; foo

    -fsanitize=kcfi

    Introduced to llvm-project in 2022-11 (milestone: 16.0.0); I was glad to serve as a reviewer. GCC feature request.

    The instrumentation does the following:

    • Stores a hash of the function prototype before the function entry.
    • Loads the hash at an indirect call site and traps if it does not match the expected prototype.
    • Records the trap location in a section named .kcfi_traps. The Linux kernel uses the section to check whether a trap is caused by KCFI.
    • Defines a weak absolute symbol __kcfi_typeid_<func> if the function has a C identifier name and is address-taken.
    __cfi_bar:
    .rept 11; nop; .endr
    movl $27004076, %eax    # imm = 0x19C0CAC
    bar:
    retq

    __cfi_foo:
    .rept 11; nop; .endr
    movl $2992198919, %eax  # imm = 0xB2595507
    foo:
    movq %rdi, %rax
    movl $42, %edi
    movl $4267963220, %r10d # imm = 0xFE63F354
    addl -4(%rax), %r10d
    je .Ltmp0
    .Ltmp1:
    ud2
    .section .kcfi_traps,"ao",@progbits,.text
    .Ltmp2:
    .long .Ltmp1-.Ltmp2
    .text
    .Ltmp0:
    jmpq *%rax              # TAILCALL

    The large number of NOPs is to leave room for FineIBT patching at run-time.

    An instrumented indirect call site cannot call an uninstrumented target. This property appears to be satisfied in the Linux kernel.

    See cfi: Switch to -fsanitize=kcfi for the Linux kernel implementation.

    FineIBT

    This is a hybrid scheme that combines instrumentation and hardware features. It uses both an indirect branch target indicator (ENDBR) and a prototype hash check. For an analysis, see https://dustri.org/b/paper-notes-fineibt.html.

    As a compromise, the check (testb $0x11, %fs:0x48) can be disabled at run-time. fs:0x48 utilizes an unused field tcbhead_t::unused_vgetcpu_cache in sysdeps/x86_64/nptl/tls.h. This also allows an attacker to disable FineIBT by zeroing this byte.

    x86/ibt: Implement FineIBT is the Linux kernel implementation, which patches -fsanitize=kcfi output into FineIBT.

    Hardware-assisted

    Intel Indirect Branch Tracking

    This is part of Intel Control-flow Enforcement Technology. When enabled, the CPU guarantees that every indirect branch lands on a special instruction (ENDBR: endbr32 for x86-32 and endbr64 for x86-64). In case of a violation, a control-protection (#CP) exception will be raised. A jump/call instruction with the notrack prefix skips the check.

    With -fcf-protection={branch,full}, the compiler inserts ENDBR at the start of a basic block which may be reached indirectly. This is conservative:

    • every address-taken basic block needs ENDBR
    • every non-internal-linkage function needs ENDBR as it may be reached via PLT
    • every function compiled for the large code model needs ENDBR as it may be reached via a large code model branch
    • landing pads for exception handling need ENDBR

    __attribute__((nocf_check)) can disable ENDBR insertion for a function.
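
    For example (a sketch, assuming -fcf-protection=branch is in effect):

    // clang -fcf-protection=branch -S endbr.c
    void regular(void) {}        // the prologue gets an endbr64

    __attribute__((nocf_check))
    void no_check(void) {}       // no endbr64; an indirect call to this function
                                 // must use the notrack prefix to avoid #CP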

    ld.lld added support in 2020-01:

    • PLT entries need ENDBR
    • If all relocatable files with .note.gnu.property (which contains NT_GNU_PROPERTY_TYPE_0 notes) have set the GNU_PROPERTY_X86_FEATURE_1_IBT bit, or -z force-ibt is specified, the output will have the bit, and ld.lld will synthesize ENDBR-compatible PLT entries.

    The GNU ld implementation uses a second PLT scheme (.plt.sec). I was sad about it, but in the end I decided to follow suit for ld.lld. Some tools (e.g. objdump) use heuristics to detect foo@plt, and using an alternative form would require them to adapt. mold nevertheless settled on its own scheme.

    swapcontext may return with an indirect branch. Such a function can be annotated with __attribute__((indirect_return)) so that the call site will be followed by an endbr.

    glibc can be built with --enable-cet. By default it uses the NT_GNU_PROPERTY_TYPE_0 notes to decide whether to enable CET. If the executable or any shared object doesn't have the GNU_PROPERTY_X86_FEATURE_1_IBT bit, rtld will call arch_prctl with ARCH_CET_DISABLE to disable IBT. SHSTK is similar. A dlopen'ed shared object triggers the check as well.

    As of 2023-01, the Linux kernel support is work in progress. It does not support ARCH_CET_STATUS/ARCH_CET_DISABLE yet.

    qemu does not support Indirect Branch Tracking yet.

    Don't conservatively mark non-local-linkage functions

    The current scheme as well as Arm BTI conservatively adds an indirect branch indicator to every non-internal-linkage function, which increases the attack surface. Alternative CET ABI mentioned that we can use NOTRACK in PLT entries, and use new relocation types to indicate address-significant functions. Supporting dlsym/dlclose needs special tweaking.

    Technically the address significance part can be achieved with the existing SHT_LLVM_ADDRSIG, so we can avoid introducing a new relocation type for each architecture. I am somewhat unhappy that SHT_LLVM_ADDRSIG optimizes for size and does not make binary manipulation convenient (objcopy and ld -r set sh_link=0 of SHT_LLVM_ADDRSIG and invalidate the section). See Explain GNU style linker options. Since we just deal with x86, introducing new x86-32/x86-64 relocation types is not bad.

    Where LTO is concerned, this imposes more difficulties, as LLVMCodeGen does not know (https://reviews.llvm.org/D140363):

    • whether a function is visible to a native relocatable file (VisibleToRegularObj)
    • or whether the address of a function is taken in at least one IR module.

    The two properties address the two problems:

    • If the link mixes bitcode files and ELF relocatable files, for a function in a bitcode file, F.hasAddressTaken() doesn't indicate that its address is not taken by an ELF relocatable file.
    • For ThinLTO, a function may have false F.hasAddressTaken() for the definition in one module and true F.hasAddressTaken() for a reference in another module.

    Both properties need additional bookkeeping between LLVMLTO and LLVMCodeGen. The first property additionally needs the linker to communicate information to LLVMLTO.

    If we make ThinLTO properly combine the address-taken property (close to !GV.use_empty() && !GV.hasAtLeastLocalUnnamedAddr()), and provide VisibleToRegularObj to LLVMCodeGen, we can use the following condition to decide whether ENDBR is needed with an appropriate code model:

    AddressTaken || (!F.hasLocalLinkage() && (VisibleToRegularObj || !F.hasHiddenVisibility()))

    (Note: some large x86-64 executables are facing relocation overflow pressure. Range extension thunks may be a future direction. If NOTRACK is not used, we will need to conservatively mark local linkage functions.)

    Armv8.5 Branch Target Identification

    This is similar to Intel Indirect Branch Tracking. When enabled, the CPU ensures that every indirect branch lands on a special instruction, otherwise a Branch Target exception is raised. The most common landing instructions are bti {c,j,jc} with different branch instruction compatibility. paciasp and pacibsp are implicitly bti c.

    This feature is more fine-grained: every memory page may set a bit VM_ARM64_BTI (set with the mmap flag PROT_BTI) to indicate that indirect branches into it must land on a BTI instruction. This allows uninstrumented code to be mapped simply by skipping PROT_BTI.

    The value of -mbranch-protection= can be none (no hardening), standard (bti and pac-ret with the non-leaf protection scope), or a +-separated combination of bti and pac-ret[+b-key,+leaf]. If bti is enabled, the compiler inserts bti {c,j,jc} at the start of a basic block which may be reached indirectly. For a non-internal-linkage function, its entry may be reached by a PLT or range extension thunk, so it is conservatively marked as needing bti c.

    ld.lld added support in https://reviews.llvm.org/D62609 (with substantial changes afterwards). If all relocatable files with .note.gnu.property have set the GNU_PROPERTY_AARCH64_FEATURE_1_BTI bit, or -z force-bti is specified, the output will have the bit, and ld.lld will synthesize BTI-compatible PLT entries.

    qemu has supported Branch Target Identification since 2019.

    Compiler warnings

    clang -Wcast-function-type warns when a function pointer is cast to an incompatible function pointer type. Calling through such a cast function pointer likely leads to -fsanitize=cfi/-fsanitize=kcfi runtime errors. For practical reasons, GCC made a choice to allow some common violations.

    Clang 16.0.0 makes -Wcast-function-type stricter and warns about more cases. -Wno-cast-function-type-strict restores the previous behavior, which ignores many cases, including some ABI-equivalent ones.
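
    A sketch of the kind of cast these warnings target (whether the base warning or the -strict variant fires depends on the compiler and version):

    int add(int a, int b) { return a + b; }

    int main(void) {
      // int(int, int) cast to int(int): an incompatible prototype. Calling
      // through g would also trip -fsanitize=cfi-icall or -fsanitize=kcfi.
      int (*g)(int) = (int (*)(int))add;
      return g(1);
    }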

    Summary

    Forward-edge CFI

    • check that the target may be landed on indirectly but do not check the function signature: Control Flow Guard, Intel Indirect Branch Tracking, Armv8.5 Branch Target Identification
    • check that the target may be landed on indirectly and check the function signature: eXtended Flow Guard, -fsanitize=cfi, -fsanitize=kcfi, FineIBT

    Links:

    • RISC-V CFI: https://github.com/riscv/riscv-cfi

