
    All about UndefinedBehaviorSanitizer

    MaskRay, posted 2023-01-30 00:02:25

    UndefinedBehaviorSanitizer (UBSan) is an undefined behavior detector for C/C++. It consists of code instrumentation and a runtime (in llvm-project/compiler-rt).

    Clang and GCC have independent implementations. Clang implemented the instrumentation first in 2009-12, initially named -fcatch-undefined-behavior. GCC 4.9 implemented -fsanitize=undefined in 2013-08.

    Available checks

    UBSan can detect many kinds of undefined behavior. One may enable checks individually with -fsanitize=xxx, where xxx is the name of the check, or more commonly specify -fsanitize=undefined to enable all of them.

    Some checks do not correspond to undefined behavior per the language standards, but they flag code that is often a code smell and leads to surprising results. They include implicit-unsigned-integer-truncation, implicit-integer-sign-change, unsigned-integer-overflow, etc.

    % clang -fsanitize=undefined -xc /dev/null '-###'
    ... "-fsanitize=alignment,array-bounds,bool,builtin,enum,float-cast-overflow,function,integer-divide-by-zero,nonnull-attribute,null,pointer-overflow,return,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,unreachable,vla-bound,vptr" "-fsanitize-recover=alignment,array-bounds,bool,builtin,enum,float-cast-overflow,function,integer-divide-by-zero,nonnull-attribute,null,pointer-overflow,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,vla-bound,vptr" ...
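
    The list above does not contain unsigned-integer-overflow or the implicit conversion checks, because the operations they flag are well defined. As a minimal sketch (the function names are made up for illustration), both functions below have fully defined results in C, yet -fsanitize=unsigned-integer-overflow and -fsanitize=implicit-unsigned-integer-truncation, respectively, would report them when the value does not fit:

```c
#include <stdint.h>

/* Unsigned arithmetic wraps modulo 2^N; this is defined behavior. */
unsigned wrapping_add(unsigned a, unsigned b) {
    return a + b;
}

/* Implicit conversion to a narrower unsigned type keeps the low 8 bits. */
uint8_t low_byte(uint32_t v) {
    uint8_t narrow = v;
    return narrow;
}
```

    Whether such a report is a real bug depends on intent: hash functions rely on wraparound, while a truncated length usually indicates a mistake.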

    GCC implements slightly fewer checks than Clang.

    Modes

    UBSan provides 3 modes that trade off code size against diagnostic verbosity.

    • Default mode
    • Minimal runtime (-fsanitize-minimal-runtime)
    • Trap mode (-fsanitize-trap=undefined)

    Let's use a C function to compare the behaviors.

    #include <stdio.h>
    int foo(int a) { return a * 2; }
    int main() {
      int x;
      scanf("%d", &x);
      printf("a %d\n", foo(x));
      printf("b %d\n", foo(x));
    }

    Default mode

    With the umbrella option -fsanitize=undefined or the specific -fsanitize=signed-integer-overflow, Clang emits the following LLVM IR for foo:

    define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr #0 {
    entry:
      %0 = add i32 %a, 1073741824
      %1 = icmp sgt i32 %0, -1
      br i1 %1, label %cont, label %handler.mul_overflow, !prof !4, !nosanitize !5

    handler.mul_overflow:                             ; preds = %entry
      %2 = zext i32 %a to i64, !nosanitize !5
      tail call void @__ubsan_handle_mul_overflow(ptr nonnull @1, i64 %2, i64 2) #3, !nosanitize !5
      br label %cont, !nosanitize !5

    cont:                                             ; preds = %handler.mul_overflow, %entry
      %3 = shl i32 %a, 1
      ret i32 %3
    }

    The add and icmp instructions check whether the argument is in the range [-0x40000000, 0x40000000), i.e. whether a * 2 does not overflow. If so, the function returns the non-overflowing result. Otherwise, the emitted code calls the callback __ubsan_handle_mul_overflow with arguments describing the source location and the operands. __ubsan_handle_mul_overflow is implemented in the runtime (in llvm-project/compiler-rt).
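
    The range check can be written in C to see why a single add and compare suffice. This is a sketch with made-up function names: adding 2^30 in unsigned arithmetic maps the safe range [-0x40000000, 0x40000000) onto [0, 0x7fffffff], which is exactly the set of values whose sign bit is clear. The second function is a straightforward 64-bit reference to compare against.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same test as the add + icmp pair: bias by 2^30, then test the sign bit. */
bool doubling_is_safe(int32_t a) {
    uint32_t biased = (uint32_t)a + 0x40000000u; /* wraps modulo 2^32, no UB */
    return biased <= 0x7fffffffu;                /* same as icmp sgt %0, -1 */
}

/* Reference: multiply in 64 bits and compare with the int32_t range. */
bool doubling_is_safe_ref(int32_t a) {
    int64_t wide = (int64_t)a * 2;
    return wide >= INT32_MIN && wide <= INT32_MAX;
}
```

    The two functions agree at the boundaries: -0x40000000 and 0x3fffffff are accepted, while -0x40000001 and 0x40000000 are rejected.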

    By default, all UBSan checks except return and unreachable are recoverable (i.e. non-fatal). After __ubsan_handle_mul_overflow (or another callback) prints an error, the program keeps executing. The source location is marked and further errors about this location are suppressed. This deduplication makes logs less noisy. In one program invocation, we may observe errors from multiple source locations.

    We can make the program exit upon a UBSan error (with exit code 1, customizable with UBSAN_OPTIONS=exitcode=2). Just specify -fno-sanitize-recover (alias for -fno-sanitize-recover=all), -fno-sanitize-recover=undefined, or -fno-sanitize-recover=signed-integer-overflow. A non-returning callback will be emitted in place of __ubsan_handle_mul_overflow. Note that the emitted LLVM IR terminates the basic block with an unreachable instruction.
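
    The per-location suppression can be pictured with a toy model. The struct and function below are hypothetical and are not the compiler-rt implementation; they only illustrate the idea that each instrumented site carries a small piece of static data which the handler marks after the first report.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the static data emitted for one check site. */
struct check_site {
    const char *file;
    unsigned line;
    bool reported; /* set after the first report */
};

/* Returns true if a diagnostic was printed, false if it was suppressed. */
bool report_once(struct check_site *site, const char *what) {
    if (site->reported)
        return false;
    site->reported = true;
    fprintf(stderr, "%s:%u: runtime error: %s\n", site->file, site->line, what);
    return true;
}
```

    In the sample program, both calls to foo overflow at the same source location, so under this model only the first call produces a diagnostic.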

    handler.mul_overflow:                             ; preds = %entry
      %2 = zext i32 %a to i64, !nosanitize !5
      tail call void @__ubsan_handle_mul_overflow_abort(ptr nonnull @1, i64 %2, i64 2) #4, !nosanitize !5
      unreachable, !nosanitize !5

    Linking

    The UBSan callbacks are provided by the runtime. For a link action, we need to inform the compiler driver that the runtime should be linked. This can be done by specifying -fsanitize=undefined, as in a compile action. (Actually, any specific UBSan check that needs the runtime can be used, e.g. -fsanitize=signed-integer-overflow.)

    clang -c -O2 -fno-omit-frame-pointer -fsanitize=undefined a.c
    clang -fsanitize=undefined a.o -o a

    When linking an executable in the default -static-libsan mode, the Clang driver passes --whole-archive $resource_dir/lib/$triple/libclang_rt.ubsan.a --no-whole-archive to the linker.

    Minimal runtime

    The default mode provides verbose diagnostics which help programmers identify the undefined behavior. On the other hand, the detailed log helps attackers, and the code size may be a concern in some configurations.

    UBSan provides a minimal runtime mode that logs very little information. Specify -fsanitize-minimal-runtime for both compile actions and link actions to enable the mode.

    % clang -fsanitize=undefined -fsanitize-minimal-runtime a.c -o a
    % ./a <<< 1073741824
    ubsan: mul-overflow by 0x000055565bab6839
    a -2147483648
    b -2147483648
    % clang -fsanitize=undefined -fno-sanitize-recover -fsanitize-minimal-runtime a.c -o a
    % ./a <<< 1073741824
    ubsan: mul-overflow by 0x000056443f9a7839
    [1] 3857513 IOT instruction ./a <<< 1073741824
    % clang -fsanitize=undefined -fsanitize-minimal-runtime a.c '-###' |& grep --color=auto ubsan_minimal
    ... "--whole-archive" "/tmp/Rel/lib/clang/17/lib/x86_64-unknown-linux-gnu/libclang_rt.ubsan_minimal.a" "--no-whole-archive" "--dynamic-list=/tmp/Rel/lib/clang/17/lib/x86_64-unknown-linux-gnu/libclang_rt.ubsan_minimal.a.syms" ...

    In the emitted LLVM IR, a different set of callbacks, __ubsan_handle_*_minimal, is used. They take no arguments and therefore make instrumented code much smaller.

    handler.mul_overflow:                             ; preds = %entry
      tail call void @__ubsan_handle_mul_overflow_minimal() #4, !nosanitize !6
      br label %cont, !nosanitize !6
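
    An argument-free callback is enough for the one-line log shown earlier, because the only variable part is a code address, presumably that of the call site. The helper below is a toy sketch with a hypothetical name, not the code in compiler-rt. A handler could obtain the address with the GCC/Clang builtin __builtin_return_address(0) and pass it in as pc.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a one-line report like the ones in the log. */
int format_minimal_report(char *buf, size_t size, const char *kind, uint64_t pc) {
    /* "ubsan: <kind> by 0x" followed by the address as 16 hex digits. */
    return snprintf(buf, size, "ubsan: %s by 0x%016" PRIx64 "\n", kind, pc);
}
```

    The call site therefore needs no operand setup, which is where the code size saving comes from.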

    Trap mode

    When -fsanitize=signed-integer-overflow is in effect, we can specify -fsanitize-trap=signed-integer-overflow so that Clang emits a trap instruction instead of a callback. This greatly decreases the size bloat from instrumentation compared to the default mode.

    Usually, we specify the umbrella options -fsanitize-trap=undefined or -fsanitize-trap (alias for -fsanitize-trap=all).

    clang -S -emit-llvm -O2 -fsanitize=undefined -fsanitize-trap=undefined a.c

    Instead of calling a callback, the emitted LLVM IR calls the LLVM intrinsic llvm.ubsantrap, which lowers to a trap instruction that aborts the program. As a side benefit of avoiding UBSan callbacks, we don't need a runtime. Therefore, -fsanitize=undefined can be omitted for link actions. Note: -fsanitize-trap=xxx overrides -fsanitize-recover=xxx.

    define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr #0 {
    entry:
      %0 = add i32 %a, 1073741824
      %1 = icmp sgt i32 %0, -1
      br i1 %1, label %cont, label %trap, !nosanitize !5

    trap:                                             ; preds = %entry
      tail call void @llvm.ubsantrap(i8 12) #4, !nosanitize !5
      unreachable, !nosanitize !5

    cont:                                             ; preds = %entry
      %2 = shl nsw i32 %a, 1
      ret i32 %2
    }
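
    In C terms, the trap-mode IR behaves roughly like the hand-written check below. This is only an approximation for illustration: foo_trapping is a made-up name, and it uses the GCC/Clang builtins __builtin_mul_overflow and __builtin_trap instead of llvm.ubsantrap, so the trap instruction carries no check kind.

```c
/* Roughly what foo does in trap mode: check, then trap or return. */
int foo_trapping(int a) {
    int result;
    if (__builtin_mul_overflow(a, 2, &result))
        __builtin_trap(); /* no callback, so no runtime is needed */
    return result;
}
```

    Because the failure path is a single instruction with no arguments to set up, the instrumented function stays close to its uninstrumented size.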

    llvm.ubsantrap has an integer argument. On some architectures, this argument can change the encoding of the trap instruction so that different error types get different encodings.

    On x86-64, llvm.ubsantrap lowers to the 4-byte ud1l ubsan_type(%eax),%eax (UD1 with an address-size override prefix; ud1l is the AT&T syntax). We can write a SIGILL handler that disassembles the instruction at si->si_addr to learn which UBSan check failed (https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CodeGenFunction.h#L112). AArch64 is similar. However, many other architectures do not provide sufficient encoding space. For example, PowerPC provides trap (alias for tw 31,0,0). In tw TO,RA,RB, when we specify RA=RB=0, only 4 bits of the 5-bit TO can be used. That is insufficient.

    Used together with other sanitizers

    UBSan can be used together with many other sanitizers.

    clang -fsanitize=address,undefined a.c
    clang -fsanitize=fuzzer,undefined -fno-sanitize-recover a.c
    clang -fsanitize=hwaddress,undefined a.c
    clang -fsanitize=memory,undefined a.c
    clang -fsanitize=thread,undefined a.c
    clang -fsanitize=cfi,undefined -flto -fvisibility=hidden a.c

    Typically one may want to test multiple sanitizers for a project. The ability to run with another sanitizer decreases the number of configurations. In practice people prefer -fsanitize=address,undefined and -fsanitize=hwaddress,undefined over other combinations. Of course, more instrumentation means more overhead and size bloat.

    Using UBSan with libFuzzer is interesting. We can build with -fsanitize=fuzzer,address,undefined -fno-sanitize-recover to make libFuzzer abort when an AddressSanitizer or UndefinedBehaviorSanitizer error is detected.
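
    As a minimal sketch, a fuzz target for the foo function from the earlier example could look like this. LLVMFuzzerTestOneInput is the standard libFuzzer entry point; interpreting the first bytes of the input as an int is just an assumption made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

int foo(int a) { return a * 2; }

/* libFuzzer entry point: called repeatedly with generated inputs. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    int x;
    if (size < sizeof x)
        return 0;               /* not enough bytes to form an int */
    memcpy(&x, data, sizeof x); /* avoids unaligned or type-punned reads */
    (void)foo(x);               /* UBSan checks the multiplication here */
    return 0;
}
```

    With -fno-sanitize-recover, the first input that makes a * 2 overflow aborts the process, and libFuzzer saves that input as a crash reproducer.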

    Miscellaneous

    In Clang, unlike many other sanitizers, UndefinedBehaviorSanitizer instrumentation is performed as part of Clang CodeGen instead of an LLVM pass in the optimization pipeline.

    TODO Interfere with optimizations

    Applications

    compiler-rt/lib/ubsan is written in C++. It uses features that are unavailable in some environments (e.g. some operating system kernels). Adopting UBSan in kernels typically requires reimplementing the runtime (usually in C).

    In the Linux kernel, the commit "UBSAN: run-time undefined behavior sanity checker" introduced CONFIG_UBSAN and a runtime implementation.

    NetBSD implemented µUBSan in 2018.


