    All about UndefinedBehaviorSanitizer

    Posted by MaskRay on 2023-03-24 19:00:24

    UndefinedBehaviorSanitizer (UBSan) is an undefined behavior detector for C/C++. It consists of code instrumentation and a runtime. Both components have multiple independent implementations.

    Clang implemented the first few checks in 2009-12, initially named -fcatch-undefined-behavior. In 2012, -fsanitize=undefined was added and -fcatch-undefined-behavior was removed. GCC 4.9 implemented -fsanitize=undefined in 2013-08.

    The runtime used by Clang lives in llvm-project/compiler-rt/lib/ubsan. GCC from time to time syncs its downstream fork of the sanitizers part of compiler-rt (libsanitizer). The end of the article lists some alternative runtime implementations.

    Available checks

    UBSan can detect many kinds of undefined behavior. One can specify -fsanitize=xxx,yyy where xxx and yyy are the names of the individual checks, or more commonly, specify -fsanitize=undefined to enable the default set of checks.

    Some checks do not correspond to undefined behavior per the language standards, but they flag code that is often a code smell and leads to surprising results. They are implicit-unsigned-integer-truncation, implicit-integer-sign-change, unsigned-integer-overflow, etc.
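    To make the distinction concrete, here is a small sketch of three operations that are well defined in C yet would be reported by the checks named above when those checks are enabled explicitly. The function names are made up for illustration.

```c
#include <assert.h>
#include <limits.h>

/* The comments below assume 8-bit bytes. */
static_assert(CHAR_BIT == 8, "example assumes 8-bit bytes");

/* Unsigned arithmetic wraps modulo 2^N. Well defined, but reported by
   -fsanitize=unsigned-integer-overflow when the sum wraps. */
unsigned wrapping_add(unsigned a, unsigned b) { return a + b; }

/* Implicit conversion to a narrower unsigned type keeps the low bits.
   Well defined, but reported by
   -fsanitize=implicit-unsigned-integer-truncation when bits are lost. */
unsigned char low_byte(unsigned v) { return v; }

/* Implicit int -> unsigned conversion of a negative value.
   Well defined, but reported by -fsanitize=implicit-integer-sign-change. */
unsigned as_unsigned(int v) { return v; }
```

    For example, wrapping_add(UINT_MAX, 1u) yields 0 and low_byte(0x1234u) yields 0x34. Neither is undefined behavior, which is why these checks are not part of -fsanitize=undefined.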

    We can use -### to get the list of default UBSan checks.

    % clang -fsanitize=undefined -xc /dev/null '-###'
    ... -fsanitize=alignment,array-bounds,bool,builtin,enum,float-cast-overflow,function,integer-divide-by-zero,nonnull-attribute,null,pointer-overflow,return,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,unreachable,vla-bound,vptr" "-fsanitize-recover=alignment,array-bounds,bool,builtin,enum,float-cast-overflow,function,integer-divide-by-zero,nonnull-attribute,null,pointer-overflow,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,vla-bound,vptr" ...

    GCC implements slightly fewer checks than Clang.

    Modes

    UBSan provides 3 modes to allow tradeoffs between code size and diagnostic verboseness.

    • Default mode
    • Minimal runtime (-fsanitize-minimal-runtime)
    • Trap mode (-fsanitize-trap=undefined): no runtime

    Let's use the following program to compare the instrumentation and runtime behaviors.

    #include <stdio.h>
    int foo(int a) { return a * 2; }
    int main() {
      int x;
      scanf("%d", &x);
      printf("a %d\n", foo(x));
      printf("b %d\n", foo(x));
    }

    Default mode

    With the umbrella option -fsanitize=undefined or the specific -fsanitize=signed-integer-overflow, Clang emits the following LLVM IR for foo:

    define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr #0 {
    entry:
      %0 = add i32 %a, 1073741824
      %1 = icmp sgt i32 %0, -1
      br i1 %1, label %cont, label %handler.mul_overflow, !prof !4, !nosanitize !5

    handler.mul_overflow:                             ; preds = %entry
      %2 = zext i32 %a to i64, !nosanitize !5
      tail call void @__ubsan_handle_mul_overflow(ptr nonnull @1, i64 %2, i64 2) #3, !nosanitize !5
      br label %cont, !nosanitize !5

    cont:                                             ; preds = %handler.mul_overflow, %entry
      %3 = shl i32 %a, 1
      ret i32 %3
    }

    The add and icmp instructions check whether the argument is in the range [-0x40000000, 0x40000000) (no overflow). If yes, the cont branch is taken and the non-overflow result is returned. Otherwise, the emitted code calls the callback __ubsan_handle_mul_overflow (implemented by the runtime) with arguments describing the source location and the two operands.
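    The range check can be replayed in plain C. The sketch below mirrors the add and icmp pair with unsigned arithmetic and cross-checks it against the GCC/Clang builtin __builtin_mul_overflow. The function names are made up for illustration, and the code is a model of the emitted check, not what Clang generates from C source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors "add i32 %a, 0x40000000; icmp sgt ..., -1": adding 0x40000000
   modulo 2^32 maps [-0x40000000, 0x40000000) onto [0, 0x80000000), i.e.
   onto the values whose sign bit is clear. */
int mul2_in_range(int32_t a) {
  uint32_t biased = (uint32_t)a + 0x40000000u;
  return (biased >> 31) == 0;
}

/* Reference answer using the GCC/Clang overflow builtin. */
int mul2_in_range_ref(int32_t a) {
  int32_t result;
  return !__builtin_mul_overflow(a, 2, &result);
}

/* Compares both implementations around the interesting boundaries. */
void mul2_selfcheck(void) {
  static const int32_t samples[] = {INT32_MIN, -0x40000001, -0x40000000, -1, 0,
                                    1, 0x3fffffff, 0x40000000, INT32_MAX};
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert(mul2_in_range(samples[i]) == mul2_in_range_ref(samples[i]));
}
```

    The input 1073741824 (0x40000000) used in the runs later in this article is the smallest positive value for which the check fails.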

    By default, all UBSan checks except return and unreachable are recoverable (i.e. non-fatal). After __ubsan_handle_mul_overflow (or another callback) prints an error, the program keeps executing. The source location is marked and further errors about this location are suppressed. This deduplication feature makes logs less noisy. In one program invocation, we may observe errors from potentially multiple source locations.
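    The deduplication idea can be illustrated with a tiny model: each instrumented location owns a record, and the first report flips a flag in it. This is only a sketch of the concept. The struct, its fields, and report_once are hypothetical and do not claim to match how compiler-rt marks a location.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-location record; one instance per instrumented site. */
struct check_site {
  const char *file;
  unsigned line;
  bool reported; /* set once the first error has been printed */
};

/* Prints a diagnostic the first time a site fails and stays silent
   afterwards. Returns true if a diagnostic was printed. */
bool report_once(struct check_site *site, const char *what) {
  assert(site != NULL);
  if (site->reported)
    return false;
  site->reported = true;
  fprintf(stderr, "%s:%u: runtime error: %s\n", site->file, site->line, what);
  return true;
}
```

    With this model, the two calls to foo(x) in the example program would produce one diagnostic, since both reach the same multiplication.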

    We can make the program exit upon a UBSan error (with exit code 1 by default, customizable with UBSAN_OPTIONS=exitcode=2). Just specify -fno-sanitize-recover (alias for -fno-sanitize-recover=all), -fno-sanitize-recover=undefined, or -fno-sanitize-recover=signed-integer-overflow. A non-returning callback will be emitted in place of __ubsan_handle_mul_overflow. Note that the emitted LLVM IR terminates the basic block with an unreachable instruction.

    handler.mul_overflow:                             ; preds = %entry
      %2 = zext i32 %a to i64, !nosanitize !5
      tail call void @__ubsan_handle_mul_overflow_abort(ptr nonnull @1, i64 %2, i64 2) #4, !nosanitize !5
      unreachable, !nosanitize !5

    Linking

    The UBSan callbacks are provided by the runtime. For a link action, we need to inform the compiler driver that the runtime should be linked. This can be done by specifying -fsanitize=undefined. (Actually, any specific UBSan check that needs the runtime can be used, e.g. -fsanitize=signed-integer-overflow.)

    clang -c -O2 -fsanitize=undefined a.c
    clang -fsanitize=undefined a.o -o a

    Some checks (function, vptr) call C++ specific callbacks implemented in libclang_rt.ubsan_standalone_cxx.a. We need to use clang++ -fsanitize=undefined for the link action (or use clang -fsanitize=undefined -fsanitize-link-c++-runtime).

    % clang -fsanitize=undefined a.o '-###' |& grep --color ubsan
    ... "--whole-archive" ".../libclang_rt.ubsan_standalone.a" "--no-whole-archive" "--dynamic-list=.../libclang_rt.ubsan_standalone.a.syms" ...
    % clang++ -fsanitize=undefined a.o '-###' |& grep --color ubsan
    ... "--whole-archive" ".../libclang_rt.ubsan_standalone.a" "--no-whole-archive" "--dynamic-list=.../libclang_rt.ubsan_standalone.a.syms" "--whole-archive" ".../libclang_rt.ubsan_standalone_cxx.a" "--no-whole-archive" "--dynamic-list=.../libclang_rt.ubsan_standalone_cxx.a.syms" ...
    % clang++ -fsanitize=undefined -shared-libsan a.o '-###' |& grep --color ubsan
    ... ".../libclang_rt.ubsan_standalone.so" ...

    When linking an executable, with the default -static-libsan mode on many targets, Clang Driver passes --whole-archive $resource_dir/lib/$triple/libclang_rt.ubsan_standalone.a --no-whole-archive to the linker. GCC and some platforms prefer a shared (dynamic) runtime. See All about sanitizer interceptors.

    Some sanitizers (address, memory, thread, etc.) ship a copy of UBSan runtime files.

    Minimal runtime

    The default mode provides verbose diagnostics which help programmers identify the undefined behavior. On the other hand, the detailed log helps attackers and the code size may be a concern in some configurations.

    UBSan provides a minimal runtime mode that logs very little information. Specify -fsanitize-minimal-runtime for both compile actions and link actions to enable the mode.

    % clang -fsanitize=undefined -fsanitize-minimal-runtime a.c -o a
    % ./a <<< 1073741824
    ubsan: mul-overflow by 0x000055565bab6839
    a -2147483648
    b -2147483648
    % clang -fsanitize=undefined -fno-sanitize-recover -fsanitize-minimal-runtime a.c -o a
    % ./a <<< 1073741824
    ubsan: mul-overflow by 0x000056443f9a7839
    [1] 3857513 IOT instruction ./a <<< 1073741824

    In the emitted LLVM IR, a different set of callbacks __ubsan_handle_*_minimal is used. They take no arguments and therefore make instrumented code much smaller.

    handler.mul_overflow:                             ; preds = %entry
      tail call void @__ubsan_handle_mul_overflow_minimal() #4, !nosanitize !6
      br label %cont, !nosanitize !6

    The UBSan minimal runtime uses a separate set of runtime files (libclang_rt.ubsan_minimal.*) to decrease the runtime size.

    % clang -fsanitize=undefined -fsanitize-minimal-runtime a.c '-###' |& grep --color=auto ubsan_minimal
    ... "--whole-archive" "/tmp/Rel/lib/clang/17/lib/x86_64-unknown-linux-gnu/libclang_rt.ubsan_minimal.a" "--no-whole-archive" "--dynamic-list=/tmp/Rel/lib/clang/17/lib/x86_64-unknown-linux-gnu/libclang_rt.ubsan_minimal.a.syms" ...

    Trap mode

    When -fsanitize=signed-integer-overflow is in effect, we can specify -fsanitize-trap=signed-integer-overflow so that Clang will emit a trap instruction instead of a callback. This greatly decreases the size bloat from instrumentation compared to the default mode.

    Usually, we specify the umbrella options -fsanitize-trap=undefined or -fsanitize-trap (alias for -fsanitize-trap=all).

    clang -S -emit-llvm -O2 -fsanitize=undefined -fsanitize-trap=undefined a.c

    (-fsanitize-undefined-trap-on-error is a deprecated alias for -fsanitize-trap=undefined.)

    Instead of calling a callback, the emitted LLVM IR will call the LLVM intrinsic llvm.ubsantrap which lowers to a trap instruction aborting the program. As a side benefit of avoiding UBSan callbacks, we don't need a runtime. Therefore, -fsanitize=undefined can be omitted for link actions. Note: -fsanitize-trap=xxx overrides -fsanitize-recover=xxx.

    define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr #0 {
    entry:
      %0 = add i32 %a, 1073741824
      %1 = icmp sgt i32 %0, -1
      br i1 %1, label %cont, label %trap, !nosanitize !5

    trap:                                             ; preds = %entry
      tail call void @llvm.ubsantrap(i8 12) #4, !nosanitize !5
      unreachable, !nosanitize !5

    cont:                                             ; preds = %entry
      %2 = shl nsw i32 %a, 1
      ret i32 %2
    }

    llvm.ubsantrap has an integer argument. On some architectures, the argument is encoded in the trap instruction, so different error types produce different encodings.

    On x86-64, llvm.ubsantrap lowers to the 4-byte ud1l ubsan_type(%eax),%eax (UD1 with an address-size override prefix; ud1l is the AT&T syntax). AArch64 provides BRK with a 16-bit immediate.

    However, many other architectures do not provide sufficient encoding space for their trap instructions. For example, PowerPC provides trap (alias for tw 31,0,0). In tw TO,RA,RB, when we specify RA=RB=0, only 4 bits of the 5-bit TO can be used, which is insufficient.

    For AArch64 and x86-64, we can register a signal handler to disassemble the instruction at si->si_addr. It can distinguish UBSan check errors from other faults. We can even decode the UBSan type numbers from #define LIST_SANITIZER_CHECKS and give a better diagnostic.

    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>  // write, STDOUT_FILENO

    static void handler(int signo, siginfo_t *si, void *uc) {
    #ifdef __aarch64__
      if (signo == SIGTRAP && si && si->si_addr) {
        uint32_t insn;
        memcpy(&insn, si->si_addr, sizeof(insn));
        // BRK #imm16: the high byte of imm16 is 'U', the low byte is the check type.
        if ((insn & 0xffe0001f) == 0xd4200000 && ((insn >> 13) & 255) == 'U') {
          char buf[64];
          int len = snprintf(buf, sizeof buf, "TRAP(%u) at %p, possibly UBSan error\n", (insn>>5 & 255), si->si_addr);
          write(STDOUT_FILENO, buf, len);
        }
      }
    #endif
    #ifdef __x86_64__
      if (signo == SIGILL && si && si->si_code == ILL_ILLOPN && si->si_addr) {
        auto *pc = reinterpret_cast<const unsigned char *>(si->si_addr);
        // 67 0f b9 40: address-size prefix, UD1 opcode, ModRM for disp8(%eax),%eax.
        if (pc[0] == 0x67 && pc[1] == 0x0f && pc[2] == 0xb9 && pc[3] == 0x40) {
          char buf[64];
          int len = snprintf(buf, sizeof buf, "UD1 at %p, possibly UBSan error\n", pc);
          write(STDOUT_FILENO, buf, len);
        }
      }
    #endif
      abort();
    }

    int foo(int a) { return a * 2; }

    int main() {
      struct sigaction sa = {};
      sa.sa_flags = SA_SIGINFO;
      sa.sa_sigaction = &handler;
    #if defined(__aarch64__)
      sigaction(SIGTRAP, &sa, nullptr);
    #elif defined(__x86_64__)
      sigaction(SIGILL, &sa, nullptr);
    #endif

      int x;
      scanf("%d", &x);
      printf("a %d\n", foo(x));
      printf("b %d\n", foo(x));
    }

    Build and run (the output below comes from the AArch64 branch of the handler):

    % clang++ -fsanitize=undefined -fsanitize-trap=undefined a.cc -o a
    % ./a <<< 1073741824
    TRAP(12) at 0xaaaaea87eb48, possibly UBSan error
    [1] 529341 abort (core dumped) ./a <<< 1073741824
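    The AArch64 bit tests in the handler can be factored into a small decoder that is easy to check in isolation. The sketch below uses the same masks as the handler above. The function names are made up, and make_ubsan_brk exists only to build test inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the UBSan check number encoded in an AArch64 BRK instruction, or
   -1 if insn is not a BRK whose 16-bit immediate has 'U' in its high byte. */
int ubsan_brk_check_kind(uint32_t insn) {
  if ((insn & 0xffe0001fu) != 0xd4200000u)
    return -1; /* not BRK #imm16 */
  uint32_t imm16 = (insn >> 5) & 0xffffu;
  if ((imm16 >> 8) != 'U')
    return -1; /* some other BRK, e.g. a debugger breakpoint */
  return (int)(imm16 & 0xffu);
}

/* Builds the BRK encoding for a given check number (test helper). */
uint32_t make_ubsan_brk(unsigned kind) {
  assert(kind <= 0xffu);
  return 0xd4200000u | ((((uint32_t)'U' << 8) | kind) << 5);
}
```

    Feeding it the encoding for check 12 returns 12, matching the TRAP(12) line in the output above. A plain BRK #0 or a NOP returns -1.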

    TODO: arm64: Support Clang UBSAN trap codes for better reporting

    Together with other sanitizers

    UBSan can be used together with many other sanitizers.

    clang -fsanitize=address,undefined a.c
    clang -fsanitize=fuzzer,undefined -fno-sanitize-recover a.c
    clang -fsanitize=hwaddress,undefined a.c
    clang -fsanitize=memory,undefined a.c
    clang -fsanitize=thread,undefined a.c
    clang -fsanitize=cfi,undefined -flto -fvisibility=hidden a.c

    Typically one may want to test multiple sanitizers for a project. The ability to use UBSan with another sanitizer decreases the number of configurations. In practice people prefer -fsanitize=address,undefined and -fsanitize=hwaddress,undefined over other combinations. Of course, adding UBSan on top of another instrumentation means more overhead and size bloat.

    Using UBSan with libFuzzer is interesting. We can build with -fsanitize=fuzzer,address,undefined -fno-sanitize-recover to make libFuzzer abort when an AddressSanitizer or UndefinedBehaviorSanitizer error is detected. Note: the original libFuzzer developers have stopped active work on libFuzzer and switched to Centipede.

    Runtime options

    As mentioned, most UBSan checks are recoverable by default. Specify the runtime option halt_on_error=1 to get a behavior similar to the compile-time -fno-sanitize-recover=undefined.

    % UBSAN_OPTIONS=halt_on_error=1 ./a <<< 1073741824
    a.c:2:27: runtime error: signed integer overflow: 1073741824 * 2 cannot be represented in type 'int'
    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior a.c:2:27 in

    We can define __ubsan_default_options to set the default options.

    extern "C" const char *__ubsan_default_options() {
      return "print_stacktrace=1";
    }

    Issue suppression

    The GNU function attribute __attribute__((no_sanitize("undefined"))) can disable instrumentation for a function. (GCC additionally supports a deprecated attribute __attribute__((no_sanitize_undefined)).)
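    As an illustration, the attribute can be wrapped in a macro and applied to a single function. The NO_UBSAN macro name and the choice of a hash helper are made up for this sketch. The attribute only removes the instrumentation and does not make undefined behavior defined, so it is best kept for code that is known to be correct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical convenience macro around the GNU attribute. */
#if defined(__clang__) || defined(__GNUC__)
#define NO_UBSAN __attribute__((no_sanitize("undefined")))
#else
#define NO_UBSAN
#endif

/* 32-bit FNV-1a hash of a NUL-terminated string. With NO_UBSAN, the
   compiler emits no -fsanitize=undefined checks inside this function. */
NO_UBSAN
uint32_t fnv1a32(const char *s) {
  assert(s != NULL);
  uint32_t h = 2166136261u; /* FNV offset basis */
  for (; *s; s++) {
    h ^= (unsigned char)*s;
    h *= 16777619u; /* FNV prime; unsigned wraparound is well defined */
  }
  return h;
}
```

    Other functions in the same file stay instrumented, which makes the attribute a finer-grained tool than disabling a check for a whole translation unit.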

    If $resource_dir/share/ubsan_ignorelist.txt is present, it will be used as the default system ignorelist (which can be overridden with -fsanitize-system-ignorelist=). See Sanitizer special case list for its format. This file instructs Clang CodeGen to disable instrumentation for specified functions or files. One can specify -fsanitize-ignorelist= multiple times to use more ignorelists.

    In a large code base, enabling a check entails fixing all existing issues or working around them with the no_sanitize function attribute. An ignorelist offers a mechanism to incrementally enable a check. This is useful for toolchain maintainers who need to fight with new code.

    For example, once all source files except [a-m]* are free of UBSan errors (or suppressed with function attributes), we can use the following patterns in ubsan_ignorelist.txt.

    [alignment]
    src:[a-m]*
    src:./[a-m]*

    ./ is for included files.

    Suppression can be done at runtime as well. UBSan supports a runtime option suppressions: UBSAN_OPTIONS=suppressions=a.supp.

    Stack traces

    By default the diagnostic does not include a stack trace. Specify print_stacktrace=1 to get one. If the program is compiled with -g1 or -g, and llvm-symbolizer is in a PATH directory, we can get a symbolized stack trace.

    % UBSAN_OPTIONS=print_stacktrace=1 ./a <<< 1073741824
    a.c:2:27: runtime error: signed integer overflow: 1073741824 * 2 cannot be represented in type 'int'
        #0 0x55f582e9d2f1 in foo /tmp/c/a.c:2:27
        #1 0x55f582e9d2f1 in main /tmp/c/a.c:6:20
        #2 0x7fe67838f189 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
        #3 0x7fe67838f244 in __libc_start_main csu/../csu/libc-start.c:381:3
        #4 0x55f582e75d80 in _start (/tmp/c/a+0x17d80)

    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior a.c:2:27 in
    a -2147483648
    b -2147483648

    Like other sanitizers, the UBSan runtime supports stack unwinding with DWARF Call Frame Information and frame pointers. Many targets enable .eh_frame (a variant of DWARF Call Frame Information) by default (-fasynchronous-unwind-tables). If it is disabled, ensure -fno-omit-frame-pointer is in effect (many targets default to -fomit-frame-pointer with -O1 and above).

    Miscellaneous

    In Clang, unlike many other sanitizers, UndefinedBehaviorSanitizer instrumentation is performed as part of Clang CodeGen instead of an LLVM pass in the optimization pipeline.

    With the default mode and the minimal runtime, UBSan instrumentation inserts callbacks. This makes the instrumented function non-leaf.

    Selected applications

    compiler-rt/lib/ubsan is written in C++. It uses features that are unavailable in some environments (e.g. some operating system kernels). Adopting UBSan in kernels typically requires reimplementing the runtime (usually in C).
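    To give a feel for what such a reimplementation involves, here is a rough sketch of a single callback written in C. The struct layouts are assumptions modeled on the arguments seen in the IR earlier (a pointer to static data followed by two operand values) and should be verified against the compiler in use. A real runtime would export the function as __ubsan_handle_mul_overflow. The sketch_ prefix, the counter, and the message format are made up so that the code can coexist with the stock runtime.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout of the static data passed as the first argument. */
struct source_location {
  const char *filename;
  uint32_t line;
  uint32_t column;
};

struct overflow_data {
  struct source_location loc;
  const void *type; /* type descriptor, not decoded in this sketch */
};

static unsigned long report_count; /* hypothetical statistics counter */

/* Recoverable handler: prints a one-line report and returns to the caller. */
void sketch_handle_mul_overflow(const struct overflow_data *data,
                                uintptr_t lhs, uintptr_t rhs) {
  assert(data != NULL && data->loc.filename != NULL);
  report_count++;
  fprintf(stderr,
          "%s:%" PRIu32 ":%" PRIu32
          ": multiplication overflow (operands %#" PRIxPTR ", %#" PRIxPTR ")\n",
          data->loc.filename, data->loc.line, data->loc.column, lhs, rhs);
}

unsigned long sketch_report_count(void) { return report_count; }
```

    A kernel runtime would replace fprintf with its own logging primitive and provide one such function per check, plus the _abort variants if -fno-sanitize-recover is used.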

    In the Linux kernel, the commit "UBSAN: run-time undefined behavior sanity checker" introduced CONFIG_UBSAN and a runtime implementation in 2016.

    NetBSD implemented µUBSan in 2018.

    Android's doc: https://source.android.com/docs/security/test/sanitizers#undefinedbehaviorsanitizer


