UndefinedBehaviorSanitizer (UBSan) is an undefined behavior detector for C/C++. It consists of code instrumentation and a runtime (in llvm-project/compiler-rt).
Clang and GCC have independent implementations. Clang implemented the instrumentation first in 2009-12, initially named -fcatch-undefined-behavior. GCC 4.9 implemented -fsanitize=undefined in 2013-08.
There are many undefined behavior checks. One may enable a check individually with -fsanitize=xxx, where xxx is the name of the check, or, more commonly, specify the umbrella option -fsanitize=undefined to enable the whole group.
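Some checks do not correspond to undefined behavior per the language standards, but they catch code smells that often lead to surprising results. Examples are implicit-unsigned-integer-truncation, implicit-integer-sign-change, and unsigned-integer-overflow. These checks are not part of -fsanitize=undefined and have to be enabled explicitly.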
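The cc1 options printed by the following command list the individual checks enabled by -fsanitize=undefined:

% clang -fsanitize=undefined -xc /dev/null '-###'

GCC implements slightly fewer checks than Clang.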
UBSan provides 3 modes that trade off code size against diagnostic verbosity: the default mode, the minimal runtime mode (-fsanitize-minimal-runtime), and the trap mode (-fsanitize-trap=undefined).

Let's use a C function to compare the behaviors.

#include <stdio.h>

int foo(int a) { return a * 2; }
int main() {
  int x;
  scanf("%d", &x);
  printf("a %d\n", foo(x));
  printf("b %d\n", foo(x));
}
With the umbrella option -fsanitize=undefined or the specific -fsanitize=signed-integer-overflow, Clang emits the following LLVM IR for foo:

define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr #0 {
The add and icmp instructions check whether the argument is in the range [-0x40000000, 0x40000000), i.e. whether a * 2 does not overflow. If yes, the function returns the product. Otherwise, the emitted code calls the callback __ubsan_handle_mul_overflow with a pointer to static data describing the source location and the operand type, followed by the two operands. __ubsan_handle_mul_overflow is implemented in the runtime (in llvm-project/compiler-rt).
By default, all UBSan checks except return and unreachable are recoverable (i.e. non-fatal). After __ubsan_handle_mul_overflow (or another callback) prints an error, the program keeps executing. The source location is marked, and further errors at this location are suppressed. This deduplication makes logs less noisy. In one program invocation, we may observe errors from multiple source locations.
We can make the program exit upon a UBSan error (with exit code 1, which can be customized through UBSAN_OPTIONS, e.g. UBSAN_OPTIONS=exitcode=2). Just specify -fno-sanitize-recover (alias for -fno-sanitize-recover=all), -fno-sanitize-recover=undefined, or -fno-sanitize-recover=signed-integer-overflow. A noreturn callback, __ubsan_handle_mul_overflow_abort, will be emitted in place of __ubsan_handle_mul_overflow. Note that the emitted LLVM IR terminates the basic block with an unreachable instruction.

handler.mul_overflow:                             ; preds = %entry
  %2 = zext i32 %a to i64, !nosanitize !5
  tail call void @__ubsan_handle_mul_overflow_abort(ptr nonnull @1, i64 %2, i64 2) #4, !nosanitize !5
  unreachable, !nosanitize !5
The UBSan callbacks are provided by the runtime. For a link action, we need to tell the compiler driver to link the runtime. This can be done by specifying -fsanitize=undefined, just as in a compile action. (Actually, any specific UBSan check that needs the runtime can be used, e.g. -fsanitize=signed-integer-overflow.)

clang -c -O2 -fno-omit-frame-pointer -fsanitize=undefined a.c
When linking an executable in the -static-libsan mode (the default on Linux), the Clang driver passes --whole-archive $resource_dir/lib/$triple/libclang_rt.ubsan_standalone.a --no-whole-archive to the linker.
The default mode provides verbose diagnostics that help programmers identify the undefined behavior. On the other hand, the detailed log may help attackers, and the code size may be a concern in some configurations.
UBSan provides a minimal runtime mode that logs very little information. Specify -fsanitize-minimal-runtime for both compile actions and link actions to enable the mode.

% clang -fsanitize=undefined -fsanitize-minimal-runtime a.c -o a
In the emitted LLVM IR, a different set of callbacks, __ubsan_handle_*_minimal, is used. They take no arguments and therefore make the instrumented code much smaller.

handler.mul_overflow:                             ; preds = %entry
When -fsanitize=signed-integer-overflow is in effect, we can specify -fsanitize-trap=signed-integer-overflow so that Clang emits a trap instruction instead of a callback. This greatly decreases the size bloat from instrumentation compared to the default mode.
Usually, we specify the umbrella option -fsanitize-trap=undefined or -fsanitize-trap (alias for -fsanitize-trap=all).

clang -S -emit-llvm -O2 -fsanitize=undefined -fsanitize-trap=undefined a.c
Instead of calling a callback, the emitted LLVM IR calls the LLVM intrinsic llvm.ubsantrap, which lowers to a trap instruction that aborts the program. As a side benefit of avoiding UBSan callbacks, we don't need a runtime. Therefore, -fsanitize=undefined can be omitted for link actions. Note: -fsanitize-trap=xxx overrides -fsanitize-recover=xxx.

define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr #0 {
llvm.ubsantrap has an integer argument identifying the check. On some architectures, this argument is encoded into the trap instruction, so different error types yield different encodings.
On x86-64, llvm.ubsantrap lowers to ud1l ubsan_type(%eax),%eax (UD1 with an address-size override prefix; ud1l is the AT&T syntax). We can write a SIGILL handler that decodes the instruction at si->si_addr to determine which UBSan check failed (https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CodeGenFunction.h#L112). AArch64 is similar. However, many other architectures do not provide sufficient encoding space. For example, PowerPC provides trap (alias for tw 31,0,0). In tw TO,RA,RB, when we specify RA=RB=0, only 4 bits of the 5-bit TO can be used, which is insufficient.
UBSan can be used together with many other sanitizers.

clang -fsanitize=address,undefined a.c

One may want to test a project with multiple sanitizers. Being able to combine UBSan with another sanitizer decreases the number of configurations. In practice, people prefer -fsanitize=address,undefined and -fsanitize=hwaddress,undefined over other combinations. Of course, more instrumentation means more overhead and size bloat.
Using UBSan with libFuzzer is interesting. We can build with -fsanitize=fuzzer,address,undefined -fno-sanitize-recover to make libFuzzer abort when an AddressSanitizer or UndefinedBehaviorSanitizer error is detected.
In Clang, unlike many other sanitizers, UndefinedBehaviorSanitizer instrumentation is performed as part of Clang CodeGen instead of by an LLVM pass in the optimization pipeline.
TODO Interfere with optimizations
compiler-rt/lib/ubsan is written in C++. It uses features that are unavailable in some environments (e.g. some operating system kernels). Adopting UBSan in kernels typically requires reimplementing the runtime (usually in C).
In the Linux kernel, the patch "UBSAN: run-time undefined behavior sanity checker" introduced CONFIG_UBSAN and a runtime implementation.
NetBSD implementedµUBSan in 2018.