UndefinedBehaviorSanitizer (UBSan) is an undefined behavior detector for C/C++. It consists of code instrumentation and a runtime. Both components have multiple independent implementations.
Clang implemented the first few checks in 2009-12, initially named -fcatch-undefined-behavior. In 2012, -fsanitize=undefined was added and -fcatch-undefined-behavior was removed. GCC 4.9 implemented -fsanitize=undefined in 2013-08.
The runtime used by Clang lives in llvm-project/compiler-rt/lib/ubsan. GCC from time to time syncs its downstream fork of the sanitizers part of compiler-rt (libsanitizer). The end of the article lists some alternative runtime implementations.
UBSan provides many individual undefined behavior checks. One can specify -fsanitize=xxx,yyy where xxx and yyy are the names of the individual checks, or more commonly, specify -fsanitize=undefined to enable all of them.
Some checks do not correspond to undefined behavior per the language standards, but they often flag code smells that lead to surprising results. They are implicit-unsigned-integer-truncation, implicit-integer-sign-change, unsigned-integer-overflow, etc.
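To make the distinction concrete, here is a small sketch of two operations that are well-defined in C yet are what unsigned-integer-overflow and implicit-unsigned-integer-truncation are meant to flag. The function names are made up for illustration, and an 8-bit unsigned char is assumed.

```c
#include <limits.h>

/* Unsigned arithmetic wraps modulo 2^N. This is well-defined, but a build
   with -fsanitize=unsigned-integer-overflow reports it when the sum wraps. */
unsigned wrap_add(unsigned a, unsigned b) { return a + b; }

/* The implicit conversion keeps only the low bits. This is well-defined,
   but a build with -fsanitize=implicit-unsigned-integer-truncation reports
   it when set bits are discarded. */
unsigned char low_byte(unsigned v) { return v; }
```

For example, wrap_add(UINT_MAX, 1) yields 0 and low_byte(0x1ff) yields 0xff. Neither is a bug by itself, which is why these checks are opt-in.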
We can use -### to get the list of default UBSan checks.
% clang -fsanitize=undefined -xc /dev/null '-###'
... -fsanitize=alignment,array-bounds,bool,builtin,enum,float-cast-overflow,function,integer-divide-by-zero,nonnull-attribute,null,pointer-overflow,return,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,unreachable,vla-bound,vptr" "-fsanitize-recover=alignment,array-bounds,bool,builtin,enum,float-cast-overflow,function,integer-divide-by-zero,nonnull-attribute,null,pointer-overflow,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,vla-bound,vptr" ...
GCC implements slightly fewer checks than Clang.
UBSan provides 3 modes to allow tradeoffs between code size and diagnostic verboseness.
- Default mode
- Minimal runtime (-fsanitize-minimal-runtime)
- Trap mode (-fsanitize-trap=undefined): no runtime
Let's use the following program to compare the instrumentation and runtime behaviors.
#include <stdio.h>
int foo(int a) { return a * 2; }
int main() {
  int x;
  scanf("%d", &x);
  printf("a %d\n", foo(x));
  printf("b %d\n", foo(x));
}
With the umbrella option -fsanitize=undefined or the specific -fsanitize=signed-integer-overflow, Clang emits the following LLVM IR for foo:
define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr #0 {
The add and icmp instructions check whether the argument is in the range [-0x40000000, 0x40000000) (no overflow). If yes, the cont branch is taken and the non-overflow result is returned. Otherwise, the emitted code calls the callback __ubsan_handle_mul_overflow (implemented by the runtime) with arguments describing the source location.
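The range test can be written in C as a single addition followed by an unsigned comparison. The sketch below is one way to express it, not a transcription of what Clang emits; it assumes a 32-bit int, and the function names are hypothetical. The second function uses the GCC/Clang builtin __builtin_mul_overflow as a reference.

```c
#include <stdbool.h>

/* True if a * 2 fits in a 32-bit int, i.e. a is in [-0x40000000, 0x40000000).
   Adding 0x40000000 in unsigned arithmetic maps that range onto
   [0, 0x80000000), so one unsigned comparison is enough. */
bool double_fits(int a) {
  return (unsigned)a + 0x40000000u < 0x80000000u;
}

/* Reference implementation using the GCC/Clang overflow builtin. */
bool double_fits_builtin(int a) {
  int r;
  return !__builtin_mul_overflow(a, 2, &r);
}
```

Both functions agree at the boundaries: 0x3fffffff and -0x40000000 are accepted, while 0x40000000 and -0x40000001 are rejected.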
By default, all UBSan checks except return,unreachable are recoverable (i.e. non-fatal). After __ubsan_handle_mul_overflow (or another callback) prints an error, the program keeps executing. The source location is marked and further errors about this location are suppressed. This deduplication feature makes logs less noisy. In one program invocation, we may observe errors from potentially multiple source locations.
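The report-once behavior can be pictured with a toy model. This is only a sketch of the idea, not the compiler-rt implementation: struct check_site and report_once are hypothetical names, and the sketch ignores thread safety.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-location record; the real runtime uses its own types. */
struct check_site {
  const char *file;
  int line;
  bool reported; /* set once the first error at this location is printed */
};

/* Print a diagnostic the first time a location fails and stay silent
   afterwards. Returns true if a diagnostic was printed. The caller keeps
   executing either way, which is what "recoverable" means here. */
bool report_once(struct check_site *site, const char *what) {
  if (site->reported)
    return false;
  site->reported = true;
  fprintf(stderr, "%s:%d: runtime error: %s\n", site->file, site->line, what);
  return true;
}
```

In the example program, the two printf lines would each own a separate record only if foo were inlined at both call sites; otherwise the single multiplication in foo is one location and is reported once.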
We can make the program exit upon a UBSan error (with exit code 1 by default, customizable with e.g. UBSAN_OPTIONS=exitcode=2). Just specify -fno-sanitize-recover (alias for -fno-sanitize-recover=all), -fno-sanitize-recover=undefined, or -fno-sanitize-recover=signed-integer-overflow. A non-returning callback will be emitted in place of __ubsan_handle_mul_overflow. Note that the emitted LLVM IR terminates the basic block with an unreachable instruction.
handler.mul_overflow:                             ; preds = %entry
  %2 = zext i32 %a to i64, !nosanitize !5
  tail call void @__ubsan_handle_mul_overflow_abort(ptr nonnull @1, i64 %2, i64 2) #4, !nosanitize !5
  unreachable, !nosanitize !5
The UBSan callbacks are provided by the runtime. For a link action, we need to inform the compiler driver that the runtime should be linked. This can be done by specifying -fsanitize=undefined. (Actually, any specific UBSan check that needs runtime can be used, e.g. -fsanitize=signed-integer-overflow.)
clang -c -O2 -fsanitize=undefined a.c
Some checks (function,vptr) call C++ specific callbacks implemented in libclang_rt.ubsan_standalone_cxx.a. We need to use clang++ -fsanitize=undefined for the link action (or use clang -fsanitize=undefined -fsanitize-link-c++-runtime).
% clang -fsanitize=undefined a.o '-###' |& grep --color ubsan
When linking an executable, with the default -static-libsan mode on many targets, Clang Driver passes --whole-archive $resource_dir/lib/$triple/libclang_rt.ubsan_standalone.a --no-whole-archive to the linker. GCC and some platforms prefer shared runtime/dynamic runtime. See All about sanitizer interceptors.
Some sanitizers (address, memory, thread, etc) ship a copy of UBSan runtime files.
The default mode provides verbose diagnostics which help programmers identify the undefined behavior. On the other hand, the detailed log helps attackers and the code size may be a concern in some configurations.
UBSan provides a minimal runtime mode to log very little information. Specify -fsanitize-minimal-runtime for both compile actions and link actions to enable the mode.
% clang -fsanitize=undefined -fsanitize-minimal-runtime a.c -o a
In the emitted LLVM IR, a different set of callbacks, __ubsan_handle_*_minimal, is used. They take no arguments and therefore make instrumented code much smaller.
handler.mul_overflow:                             ; preds = %entry
The UBSan minimal runtime uses a separate set of runtime files (libclang_rt.ubsan_minimal.*) to decrease the runtime size.
% clang -fsanitize=undefined -fsanitize-minimal-runtime a.c '-###' |& grep --color=auto ubsan_minimal
... "--whole-archive" "/tmp/Rel/lib/clang/17/lib/x86_64-unknown-linux-gnu/libclang_rt.ubsan_minimal.a" "--no-whole-archive" "--dynamic-list=/tmp/Rel/lib/clang/17/lib/x86_64-unknown-linux-gnu/libclang_rt.ubsan_minimal.a.syms" ...
When -fsanitize=signed-integer-overflow is in effect, we can specify -fsanitize-trap=signed-integer-overflow so that Clang will emit a trap instruction instead of a callback. This will greatly decrease the size bloat from instrumentation compared to the default mode.
Usually, we specify the umbrella options -fsanitize-trap=undefined or -fsanitize-trap (alias for -fsanitize-trap=all).
clang -S -emit-llvm -O2 -fsanitize=undefined -fsanitize-trap=undefined a.c
(-fsanitize-undefined-trap-on-error is a deprecated alias for -fsanitize-trap=undefined.)
Instead of calling a callback, the emitted LLVM IR will call the LLVM intrinsic llvm.ubsantrap which lowers to a trap instruction aborting the program. As a side benefit of avoiding UBSan callbacks, we don't need a runtime. Therefore, -fsanitize=undefined can be omitted for link actions. Note: -fsanitize-trap=xxx overrides -fsanitize-recover=xxx.
define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr #0 {
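As a rough source-level analogue of trap mode, the check can be written by hand with the GCC/Clang builtins __builtin_mul_overflow and __builtin_trap. This is a sketch to show the shape of the instrumented code (no callback, no source-location arguments); foo_trap is a hypothetical name, and the compiler's own output uses llvm.ubsantrap rather than a plain trap.

```c
/* Hand-written analogue of foo under -fsanitize-trap=signed-integer-overflow:
   compute a * 2 and trap if the result does not fit in int. */
int foo_trap(int a) {
  int r;
  if (__builtin_mul_overflow(a, 2, &r))
    __builtin_trap(); /* does not return */
  return r;
}
```

Since nothing outside the function is referenced, such code links without any sanitizer runtime, which matches the note about link actions above.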
llvm.ubsantrap has an integer argument. On some architectures the argument is encoded into the trap instruction, so different error types produce different encodings.
On x86-64, llvm.ubsantrap lowers to the 4-byte ud1l ubsan_type(%eax),%eax (UD1 with an address-size override prefix; ud1l is the AT&T syntax). AArch64 provides BRK with 16-bit immediate.
However, many other architectures do not provide sufficient encoding space for their trap instructions. For example, PowerPC provides trap (alias for tw 31,0,0). In tw TO,RA,RB, when we specify RA=RB=0, only 4 bits of the 5-bit TO can be used, which is insufficient.
For AArch64 and x86-64, we can register a signal handler to disassemble the instruction at si->si_addr. It can distinguish UBSan check errors from other faults. We can even decode the UBSan type numbers from #define LIST_SANITIZER_CHECKS and give a better diagnostic.
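The decoding step for x86-64 might look like the sketch below. The byte layout is an assumption derived from the instruction named above (address-size prefix 67, opcode 0F B9, then a ModRM byte selecting either no displacement or an 8-bit displacement off %eax) and should be checked against a disassembly of your own binary. decode_ubsan_trap is a hypothetical helper, not part of any runtime.

```c
#include <stddef.h>

/* Returns the UBSan check number encoded in an x86-64 ud1l trap at pc, or
   -1 if the bytes do not look like one. Assumed layouts:
     67 0F B9 40 NN   ud1l NN(%eax),%eax   (8-bit displacement NN)
     67 0F B9 00      ud1l (%eax),%eax     (check number 0)        */
int decode_ubsan_trap(const unsigned char *pc, size_t len) {
  if (len < 4 || pc[0] != 0x67 || pc[1] != 0x0f || pc[2] != 0xb9)
    return -1;
  if (pc[3] == 0x00)
    return 0;
  if (pc[3] == 0x40 && len >= 5)
    return pc[4];
  return -1;
}
```

A SIGILL handler installed with sigaction and SA_SIGINFO could pass (const unsigned char *)si->si_addr to this function and print the check number when the result is non-negative, falling back to the default action otherwise.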
% clang++ -fsanitize=undefined -fsanitize-trap=undefined a.cc -o a
TODO: arm64: Support Clang UBSAN trap codes for better reporting
UBSan can be used together with many other sanitizers.
clang -fsanitize=address,undefined a.c
Typically one may want to test multiple sanitizers for a project. The ability to use UBSan with another sanitizer decreases the number of configurations. In practice people prefer -fsanitize=address,undefined and -fsanitize=hwaddress,undefined over other combinations. Of course, adding UBSan on top of another instrumentation means more overhead and size bloat.
Using UBSan with libFuzzer is interesting. We can build with -fsanitize=fuzzer,address,undefined -fno-sanitize-recover to make libFuzzer abort when an AddressSanitizer or UndefinedBehaviorSanitizer error is detected. Note: the original libFuzzer developers have stopped active work on libFuzzer and switched to Centipede.
As mentioned, most UBSan checks are recoverable by default. Specify halt_on_error=1 to get a behavior similar to compile-time -fno-sanitize-recover=undefined.
% UBSAN_OPTIONS=halt_on_error=1 ./a <<< 1073741824
We can define __ubsan_default_options to set the default options.
extern "C" const char *__ubsan_default_options() {
  return "print_stacktrace=1";
}
The GNU function attribute __attribute__((no_sanitize("undefined"))) can disable instrumentation for a function. (GCC additionally supports a deprecated attribute __attribute__((no_sanitize_undefined)).)
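A minimal use of the attribute is sketched below; foo_unchecked is a hypothetical name. Note that the attribute only removes the instrumentation: a call that overflows is still undefined behavior, it is just no longer reported.

```c
/* Same body as foo, but UBSan emits no overflow check for this function
   even when the file is compiled with -fsanitize=undefined. */
__attribute__((no_sanitize("undefined")))
int foo_unchecked(int a) { return a * 2; }
```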
If $resource_dir/share/ubsan_ignorelist.txt is present, it will be used as the default system ignorelist (which can be overridden with -fsanitize-system-ignorelist=). See Sanitizer special case list for its format. This file instructs Clang CodeGen to disable instrumentations for specified functions or files. One can specify -fsanitize-ignorelist= multiple times to use more ignorelists.
In a large code base, enabling a check entails fixing all existing issues or working around them with the no_sanitize function attribute. An ignorelist offers a mechanism to incrementally enable a check. This is useful for toolchain maintainers who need to fight with new code.
For example, once all source files except [a-m]* are free of UBSan errors (or suppressed with function attributes), we can use the following patterns in ubsan_ignorelist.txt.
[alignment]
src:[a-m]*
src:./[a-m]*
./ is for included files.
Suppression can be done at runtime as well. UBSan supports a runtime option suppressions: UBSAN_OPTIONS=suppressions=a.supp.
By default the diagnostic does not include a stack trace. Specify print_stacktrace=1 to get one. If the program is compiled with -g1 or -g, and llvm-symbolizer is in a PATH directory, we can get a symbolized stack trace.
% UBSAN_OPTIONS=print_stacktrace=1 ./a <<< 1073741824
Like other sanitizers, UBSan runtime supports stack unwinding with DWARF Call Frame Information and frame pointers. Many targets enable .eh_frame (a variant of DWARF Call Frame Information) by default (-fasynchronous-unwind-tables). If it is disabled, ensure -fno-omit-frame-pointer is in effect (many targets default to -fomit-frame-pointer with -O1 and above).
In Clang, unlike many other sanitizers, UndefinedBehaviorSanitizer is performed as part of Clang CodeGen instead of an LLVM pass in the optimization pipeline.
With the default mode and the minimal runtime, UBSan instrumentation inserts callbacks. This makes the instrumented function non-leaf.
compiler-rt/lib/ubsan is written in C++. It uses features that are unavailable in some environments (e.g. some operating system kernels). Adopting UBSan in kernels typically requires reimplementing the runtime (usually in C).
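To give a feel for what such a reimplementation involves, here is a toy C handler for one callback. It is a sketch under assumptions: the parameter list mirrors the (ptr, i64, i64) call to __ubsan_handle_mul_overflow_abort shown earlier and assumes the recoverable variant has the same shape on a 64-bit target; the static check data is treated as opaque; and the counter and its accessor are hypothetical. A real kernel runtime would decode the data and print a report.

```c
#include <stdint.h>

/* Number of signed multiplication overflows seen so far. */
static unsigned long mul_overflow_count;

/* Called by instrumented code when a signed multiplication overflows.
   data points to compiler-generated check data (opaque in this sketch);
   lhs and rhs carry the operands widened to pointer size. */
void __ubsan_handle_mul_overflow(void *data, uintptr_t lhs, uintptr_t rhs) {
  (void)data;
  (void)lhs;
  (void)rhs;
  mul_overflow_count++; /* returning lets the program continue */
}

/* Hypothetical accessor so other code can inspect the counter. */
unsigned long ubsan_mul_overflow_count(void) { return mul_overflow_count; }
```

Only counting keeps the handler free of any libc dependency, which is the kind of constraint that motivates a separate runtime in freestanding environments.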
In the Linux kernel, UBSAN: run-time undefined behavior sanity checker introduced CONFIG_UBSAN and a runtime implementation in 2016.
NetBSD implemented µUBSan in 2018.
Android's doc: https://source.android.com/docs/security/test/sanitizers#undefinedbehaviorsanitizer