All about LeakSanitizer

Posted by MaskRay on 2023-02-13 00:31:19

Clang and GCC 4.9 implemented LeakSanitizer in 2013. LeakSanitizer (LSan) is a memory leak detector. It intercepts memory allocation functions and by default detects memory leaks at atexit time. The implementation is purely in the runtime (compiler-rt/lib/lsan) and no instrumentation is needed.

LSan has very little architecture-specific code and supports many 64-bit targets. Some 32-bit targets (e.g. Linux arm/x86-32) are supported as well, but the false negative rate may be high because of the small pointer size. Every supported operating system needs to provide some way to "stop the world".

Usage

LSan can be used in 3 ways.

Standalone (-fsanitize=leak)
AddressSanitizer (-fsanitize=address)
HWAddressSanitizer (-fsanitize=hwaddress)

The most common way to use LSan is clang -fsanitize=address (or GCC). For LSan-supported targets (#define CAN_SANITIZE_LEAKS 1), the AddressSanitizer (ASan) runtime enables LSan by default.

% cat a.c
#include <stdlib.h>
int main() {
  void **p = malloc(42); // leak (categorized as "Direct leak")
  *p = malloc(43);       // leak (categorized as "Indirect leak")
  p = 0;
}
% clang -fsanitize=address a.c -o a
% ./a

=================================================================
==594015==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 42 byte(s) in 1 object(s) allocated from:
    #0 0x55fffef9482e  (/tmp/c/a+0xea82e)
    #1 0x55fffefcf1d1  (/tmp/c/a+0x1251d1)
    #2 0x7f7301626189  (/lib/x86_64-linux-gnu/libc.so.6+0x27189) (BuildId: c4f6727c560b1c33527ff9e0ca0cef13a7db64d2)

Indirect leak of 43 byte(s) in 1 object(s) allocated from:
    #0 0x55fffef9482e  (/tmp/c/a+0xea82e)
    #1 0x55fffefcf1df  (/tmp/c/a+0x1251df)
    #2 0x7f7301626189  (/lib/x86_64-linux-gnu/libc.so.6+0x27189) (BuildId: c4f6727c560b1c33527ff9e0ca0cef13a7db64d2)

SUMMARY: AddressSanitizer: 85 byte(s) leaked in 2 allocation(s).

As a runtime-only feature, -fsanitize=leak is mainly for link actions. When linking an executable, with the default -static-libsan mode on many targets, Clang Driver passes --whole-archive $resource_dir/lib/$triple/libclang_rt.lsan.a --no-whole-archive to the linker. GCC and some platforms prefer shared runtime/dynamic runtime. See All about sanitizer interceptors.

% clang -fsanitize=leak a.o '-###' |& grep --color lsan
... --whole-archive" "/tmp/Rel/lib/clang/17/lib/x86_64-unknown-linux-gnu/libclang_rt.lsan.a" "--no-whole-archive" ...
% gcc -fsanitize=leak a.o '-###' |& grep --color lsan
... /usr/lib/gcc/x86_64-linux-gnu/12/liblsan_preinit.o --push-state --no-as-needed -llsan ...

-fsanitize=leak affects compile actions as well, but only for the following C/C++ preprocessor feature.

#if __has_feature(leak_sanitizer)
...
#endif

Implementation overview

Standalone LSan intercepts malloc-family functions. It uses the templated SizeClassAllocator{32,64} with chunk metadata. The interceptors record the allocation information (requested size, stack trace).

AddressSanitizer already intercepts malloc-family functions. Its chunk metadata has extra bits for LSan.

By default, the common options detect_leaks and leak_check_at_exit are enabled. The runtime installs a hook with atexit which will perform the leak check. Alternatively, the user can call __lsan_do_leak_check to request a leak check before the exit time.

Upon a leak check, the runtime performs a job very similar to a mark-and-sweep garbage collection algorithm. It suspends all threads ("stop the world") and scans root regions to find allocations reachable by the process. Root regions include:

ignored allocations due to __lsan_ignore_object or __lsan_disable
global regions (__lsan::ProcessGlobalRegions). On Linux, these refer to memory mappings due to writable PT_LOAD program headers. This ensures that allocations reachable by a global variable are not leaks.
for each thread, registers (default use_registers=1), stack (default use_stack=1), thread-local storage (default use_tls=1), and additional pointers in the thread context
root regions due to __lsan_register_root_region
operating system specific allocations. This is currently macOS-specific: libdispatch and Foundation memory regions, etc.

The runtime uses a flood fill algorithm to find reachable allocations from a region. The runtime is conservative and scans all aligned words which look like a pointer. (It is not feasible to determine whether a pointer-like word is used as an integer/floating point number rather than as a pointer.) If the word looks like a pointer into a heap (many 64-bit targets use [0x600000000000, 0x640000000000)), the runtime checks whether it refers to an active chunk. In Valgrind's terms, a reference can be a "start-pointer" (a pointer to the start of the chunk) or an "interior-pointer" (a pointer to the middle of the chunk).

Finally, the runtime iterates over all active allocations and reports leaks for unmarked allocations.

Metadata

Each allocation reserves 2 bits to record its state: leaked/reachable/ignored. For better diagnostics, "leaked" can be direct or indirect. If a chunk is marked as leaked, all chunks reachable from it are marked as indirectly leaked. (*p = malloc(43); in the example at the beginning is an indirect leak.)

enum ChunkTag {
  kDirectlyLeaked = 0,  // default
  kIndirectlyLeaked = 1,
  kReachable = 2,
  kIgnored = 3
};

Standalone LSan uses this chunk metadata struct:

struct ChunkMetadata {
  u8 allocated : 8;  // Must be first.
  ChunkTag tag : 2;
#if SANITIZER_WORDSIZE == 64
  uptr requested_size : 54;
#else
  uptr requested_size : 32;
  uptr padding : 22;
#endif
  u32 stack_trace_id;
};

ASan just stores a 2-bit ChunkTag in its existing chunk metadata (__asan::ChunkHeader::lsan_tag). Similarly, HWASan stores a 2-bit ChunkTag in its existing chunk metadata (__hwasan::Metadata::lsan_tag).

Allocator

Both standalone LSan and ASan use the templated SizeClassAllocator{32,64} (primary allocator) with a thread-local cache (SizeClassAllocator{32,64}LocalCache). The cache defines many size classes and maintains free lists for these classes. Upon an allocation request, if the free list for the requested size class is empty, the cache will call Refill to grab more chunks from SizeClassAllocator{32,64}. If the free list is not empty, the cache hands over a chunk from the free list to the user.

SizeClassAllocator64 has a total allocator space of kSpaceSize bytes. The space is split into multiple regions of the same size (kRegionSize), each serving a single size class. When the cache calls Refill, SizeClassAllocator64 takes a lock of the region and calls mmap to allocate a new memory mapping (and if kMetadataSize!=0, another memory mapping for metadata). For an active allocation, it is very efficient to compute its index in the region and its associated metadata.

Stop the world

On Linux, the runtime invokes the clone syscall to create a tracer thread. The tracer thread calls ptrace with PTRACE_ATTACH to stop every other thread.

On Fuchsia, the runtime calls __sanitizer_memory_snapshot to stop the world.

Runtime options

Specify the environment variable LSAN_OPTIONS to toggle runtime behaviors.

For standalone LSan, exitcode=23 is the default. The runtime calls _exit(23) upon leaks. For ASan-integrated LSan, exitcode=1 is the default.

LSAN_OPTIONS=use_registers=0:use_stack=0:use_tls=0 can remove some default root regions.

leak_check_at_exit=0 disables registering an atexit hook for leak checking.

report_objects=1 reports the addresses of individual leaked objects.

detect_leaks=0 disables all leak checking, including user-requested ones due to __lsan_do_leak_check or __lsan_do_recoverable_leak_check. This is similar to defining extern "C" int __lsan_is_turned_off() { return 1; } in the program. Using standalone LSan with detect_leaks=0 has a performance characteristic similar to using pure SizeClassAllocator{32,64} and has nearly no extra overhead except the stack trace. If we can get rid of the stack unwinding overhead, we will have a simple way to benchmark SizeClassAllocator{32,64}'s performance.

If __lsan_default_options is defined, the return value will be parsed like LSAN_OPTIONS.

Issue suppression

Call __lsan_ignore_object to ignore an allocation. __lsan_disable ignores all allocations for the current thread until __lsan_enable is called.

The runtime scans each ignored allocation. An allocation reachable by an ignored allocation is not considered a leak.

__lsan_register_root_region registers a region as a root. The runtime scans the intersection of the region and valid memory mappings (/proc/self/maps on Linux). LSAN_OPTIONS=use_root_regions=0 can disable the registered regions.

We can provide a suppression file with LSAN_OPTIONS=suppressions=a.supp. The file contains one suppression rule per line, each rule being of the form leak:<pattern>. For a leak, the runtime checks every frame in the stack trace. A frame has a module name associated with the call site address (executable or shared object), and when symbolization is available, a source file name and a function name. If any module name/source file name/function name matches a pattern (substring matching using glob), the leak is suppressed.

Note: symbolization requires debug information and a symbolizer: either the internal symbolizer (not built by default) or an llvm-symbolizer in a PATH directory.

Let's see an example.

cat > a.c <<'eof'
#include <stdlib.h>
#include <stdio.h>
void *foo();
int main() {
  foo();
  printf("%p\n", malloc(42));
}
eof
cat > b.c <<'eof'
#include <stdlib.h>
void *foo() { return malloc(42); }
eof
clang -fsanitize=leak -fpic -g -shared b.c -o b.so
clang -fsanitize=leak -g a.c ./b.so -o a

which llvm-symbolizer  # available
LSAN_OPTIONS=suppressions=<(printf 'leak:a') ./a      # suppresses both leaks (by module name)
LSAN_OPTIONS=suppressions=<(printf 'leak:a.c') ./a    # suppresses both leaks (by source file name)
LSAN_OPTIONS=suppressions=<(printf 'leak:b.c') ./a    # suppresses the leak in b.so (by source file name)
LSAN_OPTIONS=suppressions=<(printf 'leak:main') ./a   # suppresses both leaks (by function name)
LSAN_OPTIONS=suppressions=<(printf 'leak:foo') ./a    # suppresses the leak in b.so (by function name)

Miscellaneous

Standalone LeakSanitizer can be used with SanitizerCoverage: clang -fsanitize=leak -fsanitize-coverage=func,trace-pc-guard a.c

The testsuite has been moved. Use git log -- compiler-rt/lib/lsan/lit_tests/TestCases/pointer_to_self.cc to do archaeology for old tests.

Similar tools

Valgrind's Memcheck performs leak checking by default (--leak-check=summary); --leak-check=full reports each leak in detail. The feature has been available since the initial revision (git log -- vg_memory.c) in 2002.

Google open sourced HeapLeakChecker as part of gperftools. It is part of TCMalloc and used with debugallocation.cc. Allocations in multi-threaded programs need to grab a global lock and are much slower than with a sanitizer allocator. heap_check_max_pointer_offset (default: 2048) specifies the largest offset it scans an allocation for pointers. The default makes it vulnerable to false positives.

heaptrack is a heap memory profiler which supports leak checking.