llvm-project 16 was just released. I added some lld/ELF notes to https://github.com/llvm/llvm-project/blob/release/16.x/lld/docs/ReleaseNotes.rst. Here I will elaborate on some changes.
- `ELFCOMPRESS_ZSTD` compressed input sections are now supported. (D129406)
- `--compress-debug-sections=zstd` is now available to compress debug sections with zstd (`ELFCOMPRESS_ZSTD`). (D133548)
- `--no-warnings`/`-w` is now available to suppress warnings. (D136569)
- `DT_RISCV_VARIANT_CC` is now produced if at least one `R_RISCV_JUMP_SLOT` relocation references a symbol with the `STO_RISCV_VARIANT_CC` bit. (D107951)
- `DT_STATIC_TLS` is now set for AArch64/PPC32/PPC64 initial-exec TLS models when producing a shared object.
- `--no-undefined-version` is now the default; symbols named in version scripts that have no matching symbol in the output will be reported. Use `--undefined-version` to revert to the old behavior. (D135402)
- `-V` is now an alias for `-v` to support `gcc -fuse-ld=lld -v` on many targets.
- `-r` no longer defines `__global_pointer$` or `_TLS_MODULE_BASE_`.
- Mixed binding (`STB_WEAK` and `STB_GNU_UNIQUE` in different COMDATs) is now supported. (D136381)
- The `SHT_RISCV_ATTRIBUTES` section now merges all input components instead of picking the first input component. (D138550)
- `-fno-plt` GD/LD TLS models (`call *___tls_get_addr@GOT(%reg)`) are now supported. Previous output might crash at runtime.

Link speed has greatly improved compared to lld 15.0.0. In this release cycle, I made input section initialization and relocation scanning parallel. (D130810 D133003)
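To make the `ELFCOMPRESS_ZSTD` items above concrete: a `SHF_COMPRESSED` section begins with an `Elf64_Chdr` whose `ch_type` distinguishes zlib (`ELFCOMPRESS_ZLIB` = 1) from zstd (`ELFCOMPRESS_ZSTD` = 2). A minimal Python sketch of that header layout (the `parse_chdr` helper is mine for illustration, not part of lld):

```python
import struct

# Layout of Elf64_Chdr, the header at the start of a SHF_COMPRESSED
# section (little-endian): ch_type, ch_reserved, ch_size, ch_addralign.
ELFCOMPRESS_ZLIB = 1
ELFCOMPRESS_ZSTD = 2  # the constant newly recognized by lld 16
CHDR64 = struct.Struct("<IIQQ")  # 24 bytes

def parse_chdr(section: bytes):
    """Return (ch_type, uncompressed size, alignment) of a compressed section."""
    ch_type, _reserved, ch_size, ch_addralign = CHDR64.unpack_from(section)
    return ch_type, ch_size, ch_addralign

# A fabricated header for illustration: a zstd-compressed section whose
# uncompressed payload is 0x1000 bytes, 8-byte aligned.
hdr = CHDR64.pack(ELFCOMPRESS_ZSTD, 0, 0x1000, 8)
print(parse_chdr(hdr))  # (2, 4096, 8)
```

The compressed payload follows the 24-byte header; a consumer dispatches on `ch_type` to pick the decompressor.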
```
% hyperfine --warmup 1 --min-runs 20 "numactl -C 20-27 "{/tmp/out/custom15,/tmp/out/custom16}"/bin/ld.lld @response.txt --threads=8"
```

(--threads=4 => 1.38x (0.7009s => 0.5096s))
Linking a `-DCMAKE_BUILD_TYPE=Debug` build of clang:

```
% hyperfine --warmup 2 --min-runs 25 "numactl -C 20-27 /tmp/out/custom"{15,16}"/bin/ld.lld @response.txt --threads=8"
Benchmark 1: numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8
  Time (mean ± σ):      3.664 s ±  0.039 s    [User: 6.657 s, System: 3.434 s]
  Range (min … max):    3.605 s …  3.781 s    25 runs

Benchmark 2: numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8
  Time (mean ± σ):      3.141 s ±  0.028 s    [User: 7.071 s, System: 3.150 s]
  Range (min … max):    3.086 s …  3.212 s    25 runs

Summary
  'numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8' ran
    1.17 ± 0.02 times faster than 'numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8'
```

(--threads=1 => 1.06x (7.620s => 7.202s), --threads=4 => 1.11x (4.138s => 3.727s))
Linking a default build of chrome:

```
% hyperfine --warmup 2 --min-runs 25 "numactl -C 20-27 /tmp/out/custom"{15,16}"/bin/ld.lld @response.txt --threads=8"
Benchmark 1: numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8
  Time (mean ± σ):      4.159 s ±  0.018 s    [User: 6.703 s, System: 2.705 s]
  Range (min … max):    4.136 s …  4.207 s    25 runs

Benchmark 2: numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8
  Time (mean ± σ):      3.700 s ±  0.027 s    [User: 7.417 s, System: 2.556 s]
  Range (min … max):    3.660 s …  3.771 s    25 runs

Summary
  'numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8' ran
    1.12 ± 0.01 times faster than 'numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8'
```

(--threads=4 => 1.11x (4.387s => 3.940s))
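The speedup ratios quoted above are simply the ratio of mean wall times; a quick sanity check with the numbers copied from the hyperfine runs:

```python
# Speedup = lld 15 mean time / lld 16 mean time, from the runs above.
def speedup(old_s: float, new_s: float) -> float:
    return old_s / new_s

print(round(speedup(3.664, 3.141), 2))  # clang Debug link: 1.17
print(round(speedup(4.159, 3.700), 2))  # chrome link: 1.12
```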