
    lld 16 ELF changes

    Posted by MaskRay on 2023-03-19 06:29:33

    llvm-project 16 was just released. I added some lld/ELF notes to https://github.com/llvm/llvm-project/blob/release/16.x/lld/docs/ReleaseNotes.rst. Here I will elaborate on some changes.

    • Link speed improved greatly compared with lld 15.0. Notably, input section initialization and relocation scanning are now parallel. (D130810) (D133003)
    • ELFCOMPRESS_ZSTD compressed input sections are now supported. (D129406)
    • --compress-debug-sections=zstd is now available to compress debug sections with zstd (ELFCOMPRESS_ZSTD). (D133548)
    • --no-warnings/-w is now available to suppress warnings. (D136569)
    • DT_RISCV_VARIANT_CC is now produced if at least one R_RISCV_JUMP_SLOT relocation references a symbol with the STO_RISCV_VARIANT_CC bit. (D107951)
    • DT_STATIC_TLS is now set for AArch64/PPC32/PPC64 initial-exec TLS models when producing a shared object.
    • --no-undefined-version is now the default; symbols named in version scripts that have no matching symbol in the output will be reported. Use --undefined-version to revert to the old behavior. (D135402)
    • -V is now an alias for -v to support gcc -fuse-ld=lld -v on many targets.
    • -r no longer defines __global_pointer$ or _TLS_MODULE_BASE_.
    • A corner case of mixed GCC and Clang object files (STB_WEAK and STB_GNU_UNIQUE in different COMDATs) is now supported. (D136381)
    • The output SHT_RISCV_ATTRIBUTES section now merges all input components instead of picking the first input component. (D138550)
    • For x86-32, the -fno-plt GD/LD TLS model code sequence call *___tls_get_addr@GOT(%reg) is now supported. Previously, the output could crash at run time.
    • Armv4(T) thunks are now supported. (D139888) (D141272)
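The --no-undefined-version bullet above can be illustrated with a small model of the check. This is a hypothetical sketch, not lld's implementation: the function name `unmatched_version_names` and the sample symbol lists are invented for illustration, and the sketch only looks at exact names, leaving wildcard patterns out of scope.

```python
# Hypothetical model of the --no-undefined-version check; not lld's code.
GLOB_CHARS = set("*?[")


def unmatched_version_names(script_names, defined_symbols):
    """Return exact (non-wildcard) version-script names with no defined symbol."""
    defined = set(defined_symbols)
    return [
        name
        for name in script_names
        if not (GLOB_CHARS & set(name)) and name not in defined
    ]


# Names as they might appear in a version node such as
#   v1 { global: foo; bar; api_*; local: *; };
script_names = ["foo", "bar", "api_*", "*"]
# Symbols assumed to be defined in the link output (made up for this example).
defined_symbols = ["foo", "api_init"]

missing = unmatched_version_names(script_names, defined_symbols)
for name in missing:
    print(f"version script names '{name}', but no such symbol is defined")
```

In this model only `bar` is flagged. With lld 16 a link in this situation is reported by default, and passing --undefined-version restores the lld 15 behavior.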

    Speed

    Link speed has greatly improved compared to lld 15.0.0.

    In this release cycle, I made input section initialization and relocation scanning parallel. (D130810 D133003)

    % hyperfine --warmup 1 --min-runs 20 "numactl -C 20-27 "{/tmp/out/custom15,/tmp/out/custom16}"/bin/ld.lld @response.txt --threads=8"
    Benchmark 1: numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8
    Time (mean ± σ): 676.3 ms ± 4.4 ms [User: 751.5 ms, System: 472.3 ms]
    Range (min … max): 667.9 ms … 686.1 ms 20 runs

    Benchmark 2: numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8
    Time (mean ± σ): 453.5 ms ± 5.3 ms [User: 760.5 ms, System: 448.9 ms]
    Range (min … max): 445.4 ms … 462.8 ms 20 runs

    Summary
    'numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8' ran
    1.49 ± 0.02 times faster than 'numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8'

    (--threads=4 => 1.38x (0.7009s => 0.5096s))
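The speedup figures in the hyperfine summary can be checked by hand. The sketch below assumes the ratio is slow mean over fast mean and that the uncertainty follows standard first-order error propagation for a quotient; the helper name `speedup` is made up for this example. Under those assumptions it reproduces the numbers reported above.

```python
import math


def speedup(slow_mean, slow_sd, fast_mean, fast_sd):
    """Return (ratio, uncertainty) for slow/fast using first-order error propagation."""
    ratio = slow_mean / fast_mean
    rel_err = math.sqrt((slow_sd / slow_mean) ** 2 + (fast_sd / fast_mean) ** 2)
    return ratio, ratio * rel_err


# Mean and standard deviation in ms from the --threads=8 run above.
ratio, err = speedup(676.3, 4.4, 453.5, 5.3)
print(f"lld 16 vs lld 15 (--threads=8): {ratio:.2f} ± {err:.2f}")

# The --threads=4 figure only gives means (in seconds), so only the ratio is computed.
ratio4 = 0.7009 / 0.5096
print(f"lld 16 vs lld 15 (--threads=4): {ratio4:.2f}")
```

The same helper applies to the other benchmarks in this post by substituting their mean and σ values.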

    Linking a -DCMAKE_BUILD_TYPE=Debug build of clang:

    % hyperfine --warmup 2 --min-runs 25 "numactl -C 20-27 /tmp/out/custom"{15,16}"/bin/ld.lld @response.txt --threads=8"
    Benchmark 1: numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8
    Time (mean ± σ): 3.664 s ± 0.039 s [User: 6.657 s, System: 3.434 s]
    Range (min … max): 3.605 s … 3.781 s 25 runs

    Benchmark 2: numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8
    Time (mean ± σ): 3.141 s ± 0.028 s [User: 7.071 s, System: 3.150 s]
    Range (min … max): 3.086 s … 3.212 s 25 runs

    Summary
    'numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8' ran
    1.17 ± 0.02 times faster than 'numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8'
    (--threads=1 => 1.06x (7.620s => 7.202s), --threads=4 => 1.11x (4.138s => 3.727s))

    Linking a default build of chrome:

    % hyperfine --warmup 2 --min-runs 25 "numactl -C 20-27 /tmp/out/custom"{15,16}"/bin/ld.lld @response.txt --threads=8"
    Benchmark 1: numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8
    Time (mean ± σ): 4.159 s ± 0.018 s [User: 6.703 s, System: 2.705 s]
    Range (min … max): 4.136 s … 4.207 s 25 runs

    Benchmark 2: numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8
    Time (mean ± σ): 3.700 s ± 0.027 s [User: 7.417 s, System: 2.556 s]
    Range (min … max): 3.660 s … 3.771 s 25 runs

    Summary
    'numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8' ran
    1.12 ± 0.01 times faster than 'numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8'
    (--threads=1 => )


