
    lld 17 ELF changes

    Posted by MaskRay on 2023-07-31 04:01:32

    LLVM 17 will be released. As usual, I maintain lld/ELF and have added some notes to https://github.com/llvm/llvm-project/blob/release/17.x/lld/docs/ReleaseNotes.rst. Here I will elaborate on some changes.

    • When --threads= is not specified, the number of threads is now capped at 16. A large --threads= value can harm performance, especially with some system malloc implementations like glibc's. (D147493)
    • --remap-inputs= and --remap-inputs-file= are added to remap input files. (D148859)
    • --lto= is now available to support clang -funified-lto. (D123805)
    • --lto-CGO[0-3] is now available to control CodeGenOpt::Level independent of the LTO optimization level. (D141970)
    • --check-dynamic-relocations= is now correct for 32-bit targets when the addend is larger than 0x80000000. (D149347)
    • --print-memory-usage has been implemented for memory regions. (D150644)
    • SHF_MERGE, --icf=, and --build-id=fast have switched to 64-bit xxh3. (D154813)
    • Quoted output section names can now be used in linker scripts. (#60496: https://github.com/llvm/llvm-project/issues/60496)
    • MEMORY can now be used without a SECTIONS command. (D145132)
    • REVERSE can now be used in input section descriptions to reverse the order of input sections. (D145381)
    • Program header assignment can now be used within OVERLAY. This functionality was accidentally lost in 2020. (D150445)
    • Operators ^ and ^= can now be used in linker scripts.
    • LoongArch is now supported.
    • DT_AARCH64_MEMTAG_* dynamic tags are now supported. (D143769)
    • AArch32 port now supports BE-8 and BE-32 modes for big-endian. (D140201) (D140202) (D150870)
    • R_ARM_THM_ALU_ABS_G* relocations are now supported. (D153407)
    • .ARM.exidx sections may start at non-zero output section offset. (D148033)
    • Arm Cortex-M Security Extensions is now implemented. (D139092)
    • BTI landing pads are now added to PLT entries accessed by range extension thunks or relative vtables. (D148704) (D153264)
    • AArch64 short range thunk has been implemented to mitigate the performance loss of a long range thunk. (D148701)
    • R_AVR_8_LO8/R_AVR_8_HI8/R_AVR_8_HLO8/R_AVR_LO8_LDI_GS/R_AVR_HI8_LDI_GS have been implemented. (D147100) (D147364)
    • --no-power10-stubs now works for PowerPC64.
    • DT_PPC64_OPT is now supported. (D150631)
    • PT_RISCV_ATTRIBUTES is added to include the SHT_RISCV_ATTRIBUTES section. (D152065)
    • R_RISCV_PLT32 is added to support C++ relative vtables. (D143115)
    • RISC-V global pointer relaxation has been implemented. Specify --relax-gp to enable the linker relaxation. (D143673)
    • The symbol value of foo is correctly handled when --wrap=foo and RISC-V linker relaxation are used. (D151768)
    • x86-64 large data sections are now placed away from code sections to alleviate relocation overflow pressure. (D150510)

    When using glibc malloc on a machine with a large std::thread::hardware_concurrency (say, more than 16), parallel relocation scanning can be considerably slower without the --threads=16 throttling.
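
    As a rough illustration of the new default, the capping rule can be sketched in shell. The helper name capped_threads is hypothetical and not part of lld, and nproc is only an approximation of what std::thread::hardware_concurrency reports.

```shell
# Hypothetical helper (not part of lld): mimic the documented default by
# capping a detected CPU count at 16.
capped_threads() {
  n=$1
  if [ "$n" -gt 16 ]; then
    echo 16
  else
    echo "$n"
  fi
}

# nproc comes from GNU coreutils; fall back to 1 if it is unavailable.
cpus=$(nproc 2>/dev/null || echo 1)
echo "default-like thread count: $(capped_threads "$cpus")"

# The cap only applies when --threads= is absent. An explicit value is
# still honored, e.g. (not executed here):
#   ld.lld --threads=32 @response.txt
```

    On a 64-core machine this prints 16, while a machine with 8 cores keeps all 8.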

    I usually try to get extensions accepted by the binutils community, unless they are too specific to LLVM internals (e.g. --lto-*). The feature request for --remap-inputs= and --remap-inputs-file= was a success story: GNU ld 2.41 implements both.
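
    A minimal sketch of how the two remap options might be used follows. The object file names and the /tmp/rebuilt path are made up for illustration, and the remap file format (one from-glob=to-file pair per line, with # comments) is my reading of the ld.lld manual page, so check it against your version. The commands are printed rather than executed.

```shell
# Hypothetical file names; only the option spellings come from the notes above.
remap_file=$(mktemp)

# Assumed format: each line is from-glob=to-file; lines starting with # are comments.
cat > "$remap_file" <<'EOF'
# replace one object with a locally rebuilt copy
foo.o=/tmp/rebuilt/foo.o
# ignore matching inputs by mapping them to /dev/null
bar*.o=/dev/null
EOF

# Print (rather than run) the link commands this sketch has in mind.
echo "ld.lld --remap-inputs-file=$remap_file @response.txt"
echo "ld.lld --remap-inputs=foo.o=/tmp/rebuilt/foo.o @response.txt"
```

    The single-pattern form is handy for one-off experiments, while the file form keeps a longer list of substitutions out of the command line.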

    PT_RISCV_ATTRIBUTES output is still not quite right. I also question its usefulness. Unfortunately, at this stage, it's difficult to get rid of it.

    This cycle has a surprising number of new features, and I have spent lots of spare time reviewing them to ensure that they are robust and properly tested. Most stuff is completely unrelated to my day job.

    There are quite a few AArch32 changes from Arm engineers, primarily about big-endian support and Cortex-M Security Extensions.

    I was firm that the RISC-V global pointer relaxation needs to be opt-in. I had a GNU ld --relax-gp patch last year and used this opportunity (the ld.lld feature proposal) to move GNU ld --relax-gp forward. In GNU ld it is unfortunately opt-out, but having an option is a step forward.

    Speed

    Unlike previous versions, there is just a minor performance improvement compared with lld 15.0.0. I added a simplified version of 64-bit xxh3 into the LLVMSupport library and utilized it in lld.

    Linking a -DCMAKE_BUILD_TYPE=Debug build of clang 16:

    % hyperfine --warmup 2 --min-runs 25 "numactl -C 20-27 "{/tmp/out/custom-16/bin/ld.lld,/tmp/out/custom-17/bin/ld.lld}" @response.txt --threads=8"
    Benchmark 1: numactl -C 20-27 /tmp/out/custom-16/bin/ld.lld @response.txt --threads=8
    Time (mean ± σ): 3.159 s ± 0.035 s [User: 7.089 s, System: 3.076 s]
    Range (min … max): 3.095 s … 3.250 s 25 runs

    Benchmark 2: numactl -C 20-27 /tmp/out/custom-17/bin/ld.lld @response.txt --threads=8
    Time (mean ± σ): 3.131 s ± 0.027 s [User: 6.851 s, System: 3.101 s]
    Range (min … max): 3.080 s … 3.198 s 25 runs

    Summary
    'numactl -C 20-27 /tmp/out/custom-17/bin/ld.lld @response.txt --threads=8' ran
    1.01 ± 0.01 times faster than 'numactl -C 20-27 /tmp/out/custom-16/bin/ld.lld @response.txt --threads=8'
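
    For reference, the 1.01 figure in the summary is just the ratio of the two mean wall times. A quick check with awk, using the means copied from the runs above:

```shell
# Mean wall times copied from the two hyperfine runs above.
mean16=3.159
mean17=3.131

# Ratio of the lld 16 mean to the lld 17 mean, rounded to two decimals.
ratio=$(awk -v a="$mean16" -v b="$mean17" 'BEGIN { printf "%.2f", a / b }')
echo "lld 17 is ${ratio}x as fast as lld 16 on this link"
```

    The difference of 0.028 s is about the size of the reported standard deviations, so the gain is barely above noise for this workload.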

    The effect on total link time is small. However, if I measure the proportion of total link time spent in the hash function, I can see that the proportion has been reduced to nearly one third. On some workloads and some machines the effect may be larger.


