IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    A compact relocation format for ELF

    MaskRay发表于 2024-03-10 00:56:46
    love 0

    ELF's design emphasizes natural size and alignment guidelines for itscontrol structures. This principle, outlined in Proceedings of theSummer 1990 USENIX Conference, ELF: An Object File to MitigateMischievous Misoneism, promotes ease of random access forstructures like program headers, section headers, and symbols.

    All data structures that the object file format defines follow the"natural" size and alignment guidelines for the relevant class. Ifnecessary, data structures contain explicit padding to ensure 4-bytealignment for 4-byte objects, to force structure sizes to a multiple offour, etc. Data also have suitable alignment from the beginning of thefile. Thus, for example, a structure containing an Elf32_Addr memberwill be aligned on a 4-byte boundary within the file. Other classeswould have appropriately scaled definitions. To illustrate, the 64-bitclass would define Elf64 Addr as an 8-byte object, aligned on an 8-byteboundary. Following the strictest alignment for each object allows theformat to work on any machine in a class. That is, all ELF structures onall 32-bit machines have congruent templates. For portability, ELF usesneither bit-fields nor floating-point values, because theirrepresentations vary, even among pro- cessors with the same byte order.Of course the pro- grams in an ELF file may use these types, but theformat itself does not.

    While beneficial for many control structures, the natural sizeguideline presents significant drawbacks for relocations. Sincerelocations are typically processed sequentially, they don't gain thesame random-access advantages. The large 24-byte Elf64_Rela structurehighlights the drawback. For a detailed comparison of relocationformats, see Exploringobject file formats#Relocations.

    Furthermore, Elf32_Rel and Elf32_Relasacrifice flexibility to maintain a smaller size, limiting relocationtypes to a maximum of 255. This constraint has become noticeable forAArch32 and RISC-V. While the 24-bit symbol index field is less elegant,it hasn't posed significant issues in real-world use cases.

    In contrast, the WebAssemblyobject file format uses LEB128 encoding for relocations and otherconstrol structures, offering a significant size advantage over ELF.

    Inspired by WebAssembly, I will explore real-world scenarios whererelocation size is critical and propose an alternative format (RELLEB)that addresses ELF's limitations.

    Use cases

    Dynamic relocations

    A substantial part of position-independent executables (PIEs) anddynamic shared objects (DSOs) is occupied by dynamic relocations. WhileRELR (acompact relative relocation format) offers size-saving benefits forrelative relocations, other dynamic relocations can benefit from acompact relocation format.

    ld.lld --pack-dyn-relocs=android was an earlier designthat applies to all dynamic relocations at the cost of complexity.

    Additionally, Apple linkers and dyld use LEB128 encoding for bindopcodes.

    Marker relocations

    Marker relocations are utilized to indicate certain linkeroptimization/relaxation is applicable. While many marker relocations areused scarcely, RISC-V relocatable files are typically filled up withR_RISCV_RELAX relocations. Their size contribution is quitesubstantial.

    .llvm_addrsig

    On many Linux targets, Clang emits a special section called.llvm_addrsig (type SHT_LLVM_ADDRSIG, LLVMaddress-significance table) by default to allowld.lld --icf=safe. The .llvm_addrsig sectionstores symbol indexes in ULEB128 format, independent of relocations.Consequently, tools like ld -r and objcopy risk invalidatethe section due to symbol table modifications.

    Ideally, using relocations would allow certain operations. However,the size concern of REL/RELA in ELF hinders this approach. In contrast,lld's Mach-O port chosea relocation-based representation for__DATA,__llvm_addrsig.

    .llvm.call-graph-profile

    LLVM leverages a special section called.llvm.call-graph-profile (typeSHT_LLVM_CALL_GRAPH_PROFILE) for both instrumentation- andsample-based profile-guided optimization (PGO). lld utilizesthis information ((from_symbol, to_symbol, weight) tuples) tooptimize section ordering within an input section description, enhancingcache utilization and minimizing TLB thrashing.

    Similar to .llvm_addrsig, the.llvm.call-graph-profile section initially faced the symbolindex invalidation problem, which was solved by switching torelocations. I opted for REL over RELA to reduce code size.

    .debug_names

    DWARF v5 accelerated name-based access with the introduction of the.debug_names section. However, in aclang -g -gpubnames generated relocatable file, the.rela.debug_names section can consume a significant portion(approximately 10%) of the file size. This size increase has sparkeddiscussions within the LLVM community about potentially alteringthe file format for linking purposes.

    The availability of a more compact relocation format would likelyalleviate the need for such format changes.

    Compressed relocations

    While the standard SHF_COMPRESSED feature is commonlyused for debug sections, its application can easily extend to relocationsections. I have developed a Clang/lld prototype that demonstrates thisby compressing SHT_RELA sections.

    The compressed SHT_RELA section occupiessizeof(Elf64_Chdr) + size(compressed) bytes. Theimplementation retains uncompressed content if compression would resultin a larger size.

    In scenarios with numerous smaller relocation sections (such as whenusing -ffunction-sections -fdata-sections), the 24-byteElf64_Chdr header can introduce significant overhead. Thisobservation raises the question of whether encodingElf64_Chdr fields using ULEB128 could further optimize filesizes. With larger monolithic sections (.text,.data, .eh_frame), compression ratio would behigher as well.

    1
    2
    3
    4
    5
    6
    7
    8
    # configure-llvm is my wrapper of cmake that specifies some useful options.
    configure-llvm s2-custom0 -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS='clang;lld'
    configure-llvm s2-custom1 -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS='clang;lld' -DCMAKE_{C,CXX}_FLAGS=-Xclang=--compress-relocations=zstd
    ninja -C /tmp/out/s2-custom0 lld
    ninja -C /tmp/out/s2-custom1 lld

    ruby -e 'p Dir.glob("/tmp/out/s2-custom0/**/*.o").sum{|f| File.size(f)}' # 137725952
    ruby -e 'p Dir.glob("/tmp/out/s2-custom1/**/*.o").sum{|f| File.size(f)}' # 117747320

    Despite the overhead of-ffunction-sections -fdata-sections, the compressiontechnique yields a significant reduction of 14.5%!

    However, dropping in-place relocation processing is a downside.

    RELLEB relocation format

    The 1990 ELF paper ELF: An Object File to Mitigate MischievousMisoneism says "ELF allows extension and redefinition for othercontrol structures." Inspired by WebAssembly, let's explore RELLEB, anew and more compact relocation format designed to replace RELA. Ouremphasis is on simplicity over absolute minimal encoding. See the end ofthe article for a detailed format description.

    A SHT_RELLEB section (preferred name:.relleb<name>) holds compact relocation entries thatdecode to Elf32_Relr or Elf64_Relr dependingon the object file class (32-bit or 64-bit). The content begins with aULEB128-encoded relocation count, followed by a sequence of relocationentries. Individual entries encode r_offset,r_type, r_symidx using ULEB128/SLEB128, withan optional member encoding r_addend.

    Here are key design choices:

    Relocation count (ULEB128):

    This allows for efficient retrieval of the relocation count withoutdecoding the entire section. While a uint32_t (like SHT_HASH)could be used, ULEB128 aligns with subsequent entries and offers aslight size advantage in most cases when the number of symbols can beencoded in one to three bytes.

    Delta encoding for r_offset (ULEB128):

    Section offsets can be large, and relocations are typically ordered.Storing the difference between consecutive offsets offers compressionpotential. In most cases, a single byte will suffice. While there areexceptions (general dynamic TLS model of s390/s390x uses a local"out-of-order" pair:R_390_PLT32DBL(offset=o) R_390_TLS_GDCALL(offset=o-2)), weare optimizing for the common case. Switching to SLEB128 would increasethe total .o size by 0.1%.

    For ELFCLASS32, r_offsets members are calculated usingmodular arithmetic modulo 4294967296.

    Delta encoding for r_type (SLEB128):

    Some psABIs utilize relocation types greater than 128. AArch64'sstatic relocation types begin at 257 and dynamic relocation types beginat 1024, necessitating two bytes with ULEB128/SLEB128 encoding in theabsence of delta encoding. Delta encoding allows all but the firstrelocation's type to be encoded in a single byte. An alternative designis to define a base type in the header and encode types relative to thebase type, which would introduce slight complexity.

    If the AArch32 psABI could be redesigned, allocating[0,64) for Thumb relocation types and [64,*)for ARM relocation types would optimize delta encoding even further.

    While sharing a single type code for multiple relocations would beefficient, it would require reordering relocations. This conflicts withorder requirements imposed by several psABIs and could complicate linkerimplementations.

    Symbol index and addend presence (SLEB128):

    ULEB128 optimizes for the common case when the symbol index isencodable in one or two bytes. Using SLEB128 and delta encoding insteadof ULEB128 for the symbol index field would increase the total size by0.4%. Potential gains for dynamic relocations with consecutive indexesare outweighed by the loss in static relocations and added complexity,hence avoiding delta encoding. We indicate addend omission using thesign bit (see below).

    Delta encoding for addend (SLEB128):

    Consecutive static relocations often have identical addends,especially on architectures with frequent zero addends (AArch64,PowerPC, RISC-V, etc). Addend omission optimizes this scenario.Additionally, it benefits architectures like AArch32 and x86, whichoften have identical negative addends (call instructions) forconsecutive relocations. Dynamic relocations (exceptR_*_RELATIVE and R_*_IRELATIVE) typically havezero addends, also benefiting from our scheme.

    The sign bit of the previous member flags whether the addend haschanged relative to the prior entry. When the addend changes, we use anaddend delta. This offers a slight size advantage (about 0.20%) andoptimizes for cases like:

    1
    2
    3
    4
    .quad .data + 0x78
    .quad .data + 0x80
    .quad .data + 0x88
    ...

    Note: when the bitwise NOT code path is taken, the zero delta addendis not utilized.

    1
    2
    // Many R_X86_64_PLT32(g-4)
    void f() { g(); g(); g(); ... }

    There is no RELLEB replacement for .rela.plt. In glibc,there is complexity due to __rela_iplt_start.

    I have developed a prototype at https://github.com/MaskRay/llvm-project/tree/demo-relleb.RELLEB demonstrates superrior size reduction compared to theSHF_COMPRESSED SHT_RELA approach.

    1
    2
    3
    4
    5
    configure-llvm s2-custom2 -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS='clang;lld' -DCMAKE_{C,CXX}_FLAGS=-mrelleb
    # Change /usr/local/bin/ld to point to an updated lld
    ninja -C /tmp/out/s2-custom2 lld

    ruby -e 'p Dir.glob("/tmp/out/s2-custom2/**/*.o").sum{|f| File.size(f)}' # 114305360

    RELLEB yields a significant reduction of 20.5%! The total relocationsection size has decreased from 28768176 to 5318839, 18.4% or theoriginal.

    It would be interesting to explore the potential gains of combiningzstd compression with RELLEB.

    1
    2
    3
    4
    configure-llvm s2-custom3 -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS='clang;lld' -DCMAKE_{C,CXX}_FLAGS='-mrelleb -Xclang --compress-relocations=zstd'
    ninja -C /tmp/out/s2-custom3 lld

    ruby -e 'p Dir.glob("/tmp/out/s2-custom3/**/*.o").sum{|f| File.size(f)}' # 113061168

    While the 26.4% reduction in RELLEB section size suggests room forfurther optimization, the overall decrease of only 1.088% in.o file sizes indicates that the current compact relocationformat offers a reasonable compromise. (In the absence of the addendpresence and delta addend technique, the overall decrease is about1.5%.)

    I debated whether to name the new section SHT_RELOC(.reloc<name>) or SHT_RELLEB(.relleb<name>). Ultimately, I choseSHT_RELLEB because its unique nature minimizes potentialconfusion, whereas SHT_RELOC could be confused withSHT_REL and SHT_RELA.

    RELLEB for dynamicrelocations

    RELLEB excels with static relocations, but what about the dynamiccase? A truly optimal dynamic relocation format would differsubstantially. Since dynamic relocations are often adjacent in offsetsby 4 or 8 and include fewer types, a generalized RELR format would beideal. Here's a possible encoding:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    // For R_*_RELATIVE
    Encode(RELR bytes)
    Encode(R_*_RELATIVE)
    RELR

    // For R_*_GLOB_DAT or the symbolic relocation type
    Encode(R_*_GLOB_DAT)
    Use RELR to encode offsets
    Encode symbol indexes separately

    // For R_*_JUMP_SLOT
    Encode(R_*_JUMP_SLOT)
    Use RELR to encode offsets
    Encode symbol indexes separately

    While RELLEB is a step up from REL/RELA for dynamic relocations, it'snot perfect. Android's packed relocation format, despite its complexityand lack of bitmap encoding, provides greater space savings.

    I've implemented -z relleb in my prototype for testing.Let's link clang-16-debug using several methods and compare: RELA,--pack-dyn-relocs=relr,--pack-dyn-relocs=android+relr, and--pack-dyn-relocs=relr -z relleb.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    % llvm-readelf -S clang | grep ' \.rel.*\.'
    [ 8] .rela.dyn RELA 0000000000f28998 f28998 e177d0 18 A 3 0 8
    [ 9] .rela.plt RELA 0000000001d40168 1d40168 0025b0 18 AI 3 26 8
    % llvm-readelf -S clang.relr | grep ' \.rel.*\.'
    [ 8] .rela.dyn RELA 0000000000f28998 f28998 016e90 18 A 3 0 8
    [ 9] .relr.dyn RELR 0000000000f3f828 f3f828 02ccd0 08 A 0 0 8
    [10] .rela.plt RELA 0000000000f6c4f8 f6c4f8 0025b0 18 AI 3 27 8
    % llvm-readelf -S clang.relr+android | grep ' \.rel.*\.'
    [ 8] .rela.dyn ANDROID_RELA 0000000000f28998 f28998 001707 01 A 3 0 8
    [ 9] .relr.dyn RELR 0000000000f2a0a0 f2a0a0 02ccd0 08 A 0 0 8
    [10] .rela.plt RELA 0000000000f56d70 f56d70 0025b0 18 AI 3 27 8
    % llvm-readelf -S clang.relr+relleb | grep ' \.rel.*\.'
    [ 8] .relleb.dyn 0x14: <unknown> 0000000000f28998 f28998 003bcd 01 A 3 0 8
    [ 9] .relr.dyn RELR 0000000000f2c568 f2c568 02ccd0 08 A 0 0 8
    [10] .rela.plt RELA 0000000000f59238 f59238 0025b0 18 AI 3 27 8

    Analysis

    • RELR significantly optimizes relative relocations, offering thelargest size reduction.
    • RELLEB further improves the non-relative portion, achieving a decent16.3% reduction. However, it's overshadowed by Android packedrelocations (6.3%).
    • While Android packed relocations have a smaller footprint, theiradvantage is less pronounced since .relr.dyn still accountsfor a significant portion of the size.

    RELLEB proposal for thegeneric ABI

    In https://www.sco.com/developers/gabi/latest/ch4.sheader.html,make the following changes.

    In Figure 4-9: Section Types,sh_type, append a row

    SHT_RELLEB | 20

    Add text:

    SHT_RELLEB - The section holds compact relocation entries withexplicit addends. An object file may have multiple relocation sections.''Relocation'' below for details.

    In Figure 4-16: Special Sections, append a row

    .rellebname | SHT_RELLEB | see below

    Change the text below:

    .relname, .relaname, and .rellebname

    These sections hold relocation information, as described in''Relocation''. If the file has a loadable segment that includesrelocation, the sections' attributes will include the SHF_ALLOC bit;otherwise, that bit will be off. Conventionally, name is supplied by thesection to which the relocations apply. Thus a relocation section for.text normally would have the name .rel.text, .rela.text, or.relleb.text.

    In Figure 4-23: Relocation Entries, add:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    typedef struct {
    Elf32_Addr r_offset;
    Elf32_Word r_type;
    Elf32_Word r_symidx;
    Elf32_Sxword r_addend;
    } Elf32_Relleb;

    typedef struct {
    Elf64_Addr r_offset;
    Elf64_Word r_type;
    Elf64_Word r_symidx;
    Elf64_Sxword r_addend;
    } Elf64_Relleb;

    Add text above "A relocation section references two othersections":

    A SHT_RELLEB section holds compact relocation entriesthat decode to Elf32_Relr or Elf64_Relrdepending on the object file class (32-bit or 64-bit). Its contentbegins with a ULEB128-encoded relocation count, followed by entriesencoding r_offset, r_type,r_symidx, and r_addend. Note that ther_info member in traditional REL/RELA formats has beensplit into separate r_type and r_symidxmembers, allowing uint32_t relocation types for ELFCLASS32as well.

    • Delta offset: (ULEB128-encoded) The difference inr_offset relative to the previous entry. The difference istruncated to 32-bit/64-bit for ELFCLASS32/ELFCLASS64, respectively.
    • Delta type: (SLEB128-encoded) The difference in relocation typerelative to the previous entry.
    • Symbol table index and addend presence: (SLEB128-encoded) If ther_offset is equal to the previous r_addend,the encoded value represents the symbol index; otherwise, the bitwiseNOT of the encoded value indicates the symbol index.
    • Delta addend: (SLEB128-encoded, only if indicated in the previousmember) The difference in r_addend relative to the previousentry. The difference is truncated to 32-bit/64-bit forELFCLASS32/ELFCLASS64, respectively.

    For the first relocation entry, the previous offset, type, and addendmembers are treated as zero.

    In Figure 5-10: Dynamic Array Tags, d_tag, add:

    DT_RELLEB | 38 | d_ptr | optional | optional

    Add text below:

    • DT_RELLEB - This element is similar toDT_RELA, except its table uses the RELLEB format. If thiselement is present, the dynamic structure must also have aDT_RELLEBSZ element.


沪ICP备19023445号-2号
友情链接