
A new relocation format for ELF

MaskRay, posted 2024-03-09 04:16:28

UNDER CONSTRUCTION: still working on experiments.

ELF's design emphasizes natural size and alignment guidelines for its control structures. This principle, outlined in the Proceedings of the Summer 1990 USENIX Conference paper ELF: An Object File to Mitigate Mischievous Misoneism, promotes ease of random access for structures like program headers, section headers, and symbols.

All data structures that the object file format defines follow the "natural" size and alignment guidelines for the relevant class. If necessary, data structures contain explicit padding to ensure 4-byte alignment for 4-byte objects, to force structure sizes to a multiple of four, etc. Data also have suitable alignment from the beginning of the file. Thus, for example, a structure containing an Elf32_Addr member will be aligned on a 4-byte boundary within the file. Other classes would have appropriately scaled definitions. To illustrate, the 64-bit class would define Elf64_Addr as an 8-byte object, aligned on an 8-byte boundary. Following the strictest alignment for each object allows the format to work on any machine in a class. That is, all ELF structures on all 32-bit machines have congruent templates. For portability, ELF uses neither bit-fields nor floating-point values, because their representations vary, even among processors with the same byte order. Of course the programs in an ELF file may use these types, but the format itself does not.

While beneficial for many control structures, the natural size guideline does have drawbacks. Relocations, which are typically processed sequentially, don't gain the same random-access advantages. The large 24-byte Elf64_Rela structure highlights this. Exploring object file formats#Relocations compares relocations from different object file formats.

Furthermore, Elf32_Rel and Elf32_Rela sacrifice flexibility to maintain a smaller size: r_info packs a 24-bit symbol index and an 8-bit type into one 32-bit word, limiting relocation types to a maximum of 255. This constraint has become noticeable for AArch32 and RISC-V. The 24-bit symbol index field is also less elegant, but real-world use cases haven't run into a problem due to this bound.

In contrast, the WebAssembly object file format uses LEB128 encoding for relocations and other control structures. This approach offers a significant size advantage over ELF.

I will explore some real-world scenarios where relocation size plays a significant role and propose an alternative format inspired by WebAssembly.

Use cases

Dynamic relocations

A substantial part of position-independent executables (PIEs) and dynamic shared objects (DSOs) is occupied by dynamic relocations. While RELR (a compact relative relocation format) offers size-saving benefits for relative relocations, the remaining dynamic relocations could also benefit from a compact relocation format.

ld.lld --pack-dyn-relocs=android was an earlier design that applies to all dynamic relocations, at the cost of complexity.

Additionally, Apple linkers and dyld use LEB128 encoding for bind opcodes.

`.llvm_addrsig`

On many Linux targets, Clang emits a special section called .llvm_addrsig (type SHT_LLVM_ADDRSIG, LLVM address-significance table) by default to allow ld.lld --icf=safe. The .llvm_addrsig section stores symbol indexes in ULEB128 format, independent of relocations. Consequently, tools like ld -r and objcopy risk invalidating the section when they modify the symbol table.

Ideally, the section would reference symbols through relocations, which these tools already update when they modify the symbol table. However, the size concern of REL/RELA in ELF hinders this approach. In contrast, lld's Mach-O port chose a relocation-based representation for __DATA,__llvm_addrsig.

`.llvm.call-graph-profile`

LLVM leverages a special section called .llvm.call-graph-profile (type SHT_LLVM_CALL_GRAPH_PROFILE) for both instrumentation- and sample-based profile-guided optimization (PGO). lld utilizes this information ((from_symbol, to_symbol, weight) tuples) to optimize section ordering within an input section description, enhancing cache utilization and minimizing TLB thrashing.

Similar to .llvm_addrsig, the .llvm.call-graph-profile section initially faced the symbol index invalidation problem, which was solved by switching to relocations. I opted for REL over RELA to reduce object file size.

`.debug_names`

DWARF v5 accelerates name-based access with the introduction of the .debug_names section. However, in a relocatable file generated by clang -g -gpubnames, the .rela.debug_names section can consume a significant portion (approximately 10%) of the file size. This size increase has sparked discussions within the LLVM community about potentially altering the file format for linking purposes.

The availability of a more compact relocation format would likely alleviate the need for such format changes.

Compressed relocations

While the standard SHF_COMPRESSED feature is commonly used for debug sections, its application can easily extend to relocation sections. I have developed a Clang/lld prototype that demonstrates this by compressing SHT_RELA sections.

The compressed SHT_RELA section occupies sizeof(Elf64_Chdr) + size(compressed) bytes. The implementation retains uncompressed content if compression would result in a larger size.

In scenarios with numerous smaller relocation sections (such as when using -ffunction-sections -fdata-sections), the 24-byte Elf64_Chdr header can introduce significant overhead. This observation raises the question of whether encoding Elf64_Chdr fields using ULEB128 could further optimize file sizes. With larger monolithic sections (.text, .data, .eh_frame), the compression ratio would be higher as well.

# configure-llvm is my wrapper of cmake that specifies some useful options.
configure-llvm s2-custom0 -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS='clang;lld'
configure-llvm s2-custom1 -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS='clang;lld' -DCMAKE_{C,CXX}_FLAGS=-Xclang=--compress-relocations=zstd
ninja -C /tmp/out/s2-custom0 lld
ninja -C /tmp/out/s2-custom1 lld

ruby -e 'p Dir.glob("/tmp/out/s2-custom0/**/*.o").sum{|f| File.size(f)}'  # 137715840
ruby -e 'p Dir.glob("/tmp/out/s2-custom1/**/*.o").sum{|f| File.size(f)}'  # 117747320

Despite the overhead of -ffunction-sections -fdata-sections, the compression technique yields a significant 14.5% reduction in total .o size!

RELLEB

The 1990 ELF paper ELF: An Object File to Mitigate Mischievous Misoneism says "ELF allows extension and redefinition for other control structures." Let's explore a new relocation format similar to WebAssembly's.

A SHT_RELLEB section begins with the number of relocations encoded in ULEB128. A sequence of relocation entries follows the header.

typedef struct {
  Elf32_Addr r_offset; // the difference relative to the previous offset is encoded in ULEB128
  Elf32_Word r_type;   // the difference relative to the previous type is encoded in SLEB128
  Elf32_Word r_symidx; // encoded in ULEB128
  Elf32_Sxword r_addend; // encoded in SLEB128
} Elf32_Relleb;

typedef struct {
  Elf64_Addr r_offset; // the difference relative to the previous offset is encoded in ULEB128
  Elf64_Word r_type;   // the difference relative to the previous type is encoded in SLEB128
  Elf64_Word r_symidx; // encoded in ULEB128
  Elf64_Sxword r_addend; // encoded in SLEB128
} Elf64_Relleb;

Here's the core concept:

ULEB128 and delta encoding for r_offset:

Since section offsets can be large and relocations are typically ordered, storing the difference between consecutive relocation offsets offers potential for compression. In most cases, a single byte should suffice for encoding this offset delta. There are exceptions. For example, the general dynamic TLS model of s390/s390x uses a local "out-of-order" pair: R_390_PLT32DBL(offset=o) R_390_TLS_GDCALL(offset=o-2), whose negative delta cannot be represented in ULEB128. This is not common and can be fixed by a redesign to TLSDESC. I think it makes sense to optimize for the most common cases and use ULEB128 instead of SLEB128.

SLEB128 and delta encoding for r_type:

While many psABIs define all relocation types smaller than 128 (encodable in a single byte using ULEB128), AArch64 utilizes larger values. Its static relocation types begin at 257, necessitating two bytes with ULEB128/SLEB128 encoding. Delta encoding offers an advantage in this scenario, allowing most relocations after the first to have their type encoded in a single byte. For x86-64, switching to ULEB128 for type encoding would reduce size only marginally (~0.0003%).

An alternative design is to define a base type code in the header and encode each relocation entry's type relative to that base, which would introduce slight complexity.

ULEB128 for r_symidx:

Using SLEB128 and delta encoding instead of ULEB128 for the symbol index field would increase the total size by 0.4%. While potentially beneficial for dynamic relocations with consecutive indexes, this gain might be negligible in practice. So I decided to stick with ULEB128.

configure-llvm s2-custom2 -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS='clang;lld' -DCMAKE_{C,CXX}_FLAGS=-mllvm=-small-relocs
# with a hack to use lld

ruby -e 'p Dir.glob("/tmp/out/s2-custom2/**/*.o").sum{|f| File.size(f)}'  # 115480056
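For comparison with the 14.5% figure above, the same arithmetic applied to the three totals (sizes in bytes, with s2-custom0 as the baseline) gives:

```latex
\begin{aligned}
\frac{137715840 - 117747320}{137715840} &= \frac{19968520}{137715840} \approx 14.5\% && \text{(s2-custom1, compressed relocations)}\\
\frac{137715840 - 115480056}{137715840} &= \frac{22235784}{137715840} \approx 16.1\% && \text{(s2-custom2, \texttt{-small-relocs})}
\end{aligned}
```

These are totals over every .o file in the build directory, so the denominators include non-relocation sections as well.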