My journey with the LLVM project began with a deep dive into the world of lld and binary utilities. Countless hours were spent unraveling the intricacies of object file formats and shaping LLVM's relevant components. Though my interests have since broadened, object file formats remain a personal fascination, often drawing me into discussions around potential changes within LLVM.
This article compares several prominent object file formats, drawing upon my experience and insights.
At the heart of each format lies the representation of essential components like symbols, sections, and relocations. For each control structure, we'll begin with ELF, a widely used format, before venturing into the landscapes of other notable formats.
The appendix of All about Procedure Linkage Table contains some history of these object file formats. I have moved some of the notes here with some additions.
The a.out format was designed for the PDP-11 (1970). The quantities were 16-bit, but can be naturally extended to 32-bit or 64-bit.
In Proceedings of the Summer 1990 USENIX Conference, ELF: An Object File to Mitigate Mischievous Misoneism by James Q. Arnold provided some description.
For 32-bit machines, the a.out format was extended in several ways. Most obviously, 16-bit quantities were enlarged to 32-bit values. The symbol table changed to allow names of unlimited length. Relocation entries also changed significantly. Larger programs and different relocation conventions made it necessary to associate a relocation entry with an explicit address, instead of relying on the implicit correspondence between program sections and relocation records.
Many UNIX and UNIX-like operating systems, including SunOS, HP-UX, BSD, and Linux, used a.out before switching to ELF.
The most noticeable extension is dynamic shared library support. (This feature is distinct from static shared library, where each shared library needs a fixed address in the address space.) There are two flavors:
NetBSD: the commits "A linker supporting shared libraries." and https://github.com/NetBSD/src/commit/3d68d0acaed0a32f929b2c174146c62940005a18 ("A linker supporting shared libraries (run-time part).") added shared library support similar to the SunOS scheme. FreeBSD a.out(5) provides a nice description.
If you follow recent years' Linux kernel news, there were some discussions when Linux eventually removed a.out support in 2022.
a.out supports three fixed loadable sections TEXT, DATA, and BSS, which is too restrictive. COFF introduces custom section support and allows up to 32767 sections. The ELF paper contains some remarks:
Common Object File Format (COFF), was designed primarily to support electronic switching systems (the telephone network). Its distinguishing features were multiple sections (text, data, uninitialized memory, reserved memory, overlays, etc.), some support for multiple target processors, defined structures for symbol tables and relocations, and debugging information tailored for the C language.
According to scnhdr.h in System V Release 2 for NS32xxx, COFF was designed no later than 1982. Then, System V Release 3 adopted COFF, which motivated a lot of follow-ups.
Prominent drawbacks:
Carnegie Mellon University developed the Mach kernel as a proof of the microkernel concept. The operating system used a format derived from a.out, named the Mach object file format. The abbreviation, Mach-O, is often used instead. The NeXTSTEP operating system and then Darwin adopted Mach-O.
Dynamic shared library support on Mach-O came later than other object file formats. In a NeXTSTEP manual released in 1995, I can find MH_FVMLIB (fixed virtual memory library, which appears to be a static shared library scheme) but not MH_DYLIB (used by modern macOS for .dylib files).
Frustrated by the inherent constraints of COFF, coupled with a self-imposed byte order dilemma, AT&T introduced a groundbreaking format: Executable and Linking Format (ELF). ELF revisited fixed content and hard-wired concepts in previous object file formats, removed unnecessary elements, and made control structures more flexible.
This pivotal shift was embraced by System V Release 4, marking a new era in object file format design. In the 1990s, many UNIX and UNIX-like operating systems, including Solaris, IRIX, HP-UX, Linux, and FreeBSD, switched to ELF.
At a minimum, a symbol control structure needs to encode the name, section, and value. We can require that every symbol is defined in relation to some section. We can use a section index of zero to represent an undefined symbol.
In a minimal object file format with only a few hard-coded sections (a.out), the section field can be omitted. A type field can be used to decide whether the symbol can reference a function or a data object.
    // ELFCLASS32, 16 bytes
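For reference, the generic ABI's symbol table entries can be sketched as follows. The typedefs are stand-ins for the ones in `<elf.h>` so that the snippet compiles on its own, and the comments are mine; treat it as an illustration of the layout rather than an authoritative header.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the <elf.h> typedefs so the sketch compiles anywhere. */
typedef uint16_t Elf32_Half, Elf64_Half;
typedef uint32_t Elf32_Word, Elf64_Word, Elf32_Addr;
typedef uint64_t Elf64_Addr, Elf64_Xword;

/* ELFCLASS32, 16 bytes */
typedef struct {
  Elf32_Word st_name;     /* index into the string table */
  Elf32_Addr st_value;
  Elf32_Word st_size;
  unsigned char st_info;  /* type and binding */
  unsigned char st_other; /* visibility and others */
  Elf32_Half st_shndx;    /* section index */
} Elf32_Sym;

/* ELFCLASS64, 24 bytes; the members are reordered to avoid padding */
typedef struct {
  Elf64_Word st_name;
  unsigned char st_info;
  unsigned char st_other;
  Elf64_Half st_shndx;
  Elf64_Addr st_value;
  Elf64_Xword st_size;
} Elf64_Sym;

static_assert(sizeof(Elf32_Sym) == 16, "Elf32_Sym should be 16 bytes");
static_assert(sizeof(Elf64_Sym) == 24, "Elf64_Sym should be 24 bytes");
```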
The symbol name is represented as a 32-bit index into the string table. A 32-bit integer suffices, while a 16-bit integer would be too small.
st_shndx uses a size-saving trick. The 16-bit member encodes a section index. If the member is SHN_XINDEX (0xffff), then the actual value is contained in the associated section of type SHT_SYMTAB_SHNDX. This is a very nice trick because the number of sections is almost always smaller than 0xff00. In pathological cases, there can be more sections, where a section of type SHT_SYMTAB_SHNDX is needed.
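The escape mechanism can be sketched in a few lines. The function name `symbol_section_index` and its parameters are hypothetical, and the sketch assumes the SHT_SYMTAB_SHNDX contents have already been loaded and converted to host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHN_LORESERVE 0xff00u /* first reserved section index */
#define SHN_XINDEX 0xffffu    /* escape value: the real index lives elsewhere */

/* Returns the real section index of symbol number sym_index.
   shndx_table is the content of the SHT_SYMTAB_SHNDX section (one 32-bit
   word per symbol), or NULL when the file has no such section. */
static uint32_t symbol_section_index(uint16_t st_shndx,
                                     const uint32_t *shndx_table,
                                     size_t sym_index) {
  if (st_shndx == SHN_XINDEX && shndx_table != NULL)
    return shndx_table[sym_index];
  return st_shndx;
}
```

In the common case the 16-bit member is returned unchanged, so files with fewer than 0xff00 sections pay nothing for the extension.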
st_info specifies the symbol's type (4 bits) and binding (4 bits) attributes. Types are allocated very conservatively and usually imply different linker behaviors. There are not that many inherently different linker behaviors for symbol types, so although 4 bits seem small, they are sufficient in practice. As we will learn, this is significantly smaller than COFF's type and storage class representation. A symbol's binding is for the local/weak/global distinction. The 4 bits reserved for the binding can accommodate more values, but only GNU reserves one value (STB_GNU_UNIQUE) (a misfeature in my opinion).
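As a small illustration, st_info can be packed and unpacked like this. The `ELF_ST_*` names are my shorthand for the ABI's `ELF32_ST_*`/`ELF64_ST_*` macros, which I believe are defined identically for both classes; only a handful of binding and type values are listed.

```c
#include <assert.h>

/* Shorthand for the generic ABI's ELF32_ST_* and ELF64_ST_* accessors. */
#define ELF_ST_BIND(info) ((info) >> 4)
#define ELF_ST_TYPE(info) ((info) & 0xf)
#define ELF_ST_INFO(bind, type) (((bind) << 4) + ((type) & 0xf))

/* A subset of the binding (high 4 bits) and type (low 4 bits) values. */
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2, STB_GNU_UNIQUE = 10 };
enum { STT_NOTYPE = 0, STT_OBJECT = 1, STT_FUNC = 2, STT_SECTION = 3 };

static_assert(ELF_ST_INFO(STB_WEAK, STT_FUNC) == 0x22,
              "a weak function symbol has st_info 0x22");
```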
In COFF, function symbols can use an auxiliary symbol record to encode the size of the function (x_fsize; TotalSize in PE). In ELF, st_size is a fixed member, used for copy relocations and symbolizers. If we eliminate copy relocations and don't need the symbolization heuristics, this field will become garbage.
Here is a demonstration of the structure if we remove st_size.
    // 16 bytes
    struct Elf64_Sym_minimized {
      Elf64_Word st_name;     // index into the string table
      unsigned char st_info;  // type and binding
      unsigned char st_other; // visibility and others
      Elf64_Half st_shndx;    // section index
      Elf64_Addr st_value;
    };
    // a.out (System V), 16 bytes
a.out uses an nlist to represent a symbol table entry. In the original format, the name cannot contain more than 8 bytes. Unix's appreciation of shorter identifier names is related to this :)
To support longer names, extensions allow n_name to be interpreted as an index (n_strx) into the string table. This member then becomes a size-saving trick by inlining a short name (8 bytes or less) into the structure. Some variants, like binutils' 64-bit a.out format, use an index exclusively and removed n_name.
n_type, broken down into three sub-fields, describes whether a symbol is defined or undefined, external or local, and the symbol type. The values listed on the FreeBSD manpage are also used on PDP-11. Later System V releases extended the list with a lot of values for stabs (a debugging format). Some values seem specific to FORTRAN.
For a defined symbol, n_type describes whether it is relative to TEXT, DATA, or BSS.
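The three sub-fields can be decoded with a few masks. The constants below are the ones I know from FreeBSD's a.out(5); the helper names (`aout_is_external`, `aout_is_stab`, `aout_section`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Masks and values as listed in FreeBSD's a.out(5). */
#define N_EXT 0x01  /* external (global) bit */
#define N_TYPE 0x1e /* mask for the symbol type */
#define N_STAB 0xe0 /* mask for debugger (stabs) entries */

#define N_UNDF 0x00 /* undefined */
#define N_ABS 0x02  /* absolute */
#define N_TEXT 0x04 /* relative to TEXT */
#define N_DATA 0x06 /* relative to DATA */
#define N_BSS 0x08  /* relative to BSS */

static bool aout_is_external(unsigned char n_type) {
  return (n_type & N_EXT) != 0;
}

static bool aout_is_stab(unsigned char n_type) {
  return (n_type & N_STAB) != 0;
}

/* Names the section (or pseudo section) a symbol is relative to. */
static const char *aout_section(unsigned char n_type) {
  switch (n_type & N_TYPE) {
  case N_UNDF: return "undefined";
  case N_ABS: return "absolute";
  case N_TEXT: return "TEXT";
  case N_DATA: return "DATA";
  case N_BSS: return "BSS";
  default: return "other";
  }
}
```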
    // COFF (System V Release 3), 18 bytes in the absence of padding
COFF adopts a.out's approach to save space in symbol names. This likely made sense when most symbols were shorter. However, with today's often lengthy symbol names, this inlining technique complicates code and increases the control structure size (from 4 to 8 bytes).
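To show the extra code that the inlining requires, here is a minimal sketch of reading a COFF symbol name. The helper `coff_symbol_name` is hypothetical; it assumes the file and host share a byte order and that string table offsets count from the start of the table, including its leading 4-byte size field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the name of a COFF symbol. n_name points at the 8-byte name
   field, strtab at the start of the string table, and out is scratch
   space of at least 9 bytes used for inlined names. */
static const char *coff_symbol_name(const unsigned char n_name[8],
                                    const char *strtab, char out[9]) {
  uint32_t zeroes, offset;
  memcpy(&zeroes, n_name, 4);
  memcpy(&offset, n_name + 4, 4);
  if (zeroes == 0)
    return strtab + offset; /* long name: 4 zero bytes, then an offset */
  memcpy(out, n_name, 8);   /* short name: no NUL when it uses all 8 bytes */
  out[8] = '\0';
  return out;
}
```

An index-only design, as in ELF, needs just the `strtab + offset` line.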
The section number is a 16-bit signed integer, supporting up to 32,767 sections. Positive values indicate a section index, while special values include:
- N_UNDEF (0): Undefined symbol (distinct from a.out's n_type representation).
- N_ABS (-1): Symbol has an absolute value.
- N_DEBUG (-2): Special debugging symbol (value is meaningless).
COFF's n_type and n_sclass encode C's type and storage class information. PE assigns longer names to these types and storage classes, e.g., IMAGE_SYM_TYPE_CHAR/IMAGE_SYM_TYPE_SHORT, IMAGE_SYM_CLASS_AUTOMATIC/IMAGE_SYM_CLASS_EXTERNAL. While values are mostly consistent, minor differences exist:
- IMAGE_SYM_TYPE_VOID (1) is different from System V Release 3's #define T_ARG 1 /* function argument (only used by compiler) */.
- IMAGE_SYM_CLASS_WEAK_EXTERNAL (105) is different from System V Release 3's #define C_ALIAS 105 /* duplicate tag */.
Symbols with C_EXT (IMAGE_SYM_CLASS_EXTERNAL) are global and added to the linker's global symbol table, akin to ELF's STB_GLOBAL symbol binding.
If we discard the influence of stabs, n_type and n_sclass just provide a wasteful counterpart to ELF's st_info.
n_numaux relates to Auxiliary Symbol Records, allowing extra information but introducing non-uniform symbol table entries. While seemingly beneficial, their use cases are limited and could often be encoded using separate sections. In PE, an auxiliary symbol record can represent weak definitions, but weak references are not supported. They can also provide extra information to section symbols.
ECOFF defines Local Symbol Entry (SYMR) and External Symbol Entry (EXTR).
    // Mach-O, 16 bytes
Mach-O's nlist_64 is not that different from a.out's, with n_other changed to n_sect to indicate the section index. The 8-bit n_sect field restricts representable sections to 255 without out-of-band data (discussed later). If we extend n_sect to 32-bit, with alignment padding the structure size will increase to 24 bytes, the same as Elf64_Sym.
Like a.out, the N_EXT bit of n_type indicates an external symbol. The N_PEXT bit indicates a private external symbol.
Key bits in n_desc are N_WEAK_DEF, N_WEAK_REF, and N_ALT_ENTRY.
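A sketch of the entry and the bits mentioned above, using the values I know from `<mach-o/nlist.h>`. The real header wraps n_strx in a one-member union, which I flatten here; `nlist_64_wide` and the two helpers are hypothetical and only illustrate the 24-byte claim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mach-O, 16 bytes */
struct nlist_64 {
  uint32_t n_strx; /* index into the string table */
  uint8_t n_type;  /* N_STAB, N_PEXT, N_TYPE, and N_EXT bits */
  uint8_t n_sect;  /* 1-based section number; 0 means no section */
  uint16_t n_desc; /* N_WEAK_REF, N_WEAK_DEF, N_ALT_ENTRY, ... */
  uint64_t n_value;
};

/* Hypothetical variant with a 32-bit n_sect: padding grows it to 24 bytes. */
struct nlist_64_wide {
  uint32_t n_strx;
  uint32_t n_sect;
  uint8_t n_type;
  uint16_t n_desc;
  uint64_t n_value;
};

static_assert(sizeof(struct nlist_64) == 16, "nlist_64 is 16 bytes");
static_assert(sizeof(struct nlist_64_wide) == 24, "same size as Elf64_Sym");

#define N_STAB 0xe0 /* n_type: debugging entry */
#define N_PEXT 0x10 /* n_type: private external */
#define N_TYPE 0x0e /* n_type: mask for the type bits */
#define N_EXT 0x01  /* n_type: external */

#define N_WEAK_REF 0x0040  /* n_desc: weak reference */
#define N_WEAK_DEF 0x0080  /* n_desc: weak definition */
#define N_ALT_ENTRY 0x0200 /* n_desc: alternate entry point */

static bool macho_is_private_extern(const struct nlist_64 *sym) {
  return (sym->n_type & N_PEXT) != 0;
}

static bool macho_is_weak_def(const struct nlist_64 *sym) {
  return (sym->n_desc & N_WEAK_DEF) != 0;
}
```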
    // ELF, 40 bytes
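For reference, here is a sketch of the section headers of both classes with their sizes checked at compile time. The typedefs are stand-ins for `<elf.h>`; the member order is the generic ABI's as I remember it.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the <elf.h> typedefs. */
typedef uint32_t Elf32_Word, Elf32_Addr, Elf32_Off, Elf64_Word;
typedef uint64_t Elf64_Xword, Elf64_Addr, Elf64_Off;

/* ELFCLASS32, 40 bytes */
typedef struct {
  Elf32_Word sh_name;      /* index into the section name string table */
  Elf32_Word sh_type;
  Elf32_Word sh_flags;
  Elf32_Addr sh_addr;
  Elf32_Off sh_offset;
  Elf32_Word sh_size;
  Elf32_Word sh_link;
  Elf32_Word sh_info;
  Elf32_Word sh_addralign;
  Elf32_Word sh_entsize;
} Elf32_Shdr;

/* ELFCLASS64, 64 bytes */
typedef struct {
  Elf64_Word sh_name;
  Elf64_Word sh_type;
  Elf64_Xword sh_flags;
  Elf64_Addr sh_addr;
  Elf64_Off sh_offset;
  Elf64_Xword sh_size;
  Elf64_Word sh_link;
  Elf64_Word sh_info;
  Elf64_Xword sh_addralign;
  Elf64_Xword sh_entsize;
} Elf64_Shdr;

static_assert(sizeof(Elf32_Shdr) == 40, "Elf32_Shdr is 40 bytes");
static_assert(sizeof(Elf64_Shdr) == 64, "Elf64_Shdr is 64 bytes");
```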
The section name is represented as a 32-bit index into the string table. If we use a 16-bit integer, a large number of section names with a symbol suffix (e.g. .text.foo .text.bar) could make the index overflow.
sh_type categorizes the section's contents and semantics. It avoids hard-coding magic names in many scenarios. Technically a 16-bit type could work pretty well but was deemed insufficient for flexibility.
sh_flags describes miscellaneous attributes, e.g. writable and executable permissions, and whether the section should appear in a loadable segment. This member is 32-bit in Elf32_Shdr while 64-bit in Elf64_Shdr. In practice no architecture defines flags for bits 32 to 63, therefore this member is somewhat wasteful.
Location and size. sh_offset gives the byte offset from the beginning of the file to the first byte in the section. To support object files larger than 4GiB, this member has to be 64-bit. sh_size gives the section's size in bytes. A section of type SHT_NOBITS occupies no space in the file. To support sections larger than 4GiB, this member has to be 64-bit.
Address and alignment. sh_addr describes the address at which the section's first byte should reside for an executable or shared object. It should be zero for relocatable files. sh_addralign holds the address alignment. In practice this member must be a power of 2 even if the generic ABI does not require so. This member is 64-bit in ELF64, which allows an alignment up to 2**63. In practice, an alignment larger than the page size (or the largest huge page size, if huge pages are enabled) does not make sense, and a maximum value of 2**31 is sufficient. Therefore, we could use a log2 value to hold the alignment.
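A minimal sketch of the log2 idea follows. The function names are hypothetical, and treating an alignment of 0 like 1 is my assumption, mirroring how ELF consumers usually interpret sh_addralign.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes a power-of-two alignment as its base-2 logarithm.
   0 is treated like 1 (no constraint). Returns -1 for values that are
   not powers of two and therefore cannot be represented. */
static int encode_align_log2(uint64_t align) {
  if (align == 0)
    return 0;
  if (align & (align - 1))
    return -1;
  int n = 0;
  while (align >>= 1)
    ++n;
  return n;
}

/* Decodes the stored logarithm; n must be smaller than 64. */
static uint64_t decode_align_log2(uint8_t n) {
  return (uint64_t)1 << n;
}
```

Every alignment up to 2**63 fits in 6 bits, which is what leaves spare bits in a one-byte field.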
Connection information. sh_link holds a section index. sh_info holds either a section index or a symbol index. If you recall that st_shndx is 16 bits for a very solid reason, you will know that the two fields are somewhat wasteful.
For a table of fixed-size entries, sh_entsize holds the entry size in bytes. In some use cases this member is not a power of two. In practice, one byte suffices.
While ELF's section header structure is designed for flexibility, potential optimizations could reduce its size without significant loss of functionality. By using smaller data types for sh_flags, sh_link, sh_info, and sh_entsize based on practical needs, we could make the structure significantly smaller.
    // 32 bytes
    struct Elf32_Shdr_minimized {
      Elf32_Word sh_name;
      Elf32_Word sh_type; // Making this uint16_t and reordering it can decrease the size to 28 bytes
      Elf32_Word sh_flags;
      Elf32_Addr sh_addr;
      Elf32_Off sh_offset;
      Elf32_Word sh_size;
      uint8_t sh_addralign;
      uint8_t sh_entsize;
      Elf32_Half sh_link;
      Elf32_Half sh_info;
    };
    // 40 bytes
    struct Elf64_Shdr_minimized {
      Elf64_Word sh_name;
      Elf64_Word sh_flags;
      Elf64_Addr sh_addr;
      Elf64_Off sh_offset;
      Elf64_Xword sh_size;
      Elf64_Half sh_type;
      uint8_t sh_addralign;
      uint8_t sh_entsize;
      Elf64_Half sh_link;
      Elf64_Half sh_info;
    };
Reducing sh_type to 2 bytes loses a bit of flexibility. If this is deemed insufficient, we could take 3 bits from sh_addralign (by turning it into a bitfield) and give them to sh_type.
    // COFF (System V Release 3), 40 bytes, when sizeof(long) == 4
PE's section control structure demonstrates a minor modification compared to COFF: s_paddr => VirtualSize.
The presented structure measures 40 bytes when long is 4 bytes. If we extend s_paddr, s_vaddr, s_size, s_scnptr, s_relptr, and s_lnnoptr to 8 bytes, the structure will be 64 bytes.
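The two sizes can be checked with a sketch like the following. I use fixed-width integers in place of long so the result does not depend on the host; the struct names `scnhdr32` and `scnhdr64` are hypothetical, while the member names follow System V's scnhdr.h as I remember it.

```c
#include <assert.h>
#include <stdint.h>

/* COFF section header with 4-byte long: 40 bytes */
struct scnhdr32 {
  char s_name[8];     /* inlined section name */
  int32_t s_paddr;    /* physical address */
  int32_t s_vaddr;    /* virtual address */
  int32_t s_size;     /* section size */
  int32_t s_scnptr;   /* file offset of the section contents */
  int32_t s_relptr;   /* file offset of the relocation entries */
  int32_t s_lnnoptr;  /* file offset of the line number entries */
  uint16_t s_nreloc;  /* number of relocation entries */
  uint16_t s_nlnno;   /* number of line number entries */
  int32_t s_flags;
};

/* Hypothetical 64-bit variant with the six address/offset members widened */
struct scnhdr64 {
  char s_name[8];
  int64_t s_paddr;
  int64_t s_vaddr;
  int64_t s_size;
  int64_t s_scnptr;
  int64_t s_relptr;
  int64_t s_lnnoptr;
  uint16_t s_nreloc;
  uint16_t s_nlnno;
  int32_t s_flags;
};

static_assert(sizeof(struct scnhdr32) == 40, "8 + 6*4 + 2 + 2 + 4");
static_assert(sizeof(struct scnhdr64) == 64, "8 + 6*8 + 2 + 2 + 4");
```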
The section name supports up to 8 bytes. A longer name would require an extension similar to the symbol control structure.
Encoding both s_paddr and s_vaddr is wasteful. ELF encodes the physical address in the segment and therefore removes the member from its section structure.
COFF embeds the location and size of relocations into the section structure. This is actually pretty nice. A 16-bit s_nreloc may appear restrictive but is sufficient for most relocatable files. In practice, however, the number of relocations for a single section can exceed 65535 when relocatable linking is used.
Line number entries are used to relate addresses to source file line numbers. s_lnnoptr and s_nlnno are for obsolete line number information. They are embedded in the section structure, which is very inflexible.
    // Mach-O, 80 bytes
A Mach-O binary is divided into segments, each housing one or more sections. The section structure encodes the section name and the segment name, both of which can be up to 16 bytes. This representation allows the section names to be read without a string table, but is restrictive for descriptive names. Section semantics are derived from the name (unlike ELF).
The segment name is redundantly encoded within the section structure. We could derive the segment from the section name and flags, e.g., S_ATTR_SOME_INSTRUCTIONS => __TEXT, S_ZEROFILL => ZeroFill __DATA.
There is a severe limitation: a maximum of 255 sections due to nlist::n_sect being a uint8_t. This is apparently too restrictive. Thankfully, an innovative feature, .subsections_via_symbols, overcomes the limitation. The feature uses a monolithic section with "atoms" dividing it into pieces (subsections). This is more size-efficient than ELF's -ffunction-sections -fdata-sections. However, there are assembler limitations, relocation processing complexity, and potential loss of ability to ensure that two non-local symbols are not reordered.
Like COFF, Mach-O embeds the location and size of relocations into the section structure.
The three trailing reserved members are a bad idea. They increase the size considerably. A better approach would be to just change the version number when the structure needs to grow. It's unlikely that an older consumer can interpret a new section with new members set for new semantics.
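For reference, the 80-byte section structure can be sketched as below, following `<mach-o/loader.h>` as I remember it. The helper `copy_name16` is hypothetical; it exists because the two inlined names carry no NUL terminator when they use all 16 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mach-O, 80 bytes */
struct section_64 {
  char sectname[16];  /* section name, e.g. __text */
  char segname[16];   /* segment name, e.g. __TEXT */
  uint64_t addr;
  uint64_t size;
  uint32_t offset;    /* file offset of the contents */
  uint32_t align;     /* alignment as a power of 2 */
  uint32_t reloff;    /* file offset of the relocation entries */
  uint32_t nreloc;    /* number of relocation entries */
  uint32_t flags;     /* section type and attributes */
  uint32_t reserved1;
  uint32_t reserved2;
  uint32_t reserved3;
};

static_assert(sizeof(struct section_64) == 80, "section_64 is 80 bytes");

/* Copies a 16-byte inlined name into a NUL-terminated 17-byte buffer. */
static void copy_name16(const char name[16], char out[17]) {
  memcpy(out, name, 16);
  out[16] = '\0';
}
```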
    // ELFCLASS32, 8 bytes
r_info specifies the symbol table index with respect to which the relocation must be made, and the type of relocation to apply.
There are two variants, REL and RELA. Let's quote the generic ABI:
As specified previously, only Elf32_Rela and Elf64_Rela entries contain an explicit addend. Entries of type Elf32_Rel and Elf64_Rel store an implicit addend in the location to be modified. Depending on the processor architecture, one form or the other might be necessary or more convenient. Consequently, an implementation for a particular machine may use one form exclusively or either form depending on context.
Relocatable files need a lot of relocation types while executables and shared objects need only a few. The former are often called static relocations while the latter are called dynamic relocations.
Of the few dynamic relocation types, most do not need the addend member. lld provides an option -z rel to use SHT_REL/DT_REL dynamic relocations.
If we disregard the REL dynamic relocation scenario, then all modern architectures use RELA exclusively. Most architectures encode the immediate with only a few bits, which are inadequate for many relocatable file uses.
ELFCLASS64, with its 64-bit members, doubles the size compared to ELFCLASS32's 32-bit members. Since relocations often comprise a substantial portion of object files, this size difference can lead to user concerns. However, in practice, a 24-bit symbol index is often sufficient, even in 64-bit contexts. Therefore, if a 64-bit architecture's relocation type requirements are less than 256, ELFCLASS32 can be a viable and more size-efficient option.
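The sizes and the 24-bit/8-bit split of the ELFCLASS32 r_info can be sketched as follows. The typedefs are stand-ins for `<elf.h>`, and the macros follow the generic ABI's definitions as I remember them.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the <elf.h> typedefs. */
typedef uint32_t Elf32_Addr, Elf32_Word;
typedef int32_t Elf32_Sword;
typedef uint64_t Elf64_Addr, Elf64_Xword;
typedef int64_t Elf64_Sxword;

typedef struct {
  Elf32_Addr r_offset;
  Elf32_Word r_info;   /* symbol index (24 bits) and type (8 bits) */
} Elf32_Rel;           /* 8 bytes */

typedef struct {
  Elf32_Addr r_offset;
  Elf32_Word r_info;
  Elf32_Sword r_addend;
} Elf32_Rela;          /* 12 bytes */

typedef struct {
  Elf64_Addr r_offset;
  Elf64_Xword r_info;  /* symbol index (32 bits) and type (32 bits) */
  Elf64_Sxword r_addend;
} Elf64_Rela;          /* 24 bytes */

#define ELF32_R_SYM(i) ((i) >> 8)
#define ELF32_R_TYPE(i) ((unsigned char)(i))
#define ELF32_R_INFO(s, t) (((s) << 8) + (unsigned char)(t))

static_assert(sizeof(Elf32_Rel) == 8, "Elf32_Rel is 8 bytes");
static_assert(sizeof(Elf32_Rela) == 12, "Elf32_Rela is 12 bytes");
static_assert(sizeof(Elf64_Rela) == 24, "twice the size of Elf32_Rela");
```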
    // a.out (System V Release 2), 8 bytes
r_symbolnum mirrors ELF's ELF32_R_SYM.
The other bitfields resemble ELF's ELF32_R_TYPE, but are split into distinct fields:
- r_pcrel
- r_length
Reserving dedicated semantics for individual bits can limit adaptability. COFF and ELF opted to remove bitfields in favor of a type to provide greater flexibility.
    // COFF, 10 bytes on disk, 12 bytes with alignment padding
This format resembles ELF's Elf32_Rel.
r_vaddr gives the virtual address of the location at which to apply the relocation action. If we interpret r_vaddr as an offset (as PE does) and restrict section size to 32 bits, we could reuse this structure for 64-bit architectures.
r_symndx is a 32-bit symbol table index.
r_type is a 16-bit relocation type, limited in number compared to ELF.
COFF generally supports fewer relocation types than ELF. System V Release 3 defines very few relocations for each architecture. In binutils, include/coff/*.h files define relocations for more architectures.
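The gap between the 10-byte on-disk entry and the 12-byte in-memory struct means a relocation table cannot simply be indexed as an array. A minimal sketch, assuming host byte order; `read_reloc` is a hypothetical helper, while RELSZ mirrors the constant I recall from System V's reloc.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* COFF relocation entry: 10 bytes on disk, 12 bytes once the compiler
   pads the struct to a multiple of its 4-byte alignment. */
struct reloc {
  int32_t r_vaddr;  /* address (or offset) of the location to relocate */
  int32_t r_symndx; /* symbol table index */
  uint16_t r_type;  /* relocation type */
};

enum { RELSZ = 10 }; /* on-disk size of one entry */

static_assert(sizeof(struct reloc) == 12, "padded in memory");
static_assert(offsetof(struct reloc, r_type) + sizeof(uint16_t) == RELSZ,
              "10 meaningful bytes");

/* Reads entry i from a raw relocation table, stepping by RELSZ rather
   than sizeof(struct reloc). */
static struct reloc read_reloc(const unsigned char *table, size_t i) {
  struct reloc r = {0, 0, 0};
  memcpy(&r, table + i * RELSZ, RELSZ);
  return r;
}
```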
    // Mach-O, 8 bytes
Mach-O's relocation structure closely mirrors a.out's with an adapted r_symbolnum meaning. When r_extern == 0 (local), the r_symbolnum member references a section index instead of a symbol index. This is to support custom sections, breaking the three-section limitation (text, data, and bss) of traditional a.out.
As mentioned above, dedicating bits to bitfields (r_pcrel, r_length, and r_scattered) greatly restricts the number of relocation types.
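To make the bit budget concrete, here is a sketch of the non-scattered entry as I remember it from `<mach-o/reloc.h>`: after the dedicated bits, only 4 bits remain for r_type, i.e. at most 16 relocation types. Bitfield layout is implementation-defined in C, so the size check reflects what GCC and Clang do; `reloc_width` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Mach-O relocation entry, 8 bytes */
struct relocation_info {
  int32_t r_address;         /* offset in the section to relocate */
  uint32_t r_symbolnum : 24, /* symbol index, or section index if !r_extern */
      r_pcrel : 1,           /* PC-relative? */
      r_length : 2,          /* 0=1 byte, 1=2 bytes, 2=4 bytes, 3=8 bytes */
      r_extern : 1,          /* r_symbolnum is a symbol index? */
      r_type : 4;            /* only 16 possible relocation types */
};

static_assert(sizeof(struct relocation_info) == 8, "two 32-bit words");

/* Number of bytes patched by the relocation. */
static unsigned reloc_width(const struct relocation_info *r) {
  return 1u << r->r_length;
}
```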
Related to the relocation type limitation, a .long foo - . in a data section requires a pair of relocations, SUBTRACTOR and UNSIGNED. I have some notes on Port LLVM XRay to Apple systems.
TODO
ELFCLASS32 structures are already compact, offering limited size reduction potential. ELFCLASS64 structures, while flexible, can be optimized by sacrificing some flexibility (64-bit quantities). The 64-bit symbol control structure is compact, but the section and relocation structures are quite wasteful if we can sacrifice some flexibility.
As the ELF paper acknowledges, "Relocatable and executable files do not necessarily have the same constraints, and we considered using two file formats. Eventually, we decided the two activities were similar enough that a single format would suffice." There are more tools inspecting executables than relocatable files. So, naturally, we might want to change just relocatable files. Can we use ELFCLASS32 relocatable files for 64-bit architectures?
Well, x86-64 and AArch64 make a clear distinction between ELFCLASS32 and ELFCLASS64. ELFCLASS32 is for ILP32 (x32, aarch64_ilp32) while ELFCLASS64 is for LP64. However, the discontinued Itanium architecture sets a precedent that ELFCLASS32 can be used for LP64 programs. Quoting its psABI (Intel Itanium Processor-specific Application Binary Interface (ABI)):
For Itanium architecture ILP32 relocatable (i.e. of type ET_REL) objects, the file class value in e_ident[EI_CLASS] must be ELFCLASS32. For LP64 relocatable objects, the file class value may be either ELFCLASS32 or ELFCLASS64, and a conforming linker must be able to process either or both classes. ET_EXEC or ET_DYN object file types must use ELFCLASS32 for ILP32 and ELFCLASS64 for LP64 programs.
Addresses appearing in ELFCLASS32 relocatable objects for LP64 programs are implicitly extended to 64 bits by zero-extending.
Note: Some constructs legal in LP64 programs, e.g. absolute 64-bit addresses outside the 32-bit range, may require use of an ELFCLASS64 relocatable object file.
Given the prior art, it seems promising to allow ELFCLASS32 when the code size concerns people. Ideally there should be a marker to distinguish ILP32 and LP64-using-ELFCLASS32 object files.
The primary changes reside in the assembler and linker. It's also important to ensure that binary manipulation programs (like objcopy) and dump tools are happy with them.
Further optimization potential lies in exploring the use of Elf32_Rel instead of Elf32_Rela for even smaller relocations.
This approach is independent of whether ELFCLASS32 is adopted and can be applied to both ELFCLASS32 and ELFCLASS64. The ELF paper is clear, "ELF allows extension and redefinition for other control structures." However, caution is warranted due to the significant impact on the ecosystem as many tools rely on the existing structures.
One promising example is Elf32_Shdr_minimized, a custom structure reduced to 32 bytes from the standard Elf32_Shdr's 40 bytes. Though it would make me nervous, if we reduce sh_type to a uint16_t, the structure size can be reduced to 28 bytes.
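The 28-byte variant can be sketched as below. The struct name `Elf32_Shdr_28` is hypothetical, and fixed-width integers stand in for the ELF typedefs; moving the 16-bit sh_type next to the other small members is what removes the padding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 28-byte section header: five 32-bit members followed by
   8 bytes of small members, with no padding needed. */
struct Elf32_Shdr_28 {
  uint32_t sh_name;
  uint32_t sh_flags;
  uint32_t sh_addr;
  uint32_t sh_offset;
  uint32_t sh_size;
  uint16_t sh_type;      /* narrowed from 32 bits and moved down */
  uint8_t sh_addralign;  /* log2 of the alignment */
  uint8_t sh_entsize;
  uint16_t sh_link;
  uint16_t sh_info;
};

static_assert(sizeof(struct Elf32_Shdr_28) == 28, "5*4 + 2 + 1 + 1 + 2 + 2");
```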
WebAssembly