When linking an oversized executable, it is possible to encounter errors such as "relocation truncated to fit: R_X86_64_PC32 against `.text'" (GNU ld) or "relocation R_X86_64_PC32 out of range" (ld.lld). These diagnostics are a result of the relocation overflow check, a feature in the linker.
This article aims to explain why such issues can occur and provide insights on how to mitigate them.
Certain groups prefer static linking or mostly static linking for the sake of deployment convenience and performance. In scenarios where the distributed program contains a significant amount of code (related: software bloat), full or mostly static linking can result in very large executable files. Consequently, certain relocations may be close to the distance limit, and even a minor disruption can trigger relocation overflow linker errors.
In this section, we will deviate slightly from the main topic to discuss static linking. Static linking copies dependencies directly into the final executable and is often preferred in certain scenarios due to its advantages in deployment convenience and performance.
Because all dependencies are included in the executable itself, it can run without relying on external shared objects. This eliminates the potential risks associated with updating dependencies separately.
Static linking also offers performance benefits in several aspects:
O(|libs|^2*|libname|). The existing implementations are designed to handle tens of shared objects, rather than a thousand or more. Furthermore, the current lack of techniques to partition an executable into a few larger shared objects, as opposed to numerous smaller shared objects, exacerbates the overhead issue.
These performance benefits make static linking an attractive option in certain scenarios, where the convenience and performance gains outweigh the potential drawbacks.
We will use the following C program to illustrate the concepts.
int var;
int callee();
int caller() { return callee() + var; }
The generated x86-64 assembly (Intel syntax) may appear as follows, with comments indicating the relocation types associated with each instruction:
.globl caller
caller:
  call callee@PLT                  # R_X86_64_PLT32
  add eax, dword ptr [rip + var]   # R_X86_64_PC32
  ret
.bss
.globl var
var: .long 0
Both R_X86_64_PLT32 and R_X86_64_PC32 have a value range of [-2**31,2**31). If the referenced symbol is too far away from the relocated location, we may get a relocation overflow.
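As a rough illustration, the range check for these two relocation types can be sketched in C. For R_X86_64_PC32 the linker computes S + A - P: symbol address plus addend minus the address of the relocated field. The sketch assumes R_X86_64_PLT32 resolves directly to the symbol, as it does for a non-preemptible callee, so the same formula applies. The function name fits_pc32 is hypothetical. A real linker reports a diagnostic rather than returning a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does the value computed for R_X86_64_PC32
   (or R_X86_64_PLT32 resolved directly to the symbol) fit in its
   signed 32-bit field?
   S: symbol address, A: addend, P: address of the relocated field. */
bool fits_pc32(uint64_t S, int64_t A, uint64_t P) {
  /* Unsigned arithmetic wraps modulo 2**64; reinterpreting the result as
     signed recovers the (possibly negative) displacement S + A - P. */
  int64_t v = (int64_t)(S + (uint64_t)A - P);
  return v >= INT32_MIN && v <= INT32_MAX; /* [-2**31, 2**31) */
}
```

Consider a relocated field at 0x400ff0 with A = -4; the addend is typically -4 because the displacement is relative to the end of the 4-byte field. A symbol at 0x401000 + 2**31 then gives a value of 0x8000000c, which exceeds INT32_MAX, so the linker would report an overflow.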
In practice, relocation overflows due to code referencing code are not common. The more frequent occurrences of overflows involve the following categories (where we use .text to represent code sections, .rodata for read-only data, and so on):
- .text <-> .rodata
- .text <-> .eh_frame: .eh_frame has 32-bit offsets. 64-bit code offsets are possible, but I don't know if an implementation exists.
- .text <-> .bss
- .rodata <-> .bss
In many programs, .text <-> .data/.bss relocations have the most stringent constraints. Overflows due to .text <-> .rodata relocations are possible but rare (although I have encountered such issues in the past). Typically, .rodata is larger than .data and .bss combined.
.rodata <-> .bss overflows are generally infrequent. However, caution must be exercised when emitting metadata that refers to other sections: consider using .quad label-. (a 64-bit PC-relative difference) rather than .long label-. (a 32-bit one). Such issues can be easily addressed on the compiler side.
The current section layout of ld.lld is as follows:
.rodata
.text
.data
.bss
One notable distinction from GNU ld is that .rodata precedes .text. This ordering decreases the distance between .text and .data/.bss, thereby alleviating relocation overflow pressure for references from .text to .data/.bss.
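A back-of-the-envelope comparison shows why the ordering matters. The sketch approximates the worst-case distance of a .text -> .data/.bss reference as the distance from the start of .text to the end of .bss, ignoring alignment padding. The struct, the function names, and the section sizes in the example are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical output section sizes in bytes. */
struct sizes {
  uint64_t rodata, text, data, bss;
};

/* Approximate worst-case .text -> .data/.bss distance: from the start of
   .text to the end of .bss, ignoring alignment padding. */
uint64_t span_gnu_ld(struct sizes s) { /* .text .rodata .data .bss */
  return s.text + s.rodata + s.data + s.bss;
}

uint64_t span_ld_lld(struct sizes s) { /* .rodata .text .data .bss */
  return s.text + s.data + s.bss;
}

/* A positive PC-relative displacement must be below 2**31. */
bool reachable_pc32(uint64_t span) { return span < (UINT64_C(1) << 31); }
```

Take hypothetical sizes of 1536MiB .text, 400MiB .rodata, 50MiB .data and 100MiB .bss. The GNU ld ordering yields a span of 2086MiB, beyond the 2048MiB reach of R_X86_64_PC32. Placing .rodata first brings it down to 1686MiB.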
For .eh_frame, I suggest compiling with the -fno-asynchronous-unwind-tables option. .eh_frame is used by runtime support for C++ exceptions. Many programs don't utilize exceptions, making .eh_frame non-mandatory. For profiling purposes, the limitations of the .eh_frame format have become apparent, and it is not the most suitable unwinding format.
The x86-64 psABI defines multiple code models. The small code model is the one we are most familiar with. In the small code model, symbols are required to be located within the range [0, 2**31 - 2**24).
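The bound works out to 2**31 - 2**24 = 0x80000000 - 0x01000000 = 0x7f000000. Below is a trivial check of whether an absolute address satisfies the small code model. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Exclusive upper bound for symbol addresses in the x86-64 small code
   model: 2**31 - 2**24. */
#define SMALL_CODE_MODEL_LIMIT ((UINT64_C(1) << 31) - (UINT64_C(1) << 24))

/* Hypothetical helper: is addr inside [0, 2**31 - 2**24)? */
bool in_small_code_model(uint64_t addr) {
  return addr < SMALL_CODE_MODEL_LIMIT;
}
```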
The medium code model introduces large data sections such as .lrodata, .ldata, and .lbss, and uses movabs instructions to access them. GCC provides several variants of .ldata, including .ldata.rel, .ldata.rel.local, .ldata.rel.ro, and .ldata.rel.ro.local.
Linkers are expected to recognize these large data sections and place them in appropriate locations. GNU ld uses the following section layout in its internal linker scripts:
.text
.rodata # if -z separate-code, MAXPAGESIZE alignment
RELRO # DATA_SEGMENT_ALIGN
.data # DATA_SEGMENT_RELRO_END
.bss
.lbss
.lrodata # MAXPAGESIZE alignment
.ldata # MAXPAGESIZE alignment
GNU ld places .lbss, .lrodata, and .ldata after .bss and inserts two MAXPAGESIZE alignments, one for .lrodata and one for .ldata.
.lbss is placed immediately after .bss, creating a single BSS, i.e., a read-write PT_LOAD program header with p_filesz<p_memsz. The file image in the segment is smaller than the memory image. When the dynamic loader creates the memory image for the PT_LOAD segment, it will set the byte range [p_filesz,p_memsz) to zeros. However, there is a missing optimization: the [p_filesz,p_memsz) portion occupies zero bytes in the object file, preventing overlap between .lrodata and BSS in the file image.
For ld.lld, I am contemplating the following section layout:
.lrodata
.rodata
.text # if --ro-segment, MAXPAGESIZE alignment
RELRO # MAXPAGESIZE alignment
.data # MAXPAGESIZE alignment
.bss
.ldata # MAXPAGESIZE alignment
.lbss
GCC generates both regular and large data sections with -mcmodel=medium. The choice is decided by a size threshold (-mlarge-data-threshold). In practice, programs always mix object files built with the small and medium/large code models (just think of prebuilt object files, including libc). The large data sections built with -mcmodel=medium do not exert relocation pressure on sections in object files built with -mcmodel=small.
However, GCC only generates regular data sections with -mcmodel=large; -mlarge-data-threshold is ignored. As a result, the data sections built with -mcmodel=large may exert relocation pressure on sections in object files built with -mcmodel=small.
I propose that we make -mcmodel=large respect -mlarge-data-threshold and generate large data sections as well. I posted a GCC patch, and Uros Bizjak asked me to discuss the issue with the x86-64 psABI group, so I started a discussion titled "Large data sections for the large code model".
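The current and proposed behaviors can be summarized with a small model. This is a hypothetical function, not GCC's code. It encodes only the rule described above: under -mcmodel=medium, data objects larger than the -mlarge-data-threshold value go to large data sections. A flag selects whether -mcmodel=large honors the threshold, as proposed. The model covers zero-initialized data, so the choice is between .bss and .lbss. The threshold values in any usage are arbitrary examples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum code_model { CM_SMALL, CM_MEDIUM, CM_LARGE };

/* Hypothetical model, not GCC's implementation: pick the output section for
   a zero-initialized object of `size` bytes. `large_uses_threshold` selects
   the proposed behavior for -mcmodel=large (false = current GCC). */
const char *bss_section_for(enum code_model cm, size_t size, size_t threshold,
                            bool large_uses_threshold) {
  bool uses_threshold =
      cm == CM_MEDIUM || (cm == CM_LARGE && large_uses_threshold);
  if (uses_threshold && size > threshold)
    return ".lbss"; /* large data section: no pressure on small-model code */
  return ".bss";    /* regular data section */
}
```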
Likewise, the AArch64 psABI defines multiple code models. The small code model allows for a maximum text segment size of 2GiB and a maximum combined span of text and data segments of 4GiB. For small position-independent code (PIC), there is an additional restriction on the size of the Global Offset Table (GOT), which must be smaller than 32KiB. The maximum combined span of text and data segments is larger than that of x86-64.
Linked object file sizes for AArch64 and x86-64 are comparable, but AArch64 linked object files are more resistant to relocation overflows.
For .text <-> .rodata and .text <-> .bss references, x86-64 uses R_X86_64_REX_GOTPCRELX/R_X86_64_PC32 relocations, which have a smaller range of [-2**31,2**31). In contrast, AArch64 employs R_AARCH64_ADR_PREL_PG_HI21 relocations, which have a doubled range of [-2**32,2**32). This larger range makes it unlikely for AArch64 to encounter relocation overflow issues before the binary becomes excessively oversized for x86-64.
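The doubled range comes from ADRP operating on 4KiB pages. R_AARCH64_ADR_PREL_PG_HI21 encodes Page(S+A) - Page(P) as a signed 21-bit page count, where Page(x) clears the low 12 bits. 2**20 pages of 2**12 bytes gives 2**32 bytes in each direction. A sketch of the check follows; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page(x) as used by the AArch64 ELF ABI: clear the low 12 bits. */
uint64_t page(uint64_t x) { return x & ~UINT64_C(0xfff); }

/* Hypothetical helper: does Page(S+A) - Page(P) fit the ADRP immediate,
   i.e. lie in [-2**32, 2**32)? */
bool fits_adr_prel_pg_hi21(uint64_t S, int64_t A, uint64_t P) {
  int64_t x = (int64_t)(page(S + (uint64_t)A) - page(P));
  return x >= -(INT64_C(1) << 32) && x < (INT64_C(1) << 32);
}
```

For instance, a target 3GiB above the referencing instruction is within reach of ADRP but would overflow an x86-64 R_X86_64_PC32.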
bl callee  // R_AARCH64_CALL26
Note: if callee is far away from caller (outside the [-2**27,2**27) range of R_AARCH64_CALL26), the linker will generate a range extension thunk.
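The range test the linker applies to bl can be sketched similarly. R_AARCH64_CALL26 encodes (S + A - P) >> 2 in a signed 26-bit field, so the reach is [-2**27,2**27), i.e. 128MiB in each direction. The sketch models only this range test, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: would a bl with an R_AARCH64_CALL26 relocation be out
   of range, so that the linker must route it through a range extension
   thunk? The encoded value is (S + A - P) >> 2 in a signed 26-bit field. */
bool call26_needs_thunk(uint64_t S, int64_t A, uint64_t P) {
  int64_t x = (int64_t)(S + (uint64_t)A - P);
  return x < -(INT64_C(1) << 27) || x >= (INT64_C(1) << 27);
}
```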
GCC and Clang don't implement -mcmodel=large for PIC. This makes sense as we haven't identified a use case yet.
There are several strategies to mitigate relocation overflow issues.
Use the linker script commands INSERT BEFORE and INSERT AFTER to reorder output sections.
For large executables, it is possible to encounter DWARF32 limitations. I will address this topic in another article.