GNU ld's output section layout is determined by a linker script, which can be either internal (default) or external (specified with -T or -dT). Within the linker script, SECTIONS commands define how input sections are mapped into output sections.
Input sections not explicitly placed by SECTIONS commands are termed "orphan sections".
Orphan sections are sections present in the input files which are not explicitly placed into the output file by the linker script. The linker will still copy these sections into the output file by either finding, or creating a suitable output section in which to place the orphaned input section.
GNU ld's default behavior is to create output sections to hold these orphan sections and insert these output sections into appropriate places.
Orphan section placement is crucial because GNU ld's built-in linker scripts, while understanding common sections like .text/.rodata/.data, are unaware of custom sections. These custom sections should still be included in the final output file.
PT_LOAD segments needed.
GNU ld's orphan section placement algorithm is primarily specified within ld/ldlang.c:lang_place_orphans and ld/ldelf.c:ldelf_place_orphan. lang_place_orphans is a linker pass that runs between INSERT processing and SHF_MERGE section merging.
The algorithm utilizes a structure (orphan_save) to associate desired BFD flags (e.g., SEC_ALLOC, SEC_LOAD) with specific section names (e.g., .text, .rodata) and a reference to the last associated output section.
For each output section that holds orphan sections:
- GNU ld finds the matching orphan_save element based on the section's flags.
- If an output section is associated with that orphan_save element, the orphan section is placed after it.
The associated output section is initialized to the output section with the specific name (e.g., .text, .rodata), if present. For example, a custom code section mytext (with SHF_ALLOC | SHF_EXECINSTR) would typically be placed after .text, and a custom data section mydata (with SHF_ALLOC | SHF_WRITE) after .data.
static struct orphan_save hold[] = {
For each orphan section, GNU ld maps it to the output section and finds the orphan_save element with matching flags. If the associated output section is not null, the orphan section is placed after that output section. Otherwise, the orphan section is placed after a similar section chosen by a few heuristics.
Noteworthy details:
- .interp and .rodata have the same BFD flags, but they are anchors for different sections. SHT_NOTE sections go after .interp, while other read-only sections go after .rodata.
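The flag-to-anchor lookup described above can be modeled with a small table. This is a minimal sketch and not GNU ld's code: the Flag bits, the Anchor struct, and findAnchor are hypothetical names, the flag values are made up, and matching is simplified to exact flag equality. It only illustrates why mytext tends to follow .text, why mydata tends to follow .data, and how notes and other read-only sections pick different anchors despite identical flags.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flag bits standing in for BFD section flags (values are made up).
enum Flag : unsigned { ALLOC = 1, LOAD = 2, READONLY = 4, CODE = 8, DATA = 16 };

// Simplified stand-in for one orphan_save entry: desired flags plus anchor name.
struct Anchor {
  unsigned flags;
  std::string name;
};

// Returns the name of the anchor output section an orphan would follow,
// or "" when no anchor in this reduced table has matching flags.
std::string findAnchor(unsigned flags, bool isNote) {
  // This sketch only models sections that occupy memory.
  assert((flags & ALLOC) && "non-alloc orphans are outside this model");
  static const std::vector<Anchor> hold = {
      {ALLOC | LOAD | READONLY | CODE, ".text"},
      {ALLOC | LOAD | READONLY | DATA, ".rodata"},
      {ALLOC | LOAD | DATA, ".data"},
      {ALLOC | LOAD | READONLY | DATA, ".interp"},
  };
  for (const Anchor &a : hold) {
    if (a.flags != flags)
      continue;
    // .interp and .rodata share flags: notes follow .interp,
    // every other read-only data section follows .rodata.
    if (a.name == ".rodata" && isNote)
      continue;
    if (a.name == ".interp" && !isNote)
      continue;
    return a.name;
  }
  return "";
}
```

In this model, a section with code flags resolves to .text and a writable data section to .data, while a note with read-only data flags resolves to .interp instead of .rodata.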
The LLVM linker lld implements a large subset of the GNU ld linker script. However, due to the lack of an official specification and the complexity of GNU ld, there can be subtle differences in behavior.
While lld strives to provide a similar linker script behavior, it occasionally makes informed decisions to deviate where deemed beneficial. We balance compatibility with practicality and interpretability.
Users should be aware of these potential discrepancies when transitioning from GNU ld to lld, especially when dealing with intricate linker script features.
Rank-based sorting
lld assigns a rank to each output section, calculated using various flags like RF_NOT_ALLOC, RF_EXEC, RF_RODATA, etc. Orphan output sections are then sorted by these ranks.
enum RankFlags {
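To make the idea of a rank concrete, here is a minimal sketch of computing ranks from section properties and sorting by them. The bit positions are illustrative and are not lld's actual values. RF_WRITE, the Sec struct, computeRank, and sortByRank are names assumed for this sketch, and the real computation considers more properties than the three modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative bit positions only; lld's actual values are not reproduced here.
enum RankFlags : uint32_t {
  RF_NOT_ALLOC = 1u << 3,
  RF_WRITE = 1u << 2, // flag name assumed for this sketch
  RF_EXEC = 1u << 1,
  RF_RODATA = 1u << 0,
};

// Hypothetical, pared-down view of an output section.
struct Sec {
  std::string name;
  bool alloc;
  bool exec;
  bool write;
};

// Lower rank sorts earlier: read-only, then executable, then writable,
// with non-allocated sections last.
uint32_t computeRank(const Sec &s) {
  if (!s.alloc)
    return RF_NOT_ALLOC;
  if (s.write)
    return RF_WRITE;
  if (s.exec)
    return RF_EXEC;
  return RF_RODATA;
}

// Stable sort keeps the input order of sections that share a rank.
void sortByRank(std::vector<Sec> &secs) {
  auto less = [](const Sec &a, const Sec &b) {
    return computeRank(a) < computeRank(b);
  };
  std::stable_sort(secs.begin(), secs.end(), less);
  assert(std::is_sorted(secs.begin(), secs.end(), less));
}
```

Encoding each property as a bit means a single integer comparison orders sections, with the highest set bit dominating.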
Finding the most similar section
For each orphan section, lld identifies the output section with the most similar rank. The similarity is determined by counting the number of leading zeros in the XOR of the two ranks.
// We want to find how similar two ranks are.
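The similarity measure can be sketched in a few lines. This is an illustration under assumptions: ranks are treated as 32-bit integers, countLeadingZeros, rankProximity, and mostSimilar are hypothetical helper names, and breaking ties in favor of the earliest candidate is an arbitrary choice for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of leading zero bits in a 32-bit value (32 when x == 0).
int countLeadingZeros(uint32_t x) {
  int n = 0;
  for (uint32_t mask = 1u << 31; mask != 0 && (x & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// Two ranks that agree on more high-order bits XOR to a smaller value,
// which has more leading zeros. Identical ranks give 32.
int rankProximity(uint32_t a, uint32_t b) { return countLeadingZeros(a ^ b); }

// Index of the candidate rank closest to the orphan's rank.
// Ties go to the earliest candidate (a choice made for this sketch).
std::size_t mostSimilar(uint32_t orphan, const std::vector<uint32_t> &ranks) {
  assert(!ranks.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < ranks.size(); ++i)
    if (rankProximity(orphan, ranks[i]) > rankProximity(orphan, ranks[best]))
      best = i;
  return best;
}
```

Because higher bits dominate, two sections that differ only in a low-order flag are considered closer than two that differ in a high-order flag.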
Placement decision
The orphan section is placed either before or after the most similar section, based on a complex rule. In essence:
- If the orphan section's rank is lower than the similar section's rank and no PHDRS/MEMORY commands exist, it's placed before the similar section.
- Otherwise, it's placed after the similar section.
auto isOutputSecWithInputSections = [](SectionCommand *cmd) {
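The before/after choice can be written as a small predicate. This sketch assumes the rule is: place the orphan before the similar section only when the orphan's rank is lower and the script has no PHDRS/MEMORY commands, and place it after in every other case. Placement and choosePlacement are hypothetical names, and the scan that then picks the exact insertion point is not modeled.

```cpp
#include <cassert>
#include <cstdint>

enum class Placement { Before, After };

// orphanRank / similarRank: ranks of the orphan and of its most similar section.
// hasPhdrsOrMemory: true when the linker script uses PHDRS or MEMORY commands.
Placement choosePlacement(uint32_t orphanRank, uint32_t similarRank,
                          bool hasPhdrsOrMemory) {
  // A lower-ranked orphan may go first, unless PHDRS/MEMORY commands
  // are present, in which case it always goes after.
  if (orphanRank < similarRank && !hasPhdrsOrMemory)
    return Placement::Before;
  return Placement::After;
}
```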
Special case: last section
If the orphan section happens to be the last one, it's placed at the very end of the output, mimicking GNU ld's behavior for cases where the linker script fully specifies the beginning but not the end of the file.
Special case: skipping symbol assignments
It is common to surround an output section description with encapsulation symbols. lld has a special case to not place orphans between an output section foo and a symbol assignment that follows it.
Backward scan example:
begin_previous = .;
Forward scan example:
similar0 : { *(similar0) }
However, an assignment to the location counter can serve as a barrier to stop the forward scan.
begin_previous = .;
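The interaction between symbol assignments and the location-counter barrier can be modeled on a flat list of script commands. This is a simplified sketch, not lld's implementation: Kind, Cmd, and skipSymbolAssignments are hypothetical names, and only the "step past ordinary symbol assignments, stop at an assignment to the location counter" behavior is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical classification of top-level SECTIONS commands.
enum class Kind { OutputSection, SymbolAssign, DotAssign };

struct Cmd {
  Kind kind;
  std::string name; // section or symbol name; "." for the location counter
};

// Starting at index i (the slot right after the chosen output section),
// step over ordinary symbol assignments so an orphan is not inserted between
// a section and its trailing encapsulation symbol. An assignment to "."
// or another output section stops the scan.
std::size_t skipSymbolAssignments(const std::vector<Cmd> &cmds, std::size_t i) {
  assert(i <= cmds.size());
  while (i < cmds.size() && cmds[i].kind == Kind::SymbolAssign)
    ++i;
  return i;
}
```

For the command list {similar, end_similar = ., . = ..., next}, a scan starting after similar stops at the location counter assignment, so the orphan lands after end_similar but before the barrier.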
Special case: initial location counter
In addition, if there is a location counter assignment before the first output section, orphan sections cannot be inserted before the initial location counter assignment. This is to recognize the common pattern that the initial location counter assignment specifies the load address.
sym0 = .;
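The lower bound on insertion positions can be sketched the same way. This is a model under assumptions, with hypothetical names (Kind, Cmd, firstInsertionIndex): leading ordinary symbol assignments are stepped over, and if the next command is an assignment to the location counter, the earliest allowed insertion point is just after it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical classification of top-level SECTIONS commands.
enum class Kind { OutputSection, SymbolAssign, DotAssign };

struct Cmd {
  Kind kind;
  std::string name; // section or symbol name; "." for the location counter
};

// Earliest index at which an orphan may be inserted. If the script opens
// with ". = <address>;" (possibly preceded by plain symbol assignments),
// the result points past that assignment, so no orphan can precede the
// address the script author chose.
std::size_t firstInsertionIndex(const std::vector<Cmd> &cmds) {
  std::size_t i = 0;
  while (i < cmds.size() && cmds[i].kind == Kind::SymbolAssign)
    ++i;
  if (i < cmds.size() && cmds[i].kind == Kind::DotAssign)
    ++i;
  assert(i <= cmds.size());
  return i;
}
```

A script that starts directly with an output section yields index 0 in this model, so the restriction only applies when the initial location counter assignment is actually present.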
Analysis
By employing this rank-based approach, lld provides an elegant implementation that does not hard code specific section names (e.g., .text/.rodata/.data). In GNU ld, if you rename the special sections .text/.rodata/.data in the linker script, the output could become subtly different.
To maximize portability of linker scripts across different linkers, it's essential to establish clear boundaries for PT_LOAD segments. This can be achieved by:
- Using MAXPAGESIZE alignment to distinctly separate sections within the linker script.
- Ensuring that each PT_LOAD segment includes at least one input section, preventing ambiguous placement decisions by the linker.
By adhering to these guidelines, you can reduce reliance on linker-specific orphan section placement algorithms, promoting consistency across GNU ld and lld.
For projects that require absolute control over section placement, GNU ld version 2.26 and later provides --orphan-handling=[place|warn|error|discard]. This allows you to choose how orphan sections are handled: