UNDER CONSTRUCTION
This article describes some ABI and toolchain notes about systems without a Memory Management Unit (MMU), with a focus on FDPIC and the future RISC-V FDPIC ABI.
fs/Kconfig.binfmt defines a few loaders.
BINFMT_ELF defaults to y and depends on MMU.
BINFMT_ELF_FDPIC defaults to y when BINFMT_ELF is not selected. A few architectures support BINFMT_ELF_FDPIC for NOMMU. ARM supports FDPIC even with an MMU.
BINFMT_FLAT is provided for a few architectures.
BINFMT_AOUT, removed in 2022, had been supported for alpha/arm/x86-32.
uClinux uses an object file format called Binary Flat format (BFLT). https://myembeddeddev.blogspot.com/2010/02/uclinux-flat-file-format.html has an introduction.
The Linux kernel has supported the format since before the git era (fs/binfmt_flat.c). Both version 2 (OLD_FLAT_VERSION in the kernel) and version 4 are supported. Shared library support was removed in April 2022.
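The version check described above can be sketched in C. This is a minimal illustration rather than the kernel's code: it assumes only that a BFLT image starts with the 4-byte magic "bFLT" followed by a big-endian 32-bit revision, and the helper names (read_be32, bflt_supported_rev) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Revisions accepted per the text: 2 (OLD_FLAT_VERSION) and 4. */
enum { BFLT_VERSION_OLD = 2, BFLT_VERSION_CURRENT = 4 };

/* Read a big-endian 32-bit value from a byte buffer. */
uint32_t read_be32(const unsigned char *p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: return the revision if buf begins with a BFLT
   header of a supported revision, or 0 otherwise. */
uint32_t bflt_supported_rev(const unsigned char *buf, size_t len) {
  assert(buf != NULL);
  if (len < 8 || memcmp(buf, "bFLT", 4) != 0)
    return 0;
  uint32_t rev = read_be32(buf + 4);
  return (rev == BFLT_VERSION_OLD || rev == BFLT_VERSION_CURRENT) ? rev : 0;
}
```

A caller would pass the first bytes of the image file; a return value of 0 means the buffer is either not BFLT or uses a revision other than 2 or 4.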
BFLT is not for relocatable files. An image file is typically converted from ELF using elf2flt. ld-elf2flt is a ld wrapper that invokes elf2flt when the option -elf2flt is seen.
GCC's m68k port has supported -msep-data since 2005. It is a special -fPIC mode that assumes that text and data segments are placed in different areas of memory. This option is used for XIP (eXecute In Place). (gcc/config/bfin added -msep-data in 2006.)
-mno-pic-data-is-text-relative
gcc/config/arm added this option in 2013. This seems similar to -msep-data, but I'll need to do more research.
gcc/config/s390 ported this option in 2017. kpatch (live kernel patching) uses this option for s390x.
The Linux kernel has supported the format since before the git era (fs/binfmt_elf_fdpic.c). Only ET_DYN executables are supported. Each port that supports FDPIC defines an EI_OSABI value to be checked by the loader.
Several architectures define an FDPIC ABI.
Here is a summary.
The read-only sections, which can be shared, are commonly referred to as the "text segment", whereas the writable sections are non-shared and commonly referred to as the "data segment". Functions and certain data symbols (.rodata) reside in the text segment, while other data symbols and the GOT reside in the data segment. Special entries called "canonical function descriptors" also reside in the GOT.
A call-clobbered register is reserved as the FDPIC register, used to access the data segment. Upon entry to a function, the FDPIC register holds the address of _GLOBAL_OFFSET_TABLE_. The text segment can be referenced using PC-relative addressing. We will see later that sometimes it's unknown whether a non-preemptible symbol resides in the text segment or the data segment, in which case GOT-indirect addressing with the FDPIC register has to be used.
A function call is called external if the destination may reside in another module, which has a different data segment and therefore needs a different FDPIC register value. An external function call therefore needs to update the FDPIC register as well as change the program counter (PC). The FDPIC register can be spilled into a stack slot or a call-saved register if the caller needs to reference the data segment later.
Calling a function pointer, including calling a PLT entry, also sets both the FDPIC register and PC. When the address of a function is taken, the address of its canonical function descriptor is obtained, not that of the entry point. The descriptor, which resides in the GOT, contains pointers to both the function's entry point and its FDPIC register value. The two GOT entries are relocated by a dynamic relocation of type R_*_FUNCDESC_VALUE (e.g. R_FRV_FUNCDESC_VALUE).
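The descriptor mechanism above can be modeled in portable C. This is only a sketch under stated assumptions: C cannot name a machine register, so the FDPIC register value is passed as an explicit first argument, and the names (data_segment, funcdesc, bump, call_via_descriptor) are hypothetical rather than taken from any ABI document.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a module's data segment; in real FDPIC code the FDPIC
   register would hold the address of _GLOBAL_OFFSET_TABLE_. */
struct data_segment {
  int counter;
};

/* Entry points receive the FDPIC register value as an explicit argument. */
typedef int (*entry_fn)(struct data_segment *fdpic_reg, int arg);

/* Hypothetical descriptor layout: entry point plus FDPIC register value. */
struct funcdesc {
  entry_fn entry;
  struct data_segment *fdpic_value;
};

/* Shared "text": all writable state is reached through fdpic_reg. */
int bump(struct data_segment *fdpic_reg, int by) {
  fdpic_reg->counter += by;
  return fdpic_reg->counter;
}

/* Calling through a function pointer loads both words of the descriptor:
   one becomes the new FDPIC register value, the other the new PC. */
int call_via_descriptor(const struct funcdesc *fp, int arg) {
  assert(fp != NULL && fp->entry != NULL);
  return fp->entry(fp->fdpic_value, arg);
}
```

Two descriptors may share one entry point while carrying different FDPIC values, which models the same text segment being used with two separate data segments.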
If the symbol is preemptible, the code sequence loads a GOT entry. When the symbol is a function, the GOT entry is relocated by a dynamic relocation R_*_FUNCDESC and will contain the address of the function descriptor.
Let's check out examples taking addresses of functions and variables.
__attribute__((visibility("hidden"))) void hidden_fun();
void fun();
__attribute__((visibility("hidden"))) extern int hidden_var;
extern int var;
__attribute__((visibility("hidden"))) const int ro_hidden_var = 42;
void *get_hidden_fun() { return hidden_fun; }
void *get_fun() { return fun; }
void *get_hidden_var() { return &hidden_var; }
void *get_var() { return &var; }
const int *get_ro_hidden_var() { return &ro_hidden_var; }
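Canonical function descriptors are stored in the GOT, and their access depends on whether the referenced function is preemptible or not.
For a non-preemptible function, the descriptor is addressed directly at an offset from the FDPIC register.
For a preemptible function, a GOT entry, relocated by an R_*_FUNCDESC dynamic relocation, holds the final address of the function descriptor.
// arm-linux-gnueabihf-gcc -c -fpic -mfdpic -Wa,--fdpic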
Unfortunately, when linking a DSO, an R_ARM_GOTOFFFUNCDESC relocation referencing a hidden symbol results in a linker error. This error likely arises because the generated R_ARM_FUNCDESC_VALUE dynamic relocation requires a dynamic symbol. While this can be implemented using an STB_LOCAL STT_SECTION dynamic symbol, GNU ld currently lacks support for this approach.
% arm-linux-gnueabihf-gcc -fpic -mfdpic -O2 -Wa,--fdpic q.c -shared
Let's try sh4. sh4-linux-gnu-gcc -fpic -mfdpic -O2 q.c -shared -nostdlib allows taking the address of a hidden function but not a protected function.
Then, let's see a global variable initialized by the address of a function and a C++ virtual table.
struct A { virtual void foo(); };
void call(A *a) { a->foo(); }
auto *var_call = call;
// arm-linux-gnueabihf-g++ -c -fpic -mfdpic -Wa,--fdpic
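GOT-indirect addressing is required for accessing data symbols under two conditions: the symbol is preemptible, or the symbol is non-preemptible but may be placed in the data segment.
The second condition covers defined (int var;) and externally declared (extern int var;) non-const variables, const-qualified variables such as const A a; and extern const int var;, and declarations such as extern constinit const int *const extern_const;. Constant initialization may require a relocation, e.g. constinit const int *const extern_const = &var;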
get_hidden_var: // non-preemptible data with potential data segment placement
The dynamic relocations R_*_RELATIVE and R_*_GLOB_DAT do not use the standard + load_base semantics. It seems that musl fdpic doesn't support the special R_*_RELATIVE.
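One way to picture why a single load_base is not enough: the text and data segments are placed independently, so a link-time address has to be translated using the segment that contains it. The sketch below assumes a per-segment load map loosely modeled on the kernel's elf32_fdpic_loadmap and elf32_fdpic_loadseg; the simplified structures and the function name fdpic_rebase are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical load map: one entry per loaded segment. */
struct loadseg {
  uint32_t addr;    /* run-time address of the segment */
  uint32_t p_vaddr; /* link-time address of the segment */
  uint32_t p_memsz; /* size of the segment in memory */
};

struct loadmap {
  uint16_t version;
  uint16_t nsegs;
  const struct loadseg *segs;
};

/* Translate a link-time address to a run-time address by locating the
   segment that contains it. Returns 0 if no segment covers vaddr. */
uint32_t fdpic_rebase(const struct loadmap *map, uint32_t vaddr) {
  assert(map != NULL);
  for (uint16_t i = 0; i < map->nsegs; i++) {
    const struct loadseg *s = &map->segs[i];
    if (vaddr >= s->p_vaddr && vaddr - s->p_vaddr < s->p_memsz)
      return s->addr + (vaddr - s->p_vaddr);
  }
  return 0;
}
```

With two segments loaded at unrelated addresses, the same function yields a different displacement for an address in the text segment than for one in the data segment, which a single + load_base computation cannot express.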
If the referenced data symbol is non-preemptible and guaranteed to be in the text segment, we can use PC-relative addressing. However, this scenario is remarkably rare in practice. The most likely use case is like the following:
const int ro_array[] = {1, 2, 3, 4}; // text segment
GCC's arm port does not seem to utilize PC-relative addressing. We can try GCC's SuperH port:
// sh4-linux-gnu-gcc -S -fpic -mfdpic -O2 q.c
It optimizes get_hidden_var but not get_ro_hidden_var.
ARM FDPIC ABI defines static TLS relocations R_ARM_TLS_GD32_FDPIC, R_ARM_TLS_LDM32_FDPIC, R_ARM_TLS_IE32_FDPIC to be relative to GOT, as opposed to their non-FDPIC counterparts, which are relative to PC.
-mfdpic, which only makes sense for -fpie/-fpic, enables FDPIC code generation. Like -mno-pic-data-is-text-relative, external data accesses use a different base register, r9 for arm. In addition, external function calls save and restore r9.
gas's arm port needs --fdpic to assemble FDPIC-related relocation types. -mfdpic -mtls-dialect=gnu2 is not supported. The ARM FDPIC ABI uses ldr to load a 32-bit constant embedded in the text segment. The offset is used to materialize the address of a GOT entry (canonical function descriptor, address of the canonical function descriptor, or address of data).
Several proposals exist for defining FDPIC-like ABIs to work for MMU-less systems.
Undoubtedly, GP should be used as the FDPIC register.
Loading a constant near code (like ARM) is not efficient. Instead, consider a two-instruction sequence in which lui loads the upper bits of the offset and c.add a0, gp computes the address of the GOT entry.
Maciej's code sequence supports both function and data access through indirect GP-relative addressing. We can easily enhance it by adding R_RISCV_RELAX to enable linker relaxation and improve performance. Additionally, for consistency with similar notations on x86-64 and AArch64 ("gotpcrel"), let's adopt "gotgprel" notation.
.L0:
For data access, the code sequence is followed by instructions like:
lb a0,0(rY)
sb a1,0(rY)
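A lui-based sequence paired with a 12-bit load/store immediate has to split the GP-relative offset into a 20-bit high part and a signed 12-bit low part. The C sketch below shows the usual rounding trick, modeling 32-bit arithmetic only; the type and function names (hi_lo, split_hi_lo, combine_hi_lo) are hypothetical and not part of any proposal.

```c
#include <assert.h>
#include <stdint.h>

/* The two immediates a hi/lo relocation pair would encode. */
struct hi_lo {
  int32_t hi20; /* value for the lui immediate field */
  int32_t lo12; /* signed 12-bit immediate, in [-2048, 2047] */
};

/* Split a 32-bit offset so that (hi20 << 12) + lo12 == offset. Adding
   0x800 before shifting rounds to the nearest 4 KiB multiple, which keeps
   the remainder within the signed 12-bit range. */
struct hi_lo split_hi_lo(int32_t offset) {
  struct hi_lo r;
  uint32_t u = (uint32_t)offset;
  uint32_t hi = (u + 0x800u) >> 12;
  r.hi20 = (int32_t)hi;
  r.lo12 = (int32_t)(u - (hi << 12));
  assert(r.lo12 >= -2048 && r.lo12 <= 2047);
  return r;
}

/* Recombine the parts as the hardware would: lui shifts hi20 left by 12,
   then the low part is added as a sign-extended immediate. */
int32_t combine_hi_lo(struct hi_lo v) {
  return (int32_t)(((uint32_t)v.hi20 << 12) + (uint32_t)v.lo12);
}
```

The rounding matters for offsets whose low 12 bits are 0x800 or above: the high part is bumped by one and the low part becomes negative.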
Function descriptors and data have different semantics, requiring two relocation types. Stefan O'Rear proposes:
R_RISCV_FUNCDESC_GOTGPREL_HI: Find or create two GOT entries for the canonical function descriptor.
R_RISCV_GOTGPREL_HI: Find or create a GOT entry for the symbol, and return an offset from the FDPIC register.
Drawing inspiration from ARM FDPIC, two additional relocation types are needed for TLS. This results in a 4-type scheme.
Addressing performance concerns is crucial. Stefan suggests an "indirect-to-relative optimization and relaxation scheme":
R_RISCV_PIC_ADD: Tags c.add rX, gp to enable optimization
R_RISCV_INTERMEDIATE_LOAD: Tags ld rY, <gotgprel_lo12>(rX) to enable optimization
Indirect GP-relative addressing can be optimized to direct GP-relative addressing under specific conditions:
# Indirect GP-relative to direct GP-relative
GOT-indirect addressing can be optimized to PC-relative for non-preemptible data in the text segment.
# Indirect GP-relative to PC-relative
GOT-indirect addressing can be optimized to absolute addressing for non-preemptible data in the text segment.
# Indirect GP-relative to absolute
This can be used for SHN_ABS and unresolved undefined weak symbols. With -no-pie linking, regular symbols are eligible for this optimization as well. However, linkers may choose not to implement this since the added complexity might outweigh the benefits.
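The three optimizations above can be summarized as a decision a linker might make per symbol. The sketch below is an illustration, not Stefan's scheme: it assumes that direct GP-relative addressing applies to non-preemptible data in the data segment, it ignores offset range limits and -no-pie, and every name in it (segment, access, sym_info, choose_access) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Where a symbol ends up; SEG_NONE stands for SHN_ABS and unresolved
   undefined weak symbols, which have no segment-relative address. */
enum segment { SEG_TEXT, SEG_DATA, SEG_NONE };

enum access {
  ACCESS_GOT_INDIRECT, /* unoptimized: load the address from the GOT */
  ACCESS_GP_RELATIVE,  /* direct GP-relative */
  ACCESS_PC_RELATIVE,
  ACCESS_ABSOLUTE
};

struct sym_info {
  bool preemptible;
  enum segment seg;
};

/* Pick the cheapest addressing form the text allows for a symbol. */
enum access choose_access(struct sym_info s) {
  if (s.preemptible)
    return ACCESS_GOT_INDIRECT; /* the definition may come from another module */
  switch (s.seg) {
  case SEG_DATA:
    return ACCESS_GP_RELATIVE; /* assumption: data segment implies GP-relative */
  case SEG_TEXT:
    return ACCESS_PC_RELATIVE;
  case SEG_NONE:
    return ACCESS_ABSOLUTE;
  }
  return ACCESS_GOT_INDIRECT;
}
```

A real linker would also have to check that the resulting offset fits the instruction encoding before rewriting the sequence.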
To handle TLSDESC, we introduce a new relocation type: R_RISCV_TLSDESC_GPREL_HI. This type instructs the linker to find or create two GOT entries unless optimized to local-exec or initial-exec. The combined hi20 and lo12 offsets compute the GP-relative offset to the first GOT entry.
label:
Existing relocation types, R_RISCV_TLSDESC_LOAD_LO12 and R_RISCV_TLSDESC_ADD_LO12, are extended to work with R_RISCV_TLSDESC_GPREL_HI.
# TLSDESC to initial-exec optimization
For the initial-exec TLS model, we need a new pseudoinstruction, say, la.tls.ie.fd rX, sym. It expands to:
lui rX, 0 # R_RISCV_TLS_GOTGPREL_HI20(sym)
Stefan's scheme uses R_RISCV_PIC_ADDR_LO12_I. I think we can just repurpose R_RISCV_PCREL_LO12_I.