    MMU-less systems and FDPIC

    Posted by MaskRay on 2024-02-20 19:16:21

    UNDER CONSTRUCTION

    This article describes some ABI and toolchain notes about systems without a Memory Management Unit (MMU), with a focus on FDPIC and the future RISC-V FDPIC ABI.

    Linux binfmt loaders

    fs/Kconfig.binfmt defines a few loaders.

    • BINFMT_ELF defaults to y and depends on MMU.
    • BINFMT_ELF_FDPIC defaults to y when BINFMT_ELF is not selected. A few architectures support BINFMT_ELF_FDPIC for NOMMU. ARM supports FDPIC even with an MMU.
    • BINFMT_FLAT is provided for a few architectures.

    BINFMT_AOUT, removed in 2022, had been supported for alpha/arm/x86-32.

    BFLT

    uClinux uses an object file format called Binary Flat format (BFLT). https://myembeddeddev.blogspot.com/2010/02/uclinux-flat-file-format.html has an introduction.

    The Linux kernel has supported the format since before the git era (fs/binfmt_flat.c). Both version 2 (OLD_FLAT_VERSION in the kernel) and version 4 are supported. Shared library support was removed in April 2022.

    BFLT is not for relocatable files. An image file is typically converted from ELF using elf2flt. ld-elf2flt is an ld wrapper that invokes elf2flt when the option -elf2flt is seen.

    GCC's m68k port has supported -msep-data since 2005. It is a special -fPIC mode that assumes the text and data segments are placed in different areas of memory. This option is used for XIP (eXecute In Place). (gcc/config/bfin added -msep-data in 2006.)

    -mno-pic-data-is-text-relative

    gcc/config/arm added this option in 2013. This seems similar to -msep-data, but I'll need to do more research.

    gcc/config/s390 ported this option in 2017. kpatch (live kernel patching) uses this option for s390x.

    FDPIC

    The Linux kernel has supported the format since before the git era (fs/binfmt_elf_fdpic.c). Only ET_DYN executables are supported. Each port that supports FDPIC defines an EI_OSABI value to be checked by the loader.

    Several architectures define an FDPIC ABI.

    • Fujitsu FR-V: The FR-V FDPIC ABI, initial version in 2004
    • Blackfin: Blackfin FDPIC ABI
    • SuperH: The SH FDPIC ABI, initial version in 2008
    • ARM FDPIC ABI, initial version in 2013

    Here is a summary.

    The read-only sections, which can be shared, are commonly referred to as the "text segment", whereas the writable sections are non-shared and commonly referred to as the "data segment". Functions and certain data symbols (.rodata) reside in the text segment, while other data symbols and the GOT reside in the data segment. Special entries called "canonical function descriptors" also reside in the GOT.

    A call-clobbered register is reserved as the FDPIC register, used to access the data segment. Upon entry to a function, the FDPIC register holds the address of _GLOBAL_OFFSET_TABLE_. The text segment can be referenced using PC-relative addressing. We will see later that sometimes it's unknown whether a non-preemptible symbol resides in the text segment or the data segment, in which case GOT-indirect addressing with the FDPIC register has to be used.

    A function call is called external if the destination may reside in another module, which has a different data segment and therefore needs a different FDPIC register value. Therefore, an external function call needs to update the FDPIC register as well as changing the program counter (PC). The FDPIC register can be spilled into a stack slot or a call-saved register, if the caller needs to reference the data segment later.

    Calling a function pointer, including calling a PLT entry, also sets both the FDPIC register and PC. When the address of a function is taken, the address of its canonical function descriptor is obtained, not that of the entry point. The descriptor, which resides in the GOT, contains pointers to both the function's entry point and its FDPIC register value. The two GOT entries are relocated by a dynamic relocation of type R_*_FUNCDESC_VALUE (e.g. R_FRV_FUNCDESC_VALUE).

    If the symbol is preemptible, the code sequence loads a GOT entry. When the symbol is a function, the GOT entry is relocated by a dynamic relocation R_*_FUNCDESC and will contain the address of the function descriptor.
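
    To make the descriptor mechanics concrete, here is a minimal C model. It is only a sketch under stated assumptions: the names struct funcdesc and call_via_descriptor are hypothetical, and the FDPIC register is modeled as an explicit got parameter because portable C cannot set a machine register. The word order (entry point first, FDPIC value second) is likewise an assumption of this model.

```c
#include <stdint.h>

/* Hypothetical model of a canonical function descriptor: two GOT words.
 * Word 0 is the entry point, word 1 is the callee's FDPIC register value. */
struct funcdesc {
    int (*entry)(const intptr_t *got, int arg);
    const intptr_t *got;
};

/* A function that reaches its module's data only through its GOT pointer,
 * which stands in for the FDPIC register. */
int add_module_bias(const intptr_t *got, int arg) {
    return arg + (int)got[0];
}

/* Two modules with different data segments, hence different GOTs. */
const intptr_t module_a_got[] = {100};
const intptr_t module_b_got[] = {200};

const struct funcdesc desc_a = {add_module_bias, module_a_got};
const struct funcdesc desc_b = {add_module_bias, module_b_got};

/* Calling through a function pointer: read both words of the descriptor,
 * then transfer control with the callee's FDPIC value in place. */
int call_via_descriptor(const struct funcdesc *fd, int arg) {
    return fd->entry(fd->got, arg);
}
```

    The same code runs against two different data segments depending on which descriptor is used, which is the reason a function pointer has to carry more than an entry address.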

    Let's check out examples that take the addresses of functions and variables.

    __attribute__((visibility("hidden"))) void hidden_fun();
    void fun();
    __attribute__((visibility("hidden"))) extern int hidden_var;
    extern int var;
    __attribute__((visibility("hidden"))) const int ro_hidden_var = 42;

    void *get_hidden_fun() { return hidden_fun; }
    void *get_fun() { return fun; }
    void *get_hidden_var() { return &hidden_var; }
    void *get_var() { return &var; }
    const int *get_ro_hidden_var() { return &ro_hidden_var; }

    Function access in FDPIC

    Canonical function descriptors are stored in the GOT, and their access depends on whether the referenced function is preemptible or not.

    • For non-preemptible functions: the address of the descriptor is directly computed by adding an offset to the FDPIC register.
    • For preemptible functions: a GOT entry is loaded first. This entry, relocated by an R_*_FUNCDESC dynamic relocation, holds the final address of the function descriptor.

    // arm-linux-gnueabihf-gcc -c -fpic -mfdpic -Wa,--fdpic
    get_hidden_fun: // non-preemptible function
    ldr r0, .L3 // r0 = &.got[n] - FDPIC
    add r0, r0, r9 // r0 = &.got[n]; the address of the canonical function descriptor
    ...
    .L3:
    // Linker resolves this to &.got[n] - FDPIC. .got[n], relocated by R_ARM_FUNCDESC_VALUE, is the canonical function descriptor.
    .word hidden_fun(GOTOFFFUNCDESC) // R_ARM_GOTOFFFUNCDESC(hidden_fun)

    get_fun: // preemptible function
    ldr r3, .L6 // r3 = &.got[n] - FDPIC
    ldr r0, [r9, r3] // r0 = .got[n]; the address of the canonical function descriptor
    ...
    .L6:
    // Linker resolves this to &.got[n] - FDPIC. .got[n], relocated by R_ARM_FUNCDESC, will contain the address of the canonical function descriptor.
    .word fun(GOTFUNCDESC) // R_ARM_GOTFUNCDESC(fun)

    Unfortunately, when linking a DSO, an R_ARM_GOTOFFFUNCDESC relocation referencing a hidden symbol results in a linker error. This error likely arises because the generated R_ARM_FUNCDESC_VALUE dynamic relocation requires a dynamic symbol. While this can be implemented using an STB_LOCAL STT_SECTION dynamic symbol, GNU ld currently lacks support for this approach.

    % arm-linux-gnueabihf-gcc -fpic -mfdpic -O2 -Wa,--fdpic q.c -shared
    /tmp/ccxpnij8.o: in function `get_hidden_fun':
    q.c:(.text+0x10): dangerous relocation: no dynamic index information available
    collect2: error: ld returned 1 exit status

    Let's try sh4. sh4-linux-gnu-gcc -fpic -mfdpic -O2 q.c -shared -nostdlib allows taking the address of a hidden function but not a protected function.

    Then, let's see a global variable initialized by the address of a function and a C++ virtual table.

    struct A { virtual void foo(); };
    void call(A *a) { a->foo(); }
    auto *var_call = call;

    // arm-linux-gnueabihf-g++ -c -fpic -mfdpic -Wa,--fdpic
    ldr r3, [r0] // load vtable
    ...
    ldr r3, [r3] // load vtable entry `.word _ZN1A3fooEv(FUNCDESC)`
    ldr r9, [r3, #4] // load FDPIC register value
    ldr r3, [r3] // load foo's entry point
    blx r3

    .section .data.rel,"aw"
    var_call:
    // Function descriptor address, relocated by R_ARM_FUNCDESC dynamic relocation
    .word _Z4callP1A(FUNCDESC) // R_ARM_FUNCDESC

    .section .data.rel.ro,"aw"
    _ZTV1A:
    .word 0
    .word _ZTI1A
    // Function descriptor address, relocated by R_ARM_FUNCDESC dynamic relocation
    .word _ZN1A3fooEv(FUNCDESC) // R_ARM_FUNCDESC

    Data access in FDPIC

    GOT-indirect addressing is required for accessing data symbols under two conditions:

    • Preemptible symbols: Traditional GOT requirement.
    • Non-preemptible symbols with potential data segment placement: This includes
      • Writable data symbols: This covers both locally declared (int var;) and externally declared (extern int var;) non-const variables.
      • Potential dynamic initialization: const A a; extern const int var;
      • Certain guaranteed constant initialization: extern constinit const int *const extern_const;. Constant initialization may require a relocation, e.g. constinit const int *const extern_const = &var;

    get_hidden_var: // non-preemptible data with potential data segment placement
    ldr r3, .L9 // r3 = &.got[n] - FDPIC
    ldr r0, [r9, r3]
    ...
    .L9:
    // Linker resolves this to &.got[n] - FDPIC. .got[n], relocated by R_ARM_RELATIVE, will contain the address of hidden_var.
    .word hidden_var(GOT) // R_ARM_GOT_BREL

    get_var: // preemptible data
    ldr r3, .L12
    ldr r0, [r9, r3]
    ...
    .L12:
    // Linker resolves this to &.got[n] - FDPIC. .got[n], relocated by R_ARM_GLOB_DAT, will contain the address of var.
    .word var(GOT) // R_ARM_GOT_BREL

    The dynamic relocations R_*_RELATIVE and R_*_GLOB_DAT do not use the standard + load_base semantics. It seems that musl fdpic doesn't support the special R_*_RELATIVE.
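
    A single load_base does not work because FDPIC segments are placed independently, so a link-time address has to be translated per segment. The following is a rough sketch of such a translation. The names struct loadseg and translate_addr are hypothetical, only loosely modeled on the load map that an FDPIC loader hands to user space, and a real dynamic loader has more to do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment record: where the segment was linked (p_vaddr),
 * where it actually got loaded (addr), and its size in memory. */
struct loadseg {
    uintptr_t addr;    /* run-time address of the segment */
    uintptr_t p_vaddr; /* link-time address from the program header */
    size_t p_memsz;    /* size of the segment in memory */
};

/* Translate a link-time address to a run-time address by finding the
 * segment that contains it. Returns 0 if no segment matches. */
uintptr_t translate_addr(const struct loadseg *segs, size_t nsegs,
                         uintptr_t vaddr) {
    for (size_t i = 0; i < nsegs; i++) {
        if (vaddr >= segs[i].p_vaddr &&
            vaddr - segs[i].p_vaddr < segs[i].p_memsz)
            return segs[i].addr + (vaddr - segs[i].p_vaddr);
    }
    return 0;
}
```

    With the text segment loaded at one place and the data segment at another, each segment gets its own displacement, so a relocation cannot be resolved by adding one constant to the link-time value.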

    If the referenced data symbol is non-preemptible and guaranteed to be in the text segment, we can use PC-relative addressing. However, this scenario is remarkably rare in practice. The most likely use case is like the following:

    const int ro_array[] = {1, 2, 3, 4}; // text segment

    int get_ro_array_elem(int i) { return ro_array[i]; }

    GCC's arm port does not seem to utilize PC-relative addressing. We can try GCC's SuperH port:

    // sh4-linux-gnu-gcc -S -fpic -mfdpic -O2 q.c
    get_hidden_var: // non-preemptible data
    mov.l .L12,r0
    rts
    add r12,r0
    .L13:
    .align 2
    .L12:
    .long hidden_var@GOTOFF

    get_ro_hidden_var: // non-preemptible data
    mov.l .L18,r0
    rts
    mov.l @(r0,r12),r0
    .L19:
    .align 2
    .L18:
    .long ro_hidden_var@GOT

    It optimizes get_hidden_var but not get_ro_hidden_var.

    Thread-local storage

    The ARM FDPIC ABI defines the static TLS relocations R_ARM_TLS_GD32_FDPIC, R_ARM_TLS_LDM32_FDPIC, and R_ARM_TLS_IE32_FDPIC to be relative to the GOT, as opposed to their non-FDPIC counterparts, which are relative to PC.

    Toolchain notes

    -mfdpic, which only makes sense for -fpie/-fpic, enables FDPIC code generation. Like -mno-pic-data-is-text-relative, external data accesses use a different base register, r9 for arm. In addition, external function calls save and restore r9.

    gas's arm port needs --fdpic to assemble FDPIC-related relocation types. -mfdpic -mtls-dialect=gnu2 is not supported. The ARM FDPIC ABI uses ldr to load a 32-bit constant embedded in the text segment. The offset is used to materialize the address of a GOT entry (canonical function descriptor, address of the canonical function descriptor, or address of data).

    RISC-V

    Several proposals exist for defining FDPIC-like ABIs to work for MMU-less systems.

    • [RFC] RISC-V ELF FDPIC psABI addendum and RISC-V FDPIC/NOMMU toolchain/runtime support: This offers a starting point, but needs further discussion.
    • lowRISC ePIC: While simple and interesting, ePIC lacks dynamic linking support.
    • Stefan O'Rear's proposal: This proposal holds promise and deserves close attention.

    Undoubtedly, GP should be used as the FDPIC register.

    Loading a constant near code (like ARM) is not efficient. Instead, consider a two-step sequence:

    • Use hi20 and lo12 instructions to generate an offset relative to the GP register.
    • Use c.add a0, gp to compute the address of the GOT entry.

    Maciej's code sequence supports both function and data access through indirect GP-relative addressing. We can easily enhance it by adding R_RISCV_RELAX to enable linker relaxation and improve performance. Additionally, for consistency with similar notations on x86-64 and AArch64 ("gotpcrel"), let's adopt "gotgprel" notation.

    .L0:
    lui rX, %gotgprel_hi20(sym) # R_RISCV_?(sym); R_RISCV_RELAX
    c.add rX, gp # R_RISCV_?(.L0)
    ld rY, %gotgprel_lo12(sym)(rX) # R_RISCV_?(.L0); rY = address

    For data access, the code sequence is followed by instructions like:

    lb a0,0(rY)
    sb a1,0(rY)
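
    The hi20/lo12 pair in the sequence above has to add back up to the full GP-relative offset. The sketch below shows the arithmetic in C, assuming the proposed %gotgprel_hi20 and %gotgprel_lo12 operators split a value the same way the existing %hi/%lo and %pcrel_hi/%pcrel_lo operators do. The function names hi20, lo12, and recombine are hypothetical.

```c
#include <stdint.h>

/* Upper 20 bits for lui. Adding 0x800 first compensates for the sign
 * extension of the 12-bit immediate that follows. */
uint32_t hi20(int32_t offset) {
    return (((uint32_t)offset + 0x800u) >> 12) & 0xfffffu;
}

/* Low 12 bits, interpreted as a signed immediate in [-2048, 2047]. */
int32_t lo12(int32_t offset) {
    int32_t low = (int32_t)((uint32_t)offset & 0xfffu);
    return low >= 0x800 ? low - 0x1000 : low;
}

/* What lui followed by a 12-bit immediate produces, with 32-bit wraparound
 * as on RV32. */
int32_t recombine(uint32_t hi, int32_t lo) {
    return (int32_t)((hi << 12) + (uint32_t)lo);
}
```

    For any 32-bit offset, recombine(hi20(offset), lo12(offset)) gives back offset, including the case where the low part is 0x800 or larger and the high part has to be rounded up.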

    Function descriptors and data have different semantics, requiring two relocation types. Stefan O'Rear proposes:

    • R_RISCV_FUNCDESC_GOTGPREL_HI: Find or create two GOT entries for the canonical function descriptor.
    • R_RISCV_GOTGPREL_HI: Find or create a GOT entry for the symbol, and return an offset from the FDPIC register.

    Drawing inspiration from ARM FDPIC, two additional relocation types are needed for TLS. This results in a 4-type scheme.

    RISC-V FDPIC: optimization

    Addressing performance concerns is crucial. Stefan suggests an "indirect-to-relative optimization and relaxation scheme":

    • R_RISCV_PIC_ADD: Tags c.add rX, gp to enable optimization
    • R_RISCV_INTERMEDIATE_LOAD: Tags ld rY, <gotgprel_lo12>(rX) to enable optimization

    Indirect GP-relative addressing can be optimized to direct GP-relative addressing under specific conditions:

    • Non-preemptible functions
    • Non-preemptible data in the data segment

    # Indirect GP-relative to direct GP-relative
    lui rX, <gprel_hi20>
    c.add rX, gp
    addi rY, rX, <gprel_lo12>

    GOT-indirect addressing can be optimized to PC-relative for non-preemptible data in the text segment.

    # Indirect GP-relative to PC-relative
    auipc rX, %pcrel_hi20(sym)
    c.nop # deletable
    addi rY, rX, %pcrel_lo12(sym)

    GOT-indirect addressing can be optimized to absolute addressing for non-preemptible data in the text segment.

    # Indirect GP-relative to absolute
    lui rX, %hi20(sym)
    c.nop # deletable
    addi rY, rX, %lo12(sym)

    This can be used for SHN_ABS and unresolved undefined weak symbols. With -no-pie linking, regular symbols are eligible for this optimization as well. However, linkers may choose not to implement this since the added complexity might outweigh the benefits.
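
    The conditions in this section can be collected into one decision function. This is a sketch of how a linker might pick a form, not a description of any existing implementation: struct sym_info, enum addressing, and choose_addressing are hypothetical names, and the optional -no-pie case for regular symbols is left out.

```c
#include <stdbool.h>

/* Hypothetical summary of what the linker knows about the referenced symbol. */
struct sym_info {
    bool preemptible;
    bool is_function;
    bool in_text_segment; /* known to live in the read-only text segment */
    bool absolute;        /* SHN_ABS or unresolved undefined weak */
};

enum addressing {
    ADDR_GOT_INDIRECT, /* keep: lui; c.add; ld */
    ADDR_GP_RELATIVE,  /* lui; c.add; addi */
    ADDR_PC_RELATIVE,  /* auipc; addi */
    ADDR_ABSOLUTE      /* lui; addi */
};

enum addressing choose_addressing(const struct sym_info *s) {
    if (s->preemptible)
        return ADDR_GOT_INDIRECT; /* the address is only known at run time */
    if (s->absolute)
        return ADDR_ABSOLUTE;
    if (s->is_function)
        return ADDR_GP_RELATIVE;  /* the descriptor lives in the GOT */
    if (!s->in_text_segment)
        return ADDR_GP_RELATIVE;  /* data segment is addressed from GP */
    return ADDR_PC_RELATIVE;      /* text segment is addressed from PC */
}
```

    A linker that skips the absolute case, as suggested above, would simply drop the second check and leave those symbols on the GOT-indirect path.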

    RISC-V FDPIC: thread-local storage

    To handle TLSDESC, we introduce a new relocation type: R_RISCV_TLSDESC_GPREL_HI. This type instructs the linker to find or create two GOT entries unless optimized to local-exec or initial-exec. The combined hi20 and lo12 offsets compute the GP-relative offset to the first GOT entry.

    label:
    lui rX, %tlsdesc_gprel_hi(sym) # R_RISCV_TLSDESC_GPREL_HI(sym); R_RISCV_RELAX
    c.add rX, gp # R_RISCV_PIC_ADD(label)
    ld rY, rX, %tlsdesc_load_lo(label) # R_RISCV_TLSDESC_LOAD_LO12(label)
    addi a0, rX, %tlsdesc_add_lo(label) # R_RISCV_TLSDESC_ADD_LO12(label)
    jalr t0, rY, %tlsdesc_call(label) # R_RISCV_TLSDESC_CALL(label)

    Existing relocation types, R_RISCV_TLSDESC_LOAD_LO12 and R_RISCV_TLSDESC_ADD_LO12, are extended to work with R_RISCV_TLSDESC_GPREL_HI.

    # TLSDESC to initial-exec optimization
    lui a0, <gottpoff_gprel_hi20>
    c.add a0, gp
    ld a0, <gottpoff_gprel_lo12>(a0)

    # TLSDESC to local-exec optimization
    lui a0, <tpoff_hi20>
    addi a0, a0, <tpoff_lo12>
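
    Which of the two optimized sequences applies depends on what is being linked. The sketch below encodes the rule ELF linkers commonly use for TLS optimization, under the assumption that it carries over unchanged to FDPIC. The names enum tls_model and relax_tlsdesc are hypothetical.

```c
#include <stdbool.h>

enum tls_model { TLS_DESC, TLS_INITIAL_EXEC, TLS_LOCAL_EXEC };

/* Pick the TLS access model for a TLSDESC code sequence.
 * - Shared objects keep TLSDESC.
 * - In an executable, a preemptible symbol (defined elsewhere) still needs
 *   a GOT entry holding its TP offset: initial-exec.
 * - In an executable, a non-preemptible symbol has a link-time constant
 *   TP offset: local-exec. */
enum tls_model relax_tlsdesc(bool linking_executable, bool sym_preemptible) {
    if (!linking_executable)
        return TLS_DESC;
    return sym_preemptible ? TLS_INITIAL_EXEC : TLS_LOCAL_EXEC;
}
```

    Only the initial-exec result still needs the GP-relative GOT load, which matches the three-instruction sequence above, while local-exec needs no GOT entry at all.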

    For the initial-exec TLS model, we need a new pseudoinstruction, say, la.tls.ie.fd rX, sym. It expands to:

    lui rX, 0 # R_RISCV_TLS_GOTGPREL_HI20(sym)
    c.add rX, gp # R_RISCV_PIC_ADD(label)
    ld rX, 0(rX) # R_RISCV_PCREL_LO12_I(label)

    Stefan's scheme uses R_RISCV_PIC_ADDR_LO12_I. I think we can just repurpose R_RISCV_PCREL_LO12_I.

    RISC-V FDPIC: -fno-plt


