
    Relocation overflow and code models

    Posted by MaskRay on 2023-05-14 21:55:18

    When linking an oversized executable, it is possible to encounter errors such as relocation truncated to fit: R_X86_64_PC32 against `.text' (GNU ld) or relocation R_X86_64_PC32 out of range (ld.lld). These diagnostics are a result of the relocation overflow check, a feature in the linker.

    This article aims to explain why such issues can occur and provides insights on how to mitigate them.

    Certain groups prefer static linking or mostly static linking for the sake of deployment convenience and performance. In scenarios where the distributed program contains a significant amount of code (related: software bloat), employing full or mostly static linking can result in very large executable files. Consequently, certain relocations may be close to the distance limit, and even a minor disruption can trigger relocation overflow linker errors.

    Static linking

    In this section, we will deviate slightly from the main topic to discuss static linking. Static linking involves copying dependencies directly into the final executable and is often preferred in certain scenarios due to its advantages in deployment convenience and performance.

    By including all dependencies within the executable itself, it can run without relying on external shared objects. This eliminates the potential risks associated with updating dependencies separately.

    Static linking also offers performance benefits in several aspects:

    • Link-time optimization is more effective when all dependencies are known. Providing shared object information during executable optimization is possible, but it may not be a worthwhile engineering effort.
    • Profiling techniques are more efficient when dealing with a single executable.
    • The traditional ELF dynamic linking approach incurs overhead to support symbol interposition.
    • Dynamic linking involves PLT and GOT, which can introduce additional overhead. Static linking eliminates the overhead.
    • Loading libraries in the dynamic loader has a time complexity of O(|libs|^2*|libname|). The existing implementations are designed to handle tens of shared objects, rather than a thousand or more.
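
    One way to see where a bound like O(|libs|^2*|libname|) can come from is the following minimal sketch. It assumes a simplified, hypothetical loader that, before loading each library, scans the list of already loaded libraries and compares names. The function name and the model are illustrative only and do not describe any specific dynamic loader.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: count the name comparisons performed while "loading"
   n libraries, where each new library is checked against every library
   loaded before it. */
size_t count_name_comparisons(const char *const names[], size_t n) {
  size_t comparisons = 0;
  for (size_t i = 0; i < n; i++) {          /* library being loaded */
    for (size_t j = 0; j < i; j++) {        /* libraries already loaded */
      comparisons++;
      if (strcmp(names[i], names[j]) == 0)  /* each compare is O(|libname|) */
        break;                              /* already loaded, stop scanning */
    }
  }
  return comparisons;
}
```

    With n distinct names this performs n*(n-1)/2 comparisons, each bounded by the name length, which is negligible for tens of libraries and noticeable for thousands.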

    Furthermore, the current lack of techniques to partition an executable into a few larger shared objects, as opposed to numerous smaller shared objects, exacerbates the overhead issue.

    These performance benefits make static linking an attractive option in certain scenarios, where the convenience and performance gains outweigh the potential drawbacks.

    Relocation overflow

    We will use the following C program to illustrate the concepts.

    int var;
    int callee();
    int caller() { return callee() + var; }

    The generated x86-64 assembly may appear as follows, with comments indicating the relocation types associated with each instruction:

    .globl caller
    caller:
      call callee@PLT                  # R_X86_64_PLT32
      add eax, dword ptr [rip + var]   # R_X86_64_PC32

    .bss
    .globl var
    var: .long 0

    Both R_X86_64_PLT32 and R_X86_64_PC32 have a value range of [-2**31,2**31). If the referenced symbol is too far away from the relocated location, we may get a relocation overflow.
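
    The overflow check itself is simple arithmetic. The sketch below is a minimal illustration, not the code of any linker: it computes the value of a PC-relative relocation as S + A - P (symbol address, addend, address of the relocated field, following the psABI notation) and tests whether it fits in a signed 32-bit field. The function names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value stored by a PC-relative relocation such as R_X86_64_PC32:
   S (symbol address) + A (addend) - P (address of the relocated field). */
int64_t pcrel_value(uint64_t s, int64_t a, uint64_t p) {
  return (int64_t)(s + (uint64_t)a - p);
}

/* True if v lies in [-2**31, 2**31), i.e. no relocation overflow. */
bool fits_in_int32(int64_t v) {
  return v >= INT32_MIN && v <= INT32_MAX;
}
```

    For instance, a symbol exactly 2**31 bytes after the relocated location overflows, while the typical addend of -4 brings a symbol at P + 2**31 + 3 back into range.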

    In practice, relocation overflows due to code referencing code are not common. The more frequent occurrences of overflows involve the following categories (where we use .text to represent code sections, .rodata for read-only data, and so on):

    • .text <-> .rodata
    • .text <-> .eh_frame: .eh_frame has 32-bit offsets. 64-bit code offsets are possible, but I don't know if an implementation exists.
    • .text <-> .bss
    • .rodata <-> .bss

    In many programs, .text <-> .data/.bss relocations have the most stringent constraints. Overflows due to .text <-> .rodata relocations are possible but rare (although I have encountered such issues in the past). Typically, .rodata tends to be larger than the combined .data and .bss.

    .rodata <-> .bss overflows are generally infrequent. However, caution must be exercised with metadata sections, which should use the 64-bit .quad label-. instead of the 32-bit .long label-. for such references. Such issues can be easily addressed on the compiler side.

    The current section layout of ld.lld is as follows:

    .rodata
    .text
    .data
    .bss

    One notable distinction from GNU ld is that .rodata precedes .text. This ordering decreases the distance between .text and .data/.bss, thereby alleviating relocation overflow pressure for references from .text to .data/.bss.
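
    To make the effect of the ordering concrete, here is a rough sketch that compares the worst-case distance from the start of .text to the end of .bss under the two orderings. It ignores alignment padding, RELRO and all other sections, and the struct and function names are invented for this illustration.

```c
#include <stdint.h>

/* Sizes of the four output sections, in bytes (illustrative model only). */
struct section_sizes {
  uint64_t rodata, text, data, bss;
};

/* Layout .text .rodata .data .bss: .rodata sits between code and data,
   so it adds to the distance a .text -> .bss reference has to cover. */
uint64_t span_rodata_after_text(const struct section_sizes *s) {
  return s->text + s->rodata + s->data + s->bss;
}

/* Layout .rodata .text .data .bss: .rodata is out of the way. */
uint64_t span_rodata_before_text(const struct section_sizes *s) {
  return s->text + s->data + s->bss;
}
```

    With made-up sizes of 1500MiB of .text, 600MiB of .rodata, 100MiB of .data and 300MiB of .bss, the first layout spans 2500MiB, beyond the 2**31 limit, while the second spans 1900MiB and stays within it.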

    For handling .eh_frame, I suggest compiling with the -fno-asynchronous-unwind-tables option. .eh_frame is used by runtime support for C++ exceptions. Many programs don't utilize exceptions, making .eh_frame non-mandatory. For profiling purposes, the limitations of the .eh_frame format have become apparent, and it is not the most suitable unwinding format.

    x86-64

    The x86-64 psABI defines multiple code models. The small code model is the one we are most familiar with. In the small code model, symbols are required to be located within the range [0, 2**31 - 2**24).
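
    The range requirement can be written as a one-line predicate. This is only a sketch of the stated bound, with a function name chosen for this example:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr lies in [0, 2**31 - 2**24), the range the small code model
   requires for symbol addresses. */
bool in_small_code_model_range(uint64_t addr) {
  return addr < (UINT64_C(1) << 31) - (UINT64_C(1) << 24);
}
```

    The upper bound evaluates to 0x7f000000, so the last 16MiB below 2GiB are excluded.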

    The medium code model introduces large data sections such as .lrodata, .ldata, .lbss, and uses movabs instructions to access them. GCC provides several variants for .ldata, including .ldata.rel, .ldata.rel.local, .ldata.rel.ro, and .ldata.rel.ro.local.

    Linkers are expected to recognize these large data sections and place them in appropriate locations. GNU ld uses the following section layout in its internal linker scripts:

    .text
    .rodata   # if -z separate-code, MAXPAGESIZE alignment
    RELRO     # DATA_SEGMENT_ALIGN
    .data     # DATA_SEGMENT_RELRO_END
    .bss
    .lbss
    .lrodata  # MAXPAGESIZE alignment
    .ldata    # MAXPAGESIZE alignment

    GNU ld places .lbss, .lrodata, and .ldata after .bss and inserts 2 MAXPAGESIZE alignments for .lrodata and .ldata.

    .lbss is placed immediately after .bss, creating a single BSS, i.e. a read-write PT_LOAD program header with p_filesz<p_memsz. The file image in the segment is smaller than the memory image. When the dynamic loader creates the memory image for the PT_LOAD segment, it will set the byte range [p_filesz,p_memsz) to zeros. However, there is a missing optimization that the [p_filesz,p_memsz) portion occupies zero bytes in the object file, preventing overlap between .lrodata and BSS in the file image.
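
    The p_filesz/p_memsz relationship can be sketched as follows. The struct is a simplified, hypothetical stand-in that keeps only the two relevant fields of a PT_LOAD program header, and the function models what a loader conceptually does with buffers in memory; a real loader works with mmap and page granularity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a PT_LOAD program header. */
struct load_segment {
  size_t p_filesz; /* bytes present in the file image */
  size_t p_memsz;  /* bytes occupied in the memory image */
};

/* Build the memory image: copy the file-backed part, then zero the
   [p_filesz, p_memsz) tail, which is where BSS lives. */
void build_memory_image(unsigned char *mem, const unsigned char *file,
                        const struct load_segment *seg) {
  assert(seg->p_filesz <= seg->p_memsz);
  memcpy(mem, file, seg->p_filesz);
  memset(mem + seg->p_filesz, 0, seg->p_memsz - seg->p_filesz);
}
```

    The zeroed tail costs nothing in the file, which is why BSS does not contribute to the file size.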

    For ld.lld, I am contemplating the following section layout:

    .lrodata
    .rodata
    .text     # if --ro-segment, MAXPAGESIZE alignment
    RELRO     # MAXPAGESIZE alignment
    .data     # MAXPAGESIZE alignment
    .bss
    .ldata    # MAXPAGESIZE alignment
    .lbss

    GCC generates both regular and large data sections with -mcmodel=medium. This is decided by a section size threshold (-mlarge-data-threshold). In practice, programs always mix object files built from small and medium/large code models (just think of prebuilt object files including libc). The large data sections built with -mcmodel=medium do not exert relocation pressure on sections in object files with -mcmodel=small.
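
    As a rough sketch of the threshold decision: GCC documents that data objects larger than the threshold go to the large data sections, with a default of 65535. The predicate and the two arrays below are illustrative names, and the section placement in the comments is what one would expect from gcc -mcmodel=medium, not something this snippet verifies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Documented GCC default for -mlarge-data-threshold. */
#define DEFAULT_LARGE_DATA_THRESHOLD 65535

/* Objects strictly larger than the threshold are treated as large data. */
bool goes_to_large_section(size_t object_size, size_t threshold) {
  return object_size > threshold;
}

char small_table[4096];   /* expected in .bss */
char huge_table[1 << 20]; /* expected in .lbss with -mcmodel=medium */
```

    Inspecting the object file with readelf -S is one way to check where each array actually ends up.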

    However, GCC only generates regular data sections with -mcmodel=large. -mlarge-data-threshold is ignored. As a result, the data sections built with -mcmodel=large may exert relocation pressure on sections in object files with -mcmodel=small.

    I propose that we make -mcmodel=large respect -mlarge-data-threshold and generate large data sections as well. I posted a GCC patch and Uros Bizjak asked me to discuss the issue with the x86-64 psABI group. So I created "Large data sections for the large code model".

    AArch64

    Likewise, the psABI defines multiple code models. The small code model allows for a maximum text segment size of 2GiB and a maximum combined span of text and data segments of 4GiB. For small position-independent code (PIC), there is an additional restriction on the size of the Global Offset Table (GOT), which must be smaller than 32KiB. The maximum combined span of text and data segments is larger than that of x86-64.

    Linked object file sizes for AArch64 and x86-64 are comparable, but AArch64 linked object files are more resistant to relocation overflows.

    For .text <-> .rodata and .text <-> .bss references, x86-64 uses R_X86_64_REX_GOTPCRELX/R_X86_64_PC32 relocations, which have a smaller range [-2**31,2**31). In contrast, AArch64 employs R_AARCH64_ADR_PREL_PG_HI21 relocations, which have a doubled range of [-2**32,2**32). This larger range makes it unlikely for AArch64 to encounter relocation overflow issues before the binary becomes excessively oversized for x86-64.

    bl   callee               // R_AARCH64_CALL26
    adrp x8, var              // R_AARCH64_ADR_PREL_PG_HI21
    ldr  w8, [x8, :lo12:var]  // R_AARCH64_LDST32_ABS_LO12_NC

    Note: if callee is far away from caller, the linker will generate a range extension thunk.
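
    Both range checks can be sketched in a few lines. ADRP works on 4KiB pages, so the checked value is Page(S+A) - Page(P). For the call, the sketch assumes the R_AARCH64_CALL26 range of [-2**27,2**27) from the AArch64 ELF ABI, a figure not given in the text above. Function names are invented for the example, and a real linker decides about thunks iteratively because inserting thunks moves code.

```c
#include <stdbool.h>
#include <stdint.h>

/* 4KiB page containing addr, as used by ADRP. */
static uint64_t page(uint64_t addr) { return addr & ~UINT64_C(0xfff); }

/* R_AARCH64_ADR_PREL_PG_HI21: Page(S+A) - Page(P) must lie in
   [-2**32, 2**32). */
bool adrp_in_range(uint64_t s, int64_t a, uint64_t p) {
  int64_t delta = (int64_t)(page(s + (uint64_t)a) - page(p));
  return delta >= -(INT64_C(1) << 32) && delta < (INT64_C(1) << 32);
}

/* R_AARCH64_CALL26: S+A-P must lie in [-2**27, 2**27); otherwise the
   call has to go through a range extension thunk. */
bool bl_needs_thunk(uint64_t s, int64_t a, uint64_t p) {
  int64_t delta = (int64_t)(s + (uint64_t)a - p);
  return delta < -(INT64_C(1) << 27) || delta >= (INT64_C(1) << 27);
}
```

    A data symbol almost 4GiB past the adrp is still reachable, whereas a callee 128MiB past the bl already needs a thunk.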

    GCC and Clang don't implement -mcmodel=large for PIC. This makes sense as we haven't identified a use case yet.

    Mitigation

    There are several strategies to mitigate relocation overflow issues.

    • Make the program smaller by reducing code and data size.
    • Partition the large monolithic executable into the main executable and a few shared objects.
    • Use linker script commands INSERT BEFORE and INSERT AFTER to reorder output sections.

    Debug information

    For large executables, it is possible to encounter DWARF32 limitations. I will address this topic in another article.


