
    Linker notes on AArch64

    MaskRay, posted 2023-08-03 02:42:13

    This article describes target-specific details about AArch64 in ELF linkers. AArch64 is the 64-bit execution state of the Arm architecture. The AArch64 execution state runs the A64 instruction set. The AArch32 and AArch64 execution states use very different instruction sets, so many pieces of software use two ports for the two execution states of the Arm architecture.

    Historically, there were the "ARM architecture" and the "ARM instruction set", which led many software projects to use "ARM" or "arm" as their port names. In 2011, ARMv8 introduced two execution states, AArch32 and AArch64. The previous instruction sets "ARM" and "Thumb" were renamed to "A32" and "T32", respectively. In 2017, the architecture was renamed to the "Arm architecture" to reflect the rebranding of the company name. So, the "ARMv8-A" architecture profile is now named "Armv8-A".

    For the AArch64 execution state, many projects use "AArch64" as their port name, but for legacy reasons macOS, Windows, the Linux kernel, and some BSD operating systems unfortunately use "arm64". (Support for AArch64 was added to the Linux kernel in version 3.7. The patch set was initially named "aarch64", but the name was later changed to "arm64" at the request of kernel developers.)

    ABI documents

    • ELF for the Arm® 64-bit Architecture (AArch64)
    • System V ABI for the Arm® 64-bit Architecture (AArch64)

    Global Offset Table

    The Global Offset Table consists of two sections:

    • .got.plt holds code addresses for PLT.
    • .got holds other addresses and offsets.

    The symbol _GLOBAL_OFFSET_TABLE_ is defined at the beginning of the .got section. GNU ld reserves a single entry for .got, and .got[0] holds the link-time address of _DYNAMIC for a legacy reason: versions of glibc prior to 2.35 have the _DYNAMIC requirement. See All about Global Offset Table.

    .got.plt[1] and .got.plt[2] are for lazy binding PLT. Linkers communicate the address of .got.plt to rtld with the dynamic tag DT_PLTGOT.

    Procedure Linkage Table

    The registers x16 (IP0) and x17 (IP1) are the first and second intra-procedure-call temporary registers. They may be used by PLT entries and veneers.

    The PLT header looks like:

    bti  c       // If BTI
    stp x16, x30, [sp,#-16]!
    adrp x16, &.got.plt[2]
    ldr x17, [x16, :lo12: &.got.plt[2]]
    add x16, x16, :lo12: &.got.plt[2]
    br x17

    The Nth PLT entry looks like:

    bti  c       // If BTI
    adrp x16, &.got.plt[N + 3]
    ldr x17, [x16, :lo12: &.got.plt[N + 3]]
    add x16, x16, :lo12: &.got.plt[N + 3]
    autia1716 // If PAC-PLT
    br x17

    When BTI is enabled for the output file, the code sequence starts with bti c. When PAC-PLT is enabled, the code sequence includes autia1716 before br x17.

    x16 is used by the lazy PLT resolver. For ld -z now, lazy binding is not used, so setting x16 is unneeded. https://github.com/ARM-software/abi-aa/issues/202 discusses making the x16 setting optional.
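The addressing in a PLT entry can be worked through numerically. A minimal sketch with hypothetical helper names: it assumes 8-byte .got.plt slots with the first three reserved, so 0-based PLT entry N uses .got.plt[N + 3], and gotPltBase is the address communicated via DT_PLTGOT. It computes the values that adrp, ldr, and add derive from &.got.plt[N + 3].

```cpp
#include <cstdint>

// Address of the .got.plt slot used by 0-based PLT entry n.
// Assumption: 8-byte slots, .got.plt[0..2] reserved as described above.
constexpr uint64_t gotPltSlot(uint64_t gotPltBase, uint64_t n) {
  return gotPltBase + 8 * (n + 3);
}

// adrp x16, slot: distance from the adrp's 4KiB page to the slot's page.
// The instruction encodes this value divided by 4096.
constexpr int64_t adrpPageDelta(uint64_t adrpAddr, uint64_t target) {
  return static_cast<int64_t>((target & ~uint64_t(0xfff)) -
                              (adrpAddr & ~uint64_t(0xfff)));
}

// ldr x17, [x16, :lo12:slot]: the 64-bit LDR (unsigned offset) form scales
// its imm12 field by 8, so the slot must be 8-byte aligned.
constexpr uint32_t ldr64Lo12Field(uint64_t target) {
  return static_cast<uint32_t>((target & 0xfff) / 8);
}

// add x16, x16, :lo12:slot: ADD (immediate) uses the low 12 bits unscaled.
constexpr uint32_t addLo12Field(uint64_t target) {
  return static_cast<uint32_t>(target & 0xfff);
}
```

For example, with .got.plt at 0x30000, PLT entry 1 uses the slot at 0x30020. An adrp at 0x10010 then needs a page delta of 0x20000, the ldr's imm12 field is 4 (0x20 / 8), and the add's immediate is 0x20.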

    Relocation optimization

    See All about Global Offset Table#GOT optimization for GOT optimization.

    There are a few optimization schemes besides GOT optimization, e.g.

    add  x2, x2, 0  // R_<CLS>_ADD_ABS_LO12_NC

    =>

    nop

    adrp x0, symbol
    add x0, x0, :lo12: symbol

    =>

    nop
    adr x0, symbol

    --no-relax disables the optimization.

    See ELF for the Arm® 64-bit Architecture (AArch64)#Relocation optimization.
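Both rewrites are only valid when certain conditions hold. A minimal sketch of those checks, assuming relocations have already been applied and, for the second scheme, that the add immediately follows the adrp as in the example. The helper names are hypothetical; this is not ld.lld's code.

```cpp
#include <cstdint>

constexpr uint32_t kNop = 0xd503201fu;  // A64 NOP encoding

// "add xd, xn, #0" with xd == xn: a 64-bit ADD (immediate) that leaves the
// register unchanged, so it can be replaced with kNop.
constexpr bool isAddZeroSameReg(uint32_t insn) {
  bool isAddImm64 = (insn & 0xff800000u) == 0x91000000u;  // sf=1, op=0, S=0
  uint32_t imm12 = (insn >> 10) & 0xfffu;
  uint32_t rn = (insn >> 5) & 0x1fu;
  uint32_t rd = insn & 0x1fu;
  return isAddImm64 && imm12 == 0 && rn == rd;
}

// adrp+add -> nop+adr: the adr takes the add's slot (adrpAddr + 4) and can
// only reach a signed 21-bit byte displacement, i.e. [-1MiB, 1MiB).
constexpr bool canRelaxAdrpAddToAdr(uint64_t adrpAddr, uint64_t target) {
  int64_t disp = static_cast<int64_t>(target - (adrpAddr + 4));
  return disp >= -(int64_t(1) << 20) && disp < (int64_t(1) << 20);
}
```

The adrp/add pair reaches +/-4GiB while adr reaches only +/-1MiB, which is why the second rewrite applies only when the symbol happens to be close.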

    Thread Local Storage

    AArch64 uses a variant of TLS Variant I: the static TLS blocks are placed above the thread pointer. The thread pointer points to the end of the thread control block.
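The layout determines the local-exec offset of a variable in the executable's TLS block. A minimal sketch, mirroring ld.lld's approach of skipping two 8-byte words (16 bytes) above the thread pointer, rounded up to the PT_TLS alignment. It assumes PT_TLS's p_vaddr is itself p_align-aligned. The function names are hypothetical.

```cpp
#include <cstdint>

// Round x up to a multiple of align (align must be a power of two).
constexpr uint64_t alignUp(uint64_t x, uint64_t align) {
  return (x + align - 1) & ~(align - 1);
}

// TP-relative offset of a TLS variable in the executable's TLS block.
// tlsOffset: the variable's offset within the PT_TLS segment.
// pTlsAlign: p_align of PT_TLS.
constexpr uint64_t localExecTpOffset(uint64_t tlsOffset, uint64_t pTlsAlign) {
  return alignUp(16, pTlsAlign) + tlsOffset;
}
```

With p_align = 8, the first variable lives at TP + 16. With p_align = 64, the 16 bytes are padded to 64, so a variable at offset 4 lives at TP + 68.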

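    The linker performs TLS optimizations: in executables, TLS descriptor (TLSDESC) sequences can be relaxed to initial-exec or local-exec, and initial-exec to local-exec.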

    The traditional (non-TLSDESC) general dynamic and local dynamic TLS models are obsolete and not supported by ld.lld.

    See All about thread-local storage.

    Program Property

    A .note.gnu.property section contains program property notes that describe special handling requirements for the linker and the dynamic loader.

    The linker parses input .note.gnu.property sections and recognizes the command line options -z force-bti and -z pac-plt (and -z bti-report for diagnostics) to compute the output .note.gnu.property (type is SHT_NOTE) section. Without these options, linkers only set a feature bit in the output file if all the input relocatable object files have the corresponding feature set.

    // ld.lld excerpt: AND together the feature bits of all input files.
    uint32_t ret = -1;  // start with every bit set; each input can only clear bits
    for (ELFFileBase *f : ctx.objectFiles) {
      uint32_t features = f->andFeatures;
      // -z bti-report={none,warning,error}: diagnose inputs lacking BTI.
      if (!(features & GNU_PROPERTY_AARCH64_FEATURE_1_BTI)) {
        if (config->zBtiReport == "error")
          error(toString(f) + ": -z bti-report: file does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI property");
        else if (config->zBtiReport == "warning")
          warn(toString(f) + ": -z bti-report: file does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI property");
      }

      // -z force-bti: treat the input as BTI-compatible (warn unless
      // -z bti-report has already diagnosed it).
      if (config->zForceBti && !(features & GNU_PROPERTY_AARCH64_FEATURE_1_BTI)) {
        if (config->zBtiReport == "none")
          warn(toString(f) + ": -z force-bti: file does not have "
               "GNU_PROPERTY_AARCH64_FEATURE_1_BTI property");
        features |= GNU_PROPERTY_AARCH64_FEATURE_1_BTI;
      }
      // -z pac-plt: likewise force the PAC bit.
      if (config->zPacPlt && !(features & GNU_PROPERTY_AARCH64_FEATURE_1_PAC)) {
        warn(toString(f) + ": -z pac-plt: file does not have "
             "GNU_PROPERTY_AARCH64_FEATURE_1_PAC property");
        features |= GNU_PROPERTY_AARCH64_FEATURE_1_PAC;
      }
      ret &= features;
    }
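The excerpt depends on ld.lld internals, so here is a self-contained sketch of just the merge rule, with hypothetical names and diagnostics omitted. The bit values are those of GNU_PROPERTY_AARCH64_FEATURE_1_BTI (bit 0) and GNU_PROPERTY_AARCH64_FEATURE_1_PAC (bit 1). An input without a .note.gnu.property section is treated as having no bits set.

```cpp
#include <cstdint>
#include <initializer_list>

constexpr uint32_t kBti = 1u << 0;  // GNU_PROPERTY_AARCH64_FEATURE_1_BTI
constexpr uint32_t kPac = 1u << 1;  // GNU_PROPERTY_AARCH64_FEATURE_1_PAC

// objs: each input object's feature bits (0 if it has no property note).
// forceBti / pacPlt model -z force-bti / -z pac-plt: they set the bit on
// every input before the AND, so the output keeps it.
constexpr uint32_t mergeAndFeatures(std::initializer_list<uint32_t> objs,
                                    bool forceBti, bool pacPlt) {
  uint32_t ret = kBti | kPac;  // identity for AND over the tracked bits
  for (uint32_t features : objs) {
    if (forceBti) features |= kBti;
    if (pacPlt) features |= kPac;
    ret &= features;
  }
  return ret;
}
```

A single input without BTI (for example, an old assembly file) clears BTI from the output unless -z force-bti is given.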

    Range extension thunks

    Function calls typically use B and BL instructions. The two instructions have a range of +/-128MiB and use two relocation types: R_AARCH64_JUMP26 (B) and R_AARCH64_CALL26 (BL). The range is larger than the branch range of many other instruction sets. If the destination is not reachable by a single B/BL, linkers may insert a veneer (range extension thunk).
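The reachability test behind the decision to insert a veneer can be sketched as follows. B and BL encode a signed 26-bit immediate counted in 4-byte instructions. The helper names are hypothetical.

```cpp
#include <cstdint>

// True if a B/BL at src can reach dst directly: the displacement must be a
// multiple of 4 in [-2^27, 2^27 - 4] bytes (+/-128MiB).
constexpr bool isInBranchRange(uint64_t src, uint64_t dst) {
  int64_t disp = static_cast<int64_t>(dst - src);
  return disp % 4 == 0 && disp >= -(int64_t(1) << 27) &&
         disp < (int64_t(1) << 27);
}

// Encode "bl dst" placed at src (BL opcode 0x94000000 | imm26).
// The caller must check isInBranchRange first.
constexpr uint32_t encodeBl(uint64_t src, uint64_t dst) {
  int64_t words = static_cast<int64_t>(dst - src) / 4;
  return 0x94000000u | (static_cast<uint32_t>(words) & 0x03ffffffu);
}
```

When isInBranchRange fails for an R_AARCH64_CALL26 or R_AARCH64_JUMP26 site, the linker redirects the branch to a thunk that is within range.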

    -no-pie links may use a thunk with absolute addressing, targeting any location in the 64-bit address space.

    <caller>:
    bl __AArch64AbsLongThunk_nonpreemptible
    b __AArch64AbsLongThunk_nonpreemptible

    <__AArch64AbsLongThunk_nonpreemptible>:
    ldr x16, .+8
    br x16

    <$d>:
    .word 0x00000000
    .word 0x00000010

    <.plt>:
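The absolute thunk is 16 bytes: two instructions followed by an 8-byte literal. A sketch that produces its four 32-bit words for a given destination, assuming little-endian AArch64. The instruction words are the standard A64 encodings of ldr x16, .+8 and br x16; the struct and function names are hypothetical. Under this reading, the .word 0x00000000 / .word 0x00000010 pair in the listing would be the destination 0x1000000000.

```cpp
#include <cstdint>

// The thunk as four 32-bit words in address order.
struct AbsLongThunk {
  uint32_t words[4];
};

constexpr AbsLongThunk makeAbsLongThunk(uint64_t dst) {
  return AbsLongThunk{{
      0x58000050u,                       // ldr x16, .+8 (LDR literal: imm19 = 2, Rt = 16)
      0xd61f0200u,                       // br x16
      static_cast<uint32_t>(dst),        // literal: low 32 bits of dst
      static_cast<uint32_t>(dst >> 32),  // literal: high 32 bits of dst
  }};
}
```

Because the literal holds a full 64-bit address, this form has no range limit, but the address is fixed at link time. That is acceptable for -no-pie and not for position-independent output.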

    -pie and -shared links need to use a thunk with PC-relative addressing, targeting a range of +/-4GiB.

    <caller>:
    bl __AArch64ADRPThunk_nonpreemptible
    b __AArch64ADRPThunk_nonpreemptible

    <__AArch64ADRPThunk_nonpreemptible>:
    adrp x16, nonpreemptible
    add x16, x16, :lo12: nonpreemptible
    br x16
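The +/-4GiB reach of this thunk comes from adrp's signed 21-bit page immediate. A sketch of the range check and the adrp encoding (immlo in bits 30:29, immhi in bits 23:5); the helper names are hypothetical.

```cpp
#include <cstdint>

// Signed number of 4KiB pages from pc's page to dst's page.
constexpr int64_t adrpPages(uint64_t pc, uint64_t dst) {
  return static_cast<int64_t>((dst & ~uint64_t(0xfff)) -
                              (pc & ~uint64_t(0xfff))) / 4096;
}

// adrp's immediate is a signed 21-bit page count: [-2^20, 2^20) pages,
// i.e. +/-4GiB.
constexpr bool isInAdrpRange(uint64_t pc, uint64_t dst) {
  int64_t pages = adrpPages(pc, dst);
  return pages >= -(int64_t(1) << 20) && pages < (int64_t(1) << 20);
}

// Encode "adrp x<rd>, dst" placed at pc. The caller must check isInAdrpRange.
constexpr uint32_t encodeAdrp(uint32_t rd, uint64_t pc, uint64_t dst) {
  uint32_t imm = static_cast<uint32_t>(adrpPages(pc, dst)) & 0x1fffffu;
  return 0x90000000u | ((imm & 3u) << 29) | ((imm >> 2) << 5) | (rd & 0x1fu);
}
```

The following add x16, x16, :lo12: supplies the low 12 bits, so the pair reaches any byte within the adrp's page range.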

    The branch target of a thunk may be a PLT entry:

    <caller>:
    bl __AArch64ADRPThunk_preemptible

    <__AArch64ADRPThunk_preemptible>:
    adrp x16, preemptible@plt
    add x16, x16, :lo12: preemptible@plt
    br x16

    ...

    <preemptible@plt>:
    adrp x16, &.got.plt[N + 3]
    ldr x17, [x16, :lo12: &.got.plt[N + 3]]
    add x16, x16, :lo12: &.got.plt[N + 3]
    br x17

    In many situations we don't need the long code sequence (ldr/br or adrp/add/br). Instead, the thunk (short range thunk) can use a single b instruction to reach the destination.
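Combining the rules from this section, a hypothetical selection helper could look like this. It mirrors the text: a short thunk when a single b from the thunk reaches the destination, otherwise a PC-relative thunk for -pie/-shared and an absolute one for -no-pie. It is a sketch, not ld.lld's actual control flow.

```cpp
#include <cstdint>

enum class ThunkKind { Short, AdrpLong, AbsLong };

// isPic: true for -pie and -shared links.
// thunkAddr: where the thunk is placed; dst: the final branch target.
constexpr ThunkKind chooseThunk(bool isPic, uint64_t thunkAddr, uint64_t dst) {
  int64_t disp = static_cast<int64_t>(dst - thunkAddr);
  bool reachableByB = disp % 4 == 0 && disp >= -(int64_t(1) << 27) &&
                      disp < (int64_t(1) << 27);
  if (reachableByB) return ThunkKind::Short;               // b dst
  return isPic ? ThunkKind::AdrpLong : ThunkKind::AbsLong;  // adrp/add/br or ldr/br
}
```

The sketch does not handle a PIC destination beyond +/-4GiB, which neither thunk form shown here can reach.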

    --fix-cortex-a53-843419

    This option enables a linker workaround for Arm Cortex-A53 erratum 843419. Full details are available in the ARM-EPM-048406 document. Linkers scan for an adrp in the last two instructions of a 4KiB page, followed by a load or store instruction and two other instructions. Once an erratum condition is detected, linkers try to rewrite it into an alternative code sequence. See the comments in the implementations for details.

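    In ld.lld this is implemented as a thunk, similar to a range extension thunk. ld.lld additionally writes the full B instruction (opcode and immediate), not just the immediate, when relocating R_AARCH64_JUMP26, so that a patched instruction can be replaced with a branch to the patch.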
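The first filtering step of the scan finds an adrp in one of the last two 4-byte slots of a 4KiB page (page offsets 0xff8 and 0xffc). It can be sketched as below. Only this step is shown; the conditions on the following instructions, which the implementations also check, are left out, and the names are hypothetical.

```cpp
#include <cstdint>

// ADRP: bit 31 = 1 (op) and bits 28:24 = 10000. ADR has bit 31 = 0.
constexpr bool isAdrp(uint32_t insn) {
  return (insn & 0x9f000000u) == 0x90000000u;
}

// Candidate for erratum 843419: an adrp in one of the last two instruction
// slots of a 4KiB page. Later instructions still need to be examined.
constexpr bool isErratumCandidateAdrp(uint64_t addr, uint32_t insn) {
  uint64_t pageOffset = addr & 0xfff;
  return isAdrp(insn) && (pageOffset == 0xff8 || pageOffset == 0xffc);
}
```

Restricting the scan to these two offsets keeps it cheap: most adrp instructions are rejected without looking at the instructions that follow.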

    --android-memtag-{mode,stack,heap}

    The options instruct ld.lld to create DT_AARCH64_MEMTAG_* dynamic tags. See Memtag ABI Extension to ELF for the Arm® 64-bit Architecture (AArch64).

    PAuth ABI

    -z pac-plt enables PAC PLT and sets the DT_AARCH64_PAC_PLT dynamic tag.


