    Mapping symbols: rethinking for efficiency

    MaskRay, published 2024-07-22 01:46:02

    In object files, certain code patterns embed data directly within the code or transition between instruction sets. This practice can pose challenges for disassemblers, as data might be mistakenly interpreted as code, leading to nonsensical output. In addition, code from instruction set A might be disassembled as instruction set B. To address these issues, some architectures define mapping symbols to describe state transitions. Let's explore this concept using an AArch32 code example:

    .text
    adr r0, .LJTI0_0
    vldr d0, .LCPI0_0
    bl thumb_callee
    .LBB0_1:
    nop
    .LBB0_2:
    nop

    .LCPI0_0:
    .long 3367254360 @ double 1.234
    .long 1072938614

    nop

    .LJTI0_0:
    .long .LBB0_1
    .long .LBB0_2

    .thumb
    .type thumb_callee, %function
    thumb_callee:
    bx lr

    Jump tables (.LJTI0_0): Jump tables, like .LJTI0_0, store a list of absolute addresses used for efficient branching. They can reside in either data or text sections, each with its trade-offs. Here we see a jump table in the text section, allowing a single instruction to take its address. Other architectures generally prefer to place jump tables in data sections. While this avoids data in code, RISC architectures typically require two instructions to materialize the address, since the text/data distance can be pretty large.

    Constant pool (.LCPI0_0): The vldr instruction loads an 8-byte (double-precision) floating-point literal into the SIMD&FP register d0.
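The two .long values in the constant pool are the low and high 32-bit halves of the IEEE 754 double 1.234, low word first as befits a little-endian target. A minimal C sketch to reproduce them, assuming the host `double` is IEEE 754 binary64 (true on mainstream platforms); `split_double` is a hypothetical helper name:

```c
#include <stdint.h>
#include <string.h>

/* Split an IEEE 754 binary64 value into the two 32-bit words that a
   little-endian AArch32 constant pool stores: low word first.
   Assumes the host `double` is binary64. */
void split_double(double d, uint32_t *lo, uint32_t *hi) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits); /* reinterpret the bytes without UB */
    *lo = (uint32_t)(bits & 0xffffffffu);
    *hi = (uint32_t)(bits >> 32);
}
```

Calling split_double(1.234, &lo, &hi) gives lo == 3367254360 (0xc8b43958) and hi == 1072938614 (0x3ff3be76), the same words llvm-objdump prints as .word at offsets 0x14 and 0x18.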

    ISA transition: This code blends A32 and T32 instructions (the latter used in thumb_callee).

    In these cases, a dumb disassembler might treat data as code and try to disassemble it as instructions. Assemblers create mapping symbols to assist disassemblers. For this example, the assembled object file looks like the following:

    $a:
    ...

    $d:
    .long 3367254360 @ double 1.234
    .long 1072938614

    $a:
    nop

    $d:
    .long .LBB0_1
    .long .LBB0_2

    $t:
    thumb_callee:
    bx lr
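The placement of these symbols follows a simple rule: one at the start of the section, then one wherever the kind of content changes. A minimal C sketch of that traditional rule, assuming a hypothetical `Chunk` list that records where each run of A32 code ('a'), T32 code ('t') or data ('d') begins. It mimics the `$a.0`, `$d.1` naming seen in llvm-objdump output, but it is not the integrated assembler's actual code:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical input: a run of content starting at `offset` that is
   A32 code ('a'), T32 code ('t'), or data ('d'). */
typedef struct { uint32_t offset; char state; } Chunk;

typedef struct { uint32_t offset; char name[16]; } MapSym;

/* Traditional scheme: a mapping symbol at the first chunk and at every
   state change, named "$<state>.<n>". Returns the number of symbols
   written to `out` (at most `cap`). */
size_t emit_traditional(const Chunk *chunks, size_t n,
                        MapSym *out, size_t cap) {
    size_t count = 0;
    char cur = 0; /* no state yet, so the first chunk always emits */
    for (size_t i = 0; i < n && count < cap; i++) {
        if (chunks[i].state == cur)
            continue; /* same kind of content: no transition */
        cur = chunks[i].state;
        out[count].offset = chunks[i].offset;
        snprintf(out[count].name, sizeof out[count].name, "$%c.%zu",
                 cur, count);
        count++;
    }
    return count;
}
```

Feeding it the layout of the example, {0x0,'a'}, {0x14,'d'}, {0x1c,'a'}, {0x20,'d'}, {0x28,'t'}, yields five symbols, $a.0 through $t.4, at the offsets llvm-objdump reports for this object.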

    Toolchain

    Now, let's delve into how mapping symbols are managed within the toolchain.

    Disassemblers

    llvm-objdump sorts symbols, including mapping symbols, relative to the current section, presenting interleaved labels and instructions. Mapping symbols act as signals for the disassembler to switch states.

    % fllvm-objdump -d --triple=armv7 --show-all-symbols a.o

    a.o: file format elf32-littlearm

    Disassembly of section .text:

    00000000 <$a.0>:
    0: e28f0018 add r0, pc, #24
    4: ed9f0b02 vldr d0, [pc, #8] @ 0x14
    8: ebfffffe bl 0x8 @ imm = #-0x8
    c: e320f000 hint #0x0
    10: e320f000 hint #0x0

    00000014 <$d.1>:
    14: 58 39 b4 c8 .word 0xc8b43958
    18: 76 be f3 3f .word 0x3ff3be76

    0000001c <$a.2>:
    1c: e320f000 nop

    00000020 <$d.3>:
    20: 0c 00 00 00 .word 0x0000000c
    24: 10 00 00 00 .word 0x00000010

    00000028 <$t.4>:
    00000028 <thumb_callee>:
    28: 4770 bx lr

    I changed llvm-objdump 18 to not display mapping symbols as labels unless --show-all-symbols is specified.
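Switching states this way amounts to a lookup: the state at an offset is the one set by the last mapping symbol at or before it. A minimal C sketch, assuming one section's mapping symbols are already sorted by offset; `MapSym` and `state_at` are hypothetical names, not llvm-objdump internals:

```c
#include <stddef.h>
#include <stdint.h>

/* A mapping symbol reduced to its offset and state ('a', 't', 'd', 'x'). */
typedef struct { uint32_t offset; char state; } MapSym;

/* Return the state in effect at `off`: that of the last mapping symbol
   whose offset is <= off. Binary search over symbols sorted by offset.
   Returns 0 if no mapping symbol precedes `off`. */
char state_at(const MapSym *syms, size_t n, uint32_t off) {
    size_t lo = 0, hi = n; /* find the first symbol with offset > off */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].offset <= off)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? 0 : syms[lo - 1].state;
}
```

With the symbols from the output above, offsets 0x14 and 0x18 resolve to 'd' (print .word), 0x1c goes back to 'a', and 0x28 resolves to 't' (decode Thumb).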

    nm

    Both llvm-nm and GNU nm typically conceal mapping symbols alongside STT_FILE and STT_SECTION symbols. However, you can reveal these special symbols using the --special-syms option.

    % cat a.s
    foo:
    bl thumb_callee
    .long 42
    .thumb
    thumb_callee:
    bx lr
    % clang --target=arm-linux-gnueabi -c a.s
    % fllvm-nm a.o
    00000000 t foo
    00000008 t thumb_callee
    % fllvm-nm --special-syms a.o
    00000000 t $a.0
    00000004 t $d.1
    00000008 t $t.2
    00000000 t foo
    00000008 t thumb_callee
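Hiding these symbols only requires recognizing their names. A minimal C sketch, assuming the Arm naming convention ($a, $t, $d or $x, optionally followed by `.` and arbitrary characters) and looking at the name alone; real nm implementations may apply further checks, and `is_mapping_symbol`/`print_symbols` are hypothetical names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* True for Arm-style mapping symbol names: "$a", "$t", "$d", "$x",
   optionally followed by "." and anything (e.g. "$d.1"). */
bool is_mapping_symbol(const char *name) {
    if (name[0] != '$')
        return false;
    switch (name[1]) {
    case 'a': case 't': case 'd': case 'x':
        return name[2] == '\0' || name[2] == '.';
    default:
        return false;
    }
}

/* Print names the way nm does by default: skip mapping symbols
   unless `special_syms` is set (cf. --special-syms). */
void print_symbols(const char *const *names, size_t n, bool special_syms) {
    for (size_t i = 0; i < n; i++)
        if (special_syms || !is_mapping_symbol(names[i]))
            puts(names[i]);
}
```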

    GNU nm behaves similarly, but with a slight quirk: if the default BFD target isn't AArch32, mapping symbols are displayed even without --special-syms.

    % arm-linux-gnueabi-nm a.o
    00000000 t foo
    00000008 t thumb_callee
    % nm a.o
    00000000 t $a.0
    00000004 t $d.1
    00000008 t $t.2
    00000000 t foo
    00000008 t thumb_callee

    Symbolizers

    Mapping symbols, being non-unique and lacking descriptive names, are intentionally omitted by symbolizers like addr2line and llvm-symbolizer. Their primary role lies in guiding the disassembly process rather than providing human-readable context.
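A symbolizer that maps an address to the closest preceding symbol therefore has to skip mapping symbols, which frequently share an address with a real symbol (as $t.2 and thumb_callee do in the nm output). A minimal C sketch, assuming a symbol array sorted by address and the Arm naming convention; `Sym` and `symbolize` are hypothetical, not addr2line or llvm-symbolizer code:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t addr; const char *name; } Sym;

/* Arm-style mapping symbol names: "$a", "$t", "$d", "$x", optionally
   followed by "." and anything. */
static int is_mapping(const char *s) {
    return s[0] == '$' &&
           (s[1] == 'a' || s[1] == 't' || s[1] == 'd' || s[1] == 'x') &&
           (s[2] == '\0' || s[2] == '.');
}

/* Return the name of the closest non-mapping symbol at or below `addr`,
   or NULL if there is none. `syms` must be sorted by address; a linear
   scan keeps the sketch short. */
const char *symbolize(const Sym *syms, size_t n, uint32_t addr) {
    const char *best = NULL;
    for (size_t i = 0; i < n && syms[i].addr <= addr; i++)
        if (!is_mapping(syms[i].name))
            best = syms[i].name;
    return best;
}
```

For the nm example, address 0x4 resolves to foo rather than $d.1, and 0x8 resolves to thumb_callee rather than $t.2.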

    Size problem: symbol table bloat

    While mapping symbols are useful, they can significantly inflate the symbol table, particularly on 64-bit architectures (sizeof(Elf64_Sym) == 24) with larger programs. This issue becomes more pronounced with -ffunction-sections -fdata-sections, which generate numerous small sections.

    % cat a.c
    void f0() {}
    void f1() {}
    void f2() {}
    int d1 = 1;
    int d2 = 2;
    % clang -c --target=aarch64 -ffunction-sections -fdata-sections a.c
    % llvm-readelf -sX a.o

    Symbol table '.symtab' contains 17 entries:
    Num: Value Size Type Bind Vis+Other Ndx(SecName) Name [+ Version Info]
    0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
    1: 0000000000000000 0 FILE LOCAL DEFAULT ABS a.c
    2: 0000000000000000 0 SECTION LOCAL DEFAULT 3 (.text.f0) .text.f0
    3: 0000000000000000 0 NOTYPE LOCAL DEFAULT 3 (.text.f0) $x.0
    4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 (.text.f1) .text.f1
    5: 0000000000000000 0 NOTYPE LOCAL DEFAULT 4 (.text.f1) $x.1
    6: 0000000000000000 0 SECTION LOCAL DEFAULT 5 (.text.f2) .text.f2
    7: 0000000000000000 0 NOTYPE LOCAL DEFAULT 5 (.text.f2) $x.2
    8: 0000000000000000 0 NOTYPE LOCAL DEFAULT 6 (.data.d1) $d.3
    9: 0000000000000000 0 NOTYPE LOCAL DEFAULT 7 (.data.d2) $d.4
    10: 0000000000000000 0 NOTYPE LOCAL DEFAULT 8 (.comment) $d.5
    11: 0000000000000000 0 NOTYPE LOCAL DEFAULT 10 (.eh_frame) $d.6
    12: 0000000000000000 4 FUNC GLOBAL DEFAULT 3 (.text.f0) f0
    13: 0000000000000000 4 FUNC GLOBAL DEFAULT 4 (.text.f1) f1
    14: 0000000000000000 4 FUNC GLOBAL DEFAULT 5 (.text.f2) f2
    15: 0000000000000000 4 OBJECT GLOBAL DEFAULT 6 (.data.d1) d1
    16: 0000000000000000 4 OBJECT GLOBAL DEFAULT 7 (.data.d2) d2

    Except in trivial cases (e.g. an empty section), both the GNU assembler and the LLVM integrated assembler behave as follows:

    • A non-text section (data, debug, etc.) almost always starts with an initial $d.
    • A text section almost always starts with an initial $x. The ABI requires a mapping symbol at offset 0.

    These behaviors ensure that each function or data symbol has a corresponding mapping symbol, while extra mapping symbols might occur in rare cases. Therefore, mapping symbols usually account for more than 50% of the entries in the output symbol table.

    Most text sections have 2 or 3 symbols:

    • A STT_FUNC symbol.
    • A STT_SECTION symbol, due to a reference from .eh_frame. This symbol is absent with -fno-asynchronous-unwind-tables.
    • A $x mapping symbol.

    During the linking process, the linker combines input sections and eliminates STT_SECTION symbols.

    Alternative mapping symbol scheme

    I have proposed an alternative scheme to address the size concern.

    • Text sections: Assume an implicit $x at offset 0. Add an ending $x if the final data isn't instructions.
    • Non-text sections: Assume an implicit $d at offset 0. Add an ending $d only if the final data isn't data commands.

    This approach eliminates most mapping symbols while ensuring correct disassembly. Here is an illustrative assembler example:

    .section .text.f0,"ax"
    ret
    // emit $d
    .long 42
    // emit $x. Without this, .text.f1 might be interpreted as data.

    .section .text.f1,"ax"
    ret
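The two rules can be sketched as a variant of the traditional emission loop: start from the section's implicit state, emit a symbol only on a state change, and restore the implicit state at the end if needed. A minimal C sketch for AArch64 ('x' for instructions, 'd' for data), assuming hypothetical `Chunk`/`MapSym` types and ignoring alignment padding and empty sections:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t offset; char state; } Chunk;  /* 'x' or 'd' */
typedef struct { uint32_t offset; char state; } MapSym;

/* Alternative scheme: the implicit initial state is 'x' for text
   sections and 'd' otherwise. A symbol is emitted on each state change,
   plus an ending symbol at `size` (the section end) that restores the
   implicit state. Returns the number of symbols written (at most `cap`). */
size_t emit_alternative(bool is_text, const Chunk *chunks, size_t n,
                        uint32_t size, MapSym *out, size_t cap) {
    const char implicit = is_text ? 'x' : 'd';
    char cur = implicit;
    size_t count = 0;
    for (size_t i = 0; i < n && count < cap; i++) {
        if (chunks[i].state != cur) {
            cur = chunks[i].state;
            out[count++] = (MapSym){chunks[i].offset, cur};
        }
    }
    if (cur != implicit && count < cap)
        out[count++] = (MapSym){size, implicit}; /* ending symbol */
    return count;
}
```

For .text.f0 (ret at 0, .long 42 at 4, size 8) it produces $d at 4 and $x at 8, matching the comments in the example; for .text.f1 it produces nothing, whereas the traditional scheme would still emit a $x at offset 0.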

    The ending mapping symbol ensures that the subsequent section in the linker output starts in the desired state. The data-in-code case is extremely rare for AArch64, as jump tables are usually placed in .rodata.

    Impressive results

    Experiments with a Clang build using this alternative scheme have shown impressive results, eliminating over 50% of symbol table entries.

    .o size   | build | configuration
    261345384 | a64-0 | standard
    260443728 | a64-1 | optimizing $d
    254106784 | a64-2 | optimizing both $d and $x
    % bloaty a64-2/bin/clang -- a64-0/bin/clang
    FILE SIZE VM SIZE
    -------------- --------------
    -5.4% -1.13Mi [ = ] 0 .strtab
    -50.9% -4.09Mi [ = ] 0 .symtab
    -4.0% -5.22Mi [ = ] 0 TOTAL
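Taking a64-0 as the baseline, the .o size column translates into the following relative savings (derived from the table, not separately measured):

```latex
\begin{aligned}
\text{a64-1:}\quad & \frac{261345384 - 260443728}{261345384} = \frac{901656}{261345384} \approx 0.35\% \\
\text{a64-2:}\quad & \frac{261345384 - 254106784}{261345384} = \frac{7238600}{261345384} \approx 2.77\%
\end{aligned}
```

So the object files shrink by under 3% in total, while the .symtab of the linked clang binary shrinks by about half, as the bloaty row shows.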

    However, omitting a mapping symbol at offset 0 for sections with instructions is currently non-conformant. An ABI update has been requested to address this.

    Some interoperability issues might arise, but a significant portion of users don't care about them.

    In a text section with trailing data assembled using the traditional behavior, the last mapping symbol will be $d. During linking, if the subsequent section lacks an initial $x due to the new behavior, the result could confuse disassemblers.

    In addition, a linker script that combines non-text and text sections might confuse disassemblers:

    SECTIONS {
    ...
    mix : { *(.data.*) *(.text.foo) }
    }
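The first scenario can be reproduced with a toy model of linker output: concatenate two sections and ask which state is in effect where the second begins. A minimal C sketch, assuming section A was assembled with the traditional behavior and ends with data, while section B was assembled with the new behavior and has no mapping symbols at all; the types and helpers are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t offset; char state; } MapSym;

/* State in effect at `off`: that of the last mapping symbol at or
   before it (symbols sorted by offset); 0 if there is none. */
char state_at(const MapSym *s, size_t n, uint32_t off) {
    char st = 0;
    for (size_t i = 0; i < n && s[i].offset <= off; i++)
        st = s[i].state;
    return st;
}

/* Concatenate the mapping symbols of section A (`a_size` bytes) and
   section B, as a linker placing B right after A would. `out` must
   have room for na + nb entries. */
size_t concat(const MapSym *a, size_t na, uint32_t a_size,
              const MapSym *b, size_t nb, MapSym *out) {
    size_t k = 0;
    for (size_t i = 0; i < na; i++)
        out[k++] = a[i];
    for (size_t i = 0; i < nb; i++)
        out[k++] = (MapSym){b[i].offset + a_size, b[i].state};
    return k;
}
```

With A = {$x at 0, $d at 4} of size 8 and a symbol-less B, the state at offset 8 is still 'd', so B's instructions would be dumped as data. Giving B a traditional $x at offset 0 restores 'x'; so would A ending with $x, as the new scheme requires.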

    In conclusion, the proposed alternative scheme solves the symbol table bloat problem, but it requires careful consideration of compliance and interoperability. An opt-in assembler option might be useful. With this optimization in place, the remaining mapping symbols would primarily originate from range extension thunks, prebuilt libraries, or highly specialized assembly implementations.

    Mapping symbols for range extension thunks

    When lld creates an AArch64 range extension thunk, it defines a $x symbol to signify the A64 state. This symbol is only relevant when the preceding section ends in the data state, a scenario that's only possible with the traditional assembler behavior.

    Given the infrequency of range extension thunks, the $x symbol overhead is generally tolerable.

    Peculiar alignment behavior in GNU assembler

    In contrast to LLVM's integrated assembler, which restricts state transitions to instructions and data commands, GNU assembler introduces additional state transitions for alignments. These alignments can be either implicit (arising from alignment requirements) or explicit (specified through directives). This behavior has led to some interesting edge cases and bug fixes over time. (See the code related to [PATCH][GAS][AARCH64] Fix "align directive causes MAP_DATA symbol to be lost", https://sourceware.org/bugzilla/show_bug.cgi?id=20364.)

    .section .foo1,"a"
    // no $d
    .word 0

    .section .foo2,"a"
    // $d
    .balign 4
    .word 0

    .section .foo3,"a"
    // $d
    .word 0
    // $a
    nop

    In the example, .foo1 only contains data directives and there is no $d. However, .foo2 includes an alignment directive, triggering the creation of a $d symbol. Interestingly, .foo3 starts with data but ends with an instruction, necessitating both a $d and an $a mapping symbol.

    It's worth noting that DWARF sections, typically generated by the compiler, don't include explicit alignment directives. They behave similarly to the .foo1 example and lack an associated $d mapping symbol.

    RISC-V ISA extension

    RISC-V mapping symbols are similar to AArch64's, but with a notable extension:

    $x<ISA>, $x<ISA>.<any> | Start of a sequence of instructions with <ISA> extension.

    The alternative scheme for optimizing symbol table size can be adapted to accommodate RISC-V's $x<ISA> symbols. The approach remains the same: add an ending $x<ISA> only if the final data in a text section doesn't belong to the desired ISA.
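Assuming a shared baseline, the decision for RISC-V can be sketched in C. This assumes the assembler knows the section's baseline ISA string and which ISA (or data) the section ends with; `ending_symbol` and the ISA strings in the example are hypothetical:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Decide whether a RISC-V text section needs an ending $x<ISA> symbol.
   `final_isa` is the ISA string in effect for the section's last bytes,
   or NULL if they are data. If an ending symbol is needed, write
   "$x<baseline>" into `buf` and return true. */
bool ending_symbol(const char *baseline, const char *final_isa,
                   char *buf, size_t len) {
    if (final_isa != NULL && strcmp(final_isa, baseline) == 0)
        return false; /* already in the desired state */
    snprintf(buf, len, "$x%s", baseline);
    return true;
}
```

For instance, a section with baseline rv64i2p1_c2p0 that ends in data gets $xrv64i2p1_c2p0. The sketch simply takes the baseline as given, which is exactly what becomes unclear when the inputs are heterogeneous.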


    This adaptation works seamlessly as long as all relocatable files provided to the linker share the same baseline ISA. However, in scenarios where the relocatable files are more heterogeneous, a crucial question arises: which state should be restored at section end? Would the subsequent section in the linker output be compiled with different ISA extensions?

    Mach-O LC_DATA_IN_CODE load command

    In contrast to ELF's symbol pair approach, Mach-O employs the LC_DATA_IN_CODE load command to store non-instruction ranges within code sections. This method is remarkably compact, with each entry requiring only 8 bytes. ELF, on the other hand, needs two symbols ($d and $x) per data region, consuming 48 bytes (in ELFCLASS64) in the symbol table.

    struct data_in_code_entry {
    uint32_t offset; /* from mach_header to start of data range*/
    uint16_t length; /* number of bytes in data range */
    uint16_t kind; /* a DICE_KIND_* value */
    };
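The 8-byte versus 48-byte comparison can be checked at compile time. A C11 sketch that mirrors both layouts with fixed-width types, assuming a typical ABI with natural alignment; the `_mirror` names and the two helpers are ours (Linux's `<elf.h>` provides the real `Elf64_Sym`):

```c
#include <stdint.h>

/* Mirror of Mach-O's data_in_code_entry. */
struct dice_mirror {
    uint32_t offset;
    uint16_t length;
    uint16_t kind;
};

/* Mirror of ELF's Elf64_Sym (field order per the ELF-64 format). */
struct elf64_sym_mirror {
    uint32_t st_name;
    uint8_t  st_info;
    uint8_t  st_other;
    uint16_t st_shndx;
    uint64_t st_value;
    uint64_t st_size;
};

_Static_assert(sizeof(struct dice_mirror) == 8, "one LC_DATA_IN_CODE entry");
_Static_assert(sizeof(struct elf64_sym_mirror) == 24, "one ELFCLASS64 symbol");

/* Bytes spent describing `regions` data-in-code ranges: one entry per
   range in Mach-O versus a $d/$x pair per range in ELF's .symtab
   (string table bytes are not counted). */
unsigned long macho_bytes(unsigned long regions) {
    return regions * sizeof(struct dice_mirror);
}
unsigned long elf64_bytes(unsigned long regions) {
    return regions * 2 * sizeof(struct elf64_sym_mirror);
}
```

A thousand data regions thus cost 8,000 bytes as LC_DATA_IN_CODE entries versus 48,000 bytes of .symtab entries.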

    In llvm-project, the possible kind values are defined in llvm/include/llvm/BinaryFormat/MachO.h. I recently refactored the generic MCAssembler to move this Mach-O-specific logic, along with other Mach-O-specific pieces, to MachObjectWriter.

    enum DataRegionType {
    // Constants for the "kind" field in a data_in_code_entry structure
    DICE_KIND_DATA = 1u,
    DICE_KIND_JUMP_TABLE8 = 2u,
    DICE_KIND_JUMP_TABLE16 = 3u,
    DICE_KIND_JUMP_TABLE32 = 4u,
    DICE_KIND_ABS_JUMP_TABLE32 = 5u
    };
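With such a table, deciding whether an offset holds data is a range lookup rather than a symbol walk. A minimal C sketch, assuming entries sorted by offset and non-overlapping; `dice_kind_at` is a hypothetical helper, not an LLVM or Apple API, and the offsets in the usage note are borrowed from the AArch32 example for illustration only:

```c
#include <stddef.h>
#include <stdint.h>

struct data_in_code_entry {
    uint32_t offset; /* from mach_header to start of data range */
    uint16_t length; /* number of bytes in data range */
    uint16_t kind;   /* a DICE_KIND_* value */
};

/* Subset of the DICE_KIND_* values from the enum above. */
enum { DICE_KIND_DATA = 1, DICE_KIND_ABS_JUMP_TABLE32 = 5 };

/* Return the kind of the data range containing `off`, or 0 if `off`
   lies in instructions. Binary search for the last entry starting at
   or before `off`, then check that `off` falls within its length. */
uint16_t dice_kind_at(const struct data_in_code_entry *e, size_t n,
                      uint32_t off) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (e[mid].offset <= off)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return 0;
    const struct data_in_code_entry *r = &e[lo - 1];
    return off - r->offset < r->length ? r->kind : 0;
}
```

With entries {0x14, 8, DICE_KIND_DATA} and {0x20, 8, DICE_KIND_ABS_JUMP_TABLE32}, offset 0x18 reports DICE_KIND_DATA, 0x24 reports DICE_KIND_ABS_JUMP_TABLE32, and 0x1c and 0x28 report 0 (instructions).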

    Achieving Mach-O's efficiency in ELF

    Given ELF's symbol table bloat due to the st_size member (my previous analysis), how can it attain Mach-O's level of efficiency? Instead of introducing a new format, we can leverage a standard ELF feature: SHF_COMPRESSED.

    Both .symtab and .strtab lack the SHF_ALLOC flag, making them eligible for compression without requiring any changes to the ELF specification.

    • LLVM discussion
    • A feature request has already been submitted to binutils to explore this possibility.
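The eligibility rule is a flag test, and a compressed section's contents start with an Elf64_Chdr header. A minimal C sketch, with constant values copied from the ELF gABI (SHF_ALLOC = 0x2, SHF_COMPRESSED = 0x800, ELFCOMPRESS_ZLIB = 1) under hypothetical `_FLAG`/`_TYPE` names so it does not clash with `<elf.h>`; `Chdr64`, `can_compress` and `symtab_chdr` are hypothetical too:

```c
#include <stdbool.h>
#include <stdint.h>

#define SHF_ALLOC_FLAG        0x2u   /* gABI SHF_ALLOC */
#define SHF_COMPRESSED_FLAG   0x800u /* gABI SHF_COMPRESSED */
#define ELFCOMPRESS_ZLIB_TYPE 1u     /* gABI ELFCOMPRESS_ZLIB */

/* Mirror of Elf64_Chdr, the header at the start of a compressed section. */
typedef struct {
    uint32_t ch_type;      /* compression algorithm */
    uint32_t ch_reserved;
    uint64_t ch_size;      /* uncompressed size */
    uint64_t ch_addralign; /* alignment of the uncompressed data */
} Chdr64;

/* A section is a candidate if it is not loaded at run time (no
   SHF_ALLOC) and is not already compressed. */
bool can_compress(uint64_t sh_flags) {
    return !(sh_flags & SHF_ALLOC_FLAG) && !(sh_flags & SHF_COMPRESSED_FLAG);
}

/* Header for a zlib-compressed ELFCLASS64 .symtab of `size` bytes;
   Elf64_Sym entries are 8-byte aligned. */
Chdr64 symtab_chdr(uint64_t size) {
    return (Chdr64){ELFCOMPRESS_ZLIB_TYPE, 0, size, 8};
}
```

Because neither section is loaded, compressing them leaves the runtime image untouched; only tools that read the static symbol table need to decompress it.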

    The implementation within LLVM shouldn't be overly complex, and I'm more than willing to contribute if there's interest from the community.


