
    Port LLVM XRay to Apple systems

    Posted by MaskRay on 2023-06-19 07:24:49

    I do not use Apple products, but I sometimes like investigating Mach-O as an object file format, and my llvm-project changes sometimes need to work around its quirks.

    LLVM has a function call tracing system called XRay. It supports many architectures on Linux and some BSDs but does not support Apple systems. If the target triple is x86_64-apple-darwin*, you may notice that Clang will allow you to perform compilation, but linking will fail. For other architectures, Clang will reject the option.

    % clang --target=x86_64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 -c a.c
    % clang --target=x86_64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 a.o
    ld: in section __DATA,xray_instr_map reloc 0: X86_64_RELOC_SUBTRACTOR must have r_extern=1 file 'a.o' for architecture x86_64
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    % clang --target=arm64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 -c a.c
    clang: error: the clang compiler does not support '-fxray-instrument on aarch64-apple-darwin'

    So I dove down the rabbit hole.

            .section        __DATA,xray_instr_map
    Lxray_sleds_start0:
    Ltmp0:
    .quad Lxray_sled_0-Ltmp0
    .quad Lfunc_begin0-(Ltmp0+8)
    .byte 0x00
    .byte 0x00
    .byte 0x02
    .space 13
    Ltmp1:
    .quad Lxray_sled_1-Ltmp1
    .quad Lfunc_begin0-(Ltmp1+8)
    .byte 0x01
    .byte 0x00
    .byte 0x02
    .space 13
    Lxray_sleds_end0:
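
    Each entry above occupies 8 + 8 + 3 + 13 = 32 bytes. As a rough sketch (in Python, purely for illustration), an entry could be decoded as follows, resolving the two PC-relative fields against the entry's own address. The field names are my own labels for the three .byte values, and a little-endian layout is assumed.

```python
import struct

# Two 64-bit PC-relative fields, three single bytes, 13 bytes of padding.
SLED_ENTRY = struct.Struct("<qqBBB13x")

def decode_sled_entry(data: bytes, entry_addr: int) -> dict:
    """Decode one xray_instr_map entry located at entry_addr (illustrative)."""
    sled_rel, func_rel, kind, always_instrument, version = SLED_ENTRY.unpack(data)
    return {
        # .quad Lxray_sled_0-Ltmp0: relative to the start of the entry
        "sled": entry_addr + sled_rel,
        # .quad Lfunc_begin0-(Ltmp0+8): relative to the second field
        "function": entry_addr + 8 + func_rel,
        # Hypothetical names for the three .byte values
        "kind": kind,
        "always_instrument": always_instrument,
        "version": version,
    }
```

    Because both fields are differences of labels, the section contents do not depend on where the image is loaded.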

    .quad Lxray_sled_0-Ltmp0 is represented as a pair of relocations (llvm-readobj -r a.o):

    0x0 0 3 0 X86_64_RELOC_SUBTRACTOR 0 xray_instr_map
    0x0 0 3 1 X86_64_RELOC_UNSIGNED 0 _foo

    X86_64_RELOC_SUBTRACTOR is an external relocation (r_extern==1) where r_symbolnum references a symbol table entry. Linkers will give an error if r_extern==0. The symbols with the "L" prefix are called temporary symbols in LLVMMC and are not present in the symbol table. The LLVM integrated assembler tries to convert the subtractor symbol to an atom, that is, a non-temporary symbol defined in the same section. However, since xray_instr_map does not define a non-temporary symbol, the X86_64_RELOC_SUBTRACTOR relocation will have no associated symbol and its r_extern will be 0.
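
    The atom lookup described above can be modeled with a toy sketch. This is not the LLVM implementation: the function names are hypothetical, a section is reduced to a list of (symbol, offset) pairs, and "temporary" is simplified to "starts with L".

```python
from typing import List, Optional, Tuple

def is_macho_temporary(name: str) -> bool:
    # Simplification for this sketch: "L"-prefixed symbols are temporary.
    return name.startswith("L")

def find_atom(symbols: List[Tuple[str, int]], offset: int) -> Optional[str]:
    """Return the closest non-temporary symbol at or before offset, if any."""
    best = None
    for name, sym_offset in symbols:
        if is_macho_temporary(name) or sym_offset > offset:
            continue
        if best is None or sym_offset >= best[1]:
            best = (name, sym_offset)
    return best[0] if best else None

def subtractor_r_extern(symbols: List[Tuple[str, int]], offset: int) -> int:
    """r_extern is 1 only when an atom can stand in for the temporary symbol."""
    return 1 if find_atom(symbols, offset) is not None else 0
```

    With only Lxray_sleds_start0, Ltmp0 and Ltmp1 in the section, no atom is found and the model yields r_extern == 0, which is the case the linker rejects.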

    To fix this issue, we need to define a non-temporary symbol. We can accomplish this by renaming Lxray_sleds_start0 to lxray_sleds_start0. In LLVMMC, LinkerPrivateGlobalPrefix is set to "l" for Apple targets. We can define an overload of MCContext::createLinkerPrivateTempSymbol(const Twine &Name) to allow LLVMMC to select an unused symbol starting with lxray_sleds_start. (There is a pitfall: "ltmp" should be compiler internal.) For ELF targets, the MCContext::createLinkerPrivateTempSymbol function creates a temporary symbol starting with ".L".
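
    The idea of "select an unused symbol starting with a given name" can be sketched as below. This is only a model of the naming scheme, assuming a numeric suffix is appended; the Python function and its parameters are hypothetical and do not mirror the C++ code in MCContext.

```python
def create_linker_private_temp_symbol(name: str, used: set, prefix: str = "l") -> str:
    """Pick the first unused prefix+name+N and record it in `used`."""
    n = 0
    while f"{prefix}{name}{n}" in used:
        n += 1
    symbol = f"{prefix}{name}{n}"
    used.add(symbol)
    return symbol
```

    Passing prefix="l" models an Apple target, while prefix=".L" models the ELF behavior mentioned above.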

    Oleksii Lozovskyi reported that the -fxray-function-index option has been broken:

    • (default): no function index
    • -fxray-function-index: no function index
    • -fno-xray-function-index: xray_fn_idx section is present

    -fxray-function-index was the default. It turns out that a clangDriver refactoring accidentally caused this regression, but the negative variable name was probably the main reason. XRay tests were not great and there was no driver test to catch this. I fixed this.

    Now that -fxray-function-index is back, we get the xray_fn_idx section by default. The section contains entries like the following:

    .section        __DATA,xray_fn_idx
    .p2align 4, 0x90
    .quad Lxray_sleds_start0
    .quad Lxray_sleds_end0

    BTW: I noticed an old workaround (2015) for ld64 and proposed to remove it: https://reviews.llvm.org/D152831.

    These absolute addresses require rebase opcodes in the special section __LINKEDIT,__rebase. This is not great and I wanted to fix it back in 2020 but never got around to doing it. This motivated me to actually fix the issue and create https://reviews.llvm.org/D152661 to change the [start,end) representation to the (pc_relative_start, size) representation.

    My initial attempt looked something like this. I took the difference of two labels and right-shifted it by 5 (each sled entry is 32 bytes) to get the number of sleds.

            .section        __DATA,xray_fn_idx,regular,live_support
    .p2align 4, 0x90
    lxray_fn_idx0:
    .quad lxray_sleds_start0-lxray_fn_idx0
    .quad (Lxray_sleds_end0-lxray_sleds_start0)>>5

    This approach works on ELF targets but not on Mach-O targets due to a pile of assembler issues.

    % clang -c --target=x86_64-apple-darwin a.s
    /tmp/a-b0645c.s:53:8: error: expected relocatable expression
    .quad (Lxray_sleds_end0-lxray_sleds_start0)>>5
    ^
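
    For reference, the arithmetic behind the shift: each xray_instr_map entry is 32 bytes (two .quad fields, three .byte fields and .space 13), so shifting the byte distance right by 5 yields the sled count. Below is a small sketch of how a consumer might turn a (pc_relative_start, size) pair back into an address range. The function and parameter names are hypothetical and this is not the compiler-rt code.

```python
SLED_ENTRY_SIZE = 8 + 8 + 3 + 13  # bytes per xray_instr_map entry

def sled_count(sleds_start: int, sleds_end: int) -> int:
    """Number of sleds in [sleds_start, sleds_end), as in (end-start)>>5."""
    return (sleds_end - sleds_start) >> 5

def decode_fn_idx_entry(entry_addr: int, pc_relative_start: int, count: int):
    """Recover the absolute [start, end) range from a PC-relative entry."""
    start = entry_addr + pc_relative_start
    end = start + count * SLED_ENTRY_SIZE
    return start, end
```

    Since the entry stores only a label difference and a count, no absolute address (and hence no rebase opcode) is needed for it.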

    Assembler issues

    When assembling an assembly file into an object file, an expression can be evaluated in multiple steps. Two steps are particularly important:

    • Parsing time. At this stage, we have an MCAssembler object but no MCAsmLayout object. Instruction operands and certain directives like .if require the ability to evaluate an expression early.
    • Object file writing time. At this stage, we have both an MCAssembler object and an MCAsmLayout object. The MCAsmLayout object provides information about the offset of each fragment.

    The first issue is not specific to this case and is also encountered in ELF. The following assembly code should assemble to the hex pairs 01 00 00 01, but Clang fails to compute .if .-1b == 3.

    % cat x.s
    1:
    .byte 1
    .space 2
    .if .-1b == 3
    .byte 1
    .else
    .byte 0
    .endif
    % clang -c x.s
    x.s:4:5: error: expected absolute expression
    .if .-1b == 3
    ^

    Jian Cai implemented limited expression folding support in the LLVM integrated assembler to support the Linux kernel arm use case.

    arch/arm/mm/proc-v7.S:169:143: error: expected absolute expression
    .pushsection ".alt.smp.init", "a" ; .long 9998b ;9997: orr r1, r1, #((1 << 0) | (1 << 6))|(3 << 3) ; .if . - 9997b == 2 ; nop ; .endif ; .if . - 9997b != 4 ; .error "ALT_UP() content must assemble to exactly 4 bytes"; .endif ; .popsection

    I have added support for MCFillFragment (.space and .fill) and for A-B, where A is a pending label (which will be reassigned to a real fragment in flushPendingLabels()). Now, the LLVM integrated assembler can successfully assemble x.s when an MCAssembler object is present. However, evaluation still does not work without an MCAssembler object, which is expected.

    % llvm-mc x.s -filetype=null
    x.s:4:5: error: expected absolute expression
    .if .-1b == 3
    ^
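
    To illustrate why a label difference can sometimes be folded at parsing time, here is a toy model. The assumption (mine, for the sketch) is that folding is possible exactly when every fragment between the two labels has a size that is already known, such as data or a constant-size fill. The types and the function name are hypothetical.

```python
from typing import List, Optional, Tuple

# A fragment is (kind, size); size is None when it is not known before
# layout, e.g. an instruction that may still be relaxed.
Fragment = Tuple[str, Optional[int]]
# A label position is (fragment index, offset inside that fragment).
Position = Tuple[int, int]

def fold_difference(fragments: List[Fragment], a: Position, b: Position) -> Optional[int]:
    """Return a - b in bytes, or None if it cannot be folded early."""
    (a_idx, a_off), (b_idx, b_off) = a, b
    if a_idx < b_idx:
        return None  # sketch only handles a located at or after b
    total = a_off - b_off
    for _kind, size in fragments[b_idx:a_idx]:
        if size is None:
            return None
        total += size
    return total
```

    For x.s, the label 1 sits at the start of a 1-byte data fragment and "." sits at the end of the 2-byte .space fragment, so fold_difference([("data", 1), ("fill", 2)], (1, 2), (0, 0)) evaluates to 3 and the .if branch emitting .byte 1 is taken.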

    Then I noticed a potential pitfall for Mach-O in MCSection::flushPendingLabels. When flushing pending labels, it did not ensure that the new fragment inherits the previous atom symbol. I fixed this issue, although I haven't been able to create a test case to verify this behavior.

    After this fix, .quad (Lxray_sleds_end0-lxray_sleds_start0)>>5 can be successfully assembled. However, during "direct object emission", an error expected relocatable expression will be reported. The issue is quite subtle.

    In the case of direct object emission, where LLVM IR is directly lowered to an object file bypassing assembly (e.g., using clang -c a.c instead of clang -c a.s or clang -c --save-temps a.c), the assembler information is not used for parsing (MCStreamer::UseAssemblerInfoForParsing). As a result, the assembly code .quad (Lxray_sleds_end0-lxray_sleds_start0)>>5 will be transformed into a fixup.

    During object writing time, we have an MCAsmLayout object and atom information for fragments. However, the label Lxray_sleds_end0 will belong to the next fragment, causing the condition in MachObjectWriter::isSymbolRefDifferenceFullyResolvedImpl to fail. In my opinion, it may be necessary to relax the condition in this case.

    Linker dead stripping

    To support linker dead stripping, also known as linker garbage collection, we need to add the S_ATTR_LIVE_SUPPORT attribute to the two sections xray_instr_map and xray_fn_idx.
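
    As I understand the attribute, a block in a live_support section is kept only if it references a block that is already live. The toy mark phase below sketches that rule. It is not how any particular linker is implemented, and the block names are made up for the example.

```python
from typing import Dict, Iterable, List, Set

def mark_live(roots: Iterable[str],
              refs: Dict[str, List[str]],
              live_support: Set[str]) -> Set[str]:
    """Return the set of live blocks under a simplified live_support rule."""
    live: Set[str] = set()
    work = list(roots)

    def propagate() -> None:
        # Ordinary reachability: everything referenced by a live block is live.
        while work:
            block = work.pop()
            if block in live:
                continue
            live.add(block)
            work.extend(refs.get(block, ()))

    propagate()
    changed = True
    while changed:
        changed = False
        for block in sorted(live_support):
            # A live_support block becomes live once it references a live block.
            if block not in live and any(t in live for t in refs.get(block, ())):
                work.append(block)
                propagate()
                changed = True
    return live
```

    In this model a sled entry that points at a retained function survives, while an entry that points at a stripped function is dropped along with it.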

    Runtime issue

    compiler-rt/lib/xray/xray_trampoline_x86_64.S used .Ltmp* symbols, which are temporary for ELF but non-temporary for Mach-O. The non-temporary labels become atoms and can cause bad dead stripping behaviors.

    I fixed the problem by using the LOCAL_LABEL macro, which generates an "L" symbol specifically for Mach-O.
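
    The difference between the two object formats can be sketched as follows. The real LOCAL_LABEL is a C preprocessor macro in compiler-rt; this Python model only illustrates the prefix rule, and the exact spelling it produces is an assumption made for the example.

```python
# Prefix that marks an assembler-temporary symbol in each object format.
TEMP_PREFIX = {"elf": ".L", "macho": "L"}

def is_temporary(name: str, fmt: str) -> bool:
    """True if the assembler would treat `name` as temporary for `fmt`."""
    return name.startswith(TEMP_PREFIX[fmt])

def local_label(name: str, fmt: str) -> str:
    """Model of a LOCAL_LABEL-style helper: pick the right prefix per format."""
    return TEMP_PREFIX[fmt] + name
```

    Here is_temporary(".Ltmp0", "macho") is False, which is why the hard-coded .Ltmp* labels ended up as atoms, whereas a label produced by local_label("tmp0", "macho") is temporary.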

    Driver change

    Once AArch64 works, we can make the Clang driver accept --target=arm64-apple-darwin for XRay.


