    Port LLVM XRay to Apple systems

    Posted by MaskRay on 2023-08-10 19:44:05

    I do not use Apple products myself, but I sometimes delve into Mach-O due to my interest in object file formats. Additionally, my LLVM/Clang changes sometimes require some understanding of Mach-O. Occasionally, I need to understand the format to some extent to work around its quirks (the old format inherited many problems of "a.out").

    Recently, there has been interest (from Oleksii Lozovskyi) in enabling XRay, a function call tracing system in LLVM, to work on Apple systems. Intrigued by this, I decided to delve into the details and investigate the necessary changes. XRay supports many 64-bit architectures on Linux and some BSDs. I became acquainted with XRay back in 2017 and have made some casual contributions since then.

    If the target triple is x86_64-apple-darwin, you may notice that Clang will allow you to perform compilation, but linking will fail.

    % clang --target=x86_64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 -c a.c
    % clang --target=x86_64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 a.o
    ld: in section __DATA,xray_instr_map reloc 0: X86_64_RELOC_SUBTRACTOR must have r_extern=1 file 'a.o' for architecture x86_64
    clang: error: linker command failed with exit code 1 (use -v to see invocation)

    For arm64-apple-darwin, Clang rejected -fxray-instrument, but we can bypass this driver detection with -Xclang. (The driver now accepts -fxray-instrument for this target.)

    % clang --target=arm64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 -c a.c
    clang: error: the clang compiler does not support '-fxray-instrument on aarch64-apple-darwin'
    % clang --target=arm64-apple-darwin -Xclang -fxray-instrument -Xclang -fxray-instruction-threshold=1 -c a.c
    fatal error: error in backend: unsupported relocation of local symbol ''. Must have non-local symbol earlier in section.

    I noticed that this fatal error is caused by an old workaround (2015) for ld64 and proposed to remove it: https://reviews.llvm.org/D152831.

    SUBTRACTOR relocation types require r_extern==1

    Let's examine the assembly generated by clang -S --target=x86_64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1.

            .section        __DATA,xray_instr_map
    Lxray_sleds_start0:
    Ltmp0:
    .quad Lxray_sled_0-Ltmp0 // problematic
    .quad Lfunc_begin0-(Ltmp0+8) // problematic
    .byte 0x00
    .byte 0x00
    .byte 0x02
    .space 13
    Ltmp1:
    .quad Lxray_sled_1-Ltmp1 // problematic
    .quad Lfunc_begin0-(Ltmp1+8) // problematic
    .byte 0x01
    .byte 0x00
    .byte 0x02
    .space 13
    Lxray_sleds_end0:

    I have marked 4 label differences as problematic. Let's examine the first one, .quad Lxray_sled_0-Ltmp0, which is represented as a pair of relocations (llvm-readobj -r a.o):

    0x0 0 3 0 X86_64_RELOC_SUBTRACTOR 0 xray_instr_map
    0x0 0 3 1 X86_64_RELOC_UNSIGNED 0 _foo

    X86_64_RELOC_SUBTRACTOR is an external relocation (r_extern==1) where r_symbolnum references a symbol table entry. If r_extern==0, linkers will produce an error. Symbols prefixed with "L" are considered temporary symbols in LLVMMC and are not included in the symbol table. The LLVM integrated assembler attempts to rewrite A-B into A-B'+offset where B' can be included in the symbol table. B' is called an atom and should be a non-temporary symbol in the same section. However, since xray_instr_map does not define a non-temporary symbol, the X86_64_RELOC_SUBTRACTOR relocation will have no associated symbol, and its r_extern value will be 0.
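
The atom lookup described above can be illustrated with a toy model. This is only a sketch, not LLVM's implementation: the Symbol class, find_atom, and subtractor_reloc are hypothetical names, and the "starts with L" test stands in for LLVMMC's real notion of a temporary symbol.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    offset: int  # offset of the label within its section

    @property
    def is_temporary(self) -> bool:
        # Toy rule (assumption): "L"-prefixed labels never reach the symbol table.
        return self.name.startswith("L")


def find_atom(section_symbols, target):
    """Return the nearest non-temporary symbol at or before `target`, or None."""
    candidates = [
        s for s in section_symbols
        if not s.is_temporary and s.offset <= target.offset
    ]
    return max(candidates, key=lambda s: s.offset, default=None)


def subtractor_reloc(section_symbols, b):
    """Model the subtracted operand B of A-B.

    Returns (r_extern, atom name, distance of B from its atom).
    Without an atom there is no symbol table entry to reference,
    so r_extern ends up as 0.
    """
    atom = find_atom(section_symbols, b)
    if atom is None:
        return (0, None, b.offset)
    return (1, atom.name, b.offset - atom.offset)
```

With only Lxray_sleds_start0 and Ltmp0 in the section, subtractor_reloc reports r_extern == 0. Once the section start is named lxray_sleds_start0, every Ltmp label in the section finds that symbol as its atom.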

    To fix this issue, we need to define a non-temporary symbol in the section. We can accomplish this by renaming Lxray_sleds_start0 to lxray_sleds_start0 ("L" to "l"). In LLVMMC, LinkerPrivateGlobalPrefix is set to "l" for Apple targets. We can define an overload of MCContext::createLinkerPrivateTempSymbol(const Twine &Name) to allow LLVMMC to select an unused symbol starting with lxray_sleds_start. (There is a pitfall: we should not use "ltmp[0-9]+" which may collide with other compiler internal symbols.) For ELF targets, the MCContext::createLinkerPrivateTempSymbol function creates a temporary symbol starting with ".L".

    Note: GNU assembler calls .L and L system-specific local label prefixes.

    Function index section

    Oleksii Lozovskyi reported that the -fxray-function-index option was not functioning as expected.

    • (default): no function index section
    • -fxray-function-index: no function index section
    • -fno-xray-function-index: function index section (xray_fn_idx) is present

    Initially, the function index section was always present. The introduction of the -f[no-]xray-function-index option caused confusion due to the use of a negatively named variable, leading to potential misunderstandings. A clangDriver refactoring accidentally introduced this regression. I have resolved the issue for Clang 17.

    Now that -fxray-function-index is back, we get the xray_fn_idx section by default. The section contains entries like the following:

    .section        __DATA,xray_fn_idx
    .p2align 4, 0x90
    .quad Lxray_sleds_start0
    .quad Lxray_sleds_end0

    These absolute addresses require rebase opcodes in the special section __LINKEDIT,__rebase. This is not great and I wanted to fix it back in 2020 but never got around to doing it. This motivated me to actually fix the issue and create https://reviews.llvm.org/D152661 to change the [start,end) representation to the (pc_relative_start, size) representation.
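
To see why the relative form needs no rebasing, here is a small sketch under stated assumptions: the function names are hypothetical, the entry is two little-endian 64-bit words, and "size" is a byte count in this model (the unit is a choice made for the sketch, not taken from the patch).

```python
import struct


def encode_absolute(start, end):
    """Old form: two absolute addresses, which change when the image slides."""
    return struct.pack("<QQ", start, end)


def encode_relative(entry_addr, start, end):
    """New form: start relative to the entry's own address, plus a size."""
    return struct.pack("<qQ", start - entry_addr, end - start)


def decode_relative(entry_addr, data):
    """Recover [start, end) at run time from the entry's load address."""
    rel, size = struct.unpack("<qQ", data)
    start = entry_addr + rel
    return start, start + size
```

If every address is shifted by the same slide, encode_relative produces identical bytes, so the loader has nothing to patch. encode_absolute produces different bytes for each load address, which is what the rebase opcodes compensate for.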

    My initial attempt looked something like this. I took the difference of two labels and right-shifted it by 5 to get the number of sleds.

            .section        __DATA,xray_fn_idx,regular,live_support
    .p2align 4, 0x90
    lxray_fn_idx0:
    .quad lxray_sleds_start0-lxray_fn_idx0
    .quad (Lxray_sleds_end0-lxray_sleds_start0)>>5
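
The shift by 5 follows from the sled entry layout in the xray_instr_map listing earlier: two .quad words, three .byte values, and .space 13 add up to 32 bytes. A quick check of that arithmetic, with field names that are my own labels rather than anything stated in the article:

```python
import struct

# One sled entry as emitted in xray_instr_map:
# two 8-byte words, three single bytes, 13 bytes of padding.
SLED_ENTRY = struct.Struct("<qqBBB13x")


def pack_sled(sled_rel, func_rel, byte0, byte1, byte2):
    """Pack one entry; byte0..byte2 are the three .byte values in the listing."""
    return SLED_ENTRY.pack(sled_rel, func_rel, byte0, byte1, byte2)


def sled_count(start_offset, end_offset):
    """Number of entries between two labels, assuming 32-byte entries."""
    return (end_offset - start_offset) >> 5
```

Because 32 is 2 to the power 5, shifting the byte distance right by 5 is the same as dividing by the entry size.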

    This approach works on ELF targets but not on Mach-O targets due to a pile of assembler issues.

    % clang -c --target=x86_64-apple-darwin a.s
    /tmp/a-b0645c.s:53:8: error: expected relocatable expression
    .quad (Lxray_sleds_end0-lxray_sleds_start0)>>5
    ^

    Assembler issues

    When assembling an assembly file into an object file, an expression can be evaluated in multiple steps. Two steps are particularly important:

    • Parsing time. At this stage, we have a MCAssembler object but no MCAsmLayout object. Instruction operands and certain directives like .if require the ability to evaluate an expression early.
    • Object file writing time. At this stage, we have both a MCAssembler object and a MCAsmLayout object. The MCAsmLayout object provides information about the offset of each fragment.

    The first issue is not specific to this case and is also encountered in ELF. The following assembly code should assemble to the hex pairs 01000001, but Clang fails to compute .if .-1b == 3.

    % cat x.s
    1:
    .byte 1
    .space 2
    .if .-1b == 3
    .byte 1
    .else
    .byte 0
    .endif
    % clang -c x.s
    x.s:4:5: error: expected absolute expression
    .if .-1b == 3
    ^

    In 2019, Jian Cai implemented limited expression folding support in the LLVM integrated assembler to support the Linux kernel arm use case.

    arch/arm/mm/proc-v7.S:169:143: error: expected absolute expression
    .pushsection ".alt.smp.init", "a" ; .long 9998b ;9997: orr r1, r1, #((1 << 0) | (1 << 6))|(3 << 3) ; .if . - 9997b == 2 ; nop ; .endif ; .if . - 9997b != 4 ; .error "ALT_UP() content must assemble to exactly 4 bytes"; .endif ; .popsection

    I have just added support for MCFillFragment (.space and .fill) and for A-B, where A is a pending label (which will be reassigned to a real fragment in flushPendingLabels()). The latest LLVM integrated assembler (milestone: LLVM 17) can successfully assemble x.s when a MCAssembler object is present. However, evaluation still does not work without a MCAssembler object, which is expected.

    % llvm-mc x.s -filetype=null
    x.s:4:5: error: expected absolute expression
    .if .-1b == 3
    ^
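
The idea behind parse-time folding can be sketched with a toy fragment list. This is a simplified model, not LLVMMC's data structures: Fragment, its kind strings, and fold_label_difference are hypothetical, and the only rule modeled is that a label difference is known early when every fragment between the two labels has a fixed size.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    kind: str  # "data", "fill", or "relaxable" (toy categories)
    size: int  # current size in bytes

    @property
    def fixed_size(self) -> bool:
        # Assumption: data and fill fragments have a size known at parse time;
        # a relaxable fragment may still grow, so its size is not final.
        return self.kind in ("data", "fill")


def fold_label_difference(fragments, a, b):
    """Fold a - b, where each label is (fragment index, offset in fragment).

    Returns an integer when the distance is already known, else None.
    """
    (fa, oa), (fb, ob) = a, b
    if fa < fb:
        swapped = fold_label_difference(fragments, b, a)
        return None if swapped is None else -swapped
    between = fragments[fb:fa]
    if not all(f.fixed_size for f in between):
        return None
    return sum(f.size for f in between) + oa - ob
```

For x.s, the model has a 1-byte data fragment (.byte 1), a 2-byte fill fragment (.space 2), and the current position at the start of a third fragment, so the difference between "." and "1b" folds to 3.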

    Then I noticed a potential pitfall for Mach-O in MCSection::flushPendingLabels. When flushing pending labels, it did not ensure that the new fragment inherits the previous atom symbol. I fixed this issue, although I haven't been able to create a test case to verify this behavior.

    After this fix, .quad (Lxray_sleds_end0-lxray_sleds_start0)>>5 can be successfully assembled. However, during "direct object emission", an error expected relocatable expression will be reported. The issue is quite subtle.

    In the case of direct object emission, where LLVM IR is directly lowered to an object file bypassing assembly (e.g., using clang -c a.c instead of clang -c a.s or clang -c --save-temps a.c), the assembler information is not used at parse time (MCStreamer::UseAssemblerInfoForParsing). As a result, the assembly code .quad (Lxray_sleds_end0-lxray_sleds_start0)>>5 will be transformed into a fixup. Lxray_sleds_end0 is a pending label which will then be assigned to the next fragment when AsmPrinter emits a new piece of data to the xray_instr_map section.

    During object writing time, we have a MCAsmLayout object and atom information for fragments. LLVMMC evaluates the fixup with the MCAsmLayout object. However, lxray_sleds_start0 and Lxray_sleds_end0 have different atoms, which causes the condition in MachObjectWriter::isSymbolRefDifferenceFullyResolvedImpl to fail. In my opinion, it may be necessary to relax the condition in this case.

    Linker dead stripping

    Both xray_instr_map and xray_fn_idx sections contain independent pieces. To support linker dead stripping, also known as linker garbage collection, we need to add the S_ATTR_LIVE_SUPPORT attribute to the two sections.

    Runtime library issues

    The runtime library is located at compiler-rt/lib/xray and there are a number of issues.

    First, compiler-rt/lib/xray/xray_trampoline_x86_64.S used .Ltmp* symbols which are temporary for ELF but non-temporary for Mach-O. The non-temporary labels become atoms and can cause bad dead stripping behaviors. I fixed the problem by using the LOCAL_LABEL macro, which generates an "L" symbol specifically for Mach-O.

    compiler-rt/lib/xray/xray_trampoline_AArch64.S had a similar problem. I fixed it to look like the following:

    #include "../sanitizer_common/sanitizer_asm.h"
    ...
    .p2align 2
    .global ASM_SYMBOL(__xray_FunctionEntry)
    ASM_HIDDEN(__xray_FunctionEntry)
    ASM_TYPE_FUNCTION(__xray_FunctionEntry)
    ASM_SYMBOL(__xray_FunctionEntry):

    For Apple targets, # define ASM_SYMBOL(symbol) _##symbol prepends an underscore to the symbol name. For ELF targets, ASM_SYMBOL is the identity function.

    Compiling compiler-rt/lib/xray/xray_trampoline_AArch64.S will give us an external definition (N_SECT | N_EXT) ___xray_FunctionEntry. The undefined reference from C++ code gives an undefined symbol ___xray_FunctionEntry (N_UNDF | N_EXT), which can be resolved by the definition.

    Success = patchFunctionEntry(Enable, FuncId, Sled, __xray_FunctionEntry); // extern "C" void __xray_FunctionEntry();
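
The naming rule and the symbol match can be mimicked in a few lines. This is a toy model with hypothetical helper names (asm_symbol, resolves); the n_type constants are the Mach-O nlist values as I understand them, and real linkers check far more than this.

```python
# Mach-O nlist n_type bits used in the text (values assumed from <mach-o/nlist.h>).
N_UNDF = 0x0
N_EXT = 0x01
N_SECT = 0x0E


def asm_symbol(name, object_format):
    """Mimic ASM_SYMBOL: Mach-O prepends an underscore, ELF keeps the name."""
    return "_" + name if object_format == "macho" else name


def resolves(reference, definition):
    """True if an undefined external reference matches an external definition.

    Both arguments are (symbol name, n_type) pairs.
    """
    ref_name, ref_type = reference
    def_name, def_type = definition
    return (
        ref_type == (N_UNDF | N_EXT)
        and def_type == (N_SECT | N_EXT)
        and ref_name == def_name
    )
```

The assembly side and the C++ side both arrive at ___xray_FunctionEntry on Mach-O, which is why the reference binds. Had the .S file spelled the label without ASM_SYMBOL, the definition would be __xray_FunctionEntry and the names would not match.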

    Second, we need https://reviews.llvm.org/D153221 to build Mach-O runtimes for multiple architectures.

    Once the build system changes are in place, we may discover new issues.

    Driver change

    Normally, we want to enable a Clang driver option when a feature is fully ready. In this case, we know the code generation has been working for a long time, and the full support (metadata section refactoring and runtime) will be implemented soon. Therefore, I believe it's acceptable to make the driver accept --target=arm64-apple-darwin -fxray-instrument as it makes testing the runtime more convenient.

    Epilogue

    As the work to port LLVM XRay to Apple systems continues, it is encouraging to witness the progress achieved thus far. The endeavor has given me opportunities to explore the Mach-O object file format.

    There is a pending patch for ELF RISC-V support.


