I do not use Apple products, but I sometimes like investigating Mach-O as an object file format, and my llvm-project changes sometimes need to work around its quirks.
LLVM has a function call tracing system called XRay. It supports many architectures on Linux and some BSDs but does not support Apple systems. If the target triple is x86_64-apple-darwin*, you may notice that Clang allows compilation, but linking fails. For other architectures, Clang rejects the option.
% clang --target=x86_64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 -c a.c
So I dove down the rabbit hole.
.section __DATA,xray_instr_map
.quad Lxray_sled_0-Ltmp0 is represented as a pair of relocations (llvm-readobj -r a.o):
0x0 0 3 0 X86_64_RELOC_SUBTRACTOR 0 xray_instr_map
0x0 0 3 1 X86_64_RELOC_UNSIGNED 0 _foo
X86_64_RELOC_SUBTRACTOR is an external relocation (r_extern==1) where r_symbolnum references a symbol table entry. Linkers will give an error if r_extern==0. Symbols with the "L" prefix are called temporary symbols in LLVM MC and are not present in the symbol table. The LLVM integrated assembler tries to convert the subtractor symbol to an atom, that is, a non-temporary symbol defined in the same section. However, since xray_instr_map does not define a non-temporary symbol, the X86_64_RELOC_SUBTRACTOR relocation has no associated symbol and its r_extern is 0.
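To make the r_extern discussion concrete, here is a minimal Python sketch of how one 8-byte Mach-O relocation_info entry is packed on a little-endian target. The bit layout follows the relocation_info struct (r_symbolnum:24, r_pcrel:1, r_length:2, r_extern:1, r_type:4). The helper names and the symbol/section numbers used below are made up for illustration and are not part of any LLVM or Apple API.

```python
import struct

X86_64_RELOC_UNSIGNED = 0
X86_64_RELOC_SUBTRACTOR = 5


def pack_reloc(address, symbolnum, pcrel, length, extern, rtype):
    """Pack one little-endian relocation_info entry (hypothetical helper)."""
    word = (
        (symbolnum & 0xFFFFFF)
        | (pcrel << 24)
        | (length << 25)
        | (extern << 27)
        | (rtype << 28)
    )
    return struct.pack("<iI", address, word)


def unpack_reloc(data):
    """Decode the 8 bytes produced by pack_reloc back into named fields."""
    address, word = struct.unpack("<iI", data)
    return {
        "r_address": address,
        "r_symbolnum": word & 0xFFFFFF,
        "r_pcrel": (word >> 24) & 1,
        "r_length": (word >> 25) & 3,
        "r_extern": (word >> 27) & 1,
        "r_type": (word >> 28) & 0xF,
    }


def subtractor_is_linkable(reloc):
    """A SUBTRACTOR needs r_extern == 1 so that r_symbolnum names a symbol."""
    return reloc["r_type"] != X86_64_RELOC_SUBTRACTOR or reloc["r_extern"] == 1


# r_extern == 0: r_symbolnum is a section ordinal (3 is an arbitrary example).
bad = unpack_reloc(pack_reloc(0, 3, 0, 3, 0, X86_64_RELOC_SUBTRACTOR))
# r_extern == 1: r_symbolnum is a symbol table index (1 is an arbitrary example).
good = unpack_reloc(pack_reloc(0, 1, 0, 3, 1, X86_64_RELOC_SUBTRACTOR))
```

With these definitions, subtractor_is_linkable(bad) is False and subtractor_is_linkable(good) is True, which mirrors the linker's complaint about the first relocation in the pair above.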
To fix this issue, we need to define a non-temporary symbol. We can accomplish this by renaming Lxray_sleds_start0 to lxray_sleds_start0. In LLVM MC, LinkerPrivateGlobalPrefix is set to "l" for Apple targets. We can define an overload of MCContext::createLinkerPrivateTempSymbol(const Twine &Name) to allow LLVM MC to select an unused symbol starting with lxray_sleds_start. (There is a pitfall: "ltmp" should be compiler internal.) For ELF targets, the MCContext::createLinkerPrivateTempSymbol function creates a temporary symbol starting with ".L".
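The naming scheme can be sketched in a few lines of Python. This is only a model of the idea (prefix plus name plus the first unused counter), not LLVM's implementation: the function name, the `used` set, and the counter strategy are assumptions made for illustration.

```python
def create_linker_private_symbol(used, name, is_macho):
    """Pick an unused symbol name: linker-private prefix + name + counter.

    Hypothetical model: "l" is the prefix on Mach-O, ".L" on ELF.
    """
    prefix = "l" if is_macho else ".L"
    counter = 0
    while f"{prefix}{name}{counter}" in used:
        counter += 1
    symbol = f"{prefix}{name}{counter}"
    used.add(symbol)
    return symbol


macho_symbols = set()
first = create_linker_private_symbol(macho_symbols, "xray_sleds_start", True)
second = create_linker_private_symbol(macho_symbols, "xray_sleds_start", True)
elf_first = create_linker_private_symbol(set(), "xray_sleds_start", False)
```

Here `first` is lxray_sleds_start0, `second` is lxray_sleds_start1, and `elf_first` is .Lxray_sleds_start0.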
Oleksii Lozovskyi reported that the -fxray-function-index option has been broken.
-fxray-function-index: no function index
-fno-xray-function-index: xray_fn_idx section is present
-fxray-function-index was the default. It turns out that a clangDriver refactoring accidentally caused this regression, but the negative variable name was probably the main reason. XRay tests were not great and there was no driver test to catch this. I fixed this.
Now that -fxray-function-index is back, we get the xray_fn_idx section by default. The section contains entries like the following:
.section __DATA,xray_fn_idx
.p2align 4, 0x90
.quad Lxray_sleds_start0
.quad Lxray_sleds_end0
BTW: I noticed an old workaround (2015) for ld64 and proposed to remove it: https://reviews.llvm.org/D152831.
These absolute addresses require rebase opcodes in the special section __LINKEDIT,__rebase. This is not great, and I wanted to fix it back in 2020 but never got around to doing it. This motivated me to actually fix the issue and create https://reviews.llvm.org/D152661 to change the [start,end) representation to the (pc_relative_start, size) representation.
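The benefit of the new representation can be illustrated with a small Python sketch. It assumes that size counts sled entries and that each entry occupies 32 bytes (that is, 1 << 5); the function names and addresses are hypothetical. Because both stored values are differences, they do not change when the image is loaded at a different base address, so no rebase is needed.

```python
SLED_ENTRY_SIZE = 32  # assumption: 1 << 5 bytes per sled entry


def encode_entry(entry_addr, sleds_start, sleds_end):
    """Return (pc_relative_start, size) for an index entry at entry_addr."""
    size = (sleds_end - sleds_start) >> 5  # number of sleds
    return sleds_start - entry_addr, size


def decode_entry(entry_addr, pc_relative_start, size):
    """Recover the half-open [start, end) address range at run time."""
    start = entry_addr + pc_relative_start
    return start, start + size * SLED_ENTRY_SIZE


# The same entry, linked at its preferred address and loaded with a slide.
slide = 0x10000
unslid = encode_entry(0x2000, 0x1000, 0x1040)
slid = encode_entry(0x2000 + slide, 0x1000 + slide, 0x1040 + slide)
```

Both `unslid` and `slid` are (-0x1000, 2), whereas the [start,end) form would store two absolute addresses that differ between the two cases.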
My initial attempt looked something like this. I took the difference of two labels and right shifted it by 5 to get the number of sleds.
.section __DATA,xray_fn_idx,regular,live_support
.p2align 4, 0x90
lxray_fn_idx0:
.quad lxray_sleds_start0-lxray_fn_idx0
.quad (Lxray_sleds_end0-lxray_sleds_start0)>>5
This approach works on ELF targets but not on Mach-O targets due to a pile of assembler issues.
% clang -c --target=x86_64-apple-darwin a.s
When assembling an assembly file into an object file, an expression can be evaluated in multiple steps. Two steps are particularly important:
Parsing time: we have an MCAssembler object but no MCAsmLayout object. Instruction operands and certain directives like .if require the ability to evaluate an expression early.
Object file writing time: we have both an MCAssembler object and an MCAsmLayout object. The MCAsmLayout object provides information about the offset of each fragment.
The first issue is not specific to this case and is also encountered in ELF. The following assembly code should assemble to the hex pairs 01000001, but Clang fails to compute .if .-1b == 3.
% cat x.s
1:
.byte 1
.space 2
.if .-1b == 3
.byte 1
.else
.byte 0
.endif
% clang -c x.s
x.s:4:5: error: expected absolute expression
.if .-1b == 3
^
Jian Cai implemented limited expression folding support in the LLVM integrated assembler to support the Linux kernel arm use case.
arch/arm/mm/proc-v7.S:169:143: error: expected absolute expression
.pushsection ".alt.smp.init", "a" ; .long 9998b ;9997: orr r1, r1, #((1 << 0) | (1 << 6))|(3 << 3) ; .if . - 9997b == 2 ; nop ; .endif ; .if . - 9997b != 4 ; .error "ALT_UP() content must assemble to exactly 4 bytes"; .endif ; .popsection
I have added support for MCFillFragment (.space and .fill) and for A-B, where A is a pending label (which will be reassigned to a real fragment in flushPendingLabels()). Now, the LLVM integrated assembler can successfully assemble x.s when an MCAssembler object is present. However, evaluation still does not work without an MCAssembler object, which is expected.
% llvm-mc x.s -filetype=null
x.s:4:5: error: expected absolute expression
.if .-1b == 3
^
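The kind of folding involved in x.s can be modeled with a toy Python sketch. It is not how LLVM MC is implemented. It only illustrates the idea that a label difference is computable before layout when every fragment between the two labels has a known, fixed size. The Fragment and Label classes and the "relaxable" kind are assumptions made for this model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fragment:
    kind: str            # "data", "fill", or "relaxable" (toy categories)
    size: Optional[int]  # None when the size is unknown before layout


@dataclass
class Label:
    fragment: int  # index of the fragment the label is attached to
    offset: int    # offset within that fragment


def fold_difference(fragments, a, b):
    """Try to compute a - b without layout; return None if not foldable."""
    if a.fragment < b.fragment:
        result = fold_difference(fragments, b, a)
        return None if result is None else -result
    distance = a.offset - b.offset
    # Add the sizes of the fragments from b's fragment up to (not including) a's.
    for frag in fragments[b.fragment:a.fragment]:
        if frag.size is None:
            return None
        distance += frag.size
    return distance


# x.s: "1:" then ".byte 1" (data), ".space 2" (fill), then "." in a new fragment.
x_s = [Fragment("data", 1), Fragment("fill", 2), Fragment("data", 0)]
foldable = fold_difference(x_s, Label(2, 0), Label(0, 0))

# A fragment of unknown size in between blocks early evaluation.
blocked = [Fragment("data", 1), Fragment("relaxable", None), Fragment("data", 0)]
not_foldable = fold_difference(blocked, Label(2, 0), Label(0, 0))
```

In this model `foldable` is 3, matching .if .-1b == 3, while `not_foldable` is None.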
Then I noticed a potential pitfall for Mach-O in MCSection::flushPendingLabels. When flushing pending labels, it did not ensure that the new fragment inherits the previous atom symbol. I fixed this issue, although I haven't been able to create a test case to verify this behavior.
After this fix, .quad (Lxray_sleds_end0-lxray_sleds_start0)>>5 can be successfully assembled. However, during "direct object emission", an error expected relocatable expression will be reported. The issue is quite subtle.
In the case of direct object emission, where LLVM IR is directly lowered to an object file bypassing assembly (e.g., using clang -c a.c instead of clang -c a.s or clang -c --save-temps a.c), the assembler information is not used for parsing (MCStreamer::UseAssemblerInfoForParsing). As a result, the assembly code .quad (Lxray_sleds_end0-lxray_sleds_start0)>>5 will be transformed into a fixup.
At object writing time, we have an MCAsmLayout object and atom information for fragments. However, the label Lxray_sleds_end0 will belong to the next fragment, causing the condition in MachObjectWriter::isSymbolRefDifferenceFullyResolvedImpl to fail. In my opinion, it may be necessary to relax the condition in this case.
To support linker dead stripping, also known as linker garbage collection, we need to add the S_ATTR_LIVE_SUPPORT attribute to the two sections xray_instr_map and xray_fn_idx.
compiler-rt/lib/xray/xray_trampoline_x86_64.S used .Ltmp* symbols, which are temporary for ELF but non-temporary for Mach-O. The non-temporary labels become atoms and can cause bad dead stripping behaviors.
I fixed the problem by using the LOCAL_LABEL macro, which generates an "L" symbol specifically for Mach-O.
Once AArch64 works, we can make the Clang driver accept --target=arm64-apple-darwin for XRay.