I do not use Apple products myself, but I sometimes delve into Mach-O due to my interest in object file formats. Additionally, my LLVM/Clang changes sometimes require some understanding of Mach-O. Occasionally, I need to understand the format to some extent to work around its quirks (the old format inherited many problems of "a.out").
Recently, there has been interest (from Oleksii Lozovskyi) in enabling XRay, a function call tracing system in LLVM, to work on Apple systems. Intrigued by this, I decided to delve into the details and investigate the necessary changes. XRay supports many 64-bit architectures on Linux and some BSDs. I became acquainted with XRay back in 2017 and have made some casual contributions since then.
If the target triple is x86_64-apple-darwin, you may notice that Clang will allow you to perform compilation, but linking will fail.
% clang --target=x86_64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 -c a.c
For arm64-apple-darwin, Clang rejected -fxray-instrument, but we can bypass this driver detection with -Xclang. (This is now accepted.)
% clang --target=arm64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 -c a.c
clang: error: the clang compiler does not support '-fxray-instrument on aarch64-apple-darwin'
% clang --target=arm64-apple-darwin -Xclang -fxray-instrument -Xclang -fxray-instruction-threshold=1 -c a.c
fatal error: error in backend: unsupported relocation of local symbol ''. Must have non-local symbol earlier in section.
I noticed that this fatal error is caused by an old workaround (2015) for ld64 and proposed to remove it: https://reviews.llvm.org/D152831.
r_extern==1
Let's examine the assembly generated by clang -S --target=x86_64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1:
.section __DATA,xray_instr_map
Lxray_sleds_start0:
Ltmp0:
.quad Lxray_sled_0-Ltmp0 // problematic
.quad Lfunc_begin0-(Ltmp0+8) // problematic
.byte 0x00
.byte 0x00
.byte 0x02
.space 13
Ltmp1:
.quad Lxray_sled_1-Ltmp1 // problematic
.quad Lfunc_begin0-(Ltmp1+8) // problematic
.byte 0x01
.byte 0x00
.byte 0x02
.space 13
Lxray_sleds_end0:
I have marked 4 label differences as problematic. Let's examine the first one, .quad Lxray_sled_0-Ltmp0, which is represented as a pair of relocations (llvm-readobj -r a.o):
0x0 0 3 0 X86_64_RELOC_SUBTRACTOR 0 xray_instr_map
0x0 0 3 1 X86_64_RELOC_UNSIGNED 0 _foo
X86_64_RELOC_SUBTRACTOR is an external relocation (r_extern==1) where r_symbolnum references a symbol table entry. If r_extern==0, linkers will produce an error. Symbols prefixed with "L" are considered temporary symbols in LLVM MC and are not included in the symbol table. The LLVM integrated assembler attempts to rewrite A-B into A-B'+offset where B' can be included in the symbol table. B' is called an atom and should be a non-temporary symbol in the same section. However, since xray_instr_map does not define a non-temporary symbol, the X86_64_RELOC_SUBTRACTOR relocation will have no associated symbol, and its r_extern value will be 0.
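The atom lookup described above can be modeled in a few lines of Python. This is only an illustrative sketch, not LLVM's implementation: the Symbol class and the helpers find_atom and rewrite_subtrahend are hypothetical names, and the rule "an L prefix means temporary" is the only Mach-O convention it assumes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Symbol:
    name: str
    offset: int  # offset of the label within its section


def is_temporary(name: str) -> bool:
    # Assumption: on Mach-O, an "L" prefix marks an assembler-temporary symbol.
    return name.startswith("L")


def find_atom(section_symbols: List[Symbol], target: Symbol) -> Optional[Symbol]:
    """Return the closest non-temporary symbol at or before target, if any."""
    candidates = [s for s in section_symbols
                  if not is_temporary(s.name) and s.offset <= target.offset]
    return max(candidates, key=lambda s: s.offset, default=None)


def rewrite_subtrahend(section_symbols: List[Symbol],
                       b: Symbol) -> Optional[Tuple[str, int]]:
    """Express B as (B', offset) so that A-B equals A-B'+offset.

    Returns None when the section has no usable atom, which models the
    r_extern==0 situation.
    """
    atom = find_atom(section_symbols, b)
    if atom is None:
        return None
    return atom.name, atom.offset - b.offset


# Only temporary symbols in the section: no atom is available.
before = [Symbol("Lxray_sleds_start0", 0), Symbol("Ltmp0", 0), Symbol("Ltmp1", 32)]
# A linker-private "l" symbol at the start of the section can serve as the atom.
after = [Symbol("lxray_sleds_start0", 0), Symbol("Ltmp0", 0), Symbol("Ltmp1", 32)]
```

With the section contents in before, rewrite_subtrahend finds no atom for Ltmp1; with after, it yields lxray_sleds_start0 plus a constant offset.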
To fix this issue, we need to define a non-temporary symbol in the section. We can accomplish this by renaming Lxray_sleds_start0 to lxray_sleds_start0 ("L" to "l"). In LLVM MC, LinkerPrivateGlobalPrefix is set to "l" for Apple targets. We can define an overload of MCContext::createLinkerPrivateTempSymbol(const Twine &Name) to allow LLVM MC to select an unused symbol starting with lxray_sleds_start. (There is a pitfall: we should not use "ltmp[0-9]+", which may collide with other compiler internal symbols.) For ELF targets, the MCContext::createLinkerPrivateTempSymbol function creates a temporary symbol starting with ".L".
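As a rough illustration of "select an unused symbol with a given stem", here is a small Python model. The functions private_prefix and create_linker_private_symbol are hypothetical and only mirror the prefixes mentioned in the text; how LLVM MC actually numbers and uniquifies names may differ.

```python
def private_prefix(is_macho: bool, linker_private: bool) -> str:
    # Assumption taken from the text: Apple targets use "L" for temporary
    # symbols and "l" for linker-private symbols; ELF uses ".L" for both.
    if not is_macho:
        return ".L"
    return "l" if linker_private else "L"


def create_linker_private_symbol(used: set, base: str, is_macho: bool) -> str:
    """Pick the first unused name of the form <prefix><base><n> and reserve it."""
    prefix = private_prefix(is_macho, linker_private=True)
    n = 0
    while f"{prefix}{base}{n}" in used:
        n += 1
    name = f"{prefix}{base}{n}"
    used.add(name)
    return name


macho_names = set()
first = create_linker_private_symbol(macho_names, "xray_sleds_start", is_macho=True)
second = create_linker_private_symbol(macho_names, "xray_sleds_start", is_macho=True)
elf_name = create_linker_private_symbol(set(), "xray_sleds_start", is_macho=False)
```

Using a descriptive stem keeps the generated names out of the "ltmp[0-9]+" namespace mentioned above.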
Note: GNU assembler calls .L and L system-specific local label prefixes.
Oleksii Lozovskyi reported that the -fxray-function-index option was not functioning as expected.
- -fxray-function-index: no function index section
- -fno-xray-function-index: function index section (xray_fn_idx) is present
Initially, the function index section was always present. The introduction of the -f[no-]xray-function-index option caused confusion because it used a negatively named variable. A clangDriver refactoring then accidentally introduced this regression. I have resolved the issue for Clang 17.
Now that -fxray-function-index is back, we get the xray_fn_idx section by default. The section contains entries like the following:
.section __DATA,xray_fn_idx
.p2align 4, 0x90
.quad Lxray_sleds_start0
.quad Lxray_sleds_end0
These absolute addresses require rebase opcodes in the special section __LINKEDIT,__rebase. This is not great, and I wanted to fix it back in 2020 but never got around to doing it. This motivated me to actually fix the issue and create https://reviews.llvm.org/D152661 to change the [start,end) representation to the (pc_relative_start, size) representation.
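To see why the second representation avoids rebasing, the following Python sketch encodes one function index entry both ways. It is a toy model under stated assumptions: the helper names are made up, "size" is taken to mean the number of sleds, and each sled is assumed to occupy 32 bytes as in the xray_instr_map listing earlier.

```python
import struct

SLED_SIZE = 32  # assumed size of one xray_instr_map entry, in bytes


def encode_absolute(start: int, end: int) -> bytes:
    """Old layout: two absolute addresses forming [start, end)."""
    return struct.pack("<QQ", start, end)


def encode_pc_relative(entry_addr: int, start: int, end: int) -> bytes:
    """New layout: (start - address of this entry, number of sleds)."""
    return struct.pack("<qQ", start - entry_addr, (end - start) // SLED_SIZE)


def decode_pc_relative(entry_addr: int, data: bytes):
    """Recover [start, end) at run time from the entry's own address."""
    rel, count = struct.unpack("<qQ", data)
    start = entry_addr + rel
    return start, start + count * SLED_SIZE


# Hypothetical link-time addresses and a hypothetical load slide.
entry, start, end, slide = 0x2000, 0x1000, 0x1040, 0x100000
```

Sliding every address by the same amount changes the bytes produced by encode_absolute, so the loader must patch them, while the bytes produced by encode_pc_relative stay identical.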
My initial attempt produced something like this. I took the difference of two labels and right shifted it by 5 to get the number of sleds.
.section __DATA,xray_fn_idx,regular,live_support
.p2align 4, 0x90
lxray_fn_idx0:
.quad lxray_sleds_start0-lxray_fn_idx0
.quad (Lxray_sleds_end0-lxray_sleds_start0)>>5
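The shift by 5 works because each sled entry occupies 32 bytes. The Python snippet below is a small check of that arithmetic; the struct layout is inferred from the xray_instr_map listing (two .quad fields, three .byte fields, .space 13), and sled_count is a hypothetical helper.

```python
import struct

# Layout inferred from the listing: two 8-byte fields, three 1-byte fields,
# then 13 bytes of padding. "<" disables implicit alignment padding.
SLED_ENTRY = struct.Struct("<qqBBB13x")


def sled_count(start_offset: int, end_offset: int) -> int:
    """Number of sleds between two labels, matching (end - start) >> 5."""
    return (end_offset - start_offset) >> 5


# The function in the listing has two sleds: entry (0x00) and exit (0x01).
two_sleds = 2 * SLED_ENTRY.size
```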
This approach works on ELF targets but not on Mach-O targets due to a pile of assembler issues.
% clang -c --target=x86_64-apple-darwin a.s
When assembling an assembly file into an object file, an expression can be evaluated in multiple steps. Two steps are particularly important:
- Parse time: we have an MCAssembler object but no MCAsmLayout object. Instruction operands and certain directives like .if require the ability to evaluate an expression early.
- Object writing time: we have both an MCAssembler object and an MCAsmLayout object. The MCAsmLayout object provides information about the offset of each fragment.
The first issue is not specific to this case and is also encountered in ELF. The following assembly code should assemble to the hex pairs 01000001, but Clang fails to compute .if .-1b == 3.
% cat x.s
1:
.byte 1
.space 2
.if .-1b == 3
.byte 1
.else
.byte 0
.endif
% clang -c x.s
x.s:4:5: error: expected absolute expression
.if .-1b == 3
^
In 2019, Jian Cai added limited expression folding support to the LLVM integrated assembler to support the Linux kernel's arm use case.
arch/arm/mm/proc-v7.S:169:143: error: expected absolute expression
.pushsection ".alt.smp.init", "a" ; .long 9998b ;9997: orr r1, r1, #((1 << 0) | (1 << 6))|(3 << 3) ; .if . - 9997b == 2 ; nop ; .endif ; .if . - 9997b != 4 ; .error "ALT_UP() content must assemble to exactly 4 bytes"; .endif ; .popsection
I have just added support for MCFillFragment (.space and .fill) and for A-B, where A is a pending label (which will be reassigned to a real fragment in flushPendingLabels()). The latest LLVM integrated assembler (milestone: LLVM 17) can successfully assemble x.s when an MCAssembler object is present. However, evaluation still does not work without an MCAssembler object, which is expected.
% llvm-mc x.s -filetype=null
x.s:4:5: error: expected absolute expression
.if .-1b == 3
^
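The early folding described above can be pictured with a toy model: a label difference is computable before layout only if every fragment between the two labels has a known size. The Python sketch below uses hypothetical Fragment and Label classes and is much simpler than the checks LLVM MC really performs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    kind: str            # "data", "fill", or "relaxable"
    size: Optional[int]  # None when the size is unknown before layout


@dataclass
class Label:
    fragment: int  # index of the fragment that contains the label
    offset: int    # offset of the label within that fragment


def try_fold_difference(frags: List[Fragment], a: Label, b: Label) -> Optional[int]:
    """Evaluate a - b without layout information.

    Returns None where the assembler would report
    "expected absolute expression".
    """
    if a.fragment < b.fragment:
        swapped = try_fold_difference(frags, b, a)
        return None if swapped is None else -swapped
    total = a.offset - b.offset
    for frag in frags[b.fragment:a.fragment]:
        if frag.size is None:
            return None
        total += frag.size
    return total


# Model of x.s: ".byte 1", ".space 2", then the current location ".".
x_s = [Fragment("data", 1), Fragment("fill", 2), Fragment("data", 0)]
label_1b = Label(fragment=0, offset=0)
dot = Label(fragment=2, offset=0)

# A fragment whose size depends on relaxation blocks early folding.
with_relaxable = [Fragment("data", 1), Fragment("relaxable", None), Fragment("data", 0)]
```

In this model, supporting .space and .fill amounts to treating a fill fragment with a constant size like a data fragment.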
Then I noticed a potential pitfall for Mach-O in MCSection::flushPendingLabels. When flushing pending labels, it did not ensure that the new fragment inherits the previous atom symbol. I fixed this issue, although I haven't been able to create a test case to verify this behavior.
After this fix, .quad (Lxray_sleds_end0-lxray_sleds_start0)>>5 can be successfully assembled. However, during "direct object emission", the error "expected relocatable expression" will be reported. The issue is quite subtle.
In the case of direct object emission, where LLVM IR is directly lowered to an object file bypassing assembly (e.g., using clang -c a.c instead of clang -c a.s or clang -c --save-temps a.c), the assembler information is not used at parse time (MCStreamer::UseAssemblerInfoForParsing). As a result, the expression in .quad (Lxray_sleds_end0-lxray_sleds_start0)>>5 will be transformed into a fixup. Lxray_sleds_end0 is a pending label which will then be assigned to the next fragment when AsmPrinter emits a new piece of data to the xray_instr_map section.
At object writing time, we have an MCAsmLayout object and atom information for fragments. LLVM MC evaluates the fixup with the MCAsmLayout object. However, lxray_sleds_start0 and Lxray_sleds_end0 have different atoms, which causes the condition in MachObjectWriter::isSymbolRefDifferenceFullyResolvedImpl to fail. In my opinion, it may be necessary to relax the condition in this case.
Both the xray_instr_map and xray_fn_idx sections contain independent pieces. To support linker dead stripping, also known as linker garbage collection, we need to add the S_ATTR_LIVE_SUPPORT attribute to the two sections.
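The effect of S_ATTR_LIVE_SUPPORT can be sketched as a small marking algorithm: ordinary blocks are live when reachable from a root, while blocks in a live-support section become live when they reference a live block. The Python model below is a simplified illustration with a hypothetical dead_strip function and made-up block names; a real linker operates on atoms and relocations.

```python
S_ATTR_LIVE_SUPPORT = 0x08000000  # Mach-O section attribute bit


def dead_strip(blocks: dict, roots: list) -> set:
    """blocks maps a name to (list of referenced names, live_support flag).

    Returns the set of block names that survive dead stripping.
    """
    live = set()

    def mark(name: str) -> None:
        # Mark a block and everything reachable from it.
        stack = [name]
        while stack:
            current = stack.pop()
            if current not in live:
                live.add(current)
                stack.extend(blocks[current][0])

    for root in roots:
        mark(root)

    # Live-support blocks are kept when they reference a live block.
    changed = True
    while changed:
        changed = False
        for name, (refs, live_support) in blocks.items():
            if live_support and name not in live and any(r in live for r in refs):
                mark(name)
                changed = True
    return live


# _bar is unreferenced, so both _bar and its sled metadata are stripped.
example = {
    "_main": (["_foo"], False),
    "_foo": ([], False),
    "_bar": ([], False),
    "sleds_foo": (["_foo"], True),
    "sleds_bar": (["_bar"], True),
}
survivors = dead_strip(example, ["_main"])
```

Without the attribute, nothing references the metadata blocks, so in this model they would all be discarded even when the function they describe is kept.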
The runtime library is located at compiler-rt/lib/xray and there are a number of issues.
First, compiler-rt/lib/xray/xray_trampoline_x86_64.S used .Ltmp* symbols, which are temporary for ELF but non-temporary for Mach-O. The non-temporary labels become atoms and can cause bad dead stripping behaviors. I fixed the problem by using the LOCAL_LABEL macro, which generates an "L" symbol specifically for Mach-O.
compiler-rt/lib/xray/xray_trampoline_AArch64.S had a similar problem. I fixed it to look like the following:
#include "../sanitizer_common/sanitizer_asm.h"
...
.p2align 2
.global ASM_SYMBOL(__xray_FunctionEntry)
ASM_HIDDEN(__xray_FunctionEntry)
ASM_TYPE_FUNCTION(__xray_FunctionEntry)
ASM_SYMBOL(__xray_FunctionEntry):
For Apple targets, # define ASM_SYMBOL(symbol) _##symbol prepends an underscore to the symbol name. For ELF targets, ASM_SYMBOL is the identity function.
Compiling compiler-rt/lib/xray/xray_trampoline_AArch64.S will give us an external definition (N_SECT | N_EXT) ___xray_FunctionEntry. The undefined reference from C++ code gives an undefined symbol ___xray_FunctionEntry (N_UNDF | N_EXT), which can be resolved by the definition.
Success = patchFunctionEntry(Enable, FuncId, Sled, __xray_FunctionEntry); // extern "C" void __xray_FunctionEntry();
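The name matching between the assembly definition and the C++ reference can be illustrated with a short Python model. The constants are the Mach-O nlist type values; asm_symbol and resolves are hypothetical helpers that only mimic the underscore prefixing and the definition/reference pairing described above.

```python
N_UNDF = 0x0   # symbol is undefined
N_SECT = 0xE   # symbol is defined in a section
N_EXT = 0x01   # external symbol bit
N_TYPE = 0x0E  # mask for the type bits of n_type


def asm_symbol(name: str, apple: bool) -> str:
    """Mimic ASM_SYMBOL: prepend "_" on Apple targets, identity on ELF."""
    return "_" + name if apple else name


def resolves(definition: tuple, reference: tuple) -> bool:
    """Check whether an external definition satisfies an undefined reference.

    Each argument is a (name, n_type) pair.
    """
    (def_name, def_type), (ref_name, ref_type) = definition, reference
    return bool(
        def_name == ref_name
        and def_type & N_EXT and (def_type & N_TYPE) == N_SECT
        and ref_type & N_EXT and (ref_type & N_TYPE) == N_UNDF
    )


# The trampoline file defines the symbol through ASM_SYMBOL.
definition = (asm_symbol("__xray_FunctionEntry", apple=True), N_SECT | N_EXT)
# The C++ compiler also prepends "_" to the extern "C" name on Apple targets.
reference = ("_" + "__xray_FunctionEntry", N_UNDF | N_EXT)
```

If the assembly file defined the name without the extra underscore, the two names would differ and the reference would stay unresolved.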
Second, we need https://reviews.llvm.org/D153221 to build Mach-O runtimes for multiple architectures.
Once the build system is sorted out, we may discover new issues.
Normally, we want to enable a Clang driver option when a feature is fully ready. In this case, we know the code generation has been working for a long time, and the full support (metadata section refactoring and runtime) will be implemented soon. Therefore, I believe it's acceptable to make the driver accept --target=arm64-apple-darwin -fxray-instrument as it makes testing the runtime more convenient.
As the work to port LLVM XRay to Apple systems continues, it is encouraging to witness the progress achieved thus far. The endeavor has given me opportunities to explore the Mach-O object file format.
There is a pending patch for ELF RISC-V support.