Within the LLVM project, MC is a library responsible for handling assembly, disassembly, and object file formats. [Intro to the LLVM MC Project], which was written back in 2010, remains a good source for understanding the high-level structures.
In the latest release cycle, substantial effort has been dedicated to refining MC's internal representation for improved performance and readability. These changes have decreased compile time significantly. This blog post will delve into the details, providing insights into the specific changes.
MCAssembler and MCAsmLayout
`MCAssembler` manages assembler states (including sections and symbols) and implements layout and object file writing after parsing. `MCAsmLayout`, tightly coupled with `MCAssembler`, was in charge of symbol and fragment offsets during `MCAssembler::Finish`. Many `MCAssembler` and `MCExpr` member functions have a `const MCAsmLayout &` parameter, contributing to slight overhead.
I have started to merge `MCAsmLayout` into `MCAssembler` and simplify fragment management.
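To make the motivation concrete, here is a minimal sketch of the "before" and "after" shape of such an API. It is a toy model, not LLVM's code: `ToyFragment`, `ToySymbol`, `ToyLayout`, and `ToyAssembler` are hypothetical names, and only the idea of dropping the extra layout parameter comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only; none of these types are LLVM's real classes.
struct ToyFragment {
  uint64_t Offset; // offset within the section, assigned by layout
  uint64_t Size;   // size in bytes
};

struct ToySymbol {
  const ToyFragment *Frag; // defining fragment
  uint64_t OffsetInFrag;   // offset relative to the fragment start
};

// Before: offsets live behind a separate layout object, so every helper
// that needs an offset takes an extra `const ToyLayout &` parameter.
struct ToyLayout {
  uint64_t getSymbolOffset(const ToySymbol &S) const {
    assert(S.Frag && "symbol must be defined");
    return S.Frag->Offset + S.OffsetInFrag;
  }
};

uint64_t distanceBefore(const ToyLayout &Layout, const ToySymbol &A,
                        const ToySymbol &B) {
  return Layout.getSymbolOffset(A) - Layout.getSymbolOffset(B);
}

// After: the assembler owns the layout state and answers offset queries
// itself, so callers pass one object instead of two.
class ToyAssembler {
  std::vector<ToyFragment> Fragments;

public:
  // Add all fragments before taking pointers; the vector may reallocate.
  void addFragment(uint64_t Size) {
    Fragments.push_back(ToyFragment{0, Size});
  }

  const ToyFragment &getFragment(size_t I) const { return Fragments[I]; }

  // Assign consecutive offsets to all fragments.
  void layout() {
    uint64_t Offset = 0;
    for (ToyFragment &F : Fragments) {
      F.Offset = Offset;
      Offset += F.Size;
    }
  }

  uint64_t getSymbolOffset(const ToySymbol &S) const {
    assert(S.Frag && "symbol must be defined");
    return S.Frag->Offset + S.OffsetInFrag;
  }

  uint64_t distance(const ToySymbol &A, const ToySymbol &B) const {
    return getSymbolOffset(A) - getSymbolOffset(B);
  }
};
```

In this sketch the saving is one parameter per call; in a real code base the same parameter tends to be forwarded through many layers of helpers.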
Fragments represent sequences of non-relaxable instructions, relaxable instructions, alignment directives, and other elements. `MCDataFragment` and `MCRelaxableFragment`, whose sizes are crucial for memory consumption, have undergone several optimizations.
The fragment management system has also been streamlined by transitioning from a doubly-linked list (`llvm::iplist`) to a singly-linked list, eliminating unnecessary overhead. A few prerequisite commits removed backward iterator requirements.
Furthermore, I introduced the "current fragment" concept (`MCStreamer::CurFrag`), allowing for faster appending of new fragments.
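The two ideas fit together naturally: a singly-linked fragment list with a tail pointer, plus a cached "current fragment" that new bytes go into. The following is a minimal sketch under assumptions. `ListFragment`, `ListSection`, and `ListStreamer` are hypothetical names and do not mirror LLVM's actual classes or ownership model.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct ListSection;

// A fragment only knows its successor; there is no Prev pointer.
struct ListFragment {
  ListFragment *Next = nullptr;
  ListSection *Parent = nullptr;
  std::vector<unsigned char> Contents;
};

// A section keeps head and tail so appending is O(1) without back links.
struct ListSection {
  ListFragment *Head = nullptr;
  ListFragment *Tail = nullptr;
};

class ListStreamer {
  std::vector<std::unique_ptr<ListFragment>> Storage; // owns all fragments
  ListFragment *CurFrag = nullptr;                    // where new bytes go

  ListFragment *appendFragment(ListSection &Sec) {
    Storage.push_back(std::unique_ptr<ListFragment>(new ListFragment()));
    ListFragment *F = Storage.back().get();
    F->Parent = &Sec;
    if (Sec.Tail)
      Sec.Tail->Next = F;
    else
      Sec.Head = F;
    Sec.Tail = F;
    return F;
  }

public:
  // Make the last fragment of Sec current, creating one on first use.
  void switchSection(ListSection &Sec) {
    CurFrag = Sec.Tail ? Sec.Tail : appendFragment(Sec);
  }

  // Start a new fragment in the current section and make it current.
  ListFragment *startNewFragment() {
    assert(CurFrag && "no current section");
    CurFrag = appendFragment(*CurFrag->Parent);
    return CurFrag;
  }

  // Appending bytes needs no list traversal: CurFrag is already known.
  void emitBytes(const std::vector<unsigned char> &Bytes) {
    assert(CurFrag && "no current fragment");
    CurFrag->Contents.insert(CurFrag->Contents.end(), Bytes.begin(),
                             Bytes.end());
  }

  ListFragment *getCurrentFragment() const { return CurFrag; }
};

// Forward iteration is the only traversal the list supports.
size_t countFragments(const ListSection &Sec) {
  size_t N = 0;
  for (const ListFragment *F = Sec.Head; F; F = F->Next)
    ++N;
  return N;
}
```

The trade-off in this sketch is that nothing can walk the list backwards, which is why any code depending on backward iteration has to be removed first.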
@aengelke made two noticeable performance improvements.
In `MCObjectStreamer`, newly defined labels were put into a "pending label" list and initially assigned to a `MCDummyFragment` associated with the current section. The symbols would be reassigned to a new fragment when the next instruction or directive was parsed. This pending label system, while necessary for aligned bundling, introduced complexity and potential for subtle bugs.
To streamline this, I revamped the implementation by directly adjusting offsets of existing fragments, eliminating over 100 lines of code and reducing the potential for errors.
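As a rough illustration of binding a label to a fragment immediately, rather than parking it in a pending list, consider the sketch below. It is a simplified model with hypothetical names (`LabelFragment`, `ToyLabel`, `LabelStreamer`). It ignores aligned bundling entirely and does not reproduce the offset adjustments the real change performs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <unordered_map>
#include <vector>

struct LabelFragment {
  std::vector<uint8_t> Contents;
};

struct ToyLabel {
  const LabelFragment *Frag; // fragment the label is defined in
  uint64_t Offset;           // offset from the start of that fragment
};

class LabelStreamer {
  // std::deque keeps references to elements valid across push_back.
  std::deque<LabelFragment> Frags;
  std::unordered_map<std::string, ToyLabel> Labels;

public:
  LabelStreamer() { Frags.emplace_back(); }

  // Append N placeholder bytes to the current fragment.
  void emitBytes(size_t N) {
    LabelFragment &Cur = Frags.back();
    Cur.Contents.resize(Cur.Contents.size() + N);
  }

  void newFragment() { Frags.emplace_back(); }

  // The label is bound right away to (current fragment, current size);
  // there is no pending list to flush later.
  void emitLabel(const std::string &Name) {
    assert(Labels.count(Name) == 0 && "label redefined");
    ToyLabel L = {&Frags.back(), Frags.back().Contents.size()};
    Labels[Name] = L;
  }

  const ToyLabel &lookup(const std::string &Name) const {
    return Labels.at(Name);
  }

  const LabelFragment &currentFragment() const { return Frags.back(); }
};
```

Because the binding happens inside `emitLabel`, directive handlers in this model have nothing extra to remember, which is the class of bug the pending list made possible.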
Details: In 2014, "[MC] Attach labels to existing fragments instead of using a separate fragment" introduced `flushPendingLabels` for the aligned bundling assembler extension for Native Client. "[MC] Match labels to existing fragments even when switching sections.", built on top of `flushPendingLabels`, added further complication. Worse, a lot of directive handling code had to add `flushPendingLabels`, and a missing `flushPendingLabels` could lead to subtle bugs related to incorrect symbol values.
For the following code, aligned bundling requires that `.Ltmp0` is defined at `addl`.

```
$ clang var.c -S -o - -fPIC -m32
...
.bundle_lock align_to_end
  calll   .L0$pb
.bundle_unlock
.L0$pb:
  popl    %eax
.Ltmp0:
  addl    $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax
```
(`MCAsmStreamer` doesn't call `flushPendingLabels` in its handlers. This is the reason that we cannot change `MCAsmStreamer::getAssemblerPtr` to use a `MCAssembler` and change `AsmParser::parseExpression`.)
Section handling was also refined. `MCStreamer` maintains a section stack for features like the `.pushsection`/`.popsection`/`.previous` directives. Many functions relied on the section stack for loading the current section, which introduced overhead due to the additional indirection and nullable return values.
By leveraging the "current fragment" concept, the need for the section stack was eliminated in most cases, simplifying the codebase and improving efficiency.
I have eliminated nullable `getCurrentSectionOnly` uses and changed `getCurrentSectionOnly` to leverage the "current fragment" concept. This change also revealed an interesting quirk in NVPTX assembly related to DWARF sections.
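The idea of deriving the current section from the current fragment can be sketched as follows. This is a toy model under assumptions: `NamedSection`, `SecFragment`, and `SectionStreamer` are hypothetical, and creating a fresh fragment on every switch is a simplification chosen only to keep the example short.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

struct NamedSection {
  std::string Name;
};

struct SecFragment {
  NamedSection *Parent; // every fragment knows its section
};

class SectionStreamer {
  std::deque<SecFragment> Frags;            // stable addresses on push_back
  std::vector<NamedSection *> SectionStack; // only for push/pop directives
  SecFragment *CurFrag = nullptr;

  void setCurrent(NamedSection &Sec) {
    Frags.push_back(SecFragment{&Sec});
    CurFrag = &Frags.back();
  }

public:
  explicit SectionStreamer(NamedSection &Initial) {
    SectionStack.push_back(&Initial);
    setCurrent(Initial);
  }

  void switchSection(NamedSection &Sec) {
    SectionStack.back() = &Sec;
    setCurrent(Sec);
  }

  // Models a push directive: remember the current section.
  void pushSection() { SectionStack.push_back(SectionStack.back()); }

  // Models a pop directive; returns false if there is nothing to pop.
  bool popSection() {
    if (SectionStack.size() <= 1)
      return false;
    SectionStack.pop_back();
    setCurrent(*SectionStack.back());
    return true;
  }

  // No stack lookup and no null result: the answer comes from CurFrag.
  NamedSection &getCurrentSectionOnly() const {
    assert(CurFrag && "streamer always has a current fragment");
    return *CurFrag->Parent;
  }
};
```

Returning a reference rather than a pointer expresses the "never null" guarantee in the type, so callers in this sketch need no null checks.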
Many section creation functions (`MCContext::get*Section`) had a `const char *BeginSymName` parameter to support the section symbol concept. This led to issues when we wanted to treat the section name as a symbol. In 2017, the parameter was removed for ELF, streamlining section symbol handling.
I changed the way MC handles section symbols for COFF and removed the unused parameters for WebAssembly. The work planned for XCOFF is outlined in https://github.com/llvm/llvm-project/issues/96810.
Expression evaluation in `MCAssembler::layout` previously employed a complex lazy evaluation algorithm, which aimed to minimize the number of fragment relaxations. It proved difficult to understand and resulted in complex recursion detection.
To address this, I removed lazy evaluation in favor of eager fragment relaxation. This simplification improved the reliability of the layout process, eliminating the need for intricate workarounds like the `MCFragment::IsBeingLaidOut` flag introduced earlier.
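A simple way to picture an eager scheme is a fixed-point loop that recomputes every offset up front on each pass, then relaxes whatever no longer fits, until nothing changes. The sketch below is an assumption-laden toy, not `MCAssembler::layout`: `RelaxFrag`, the 2-byte and 5-byte jump sizes, and the signed 8-bit range check are all illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const uint64_t ShortJumpSize = 2; // opcode + 8-bit displacement (assumed)
const uint64_t LongJumpSize = 5;  // opcode + 32-bit displacement (assumed)

struct RelaxFrag {
  enum Kind { Data, Jump } K;
  uint64_t Size;   // current size in bytes
  size_t Target;   // for Jump: index of the fragment being jumped to
  uint64_t Offset; // assigned by layout
};

RelaxFrag makeData(uint64_t Size) {
  RelaxFrag F = {RelaxFrag::Data, Size, 0, 0};
  return F;
}

RelaxFrag makeJump(size_t Target) {
  RelaxFrag F = {RelaxFrag::Jump, ShortJumpSize, Target, 0};
  return F;
}

// Eagerly recompute all offsets; nothing is cached or invalidated lazily.
void assignOffsets(std::vector<RelaxFrag> &Frags) {
  uint64_t Offset = 0;
  for (RelaxFrag &F : Frags) {
    F.Offset = Offset;
    Offset += F.Size;
  }
}

// Repeat until no jump grows. Jumps only ever grow, so the loop
// terminates. Returns the number of passes taken.
unsigned layoutEagerly(std::vector<RelaxFrag> &Frags) {
  unsigned Passes = 0;
  bool Changed;
  do {
    ++Passes;
    Changed = false;
    assignOffsets(Frags);
    for (RelaxFrag &F : Frags) {
      if (F.K != RelaxFrag::Jump || F.Size != ShortJumpSize)
        continue;
      assert(F.Target < Frags.size() && "jump target out of range");
      // Displacement is measured from the end of the jump.
      int64_t Disp = int64_t(Frags[F.Target].Offset) -
                     int64_t(F.Offset + F.Size);
      if (Disp < -128 || Disp > 127) {
        F.Size = LongJumpSize;
        Changed = true;
      }
    }
  } while (Changed);
  return Passes;
}
```

In this model, growing one jump can push another out of range, which is why a single pass is not enough. The code stays easy to reason about because no offset is ever computed on demand in the middle of a pass.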
Note: the benefit of lazy evaluation largely diminished when https://reviews.llvm.org/D76114 invalidated all sections to fix the correctness issue for the following assembly:

```
.section .text1,"ax"
```
In addition, I removed an overload of `isSymbolRefDifferenceFullyResolvedImpl`, enabling constant folding for variable differences in Mach-O.
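For readers unfamiliar with the term, "constant folding" of a symbol difference means replacing `A - B` with a number when both offsets are already known relative to the same base. The sketch below shows only that generic idea with hypothetical types (`FoldFragment`, `FoldSymbol`, `tryFoldDifference`). It does not model what the removed overload did, and it leaves out Mach-O specifics.

```cpp
#include <cassert>
#include <cstdint>

struct FoldFragment {
  uint64_t Size; // size in bytes
};

struct FoldSymbol {
  const FoldFragment *Frag; // null if the symbol is undefined
  uint64_t Offset;          // offset within Frag
};

// Try to fold A - B into a constant. In this toy model that is possible
// only when both symbols are defined in the same fragment, because then
// the difference does not depend on how fragments are laid out.
bool tryFoldDifference(const FoldSymbol &A, const FoldSymbol &B,
                       int64_t &Result) {
  if (!A.Frag || !B.Frag || A.Frag != B.Frag)
    return false;
  assert(A.Offset <= A.Frag->Size && B.Offset <= B.Frag->Size &&
         "symbol offset outside its fragment");
  Result = int64_t(A.Offset) - int64_t(B.Offset);
  return true;
}
```

When folding fails, an assembler would typically defer the expression to layout or emit a relocation; that part is outside this sketch.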
Target-specific features misplaced in the generic implementation

I have made efforts to relocate target-specific functionalities to their respective target implementations.
LLVM 19 introduces significant enhancements to the integrated assembler, resulting in notable performance gains, reduced memory usage, and a more streamlined codebase. These optimizations pave the way for future improvements.
I compiled the preprocessed SQLite Amalgamation (from llvm-test-suite) using a Release build of clang:
| build | 2024-05-14 | 2024-06-30 |
|---|---|---|
| -O0 | 0.5304 | 0.4942 |
| -O0 -g | 0.8818 | 0.8026 |
| -O2 | 6.249 | 6.087 |
| -O2 -g | 7.931 | 7.659 |

`clang -c -w sqlite3.i`
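Read as relative changes, the table works out to reductions of roughly 6.8% (`-O0`), 9.0% (`-O0 -g`), 2.6% (`-O2`), and 3.4% (`-O2 -g`). A small helper for reproducing that arithmetic is sketched below; `Measurement`, `SqliteTimes`, and `percentReduction` are names introduced here for illustration, and the numbers are copied from the table.

```cpp
#include <cassert>

struct Measurement {
  const char *Options; // compiler options for the row
  double Before;       // 2024-05-14 column
  double After;        // 2024-06-30 column
};

// Values copied from the table above.
const Measurement SqliteTimes[] = {
    {"-O0", 0.5304, 0.4942},
    {"-O0 -g", 0.8818, 0.8026},
    {"-O2", 6.249, 6.087},
    {"-O2 -g", 7.931, 7.659},
};

// Percentage by which After is smaller than Before.
double percentReduction(double Before, double After) {
  assert(Before > 0 && "baseline must be positive");
  return (Before - After) / Before * 100.0;
}
```

The larger relative gains at `-O0` are consistent with the assembler taking a bigger share of compile time when little optimization runs.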
The AsmPrinter pass, which is coupled with the assembler, dominates the `-O0` compile time. I have modified the `-ftime-report` mechanism to decrease the per-instruction overhead. The decrease in compile time matches the decrease in the time spent in AsmPrinter. Coupled with a recent observation that BOLT, which heavily utilizes MC, is ~8% faster, it's clear that MC modifications have yielded substantial improvements.
"llvm-mc: Diagnose misuse (mix) of defined symbols and labels." added a redefinition error. This was refined many times. I hope to fix this in the future.
The Mach-O assembler lacks the robustness of its ELF counterpart. Notably, certain aspects of the Mach-O implementation, such as the conditions for constant folding in `MachObjectWriter::isSymbolRefDifferenceFullyResolvedImpl` (different for x86-64 and AArch64), warrant revisiting.
Additionally, the Mach-O assembler has a hack to maintain compatibility with the Apple cctools assembler when the relocation addend is non-zero.

```
.data
```
This leads to another workaround in `MCFragment.cpp:getSymbolOffsetImpl` ("[MC] Recursively calculate symbol offset"), which is to support the following assembly:

```
l_a:
```