    Integrated assembler improvements in LLVM 19

    MaskRay, posted on 2024-07-01 03:02:45

    Within the LLVM project, MC is a library responsible for handling assembly, disassembly, and object file formats. [Intro to the LLVM MC Project], which was written back in 2010, remains a good resource for understanding the high-level structures.

    In the latest release cycle, substantial effort has been dedicated to refining MC's internal representation for improved performance and readability. These changes have significantly decreased compile time. This blog post delves into the details of the specific changes.

    MCAssembler and MCAsmLayout

    MCAssembler manages assembler states (including sections and symbols) and implements layout and object file writing after parsing. MCAsmLayout, tightly coupled with MCAssembler, was in charge of symbol and fragment offsets during MCAssembler::Finish. Many MCAssembler and MCExpr member functions have a const MCAsmLayout & parameter, contributing to slight overhead.

    I have started to merge MCAsmLayout into MCAssembler and simplify fragment management.
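    To make the shape of this change concrete, here is a toy Python model, not LLVM's C++ API: the assembler object itself owns each fragment's offset after layout, so a query such as symbol_offset needs no separate layout argument. The names ToyAssembler, Fragment, Symbol, layout and symbol_offset are hypothetical, and the model ignores relaxation entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Fragment:
    size: int
    offset: Optional[int] = None  # assigned by ToyAssembler.layout()


@dataclass
class Symbol:
    name: str
    fragment: Fragment
    offset_in_fragment: int


class ToyAssembler:
    """Hypothetical model: layout state lives in the assembler itself."""

    def __init__(self) -> None:
        self.sections: Dict[str, List[Fragment]] = {}

    def layout(self) -> None:
        # Fragments in a section are placed back to back, starting at 0.
        for fragments in self.sections.values():
            offset = 0
            for frag in fragments:
                frag.offset = offset
                offset += frag.size

    def symbol_offset(self, sym: Symbol) -> int:
        # No separate layout object is passed in; offsets are read directly.
        if sym.fragment.offset is None:
            raise RuntimeError("layout() has not run yet")
        return sym.fragment.offset + sym.offset_in_fragment


asm = ToyAssembler()
text = [Fragment(4), Fragment(8)]
asm.sections[".text"] = text
foo = Symbol("foo", text[1], 2)
asm.layout()
print(asm.symbol_offset(foo))  # 6
```

    Keeping the offsets in the object that performs layout removes one parameter from every query, which is the kind of overhead mentioned above.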

    Fragments

    Fragments represent sequences of non-relaxable instructions, relaxable instructions, alignment directives, and other elements. MCDataFragment and MCRelaxableFragment, whose sizes are crucial for memory consumption, have undergone several optimizations:

    • MCInst: decrease inline element count to 6
    • [MC] Reduce size of MCDataFragment by 8 bytes by @aengelke
    • [MC] Move MCFragment::Atom to MCSectionMachO::Atoms

    The fragment management system has also been streamlined by transitioning from a doubly-linked list (llvm::iplist) to a singly-linked list, eliminating unnecessary overhead. A few prerequisite commits removed backward iterator requirements.

    Furthermore, I introduced the "current fragment" concept (MCStreamer::CurFrag), allowing new fragments to be appended faster.
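    A minimal Python sketch of why a singly-linked list plus a "current fragment" pointer is enough: appending only needs the last fragment of the current section, and traversal only goes forward. The classes and the sentinel head fragment are assumptions of this toy model, not the actual MCFragment/MCSection layout in LLVM; cur_frag merely plays the role of MCStreamer::CurFrag.

```python
from typing import Iterator, Optional


class Fragment:
    def __init__(self, kind: str) -> None:
        self.kind = kind
        self.next: Optional["Fragment"] = None  # forward link only


class Section:
    def __init__(self, name: str) -> None:
        self.name = name
        self.head = Fragment("sentinel")  # hypothetical sentinel, never emitted
        self.tail = self.head

    def fragments(self) -> Iterator[Fragment]:
        # Forward traversal is all this model needs for layout and writing.
        frag = self.head.next
        while frag is not None:
            yield frag
            frag = frag.next


class Streamer:
    def __init__(self) -> None:
        self.cur_section: Optional[Section] = None
        self.cur_frag: Optional[Fragment] = None  # role of MCStreamer::CurFrag

    def switch_section(self, section: Section) -> None:
        self.cur_section = section
        self.cur_frag = section.tail

    def new_fragment(self, kind: str) -> Fragment:
        # O(1) append: link after the current fragment, no list walk.
        assert self.cur_section is not None and self.cur_frag is not None
        frag = Fragment(kind)
        self.cur_frag.next = frag
        self.cur_section.tail = frag
        self.cur_frag = frag
        return frag


text, data = Section(".text"), Section(".data")
s = Streamer()
s.switch_section(text)
s.new_fragment("data")
s.new_fragment("relaxable")
s.switch_section(data)
s.new_fragment("data")
s.switch_section(text)
s.new_fragment("align")
print([f.kind for f in text.fragments()])  # ['data', 'relaxable', 'align']
```

    Since no code in this model walks backward, the prev pointer of a doubly-linked list would only cost memory.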

    Symbols

    @aengelke made two noticeable performance improvements:

    • [MC] Don't evaluate name of unnamed symbols
    • [MC] Eliminate two symbol-related hash maps

    In MCObjectStreamer, newly defined labels were put into a "pending label" list and initially assigned to an MCDummyFragment associated with the current section. The symbols were reassigned to a new fragment when the next instruction or directive was parsed. This pending label system, while necessary for aligned bundling, introduced complexity and potential for subtle bugs.

    To streamline this, I revamped the implementation by directly adjusting offsets of existing fragments, eliminating over 100 lines of code and reducing the potential for errors.
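    The idea of binding a label as soon as it is defined can be illustrated with a small Python model (hypothetical names, not LLVM's API): each label records the current fragment and the number of bytes already in it, so there is no pending list to flush when the next fragment starts. The model does not include bundle padding.

```python
from typing import Dict, List, Tuple


class Fragment:
    def __init__(self) -> None:
        self.contents = bytearray()
        self.offset = 0  # assigned by layout()


class Streamer:
    def __init__(self) -> None:
        self.fragments: List[Fragment] = [Fragment()]
        # label -> (fragment, offset within that fragment), fixed at definition
        self.labels: Dict[str, Tuple[Fragment, int]] = {}

    @property
    def cur_frag(self) -> Fragment:
        return self.fragments[-1]

    def emit_label(self, name: str) -> None:
        # Bind immediately instead of appending to a "pending label" list.
        self.labels[name] = (self.cur_frag, len(self.cur_frag.contents))

    def emit_bytes(self, data: bytes) -> None:
        self.cur_frag.contents += data

    def new_fragment(self) -> None:
        self.fragments.append(Fragment())

    def layout(self) -> None:
        offset = 0
        for frag in self.fragments:
            frag.offset = offset
            offset += len(frag.contents)

    def label_value(self, name: str) -> int:
        frag, offset_in_frag = self.labels[name]
        return frag.offset + offset_in_frag


s = Streamer()
s.emit_bytes(b"\x58")   # popl %eax (1 byte)
s.emit_label(".Ltmp0")  # bound to the end of the first fragment
s.new_fragment()        # a new fragment starts; nothing to flush
s.emit_bytes(bytes(5))  # 5 more bytes in the new fragment
s.layout()
print(s.label_value(".Ltmp0"))  # 1
```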

    Details: In 2014, the commit "[MC] Attach labels to existing fragments instead of using a separate fragment" introduced flushPendingLabels for aligned bundling, an assembler extension for Native Client. A later commit, "[MC] Match labels to existing fragments even when switching sections.", built on top of flushPendingLabels and added further complication. Worse, a lot of directive handling code had to call flushPendingLabels, and a missing flushPendingLabels call could lead to subtle bugs related to incorrect symbol values.

    For the following code, aligned bundling requires that .Ltmp0 be defined at the addl instruction.

    $ clang var.c -S -o - -fPIC -m32
    ...
            .bundle_lock align_to_end
            calll .L0$pb
            .bundle_unlock
    .L0$pb:
            popl %eax
    .Ltmp0:
            addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax

    (MCAsmStreamer doesn't call flushPendingLabels in its handlers. This is why we cannot change MCAsmStreamer::getAssemblerPtr to use an MCAssembler and change AsmParser::parseExpression.)
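    For readers unfamiliar with aligned bundling, the following Python sketch computes how much padding goes in front of an instruction or bundle-locked group. It follows my reading of the padding rule in LLVM's bundling support, so treat it as an approximation; the function name and parameters are made up for illustration, and a 32-byte bundle is assumed. Because the padding is placed before the group, a label that precedes an instruction (such as .Ltmp0 before addl) has to resolve to the address after any padding.

```python
def bundle_padding(bundle_size: int, offset: int, size: int, align_to_end: bool) -> int:
    """Padding bytes to insert before a group of `size` bytes starting at `offset`."""
    if bundle_size & (bundle_size - 1):
        raise ValueError("bundle size must be a power of two")
    if size > bundle_size:
        raise ValueError("a group must fit in a single bundle")
    offset_in_bundle = offset & (bundle_size - 1)
    end = offset_in_bundle + size
    if align_to_end:
        # .bundle_lock align_to_end: the group must end exactly on a boundary.
        if end == bundle_size:
            return 0
        if end < bundle_size:
            return bundle_size - end
        return 2 * bundle_size - end
    # Default: pad only if the group would straddle a bundle boundary.
    if offset_in_bundle > 0 and end > bundle_size:
        return bundle_size - offset_in_bundle
    return 0


# A 5-byte calll under align_to_end, starting 3 bytes into a 32-byte bundle:
pad = bundle_padding(32, 3, 5, align_to_end=True)
print(pad)  # 24: the call then ends at offset 32, so .L0$pb is bundle-aligned
```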

    Sections

    Section handling was also refined. MCStreamer maintains a section stack for features like the .push_section/.pop_section/.previous directives. Many functions relied on the section stack for loading the current section, which introduced overhead due to the additional indirection and nullable return values.

    By leveraging the "current fragment" concept, I eliminated the need for the section stack in most cases, simplifying the codebase and improving efficiency.

    I have eliminated nullable getCurrentSectionOnly uses and changed getCurrentSectionOnly to leverage the "current fragment" concept. This change also revealed an interesting quirk in NVPTX assembly related to DWARF sections.
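    The section stack that these directives need can be modeled in a few lines of Python. Each entry holds the current section and the previous section, which is what .previous switches back to. This is a simplified sketch assuming GNU-as-style semantics, with hypothetical names, and it ignores subsections; as described above, code that only needs the current section can read it from the current fragment instead of consulting such a stack.

```python
from typing import List, Optional, Tuple


class SectionStack:
    def __init__(self, initial: str) -> None:
        # Each entry: (current section, previous section).
        self.stack: List[Tuple[str, Optional[str]]] = [(initial, None)]

    @property
    def current(self) -> str:
        return self.stack[-1][0]

    def switch_section(self, name: str) -> None:  # .section name
        cur, _ = self.stack[-1]
        self.stack[-1] = (name, cur)

    def push_section(self, name: str) -> None:
        # Save the current entry, then switch within the new top entry.
        self.stack.append(self.stack[-1])
        self.switch_section(name)

    def pop_section(self) -> None:
        # Restore the entry saved by the matching push.
        if len(self.stack) <= 1:
            raise ValueError("pop without a matching push")
        self.stack.pop()

    def previous(self) -> None:  # .previous
        prev = self.stack[-1][1]
        if prev is None:
            raise ValueError(".previous without corresponding .section")
        self.switch_section(prev)


s = SectionStack(".text")
s.switch_section(".data")
s.previous()                # back to .text
s.push_section(".rodata")
s.pop_section()             # restores .text
print(s.current)  # .text
```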

    Section symbols

    Many section creation functions (MCContext::get*Section) had a const char *BeginSymName parameter to support the section symbol concept. This led to issues when we wanted to treat the section name as a symbol. In 2017, the parameter was removed for ELF, streamlining section symbol handling.

    I changed the way MC handles section symbols for COFF and removed the unused parameters for WebAssembly. The work planned for XCOFF is outlined in https://github.com/llvm/llvm-project/issues/96810.

    Expression evaluation

    Expression evaluation in MCAssembler::layout previously employed a complex lazy evaluation algorithm, which aimed to minimize the number of fragment relaxations. It proved difficult to understand and resulted in complex recursion detection.

    To address this, I removed lazy evaluation in favor of eager fragment relaxation. This simplification improved the reliability of the layout process, eliminating the need for intricate workarounds like the MCFragment::IsBeingLaidOut flag introduced earlier.

    Note: the benefit of lazy evaluation largely diminished when https://reviews.llvm.org/D76114 invalidated all sections to fix the correctness issue for the following assembly:

            .section .text1,"ax"
            .skip after-before,0x0
    .L0:

            .section .text2
    before:
            jmp .L0
    after:
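    The eager approach can be sketched as a fixed-point loop over all sections. The Python model below uses hypothetical names, section-relative offsets, and the assumption that sizes only grow (so the loop terminates). It replays the example above with a simplified relaxation rule: a jmp whose target is in another section cannot be resolved at assembly time, so it grows from the 2-byte short form to the 5-byte long form. Because every section is laid out again in every iteration, the .skip in .text1 ends up seeing the final jmp size rather than a stale one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Frag:
    kind: str
    size: int
    # Recomputes the fragment size from current label values; None = fixed size.
    relax: Optional[Callable[[Dict[str, int]], int]] = None


def layout(sections: Dict[str, List[Frag]],
           labels: Dict[str, Tuple[str, int]]) -> Dict[str, int]:
    """labels maps a name to (section, index of the fragment it precedes)."""
    while True:
        # 1. Assign section-relative offsets using the current sizes.
        starts: Dict[str, List[int]] = {}
        for sec, frags in sections.items():
            offs, off = [], 0
            for f in frags:
                offs.append(off)
                off += f.size
            offs.append(off)  # a label may sit at the end of a section
            starts[sec] = offs
        values = {name: starts[sec][i] for name, (sec, i) in labels.items()}
        # 2. Eagerly re-evaluate every relaxable fragment in every section.
        changed = False
        for frags in sections.values():
            for f in frags:
                if f.relax is not None:
                    new_size = f.relax(values)
                    if new_size != f.size:
                        f.size, changed = new_size, True
        if not changed:
            return values


skip = Frag("skip", 0, relax=lambda v: v["after"] - v["before"])
jmp = Frag("jmp", 2, relax=lambda v: 5)  # toy rule: cross-section target -> long form
sections = {".text1": [skip], ".text2": [jmp]}
labels = {".L0": (".text1", 1), "before": (".text2", 0), "after": (".text2", 1)}
values = layout(sections, labels)
print(skip.size, values[".L0"])  # 5 5
```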

    In addition, I removed an overload of isSymbolRefDifferenceFullyResolvedImpl, enabling constant folding for variable differences in Mach-O.

    Target-specific features misplaced in the generic implementation

    I have moved target-specific functionality into the respective target implementations:

    • [MC,X86] emitInstruction: remove virtual function calls due to Intel JCC Erratum
    • [MC,X86] De-virtualize emitPrefix
    • [MC] Move Mach-O specific getAtom and isSectionAtomizableBySymbols to Mach-O files
    • [MC] Move ELFWriter::createMemtagRelocs to AArch64TargetELFStreamer::finish

    Summary

    LLVM 19 introduces significant enhancements to the integrated assembler, resulting in notable performance gains, reduced memory usage, and a more streamlined codebase. These optimizations pave the way for future improvements.

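    I compiled the preprocessed SQLite Amalgamation (from llvm-test-suite) with clang -c -w sqlite3.i plus the options in the first column, using Release builds of clang from the two dates shown. Compile time (lower is better):

        Options   2024-05-14  2024-06-30
        -O0       0.5304      0.4942
        -O0 -g    0.8818      0.8026
        -O2       6.249       6.087
        -O2 -g    7.931       7.659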
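    To express the table in relative terms, the reduction for each configuration is (old - new) / old. The short Python computation below applies that formula to the numbers in the table.

```python
times = {
    "-O0": (0.5304, 0.4942),
    "-O0 -g": (0.8818, 0.8026),
    "-O2": (6.249, 6.087),
    "-O2 -g": (7.931, 7.659),
}


def reduction_pct(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return 100 * (before - after) / before


for opts, (before, after) in times.items():
    print(f"{opts:7} {reduction_pct(before, after):.1f}%")
# -O0     6.8%
# -O0 -g  9.0%
# -O2     2.6%
# -O2 -g  3.4%
```

    The larger relative gains at -O0 fit the observation that AsmPrinter dominates -O0 compile time.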

    The AsmPrinter pass, which is tightly coupled with the assembler, dominates the -O0 compile time. I have modified the -ftime-report mechanism to decrease the per-instruction overhead. The decrease in compile time matches the decrease in time spent in AsmPrinter. Together with a recent observation that BOLT, which heavily uses MC, is ~8% faster, this makes it clear that the MC changes have yielded substantial improvements.

    Roadmap

    Symbol redefinition

    The commit "llvm-mc: Diagnose misuse (mix) of defined symbols and labels." added a redefinition error. The check has since been refined many times, and I hope to fix the remaining issues in the future.

    Addressing Mach-O weakness

    The Mach-O assembler lacks the robustness of its ELF counterpart. Notably, certain aspects of the Mach-O implementation, such as the conditions for constant folding in MachObjectWriter::isSymbolRefDifferenceFullyResolvedImpl (different for x86-64 and AArch64), warrant revisiting.

    Additionally, the Mach-O implementation has a hack to maintain compatibility with the Apple cctools assembler when the relocation addend is non-zero.

            .data
            a = b + 4
            .long a # ARM64_RELOC_UNSIGNED(a) instead of b; This might work around the linker bug(?) when the referenced symbol is b and the addend is 4.

            c = d
            .long c # ARM64_RELOC_UNSIGNED(d)

    y:
            x = y + 4
            .long x # ARM64_RELOC_UNSIGNED(x) instead of y
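    The behavior described by the comments above amounts to a rule for choosing the relocation's symbol: look through an equated symbol only when its addend is zero. The Python sketch below encodes that rule as inferred from the example; it is a toy model with a hypothetical symbol table, not the logic of LLVM's Mach-O writer.

```python
from typing import Dict, Optional, Tuple

# Hypothetical symbol table: an equated symbol maps to (base symbol, addend);
# a label maps to None; symbols absent from the table are undefined here.
SymbolTable = Dict[str, Optional[Tuple[str, int]]]


def reloc_target(sym: str, table: SymbolTable) -> str:
    """Toy rule: which symbol an ARM64_RELOC_UNSIGNED would reference."""
    seen = set()
    while True:
        entry = table.get(sym)
        if entry is None:
            return sym  # a label or an undefined symbol is referenced directly
        base, addend = entry
        if addend != 0:
            return sym  # non-zero addend: keep the equated symbol itself
        if sym in seen:
            raise ValueError(f"cyclic definition involving {sym}")
        seen.add(sym)
        sym = base  # zero addend: look through the alias


table: SymbolTable = {"a": ("b", 4), "c": ("d", 0), "y": None, "x": ("y", 4)}
print([reloc_target(s, table) for s in ("a", "c", "x")])  # ['a', 'd', 'x']
```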

    This leads to another workaround in MCFragment.cpp:getSymbolOffsetImpl ([MC] Recursively calculate symbol offset), which exists to support the following assembly:

    l_a:
    l_b = l_a + 1
    l_c = l_b
            .long l_c
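    The recursion is needed because the offset of l_c is only known by following the chain l_c -> l_b -> l_a and summing the addends along the way. A minimal Python sketch, assuming a hypothetical table of (base symbol, addend) pairs and that l_a sits at offset 0:

```python
from typing import Dict, Optional, Set, Tuple


def symbol_offset(sym: str,
                  equates: Dict[str, Tuple[str, int]],
                  label_offsets: Dict[str, int],
                  visiting: Optional[Set[str]] = None) -> int:
    """Offset of `sym`, recursing through equated symbols (toy model)."""
    if sym in label_offsets:
        return label_offsets[sym]  # base case: a label with a known offset
    visiting = set() if visiting is None else visiting
    if sym in visiting:
        raise ValueError(f"cyclic definition involving {sym}")
    visiting.add(sym)
    base, addend = equates[sym]
    return symbol_offset(base, equates, label_offsets, visiting) + addend


equates = {"l_b": ("l_a", 1), "l_c": ("l_b", 0)}
print(symbol_offset("l_c", equates, {"l_a": 0}))  # 1
```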

