
    A deep dive into Clang's source file compilation

    MaskRay, posted 2023-09-25 07:30:02

    Clang is a C/C++ compiler that generates LLVM IR and utilizes LLVM to generate relocatable object files. Using the classic three-stage compiler structure, the stages can be described as follows:

    C/C++ =(front end)=> LLVM IR =(middle end)=> LLVM IR (optimized) =(back end)=> relocatable object file

    If we follow the representation of functions and instructions, a more detailed diagram looks like this:

    C/C++ =(front end)=> LLVM IR =(middle end)=> LLVM IR (optimized) =(instruction selector)=> MachineInstr =(AsmPrinter)=> MCInst =(assembler)=> relocatable object file

    LLVM and Clang are designed as a collection of libraries. This post describes how different libraries work together to create the final relocatable object file. I will focus on how a function goes through the multiple compilation stages.


    Compiler frontend

    The compiler frontend primarily comprises the following libraries:

    • clangDriver
    • clangFrontend
    • clangParse and clangSema
    • clangCodeGen

    Let's use a C++ source file as an example.

    % cat a.cc
    template <typename T>
    T div(T a, T b) {
      return a / b;
    }

    __attribute__((noinline))
    int foo(int a, int b, int c) {
      int s = a + b;
      return div(s, c);
    }

    int main() {
      return foo(3, 2, 1);
    }
    % clang++ -g a.cc

    The entry point of the Clang executable is implemented in clang/tools/driver/. clang_main creates a clang::driver::Driver instance, calls BuildCompilation to construct a clang::driver::Compilation instance, and then calls ExecuteCompilation.

    clangDriver

    clangDriver parses the command line arguments, constructs compilation actions, assigns actions to tools, generates commands for these tools, and executes the commands.

    You may read Compiler driver and cross compilation for additional information.

    BuildCompilation
    getToolchain
    HandleImmediateArgs
    BuildInputs
    BuildActions
    handleArguments
    BuildJobs
    BuildJobsForAction
    ToolChain::SelectTool
    Clang::ConstructJob
    Clang::RenderTargetOptions
    renderDebugOptions
    ExecuteCompilation
    ExecuteJobs
    ExecuteJob
    CC1Command::Execute
    cc1_main

    For clang++ -g a.cc, clangDriver identifies the following phases: preprocessor, compiler (C++ to LLVM IR), backend, assembler, and linker. The first several phases can be performed by one single clang::driver::tools::Clang object (also known as Clang cc1), while the final phase requires an external program (the linker).

    % clang++ -g a.cc '-###'
    ...
    "/tmp/Rel/bin/clang-18" "-cc1" "-triple" "x86_64-unknown-linux-gnu" "-emit-obj" ...
    "/usr/bin/ld" "-pie" ... -o a.out ... /tmp/a-f58f75.o ...

    cc1_main in clangDriver calls ExecuteCompilerInvocation defined in clangFrontend.
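The way the phases collapse into the two commands printed by -### can be sketched with a toy model. Everything below (Phase, Job, planJobs) is a hypothetical name invented for illustration, not the clang::driver API; the only idea taken from the text is that all phases except linking run inside a single cc1 job.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of driver phases; these names are illustrative and are
// not part of clang::driver.
enum class Phase { Preprocess, Compile, Backend, Assemble, Link };

struct Job {
  std::string tool;           // "cc1" or "ld" in this sketch
  std::vector<Phase> phases;  // phases performed by this job
};

// Fold every phase that cc1 can perform into one job and give the link
// phase its own job, mirroring the two commands printed by -###.
std::vector<Job> planJobs(const std::vector<Phase> &phases) {
  std::vector<Job> jobs;
  for (Phase p : phases) {
    const std::string tool = (p == Phase::Link) ? "ld" : "cc1";
    // Start a new job only when the tool changes.
    if (jobs.empty() || jobs.back().tool != tool)
      jobs.push_back(Job{tool, {}});
    jobs.back().phases.push_back(p);
  }
  return jobs;
}
```

Feeding the five phases in order to planJobs yields one cc1 job covering four phases followed by one ld job, which is the shape of the -### output above.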

    clangFrontend

    clangFrontend defines CompilerInstance, which manages various classes, including CompilerInvocation, DiagnosticsEngine, TargetInfo, FileManager, SourceManager, Preprocessor, ASTContext, ASTConsumer, and Sema.

    ExecuteCompilerInvocation
    CreateFrontendAction
    ExecuteAction
    FrontendAction::BeginSourceFile
    CompilerInstance::createFileManager
    CompilerInstance::createSourceManager
    CompilerInstance::createPreprocessor
    CompilerInstance::createASTContext
    CreateWrappedASTConsumer
    BackendConsumer::BackendConsumer
    CodeGenerator::CodeGenerator
    CompilerInstance::setASTConsumer
    CodeGeneratorImpl::Initialize
    CodeGenModule::CodeGenModule
    FrontendAction::Execute
    FrontendAction::ExecuteAction => CodeGenAction
    ASTFrontendAction::ExecuteAction
    CompilerInstance::createSema
    ParseAST
    FrontendAction::EndSourceFile

    In ExecuteCompilerInvocation, a FrontendAction is created based on the CompilerInstance argument and then executed. When using the -emit-obj option, the selected FrontendAction is an EmitObjAction, which is a derivative of CodeGenAction.

    During FrontendAction::BeginSourceFile, several classes mentioned earlier are created, and a BackendConsumer is also established. The BackendConsumer serves as a wrapper around CodeGenerator, which is another derivative of ASTConsumer. Finally, in FrontendAction::BeginSourceFile, CompilerInstance::setASTConsumer is called to create a CodeGenModule object, responsible for managing an LLVM IR module.

    In FrontendAction::Execute, CodeGenAction::ExecuteAction is invoked, primarily handling the compilation of LLVM IR files. This function, in turn, calls the base function ASTFrontendAction::ExecuteAction, which, in essence, triggers the entry point of clangParse: ParseAST.

    clangParse and clangSema

    clangParse consumes tokens from clangLex and invokes parser actions, many of which are named Act*, defined in clangSema. clangSema performs semantic analysis and generates AST nodes.

    ParseAST
    ParseFirstTopLevelDecl
    Sema::ActOnStartOfTranslationUnit
    ParseTopLevelDecl
    ParseDeclarationOrFunctionDefinition
    ParseDeclOrFunctionDefInternal
    ParseDeclGroup
    ParseFunctionDefinition
    ParseFunctionStatementBody
    ParseCompoundStatementBody
    ParseStatementOrDeclaration
    ParseStatementOrDeclarationAfterAttributes
    Sema::ActOnDeclStmt
    Sema::ActOnCompoundStmt
    Sema::ActOnFinishFunctionBody
    Sema::ConvertDeclToDeclGroup
    BackendConsumer::HandleTopLevelDecl
    BackendConsumer::HandleTranslationUnit

    In the end, we get a full AST (actually a misnomer as the representation is not abstract, not only about syntax, and is not a tree). ParseAST calls virtual functions HandleTopLevelDecl and HandleTranslationUnit.

    clangCodeGen

    BackendConsumer defined in clangCodeGen overrides HandleTopLevelDecl and HandleTranslationUnit to perform LLVM IR and machine code generation.

    BackendConsumer::HandleTopLevelDecl
    CodeGenModule::EmitTopLevelDecl
    CodeGenModule::EmitGlobal
    CodeGenModule::EmitGlobalDefinition
    CodeGenModule::EmitGlobalFunctionDefinition
    CodeGenFunction::CodeGenFunction
    CodeGenFunction::GenerateCode
    CodeGenFunction::StartFunction
    CodeGenFunction::EmitFunctionBody
    BackendConsumer::HandleTranslationUnit
    setupLLVMOptimizationRemarks
    EmitBackendOutput
    EmitAssemblyHelper::EmitAssembly
    EmitAssemblyHelper::RunOptimizationPipeline
    PassBuilder::buildPerModuleDefaultPipeline // There are other build*Pipeline alternatives
    MPM.run(*TheModule, MAM);
    EmitAssemblyHelper::RunCodegenPipeline
    EmitAssemblyHelper::AddEmitPasses
    LLVMTargetMachine::addPassesToEmitFile
    CodeGenPasses.run(*TheModule);

    BackendConsumer::HandleTopLevelDecl generates LLVM IR for each top-level declaration. This means that Clang generates IR one function at a time.

    BackendConsumer::HandleTranslationUnit invokes EmitBackendOutput to create an LLVM IR file, an assembly file, or a relocatable object file. EmitBackendOutput establishes an optimization pipeline and a machine code generation pipeline.
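The division of labor between the two callbacks can be sketched with a small stand-alone model. The Toy* names are hypothetical stand-ins for ASTConsumer, BackendConsumer, and ParseAST; the real classes operate on Clang AST types rather than strings, so this is only a picture of the control flow described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for ASTConsumer.
struct ToyConsumer {
  virtual ~ToyConsumer() = default;
  virtual void handleTopLevelDecl(const std::string &decl) = 0;
  virtual void handleTranslationUnit() = 0;
};

// Hypothetical stand-in for BackendConsumer.
struct ToyBackendConsumer : ToyConsumer {
  std::vector<std::string> irFunctions;  // "IR" emitted so far
  bool backendRan = false;

  // Called once per top-level declaration: emit IR for it immediately.
  void handleTopLevelDecl(const std::string &decl) override {
    irFunctions.push_back("define @" + decl);
  }

  // Called once at the end: the whole module is now available, so the
  // optimization and code generation pipelines can run.
  void handleTranslationUnit() override { backendRan = true; }
};

// Plays the role of ParseAST: feed declarations one at a time, then signal
// the end of the translation unit.
void toyParseAST(const std::vector<std::string> &decls, ToyConsumer &consumer) {
  for (const std::string &d : decls)
    consumer.handleTopLevelDecl(d);
  consumer.handleTranslationUnit();
}
```

The point of the split is that IR generation is incremental while the backend pipelines need the complete module, so they are deferred to the final callback.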

    Now let's explore CodeGenFunction::EmitFunctionBody. Generating IR for a variable declaration and a return statement involves the following functions, among others:

    EmitFunctionBody
    EmitCompoundStmtWithoutScope
    EmitStmt
    EmitSimpleStmt
    EmitDeclStmt
    EmitDecl
    EmitVarDecl
    EmitStopPoint
    EmitReturnStmt
    EmitScalarExpr
    ScalarExprEmitter::EmitBinOps

    After generating the LLVM IR, clangCodeGen proceeds to execute EmitAssemblyHelper::RunOptimizationPipeline to perform middle-end optimizations and subsequently EmitAssemblyHelper::RunCodegenPipeline to generate machine code.

    For our integer division example, the unoptimized LLVM IR looks like this (attributes are omitted):

    $_Z3divIiET_S0_S0_ = comdat any

    ; Function Attrs: mustprogress noinline uwtable
    define dso_local noundef i32 @_Z3fooiii(i32 noundef %0, i32 noundef %1, i32 noundef %2) #0 {
      %4 = alloca i32, align 4
      %5 = alloca i32, align 4
      %6 = alloca i32, align 4
      %7 = alloca i32, align 4
      store i32 %0, ptr %4, align 4, !tbaa !5
      store i32 %1, ptr %5, align 4, !tbaa !5
      store i32 %2, ptr %6, align 4, !tbaa !5
      call void @llvm.lifetime.start.p0(i64 4, ptr %7) #4
      %8 = load i32, ptr %4, align 4, !tbaa !5
      %9 = load i32, ptr %5, align 4, !tbaa !5
      %10 = add nsw i32 %8, %9
      store i32 %10, ptr %7, align 4, !tbaa !5
      %11 = load i32, ptr %7, align 4, !tbaa !5
      %12 = load i32, ptr %6, align 4, !tbaa !5
      %13 = call noundef i32 @_Z3divIiET_S0_S0_(i32 noundef %11, i32 noundef %12)
      call void @llvm.lifetime.end.p0(i64 4, ptr %7) #4
      ret i32 %13
    }

    ; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
    declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #1

    ; Function Attrs: mustprogress nounwind uwtable
    define linkonce_odr dso_local noundef i32 @_Z3divIiET_S0_S0_(i32 noundef %0, i32 noundef %1) #2 comdat {
      %3 = alloca i32, align 4
      %4 = alloca i32, align 4
      store i32 %0, ptr %3, align 4, !tbaa !5
      store i32 %1, ptr %4, align 4, !tbaa !5
      %5 = load i32, ptr %3, align 4, !tbaa !5
      %6 = load i32, ptr %4, align 4, !tbaa !5
      %7 = sdiv i32 %5, %6
      ret i32 %7
    }

    ; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
    declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #1

    ; Function Attrs: mustprogress norecurse uwtable
    define dso_local noundef i32 @main() #3 {
      %1 = alloca i32, align 4
      store i32 0, ptr %1, align 4
      %2 = call noundef i32 @_Z3fooiii(i32 noundef 3, i32 noundef 2, i32 noundef 1)
      ret i32 %2
    }

    Compiler middle end

    EmitAssemblyHelper::RunOptimizationPipeline creates a pass manager to schedule the middle-end optimization pipeline. This pass manager executes numerous optimization passes and analyses.

    The option -mllvm -print-pipeline-passes provides insight into these passes:

    % clang -c -O1 -mllvm -print-pipeline-passes a.c
    annotation2metadata,forceattrs,declare-to-assign,inferattrs,coro-early,...

    For our integer division example, the optimized LLVM IR looks like this:

    ; Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) uwtable
    define dso_local noundef i32 @_Z3fooiii(i32 noundef %a, i32 noundef %b, i32 noundef %c) local_unnamed_addr #0 {
    entry:
      %add = add nsw i32 %b, %a
      %div.i = sdiv i32 %add, %c
      ret i32 %div.i
    }

    ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable
    define dso_local noundef i32 @main() local_unnamed_addr #1 {
    entry:
      %call = call noundef i32 @_Z3fooiii(i32 noundef 3, i32 noundef 2, i32 noundef 1)
      ret i32 %call
    }

    The most noticeable differences are the following:

    • SROAPass runs mem2reg and optimizes out many AllocaInsts
    • InlinerPass inlines the instantiated div function into its caller foo
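Read back at the source level, the two differences listed above amount to roughly the following. This is a hand-written analogue, not compiler output: fooBefore, fooAfter, and divT are names made up for this sketch (divT stands in for the div template of a.cc, renamed so it cannot clash with the C library's div), and the comments map statements to IR instructions only loosely.

```cpp
#include <cassert>

template <typename T>
T divT(T a, T b) { return a / b; }

// Analogue of the IR before optimization: every value lives in a stack slot
// (an alloca) and the division goes through a call.
int fooBefore(int a, int b, int c) {
  int slotA = a, slotB = b, slotC = c;  // stores to allocas
  int s = slotA + slotB;                // load, load, add, store
  return divT(s, slotC);                // load, load, call
}

// Analogue of the optimized IR: no stack slots, and the call is replaced by
// the callee's body (add followed by sdiv).
int fooAfter(int a, int b, int c) { return (b + a) / c; }
```

Both functions compute the same result for any inputs where the division is defined; only the amount of memory traffic and the presence of the call differ.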

    Compiler back end

    The demarcation between the middle end and the back end may not be entirely distinct. Within LLVMTargetMachine::addPassesToEmitFile, several IR passes are scheduled. It's reasonable to consider these IR passes (everything before addCoreISelPasses) as part of the middle end, while the phase beginning with instruction selection can be regarded as the actual back end.

    Here is an overview of LLVMTargetMachine::addPassesToEmitFile:

    LLVMTargetMachine::addPassesToEmitFile
    addPassesToGenerateCode
    TargetPassConfig::addISelPasses
    TargetPassConfig::addIRPasses => X86PassConfig::addIRPasses
    TargetPassConfig::addCodeGenPrepare # -O1 or above
    TargetPassConfig::addPassesToHandleExceptions
    TargetPassConfig::addISelPrepare
    TargetPassConfig::addPreISel => X86PassConfig::addPreISel
    addPass(createCallBrPass());
    addPass(createPrintFunctionPass(...)); # if -print-isel-input
    addPass(createVerifierPass());
    TargetPassConfig::addCoreISelPasses # SelectionDAG or GlobalISel
    TargetPassConfig::addMachinePasses
    LLVMTargetMachine::addAsmPrinter
    PM.add(createPrintMIRPass(Out)); // if -stop-before or -stop-after
    PM.add(createFreeMachineFunctionPass());

    These IR and machine passes are scheduled by the legacy pass manager. The option -mllvm -debug-pass=Structure provides insight into these passes:

    clang -c -O1 a.c -mllvm -debug-pass=Structure

    Instruction selector

    There are three instruction selectors: SelectionDAG, FastISel, and GlobalISel. FastISel is integrated within the SelectionDAG framework.

    For most targets, FastISel is the default for clang -O0 while SelectionDAG is the default for optimized builds. However, for most AArch64 -O0 configurations, GlobalISel is the default.

    SelectionDAG

    See https://llvm.org/docs/WritingAnLLVMBackend.html#instruction-selector.

    SelectionDAG: normal code path
    LLVM IR =(visit)=> SDNode =(DAGCombiner,LegalizeTypes,DAGCombiner,Legalize,DAGCombiner,Select,Schedule)=> MachineInstr

    SelectionDAG: FastISel (fast but not optimal)
    LLVM IR =(FastISel)=> MachineInstr

    TargetPassConfig::addCoreISelPasses
    addInstSelector(); // add an instance of a target-specific derived class of SelectionDAGISel
    addPass(&FinalizeISelID);

    SelectionDAGISel::runOnMachineFunction
    TargetMachine::resetTargetOptions
    SelectionDAGISel::SelectAllBasicBlocks
    SelectionDAGISel::SelectBasicBlock
    SelectionDAGBuilder::visit
    SelectionDAGISel::CodeGenAndEmitDAG
    CurDAG->Combine(BeforeLegalizeTypes, AA, OptLevel);
    Changed = CurDAG->LegalizeTypes();
    if (Changed)
    CurDAG->Combine(AfterLegalizeTypes, AA, OptLevel);
    Changed = CurDAG->LegalizeVectors();
    if (Changed) {
    CurDAG->LegalizeTypes();
    CurDAG->Combine(AfterLegalizeVectorOps, AA, OptLevel);
    }
    CurDAG->Legalize();
    DoInstructionSelection
    Select
    SelectCode
    PreprocessISelDAG()
    Scheduler->Run(CurDAG, FuncInfo->MBB);
    Scheduler->EmitSchedule(FuncInfo->InsertPt);
    EmitNode
    CreateMachineInstr

    Each backend implements a derived class of SelectionDAGISel. For example, the X86 backend implements X86DAGToDAGISel and overrides runOnMachineFunction to set up variables like X86Subtarget and then invokes the base function SelectionDAGISel::runOnMachineFunction.

    SelectionDAGISel creates a SelectionDAGBuilder. For each basic block, SelectionDAGISel::SelectBasicBlock iterates over all IR instructions and calls SelectionDAGBuilder::visit on them, creating a new SDNode for each Value that becomes part of the DAG.

    The initial DAG may contain types and operations that are not natively supported by the target. SelectionDAGISel::CodeGenAndEmitDAG invokes LegalizeTypes and Legalize to convert unsupported types and operations to supported ones.
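As an illustration of what converting an unsupported type means, here is a sketch of splitting a 64-bit addition into 32-bit pieces, the kind of rewrite needed on a target whose registers are only 32 bits wide. The helper names (Halves, split, join, expandedAdd64) are hypothetical, and the code works on plain integers rather than on SDNodes; it shows the arithmetic idea, not LLVM's implementation.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit value represented as two legal 32-bit halves.
struct Halves {
  std::uint32_t lo;
  std::uint32_t hi;
};

Halves split(std::uint64_t v) {
  return {static_cast<std::uint32_t>(v), static_cast<std::uint32_t>(v >> 32)};
}

std::uint64_t join(Halves h) {
  return (static_cast<std::uint64_t>(h.hi) << 32) | h.lo;
}

// A 64-bit add expressed with 32-bit operations only.
Halves expandedAdd64(Halves a, Halves b) {
  std::uint32_t lo = a.lo + b.lo;            // wraps modulo 2^32
  std::uint32_t carry = lo < a.lo ? 1u : 0u; // wrapped iff result < an operand
  std::uint32_t hi = a.hi + b.hi + carry;    // high halves plus the carry
  return {lo, hi};
}
```

For example, adding 1 to 0xFFFFFFFF overflows the low half, sets the carry, and produces a high half of 1, which joins back to 0x100000000.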

    For the IR instruction %add = add nsw i32 %b, %a, SelectionDAGBuilder::visit creates a new SDNode with the opcode ISD::ADD.

    SelectionDAGBuilder::visit
    SelectionDAGBuilder::visitAdd
    SelectionDAGBuilder::visitBinary # binary operators are handled similarly
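The visit step can be pictured with a toy builder that keeps a map from IR value names to DAG nodes. ToyNode and ToyDAGBuilder are hypothetical; the "leaf" opcode is a placeholder invented for values defined outside the block, and opcodes are plain strings here rather than ISD enumerators.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct ToyNode {
  std::string opcode;
  std::vector<int> operands;  // indices of operand nodes
};

struct ToyDAGBuilder {
  std::vector<ToyNode> nodes;
  std::map<std::string, int> nodeFor;  // IR value name -> node index

  // Return the node for a value, creating a placeholder leaf the first time
  // the value is seen.
  int getValue(const std::string &name) {
    auto it = nodeFor.find(name);
    if (it != nodeFor.end())
      return it->second;
    nodes.push_back(ToyNode{"leaf", {}});
    const int id = static_cast<int>(nodes.size()) - 1;
    nodeFor[name] = id;
    return id;
  }

  // All binary operators share this path; only the opcode differs.
  int visitBinary(const std::string &result, const std::string &opcode,
                  const std::string &lhs, const std::string &rhs) {
    const int l = getValue(lhs);
    const int r = getValue(rhs);
    nodes.push_back(ToyNode{opcode, {l, r}});
    const int id = static_cast<int>(nodes.size()) - 1;
    nodeFor[result] = id;
    return id;
  }

  int visitAdd(const std::string &result, const std::string &lhs,
               const std::string &rhs) {
    return visitBinary(result, "ISD::ADD", lhs, rhs);
  }
};
```

Visiting %add = add nsw i32 %b, %a creates two leaves and one ISD::ADD node; a later instruction that uses %add finds the existing node through the map instead of creating a new one.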

    For llvm.memset, the call stack may resemble the following:

    SelectionDAGBuilder::visit
    SelectionDAGBuilder::visitCall
    SelectionDAGBuilder::visitIntrinsicCall
    SelectionDAG::getMemset

    ScheduleDAGSDNodes::EmitSchedule emits the machine code (MachineInstrs) in the scheduled order.

    FastISel, typically used for clang -O0, represents a fast path of SelectionDAG that generates less optimized machine code.

    When FastISel is enabled, SelectAllBasicBlocks tries to skip SelectBasicBlock and select instructions with FastISel. However, FastISel only handles a subset of IR instructions. For unhandled instructions, SelectAllBasicBlocks falls back to SelectBasicBlock to handle the remaining instructions in the basic block.
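The fallback behavior can be sketched as follows. All names are hypothetical, the set of opcodes the fast path accepts is made up, and walking the block from its last instruction backwards is an assumption of this sketch rather than something stated above. The part taken from the text is that once an instruction is not handled, the rest of the block goes through the general path.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Records which selector handled each instruction.
struct Selection {
  std::string inst;
  std::string selector;  // "fast" or "dag"
};

// Stand-in for FastISel: pretend it only understands a few opcodes.
bool toyFastSelect(const std::string &inst) {
  return inst == "add" || inst == "load" || inst == "store" || inst == "ret";
}

std::vector<Selection> toySelectBlock(const std::vector<std::string> &block) {
  std::vector<Selection> out(block.size());
  std::size_t end = block.size();
  // Fast path: consume instructions from the end of the block while the
  // fast selector accepts them.
  while (end > 0 && toyFastSelect(block[end - 1])) {
    out[end - 1] = Selection{block[end - 1], "fast"};
    --end;
  }
  // Fallback: everything up to and including the first unhandled
  // instruction goes through the general DAG-based path.
  for (std::size_t i = 0; i < end; ++i)
    out[i] = Selection{block[i], "dag"};
  return out;
}
```

Note that an instruction the fast path could handle (such as a load at the start of the block) still ends up on the general path if it falls in the remaining range.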

    GlobalISel

    GlobalISel is a new instruction selection framework that operates on the entire function, in contrast to the basic block view of SelectionDAG. GlobalISel offers improved performance and modularity.

    The design of the generic MachineInstr replaces an intermediate representation, SDNode, which was used in the SelectionDAG framework.

    LLVM IR =(IRTranslator)=> generic MachineInstr =(Legalizer,RegBankSelect,GlobalInstructionSelect)=> MachineInstr

    TargetPassConfig::addCoreISelPasses
    addIRTranslator();
    addPreLegalizeMachineIR();
    addPreRegBankSelect();
    addRegBankSelect();
    addPreGlobalInstructionSelect();
    addGlobalInstructionSelect();
    // Pass to reset the MachineFunction if the ISel failed.
    addInstSelector();
    addPass(&FinalizeISelID);

    Machine passes

    TargetPassConfig::addMachinePasses
    TargetPassConfig::addSSAOptimization
    TargetPassConfig::addPreRegAlloc
    TargetPassConfig::addOptimizedRegAlloc
    TargetPassConfig::addPostRegAlloc
    addPass(createPrologEpilogInserterPass());
    TargetPassConfig::addMachineLateOptimization
    TargetPassConfig::addPreSched2
    TargetPassConfig::addPreEmitPass
    // basic block section related passes
    TargetPassConfig::addPreEmitPass2

    AsmPrinter

    This target-specific AsmPrinter pass converts MachineInstrs to MCInsts and emits them to an MCStreamer.

    MC

    Clang has the capability to output either assembly code or an object file. Generating an object file directly without involving an assembler is referred to as "direct object emission".

    To provide a unified interface, MCStreamer is created to handle the emission of both assembly code and object files. The two primary subclasses of MCStreamer are MCAsmStreamer and MCObjectStreamer, responsible for emitting assembly code and machine code respectively.
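The benefit of the unified interface is easiest to see in miniature. The Toy* classes below are hypothetical and far smaller than the real MCStreamer hierarchy: they take mnemonics as strings instead of MCInst objects, and the object variant knows only two single-byte x86 encodings (0x90 for nop, 0xC3 for ret).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical miniature of the streamer interface.
struct ToyStreamer {
  virtual ~ToyStreamer() = default;
  virtual void emitInstruction(const std::string &mnemonic) = 0;
};

// Text output, in the role of MCAsmStreamer.
struct ToyAsmStreamer : ToyStreamer {
  std::string text;
  void emitInstruction(const std::string &mnemonic) override {
    text += "\t" + mnemonic + "\n";
  }
};

// Byte output, in the role of MCObjectStreamer.
struct ToyObjectStreamer : ToyStreamer {
  std::vector<std::uint8_t> bytes;
  void emitInstruction(const std::string &mnemonic) override {
    if (mnemonic == "nop")
      bytes.push_back(0x90);
    else if (mnemonic == "ret")
      bytes.push_back(0xC3);
    // Any other mnemonic is ignored in this sketch.
  }
};

// The producer is written once against the interface and does not care
// which kind of output it is generating.
void toyEmitFunction(ToyStreamer &out) {
  out.emitInstruction("nop");
  out.emitInstruction("ret");
}
```

Passing a ToyAsmStreamer to toyEmitFunction accumulates assembly text, while passing a ToyObjectStreamer accumulates the encoded bytes, with no change to the producer.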

    LLVMAsmPrinter calls the MCStreamer API to emit assembly code or machine code.

    In the case of an assembly input file, LLVM creates an MCAsmParser object (LLVMMCParser) and a target-specific MCTargetAsmParser object. The MCAsmParser is responsible for tokenizing the input, parsing assembler directives, and invoking the MCTargetAsmParser to parse an instruction. Both the MCAsmParser and MCTargetAsmParser objects can call the MCStreamer API to emit assembly code or machine code.

    For an instruction parsed by the MCTargetAsmParser, if the streamer is an MCAsmStreamer, the MCInst will be pretty-printed. If the streamer is an MCELFStreamer (other object file formats are similar), MCELFStreamer::emitInstToData will use ${Target}MCCodeEmitter from LLVM${Target}Desc to encode the MCInst, emit its byte sequence, and record needed relocations. An ELFObjectWriter object is used to write the relocatable object file.

    You may read my post Assemblers for more information about the LLVM integrated assembler.


