Clang is a C/C++ compiler that generates LLVM IR and utilizes LLVM to generate relocatable object files. Using the classic three-stage compiler structure, the stages can be described as follows:
C/C++ =(front end)=> LLVM IR =(middle end)=> LLVM IR (optimized) =(back end)=> relocatable object file
If we follow the representation of functions and instructions, a more detailed diagram looks like this:
C/C++ =(front end)=> LLVM IR =(middle end)=> LLVM IR (optimized) =(instruction selector)=> MachineInstr =(AsmPrinter)=> MCInst =(assembler)=> relocatable object file
LLVM and Clang are designed as a collection of libraries. This post describes how different libraries work together to create the final relocatable object file. I will focus on how a function goes through the multiple compilation stages.
The compiler frontend primarily comprises the following libraries:
Let's use a C++ source file as an example.
% cat a.cc
The entry point of the Clang executable is implemented in clang/tools/driver/. clang_main creates a clang::driver::Driver instance, calls BuildCompilation to construct a clang::driver::Compilation instance, and then calls ExecuteCompilation.
clangDriver parses the command line arguments, constructs compilation actions, assigns actions to tools, generates commands for these tools, and executes the commands.
You may read Compiler driver and cross compilation for additional information.
BuildCompilation
For clang++ -g a.cc, clangDriver identifies the following phases: preprocessor, compiler (C++ to LLVM IR), backend, assembler, and linker. The first several phases can be performed by a single clang::driver::tools::Clang object (also known as Clang cc1), while the final phase requires an external program (the linker).
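To make the phase-to-command mapping concrete, here is a toy model. The names Phase, toolFor, Job and buildJobs are hypothetical (clangDriver itself uses Action, Tool and Command objects), and "ld" is only a placeholder linker name. Consecutive phases that the same tool can perform are merged into one job.

```cpp
#include <string>
#include <vector>

// Hypothetical phase list; clangDriver's real types live in clang::driver.
enum class Phase { Preprocess, Compile, Backend, Assemble, Link };

// Toy assignment of phases to tools: everything except linking is handled
// by the integrated "clang -cc1"; linking needs an external program.
std::string toolFor(Phase p) {
  return p == Phase::Link ? "ld" : "clang -cc1";
}

struct Job {
  std::string tool;
  std::vector<Phase> phases;
};

// Merge consecutive phases that map to the same tool into a single job.
std::vector<Job> buildJobs(const std::vector<Phase>& phases) {
  std::vector<Job> jobs;
  for (Phase p : phases) {
    std::string tool = toolFor(p);
    if (jobs.empty() || jobs.back().tool != tool)
      jobs.push_back({tool, {}});
    jobs.back().phases.push_back(p);
  }
  return jobs;
}
```

With the five phases of clang++ -g a.cc, this yields two jobs, matching the cc1 command and the linker command printed by '-###'.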
% clang++ -g a.cc '-###'
cc1_main in clangDriver calls ExecuteCompilerInvocation defined in clangFrontend.
clangFrontend defines CompilerInstance, which manages various classes, including CompilerInvocation, DiagnosticsEngine, TargetInfo, FileManager, SourceManager, Preprocessor, ASTContext, ASTConsumer, and Sema.
ExecuteCompilerInvocation
In ExecuteCompilerInvocation, a FrontendAction is created based on the CompilerInstance argument and then executed. When using the -emit-obj option, the selected FrontendAction is an EmitObjAction, which is a derivative of CodeGenAction.
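The option-driven choice of action can be pictured as a small class hierarchy plus a factory. The sketch keeps the class names mentioned above, plus EmitAssemblyAction, EmitLLVMAction and SyntaxOnlyAction, which also exist in Clang. The factory createAction and its string dispatch are hypothetical simplifications: Clang first maps options to an internal action-kind enum, and the real classes have many more hooks.

```cpp
#include <memory>
#include <string>

// Simplified stand-ins for clang::FrontendAction and friends.
struct FrontendAction {
  virtual ~FrontendAction() = default;
  virtual std::string name() const = 0;
  virtual bool generatesCode() const { return false; }
};

struct SyntaxOnlyAction : FrontendAction {
  std::string name() const override { return "SyntaxOnlyAction"; }
};

// Common base for the actions that run clangCodeGen.
struct CodeGenAction : FrontendAction {
  bool generatesCode() const override { return true; }
};

struct EmitObjAction : CodeGenAction {
  std::string name() const override { return "EmitObjAction"; }
};
struct EmitAssemblyAction : CodeGenAction {
  std::string name() const override { return "EmitAssemblyAction"; }
};
struct EmitLLVMAction : CodeGenAction {
  std::string name() const override { return "EmitLLVMAction"; }
};

// Hypothetical dispatch from a cc1 option to an action; nullptr if unknown.
std::unique_ptr<FrontendAction> createAction(const std::string& opt) {
  if (opt == "-emit-obj") return std::make_unique<EmitObjAction>();
  if (opt == "-S") return std::make_unique<EmitAssemblyAction>();
  if (opt == "-emit-llvm") return std::make_unique<EmitLLVMAction>();
  if (opt == "-fsyntax-only") return std::make_unique<SyntaxOnlyAction>();
  return nullptr;
}
```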
During FrontendAction::BeginSourceFile, several classes mentioned earlier are created, and a BackendConsumer is also established. The BackendConsumer serves as a wrapper around CodeGenerator, which is another derivative of ASTConsumer. Finally, in FrontendAction::BeginSourceFile, CompilerInstance::setASTConsumer is called to create a CodeGenModule object, which is responsible for managing an LLVM IR module.
In FrontendAction::Execute, CodeGenAction::ExecuteAction is invoked, primarily handling the compilation of LLVM IR files. This function, in turn, calls the base function ASTFrontendAction::ExecuteAction, which, in essence, triggers the entry point of clangParse: ParseAST.
clangParse consumes tokens from clangLex and invokes parser actions, many of which are named Act*, defined in clangSema. clangSema performs semantic analysis and generates AST nodes.
ParseAST
In the end, we get a full AST (actually a misnomer, as the representation is not abstract, not only about syntax, and not a tree). ParseAST calls the virtual functions HandleTopLevelDecl and HandleTranslationUnit.
BackendConsumer, defined in clangCodeGen, overrides HandleTopLevelDecl and HandleTranslationUnit to perform LLVM IR and machine code generation.
BackendConsumer::HandleTopLevelDecl
BackendConsumer::HandleTopLevelDecl generates LLVM IR for each top-level declaration. This means that Clang generates one function at a time.
BackendConsumer::HandleTranslationUnit invokes EmitBackendOutput to create an LLVM IR file, an assembly file, or a relocatable object file. EmitBackendOutput establishes an optimization pipeline and a machine code generation pipeline.
Now let's explore CodeGenFunction::EmitFunctionBody. Generating IR for a variable declaration and a return statement involves the following functions, among others:
EmitFunctionBody
EmitCompoundStmtWithoutScope
EmitStmt
EmitSimpleStmt
EmitDeclStmt
EmitDecl
EmitVarDecl
EmitStopPoint
EmitReturnStmt
EmitScalarExpr
ScalarExprEmitter::EmitBinOps
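These functions form a recursive descent over the AST: statement emitters dispatch on the statement kind and call expression emitters, which recurse into operands and return the SSA value holding the result. The sketch below mimics that shape using a hypothetical mini-AST (Expr, Stmt) and textual IR, for a body such as int s = a + b; return s / c;. It assumes parameters already live in stack slots named after them, and it is not how clangCodeGen represents anything internally.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical mini-AST; these are not Clang classes.
struct Expr {
  std::string var;  // non-empty for a variable reference
  char op = 0;      // '+' or '/' for a binary operator
  std::shared_ptr<Expr> lhs, rhs;
};
struct Stmt {
  enum Kind { Decl, Return } kind;
  std::string name;  // declared variable (Decl only)
  std::shared_ptr<Expr> value;
};

class ToyCodeGenFunction {
  int next = 0;
  std::string fresh() { return "%t" + std::to_string(next++); }

 public:
  std::vector<std::string> out;

  // Counterpart of EmitScalarExpr: returns the SSA name holding the value.
  std::string emitScalarExpr(const Expr& e) {
    if (!e.var.empty()) {  // variable reference: load from its stack slot
      std::string t = fresh();
      out.push_back(t + " = load i32, ptr %" + e.var);
      return t;
    }
    std::string l = emitScalarExpr(*e.lhs), r = emitScalarExpr(*e.rhs);
    std::string t = fresh();
    out.push_back(t + (e.op == '+' ? " = add nsw i32 " : " = sdiv i32 ") + l + ", " + r);
    return t;
  }
  // Counterpart of EmitStmt dispatching to EmitDeclStmt / EmitReturnStmt.
  void emitStmt(const Stmt& s) {
    if (s.kind == Stmt::Decl) {
      out.push_back("%" + s.name + " = alloca i32");
      out.push_back("store i32 " + emitScalarExpr(*s.value) + ", ptr %" + s.name);
    } else {
      out.push_back("ret i32 " + emitScalarExpr(*s.value));
    }
  }
  // Counterpart of EmitFunctionBody for a flat compound statement.
  void emitFunctionBody(const std::vector<Stmt>& body) {
    for (const auto& s : body) emitStmt(s);
  }
};
```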
After generating the LLVM IR, clangCodeGen proceeds to execute EmitAssemblyHelper::RunOptimizationPipeline to perform middle-end optimizations and subsequently EmitAssemblyHelper::RunCodegenPipeline to generate machine code.
For our integer division example, the unoptimized LLVM IR looks like this (attributes are omitted):
$_Z3divIiET_S0_S0_ = comdat any
; Function Attrs: mustprogress noinline uwtable
define dso_local noundef i32 @_Z3fooiii(i32 noundef %0, i32 noundef %1, i32 noundef %2) #0 {
%4 = alloca i32, align 4
%5 = alloca i32, align 4
%6 = alloca i32, align 4
%7 = alloca i32, align 4
store i32 %0, ptr %4, align 4, !tbaa !5
store i32 %1, ptr %5, align 4, !tbaa !5
store i32 %2, ptr %6, align 4, !tbaa !5
call void @llvm.lifetime.start.p0(i64 4, ptr %7) #4
%8 = load i32, ptr %4, align 4, !tbaa !5
%9 = load i32, ptr %5, align 4, !tbaa !5
%10 = add nsw i32 %8, %9
store i32 %10, ptr %7, align 4, !tbaa !5
%11 = load i32, ptr %7, align 4, !tbaa !5
%12 = load i32, ptr %6, align 4, !tbaa !5
%13 = call noundef i32 @_Z3divIiET_S0_S0_(i32 noundef %11, i32 noundef %12)
call void @llvm.lifetime.end.p0(i64 4, ptr %7) #4
ret i32 %13
}
; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #1
; Function Attrs: mustprogress nounwind uwtable
define linkonce_odr dso_local noundef i32 @_Z3divIiET_S0_S0_(i32 noundef %0, i32 noundef %1) #2 comdat {
%3 = alloca i32, align 4
%4 = alloca i32, align 4
store i32 %0, ptr %3, align 4, !tbaa !5
store i32 %1, ptr %4, align 4, !tbaa !5
%5 = load i32, ptr %3, align 4, !tbaa !5
%6 = load i32, ptr %4, align 4, !tbaa !5
%7 = sdiv i32 %5, %6
ret i32 %7
}
; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #1
; Function Attrs: mustprogress norecurse uwtable
define dso_local noundef i32 @main() #3 {
%1 = alloca i32, align 4
store i32 0, ptr %1, align 4
%2 = call noundef i32 @_Z3fooiii(i32 noundef 3, i32 noundef 2, i32 noundef 1)
ret i32 %2
}
EmitAssemblyHelper::RunOptimizationPipeline creates a pass manager to schedule the middle-end optimization pipeline. This pass manager executes numerous optimization passes and analyses.
The option -mllvm -print-pipeline-passes provides insight into these passes:
% clang -c -O1 -mllvm -print-pipeline-passes a.c
annotation2metadata,forceattrs,declare-to-assign,inferattrs,coro-early,...
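Conceptually, the pass manager holds an ordered list of passes and runs each one over the module in turn, and -print-pipeline-passes prints that list. A toy version follows; ToyPassManager, Pass and the string-vector Module are hypothetical, and LLVM's new pass manager additionally nests function and loop pass managers and caches analysis results.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy "module": just a list of instruction strings.
using Module = std::vector<std::string>;

// A named transformation, loosely modeling a module pass.
struct Pass {
  std::string name;
  std::function<void(Module&)> run;
};

struct ToyPassManager {
  std::vector<Pass> passes;
  void addPass(Pass p) { passes.push_back(std::move(p)); }
  // Similar in spirit to -print-pipeline-passes: comma-separated names.
  std::string pipelineText() const {
    std::string s;
    for (const auto& p : passes) s += (s.empty() ? "" : ",") + p.name;
    return s;
  }
  // Run every pass, in order, over the module.
  void run(Module& m) const {
    for (const auto& p : passes) p.run(m);
  }
};
```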
For our integer division example, the optimized LLVM IR looks like this:
; Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) uwtable
define dso_local noundef i32 @_Z3fooiii(i32 noundef %a, i32 noundef %b, i32 noundef %c) local_unnamed_addr #0 {
entry:
%add = add nsw i32 %b, %a
%div.i = sdiv i32 %add, %c
ret i32 %div.i
}
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable
define dso_local noundef i32 @main() local_unnamed_addr #1 {
entry:
%call = call noundef i32 @_Z3fooiii(i32 noundef 3, i32 noundef 2, i32 noundef 1)
ret i32 %call
}
The most noticeable differences are the following:
- SROAPass runs mem2reg and optimizes out many AllocaInsts.
- InlinerPass inlines the instantiated div function into its caller foo.
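The mem2reg part of SROA can be sketched for straight-line code: a load from a promoted stack slot is replaced by whatever value was last stored to that slot, after which the alloca, stores and loads are dead. The sketch uses a hypothetical single-block IR (Inst, promoteAllocas). Real promotion also inserts phi nodes where control flow merges, and SROA additionally splits aggregates; neither is modeled here.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical single-basic-block IR; not LLVM's Instruction classes.
struct Inst {
  enum Kind { Alloca, Store, Load, Op } kind;
  std::string dest;                   // defined value (Alloca/Load/Op)
  std::string ptr;                    // stack slot (Store/Load)
  std::vector<std::string> operands;  // Store: {value}; Op: its arguments
  std::string opcode;                 // e.g. "add", "ret" (Op only)
};

// Promote every alloca to SSA values in straight-line code.
std::vector<Inst> promoteAllocas(const std::vector<Inst>& in) {
  std::map<std::string, std::string> slotValue;  // slot -> current value
  std::map<std::string, std::string> replace;    // loaded name -> value
  auto resolve = [&](const std::string& v) {
    auto it = replace.find(v);
    return it == replace.end() ? v : it->second;
  };
  std::vector<Inst> out;
  for (const Inst& i : in) {
    switch (i.kind) {
      case Inst::Alloca: break;  // the slot disappears
      case Inst::Store: slotValue[i.ptr] = resolve(i.operands[0]); break;
      case Inst::Load: replace[i.dest] = slotValue[i.ptr]; break;
      case Inst::Op: {           // keep the instruction, rewrite its uses
        Inst copy = i;
        for (auto& o : copy.operands) o = resolve(o);
        out.push_back(copy);
        break;
      }
    }
  }
  return out;
}
```

Fed the first part of the unoptimized foo (slots %4 and %5 holding %0 and %1), the add ends up using %0 and %1 directly and the memory operations vanish.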
The demarcation between the middle end and the back end may not be entirely distinct. Within LLVMTargetMachine::addPassesToEmitFile, several IR passes are scheduled. It's reasonable to consider these IR passes (everything before addCoreISelPasses) as part of the middle end, while the phase beginning with instruction selection can be regarded as the actual back end.
Here is an overview of LLVMTargetMachine::addPassesToEmitFile:
LLVMTargetMachine::addPassesToEmitFile
These IR and machine passes are scheduled by the legacy pass manager. The option -mllvm -debug-pass=Structure provides insight into these passes:
clang -c -O1 a.c -mllvm -debug-pass=Structure
There are three instruction selectors: SelectionDAG, FastISel, and GlobalISel. FastISel is integrated within the SelectionDAG framework.
For most targets, FastISel is the default for clang -O0 while SelectionDAG is the default for optimized builds. However, for most AArch64 -O0 configurations, GlobalISel is the default.
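The defaults above can be summarized as a small decision function. This is only an approximation of the "most targets" rule: defaultSelector is hypothetical, and the real choice also depends on the subtarget and on command-line options.

```cpp
#include <string>

enum class ISel { FastISel, SelectionDAG, GlobalISel };

// Rough approximation of the defaults described above; exceptions exist.
ISel defaultSelector(const std::string& arch, int optLevel) {
  if (optLevel == 0)
    return arch == "aarch64" ? ISel::GlobalISel : ISel::FastISel;
  return ISel::SelectionDAG;  // optimized builds
}
```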
See https://llvm.org/docs/WritingAnLLVMBackend.html#instruction-selector.
SelectionDAG: normal code path
LLVM IR =(visit)=> SDNode =(DAGCombiner,LegalizeTypes,DAGCombiner,Legalize,DAGCombiner,Select,Schedule)=> MachineInstr
SelectionDAG: FastISel (fast but not optimal)
LLVM IR =(FastISel)=> MachineInstr
TargetPassConfig::addCoreISelPasses
Each backend implements a derived class of SelectionDAGISel. For example, the X86 backend implements X86DAGToDAGISel and overrides runOnMachineFunction to set up variables like X86Subtarget and then invokes the base function SelectionDAGISel::runOnMachineFunction.
SelectionDAGISel creates a SelectionDAGBuilder. For each basic block, SelectionDAGISel::SelectBasicBlock iterates over all IR instructions and calls SelectionDAGBuilder::visit on them, creating a new SDNode for each Value that becomes part of the DAG.
The initial DAG may contain types and operations that are not natively supported by the target. SelectionDAGISel::CodeGenAndEmitDAG invokes LegalizeTypes and Legalize to convert unsupported types and operations to supported ones.
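As an example of what type legalization does, suppose a target only has 32-bit integer registers. An i64 add is then illegal and gets expanded into two 32-bit adds, with the carry out of the low half fed into the high half. The sketch below performs the same expansion on plain integers (expandAdd64 is a hypothetical name); in LLVM the transformation is applied to SDNodes using carry-producing and carry-consuming add nodes.

```cpp
#include <cstdint>

// Expand a 64-bit add into 32-bit operations, as type legalization splits
// an illegal i64 ADD into low/high halves on a 32-bit target.
std::uint64_t expandAdd64(std::uint64_t a, std::uint64_t b) {
  std::uint32_t aLo = static_cast<std::uint32_t>(a);
  std::uint32_t aHi = static_cast<std::uint32_t>(a >> 32);
  std::uint32_t bLo = static_cast<std::uint32_t>(b);
  std::uint32_t bHi = static_cast<std::uint32_t>(b >> 32);
  std::uint32_t lo = aLo + bLo;            // wraps modulo 2^32
  std::uint32_t carry = lo < aLo ? 1 : 0;  // unsigned overflow of the low half
  std::uint32_t hi = aHi + bHi + carry;    // high half absorbs the carry
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```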
For the IR instruction %add = add nsw i32 %b, %a, SelectionDAGBuilder::visit creates a new SDNode with the opcode ISD::ADD.
SelectionDAGBuilder::visit
SelectionDAGBuilder::visitAdd
SelectionDAGBuilder::visitBinary # binary operators are handled similarly
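A toy version of the builder: each IR value maps to a node id, visit dispatches on the IR opcode to pick an ISD-style opcode, and, like SelectionDAG::getNode, identical (opcode, operands) requests return the existing node. ToyDAGBuilder and Node are hypothetical, and modeling unseen values (function arguments) as CopyFromReg nodes is a simplification.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy DAG node: an opcode plus operand node ids. Not LLVM's SDNode.
struct Node {
  std::string opcode;
  std::vector<int> ops;
};

struct ToyDAGBuilder {
  std::vector<Node> nodes;
  std::map<std::string, int> valueMap;  // IR value name -> node id
  std::map<std::pair<std::string, std::vector<int>>, int> cse;

  // Like getNode: identical (opcode, operands) pairs share one node.
  int getNode(const std::string& opc, std::vector<int> ops) {
    auto key = std::make_pair(opc, ops);
    auto it = cse.find(key);
    if (it != cse.end()) return it->second;
    nodes.push_back({opc, std::move(ops)});
    return cse[key] = static_cast<int>(nodes.size()) - 1;
  }
  // Unseen values are treated as function arguments read from registers.
  int getValue(const std::string& v) {
    auto it = valueMap.find(v);
    if (it == valueMap.end()) return valueMap[v] = getNode("CopyFromReg:" + v, {});
    return it->second;
  }
  // Dispatch on the IR opcode, as visitAdd/visitBinary do.
  void visit(const std::string& dest, const std::string& irOp,
             const std::string& lhs, const std::string& rhs) {
    static const std::map<std::string, std::string> isd = {
        {"add", "ISD::ADD"}, {"sub", "ISD::SUB"}, {"sdiv", "ISD::SDIV"}};
    auto it = isd.find(irOp);
    if (it == isd.end()) throw std::invalid_argument("unhandled: " + irOp);
    valueMap[dest] = getNode(it->second, {getValue(lhs), getValue(rhs)});
  }
};
```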
For llvm.memset, the call stack may resemble the following:
SelectionDAGBuilder::visit
SelectionDAGBuilder::visitCall
SelectionDAGBuilder::visitIntrinsicCall
SelectionDAG::getMemset
ScheduleDAGSDNodes::EmitSchedule emits the machine code (MachineInstrs) in the scheduled order.
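Any valid schedule must respect data dependencies, so it is a topological ordering of the DAG. A minimal list scheduler in that spirit (listSchedule is hypothetical) keeps a ready queue of nodes whose operands are already scheduled. It breaks ties by node id, whereas LLVM's schedulers use heuristics such as latency and register pressure.

```cpp
#include <functional>
#include <queue>
#include <stdexcept>
#include <vector>

// deps[i] lists the nodes that node i uses (its operands). Returns an order
// in which every node comes after all of its operands.
std::vector<int> listSchedule(const std::vector<std::vector<int>>& deps) {
  int n = static_cast<int>(deps.size());
  std::vector<int> pending(n, 0);           // unscheduled operands per node
  std::vector<std::vector<int>> users(n);
  for (int i = 0; i < n; ++i)
    for (int d : deps[i]) { ++pending[i]; users[d].push_back(i); }
  // Min-heap: among ready nodes, pick the smallest id.
  std::priority_queue<int, std::vector<int>, std::greater<int>> ready;
  for (int i = 0; i < n; ++i)
    if (pending[i] == 0) ready.push(i);
  std::vector<int> order;
  while (!ready.empty()) {
    int node = ready.top();
    ready.pop();
    order.push_back(node);
    for (int u : users[node])
      if (--pending[u] == 0) ready.push(u);
  }
  if (static_cast<int>(order.size()) != n) throw std::runtime_error("cycle in DAG");
  return order;
}
```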
FastISel, typically used for clang -O0, represents a fast path of SelectionDAG that generates less optimized machine code.
When FastISel is enabled, SelectAllBasicBlocks tries to skip SelectBasicBlock and select instructions with FastISel. However, FastISel only handles a subset of IR instructions. For unhandled instructions, SelectAllBasicBlocks falls back to SelectBasicBlock to handle the remaining instructions in the basic block.
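The fallback logic can be sketched as follows, assuming that FastISel walks the block bottom-up (as SelectAllBasicBlocks does), so the instructions left for SelectionDAG are the ones above the first failure. The set of handled opcodes and the names kFastISelHandles, SelectionResult and selectBlock are hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical subset of IR opcodes the fast path knows how to select.
const std::set<std::string> kFastISelHandles = {"add", "sub", "load", "store", "ret"};

struct SelectionResult {
  std::vector<std::string> fast;  // selected by FastISel (bottom-up order)
  std::vector<std::string> dag;   // handed to SelectionDAG
};

SelectionResult selectBlock(const std::vector<std::string>& block) {
  SelectionResult r;
  std::size_t end = block.size();
  // Walk bottom-up while the fast path succeeds.
  while (end > 0 && kFastISelHandles.count(block[end - 1])) {
    r.fast.push_back(block[end - 1]);
    --end;
  }
  // Everything above the failure point falls back to SelectionDAG.
  r.dag.assign(block.begin(), block.begin() + end);
  return r;
}
```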
GlobalISel is a new instruction selection framework that operates on the entire function, in contrast to the basic block view of SelectionDAG. GlobalISel offers improved performance and modularity.
In GlobalISel, the generic MachineInstr replaces SDNode, the intermediate representation used in the SelectionDAG framework.
LLVM IR =(IRTranslator)=> generic MachineInstr =(Legalizer,RegBankSelect,GlobalInstructionSelect)=> MachineInstr
TargetPassConfig::addCoreISelPasses
TargetPassConfig::addMachinePasses
The target-specific AsmPrinter pass converts MachineInstrs to MCInsts and emits them to an MCStreamer.
Clang has the capability to output either assembly code or an object file. Generating an object file directly without involving an assembler is referred to as "direct object emission".
To provide a unified interface, MCStreamer is created to handle the emission of both assembly code and object files. The two primary subclasses of MCStreamer are MCAsmStreamer and MCObjectStreamer, responsible for emitting assembly code and machine code, respectively.
LLVMAsmPrinter calls the MCStreamer API to emit assembly code or machine code.
In the case of an assembly input file, LLVM creates an MCAsmParser object (LLVMMCParser) and a target-specific MCTargetAsmParser object. The MCAsmParser is responsible for tokenizing the input, parsing assembler directives, and invoking the MCTargetAsmParser to parse an instruction. Both the MCAsmParser and MCTargetAsmParser objects can call the MCStreamer API to emit assembly code or machine code.
For an instruction parsed by the MCTargetAsmParser, if the streamer is an MCAsmStreamer, the MCInst will be pretty-printed. If the streamer is an MCELFStreamer (other object file formats are similar), MCELFStreamer::emitInstToData will use ${Target}MCCodeEmitter from LLVM${Target}Desc to encode the MCInst, emit its byte sequence, and record needed relocations. An ELFObjectWriter object is used to write the relocatable object file.
You may read my post Assemblers for more information about the LLVM integrated assembler.