IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    Understanding and improving Clang -ftime-report

    MaskRay发表于 2025-01-12 22:32:03
    love 0

    Clang provides a few options to generate timing report. Among them,-ftime-report and -ftime-trace can be used toanalyze the performance of Clang's internal passes.

    • -fproc-stat-report records time and memory on spawnedprocesses (ld, and gas if-fno-integrated-as).
    • -ftime-trace, introduced in 2019, generates Clangtiming information in the Chrome Trace Event format (JSON). The formatsupports nested events, providing a rich view of the front end.
    • -ftime-report: The option name is borrowed fromGCC.

    This post focuses on the traditional -ftime-report,which uses a line-based textual format.

    Understanding-ftime-report output

    The output consists of information about multiple timer groups. Thelast group spans the largest interval and encompasses timing data fromother groups.

    Up to Clang 19, the last group is called "Clang front-end timereport". You would see something like the following.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    % clang -c -w -ftime-report ~/Dev/testsuite/sqlite3.i
    ...
    ===-------------------------------------------------------------------------===
    Miscellaneous Ungrouped Timers
    ===-------------------------------------------------------------------------===

    ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
    0.2993 ( 71.5%) 0.1069 ( 93.5%) 0.4062 ( 76.3%) 0.4066 ( 76.2%) Code Generation Time
    0.1190 ( 28.5%) 0.0074 ( 6.5%) 0.1264 ( 23.7%) 0.1270 ( 23.8%) LLVM IR Generation Time
    0.4183 (100.0%) 0.1143 (100.0%) 0.5326 (100.0%) 0.5336 (100.0%) Total
    ...
    ===-------------------------------------------------------------------------===
    Clang front-end time report
    ===-------------------------------------------------------------------------===
    Total Execution Time: 0.7780 seconds (0.7788 wall clock)

    ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
    0.6538 (100.0%) 0.1241 (100.0%) 0.7780 (100.0%) 0.7788 (100.0%) Clang front-end timer
    0.6538 (100.0%) 0.1241 (100.0%) 0.7780 (100.0%) 0.7788 (100.0%) Total

    The "Clang front-end timer" timer measured the time spent inclang::FrontendAction::Execute, which includes lexing,parsing, semantic analysis, LLVM IR generation, optimization, andmachine code generation. However, "Code Generation Time" and "LLVM IRGeneration Time" belonged to the default timer group "MiscellaneousUngrouped Timers". This caused confusion for many users. For example, https://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/elaborates on the issues.

    To address the ambiguity, I revamped the output in Clang 20.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    ...
    ===-------------------------------------------------------------------------===
    Clang time report
    ===-------------------------------------------------------------------------===
    Total Execution Time: 0.7685 seconds (0.7686 wall clock)

    ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
    0.2798 ( 42.4%) 0.0966 ( 89.6%) 0.3765 ( 49.0%) 0.3768 ( 49.0%) Machine code generation
    0.2399 ( 36.3%) 0.0045 ( 4.2%) 0.2445 ( 31.8%) 0.2442 ( 31.8%) Front end
    0.1179 ( 17.8%) 0.0067 ( 6.2%) 0.1246 ( 16.2%) 0.1246 ( 16.2%) LLVM IR generation
    0.0230 ( 3.5%) 0.0000 ( 0.0%) 0.0230 ( 3.0%) 0.0230 ( 3.0%) Optimizer
    0.6606 (100.0%) 0.1079 (100.0%) 0.7685 (100.0%) 0.7686 (100.0%) Total

    The last group has been renamed and changed to cover a longerinterval within the invocation. It provides timing information for fourstages:

    • Front end: Includes lexing, parsing, semantic analysis, andmiscellnaenous tasks not captured by the subsequent timers.
    • LLVM IR generation: The time spent in generating LLVM IR.
    • LLVM IR optimization: The time consumed by LLVM's IR optimizationpipeline.
    • Machine code generation: The time taken to generate machine code orassembly from the optimized IR.

    The -ftime-report output further elaborates on thesestages through additional groups:

    • "Pass execution timing report" (first instance): A subset of the"Optimizer" group, providing detailed timing for individual optimizationpasses.
    • "Analysis execution timing report": A subset of the first "Passexecution timing report". In LLVM's new pass manager, analyses areexecuted as part of pass invocations.
    • "Pass execution timing report" (second instance): A subset of the"Machine code generation" group. (This group's name should be updatedonce the legacy pass manager is no longer used for IRoptimization.)
    • "Instruction Selection and Scheduling": This group appears whenSelectionDAG is utilized and is part of the "Instruction Selection"timer within the second "Pass execution timing report".

    Examples:

    "Pass execution timing report" (first instance)

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    ===-------------------------------------------------------------------------===
    Pass execution timing report
    ===-------------------------------------------------------------------------===
    Total Execution Time: 3.0009 seconds (3.0016 wall clock)

    ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
    0.9626 ( 32.7%) 0.0162 ( 26.6%) 0.9788 ( 32.6%) 0.9790 ( 32.6%) InstCombinePass
    0.3203 ( 10.9%) 0.0056 ( 9.2%) 0.3259 ( 10.9%) 0.3263 ( 10.9%) InlinerPass
    0.3123 ( 10.6%) 0.0068 ( 11.1%) 0.3190 ( 10.6%) 0.3187 ( 10.6%) SimplifyCFGPass
    ...

    When -ftime-report=per-run-pass is specified, a timer iscreated for each pass object. This can result in significant output,especially for modules with numerous functions, as each pass will bereported multiple times.

    Clang internals

    As clang -### -ftime-report shows, clangDriver forwards-ftime-report to Clang cc1. Within cc1, this option setsthe codegen flag clang::CodeGenOptions::TimePasses. Thisflag enables th uses of llvm::Timer objects to measure theexecution time of specific code blocks.

    From Clang 20 onwards, the placement of the timers can be understoodthrough the following call tree.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    cc1_main
    ExecuteCompilerInvocation // "Front end" minus the following timers
    ... all kinds of initialization
    CompilerInstance::ExecuteAction
    FrontendAction::BeginSourceFile
    FrontendAction::Execute
    FrontendAction::ExecutionAction
    ASTFrontendAction::ExecuteAction
    ParseAST
    BackendConsumer::HandleTranslationUnit
    clang::emitBackendOutput
    EmitAssemblyHelper::emitAssembly
    RunOptimizationPipeline // "Optimizer"
    RunCodegenPipeline // "Machine code generation"
    FrontendAction::EndSourceFile

    LLVM internals

    LLVM/lib/Support/Time.cpp implements the timer feature.Timer belongs to a TimerGroup.Timer::startTimer and Timer::stopTimergenerate a TimeRecord. Inclang/tools/driver/cc1_main.cpp,llvm::TimerGroup::printAll(llvm::errs()); dumps theseTimerGroup and TimeRecord information tostderr.

    There are a few cl::opt options

    • sort-timers (default: true): sort the timers in a groupin descending wall time.
    • track-memory: record increments or decrements in mallocstatistics. In glibc 2.33 and above, this utilizesmallinfo2::unordblks.
    • info-output-file: dump output to the specifiedfile.

    Examples:

    1
    2
    clang -c -ftime-report -mllvm -sort-timers=0 a.c
    clang -c -ftime-report -mllvm=-sort-timers=0 a.c

    The cl::opt option -time-passes can be used with theLLVM internal tools opt and llc, e.g.

    1
    2
    opt -S -passes='default<O2>' -time-passes < a.ll
    llc -time-passes < a.ll

    On Apple platforms, LLVM_SUPPORT_XCODE_SIGNPOSTS=onbuilds enableos_signpost forstartTimer/stopTimer.



沪ICP备19023445号-2号
友情链接