
    Simplifying disassembly with LLVM tools

    Posted by MaskRay on 2024-12-22 22:49:39

    Both compiler developers and security researchers have built disassemblers. They often prioritize different aspects. Compiler toolchains, benefiting from direct contributions from CPU vendors, tend to offer more accurate and robust decoding. Security-focused tools, on the other hand, often excel in user interface design.

    For quick disassembly tasks, rizin provides a convenient command-line interface.

    % rz-asm -a x86 -b 64 -d 4829c390
    sub rbx, rax
    nop

    -a x86 can be omitted.

    llvm-mc

    Within the LLVM ecosystem, llvm-objdump serves as a drop-in replacement for the traditional GNU objdump, leveraging instruction information from LLVM's TableGen files (llvm/lib/Target/*/*.td). Another LLVM tool, llvm-mc, was originally designed for internal testing of the Machine Code (MC) layer, particularly the assembler and disassembler components. There are numerous RUN: llvm-mc ... tests within llvm/test/MC. Despite its internal origins, llvm-mc is often distributed as part of the LLVM toolset, making it accessible to users.

    However, using llvm-mc for simple disassembly tasks can be cumbersome. It requires explicitly prefixing hexadecimal byte values with 0x:

    % echo 0x48 0x29 0xc3 0x90 | llvm-mc --triple=x86_64 --cdis --output-asm-variant=1
    .text
    sub rbx, rax
    nop

    Let's break down the options used in this command:

    • --triple=x86_64: This specifies the target architecture. If your LLVM build's default target triple is already x86_64-*-*, this option can be omitted.
    • --output-asm-variant=1: LLVM, like GCC, defaults to AT&T syntax for x86 assembly. This option switches to the Intel syntax. See lhmouse/mcfgthread/wiki/Intel-syntax if you prefer the Intel syntax in compiler toolchains.
    • --cdis: Introduced in LLVM 18, this option enables colored disassembly. In older LLVM versions, you have to use --disassemble.
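
The 0x prefixing that llvm-mc requires here can be scripted. The following is a minimal sketch, not part of LLVM: hex_to_0x is a hypothetical helper name, and it assumes only POSIX tr and sed. It strips whitespace from its arguments and prefixes every pair of hex digits with 0x.

```shell
# Hypothetical helper (not part of LLVM): turn plain hex digits into the
# 0x-prefixed byte list that llvm-mc's disassembler input expects.
hex_to_0x() {
    # Join all arguments and drop whitespace, then put "0x" in front of
    # every two hex digits and trim the trailing space.
    printf '%s\n' "$(printf '%s' "$*" | tr -d '[:space:]')" |
        sed 's/[0-9A-Fa-f][0-9A-Fa-f]/0x& /g; s/ $//'
}

# Usage (assumes llvm-mc is on PATH):
#   hex_to_0x 4829c390 | llvm-mc --triple=x86_64 --disassemble --output-asm-variant=1
```

An odd number of hex digits leaves the last digit unprefixed, so the input is assumed to contain whole bytes.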

    I have contributed patches to remove the .text line from the output and to allow disassembling raw bytes without the 0x prefix. You can now use the --hex option:

    % echo 4829c390 | llvm-mc --cdis --hex --output-asm-variant=1
    sub rbx, rax
    nop

    You can further simplify this by creating a bash/zsh function. The "here string" feature of bash and zsh provides a clean way to specify stdin.

    disasm() {
      # Quote "$*" so that embedded whitespace and newlines reach llvm-mc unchanged.
      llvm-mc --cdis --hex --output-asm-variant=1 <<< "$*"
    }

    % disasm 4829c390
    sub rbx, rax
    nop
    % disasm $'4829 c3\n# comment\n90'
    sub rbx, rax
    nop

    The --hex option conveniently ignores whitespace and #-style comments within the input.
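
For tools that accept only bare hex digits, the same tolerance can be approximated in the shell. This is a rough sketch of the input format described above, not llvm-mc's actual parser; normalize_hex is a hypothetical name, and the sketch assumes POSIX sed and tr.

```shell
# Hypothetical helper: reduce commented, whitespace-separated hex input
# to one contiguous string of hex digits.
normalize_hex() {
    # Delete everything from '#' to the end of each line, then remove
    # all whitespace (including the newlines between lines).
    printf '%s\n' "$*" | sed 's/#.*$//' | tr -d '[:space:]'
    # tr removed the final newline as well, so add one back.
    printf '\n'
}
```

Because comments are stripped before whitespace, a comment that happens to contain hex-looking characters does not leak into the result.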

    llvm-objdump

    For address information, llvm-mc falls short. We need to turn to llvm-objdump to get that detail. Here are fish and zsh scripts that take raw hex bytes as input, convert them to a binary format (xxd -r -p), and then create an ELF relocatable file (llvm-objcopy -I binary) targeting the x86-64 architecture. Finally, llvm-objdump with the -D flag disassembles the data section (.data) containing the converted binary.

    % cat disasm.fish
    #!/usr/bin/env fish
    llvm-objdump -D -j .data (echo $argv | xxd -r -p | llvm-objcopy -I binary -O elf64-x86-64 - - | psub) | sed '1,/<_binary__stdin__start>:/d'
    % cat disasm.zsh
    #!/bin/zsh
    llvm-objdump -D -j .data <(xxd -r -p <<< $@ | llvm-objcopy -I binary -O elf64-x86-64 - -) | sed '1,/<_binary__stdin__start>:/d'
    % ./disasm.fish e8 00000000c3 e800000000 c3
    0: e8 00 00 00 00 callq 0x5 <_binary__stdin__start+0x5>
    5: c3 retq
    6: e8 00 00 00 00 callq 0xb <_binary__stdin__start+0xb>
    b: c3 retq
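
The call targets in this listing show why addresses matter: an e8 call encodes a 32-bit displacement relative to the address of the next instruction, so the target depends on where the call sits. The arithmetic can be sketched as follows. call_target is a hypothetical helper, and it assumes the displacement has already been decoded to a signed number and that the resulting target is non-negative.

```shell
# Target of an x86-64 "call rel32" (opcode e8): the address of the next
# instruction (call address + 5 bytes) plus the displacement.
call_target() {
    # $1 = address of the call instruction, $2 = rel32 displacement
    # (each may be decimal or 0x-prefixed hex).
    printf '0x%x\n' "$(( $1 + 5 + $2 ))"
}

call_target 0x0 0   # first call in the listing, prints 0x5
call_target 0x6 0   # second call in the listing, prints 0xb
```

Both calls carry a zero displacement, so each one targets the instruction that immediately follows it, which matches the 0x5 and 0xb shown by llvm-objdump.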

