    Linker notes on PE/COFF

    MaskRay, posted 2023-12-03 04:06:33

    UNDER CONSTRUCTION

    This article collects linker notes on the Portable Executable (PE) and Common Object File Format (COFF) formats used in Windows and UEFI environments.

    In ELF, an object file can be a relocatable file, an executable file, or a shared object file. On Windows, the term "object file" usually refers only to relocatable files, the counterpart of ELF relocatable files. Such files use the Common Object File Format (COFF), while image files (e.g. executables and DLLs) use the Portable Executable (PE) format.

    Input files

    The input files can be object files, archive files, and import libraries. GNU ld and lld-link allow linking against DLL files without an import library.

    Relocatable object files

    TODO

    Import files

    An import file (.lib) is a special archive file. Each member represents a symbol to be imported. The symbol __imp_$sym is inserted into the global symbol table.

    The import header has a Type field indicating IMPORT_OBJECT_CODE/IMPORT_OBJECT_DATA/IMPORT_OBJECT_CONST.

    For an import type of IMPORT_OBJECT_CONST, the symbol $sym is defined as an alias for __imp_$sym. For IMPORT_OBJECT_DATA, only __imp_$sym is defined.

    For an import type of IMPORT_OBJECT_CODE, the symbol $sym is defined as an import thunk, which is like a PLT entry in ELF.

    GNU ld and lld-link allow linking against DLL files without an import library. The behavior is as if the linker synthesizes an import library from a DLL file.

    Symbols

    An object file contributes defined and undefined symbols. An import file contributes defined symbols in a DLL that can be referenced by __imp_$sym.

    A defined symbol can be any of the following kinds:

    • special (ignored in the global symbol table)
    • common (section number is IMAGE_SYM_UNDEFINED and value is not 0)
    • absolute (section number is -1)
    • regular (section number is positive)

    An undefined symbol has a storage class of IMAGE_SYM_CLASS_EXTERNAL, a section number of IMAGE_SYM_UNDEFINED (zero), and a value of zero.

    An undefined symbol with a storage class of IMAGE_SYM_CLASS_WEAK_EXTERNAL is a weak external, which is actually like a weak definition in ELF.
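The classification above can be sketched as a small C function. This is a simplified illustration, not how any particular linker implements it: the struct and enum names are hypothetical, only the numeric constants come from the PE/COFF specification, and everything that does not match a described case is lumped into "special" as an assumption.

```c
#include <stdint.h>

/* Constants from the PE/COFF specification. */
#define IMAGE_SYM_UNDEFINED 0
#define IMAGE_SYM_ABSOLUTE (-1)
#define IMAGE_SYM_DEBUG (-2)
#define IMAGE_SYM_CLASS_EXTERNAL 2
#define IMAGE_SYM_CLASS_WEAK_EXTERNAL 105

/* Hypothetical, simplified view of one symbol table record. */
struct coff_symbol {
    uint32_t value;
    int32_t section_number;
    uint8_t storage_class;
};

enum symbol_kind {
    SYM_UNDEFINED,
    SYM_WEAK_EXTERNAL,
    SYM_COMMON,
    SYM_ABSOLUTE,
    SYM_REGULAR,
    SYM_SPECIAL
};

enum symbol_kind classify_symbol(const struct coff_symbol *s) {
    if (s->section_number == IMAGE_SYM_UNDEFINED) {
        if (s->storage_class == IMAGE_SYM_CLASS_WEAK_EXTERNAL)
            return SYM_WEAK_EXTERNAL;
        if (s->storage_class == IMAGE_SYM_CLASS_EXTERNAL)
            /* Value 0 means undefined; a non-zero value marks a common symbol. */
            return s->value == 0 ? SYM_UNDEFINED : SYM_COMMON;
        return SYM_SPECIAL; /* assumption: other storage classes are ignored */
    }
    if (s->section_number == IMAGE_SYM_ABSOLUTE)
        return SYM_ABSOLUTE;
    if (s->section_number > 0)
        return SYM_REGULAR;
    return SYM_SPECIAL; /* e.g. IMAGE_SYM_DEBUG */
}
```

For example, a record with section number 0, storage class IMAGE_SYM_CLASS_EXTERNAL, and value 16 is classified as common, while the same record with value 0 is undefined.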

    PE requires explicit annotations for exported symbols and imported symbols in DLL files. There are differences between code symbols and data symbols.

    Imported code symbols

    // b.dll
    __declspec(dllexport) void f() {}

    // a.exe
    void local(void) {}
    void __declspec(dllimport) f(void);
    int main(void) {
      local();
      f();
    }

    Linking b.dll gives us b.lib (see "Import files" above).

    # b.dll
    .globl f
    f:

    .section .drectve,"yni"
    .ascii " -export:f"

    a.obj has two function calls. The call to f references the prefixed symbol __imp_f.

    # a.obj
    callq local
    callq *__imp_f(%rip)

    call *__imp_f(%rip) is like -fno-plt codegen for ELF. In this case, when we know that f is defined elsewhere, the generated code is more efficient.

    When linking a.exe, we need to pass the import file b.lib as an input file. The linker parses the import file and creates a definition for __imp_f pointing to the import address table entry.

    TODO import table

    Actually, when __imp_f is defined, the unprefixed symbol f is also defined. Normally, the unprefixed f is unused and will be discarded. However, if the user code calls the unprefixed symbol (e.g. call f), the f definition will be retained in the linker output and point to a thunk:

      call f  # generated code without using dllimport

    f: # x86-64 thunk
    jmpq *__imp_f(%rip)

    Different architectures have different thunk implementations.

    // x86-32 and x86-64
    jmp *0x0 // references an entry in the import address table

    // AArch32
    movw ip, #0
    movt ip, #0
    ldr.w pc, [ip]

    // AArch64
    adrp x16, #0
    ldr x16, [x16]
    br x16
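As an illustration of how the x86-64 thunk gets its operand filled in, here is a minimal sketch that encodes `jmpq *disp32(%rip)` (opcode bytes FF 25) for a given thunk address and import address table slot. The function name and the idea of passing raw virtual addresses are assumptions made for the example; a real linker applies a relocation to a byte template instead.

```c
#include <stdint.h>

/* Encodes `jmpq *disp32(%rip)` for a thunk located at `thunk_addr` that
 * jumps through the import address table slot at `iat_addr`.
 * Returns 0 on success, or -1 if the slot is out of the rel32 range. */
int encode_x64_thunk(uint64_t thunk_addr, uint64_t iat_addr, uint8_t out[6]) {
    /* RIP-relative displacements are measured from the end of the
     * 6-byte instruction. */
    int64_t disp = (int64_t)(iat_addr - (thunk_addr + 6));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;
    uint32_t d = (uint32_t)(int32_t)disp;
    out[0] = 0xff; /* opcode */
    out[1] = 0x25; /* ModRM: /4 with RIP-relative addressing */
    out[2] = (uint8_t)(d & 0xff); /* disp32, little-endian */
    out[3] = (uint8_t)((d >> 8) & 0xff);
    out[4] = (uint8_t)((d >> 16) & 0xff);
    out[5] = (uint8_t)((d >> 24) & 0xff);
    return 0;
}
```

With a thunk at 0x140001000 and a slot at 0x140002000, the displacement is 0xffa, so the thunk bytes are ff 25 fa 0f 00 00.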

    TODO link.exe will issue a warning.

    Imported data symbols

    // b.dll
    __declspec(dllexport) int var;

    // a.exe
    int local_var;
    __declspec(dllimport) extern int var;
    int main() { return local_var + var; }

    # b.dll
    .bss
    .globl var
    var:

    .section .drectve,"yni"
    .ascii " -export:var,data"

    The linker parses the import file and creates a definition for __imp_var pointing to the import address table entry. Unlike a code symbol, the linker does not create a definition for var (without the __imp_ prefix).

    With a dllimport:

    movq    __imp_var(%rip), %rax
    movl (%rax), %eax
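The two-instruction sequence corresponds to one extra pointer dereference. The following portable C model is only an analogy: `var_in_dll` and `imp_var` are hypothetical stand-ins for the variable inside b.dll and for the import address table slot that the loader fills in, and no actual DLL is involved.

```c
/* Stand-in for the variable that lives in b.dll (hypothetical name). */
static int var_in_dll = 42;

/* Stand-in for the import address table slot. With dllimport, the
 * compiler refers to this slot as __imp_var. */
static int *const imp_var = &var_in_dll;

int local_var = 1;

/* Models `return local_var + var;` for a dllimport'ed var. */
int read_sum(void) {
    int *p = imp_var;      /* movq __imp_var(%rip), %rax */
    return local_var + *p; /* movl (%rax), %eax */
}
```

Here `read_sum()` loads the pointer from the slot first and only then the value, which is why the access does not depend on where the DLL is loaded.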

    If dllimport is not specified, we get a reference to the unprefixed symbol:

    movq    var(%rip), %rax

    link.exe will report an error.

    MinGW implements runtime pseudo relocations to patch the text section so that absolute pointers and relative offsets to the symbol will be rewritten to bind to the actual definition.

    movq var(%rip), %rax  # the runtime will rewrite this to point to the definition in b.dll

    If the variable is defined outside the +-2GiB range from the current location, the runtime pseudo relocation can't fix the issue. See "crt: Check pseudo relocations for overflows and error out clearly".
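The range limit can be illustrated with a simplified model of patching a 32-bit relative field. This is a sketch under assumptions, not the mingw-w64 runtime code: the function name is hypothetical, and it assumes the field was originally resolved against the import address table slot, so the fix-up subtracts the slot address and adds the real address of the variable.

```c
#include <stdint.h>

/* Rebinds a 32-bit relative field from the import address table slot to
 * the real definition. Returns 0 and stores the new field in *out, or
 * returns -1 if the result does not fit in a signed 32-bit value. */
int patch_rel32(int32_t old_field, uint64_t iat_slot_addr,
                uint64_t real_addr, int32_t *out) {
    int64_t v = (int64_t)old_field - (int64_t)iat_slot_addr + (int64_t)real_addr;
    if (v < INT32_MIN || v > INT32_MAX)
        return -1; /* the definition is more than 2GiB away */
    *out = (int32_t)v;
    return 0;
}
```

When the DLL is loaded far away from the executable, the function returns -1, which corresponds to the case that the runtime cannot fix and should report clearly.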

    For a non-definition declaration, GCC conservatively assumes that the variable may be defined in a DLL and generates an indirection. This is similar to a GOT code sequence in ELF.

    extern int extern_var;
    int main() { return extern_var; }

    // MSVC
    movl extern_var(%rip), %eax

    // GCC
    movq .refptr.extern_var(%rip), %rax
    movl (%rax), %eax

    Non-dllexport definition and dllimport

    A dllimport symbol referenced by an object file is normally satisfied by an import file. link.exe allows another object file to provide the definition. In such a case, link.exe will issue a warning (Linker Tools Warning LNK4217). lld-link has implemented this feature for compatibility.

    echo '__declspec(dllimport) int foo(); int main() { return foo(); }' > a.cc
    echo 'int foo() { return 42; }' > b.cc
    clang-cl -c a.cc b.cc
    lld-link -nodefaultlib -entry:main a.obj b.obj

    lld-link: warning: a.obj: locally defined symbol imported: int __cdecl foo(void) (defined in b.obj) [LNK4217]

    MinGW

    MinGW provides auto exporting and auto importing features to make PE DLL files work like ELF shared objects. When producing a DLL file, if no symbol is chosen to be exported, almost all symbols are exported by default (--export-all-symbols).

    If an undefined symbol $sym is unresolved and __imp_$sym is defined, $sym will be aliased to __imp_$sym. TODO: example
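The aliasing rule for an unresolved $sym can be modelled with a toy symbol table. This is only a sketch of the lookup order; `is_defined` and `auto_import_target` are hypothetical names, and a real linker works on symbol objects rather than on strings.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` is among the `n` defined symbols. */
static int is_defined(const char *const *defined, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(defined[i], name) == 0)
            return 1;
    return 0;
}

/* Returns the name that a reference to `sym` binds to:
 *   - `sym` itself if it is defined,
 *   - "__imp_<sym>" (written into `buf`) if only that is defined,
 *   - NULL if the reference stays unresolved. */
const char *auto_import_target(const char *const *defined, size_t n,
                               const char *sym, char *buf, size_t size) {
    if (is_defined(defined, n, sym))
        return sym;
    int len = snprintf(buf, size, "__imp_%s", sym);
    if (len < 0 || (size_t)len >= size)
        return NULL; /* buffer too small for the prefixed name */
    return is_defined(defined, n, buf) ? buf : NULL;
}
```

With the defined set {"main", "__imp_var"}, a reference to var binds to __imp_var, while a reference to a name with no definition at all remains unresolved.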

    If the symbol .refptr.$sym is present, it will be aliased to __imp_$sym as well. mingw-w64 defaults to -mcmodel=medium and uses .refptr.$sym. TODO: example

    https://github.com/ziglang/zig/issues/9845

    Manual __imp_ definition

    The user can define an __imp_ symbol manually instead of letting the linker do it.

    https://github.com/llvm/llvm-project/issues/57982

    $ cat lto-dllimp1.c
    void __declspec(dllimport) importedFunc(void);
    void other(void);

    void entry(void) {
      importedFunc();
      other();
    }
    $ cat lto-dllimp2.c
    static void importedFuncReplacement(void) {
    }
    void (*__imp_importedFunc)(void) = importedFuncReplacement;

    void other(void) {
    }

    Shared library comparison with ELF

    The design of shared libraries saw major advancements around 1988. Before 1988, there were shared library implementations in the a.out and COFF object file formats, but they had severe limitations, such as fixed addresses and the requirement of extra files like import files.

    Such limitations are evident in Shared Libraries on UNIX System V from AT&T, published in the 1986 Summer USENIX Technical Conference & Exhibition Proceedings. Its shared library (presumably using the COFF object file format) must have a fixed virtual address, which is called a "static shared library" in the terminology of Linkers and Loaders.

    In 1988, SunOS 4.0 was released with an extended a.out binary format with dynamic shared library support. Unlike previous static shared library schemes, the a.out shared libraries are position independent and can be loaded at different addresses. The dynamic linker source code is available somewhere, and I find that its GOT and PLT schemes are exactly like what we have for ELF today.

    AT&T and Sun collaborated to create the first System V release 4 ABI (using ELF). AT&T contributed the ELF object format. Sun contributed all of the dynamic linking implementation from SunOS 4.x. In 1992, SunOS 5.0 (Solaris 2.0) switched to ELF.

    For ELF, the designers tried to make shared libraries similar to static libraries. There is no need to annotate export and import symbols to work with shared libraries.

    I cannot find more information about System V release 3's shared library support, but the Windows DLL is assuredly inspired by it, given that the PE object file format is based on COFF and the PE specification refers to COFF in numerous places.

    So, is the shared library design in ELF more advanced? It is. However, two aspects deserve careful thought.

    • The manual export and import annotations have their strengths.
    • Choices made to make ELF shared libraries flexible had major downsides.
      • Performance downside due to symbol interposition on the compiler side. See -fno-semantic-interposition
      • Performance downside due to symbol interposition on the linker and loader side. See ELF interposition and -Bsymbolic
      • Underlinking problems exacerbated by the -z undefs default in linkers. See Dependency related linker options.

    Limitations

    The number of exported symbols cannot exceed 65535. Several open-source projects have run into the problem that a DLL file cannot export more than 65535 symbols. (GNU ld has a diagnostic "error: export ordinal too large:".)

    A section header has only 8 bytes for the name field. link.exe truncates long section names to 8 bytes. For a section with a long name and the IMAGE_SCN_MEM_DISCARDABLE flag, lld uses a non-standard string table and issues a warning.
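To make the 8-byte limit concrete, here is a minimal sketch of filling the name field. It uses the "/<decimal offset>" form that the PE/COFF specification describes for long names in object files; that the same form is what ends up in an image with a string table is an assumption here, and the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Fills the 8-byte Name field of a section header.
 * Names of up to 8 bytes are stored inline and NUL-padded (an 8-byte
 * name has no terminator). Longer names are stored as "/<offset>", where
 * `strtab_offset` is the decimal offset of the name in the string table.
 * Returns 0 on success, or -1 if "/<offset>" does not fit in 8 bytes. */
int encode_section_name(const char *name, unsigned long strtab_offset,
                        char field[8]) {
    size_t len = strlen(name);
    memset(field, 0, 8);
    if (len <= 8) {
        memcpy(field, name, len);
        return 0;
    }
    char tmp[32];
    int n = snprintf(tmp, sizeof tmp, "/%lu", strtab_offset);
    if (n < 0 || n > 8)
        return -1;
    memcpy(field, tmp, (size_t)n);
    return 0;
}
```

A name such as .text is stored inline, whereas .debug_info at string table offset 4 is stored as "/4". Truncation, the other policy mentioned above, would simply copy the first 8 bytes of the long name.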


