UNDER CONSTRUCTION
This article describes linker notes about Portable Executable (PE) and Common Object File Format (COFF) used on Windows and UEFI environments.
In ELF, an object file can be a relocatable file, an executable file, or a shared object file. On Windows, the term "object file" usually refers only to relocatable files (the counterpart of ELF relocatable files). Such files use the Common Object File Format (COFF), while image files (e.g. executables and DLLs) use the Portable Executable (PE) format.
The input files can be object files, archive files, and import libraries. GNU ld and lld-link allow linking against DLL files without an import library.
TODO
An import file (.lib) is a special archive file. Each member represents a symbol to be imported. The symbol __imp_$sym is inserted into the global symbol table.
The import header has a Type field indicating IMPORT_OBJECT_CODE/IMPORT_OBJECT_DATA/IMPORT_OBJECT_CONST.
For an import type of IMPORT_OBJECT_CONST, the symbol $sym is defined as an alias for __imp_$sym.
For an import type of IMPORT_OBJECT_CODE, the symbol $sym is defined as an import thunk, which is like a PLT entry in ELF.
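The member layout can be sketched in C. This is a minimal sketch, assuming the short import format described in the PE/COFF specification: a 20-byte little-endian header (Sig1 = 0 and Sig2 = 0xFFFF at offsets 0 and 2, Machine at 6, SizeOfData at 12, Ordinal/Hint at 16, and a word at 18 whose low 2 bits are the Type and whose next 3 bits are the Name Type), followed by the NUL-terminated symbol name and DLL name. `struct import_member`, `parse_import_member`, and `imported_symbols` are hypothetical names, not the API of any linker.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* IMPORT_OBJECT_TYPE values as defined in winnt.h. */
enum { IMPORT_OBJECT_CODE = 0, IMPORT_OBJECT_DATA = 1, IMPORT_OBJECT_CONST = 2 };

/* Hypothetical decoded view of one short import member. */
struct import_member {
  uint16_t machine;         /* e.g. 0x8664 for x86-64 */
  uint16_t ordinal_or_hint;
  unsigned type;            /* bits 0-1 of the word at offset 18 */
  unsigned name_type;       /* bits 2-4 of the same word */
  const char *name;         /* symbol name, e.g. "f" */
  const char *dll;          /* DLL name, e.g. "b.dll" */
};

/* Little-endian readers; the header is little-endian on disk. */
static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const unsigned char *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) |
         ((uint32_t)p[3] << 24);
}

/* Decode an archive member body. Returns 0 on success, -1 if the bytes are
   not a well-formed short import member. */
static int parse_import_member(const unsigned char *buf, size_t len,
                               struct import_member *m) {
  if (len < 20 || rd16(buf) != 0 || rd16(buf + 2) != 0xFFFF)
    return -1;                       /* Sig1 must be 0 and Sig2 0xFFFF */
  uint32_t size_of_data = rd32(buf + 12);
  if (size_of_data > len - 20)
    return -1;
  const char *name = (const char *)buf + 20;
  const char *end = memchr(name, '\0', size_of_data);
  if (!end)
    return -1;                       /* symbol name is not NUL-terminated */
  const char *dll = end + 1;
  size_t rest = size_of_data - (size_t)(dll - name);
  if (!memchr(dll, '\0', rest))
    return -1;                       /* DLL name is missing or unterminated */
  uint16_t word = rd16(buf + 18);
  m->machine = rd16(buf + 6);
  m->ordinal_or_hint = rd16(buf + 16);
  m->type = word & 0x3;
  m->name_type = (word >> 2) & 0x7;
  m->name = name;
  m->dll = dll;
  return 0;
}

/* Every member defines "__imp_<name>"; a code import also gets a thunk named
   <name>. Returns 1 if the thunk is needed. */
static int imported_symbols(const struct import_member *m, char *imp, size_t imp_len) {
  snprintf(imp, imp_len, "__imp_%s", m->name);
  return m->type == IMPORT_OBJECT_CODE;
}
```

For example, a member for f in b.dll with Type IMPORT_OBJECT_CODE yields __imp_f plus a thunk named f, while an IMPORT_OBJECT_DATA member yields only __imp_f.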
GNU ld and lld-link allow linking against DLL files without an import library. The behavior is as if the linker synthesizes an import library from a DLL file.
An object file contributes defined and undefined symbols. An import file contributes defined symbols in a DLL that can be referenced by __imp_$sym.
Defined symbols come in several kinds. For example, a common symbol has a section number of IMAGE_SYM_UNDEFINED and a nonzero value.
An undefined symbol has a storage class of IMAGE_SYM_CLASS_EXTERNAL, a section number of IMAGE_SYM_UNDEFINED (zero), and a value of zero.
An undefined symbol with a storage class of IMAGE_SYM_CLASS_WEAK_EXTERNAL is a weak external, which is actually like a weak definition in ELF.
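These rules, together with two other cases from the PE/COFF specification (a positive section number is a 1-based section index, and IMAGE_SYM_ABSOLUTE marks an absolute symbol), can be sketched as a classifier over three fields of a symbol table record. `classify` and `enum sym_kind` are hypothetical names; only the external and weak external storage classes are distinguished.

```c
#include <stdint.h>

/* Constants from the PE/COFF specification (IMAGE_SYM_DEBUG, -2, is omitted). */
#define IMAGE_SYM_UNDEFINED 0
#define IMAGE_SYM_ABSOLUTE (-1)
#define IMAGE_SYM_CLASS_EXTERNAL 2
#define IMAGE_SYM_CLASS_WEAK_EXTERNAL 105

enum sym_kind { SYM_DEFINED, SYM_ABSOLUTE, SYM_COMMON, SYM_UNDEFINED,
                SYM_WEAK_EXTERNAL, SYM_OTHER };

/* Hypothetical classifier for a symbol record. section_number is 16-bit in
   regular objects and 32-bit in /bigobj objects, hence int32_t here. */
static enum sym_kind classify(int32_t section_number, uint32_t value,
                              uint8_t storage_class) {
  if (storage_class == IMAGE_SYM_CLASS_WEAK_EXTERNAL)
    return SYM_WEAK_EXTERNAL;  /* an auxiliary record names the fallback symbol */
  if (storage_class != IMAGE_SYM_CLASS_EXTERNAL)
    return SYM_OTHER;          /* e.g. IMAGE_SYM_CLASS_STATIC (3) */
  if (section_number > 0)
    return SYM_DEFINED;        /* 1-based index into the section table */
  if (section_number == IMAGE_SYM_ABSOLUTE)
    return SYM_ABSOLUTE;
  if (section_number == IMAGE_SYM_UNDEFINED)
    return value != 0 ? SYM_COMMON : SYM_UNDEFINED;  /* value = common size */
  return SYM_OTHER;            /* e.g. IMAGE_SYM_DEBUG */
}
```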
PE requires explicit annotations for exported symbols and imported symbols in DLL files. There are differences between code symbols and data symbols.
Linking b.dll gives us b.lib (see "Import files" above).
# b.dll
.globl f
f:
  ret
.section .drectve,"yni"
.ascii " -export:f"   # linker directive: export f from b.dll
a.obj has two function calls. The call to f references the prefixed symbol __imp_f.
# a.obj
callq local                # direct call to a function in the same image
callq *__imp_f(%rip)       # indirect call through the import address table entry
call *__imp_f(%rip) is like -fno-plt codegen for ELF. Since we know that f is defined in a DLL, calling through the import address table entry directly is more efficient than going through a thunk.
When linking a.exe, we need to specify the import file b.lib as an input file. The linker parses the import file and creates a definition for __imp_f pointing to the import address table entry.
TODO import table
Actually, when __imp_f is defined, the unprefixed symbol f is also defined. Normally, the unprefixed f is unused and will be discarded. However, if the user code calls the unprefixed symbol (e.g. call f), the f definition will be retained in the linker output and point to a thunk:
call f                  # generated code without using dllimport

f:                      # x86-64 thunk
  jmpq *__imp_f(%rip)
Different architectures have different thunk implementations.
// x86-32 and x86-64
jmp *0x0        // references an entry in the import address table

// AArch32
mov.w ip, #0
mov.t ip, #0
ldr.w pc, [ip]

// AArch64
adrp x16, #0
ldr x16, [x16]
br x16
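To make the x86-64 case concrete, here is a minimal C sketch of how a linker could fill in the 6-byte `jmp *disp32(%rip)` thunk, encoded as FF 25 followed by a 32-bit displacement. The function name `write_x86_64_thunk` and the RVAs are hypothetical; the only rule it relies on is that a RIP-relative displacement is measured from the end of the instruction.

```c
#include <stdint.h>

/* Fill a 6-byte x86-64 import thunk "jmp *disp32(%rip)" (opcode FF 25).
   thunk_rva is where the thunk lives and iat_rva is the import address table
   slot it jumps through; the displacement is measured from the end of the
   instruction, i.e. thunk_rva + 6. */
static void write_x86_64_thunk(uint8_t out[6], uint32_t thunk_rva, uint32_t iat_rva) {
  uint32_t disp = iat_rva - (thunk_rva + 6u);  /* wraps to two's complement */
  out[0] = 0xFF;
  out[1] = 0x25;
  out[2] = (uint8_t)disp;          /* little-endian disp32 */
  out[3] = (uint8_t)(disp >> 8);
  out[4] = (uint8_t)(disp >> 16);
  out[5] = (uint8_t)(disp >> 24);
}
```

On x86-32 the same FF 25 encoding takes an absolute address rather than a RIP-relative one, so the field would hold the virtual address of the slot and need a base relocation instead.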
TODO link.exe will issue a warning.
For a data symbol var exported from b.dll, the linker parses the import file and creates a definition for __imp_var pointing to the import address table entry. Unlike a code symbol, the linker does not create a definition for var (without the __imp_ prefix).
With dllimport:
movq __imp_var(%rip), %rax   # load the address of var from its IAT entry
movl (%rax), %eax            # load the value of var
If dllimport is not specified, we get a reference to the unprefixed symbol:
movq var(%rip), %rax
link.exe will report an error.
MinGW implements runtime pseudo relocations to patch the text section so that absolute pointers and relative offsets to the symbol will be rewritten to bind to the actual definition.
movq var(%rip), %rax   # the runtime will rewrite this to point to the definition in b.dll
If the variable is defined outside the +-2GiB range from the current location, the runtime pseudo relocation cannot fix the reference. See crt: Check pseudo relocations for overflows and error out clearly.
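The arithmetic behind the +-2GiB limit can be sketched for a 32-bit PC-relative field. This is a simplified model, assuming (to the best of my understanding of mingw-w64's runtime) that the linker initially resolves var to its import address table slot, so the runtime only has to add the distance between the slot and the real definition. `apply_pcrel32` is a hypothetical name; a real runtime also handles 8/16/64-bit fields and has to make the patched pages writable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rewrite one 32-bit PC-relative field in place.
   field     : displacement written by the linker, relative to the IAT slot
   slot_addr : run-time address of the IAT slot (__imp_var)
   real_addr : run-time address of var in b.dll, i.e. the value the loader
               stored into that slot
   Returns false (leaving the field untouched) if the new displacement does
   not fit in 32 bits. Addresses are assumed to be below 2^63. */
static bool apply_pcrel32(int32_t *field, uint64_t slot_addr, uint64_t real_addr) {
  int64_t delta = (int64_t)real_addr - (int64_t)slot_addr;
  int64_t updated = (int64_t)*field + delta;
  if (updated < INT32_MIN || updated > INT32_MAX)
    return false;   /* the definition is out of the +-2GiB range */
  *field = (int32_t)updated;
  return true;
}
```

For instance, if the instruction ends at 0x140001000, the slot is at 0x140003000, and var lives at 0x140005000, the field changes from 0x2000 to 0x4000, which is exactly the distance from the instruction to var.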
For a non-definition declaration, GCC conservatively assumes that the variable may be defined in a DLL and generates an indirection. This is similar to a GOT code sequence in ELF.
extern int extern_var;
int main() { return extern_var; }
A dllimport symbol referenced by an object file is normally satisfied by an import file. link.exe allows another object file to provide the definition. In such a case, link.exe will issue a warning (Linker Tools Warning LNK4217). lld-link has implemented this feature for compatibility.
echo '__declspec(dllimport) int foo(); int main() { return foo(); }' > a.cc

lld-link: warning: a.obj: locally defined symbol imported: int __cdecl foo(void) (defined in b.obj) [LNK4217]
MinGW provides auto exporting and auto importing features to make PE DLL files work like ELF shared objects. When producing a DLL file, if no symbol is chosen to be exported, almost all symbols are exported by default (--export-all-symbols).
If an undefined symbol $sym is unresolved and __imp_$sym is defined, $sym will be aliased to __imp_$sym. TODO: example
If the symbol .refptr.$sym is present, it will be aliased to __imp_$sym as well. mingw-w64 defaults to -mcmodel=medium and uses .refptr.$sym. TODO: example
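The two aliasing rules can be modeled with a toy symbol table. This is a hypothetical sketch (`struct sym`, `find`, and `auto_import` are illustrative names, not GNU ld or lld code) that only captures the name resolution; references to the aliased symbol still need the runtime pseudo relocations described earlier, because they now point at the import address table slot rather than at the variable itself.

```c
#include <stdio.h>
#include <string.h>

/* A toy symbol table entry: `target` records what the symbol is bound to. */
struct sym {
  const char *name;
  const char *target;  /* NULL while unresolved */
  int defined;
};

static struct sym *find(struct sym *tab, size_t n, const char *name) {
  for (size_t i = 0; i < n; i++)
    if (strcmp(tab[i].name, name) == 0)
      return &tab[i];
  return NULL;
}

/* Alias an unresolved `name` (and .refptr.<name>, if present) to
   __imp_<name>. Returns 1 if the symbol was auto-imported. */
static int auto_import(struct sym *tab, size_t n, const char *name) {
  char buf[256];
  struct sym *s = find(tab, n, name);
  if (!s || s->defined)
    return 0;
  snprintf(buf, sizeof buf, "__imp_%s", name);
  struct sym *imp = find(tab, n, buf);
  if (!imp || !imp->defined)
    return 0;                     /* still unresolved: an undefined symbol error */
  s->target = imp->name;          /* $sym now means the IAT slot */
  s->defined = 1;
  snprintf(buf, sizeof buf, ".refptr.%s", name);
  struct sym *refptr = find(tab, n, buf);
  if (refptr)
    refptr->target = imp->name;   /* the pointer slot is redirected to the IAT entry */
  return 1;
}
```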
https://github.com/ziglang/zig/issues/9845
__imp_ definition
The user can define an __imp_ symbol instead of letting the linker synthesize it.
https://github.com/llvm/llvm-project/issues/57982
$ cat lto-dllimp1.c
void __declspec(dllimport) importedFunc(void);
void other(void);
void entry(void) {
  importedFunc();
  other();
}
$ cat lto-dllimp2.c
static void importedFuncReplacement(void) {
}
void (*__imp_importedFunc)(void) = importedFuncReplacement;
void other(void) {
}
The design of shared libraries saw major advancements around 1988. Before 1988, there were shared library implementations in the a.out and COFF object file formats, but they had severe limitations, such as fixed addresses and the requirement of extra files like import files.
Such limitations are evidenced in Shared Libraries on UNIX System V from AT&T, in the 1986 Summer USENIX Technical Conference & Exhibition Proceedings. Its shared library (presumably using the COFF object file format) must have a fixed virtual address, which is called a "static shared library" in the terminology of Linkers and Loaders.
In 1988, SunOS 4.0 was released with an extended a.out binary format with dynamic shared library support. Unlike previous static shared library schemes, the a.out shared libraries are position independent and can be loaded at different addresses. The dynamic linker source code is available somewhere, and I find that its GOT and PLT schemes are exactly like what we have for ELF today.
AT&T and Sun collaborated to create the first System V release 4 ABI (using ELF). AT&T contributed the ELF object format. Sun contributed all of the dynamic linking implementation from SunOS 4.x. In 1992, SunOS 5.0 (Solaris 2.0) switched to ELF.
For ELF, the designers tried to make shared libraries similar to static libraries. There is no need to annotate export and import symbols to work with shared libraries.
I cannot find more information about System V release 3's shared library support, but the Windows DLL is assuredly inspired by it, given that the PE object file format is based on COFF and the PE specification refers to COFF in numerous places.
So, is the shared library design in ELF more advanced? It is. However, two aspects deserve deeper thought.
-z undefs default in linkers. See Dependency related linker options.
The number of exported symbols cannot exceed 65535 because ordinals are 16-bit. Several open-source projects have run into the problem that a DLL file cannot export more than 65535 symbols (GNU ld has a diagnostic "error: export ordinal too large:").
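One place where the 16-bit limit shows up is the import lookup table. The sketch below assumes the PE32+ layout from the PE/COFF specification: bit 63 selects import by ordinal, and then only bits 15-0 hold the ordinal (on the export side, the export ordinal table likewise stores 16-bit indexes). `ilt_entry_by_ordinal` and `ilt_is_ordinal` are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

#define ORDINAL_FLAG64 (UINT64_C(1) << 63)

/* Build a PE32+ import lookup table entry that imports by ordinal.
   Returns false if the ordinal does not fit in the 16-bit field. */
static bool ilt_entry_by_ordinal(uint32_t ordinal, uint64_t *entry) {
  if (ordinal > 0xFFFF)
    return false;   /* compare GNU ld's "export ordinal too large" */
  *entry = ORDINAL_FLAG64 | ordinal;
  return true;
}

/* Decode an entry; returns false for a by-name entry (bit 63 clear), whose
   low 31 bits are instead the RVA of a hint/name table entry. */
static bool ilt_is_ordinal(uint64_t entry, uint16_t *ordinal) {
  if (!(entry & ORDINAL_FLAG64))
    return false;
  *ordinal = (uint16_t)(entry & 0xFFFF);
  return true;
}
```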
A section header has only 8 bytes for the name field. link.exe truncates long section names to 8 bytes. For a section with a long name and the IMAGE_SCN_MEM_DISCARDABLE flag, lld uses a non-standard string table and issues a warning.
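For comparison, object files have a standard escape hatch: per the PE/COFF specification, a name longer than 8 bytes is stored as "/" followed by the decimal offset of the name in the string table, while executable images are specified to have no string table for section names, which is why link.exe truncates. A minimal sketch of the object-file encoding, where `encode_section_name` is a hypothetical helper:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Encode a section name into the 8-byte Name field of an object file's
   section header. Names of up to 8 bytes are stored inline and padded with
   NULs (no terminator if exactly 8 bytes long). Longer names become "/"
   followed by the decimal string table offset. Returns -1 if that form
   does not fit in 8 bytes. */
static int encode_section_name(char field[8], const char *name, uint32_t strtab_off) {
  size_t len = strlen(name);
  memset(field, 0, 8);
  if (len <= 8) {
    memcpy(field, name, len);
    return 0;
  }
  char tmp[16];
  int n = snprintf(tmp, sizeof tmp, "/%u", (unsigned)strtab_off);
  if (n < 0 || n > 8)
    return -1;
  memcpy(field, tmp, (size_t)n);
  return 0;
}
```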