This article describes an interesting overflow bug in the ELF hashfunction.
The System V Application Binary Interface (generic ABI) specifies theELF object file format. When producing an output executable or sharedobject needing a dynamic symbol table (.dynsym
), a linkergenerates a .hash
section with type SHT_HASH
to hold a symbolhash table. A DT_HASH
tag is produced to hold theaddress of .hash
.
DT_HASH
is used by a dynamic loader to perform symbollookup (for dynamic relocations and dlsym
familyfunctions). A detailed description of the format can be found in ELF: symbol lookup viaDT_HASH
.
In a Solaris Version Definition Section, vd_hash
holds avalue generated using the ELF hash function. The GNU symbol versioningscheme inherits this field from Solaris.
The generic ABI gives the following code fragment in "Figure 5-13:Hashing Function". 1
2
3
4
5
6
7
8
9
10
11
12
13unsigned long
elf_hash(const unsigned char *name)
{
unsigned longh = 0, g;
while (*name)
{
h = (h << 4) + *name++;
if (g = h & 0xf0000000)
h ^= g >> 24;
h &= ~g;
}
return h;
}
The function is supposed to return a value no larger than 0x0fffffff.Unfortunately, there is a bug. When unsigned long
consistsof more than 32 bits, the return value may be larger thanUINT32_MAX
. For instance,elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12")
returns 0x100000002, which is clearly unintended, as the function shouldbehave the same way regardless of whether long
represents a32-bit integer or a 64-bit integer.
It is possible to use 7-bit ASCII characters to trigger the issue.For instance,
elf_hash((const unsigned char *)"iiiiii\na")) == 100000001
elf_hash((const unsigned char *)"ZZZZZX+a")) == 100000011
elf_hash((const unsigned char *)"ZZZZZW9p")) == 100000000
Note: most ELF operating systems have switched fromDT_HASH
to DT_GNU_HASH
for many years. Versiondefinitions are rarely used.
glibc fixedthe overflow issue while optimizing the function in April 1995. Thetwo XOR operations were optimizedto one in Dec 2011.
binutils-gdb fixedthe overflow issue in May 2003.
musl has had the elegant and efficient implementation since June 2011(initial check-in of the dynamic linker). It is worth noting thatuint_fast32_t
is used, so that an architecture can optimizethe implementation if the architecture has slow 32-bit integerarithmetic operations. 1
2
3
4
5
6
7
8
9
10static uint32_t sysv_hash(const char *s0)
{
const unsigned char *s = (void *)s0;
uint_fast32_t h = 0;
while (*s) {
h = 16*h + *s++;
h ^= h>>24 & 0xf0;
}
return h & 0xfffffff;
}
Nathan Sidwell raised theissue for llvm-project and pointed out a bug about usingchar
instead of unsigned char
on2023-04-09.
I asked Whatif the result of elf_hash is larger than UINT32_MAX? on2023-04-11.