IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    ELF hash function may overflow

    MaskRay发表于 2023-04-14 06:48:43
    love 0

    This article describes an interesting overflow bug in the ELF hashfunction.

    The System V Application Binary Interface (generic ABI) specifies theELF object file format. When producing an executable or shared objectfile needing a dynamic symbol table (.dynsym), a linkergenerates a .hash section with type SHT_HASHto hold a symbolhash table. A DT_HASH tag is produced to hold theaddress of .hash.

    The hash table is used by a dynamic loader to perform symbol lookup(for dynamic relocations and dlsym family functions). Adetailed description of the format can be found in ELF: symbol lookup viaDT_HASH.

    Other use cases

    In a Solaris Version Definition Section, vd_hash holds avalue generated using the ELF hash function. The GNU symbol versioningscheme inherits this field from Solaris. Dynamic loaders use this fieldto accelerate symbol lookup of versioned symbols.

    Given such a small number of version definitions, even if a dynamicloader implementation has a bug, the overflow bug described below isvery unlikely to cause any problems.

    Overflow bug

    The generic ABI gives the following code fragment in "Figure 5-13:Hashing Function".

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    unsigned long
    elf_hash(const unsigned char *name)
    {
    unsigned longh = 0, g;
    while (*name)
    {
    h = (h << 4) + *name++;
    if (g = h & 0xf0000000)
    h ^= g >> 24;
    h &= ~g;
    }
    return h;
    }

    The function is supposed to return a value no larger than 0x0fffffff.h &= ~g clears the 4 highest bits in a 32-bit integer.(This isn't a good hash function: the high bits are unnecessarilydiscarded.)

    Unfortunately, there is a bug. When unsigned longconsists of more than 32 bits, the return value may be larger thanUINT32_MAX. If h is in the range[0x0fffff01,0x0fffffff] in the previous iteration, shiftingit by 4 and adding *name may make h largerthan UINT32_MAX.

    For instance,elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12")returns 0x100000002, which is clearly unintended, as the function shouldbehave the same way regardless of whether long represents a32-bit integer or a 64-bit integer.

    It is possible to use 7-bit ASCII characters to trigger the issue.For instance,

    • elf_hash((const unsigned char *)"iiiiii\na")) == 100000001
    • elf_hash((const unsigned char *)"ZZZZZX+a")) == 100000011
    • elf_hash((const unsigned char *)"ZZZZZW9p")) == 100000000

    Most ELF operating systems have switched from DT_HASH toDT_GNU_HASH for many years and preferDT_GNU_HASH for symbol search. We can build a shared objectwith ld -shared --hash-style=sysv and check whether adynamic symbol named "ZZZZZW9p" can be bound by relocationresolver/dlsym.

    A bug in FreeBSD rtld-elf

    If we compile the following C source file and link it with-shared -fuse-ld=lld -Wl,--hash-style=sysv, we get 3dynamic symbols and a SysV hash table with nbuckets=3. Asof April 2023, dlsym(dl, "ZZZZZW9p") returns NULL onFreeBSD.

    1
    2
    void ZZZZZW9p(void) {}
    void fn(void) {}

    If we link the shared object with --hash-style=gnu or--hash-style=both, rtld-elf will use the GNU hash table(DT_GNU_HASH) and dlsym(dl, "ZZZZZW9p") willreturn the correct value.

    This was just fixed by rtld: fix SysV hash functionoverflow, prompted by this article. I am so thrilled - my articleled to a bug fix within a few hours of my posting it. I did check theFreeBSD source before publishing this article and would have wanted tofix the issue for my own pleasure, but having a FreeBSD developer fix itmade me even happier.:)

    Project survey

    glibc fixedthe overflow issue while optimizing the function in April 1995. Thetwo XOR operations were optimizedto one in Dec 2011.

    binutils-gdb fixedthe overflow issue in May 2003. The function has a lovely comment"... Do not change this function; you will cause invalid hash tables tobe generated."

    musl has had the elegant and efficient implementation since June 2011(initial check-in of the dynamic linker). It is worth noting thatuint_fast32_t is used, so that an architecture can optimizethe implementation if the architecture has slow 32-bit integerarithmetic operations.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    static uint32_t sysv_hash(const char *s0)
    {
    const unsigned char *s = (void *)s0;
    uint_fast32_t h = 0;
    while (*s) {
    h = 16*h + *s++;
    h ^= h>>24 & 0xf0;
    }
    return h & 0xfffffff;
    }

    Nathan Sidwell raised theissue for llvm-project and pointed out a bug about usingchar instead of unsigned char on2023-04-09.

    I asked Whatif the result of elf_hash is larger than UINT32_MAX? on2023-04-11.



沪ICP备19023445号-2号
友情链接