IT博客汇

    A Bizarre Bug Caused by Concurrency

    Posted by dutor on 2013-11-04 03:15:14

      Tair dumped core again.

    
    Core was generated by `sbin/tair_server -f etc/dataserver.conf'.
    Program terminated with signal 11, Segmentation fault.
    (gdb) f
    #0  0x000000000048229b in request_processor::process () at request_processor.cpp:1340
    1340        req->r->opacket = resp;
    (gdb) p/a req
    $36 = 0x2aadba4d0d80
    (gdb) p/a resp
    $37 = 0x2aadba4d7e60
    (gdb) p/a req->r
    $38 = 0x2aacf3986da0
    (gdb) p/a &req->r->opacket
    $39 = 0x2aacf3986de0
    (gdb) disassemble $rip-9, +15
    Dump of assembler code from 0x482292 to 0x4822a1:
       # load address of resp to rax
       0x0000000000482292 <request_processor::process()+98>:       mov    0x10(%rsp),%rax
       # load address of req->r to rdx
       0x0000000000482297 <request_processor::process()+103>:      mov    0x20(%r13),%rdx
       # assign address of resp to req->r->opacket
    => 0x000000000048229b <request_processor::process()+107>:      mov    %rax,0x40(%rdx)
       0x000000000048229f <request_processor::process()+111>:      mov    0x10(%rsp),%rdi
    End of assembler dump.
    (gdb) x/a $rsp+0x10
    0x4f56fcd0:     0x2aadba4d7e60 # address of resp
    (gdb) p/a $rax
    $40 = 0x2aadba4d7e60 # address of resp
    (gdb) p/a $r13 # address of req
    $41 = 0x2aadba4d0d80
    (gdb) x/a $r13+0x20 # address of req->r
    0x2aadba4d0da0: 0x2aacf3986da0
    (gdb) p/a $rdx
    $42 = 0x0

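      Another segmentation fault, this time on a plain assignment: req->r->opacket = resp;. Normally this means that req, req->r, or &req->r->opacket points to an invalid address, but inspecting them shows that all of them are valid. The disassembly tells a different story: the program crashed at the instruction mov %rax,0x40(%rdx), and %rdx holds NULL, i.e. the value of req->r that the instruction used was NULL! Yet %rdx was loaded from %r13+0x20, and the value stored there is 0x2aacf3986da0, not NULL!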
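      There is only one possible explanation: the value originally loaded from (%r13+0x20), i.e. req->r, into %rdx was NULL, and only after that load was req->r assigned a non-NULL value, so the register and the memory it came from no longer agree. That makes it a concurrency problem.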
      The same kind of bizarre symptom can also appear as a failed assert(var != 0) even though var turns out to be non-zero when you inspect it.
      When a bug seems impossible, think about concurrency.


