Tair core dumped again.
    Core was generated by `sbin/tair_server -f etc/dataserver.conf'.
    Program terminated with signal 11, Segmentation fault.
    (gdb) f
    #0  0x000000000048229b in request_processor::process () at request_processor.cpp:1340
    1340        req->r->opacket = resp;
    (gdb) p/a req
    $36 = 0x2aadba4d0d80
    (gdb) p/a resp
    $37 = 0x2aadba4d7e60
    (gdb) p/a req->r
    $38 = 0x2aacf3986da0
    (gdb) p/a &req->r->opacket
    $39 = 0x2aacf3986de0
    (gdb) disassemble $rip-9, +15
    Dump of assembler code from 0x482292 to 0x4822a1:
    # load address of resp to rax
       0x0000000000482292 <request_processor::process()+98>:  mov 0x10(%rsp),%rax
    # load address of req->r to rdx
       0x0000000000482297 <request_processor::process()+103>: mov 0x20(%r13),%rdx
    # assign address of resp to req->r->opacket
    => 0x000000000048229b <request_processor::process()+107>: mov %rax,0x40(%rdx)
       0x000000000048229f <request_processor::process()+111>: mov 0x10(%rsp),%rdi
    End of assembler dump.
    (gdb) x/a $rsp+0x10
    0x4f56fcd0: 0x2aadba4d7e60    # address of resp
    (gdb) p/a $rax
    $40 = 0x2aadba4d7e60          # address of resp
    (gdb) p/a $r13                # address of req
    $41 = 0x2aadba4d0d80
    (gdb) x/a $r13+0x20           # address of req->r
    0x2aadba4d0da0: 0x2aacf3986da0
    (gdb) p/a $rdx
    $42 = 0x0
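Yet another segmentation fault, this time on a plain assignment: req->r->opacket = resp; Normally that would mean that req, req->r, or the address of req->r->opacket is invalid, but on inspection every one of them is a valid address. Looking at the assembly, the program crashed at the instruction mov %rax,0x40(%rdx) with %rdx holding NULL, which means req->r was NULL! Yet %rdx was loaded from %r13 + 0x20, and the value stored there is 0x2aacf3986da0, not NULL!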
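The address arithmetic behind that reading can be checked mechanically. The sketch below only re-derives the numbers printed by gdb above; the helper name field_address is made up for illustration, and the offsets 0x20 and 0x40 are taken from the two mov instructions rather than from the Tair headers.

```cpp
#include <cstdint>

// Hypothetical helper: address of a field that sits `offset` bytes into an object at `base`.
constexpr std::uint64_t field_address(std::uint64_t base, std::uint64_t offset) {
    return base + offset;
}

// Values copied from the gdb session.
constexpr std::uint64_t kReq  = 0x2aadba4d0d80;  // p/a req, also %r13
constexpr std::uint64_t kReqR = 0x2aacf3986da0;  // p/a req->r, as read from memory afterwards

// mov 0x20(%r13),%rdx reads req->r from req + 0x20, which is where x/a $r13+0x20 looked.
static_assert(field_address(kReq, 0x20) == 0x2aadba4d0da0, "location of req->r");

// mov %rax,0x40(%rdx) writes to req->r + 0x40, which matches p/a &req->r->opacket.
static_assert(field_address(kReqR, 0x40) == 0x2aacf3986de0, "location of req->r->opacket");

// With %rdx == 0 the same store targets address 0x40, which is not mapped, hence SIGSEGV.
static_assert(field_address(0, 0x40) == 0x40, "faulting address when req->r is NULL");
```

So the instruction, the offsets, and the debugger output all agree with each other; the only inconsistent piece is the content of %rdx.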
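Only one explanation fits: the value originally loaded from (%r13+0x20), i.e. req->r, into %rdx was NULL, and some time after that load, at the latest before the core was written, req->r was assigned a non-NULL value again. In other words, this is a concurrency problem.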
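To make the suspected interleaving concrete, here is a minimal sketch that replays it sequentially in a single thread, so the outcome is deterministic. Conn, Request, Observation, and replay are hypothetical stand-ins, not the real Tair types; which thread writes req->r in the real server is not known from the core.

```cpp
// Hypothetical stand-ins for the real types; only the fields that matter are modeled.
struct Conn {
    void* opacket = nullptr;
};

struct Request {
    Conn* r = nullptr;
};

struct Observation {
    Conn* in_register;  // what the crashing thread loaded (the role of %rdx)
    Conn* in_memory;    // what the debugger later reads from req->r
};

// Replays the suspected order of events one step at a time.
Observation replay(Request& req, Conn& conn) {
    Conn* loaded = req.r;  // thread A: mov 0x20(%r13),%rdx loads NULL
    req.r = &conn;         // thread B: stores a valid pointer into req->r
    // Thread A would now execute mov %rax,0x40(%rdx) with the stale NULL and fault.
    return Observation{loaded, req.r};
}
```

The register keeps the stale NULL while memory already shows the new pointer, which is exactly the mismatch between p/a $rdx and x/a $r13+0x20 in the core.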
Another symptom of the same kind: assert(var != 0) fails, yet var turns out to be non-zero when you inspect it.
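One way to make such a failure less confusing is to read the shared variable exactly once and report the value the check was actually based on. The sketch below assumes var can be made a std::atomic<int>; CheckResult and check_nonzero are illustrative names, and this only improves diagnosis, it does not fix the underlying race.

```cpp
#include <atomic>

// Hypothetical result type: whether the check passed, and the value it was based on.
struct CheckResult {
    bool ok;
    int seen;
};

// Takes one snapshot of var; the caller logs `seen` instead of re-reading var,
// so a concurrent write cannot make the report disagree with the check.
CheckResult check_nonzero(const std::atomic<int>& var) {
    const int seen = var.load(std::memory_order_acquire);
    return CheckResult{seen != 0, seen};
}
```

If the logged snapshot is 0 while the core shows a non-zero var, another thread wrote to it after the check, which is the same pattern as the stale %rdx above.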
When you run into a bug that seems impossible, think concurrency.