IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    PMON crash the instance of RAC

    admin发表于 2015-02-11 07:11:22
    love 0

    本站文章除注明转载外,均为本站原创: 转载自love wife & love life —Roger 的Oracle技术博客

    本文链接地址: PMON crash the instance of RAC

    这是某个网友的问题,其一套rac的某个节点被重启了,通过分析日志我们可以看到其中一个rac节点crash了,如下:

    Errors in file /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc  (incident=584434):
    ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], []
    Incident details in: /opt/oracle/diag/rdbms/cspora/cspora2/incident/incdir_584434/cspora2_pmon_57410148_i584434.trc
    Thu Feb 05 10:24:04 2015
    Trace dumping is performing id=[cdmp_20150205102404]
    Errors in file /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc  (incident=584435):
    ORA-00600: internal error code, arguments: [kjuscv], [6], [], [], [], [], [], [], [], [], [], []
    ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], []
    Incident details in: /opt/oracle/diag/rdbms/cspora/cspora2/incident/incdir_584435/cspora2_pmon_57410148_i584435.trc
    Thu Feb 05 10:24:06 2015
    Sweep Incident[584434]: completed
    Errors in file /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc:
    ORA-00600: internal error code, arguments: [kjuscv], [6], [], [], [], [], [], [], [], [], [], []
    ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], []
    PMON (ospid: 57410148): terminating the instance due to error 472
    Thu Feb 05 10:24:18 2015
    Termination issued to instance processes. Waiting for the processes to exit
    Instance termination failed to kill one or more processes
    Instance terminated by PMON, pid = 57410148
    Thu Feb 05 10:24:58 2015
    Starting ORACLE instance (normal)
    sskgpgetexecname failed to get name
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Picked latch-free SCN scheme 3
    Autotune of undo retention is turned on.
    LICENSE_MAX_USERS = 0
    SYS auditing is disabled
    Starting up ORACLE RDBMS Version: 11.1.0.7.0.
    Using parameter settings in server-side spfile /dev/rlv_spfile
    .......
    .......
    ORA-1154 signalled during: alter database open...
    Thu Feb 05 10:31:11 2015
    Reconfiguration started (old inc 102, new inc 104)
    List of nodes:
    1
     Global Resource Directory frozen
     * dead instance detected - domain 0 invalid = TRUE
    Communication channels reestablished
     Master broadcasted resource hash value bitmaps
     Non-local Process blocks cleaned out
    Thu Feb 05 10:31:12 2015
     LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Thu Feb 05 10:31:12 2015

    从上述alert log可以看出,数据库实例是被pmon进程强行终止的,之所以pmon进程会强行终止instance,
    很明显之前出现了ORA-00600错误。很明显上述ORA-00600错误是pmon进程出现的。我们来分析pmon进程的trace内容:

    *** ACTION NAME:() 2015-02-05 10:24:01.185
    
    Dump continued from file: /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc
    ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], []
    
    ========= Dump for incident 584434 (ORA 600 [kjucvl:!busy]) ========
    ----- Beginning of Customized Incident Dump(s) -----
    ----------enqueue 7000001cbbcdaf0------------------------
    lock version     : 311947
    Owner node       : 1
    grant_level      : KJUSEREX
    req_level        : KJUSEREX
    bast_level       : KJUSERNL
    notify_func      : 0
    resp             : 70000018f124ac0
    procp            : 7000001cb0e3960
    pid              : 46924312
    proc version     : 922
    oprocp           : 0
    opid             : 0
    group lock owner : 0
    xid              : 0000-0000-00000000
    dd_time          : 0.0 secs
    dd_count         : 0
    timeout          : 0.0 secs
    On_timer_q?      : N
    On_dd_q?         : N
    lock_state       : CONVERTING
    Open Options     :  KJUSERPROCESS_OWNED
    Convert options  : KJUSERGETVALUE KJUSERNOQUEUE KJUSEREXPRESS KJUSERNODEADLOCKWAIT KJUSERNODEADLOCKBLOCK
    History          : 0x955252b5
    Msg_Seq          : 0x0
    res_seq          : 41
    valblk           : 0x514e5844970d5d33d156c8d80004ac1e QNXD]3V
    ----------resource 70000018f124ac0----------------------
    resname       : [0x970d5d33][0x0],[DX]
    Local node    : 1
    dir_node      : 0
    master_node   : 0
    hv idx        : 18
    hv last r.inc : 98
    current inc   : 98
    hv status     : 0
    hv master     : 0
    open options  :
    Held mode     : KJUSEREX
    Cvt mode      : KJUSERNL
    Next Cvt mode : KJUSERNL
    msg_seq       : 1
    res_seq       : 41
    grant_bits    : KJUSERNL KJUSEREX
    grant mode    : KJUSERNL  KJUSERCR  KJUSERCW  KJUSERPR  KJUSERPW  KJUSEREX
    count         : 1         0         0         0         0         1
    val_state     : KJUSERVS_VALUE
    valblk        : 0x514e5844970d5d33d156c8d80004ac1e QNXD]3V
    access_node   : 1
    vbreq_state   : 0
    state         : x8
    resp          : 70000018f124ac0
    On Scan_q?    : N
    Total accesses: 5475
    Imm.  accesses: 5057
    Granted_locks : 2
    Cvting_locks  : 0
    value_block:  51 4e 58 44 97 0d 5d 33 d1 56 c8 d8 00 04 ac 1e
    GRANTED_Q :
    lp 7000001cbbcdaf0 gl KJUSEREX rp 70000018f124ac0 [0x970d5d33][0x0],[DX]
     master 0 pid 46924312 bast 0 rseq 41 mseq 0 history 0x955252b5
     open opt  KJUSERPROCESS_OWNED
    CONVERT_Q:
    ----- End of Customized Incident Dump(s) -----
    
    *** 2015-02-05 10:24:01.286
    ----- SQL Statement (None) -----
    Current SQL information unavailable - no cursor.
    
    ----- Call Stack Trace -----
    calling              call     entry                argument values in hex
    location             type     point                (? means dubious value)
    -------------------- -------- -------------------- ----------------------------
    skdstdst()+002c      bl       105e6a628
    ksedst1()+0064       bl       101f9c14c
    ksedst()+0028        bl       ksedst1()            FFFFFFFFFFF4880 ? 000000000 ?
    dbkedDefDump()+0874  bl       ksedst()             110EA2180 ?
    ksedmp()+0048        bl       dbkedDefDump()       3FFFF51D0 ? 110000A90 ?
    ksfdmp()+0058        bl       ksedmp()             000000020 ?
    dbgexPhaseII()+0130  bl       _ptrgl()
    dbgexExplicitEndInc  bl       dbgexPhaseII()       000000000 ? 000000000 ?
    ()+0210                                            000000000 ?
    dbgeEndDDEInvocatio  bl       dbgexExplicitEndInc  110378410 ? 11037F370 ?
    nImpl()+0224                  ()
    dbgeEndDDEInvocatio  bl       dbgeEndDDEInvocatio  110378410 ? 11037F370 ?
    n()+0038                      nImpl()
    kjucvl()+0590        bl       101f9d824
    kjuscv()+0c90        bl       kjucvl()             000000002 ? FFFFFFFFFFF9700 ?
     FFFFFFFFFFF9570 ?
     24244228CB0E3960 ?
     102196C80 ? FFFFFFFFFFF9220 ?
     000000000 ? 110323C28 ?
    ksicon_v()+06cc      bl       kjuscv()             7000001CBBCDAF0 ?
     500000000000000 ?
     79FFFF9D00 ? 0105358E0 ?
     7000001BDF32FB4 ? 000000000 ?
     000000000 ? 000000000 ?
    k2qsod()+0250        bl       ksicon_v()           1071DED20 ? 000000000 ?
     000000000 ? 084242240 ?
     70000017FEF01A0 ? 000000046 ?
     70000017FEF01C0 ?
    kssxdl()+076c        bl       _ptrgl()
    kssdel()+0048        bl       kssxdl()             7000001BDF32F38 ? 382242228 ?
    kssdch()+0d30        bl       kssdel()             FFFFFFFFFFFA550 ?
     4224822800000003 ?
    ksuxds()+08b8        bl       kssdch()             3CD1FEAE8 ? 7000001CD3ACCA0 ?
    kssxdl()+076c        bl       _ptrgl()
    kssdel()+0048        bl       kssxdl()             7000001CD3ACCA0 ? 3CE4C7B78 ?
    kssdch()+0d30        bl       kssdel()             FFFFFFFFFFFF5D0 ?
     FFFFFFFFFFFF5E0 ?
    ksudlp()+0180        bl       kssdch()             000000000 ? 000000000 ?
    kssxdl()+076c        bl       _ptrgl()
    kssdel()+0048        bl       kssxdl()             7000001CF9B5E48 ? 300000000 ?
    ksuxdl()+029c        bl       kssdel()             033339530 ? 093339530 ?
    ksuxda()+02ac        bl       ksuxdl()             7000001CF9B5E48 ? 0FFFFC1D0 ?
    ksucln()+0938        bl       ksuxda()
    ksbrdp()+075c        bl       _ptrgl()
    opirip()+0444        bl       105e65300
    opidrv()+0418        bl       opirip()             732F6373706F7261 ?
     410323C28 ? FFFFFFFFFFFF420 ?
    sou2o()+0090         bl       opidrv()             3204C0756C ? 4A0071E60 ?
     FFFFFFFFFFFF420 ?
    opimai_real()+0148   bl       101f9be30
    main()+0090          bl       opimai_real()        000000000 ? 000000000 ?
    __start()+0070       bl       main()               000000000 ? 000000000 ?
    

    从上述信息来看,出现了DX锁,对于DX锁,这是一种分布式事务锁。通过分析call stack,我们可以定位到
    该问题其实是一个Oracle bug:11868640。

    通过对比Bug 11868640 – ORA-600 [kjucvl:!busy] possible in PMON in RAC (文档 ID 11868640.8) ,
    我们可以发现pmon trace中的call stack与上述bug描述完全一致。实际上该问题,可以通过安装patch或psu来解决。

    很简单的一个问题,供大家参考!

    Related posts:

    1. ora-00600 [kkslgbv0]
    2. BUG 10008092 caused instance crash
    3. database crash with ora-00494
    4. PMON failed to acquire latch导致crash的案例联想
    5. Instance immediate crash after open


沪ICP备19023445号-2号
友情链接