
    11.2.0.4 RAC CRS diskgroup auto dismount issue

    Posted by admin on 2015-09-04 06:57:22

    Articles on this site are original unless marked as reprints; reprinted from love wife & love life (Roger's Oracle technical blog)

    Permalink: 11.2.0.4 RAC CRS diskgroup auto dismount issue

    I was on a business trip to Gansu these past few days for a customer whose cluster had a minor problem: CRS checks on one node reported errors, yet oddly the business was not affected at all. Here is what my checks showed:

    [grid@tysy1 bin]$ ocrcheck
    PROT-602: Failed to retrieve data from the cluster registry
    PROC-26: Error while accessing the physical storage
    
    [grid@tysy1 bin]$ crsctl query css votedisk
    ##  STATE    File Universal Id                File Name Disk group
    --  -----    -----------------                --------- ---------
     1. ONLINE   6fe50eed90fe4f44bf68c2da8459bfc1 (ORCL:OCR_VOTE01) [CRS]
     2. ONLINE   7afb21a9c8c94f27bf6aecb55d4e6e58 (ORCL:OCR_VOTE02) [CRS]
     3. ONLINE   77c7c00e799a4f47bf41316f27a1d95c (ORCL:OCR_VOTE03) [CRS]
    Located 3 voting disk(s).
    

    I found that crsctl status res -t -init ran fine, while crsctl status res -t failed. Since the customer uses ASMLib, I checked the ASMLib disks and they were all normal, as shown below:

    [grid@tysy1 bin]$  /etc/init.d/oracleasm  listdisks
    ARCHVOL
    DATAVOL01
    DATAVOL02
    DATAVOL03
    DATAVOL04
    FRAVOL
    OCR_VOTE01
    OCR_VOTE02
    OCR_VOTE03
    
    [root@tysy1 disks]# /etc/init.d/oracleasm querydisk -d OCR_VOTE01
    Disk "OCR_VOTE01" is a valid ASM disk on device [8,21]
    [root@tysy1 disks]# /etc/init.d/oracleasm querydisk -d OCR_VOTE02
    Disk "OCR_VOTE02" is a valid ASM disk on device [8,22]
    [root@tysy1 disks]# /etc/init.d/oracleasm querydisk -d OCR_VOTE03
    Disk "OCR_VOTE03" is a valid ASM disk on device [8,23]
    [root@tysy1 disks]# ls -ltr /dev/sdb*
    brw-rw---- 1 root disk 8, 16 Sep  1 20:23 /dev/sdb
    brw-rw---- 1 root disk 8, 17 Sep  1 20:23 /dev/sdb1
    brw-rw---- 1 root disk 8, 21 Sep  1 20:23 /dev/sdb5
    brw-rw---- 1 root disk 8, 22 Sep  1 20:23 /dev/sdb6
    brw-rw---- 1 root disk 8, 23 Sep  1 20:23 /dev/sdb7
    brw-rw---- 1 root disk 8, 24 Sep  1 20:23 /dev/sdb8
    brw-rw---- 1 root disk 8, 25 Sep  1 20:23 /dev/sdb9
    brw-rw---- 1 root disk 8, 26 Sep  1 20:23 /dev/sdb10
    brw-rw---- 1 root disk 8, 27 Sep  1 20:23 /dev/sdb11
    brw-rw---- 1 root disk 8, 28 Sep  1 20:23 /dev/sdb12
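    The [8,21] pairs that querydisk reports are major:minor device numbers, which is how they line up with the ls -ltr output above. A small helper can do the matching automatically (the function name is mine, not from the post; it reads ls -l output on stdin):

    ```shell
    # Hypothetical helper: given a major and minor number (as printed by
    # `oracleasm querydisk -d`), pick the matching device out of `ls -l` output.
    # In `ls -l` output, field 5 is "major," (with trailing comma) and field 6
    # is the minor number.
    dev_for_majmin() {
        awk -v M="$1," -v m="$2" '$5 == M && $6 == m { print $NF }'
    }

    # Usage (on the database node):
    #   ls -l /dev/sdb* | dev_for_majmin 8 21
    ```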

    The ASMLib state was fine, but querying the ASM disks (v$asm_disk) on node 1 showed that the disks of the CRS diskgroup could no longer be recognized:

    [grid@tysy1 bin]$ 
    
    NAME       PATH              STATE    GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST LABEL
    ---------- ----------------- -------- ------------ ----------- ------- ------------ ------- ------------
               ORCL:OCR_VOTE01   NORMAL              0           0 CLOSED  MEMBER       ONLINE  OCR_VOTE01
               ORCL:OCR_VOTE02   NORMAL              0           1 CLOSED  MEMBER       ONLINE  OCR_VOTE02
               ORCL:OCR_VOTE03   NORMAL              0           2 CLOSED  MEMBER       ONLINE  OCR_VOTE03
    ARCHVOL    ORCL:ARCHVOL      NORMAL              1           0 CACHED  MEMBER       ONLINE  ARCHVOL
    DATAVOL01  ORCL:DATAVOL01    NORMAL              5           0 CACHED  MEMBER       ONLINE  DATAVOL01
    DATAVOL02  ORCL:DATAVOL02    NORMAL              5           1 CACHED  MEMBER       ONLINE  DATAVOL02
    DATAVOL03  ORCL:DATAVOL03    NORMAL              6           0 CACHED  MEMBER       ONLINE  DATAVOL03
    DATAVOL04  ORCL:DATAVOL04    NORMAL              6           1 CACHED  MEMBER       ONLINE  DATAVOL04
    FRAVOL     ORCL:FRAVOL       NORMAL              7           0 CACHED  MEMBER       ONLINE  FRAVOL
    

    The CRS log likewise shows that Oracle considered the CRS diskgroup inaccessible:

    2015-09-02 14:06:19.302: [OCRCHECK][1069037312]ocrcheck starts...
    2015-09-02 14:06:19.369: [  OCRASM][1069037312]proprasmo: kgfoCheckMount return [6]. Cannot proceed with dirty open.
    2015-09-02 14:06:19.369: [  OCRASM][1069037312]proprasmo: Error in open/create file in dg [CRS]
    [  OCRASM][1069037312]SLOS : SLOS: cat=6, opn=kgfo, dep=0, loc=kgfoCkMt03
    
    2015-09-02 14:06:19.369: [  OCRASM][1069037312]ASM Error Stack :
    2015-09-02 14:06:19.391: [  OCRASM][1069037312]proprasmo: kgfoCheckMount returned [6]
    2015-09-02 14:06:19.391: [  OCRASM][1069037312]proprasmo: The ASM disk group CRS is not found or not mounted
    2015-09-02 14:06:19.392: [  OCRRAW][1069037312]proprioo: Failed to open [+CRS]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
    2015-09-02 14:06:19.392: [  OCRRAW][1069037312]proprioo: No OCR/OLR devices are usable
    2015-09-02 14:06:19.392: [  OCRASM][1069037312]proprasmcl: asmhandle is NULL
    2015-09-02 14:06:19.392: [  OCRRAW][1069037312]proprinit: Could not open raw device
    2015-09-02 14:06:19.392: [  OCRASM][1069037312]proprasmcl: asmhandle is NULL
    2015-09-02 14:06:19.392: [ default][1069037312]a_init:7!: Backend init unsuccessful : [26]
    2015-09-02 14:06:19.392: [OCRCHECK][1069037312]initreboot: Failed to initialize OCR in REBOOT level. Retval:[26] Error:[PROC-26: Error while accessing the physical storage
    ]
    [OCRCHECK][1069037312]initreboot: Attempting to initialize OCR in DEFAULT level and update a key so that vote information is updated.
    2015-09-02 14:06:19.393: [  OCRMSG][1069037312]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
    2015-09-02 14:06:19.393: [  OCRMSG][1069037312]GIPC error [29] msg [gipcretConnectionRefused]
    2015-09-02 14:06:19.393: [  OCRMSG][1069037312]prom_connect: error while waiting for connection complete [24]
    2015-09-02 14:06:19.394: [OCRCHECK][1069037312]initreboot: Failed to initialize OCR in DEFAULT level. Retval:[32] Error:[PROC-32: Cluster Ready Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]]
    2015-09-02 14:06:19.394: [OCRCHECK][1069037312]Failed to access OCR repository. Retval [26] Error [PROC-26: Error while accessing the physical storage
    ]
    2015-09-02 14:06:19.394: [OCRCHECK][1069037312]Failed to initialize ocrchek2
    2015-09-02 14:06:19.394: [OCRCHECK][1069037312]Exiting [status=failed]...

    At this point there were no obvious leads, so let's analyze the ASM alert log carefully, as shown below:

    Tue Sep 01 23:07:03 2015
    NOTE: process _b000_+asm1 (19088) initiating offline of disk 0.3915955003 (OCR_VOTE01) with mask 0x7e in group 4
    NOTE: process _b000_+asm1 (19088) initiating offline of disk 1.3915955004 (OCR_VOTE02) with mask 0x7e in group 4
    NOTE: process _b000_+asm1 (19088) initiating offline of disk 2.3915955005 (OCR_VOTE03) with mask 0x7e in group 4
    NOTE: checking PST: grp = 4
    GMON checking disk modes for group 4 at 25 for pid 30, osid 19088
    ERROR: no read quorum in group: required 2, found 0 disks
    NOTE: checking PST for grp 4 done.
    NOTE: initiating PST update: grp = 4, dsk = 0/0xe968bb3b, mask = 0x6a, op = clear
    NOTE: initiating PST update: grp = 4, dsk = 1/0xe968bb3c, mask = 0x6a, op = clear
    NOTE: initiating PST update: grp = 4, dsk = 2/0xe968bb3d, mask = 0x6a, op = clear
    GMON updating disk modes for group 4 at 26 for pid 30, osid 19088
    ERROR: no read quorum in group: required 2, found 0 disks
    Tue Sep 01 23:07:03 2015
    NOTE: cache dismounting (not clean) group 4/0xD7184BC4 (CRS)
    WARNING: Offline for disk OCR_VOTE01 in mode 0x7f failed.
    WARNING: Offline for disk OCR_VOTE02 in mode 0x7f failed.
    WARNING: Offline for disk OCR_VOTE03 in mode 0x7f failed.
    NOTE: messaging CKPT to quiesce pins Unix process pid: 19090, image: oracle@tysy1 (B001)
    Tue Sep 01 23:07:03 2015
    NOTE: halting all I/Os to diskgroup 4 (CRS)
    Tue Sep 01 23:07:03 2015
    NOTE: LGWR doing non-clean dismount of group 4 (CRS)
    NOTE: LGWR sync ABA=15.189 last written ABA 15.189
    Tue Sep 01 23:07:03 2015
    kjbdomdet send to inst 2
    detach from dom 4, sending detach message to inst 2
    Tue Sep 01 23:07:03 2015
    List of instances:
     1 2
    Dirty detach reconfiguration started (new ddet inc 1, cluster inc 4)
     Global Resource Directory partially frozen for dirty detach
    * dirty detach - domain 4 invalid = TRUE
     0 GCS resources traversed, 0 cancelled
    Dirty Detach Reconfiguration complete
    Tue Sep 01 23:07:03 2015
    WARNING: dirty detached from domain 4
    NOTE: cache dismounted group 4/0xD7184BC4 (CRS)
    SQL> alter diskgroup CRS dismount force /* ASM SERVER:3608693700 */
    Tue Sep 01 23:07:03 2015
    NOTE: cache deleting context for group CRS 4/0xd7184bc4
    GMON dismounting group 4 at 27 for pid 35, osid 19090
    NOTE: Disk OCR_VOTE01 in mode 0x7f marked for de-assignment
    NOTE: Disk OCR_VOTE02 in mode 0x7f marked for de-assignment
    NOTE: Disk OCR_VOTE03 in mode 0x7f marked for de-assignment
    NOTE:Waiting for all pending writes to complete before de-registering: grpnum 4
    Tue Sep 01 23:07:18 2015
    ASM Health Checker found 1 new failures
    Tue Sep 01 23:07:33 2015
    SUCCESS: diskgroup CRS was dismounted
    SUCCESS: alter diskgroup CRS dismount force /* ASM SERVER:3608693700 */
    Tue Sep 01 23:07:33 2015
    NOTE: diskgroup resource ora.CRS.dg is offline
    SUCCESS: ASM-initiated MANDATORY DISMOUNT of group CRS
    Tue Sep 01 23:07:33 2015
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_4208.trc:
    ORA-15078: ASM diskgroup was forcibly dismounted
    ......
    ......
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_4208.trc:
    ORA-15078: ASM diskgroup was forcibly dismounted
    Wed Sep 02 00:40:16 2015
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_4208.trc:
    ORA-15078: ASM diskgroup was forcibly dismounted
    WARNING: requested mirror side 1 of virtual extent 4 logical extent 0 offset 294912 is not allocated; I/O request failed
    WARNING: requested mirror side 2 of virtual extent 4 logical extent 1 offset 294912 is not allocated; I/O request failed
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_4208.trc:
    ORA-15078: ASM diskgroup was forcibly dismounted
    ORA-15078: ASM diskgroup was forcibly dismounted
    Wed Sep 02 00:40:16 2015
    SQL> alter diskgroup CRS check /* proxy */
    ORA-15032: not all alterations performed
    ORA-15001: diskgroup "CRS" does not exist or is not mounted
    ERROR: alter diskgroup CRS check /* proxy */
    Wed Sep 02 00:41:33 2015
    NOTE: client exited [4197]
    Wed Sep 02 00:41:34 2015
    NOTE: [crsd.bin@tysy1 (TNS V1-V3) 13821] opening OCR file
    Wed Sep 02 00:41:36 2015
    NOTE: [crsd.bin@tysy1 (TNS V1-V3) 13834] opening OCR file
    ......
    ......
    Wed Sep 02 14:06:08 2015
    NOTE: [ocrcheck.bin@tysy1 (TNS V1-V3) 16749] opening OCR file
    Wed Sep 02 14:06:19 2015
    NOTE: [ocrcheck.bin@tysy1 (TNS V1-V3) 16815] opening OCR file
    Wed Sep 02 14:13:58 2015
    NOTE: [ocrcheck.bin@tysy1 (TNS V1-V3) 18667] opening OCR file
    Wed Sep 02 14:17:52 2015
    NOTE: [ocrcheck.bin@tysy1 (TNS V1-V3) 19633] opening OCR file

    We can see that ASM ran a health check, found a failure (ASM Health Checker found 1 new failures), and then forcibly dismounted the CRS diskgroup. Let's dig further into the trace file:

    [grid@tysy1 client]$ cat /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_4208.trc
    Trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_4208.trc
    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
    With the Real Application Clusters and Automatic Storage Management options
    ORACLE_HOME = /u01/app/11.2.0/grid
    System name:    Linux
    Node name:      tysy1
    Release:        2.6.32-358.el6.x86_64
    Version:        #1 SMP Tue Jan 29 11:47:41 EST 2013
    Machine:        x86_64
    VM name:        VMWare Version: 6
    Instance name: +ASM1
    Redo thread mounted by this instance: 0 <none>
    Oracle process number: 26
    Unix process pid: 4208, image: oracle@tysy1 (TNS V1-V3)
    
    *** 2015-09-01 23:07:33.936
    *** SESSION ID:(289.3) 2015-09-01 23:07:33.936
    *** CLIENT ID:() 2015-09-01 23:07:33.936
    *** SERVICE NAME:() 2015-09-01 23:07:33.936
    *** MODULE NAME:(crsd.bin@tysy1 (TNS V1-V3)) 2015-09-01 23:07:33.936
    *** ACTION NAME:() 2015-09-01 23:07:33.936
    
    WARNING:failed xlate 1
    ORA-15078: ASM diskgroup was forcibly dismounted
    ......
    ......
    WARNING:failed xlate 1
    ORA-15078: ASM diskgroup was forcibly dismounted
    
    *** 2015-09-02 00:40:16.217
    WARNING:failed xlate 1
    ORA-15078: ASM diskgroup was forcibly dismounted
    ksfdrfms:Mirror Read file=+CRS.255.4294967295 fob=0x9f403518 bufp=0x7f48c1381a00 blkno=1096 nbytes=4096
    WARNING:failed xlate 1
    WARNING: requested mirror side 1 of virtual extent 4 logical extent 0 offset 294912 is not allocated; I/O request failed
    ksfdrfms:Read failed from mirror side=1 logical extent number=0 dskno=65535
    WARNING:failed xlate 1
    WARNING: requested mirror side 2 of virtual extent 4 logical extent 1 offset 294912 is not allocated; I/O request failed
    ksfdrfms:Read failed from mirror side=2 logical extent number=1 dskno=65535
    ORA-15078: ASM diskgroup was forcibly dismounted
    ORA-15078: ASM diskgroup was forcibly dismounted

    Going further, I analyzed the GMON process trace file and found call stacks in it, as follows:

    ---+ASM1_gmon_4091.trc
    *** 2015-09-01 22:55:11.686
    ERROR: GMON could not validate any PST Hbeat (grp 6) for Validate
    2015-09-01 22:55:11.686182 : kfdpHbeatblk_transfer: Found pending Hbeat I/O in grp=6 @
            HBeatCblk=0x7f2bf4f337f8 iop=0x7f2bf4f33808 iostate=3.
    kfdpHbeatCB_dump: at 0x7f2bf4f337f8 with ts=09/01/2015 22:55:11 iop=0x7f2bf4f33808, grp=6, disk=0/3915955008, isWrite=0 Hbeat #0 state=3 iostate=3
    ----- Abridged Call Stack Trace -----
    ksedsts()+465<-kfdpHbeatblk_transfer()+953<-kfdpHbeat_doCheck()+662<-kfdpHbeat_timeout()+854<-kfdp_timeoutBg()+1642<-ksbcti()+513<-ksbabs()+1735<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
    
    ----- End of Abridged Call Stack Trace -----
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
    
    *** 2015-09-01 23:07:03.571
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
    WARNING: Waited 15 secs for write IO to PST disk 1 in group 4.
    WARNING: Waited 15 secs for write IO to PST disk 2 in group 4.
    NOTE: Set to be offline flag for disk OCR_VOTE01 only locally: flag 0x3211
    NOTE: Set to be offline flag for disk OCR_VOTE02 only locally: flag 0x3211
    NOTE: Set to be offline flag for disk OCR_VOTE03 only locally: flag 0x3211
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
    WARNING: Waited 15 secs for write IO to PST disk 1 in group 4.
    WARNING: Waited 15 secs for write IO to PST disk 2 in group 4.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 7.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 7.
    ----- Abridged Call Stack Trace -----
    ksedsts()+465<-kfdpGc_doTobeoflnAsync()+37<-kfdpGc_checkTobeofln()+623<-kfdpGc_timeout()+17<-kfdp_timeoutBg()+1657<-ksbcti()+513<-ksbabs()+1735<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
    ......
    ......
    --------------- HBEAT ------------------
    kfdpHbeat_dump: state=1, inst=1, ts=33023031.450354176,
            rnd=1036706912.2853146286.3238582348.2977351792.
    kfk io-queue:    (nil)
    kfdpHbeatCB_dump: at 0x7f2bf4d28248 with ts=09/01/2015 23:06:47 iop=0x7f2bf4d28258, grp=4, disk=0/3915955003, isWrite=1 Hbeat #17 state=4 iostate=3
    kfdpHbeatCB_dump: at 0x7f2bf4d28080 with ts=09/01/2015 23:06:47 iop=0x7f2bf4d28090, grp=4, disk=1/3915955004, isWrite=1 Hbeat #17 state=4 iostate=3
    kfdpHbeatCB_dump: at 0x7f2bf4d27eb8 with ts=09/01/2015 23:06:47 iop=0x7f2bf4d27ec8, grp=4, disk=2/3915955005, isWrite=1 Hbeat #17 state=4 iostate=3
    NOTE: kfdp_updateInt: forceDismount grp 4
    NOTE: GMON: failed to update modes: triggering force dismount of group 4
    2015-09-01 23:07:03.625783 : kfdpHbeatblk_transfer: Found pending Hbeat I/O in grp=4 @
            HBeatCblk=0x7f2bf4d28248 iop=0x7f2bf4d28258 iostate=3.
    kfdpHbeatCB_dump: at 0x7f2bf4d28248 with ts=09/01/2015 23:06:47 iop=0x7f2bf4d28258, grp=4, disk=0/3915955003, isWrite=1 Hbeat #17 state=4 iostate=3
    2015-09-01 23:07:03.625823 : kfdpHbeatblk_transfer: Found pending Hbeat I/O in grp=4 @
            HBeatCblk=0x7f2bf4d28080 iop=0x7f2bf4d28090 iostate=3.
    kfdpHbeatCB_dump: at 0x7f2bf4d28080 with ts=09/01/2015 23:06:47 iop=0x7f2bf4d28090, grp=4, disk=1/3915955004, isWrite=1 Hbeat #17 state=4 iostate=3
    2015-09-01 23:07:03.625844 : kfdpHbeatblk_transfer: Found pending Hbeat I/O in grp=4 @
            HBeatCblk=0x7f2bf4d27eb8 iop=0x7f2bf4d27ec8 iostate=3.
    kfdpHbeatCB_dump: at 0x7f2bf4d27eb8 with ts=09/01/2015 23:06:47 iop=0x7f2bf4d27ec8, grp=4, disk=2/3915955005, isWrite=1 Hbeat #17 state=4 iostate=3
    ----- Abridged Call Stack Trace -----
    ksedsts()+465<-kfdpHbeatblk_transfer()+953<-kfdp_forceDismountAsyncGmon()+619<-kfdp_updateInt()+664<-kfdp_updateDskBg()+814<-ksbabs()+771<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
    
    ----- End of Abridged Call Stack Trace -----
    
    *** 2015-09-01 23:07:03.906
    GMON dismounting group 4 at 27 for pid 35, osid 19090
    NOTE: kfdp_doDismount: dismount grp 4
    
    *** 2015-09-01 23:07:06.609
    NOTE: kfdpUtil_freeSlMsg: ksvtst for grp: 4 KSV status 0x10
    
    *** 2015-09-01 23:07:36.732
    NOTE: kfdpUtil_freeSlMsg: ksvtst for grp: 4 KSV status 0x10
    ERROR: GMON could not validate any PST Hbeat (grp 1) for Validate
    
    *** 2015-09-01 23:11:03.981
    2015-09-01 23:11:03.981362 : kfdpHbeatblk_transfer: Found pending Hbeat I/O in grp=1 @
            HBeatCblk=0x7f2bf4f32f10 iop=0x7f2bf4f32f20 iostate=3.
    kfdpHbeatCB_dump: at 0x7f2bf4f32f10 with ts=09/01/2015 23:11:03 iop=0x7f2bf4f32f20, grp=1, disk=0/3915955002, isWrite=0 Hbeat #0 state=3 iostate=3
    ----- Abridged Call Stack Trace -----
    ksedsts()+465<-kfdpHbeatblk_transfer()+953<-kfdpHbeat_doCheck()+662<-kfdpHbeat_timeout()+854<-kfdp_timeoutBg()+1642<-ksbcti()+513<-ksbabs()+1735<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
    
    ----- End of Abridged Call Stack Trace -----
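    The repeated "Waited 15 secs for write IO to PST disk" warnings are the key symptom here. When checking other nodes or time windows, a per-diskgroup count can be pulled out of the traces with a quick sketch like this (the function name is mine, not from the post; it reads trace text on stdin):

    ```shell
    # Hypothetical helper: count PST I/O-wait warnings per diskgroup number,
    # reading ASM trace/alert text on stdin.
    pst_wait_summary() {
        grep -o 'IO to PST disk [0-9]* in group [0-9]*' \
            | awk '{ print "group " $NF }' | sort | uniq -c
    }

    # Usage (paths as in the diag layout shown above):
    #   cat /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_gmon_*.trc | pst_wait_summary
    ```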

    Based on the above, the symptoms match Oracle Bug 18694414 exactly. Here is the call stack described for that bug:

    ksedsts()+465<-kfdpHbeatblk_transfer()+953<-kfdpHbeat_doCheck()+662<-kfdpHbeat
    @ _timeout()+854<-kfdp_timeoutBg()+1642<-ksbcti()+513<-ksbabs()+1735<-ksbrdp()+1
    @ 045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+
    @ 265<-main()+201<-__libc_start_main()+253

    According to the Oracle MOS note, the recommendation is to raise the _asm_hbeatiowait parameter to 185, because the default disk I/O timeout on Linux is 180 seconds:

    [root@tysy1 ~]# cat /sys/block/sdb/device/timeout
    180
    [root@tysy1 ~]#
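    For reference, a minimal sketch of that workaround, run as SYSASM against each node's ASM instance (hidden parameters such as _asm_hbeatiowait should only be changed under Oracle Support's guidance, and the new value takes effect after the ASM instances restart):

    ```sql
    -- Raise ASM's PST heartbeat I/O wait above the 180 s Linux SCSI device
    -- timeout shown above, per the MOS recommendation.
    ALTER SYSTEM SET "_asm_hbeatiowait"=185 SCOPE=SPFILE SID='*';
    ```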


