
    ASM: recovering a disk group damaged by adding a disk

    Posted by 惜分飞 on 2017-03-08 15:02:24



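    Author: 惜分飞 © All rights reserved. [May not be reproduced in any form without the author's consent; the author reserves the right to take legal action.]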

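    We received a recovery request from a customer who asked us to step in and recover their data. The rough sequence of events: in September 2016 a hardware problem caused one disk of a normal-redundancy disk group (which had only two disks) to be lost. On March 6, 2017 the operations team tried to add that disk back into the disk group. Once the add succeeded with the force option, the disk group dismounted and could no longer be mounted.
    Disk group creation information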

    Fri Jun 24 19:31:38 2016
    NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
    SUCCESS: diskgroup DATADG was mounted
    SUCCESS: CREATE DISKGROUP DATADG NORMAL REDUNDANCY  DISK '/dev/asm-diskdata01' SIZE 1048576M ,
    '/dev/asm-diskdata02' SIZE 1048576M  ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='4M' /* ASMCA */
    

    This shows that DATADG is a normal-redundancy disk group with a 4M AU size.
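When going through a long alert log, it can help to pull these facts out of the CREATE DISKGROUP line mechanically. Below is a minimal sketch, not part of the original recovery; the function name `parse_create_diskgroup` and the regular expressions are my own assumptions and were only checked against the statement shown above.

```python
import re


def parse_create_diskgroup(stmt):
    """Extract redundancy, disks and attributes from a CREATE DISKGROUP statement.

    Hypothetical helper: the patterns only cover the statement form seen
    in the alert log excerpt above.
    """
    redundancy = re.search(r"\b(EXTERNAL|NORMAL|HIGH)\s+REDUNDANCY\b", stmt, re.IGNORECASE)
    # Disk clauses look like: '/dev/asm-diskdata01' SIZE 1048576M
    disks = re.findall(r"'(/[^']+)'\s+SIZE\s+(\d+)M", stmt, re.IGNORECASE)
    # Attribute clauses look like: 'au_size'='4M'
    attrs = dict(re.findall(r"'([\w.]+)'\s*=\s*'([^']*)'", stmt))
    return {
        "redundancy": redundancy.group(1).upper() if redundancy else None,
        "disks": [(path, int(size_mb)) for path, size_mb in disks],
        "attributes": attrs,
    }


if __name__ == "__main__":
    statement = (
        "CREATE DISKGROUP DATADG NORMAL REDUNDANCY  DISK '/dev/asm-diskdata01' SIZE 1048576M , "
        "'/dev/asm-diskdata02' SIZE 1048576M  ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='4M' /* ASMCA */"
    )
    print(parse_create_diskgroup(statement))
```

With only two disks and normal redundancy, each disk holds the only mirror copy of the other's data, which is why losing the second disk later in this case was fatal for the mount.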

    ASM automatically drops the failed disk

    Mon Sep 12 11:41:54 2016
    WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
    WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
    Mon Sep 12 11:41:55 2016
    NOTE: process _b000_+asm1 (19491) initiating offline of disk 1.3915923833 (DATADG_0001) with mask 0x7e in group 1
    NOTE: checking PST: grp = 1
    GMON checking disk modes for group 1 at 9 for pid 29, osid 19491
    NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
    NOTE: checking PST for grp 1 done.
    NOTE: sending set offline flag message 2870990318 to 1 disk(s) in group 1
    WARNING: Disk DATADG_0001 in mode 0x7f is now being offlined
    NOTE: initiating PST update: grp = 1, dsk = 1/0xe9684179, mask = 0x6a, op = clear
    GMON updating disk modes for group 1 at 10 for pid 29, osid 19491
    NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
    NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
    NOTE: PST update grp = 1 completed successfully 
    NOTE: initiating PST update: grp = 1, dsk = 1/0xe9684179, mask = 0x7e, op = clear
    GMON updating disk modes for group 1 at 11 for pid 29, osid 19491
    NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
    NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
    NOTE: cache closing disk 1 of grp 1: DATADG_0001
    NOTE: PST update grp = 1 completed successfully 
    Mon Sep 12 11:42:55 2016
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
    Mon Sep 12 11:44:58 2016
    WARNING: PST-initiated drop of 1 disk(s) in group 1(.1137226115))
    SQL> alter diskgroup DATADG drop disk DATADG_0001 force /* ASM SERVER */ 
    NOTE: GroupBlock outside rolling migration privileged region
    NOTE: requesting all-instance membership refresh for group=1
    Mon Sep 12 11:44:59 2016
    GMON updating for reconfiguration, group 1 at 12 for pid 29, osid 19491
    NOTE: cache closing disk 1 of grp 1: (not open) DATADG_0001
    NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
    NOTE: group 1 PST updated.
    Mon Sep 12 11:44:59 2016
    NOTE: membership refresh pending for group 1/0x43c8b183 (DATADG)
    Mon Sep 12 11:45:02 2016
    NOTE: successfully read ACD block gn=1 blk=0 via retry read
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_3526.trc:
    ORA-15062: ASM disk is globally closed
    GMON querying group 1 at 13 for pid 18, osid 3532
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    SUCCESS: refreshed membership for 1/0x43c8b183 (DATADG)
    SUCCESS: alter diskgroup DATADG drop disk DATADG_0001 force /* ASM SERVER */
    NOTE: starting rebalance of group 1/0x43c8b183 (DATADG) at power 1
    SUCCESS: PST-initiated drop disk in group 1(1137226115))
    Starting background process ARB0
    Mon Sep 12 11:45:03 2016
    ARB0 started with pid=35, OS id=19945 
    NOTE: assigning ARB0 to group 1/0x43c8b183 (DATADG) with 1 parallel I/O
    cellip.ora not found.
    NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATADG
    NOTE: Attempting voting file refresh on diskgroup DATADG
    NOTE: Refresh completed on diskgroup DATADG. No voting file found.
    Mon Sep 12 11:46:21 2016
    NOTE: GroupBlock outside rolling migration privileged region
    NOTE: requesting all-instance membership refresh for group=1
    Mon Sep 12 11:46:24 2016
    GMON updating for reconfiguration, group 1 at 14 for pid 36, osid 20110
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    NOTE: group 1 PST updated.
    WARNING: offline disk number 1 has references (54679 AUs)
    Mon Sep 12 11:46:24 2016
    NOTE: membership refresh pending for group 1/0x43c8b183 (DATADG)
    GMON querying group 1 at 15 for pid 18, osid 3532
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    SUCCESS: refreshed membership for 1/0x43c8b183 (DATADG)
    NOTE: Attempting voting file refresh on diskgroup DATADG
    NOTE: Refresh completed on diskgroup DATADG. No voting file found.
    NOTE: stopping process ARB0
    SUCCESS: rebalance completed for group 1/0x43c8b183 (DATADG)
    

    Here we can see that on September 12, 2016 disk 1 stopped responding to I/O, and ASM force-dropped it from the disk group.
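Events like this are easy to miss when the alert log covers months. The following is a rough sketch of how one might scan a log for disk offline and forced-drop events. The function name and the regular expressions are assumptions derived only from the message wording in the excerpt above; other ASM versions may phrase these messages differently.

```python
import re

# Patterns derived from the excerpt above (assumed, not an official log format).
TIMESTAMP_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \d{4}$")
OFFLINE_RE = re.compile(r"initiating offline of disk (\d+)\.\d+\s*\((\w+)\)")
FORCE_DROP_RE = re.compile(r"SQL>\s*alter diskgroup (\w+) drop disk (\w+) force", re.IGNORECASE)


def find_disk_drop_events(lines):
    """Return (timestamp, kind, disk_name) tuples in log order."""
    events = []
    current_ts = None  # most recent timestamp line seen so far
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP_RE.match(line):
            current_ts = line
            continue
        match = OFFLINE_RE.search(line)
        if match:
            events.append((current_ts, "offline", match.group(2)))
            continue
        match = FORCE_DROP_RE.search(line)
        if match:
            events.append((current_ts, "force_drop", match.group(2)))
    return events


if __name__ == "__main__":
    sample = [
        "Mon Sep 12 11:41:55 2016",
        "NOTE: process _b000_+asm1 (19491) initiating offline of disk 1.3915923833 (DATADG_0001) with mask 0x7e in group 1",
        "Mon Sep 12 11:44:58 2016",
        "SQL> alter diskgroup DATADG drop disk DATADG_0001 force /* ASM SERVER */ ",
    ]
    for event in find_disk_drop_events(sample):
        print(event)
```

Running something like this periodically against the ASM alert log would have surfaced the September drop long before anyone tried to re-add the disk in March.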

    Adding the force-dropped disk back

    Mon Mar 06 15:36:54 2017
    SQL> alter diskgroup DATADG add disk '/dev/asm-diskdata01' name DATADG_0000 
    NOTE: GroupBlock outside rolling migration privileged region
    ORA-15032: not all alterations performed
    ORA-15029: disk '/dev/asm-diskdata01' is already mounted by this instance
    ERROR: alter diskgroup DATADG add disk '/dev/asm-diskdata01' name DATADG_0000
    Mon Mar 06 15:38:27 2017
    SQL>  alter diskgroup DATADG add disk '/dev/asm-diskdata02' name DATADG_0001 
    NOTE: GroupBlock outside rolling migration privileged region
    NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
    NOTE: requesting all-instance membership refresh for group=1
    NOTE: Disk DATADG_0001 in mode 0x7f marked for de-assignment
    ERROR: ORA-15033 signalled during reconfiguration of diskgroup DATADG
    Mon Mar 06 15:38:28 2017
    NOTE: membership refresh pending for group 1/0x31584f6b (DATADG)
    Mon Mar 06 15:38:31 2017
    GMON querying group 1 at 7 for pid 18, osid 3468
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    GMON querying group 1 at 8 for pid 18, osid 3468
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    SUCCESS: refreshed membership for 1/0x31584f6b (DATADG)
    ORA-15032: not all alterations performed
    ORA-15033: disk '/dev/asm-diskdata02' belongs to diskgroup "DATADG"
    ERROR:  alter diskgroup DATADG add disk '/dev/asm-diskdata02' name DATADG_0001
    NOTE: Attempting voting file refresh on diskgroup DATADG
    NOTE: Refresh completed on diskgroup DATADG. No voting file found.
    Mon Mar 06 16:04:14 2017
    SQL> alter diskgroup DATADG add disk '/dev/asm-diskdata02' name DATADG_0001 
    NOTE: GroupBlock outside rolling migration privileged region
    NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
    NOTE: requesting all-instance membership refresh for group=1
    NOTE: Disk DATADG_0001 in mode 0x7f marked for de-assignment
    ERROR: ORA-15033 signalled during reconfiguration of diskgroup DATADG
    Mon Mar 06 16:04:15 2017
    NOTE: membership refresh pending for group 1/0x31584f6b (DATADG)
    Mon Mar 06 16:04:18 2017
    GMON querying group 1 at 9 for pid 18, osid 3468
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    GMON querying group 1 at 10 for pid 18, osid 3468
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    SUCCESS: refreshed membership for 1/0x31584f6b (DATADG)
    ORA-15032: not all alterations performed
    ORA-15033: disk '/dev/asm-diskdata02' belongs to diskgroup "DATADG"
    ERROR: alter diskgroup DATADG add disk '/dev/asm-diskdata02' name DATADG_0001
    NOTE: Attempting voting file refresh on diskgroup DATADG
    NOTE: Refresh completed on diskgroup DATADG. No voting file found.
    Mon Mar 06 16:23:28 2017
    SQL> alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/adm-diskdata02' name DATA_0001 
    NOTE: GroupBlock outside rolling migration privileged region
    ORA-15032: not all alterations performed
    ORA-15031: disk specification '/dev/adm-diskdata02' matches no disks
    ERROR: alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/adm-diskdata02' name DATA_0001
    Mon Mar 06 16:24:48 2017
    SQL> alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/asm-diskdata02' name DATA_0001 
    NOTE: GroupBlock outside rolling migration privileged region
    NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
    NOTE: requesting all-instance membership refresh for group=1
    NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
    ERROR: ORA-15033 signalled during reconfiguration of diskgroup DATADG
    Mon Mar 06 16:24:49 2017
    NOTE: membership refresh pending for group 1/0x31584f6b (DATADG)
    Mon Mar 06 16:24:52 2017
    GMON querying group 1 at 11 for pid 18, osid 3468
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    GMON querying group 1 at 12 for pid 18, osid 3468
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    SUCCESS: refreshed membership for 1/0x31584f6b (DATADG)
    ORA-15032: not all alterations performed
    ORA-15033: disk '/dev/asm-diskdata02' belongs to diskgroup "DATADG"
    ERROR: alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/asm-diskdata02' name DATA_0001
    NOTE: Attempting voting file refresh on diskgroup DATADG
    NOTE: Refresh completed on diskgroup DATADG. No voting file found.
    Mon Mar 06 16:26:07 2017
    SQL> alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/asm-diskdata02' name DATA_0001 force  
    NOTE: GroupBlock outside rolling migration privileged region
    NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
    NOTE: requesting all-instance membership refresh for group=1
    NOTE: initializing header on grp 1 disk DATA_0001
    NOTE: requesting all-instance disk validation for group=1
    Mon Mar 06 16:26:10 2017
    NOTE: skipping rediscovery for group 1/0x31584f6b (DATADG) on local instance.
    NOTE: requesting all-instance disk validation for group=1
    NOTE: skipping rediscovery for group 1/0x31584f6b (DATADG) on local instance.
    Mon Mar 06 16:26:15 2017
    GMON updating for reconfiguration, group 1 at 13 for pid 28, osid 12861
    NOTE: group 1 PST updated.
    NOTE: initiating PST update: grp = 1
    GMON updating group 1 at 14 for pid 28, osid 12861
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
    NOTE: group DATADG: updated PST location: disk 0002 (PST copy 1)
    NOTE: PST update grp = 1 completed successfully 
    NOTE: membership refresh pending for group 1/0x31584f6b (DATADG)
    GMON querying group 1 at 15 for pid 18, osid 3468
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    NOTE: cache opening disk 2 of grp 1: DATA_0001 path:/dev/asm-diskdata02
    NOTE: Attempting voting file refresh on diskgroup DATADG
    NOTE: Refresh completed on diskgroup DATADG. No voting file found.
    GMON querying group 1 at 16 for pid 18, osid 3468
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    SUCCESS: refreshed membership for 1/0x31584f6b (DATADG)
    Mon Mar 06 16:26:19 2017
    SUCCESS: alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/asm-diskdata02' name DATA_0001 force 
    NOTE: starting rebalance of group 1/0x31584f6b (DATADG) at power 1
    Mon Mar 06 16:26:20 2017
    Starting background process ARB0
    Mon Mar 06 16:26:20 2017
    ARB0 started with pid=32, OS id=25833 
    NOTE: assigning ARB0 to group 1/0x31584f6b (DATADG) with 1 parallel I/O
    cellip.ora not found.
    WARNING:cache read  a corrupt block: group=1(DATADG) dsk=0 blk=0 disk=0 
            (DATADG_0000)incarn=3915956130 au=0 blk=0 count=1
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25833.trc:
    ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
    NOTE:a corrupted block from group DATADG was dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25833.trc
    WARNING:cache read (retry)a corrupt block:group=1(DATADG) 
             dsk=0 blk=0 disk=0(DATADG_0000)incarn=3915956130 au=0 blk=0 count=1
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25833.trc:
    ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
    ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
    ERROR: cache failed to read group=1(DATADG) dsk=0 blk=0 from disk(s): 0(DATADG_0000)
    ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
    ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
    NOTE: cache initiating offline of disk 0 group DATADG
    NOTE:process _arb0_+asm1 (25833) initiating offline of disk 0.3915956130(DATADG_0000)with mask 0x7e in group 1
    NOTE: checking PST: grp = 1
    GMON checking disk modes for group 1 at 17 for pid 32, osid 25833
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    ERROR: too many offline disks in PST (grp 1)
    NOTE: checking PST for grp 1 done.
    NOTE: initiating PST update: grp = 1, dsk = 0/0xe968bfa2, mask = 0x6a, op = clear
    GMON updating disk modes for group 1 at 18 for pid 32, osid 25833
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    ERROR: Disk 0 cannot be offlined, since all the disks [0, 1] with mirrored data would be offline.
    ERROR: too many offline disks in PST (grp 1)
    Mon Mar 06 16:26:23 2017
    NOTE: cache dismounting (not clean) group 1/0x31584F6B (DATADG) 
    NOTE: messaging CKPT to quiesce pins Unix process pid: 25889, image: oracle@DBN01 (B000)
    Mon Mar 06 16:26:23 2017
    NOTE: halting all I/Os to diskgroup 1 (DATADG)
    Mon Mar 06 16:26:23 2017
    NOTE: LGWR doing non-clean dismount of group 1 (DATADG)
    NOTE: LGWR sync ABA=19.2851 last written ABA 19.2851
    WARNING: Offline for disk DATADG_0000 in mode 0x7f failed.
    Mon Mar 06 16:26:23 2017
    kjbdomdet send to inst 2
    detach from dom 1, sending detach message to inst 2
    Mon Mar 06 16:26:23 2017
    List of instances:
     1 2
    Dirty detach reconfiguration started (new ddet inc 1, cluster inc 8)
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25833.trc  (incident=65537):
    ORA-15335: ASM metadata corruption detected in disk group 'DATADG'
    ORA-15130: diskgroup "DATADG" is being dismounted
    ORA-15066: offlining disk "DATADG_0000" in group "DATADG" may result in a data loss
    ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
    ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
    Incident details in:/u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_65537/+ASM1_arb0_25833_i65537.trc
     Global Resource Directory partially frozen for dirty detach
    * dirty detach - domain 1 invalid = TRUE 
     3189 GCS resources traversed, 0 cancelled
    Dirty Detach Reconfiguration complete
    ERROR: ORA-15130 in COD recovery for diskgroup 1/0x31584f6b (DATADG)
    ERROR: ORA-15130 thrown in RBAL for group number 1
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_3468.trc:
    ORA-15130: diskgroup "DATADG" is being dismounted
    Mon Mar 06 16:26:23 2017
    WARNING: dirty detached from domain 1
    NOTE: cache dismounted group 1/0x31584F6B (DATADG) 
    
    --- subsequent mount attempt fails
    SQL> ALTER DISKGROUP DATADG MOUNT  /* asm agent *//* {1:18003:2} */ 
    NOTE: cache registered group DATADG number=1 incarn=0xb368408f
    NOTE: cache began mount (first) of group DATADG number=1 incarn=0xb368408f
    NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
    WARNING: GMON has insufficient disks to maintain consensus.
    Minimum required is 2: updating 1 PST copies from a total of 2.
    ERROR: GMON failed to obtain a quorum of supporting disks in group 1
    NOTE: cache dismounting (clean) group 1/0xB368408F (DATADG) 
    NOTE: messaging CKPT to quiesce pins Unix process pid: 27651, image: oracle@DBN01 (TNS V1-V3)
    NOTE: dbwr not being msg'd to dismount
    NOTE: lgwr not being msg'd to dismount
    NOTE: cache dismounted group 1/0xB368408F (DATADG) 
    NOTE: cache ending mount (fail) of group DATADG number=1 incarn=0xb368408f
    NOTE: cache deleting context for group DATADG 1/0xb368408f
    GMON dismounting group 1 at 12 for pid 30, osid 27651
    NOTE: Disk DATA_0001 in mode 0x9 marked for de-assignment
    ERROR: diskgroup DATADG was not mounted
    ORA-15032: not all alterations performed
    ORA-15017: diskgroup "DATADG" cannot be mounted
    ORA-15315: Write errors in disk group DATADG could lead to inconsistent ASM metadata.
    ERROR: ALTER DISKGROUP DATADG MOUNT  /* asm agent *//* {1:18003:2} */
    
    

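    From this we can see that the first several attempts to add the ASM disk failed for various reasons, and that the last attempt used the force keyword to push the automatically dropped disk back into DATADG. Unfortunately, once the add succeeded and the rebalance started, ASM found a corrupt block on disk 0 and raised ORA-15196. The rebalance could not continue, and the whole DATADG disk group dismounted automatically. Later attempts to mount DATADG failed immediately with a message about inconsistent ASM metadata, because the disk header of disk 0 was already damaged.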
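To see at a glance which attempt failed with which error, the ORA- codes in the log can be grouped by the SQL statement that precedes them. This is only an illustrative sketch; the function name is made up, and the grouping rule (everything up to the next `SQL>` line belongs to the previous statement) is a simplification that I am assuming is good enough for a log like the one above.

```python
import re

ORA_RE = re.compile(r"ORA-(\d{5})")


def errors_per_statement(lines):
    """Return a list of (statement, [ORA codes]) in log order.

    Simplifying assumption: every ORA- code logged after a 'SQL>' line and
    before the next one is attributed to that statement.
    """
    results = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("SQL>"):
            results.append((line[4:].strip(), []))
        elif results:
            codes = results[-1][1]
            for number in ORA_RE.findall(line):
                code = "ORA-" + number
                if code not in codes:  # keep first occurrence only
                    codes.append(code)
    return results


if __name__ == "__main__":
    sample = [
        "SQL> alter diskgroup DATADG add disk '/dev/asm-diskdata01' name DATADG_0000 ",
        "ORA-15032: not all alterations performed",
        "ORA-15029: disk '/dev/asm-diskdata01' is already mounted by this instance",
        "SQL>  alter diskgroup DATADG add disk '/dev/asm-diskdata02' name DATADG_0001 ",
        "ORA-15032: not all alterations performed",
        "ORA-15033: disk '/dev/asm-diskdata02' belongs to diskgroup \"DATADG\"",
    ]
    for statement, codes in errors_per_statement(sample):
        print(statement, "->", codes)
```

Note that with this rule the ORA-15196 and ORA-15335 errors raised by ARB0 after the forced add are attributed to the forced `add ... force` statement, which matches the causal chain described above but is an artifact of the grouping, not something the log states explicitly.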

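    Analyzing disk 0 with kfed
    The disk header was backed up with dd and copied to a Windows machine for analysis. The header of disk 0, which had previously been healthy, is damaged (all zeros).
    [Image: asm-kfed]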
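The "all zeros" observation can also be confirmed on the dd image without kfed. The sketch below is my own illustration, not the procedure used in this case: the function names are hypothetical, and the 4096-byte block size is an assumption about the ASM metadata block size.

```python
def is_all_zero(block: bytes) -> bool:
    """True if the block is non-empty and every byte is 0x00."""
    return len(block) > 0 and not any(block)


def header_block_is_zeroed(image_path: str, block_size: int = 4096) -> bool:
    """Check whether the first block of a dd image is completely zeroed.

    image_path is a copy made with dd, never the live device.
    block_size=4096 is an assumed ASM metadata block size.
    """
    with open(image_path, "rb") as image:
        block = image.read(block_size)
    # A short read means the image is truncated, so do not report it as zeroed.
    return len(block) == block_size and is_all_zero(block)
```

Working on a dd copy rather than on `/dev/asm-diskdata01` itself keeps the check read-only with respect to the damaged disk, which matters when every remaining block may be needed for recovery.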


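    At this point the failure is fairly well understood and the recovery approach is essentially settled. The options, in escalating order:
    Option 1: fix the disk header with kfed, then try to mount the disk group (related post: Manually repairing a damaged ASM disk header)
    Option 2: copy the data files out directly with tools such as amdu or dul (related post: Retrieving data files from ASM)
    Option 3: reassemble the data files from the underlying AUs (related post: Recovery from a completely destroyed ASM disk header)
    In the actual recovery we were fairly lucky: option 1 was enough to complete the job. After the disk header was repaired with kfed, the mount reported the following errors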

    SQL> alter diskgroup DATADG mount 
    NOTE: cache registered group DATADG number=1 incarn=0x5134d0d4
    NOTE: cache began mount (first) of group DATADG number=1 incarn=0x5134d0d4
    NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
    NOTE: Assigning number (1,0) to disk (/dev/asm-diskdata01)
    Tue Mar 07 19:03:40 2017
    NOTE: GMON heartbeating for grp 1
    GMON querying group 1 at 27 for pid 28, osid 13837
    NOTE: Assigning number (1,1) to disk ()
    GMON querying group 1 at 28 for pid 28, osid 13837
    NOTE: cache closing disk 1 of grp 1: (not open) 
    NOTE: cache opening disk 0 of grp 1: DATADG_0000 path:/dev/asm-diskdata01
    NOTE: F1X0 found on disk 0 au 2 fcn 0.178802
    NOTE: cache opening disk 2 of grp 1: DATA_0001 path:/dev/asm-diskdata02
    NOTE: cache mounting (first) normal redundancy group 1/0x5134D0D4 (DATADG)
    Tue Mar 07 19:03:40 2017
    * allocate domain 1, invalid = TRUE 
    kjbdomatt send to inst 2
    Tue Mar 07 19:03:40 2017
    NOTE: attached to recovery domain 1
    NOTE: starting recovery of thread=1 ckpt=19.2851 group=1 (DATADG)
    NOTE: starting recovery of thread=2 ckpt=13.5327 group=1 (DATADG)
    NOTE: advancing ckpt for group 1 (DATADG) thread=2 ckpt=13.5327
    NOTE: advancing ckpt for group 1 (DATADG) thread=1 ckpt=19.2852
    NOTE: cache recovered group 1 to fcn 0.365868
    NOTE: redo buffer size is 512 blocks (2101760 bytes)
    Tue Mar 07 19:03:40 2017
    NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATADG)
    NOTE: LGWR found thread 1 closed at ABA 19.2851
    NOTE: LGWR mounted thread 1 for diskgroup 1 (DATADG)
    NOTE: LGWR opening thread 1 at fcn 0.365868 ABA 20.2852
    NOTE: cache mounting group 1/0x5134D0D4 (DATADG) succeeded
    NOTE: cache ending mount (success) of group DATADG number=1 incarn=0x5134d0d4
    Tue Mar 07 19:03:40 2017
    NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
    SUCCESS: diskgroup DATADG was mounted
    SUCCESS: alter diskgroup DATADG mount
    Tue Mar 07 19:03:40 2017
    NOTE: diskgroup resource ora.DATADG.dg is online
    Tue Mar 07 19:03:41 2017
    ASM Health Checker found 1 new failures
    NOTE: ASM did background COD recovery for group 1/0x5134d0d4 (DATADG)
    NOTE: starting rebalance of group 1/0x5134d0d4 (DATADG) at power 1
    Starting background process ARB0
    Tue Mar 07 19:03:42 2017
    ARB0 started with pid=30, OS id=13905 
    NOTE: assigning ARB0 to group 1/0x5134d0d4 (DATADG) with 1 parallel I/O
    cellip.ora not found.
    WARNING: cache read  a corrupt block: group=1(DATADG) dsk=0 blk=0 disk=0 
             (DATADG_0000) incarn=2202280062 au=0 blk=0 count=1
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_13905.trc:
    ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
    NOTE: a corrupted block from group DATADG was dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_13905.trc
    WARNING:cache read (retry)a corrupt block:group=1(DATADG) dsk=0 blk=0 disk=0
            (DATADG_0000)incarn=2202280062 au=0 blk=0 count=1
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_13905.trc:
    ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
    ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
    ERROR: cache failed to read group=1(DATADG) dsk=0 blk=0 from disk(s): 0(DATADG_0000)
    ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
    ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
    Tue Mar 07 19:03:52 2017
    NOTE: client oradb1:oradb registered, osid 13989, mbr 0x1
    NOTE: cache initiating offline of disk 0 group DATADG
    NOTE:process _arb0_+asm1 (13905) initiating offline of disk 0.2202280062(DATADG_0000)with mask 0x7e in group 1
    NOTE: checking PST: grp = 1
    Tue Mar 07 19:03:52 2017
    GMON checking disk modes for group 1 at 30 for pid 30, osid 13905
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    ERROR: too many offline disks in PST (grp 1)
    NOTE: checking PST for grp 1 done.
    NOTE: initiating PST update: grp = 1, dsk = 0/0x8344207e, mask = 0x6a, op = clear
    GMON updating disk modes for group 1 at 31 for pid 30, osid 13905
    NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
    ERROR: Disk 0 cannot be offlined, since all the disks [0, 1] with mirrored data would be offline.
    ERROR: too many offline disks in PST (grp 1)
    Tue Mar 07 19:03:52 2017
    NOTE: cache dismounting (not clean) group 1/0x5134D0D4 (DATADG) 
    WARNING: Offline for disk DATADG_0000 in mode 0x7f failed.
    Tue Mar 07 19:03:52 2017
    NOTE: halting all I/Os to diskgroup 1 (DATADG)
    NOTE: messaging CKPT to quiesce pins Unix process pid: 14002, image: oracle@DBN01 (B000)
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_13905.trc  (incident=76402):
    ORA-15335: ASM metadata corruption detected in disk group 'DATADG'
    ORA-15130: diskgroup "DATADG" is being dismounted
    ORA-15066: offlining disk "DATADG_0000" in group "DATADG" may result in a data loss
    ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
    ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
    Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_76402/+ASM1_arb0_13905_i76402.trc
    Tue Mar 07 19:03:52 2017
    NOTE: LGWR doing non-clean dismount of group 1 (DATADG)
    NOTE: LGWR sync ABA=20.2857 last written ABA 20.2857
    

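    We were fairly lucky here: DATADG did mount successfully. However, the rebalance (ARB0 in the log above) still read abnormal disk header data. The header was not fully repaired, and the log shows that other blocks besides this one are damaged as well, so we did not pursue further repair. Instead we suppressed ASM's ACD and COD recovery so that the disk group could be mounted and would stay mounted.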
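A side note on the value 1022 in `[blk_kfbl] ... [1022 != 0]`. As far as I know, ASM 11.1.0.7 and later keeps a backup copy of the disk header in the second-to-last metadata block of AU 1. With a 4M AU and 4096-byte metadata blocks that is block 1022, so a block at au=0 blk=0 that identifies itself as block 1022 is consistent with the header having been restored from that backup copy without adjusting its block number. This is my inference from the log, not something stated in the original. The arithmetic, as a small sketch with assumed names:

```python
METADATA_BLOCK_SIZE = 4096  # bytes; assumed ASM metadata block size


def backup_header_location(au_size_bytes, block_size=METADATA_BLOCK_SIZE):
    """Return (aun, blkn, byte_offset) of the backup disk header copy.

    Assumption: the backup lives in the second-to-last metadata block of
    allocation unit 1 (behaviour described for ASM 11.1.0.7 and later).
    """
    blocks_per_au = au_size_bytes // block_size
    aun = 1
    blkn = blocks_per_au - 2
    byte_offset = aun * au_size_bytes + blkn * block_size
    return aun, blkn, byte_offset


if __name__ == "__main__":
    four_mb = 4 * 1024 * 1024
    print(backup_header_location(four_mb))
```

For the 4M AU used by DATADG this gives aun=1, blkn=1022, which lines up with the block number reported in the ORA-15196 arguments.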

    SQL> alter diskgroup DATADG mount 
    NOTE: cache registered group DATADG number=1 incarn=0x9c94d0eb
    NOTE: cache began mount (first) of group DATADG number=1 incarn=0x9c94d0eb
    NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
    NOTE: Assigning number (1,0) to disk (/dev/asm-diskdata01)
    NOTE: skip COD recovery as part of test at kfrc.c:1639 
    NOTE: skip COD recovery as part of test at kfrc.c:1639 
    Tue Mar 07 19:12:45 2017
    NOTE: GMON heartbeating for grp 1
    GMON querying group 1 at 75 for pid 28, osid 15615
    NOTE: Assigning number (1,1) to disk ()
    GMON querying group 1 at 76 for pid 28, osid 15615
    NOTE: cache closing disk 1 of grp 1: (not open) 
    NOTE: cache opening disk 0 of grp 1: DATADG_0000 path:/dev/asm-diskdata01
    NOTE: F1X0 found on disk 0 au 2 fcn 0.178802
    NOTE: cache opening disk 2 of grp 1: DATA_0001 path:/dev/asm-diskdata02
    NOTE: cache mounting (first) normal redundancy group 1/0x9C94D0EB (DATADG)
    Tue Mar 07 19:12:45 2017
    * allocate domain 1, invalid = TRUE 
    kjbdomatt send to inst 2
    Tue Mar 07 19:12:45 2017
    NOTE: attached to recovery domain 1
    NOTE: starting recovery of thread=1 ckpt=25.2870 group=1 (DATADG)
    NOTE: advancing ckpt for group 1 (DATADG) thread=1 ckpt=25.2873
    NOTE: cache recovered group 1 to fcn 0.365897
    NOTE: redo buffer size is 512 blocks (2101760 bytes)
    Tue Mar 07 19:12:45 2017
    NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATADG)
    NOTE: LGWR found thread 1 closed at ABA 25.2872
    NOTE: LGWR mounted thread 1 for diskgroup 1 (DATADG)
    NOTE: LGWR opening thread 1 at fcn 0.365897 ABA 26.2873
    NOTE: cache mounting group 1/0x9C94D0EB (DATADG) succeeded
    NOTE: cache ending mount (success) of group DATADG number=1 incarn=0x9c94d0eb
    Tue Mar 07 19:12:45 2017
    NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
    SUCCESS: diskgroup DATADG was mounted
    SUCCESS: alter diskgroup DATADG mount
    Tue Mar 07 19:12:45 2017
    NOTE: diskgroup resource ora.DATADG.dg is online
    NOTE: skip COD recovery as part of test at kfrc.c:1639 
    NOTE: skip COD recovery as part of test at kfrc.c:1639 
    NOTE: skip COD recovery as part of test at kfrc.c:1639 
    NOTE: skip COD recovery as part of test at kfrc.c:1639 
    

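    With the ASM problem resolved, we logged in to the databases and found that luck was again on our side: both databases opened normally, with no errors in the alert logs. We backed up the data with RMAN, recreated the ASM disk group, and restored the data. The recovery was complete with zero data loss.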


