
    ORA-15042: ASM disk “N” is missing from group number “M”: failure recovery

    Posted by 惜分飞 on 2015-10-08 02:28:23

    Contact: mobile (13429648788), QQ (107644445)


    Author: 惜分飞 © All rights reserved [Do not reproduce in any form without my consent; I reserve the right to pursue legal liability.]

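    A friend asked for help with a recovery. An ASM disk group was built on 19 LUNs. One of the LUNs developed a problem, so they added a new LUN and dropped the old one in the same operation, but the operation hung partway through (the bad LUN was damaged at the storage layer, so the rebalance could not complete). The storage engineer then kept working on the faulty LUN and, very luckily, repaired it. In the excitement, however, the newly added LUN (which had already received part of the data through the rebalance) was deleted directly from the storage array. At that point the ASM disk group was completely down and could not be mounted, and they asked for recovery support. For certain reasons the LUN could not be recovered at the storage level, so we were asked to recover at the database level.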

    Mon Sep 21 19:52:35 2015
    SQL> alter diskgroup  dg_XFF add disk '/dev/rhdisk116' size 716800M drop disk dg_XFF_0012 
    NOTE: Assigning number (1,20) to disk (/dev/rhdisk116)
    NOTE: requesting all-instance membership refresh for group=1
    NOTE: initializing header on grp 1 disk DG_XFF_0020
    NOTE: requesting all-instance disk validation for group=1
    Mon Sep 21 19:52:44 2015
    NOTE: skipping rediscovery for group 1/0xb94738f1 (DG_XFF) on local instance.
    NOTE: requesting all-instance disk validation for group=1
    NOTE: skipping rediscovery for group 1/0xb94738f1 (DG_XFF) on local instance.
    NOTE: initiating PST update: grp = 1
    Mon Sep 21 19:52:44 2015
    GMON updating group 1 at 25 for pid 27, osid 12124486
    NOTE: PST update grp = 1 completed successfully 
    NOTE: membership refresh pending for group 1/0xb94738f1 (DG_XFF)
    GMON querying group 1 at 26 for pid 18, osid 10092734
    NOTE: cache opening disk 20 of grp 1: DG_XFF_0020 path:/dev/rhdisk116
    GMON querying group 1 at 27 for pid 18, osid 10092734
    SUCCESS: refreshed membership for 1/0xb94738f1 (DG_XFF)
    Mon Sep 21 19:52:47 2015
    SUCCESS: alter diskgroup  dg_XFF add disk '/dev/rhdisk116' size 716800M drop disk dg_XFF_0012
    NOTE: starting rebalance of group 1/0xb94738f1 (DG_XFF) at power 1
    Starting background process ARB0
    Mon Sep 21 19:52:47 2015
    ARB0 started with pid=28, OS id=10944804 
    NOTE: assigning ARB0 to group 1/0xb94738f1 (DG_XFF) with 1 parallel I/O
    NOTE: Attempting voting file refresh on diskgroup DG_XFF
    Mon Sep 21 20:35:06 2015
    
    SQL> ALTER DISKGROUP DG_XFF MOUNT  /* asm agent *//* {1:51107:7083} */ 
    NOTE: cache registered group DG_XFF number=1 incarn=0xdd6f975a
    NOTE: cache began mount (first) of group DG_XFF number=1 incarn=0xdd6f975a
    NOTE: Assigning number (1,0) to disk (/dev/rhdisk10)
    NOTE: Assigning number (1,1) to disk (/dev/rhdisk11)
    NOTE: Assigning number (1,2) to disk (/dev/rhdisk16)
    NOTE: Assigning number (1,3) to disk (/dev/rhdisk17)
    NOTE: Assigning number (1,4) to disk (/dev/rhdisk22)
    NOTE: Assigning number (1,5) to disk (/dev/rhdisk23)
    NOTE: Assigning number (1,6) to disk (/dev/rhdisk28)
    NOTE: Assigning number (1,7) to disk (/dev/rhdisk29)
    NOTE: Assigning number (1,8) to disk (/dev/rhdisk33)
    NOTE: Assigning number (1,9) to disk (/dev/rhdisk34)
    NOTE: Assigning number (1,10) to disk (/dev/rhdisk4)
    NOTE: Assigning number (1,11) to disk (/dev/rhdisk40)
    NOTE: Assigning number (1,12) to disk (/dev/rhdisk41)
    NOTE: Assigning number (1,13) to disk (/dev/rhdisk45)
    NOTE: Assigning number (1,14) to disk (/dev/rhdisk46)
    NOTE: Assigning number (1,15) to disk (/dev/rhdisk5)
    NOTE: Assigning number (1,16) to disk (/dev/rhdisk52)
    NOTE: Assigning number (1,17) to disk (/dev/rhdisk53)
    NOTE: Assigning number (1,18) to disk (/dev/rhdisk57)
    NOTE: Assigning number (1,19) to disk (/dev/rhdisk58)
    Wed Sep 30 11:08:07 2015
    NOTE: start heartbeating (grp 1)
    GMON querying group 1 at 33 for pid 35, osid 4194488
    NOTE: Assigning number (1,20) to disk ()
    GMON querying group 1 at 34 for pid 35, osid 4194488
    NOTE: cache dismounting (clean) group 1/0xDD6F975A (DG_XFF) 
    NOTE: dbwr not being msg'd to dismount
    NOTE: lgwr not being msg'd to dismount
    NOTE: cache dismounted group 1/0xDD6F975A (DG_XFF) 
    NOTE: cache ending mount (fail) of group DG_XFF number=1 incarn=0xdd6f975a
    NOTE: cache deleting context for group DG_XFF 1/0xdd6f975a
    GMON dismounting group 1 at 35 for pid 35, osid 4194488
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    ERROR: diskgroup DG_XFF was not mounted
    ORA-15032: not all alterations performed
    ORA-15040: diskgroup is incomplete
    ORA-15042: ASM disk "20" is missing from group number "1" 
    ERROR: ALTER DISKGROUP DG_XFF MOUNT  /* asm agent *//* {1:51107:7083} */
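
The two signals in the log above (a disk number assigned to an empty path, followed by ORA-15042) can be picked out of an alert log mechanically. The following is only a minimal sketch: the function name `summarize_mount_attempt` and the regular expressions are illustrative assumptions written against the line shapes quoted in this post, not an Oracle-provided tool or format specification.

```python
import re

# Patterns written against the two alert-log line shapes quoted in this post.
# They are assumptions for illustration, not an official log grammar.
ASSIGN_RE = re.compile(r"NOTE: Assigning number \((\d+),(\d+)\) to disk \((.*?)\)")
MISSING_RE = re.compile(r'ORA-15042: ASM disk "(\d+)" is missing from group number "(\d+)"')


def summarize_mount_attempt(log_text):
    """Return two sets of (group, disk) pairs:
    disks assigned with an empty path, and disks named by ORA-15042."""
    no_path = set()
    missing = set()
    for line in log_text.splitlines():
        m = ASSIGN_RE.search(line)
        if m:
            group, disk, path = int(m.group(1)), int(m.group(2)), m.group(3).strip()
            if not path:
                no_path.add((group, disk))
            continue
        m = MISSING_RE.search(line)
        if m:
            # ORA-15042 names the disk first and the group second.
            missing.add((int(m.group(2)), int(m.group(1))))
    return no_path, missing


# Three lines copied from the mount attempt shown above.
sample = (
    "NOTE: Assigning number (1,19) to disk (/dev/rhdisk58)\n"
    "NOTE: Assigning number (1,20) to disk ()\n"
    'ORA-15042: ASM disk "20" is missing from group number "1"\n'
)
no_path, missing = summarize_mount_attempt(sample)
print(no_path, missing)
```

On these sample lines both sets contain only group 1, disk 20, which is the disk that was on the deleted LUN.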
    

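    The cause is fairly clear here: because the storage engineer deleted the LUN directly, disk group DG_XFF lost ASM disk 20 and can no longer be mounted directly. The disk group had been rebalancing for a fairly long time, so the lost disk already held a large amount of data (including metadata). Even if the PST were modified to get the disk group mounted (which might not succeed), a large amount of data would be lost, and the data inside could not necessarily be read out directly. Had the disk only been added and, for some reason, no rebalance been run, we could simply have modified the PST to get the disk group mounted. In a case like this one, therefore, the only option is to scan the disks at the low level and generate the data files (part of the files' metadata was on the lost LUN, so copying files out or unloading data directly with the surviving metadata would lose a large amount of data), and then unload the data from those files to complete the recovery. The disks to be recovered are: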

    grp# dsk# bsize ausize disksize diskname        groupname       path
    ---- ---- ----- ------ -------- --------------- --------------- -------------
       1    0  4096  4096K   179200 DG_XFF_0000     DG_XFF          /dev/rhdisk10
       1    1  4096  4096K   179200 DG_XFF_0001     DG_XFF          /dev/rhdisk11
       1    2  4096  4096K   179200 DG_XFF_0002     DG_XFF          /dev/rhdisk16
       1    3  4096  4096K   179200 DG_XFF_0003     DG_XFF          /dev/rhdisk17
       1    4  4096  4096K   179200 DG_XFF_0004     DG_XFF          /dev/rhdisk22
       1    5  4096  4096K   179200 DG_XFF_0005     DG_XFF          /dev/rhdisk23
       1    6  4096  4096K   179200 DG_XFF_0006     DG_XFF          /dev/rhdisk28
       1    7  4096  4096K   179200 DG_XFF_0007     DG_XFF          /dev/rhdisk29
       1    8  4096  4096K   179200 DG_XFF_0008     DG_XFF          /dev/rhdisk33
       1    9  4096  4096K   179200 DG_XFF_0009     DG_XFF          /dev/rhdisk34
       1   10  4096  4096K   179200 DG_XFF_0010     DG_XFF          /dev/rhdisk4
       1   11  4096  4096K   179200 DG_XFF_0011     DG_XFF          /dev/rhdisk40
       1   12  4096  4096K   179200 DG_XFF_0012     DG_XFF          /dev/rhdisk41
       1   13  4096  4096K   179200 DG_XFF_0013     DG_XFF          /dev/rhdisk45
       1   14  4096  4096K   179200 DG_XFF_0014     DG_XFF          /dev/rhdisk46
       1   15  4096  4096K   179200 DG_XFF_0015     DG_XFF          /dev/rhdisk5
       1   16  4096  4096K   179200 DG_XFF_0016     DG_XFF          /dev/rhdisk52
       1   17  4096  4096K   179200 DG_XFF_0017     DG_XFF          /dev/rhdisk53
       1   18  4096  4096K   179200 DG_XFF_0018     DG_XFF          /dev/rhdisk57
       1   19  4096  4096K   179200 DG_XFF_0019     DG_XFF          /dev/rhdisk58
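
As a quick sanity check on the listing above, the per-disk capacity can be worked out from the `ausize` and `disksize` columns. This is a sketch under one stated assumption: that `disksize` is a count of allocation units rather than megabytes. The post does not say so, but the assumption is consistent with the `size 716800M` given for the new disk in the `add disk` statement in the log.

```python
# Values read from the disk listing in this post.
AU_SIZE_MB = 4          # "ausize" column: 4096K
DISK_SIZE_AUS = 179200  # "disksize" column, ASSUMED to be a count of AUs
DISK_COUNT = 20         # rows dsk# 0..19 in the listing

# Per-disk capacity in MB under the assumption above.
disk_size_mb = DISK_SIZE_AUS * AU_SIZE_MB

# Raw capacity of the surviving disks that have to be scanned.
total_mb = disk_size_mb * DISK_COUNT

print(disk_size_mb)          # compare with "size 716800M" in the add disk statement
print(total_mb / 1024 ** 2)  # same total expressed in TiB
```

If the assumption holds, every surviving disk is the same size as the new disk that was added, and a low-level scan has to cover roughly 13.7 TiB of raw space.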
    

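    We were fairly lucky this time: the lost disk group was only a business-data disk group, holding just 19 tablespaces and 10 partitioned tables. With the data dictionary intact, we recovered the data of the 10 partitioned tables (6443 partitions in total). The overall recovery result was as follows: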
    [Image: RECOVER]


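    By total data volume, the recovery ratio is 6003.26953/6027.26935*100% = 99.6018127%. For a lost LUN that had already completed most of its rebalance, being able to recover this much is a very good result overall.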
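
The ratio above can be reproduced in a few lines. This is only a worked check of the arithmetic; the two figures are taken as given from the post, and their unit is not stated there, so none is assumed here.

```python
# Figures quoted in the post; the unit is not stated in the source.
recovered = 6003.26953
total = 6027.26935

ratio = recovered / total * 100  # percentage of the data volume recovered
lost = total - recovered         # volume that could not be recovered

print(f"recovered: {ratio:.5f}%")
print(f"not recovered: {lost:.5f}")
```

The difference between the two figures is about 24 in the same unit, which is the part that lived only on the deleted LUN and could not be rebuilt from the surviving disks.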

