
    ORA-15032 ORA-15040 ORA-15042: ASM failure recovery

    Posted by 惜分飞 on 2015-10-08 15:44:55

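    Contact: mobile (13429648788) QQ (107644445)

    Link: http://www.orasos.com/ora-15032-ora-15040-ora-15042.html

    Title: ORA-15032 ORA-15040 ORA-15042: ASM failure recovery

    Author: 惜分飞 © All rights reserved [This article may be reposted, but the source address must be credited with a link; otherwise legal action will be pursued.]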

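    A friend asked us for help with a recovery. An ASM disk group was built on 19 LUNs, and one of them developed a problem. They added a new LUN and dropped the old one in a single operation, but it hung halfway through (the bad LUN was damaged at the storage layer, so the rebalance could not complete). The storage engineers kept working on the faulty LUN and, very fortunately, repaired it. In the excitement, however, they deleted the newly added LUN directly from the storage array, even though part of the data had already been rebalanced onto it. At that point the ASM disk group went down completely and could no longer be mounted, and they asked for recovery support. For certain reasons the LUN could not be recovered at the storage level, so we were asked to perform a database-level recovery.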

    Mon Sep 21 19:52:35 2015
    SQL> alter diskgroup  dg_XFF add disk '/dev/rhdisk116' size 716800M drop disk dg_XFF_0012
    NOTE: Assigning number (1,20) to disk (/dev/rhdisk116)
    NOTE: requesting all-instance membership refresh for group=1
    NOTE: initializing header on grp 1 disk DG_XFF_0020
    NOTE: requesting all-instance disk validation for group=1
    Mon Sep 21 19:52:44 2015
    NOTE: skipping rediscovery for group 1/0xb94738f1 (DG_XFF) on local instance.
    NOTE: requesting all-instance disk validation for group=1
    NOTE: skipping rediscovery for group 1/0xb94738f1 (DG_XFF) on local instance.
    NOTE: initiating PST update: grp = 1
    Mon Sep 21 19:52:44 2015
    GMON updating group 1 at 25 for pid 27, osid 12124486
    NOTE: PST update grp = 1 completed successfully
    NOTE: membership refresh pending for group 1/0xb94738f1 (DG_XFF)
    GMON querying group 1 at 26 for pid 18, osid 10092734
    NOTE: cache opening disk 20 of grp 1: DG_XFF_0020 path:/dev/rhdisk116
    GMON querying group 1 at 27 for pid 18, osid 10092734
    SUCCESS: refreshed membership for 1/0xb94738f1 (DG_XFF)
    Mon Sep 21 19:52:47 2015
    SUCCESS: alter diskgroup  dg_XFF add disk '/dev/rhdisk116' size 716800M drop disk dg_XFF_0012
    NOTE: starting rebalance of group 1/0xb94738f1 (DG_XFF) at power 1
    Starting background process ARB0
    Mon Sep 21 19:52:47 2015
    ARB0 started with pid=28, OS id=10944804
    NOTE: assigning ARB0 to group 1/0xb94738f1 (DG_XFF) with 1 parallel I/O
    NOTE: Attempting voting file refresh on diskgroup DG_XFF
    Mon Sep 21 20:35:06 2015
    SQL> ALTER DISKGROUP DG_XFF MOUNT  /* asm agent *//* {1:51107:7083} */
    NOTE: cache registered group DG_XFF number=1 incarn=0xdd6f975a
    NOTE: cache began mount (first) of group DG_XFF number=1 incarn=0xdd6f975a
    NOTE: Assigning number (1,0) to disk (/dev/rhdisk10)
    NOTE: Assigning number (1,1) to disk (/dev/rhdisk11)
    NOTE: Assigning number (1,2) to disk (/dev/rhdisk16)
    NOTE: Assigning number (1,3) to disk (/dev/rhdisk17)
    NOTE: Assigning number (1,4) to disk (/dev/rhdisk22)
    NOTE: Assigning number (1,5) to disk (/dev/rhdisk23)
    NOTE: Assigning number (1,6) to disk (/dev/rhdisk28)
    NOTE: Assigning number (1,7) to disk (/dev/rhdisk29)
    NOTE: Assigning number (1,8) to disk (/dev/rhdisk33)
    NOTE: Assigning number (1,9) to disk (/dev/rhdisk34)
    NOTE: Assigning number (1,10) to disk (/dev/rhdisk4)
    NOTE: Assigning number (1,11) to disk (/dev/rhdisk40)
    NOTE: Assigning number (1,12) to disk (/dev/rhdisk41)
    NOTE: Assigning number (1,13) to disk (/dev/rhdisk45)
    NOTE: Assigning number (1,14) to disk (/dev/rhdisk46)
    NOTE: Assigning number (1,15) to disk (/dev/rhdisk5)
    NOTE: Assigning number (1,16) to disk (/dev/rhdisk52)
    NOTE: Assigning number (1,17) to disk (/dev/rhdisk53)
    NOTE: Assigning number (1,18) to disk (/dev/rhdisk57)
    NOTE: Assigning number (1,19) to disk (/dev/rhdisk58)
    Wed Sep 30 11:08:07 2015
    NOTE: start heartbeating (grp 1)
    GMON querying group 1 at 33 for pid 35, osid 4194488
    NOTE: Assigning number (1,20) to disk ()
    GMON querying group 1 at 34 for pid 35, osid 4194488
    NOTE: cache dismounting (clean) group 1/0xDD6F975A (DG_XFF)
    NOTE: dbwr not being msg'd to dismount
    NOTE: lgwr not being msg'd to dismount
    NOTE: cache dismounted group 1/0xDD6F975A (DG_XFF)
    NOTE: cache ending mount (fail) of group DG_XFF number=1 incarn=0xdd6f975a
    NOTE: cache deleting context for group DG_XFF 1/0xdd6f975a
    GMON dismounting group 1 at 35 for pid 35, osid 4194488
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    NOTE: Disk  in mode 0x8 marked for de-assignment
    ERROR: diskgroup DG_XFF was not mounted
    ORA-15032: not all alterations performed
    ORA-15040: diskgroup is incomplete
    ORA-15042: ASM disk "20" is missing from group number "1"
    ERROR: ALTER DISKGROUP DG_XFF MOUNT  /* asm agent *//* {1:51107:7083} */
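The mount failure in the log above can be cross-checked mechanically. The following is a minimal sketch, not part of the original recovery: it assumes the alert log lines have exactly the shape shown above, and the function name `summarize_mount_attempt` is hypothetical. It collects the `Assigning number (grp,dsk)` lines and the ORA-15042 message, then reports which disk numbers were assigned without a device path.

```python
import re

# Patterns match the alert log line shapes shown in the excerpt above.
ASSIGN_RE = re.compile(r"NOTE: Assigning number \((\d+),(\d+)\) to disk \((.*?)\)")
MISSING_RE = re.compile(
    r'ORA-15042: ASM disk "(\d+)" is missing from group number "(\d+)"'
)


def summarize_mount_attempt(log_text):
    """Hypothetical helper: summarize disk assignments in one mount attempt."""
    assigned = {}
    for grp, dsk, path in ASSIGN_RE.findall(log_text):
        # A later assignment for the same (group, disk) overwrites an earlier one.
        assigned[(int(grp), int(dsk))] = path
    # ORA-15042 lists the disk number first, then the group number.
    missing = [(int(grp), int(dsk)) for dsk, grp in MISSING_RE.findall(log_text)]
    # Disks that got a number but no device path, e.g. "to disk ()".
    no_path = sorted(key for key, path in assigned.items() if not path)
    return {"assigned": assigned, "no_path": no_path, "missing": missing}


if __name__ == "__main__":
    sample = (
        "NOTE: Assigning number (1,19) to disk (/dev/rhdisk58)\n"
        "NOTE: Assigning number (1,20) to disk ()\n"
        'ORA-15042: ASM disk "20" is missing from group number "1"\n'
    )
    print(summarize_mount_attempt(sample))
```

Feeding it only the lines of a single mount attempt keeps the earlier `add disk` assignment of (1,20) to /dev/rhdisk116 from being mixed in.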

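    The cause is fairly clear here: because the storage engineer deleted the LUN directly, disk group DG_XFF lost ASM disk 20 and cannot be mounted directly. The disk group had been rebalancing for quite a long time, so the lost disk already held a large amount of data (including metadata). Even if the disk group could be mounted by modifying the PST (which is not guaranteed to succeed), a large amount of data would still be lost, and the data inside might not be directly retrievable. Had the disk merely been added without any rebalance having taken place, we could simply modify the PST to get the disk group mounted. In a case like this one, all we can do is scan the disks at a low level and rebuild the data files (part of the file metadata was on the lost LUN, so copying files or unloading data directly from the surviving metadata would lose a large amount of data), then unload the data from them to complete the recovery. The disks to be recovered are: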

    grp# dsk# bsize ausize disksize diskname        groupname       path
    ---- ---- ----- ------ -------- --------------- --------------- -------------
       1    0  4096  4096K   179200 DG_XFF_0000     DG_XFF          /dev/rhdisk10
       1    1  4096  4096K   179200 DG_XFF_0001     DG_XFF          /dev/rhdisk11
       1    2  4096  4096K   179200 DG_XFF_0002     DG_XFF          /dev/rhdisk16
       1    3  4096  4096K   179200 DG_XFF_0003     DG_XFF          /dev/rhdisk17
       1    4  4096  4096K   179200 DG_XFF_0004     DG_XFF          /dev/rhdisk22
       1    5  4096  4096K   179200 DG_XFF_0005     DG_XFF          /dev/rhdisk23
       1    6  4096  4096K   179200 DG_XFF_0006     DG_XFF          /dev/rhdisk28
       1    7  4096  4096K   179200 DG_XFF_0007     DG_XFF          /dev/rhdisk29
       1    8  4096  4096K   179200 DG_XFF_0008     DG_XFF          /dev/rhdisk33
       1    9  4096  4096K   179200 DG_XFF_0009     DG_XFF          /dev/rhdisk34
       1   10  4096  4096K   179200 DG_XFF_0010     DG_XFF          /dev/rhdisk4
       1   11  4096  4096K   179200 DG_XFF_0011     DG_XFF          /dev/rhdisk40
       1   12  4096  4096K   179200 DG_XFF_0012     DG_XFF          /dev/rhdisk41
       1   13  4096  4096K   179200 DG_XFF_0013     DG_XFF          /dev/rhdisk45
       1   14  4096  4096K   179200 DG_XFF_0014     DG_XFF          /dev/rhdisk46
       1   15  4096  4096K   179200 DG_XFF_0015     DG_XFF          /dev/rhdisk5
       1   16  4096  4096K   179200 DG_XFF_0016     DG_XFF          /dev/rhdisk52
       1   17  4096  4096K   179200 DG_XFF_0017     DG_XFF          /dev/rhdisk53
       1   18  4096  4096K   179200 DG_XFF_0018     DG_XFF          /dev/rhdisk57
       1   19  4096  4096K   179200 DG_XFF_0019     DG_XFF          /dev/rhdisk58
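To give a sense of the scale of a low-level scan, the listing above can be turned into allocation unit (AU) counts and byte offsets. This is a rough sketch only, not the tool used in this recovery: it assumes `disksize` is in MB and that AU n starts at byte n * ausize from the start of the device; the function names are hypothetical.

```python
def au_count(disk_size_mb, au_size_kb):
    """Number of whole allocation units on a disk (disk size assumed in MB)."""
    return (disk_size_mb * 1024) // au_size_kb


def block_offset(au_number, block_number, au_size_kb=4096, block_size=4096):
    """Byte offset of a metadata block, addressed as (AU number, block in AU)."""
    au_bytes = au_size_kb * 1024
    blocks_per_au = au_bytes // block_size
    if au_number < 0 or not 0 <= block_number < blocks_per_au:
        raise ValueError("AU or block number out of range")
    return au_number * au_bytes + block_number * block_size


if __name__ == "__main__":
    # Values taken from the listing: disksize 179200, ausize 4096K, bsize 4096.
    print(au_count(179200, 4096))   # AUs per surviving disk
    print(block_offset(2, 1))       # byte offset of block 1 in AU 2
```

With these values each surviving disk holds 44800 AUs of 4 MiB, which is the unit a scanner would step through when looking for file extents.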

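    We were fairly lucky this time: the lost disk group was only a business-data disk group, containing just 19 tablespaces and 10 partitioned tables. With the data dictionary intact, we recovered the data of the 10 partitioned tables (6443 partitions in total). The overall recovery result is as follows: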
    [Image: RECOVER]


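    Measured by total data volume, the recovery ratio is 6003.26953/6027.26935*100% = 99.6018127%. For a disk group that lost a LUN that was already largely rebalanced, recovering this much data is a very good result overall. I have to say, 老熊 is awesome.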
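The ratio above is easy to re-derive. A small sketch follows, using only the two totals quoted in the text; the unit of the totals is not stated in the source, so none is assumed here, and the function name is hypothetical.

```python
def recovery_ratio(recovered, total):
    """Recovered share of the total data volume, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recovered / total * 100


if __name__ == "__main__":
    recovered, total = 6003.26953, 6027.26935  # totals quoted in the article
    print(f"recovered: {recovery_ratio(recovered, total):.7f}%")
    print(f"not recovered: {total - recovered:.5f} (same unit as the totals)")
```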
