IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    3TB 非归档Oracle数据库恢复小case(windows)

    admin发表于 2015-12-26 04:37:19
    love 0

    本站文章除注明转载外,均为本站原创: 转载自love wife & love life —Roger 的Oracle技术博客

    本文链接地址: 3TB 非归档Oracle数据库恢复小case(windows)

    这是友情支持一个朋友的数据库恢复case,昨天圣诞节的时候来个求助,只能速度帮忙解决了好过节去了。首先我们来看下的alert log都有哪些信息:

    Wed Dec 23 15:19:21 2015
    SMON: enabling tx recovery
    Wed Dec 23 15:19:21 2015
    Database Characterset is ZHS16GBK
    Opening with internal Resource Manager plan
    where NUMA PG = 2, CPUs = 8
    Wed Dec 23 15:19:22 2015
    Errors in file d:\oracle\product\10.2.0\admin\ajzq\udump\ajzq_ora_1712.trc:
    ORA-00600: internal error code, arguments: [4194], [45], [45], [], [], [], [], []
    
    Doing block recovery for file 2 block 1618943
    Block recovery from logseq 4, block 63 to scn 7662329350
    Wed Dec 23 15:19:23 2015
    Recovery of Online Redo Log: Thread 1 Group 1 Seq 4 Reading mem 0
      Mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\xxxx\REDO01.LOG
    Block recovery completed at rba 4.4749.16, scn 1.3367362055
    Doing block recovery for file 2 block 121
    Block recovery from logseq 4, block 63 to scn 7662329346
    Wed Dec 23 15:19:23 2015
    Recovery of Online Redo Log: Thread 1 Group 1 Seq 4 Reading mem 0
      Mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\xxxx\REDO01.LOG
    Block recovery completed at rba 4.3977.16, scn 1.3367362051
    Wed Dec 23 15:19:24 2015
    SMON: Restarting fast_start parallel rollback
    Wed Dec 23 15:19:24 2015
    Errors in file d:\oracle\product\10.2.0\admin\xxxx\udump\xxxx_ora_1712.trc:
    ORA-00600: internal error code, arguments: [4194], [45], [45], [], [], [], [], []
    
    Wed Dec 23 15:19:24 2015
    DEBUG: Replaying xcb 0xf0adfbe0, pmd 0x20588f48 for failed op 8
    Doing block recovery for file 2 block 1618943
    Block recovery from logseq 4, block 63 to scn 7662329350
    Wed Dec 23 15:19:24 2015
    Recovery of Online Redo Log: Thread 1 Group 1 Seq 4 Reading mem 0
      Mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\xxxx\REDO01.LOG
    Block recovery completed at rba 4.4749.16, scn 1.3367362055
    Wed Dec 23 15:21:28 2015
    Errors in file d:\oracle\product\10.2.0\admin\xxxx\udump\xxxx_ora_1712.trc:
    ORA-00600: internal error code, arguments: [4194], [45], [45], [], [], [], [], []
    ORA-00600: internal error code, arguments: [4194], [45], [45], [],
    .....
    Wed Dec 23 16:47:26 2015
    Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_p005_8980.trc:
    ORA-00600: internal error code, arguments: [2037], [141460651], [141460651], [162], [6], [1], [3367403517], [3141222167]
    Wed Dec 23 16:47:26 2015
    Hex dump of (file 92, block 3926982) in trace file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_dbw0_8864.trc
    Corrupt block relative dba: 0x173bebc6 (file 92, block 3926982)
    Bad header found during preparing block for write
    Data in bad block:
     type: 23 format: 7 rdba: 0xbd3b3f17
     last change scn: 0x0001.c8b6be51 seq: 0x5 flg: 0x00
     spare1: 0x3b spare2: 0xbc spare3: 0xc
     consistency value in tail: 0xbe511705
     check value in block header: 0x0
     block checksum disabled
    Wed Dec 23 16:47:26 2015
    Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_p005_8980.trc:
    ORA-00600: internal error code, arguments: [2037], [141460651], [141460651], [162], [6], [1], [3367403517], [3141222167]
    
    Wed Dec 23 16:47:27 2015
    Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_dbw0_8864.trc:
    ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], []
    
    Wed Dec 23 16:47:27 2015
    Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_dbw0_8864.trc:
    ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], []
    
    DBW0: terminating instance due to error 471

    我们可以看到,这个数据库已经宕机好几天了,在23号的时候就已经无法正常open了。上述的结果错误也很常见很简单。朋友经过处理后顺利把数据库open了,可是打开之后创建新的undo 表空间之后尝试重启后就再也打开不开了。遇到了如下的错误:

    Fri Dec 25 11:44:17 2015
    Recovery of Online Redo Log: Thread 1 Group 3 Seq 6 Reading mem 0
      Mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\xxxx\REDO03.LOG
    Fri Dec 25 11:44:17 2015
    Completed redo application
    Fri Dec 25 11:44:17 2015
    Hex dump of (file 44, block 518322) in trace file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_dbw2_21232.trc
    Corrupt block relative dba: 0x0b07e8b2 (file 44, block 518322)
    Bad header found during preparing block for write
    Data in bad block:
     type: 82 format: 2 rdba: 0x1c4ee895
     last change scn: 0x0001.c8bbad54 seq: 0x21 flg: 0xf0
     spare1: 0x52 spare2: 0x4d spare3: 0x895b
     consistency value in tail: 0xad545221
     check value in block header: 0x0
     block checksum disabled
    Fri Dec 25 11:44:18 2015
    Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_dbw2_21232.trc:
    ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], []
    
    Fri Dec 25 11:44:18 2015
    Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_dbw2_21232.trc:
    ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], []
    
    DBW2: terminating instance due to error 471
    Fri Dec 25 11:44:18 2015
    Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_p014_20704.trc:
    ORA-00471: DBWR process terminated with error

    这个错误我也是第一次遇见。从kcbzpbuf函数来看,是dbwr进出写脏块时发现了坏块。很明显,从上面的坏块信息来看,dba地址0x0b07e8b2 和 rdba地址0x1c4ee895是不匹配,这肯定没法正常写入的。
    针对这个错误我事后在metalink 搜索了一下,还是有不少相关的文档甚至是bug,这里不多说了。处理方式很简单,首先加入如下2个隐含参数尝试打开数据库:

    *._allow_resetlogs_corruption=TRUE

    *._allow_error_simulation=TRUE
    做了不完全恢复后,尝试open resetlogs时发现遇到了如下错误:

    ri Dec 25 12:20:03 2015
    SMON: enabling cache recovery
    Fri Dec 25 12:20:04 2015
    Errors in file d:\oracle\product\10.2.0\admin\xxxx\udump\xxxx_ora_20176.trc:
    ORA-00600: internal error code, arguments: [2662], [1], [3367742663], [1], [3367742798], [4194313], [], []
    
    Fri Dec 25 12:20:05 2015
    Errors in file d:\oracle\product\10.2.0\admin\xxxx\udump\xxxx_ora_20176.trc:
    ORA-00600: internal error code, arguments: [2662], [1], [3367742663], [1], [3367742798], [4194313], [], []
    
    Fri Dec 25 12:20:05 2015
    Error 600 happened during db open, shutting down database
    USER: terminating instance due to error 600
    Fri Dec 25 12:20:05 2015
    Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_p015_21472.trc:
    ORA-00600: internal error code, arguments: [15784], [600], [], [], [], [], [], []
    ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], []
    
    Fri Dec 25 12:20:05 2015
    Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_p014_9072.trc:
    ORA-00600: internal error code, arguments: [15784], [600], [], [], [], [], [], []
    ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], []

    看到这里2个错误,很多人可能会很迷惑,到底数据库无法open是因为哪个错误导致的呢? 这里稍微注意一下前后报错的顺序就知道了,很明显是前面的ora-00600 [2662]错误导致无法open resetlogs完成。
    对于ora-00600 [2662]错误这里不再解释了,SCN的问题。这里直接推进scn即可。
    将数据库启动到mount状态,执行如下命令:

    SQL> alter system set job_queue_processes=0;
    
    Session altered.
    
    SQL> startup mount
    ORACLE instance started.
    
    Total System Global Area 7516192768 bytes
    Fixed Size                  2175728 bytes
    Variable Size            1232850192 bytes
    Database Buffers         6274678784 bytes
    Redo Buffers                6488064 bytes
    Database mounted.
    SQL> alter session set events 'IMMEDIATE trace name adjust_scn level 10';
    
    Session altered.
    
    SQL> alter database open;
    
    Database altered.
    

    很顺利就打开了数据库。由于之前朋友已经创建了新的undo表空间,我只需要帮忙把undo重建就完事了,结束支持任务。尝试进行drop tablespace undotbs1 including contents and datafiles时,遇到ora-01548错误。
    在alter session set “_smu_debug_mode” = 4; 后都无法进行drop,那只能继续使用下隐含参数了:

    _offline_rollback_segments
    _corrupted_rollback_segments

    通过这2个隐含参数,可以顺利将旧的undo 表空间drop掉,然后修改pfile文件去掉不需要的一些隐含参数,顺利打开数据库。
    到这里结束我的半小时支持友情支持任务!悲剧的下午又开始优化某客户的SQL。。。。。

    备注:

    1) 对于这种异常强制打开的数据库,建议进行全库检查,包括是否有坏块等,甚至检查数据字典是否一致等等(实际上检查之前的alert log发现,    之前数据库在恢复过程中就报了不少的坏块)

    2) 在这个年代,还有3TB的数据库,而且是windows,更悲剧的还是非归档,没有备份,这是不应该的。

    3) 我们应该多考虑下系统的容灾建设,哪怕是有备份。

    Related posts:

    1. 记一次非归档RAC恢复(大量的ORA-600 错误)
    2. ora-00600 kccpb_sanity_check_2和kclchkblk_4的恢复case
    3. Oracle 11gR2 for Windows遭遇ora-600[4194]的恢复
    4. how to modify scn with oradebug(windows)?
    5. windows Oracle数据文件大小为0的恢复case


沪ICP备19023445号-2号
友情链接