IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    存储异常导致数据库不能启动恢复

    惜分飞发表于 2015-12-20 17:52:09
    love 0

    联系:手机(13429648788) QQ(107644445)

    链接:http://www.orasos.com/%e5%ad%98%e5%82%a8%e5%bc%82%e5%b8%b8%e5%af%bc%e8%87%b4%e6%95%b0%e6%8d%ae%e5%ba%93%e4%b8%8d%e8%83%bd%e5%90%af%e5%8a%a8%e6%81%a2%e5%a4%8d.html

    标题:存储异常导致数据库不能启动恢复

    作者:惜分飞©版权所有[文章允许转载,但必须以链接方式注明源地址,否则追究法律责任.]

    xx医院存储突然掉线,导致数据库异常,现场工程师折腾了一天,问题依旧没有解决,无奈之下找到我们,希望我们能够帮忙恢复数据库.
    启动报ORA-00600[2131]错误

    Fri Nov 06 14:50:59 2015
    ALTER DATABASE   MOUNT
    This instance was first to mount
    Fri Nov 06 14:50:59 2015
    ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.4.4)(PORT=1521))' SCOPE=MEMORY SID='xifenfei1';
    NOTE: Loaded library: System 
    SUCCESS: diskgroup DATA was mounted
    NOTE: dependency between database xifenfei and diskgroup resource ora.DATA.dg is established
    Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13221.trc  (incident=191085):
    ORA-00600: internal error code, arguments: [2131], [33], [32], [], [], [], [], [], [], [], [], []
    Incident details in: /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_191085/xifenfei1_ora_13221_i191085.trc
    Fri Nov 06 14:51:10 2015
    Use ADRCI or Support Workbench to package the incident.
    See Note 411.1 at My Oracle Support for error and packaging details.
    ORA-600 signalled during: ALTER DATABASE   MOUNT...
    

    出现该错误的原因是由于:We are attempting to write a controlfile checkpoint progress record, but find we do not have the progress record generating this exception.由于控制文件异常导致,出现此类情况,我们一般使用单个控制文件一次尝试,如果都不可以考虑重建控制文件

    由于坏块(逻辑/物理)导致数据库实例恢复无法进行

    Beginning crash recovery of 2 threads
    Started redo scan
    kcrfr_rnenq: use log nab 393216
    kcrfr_rnenq: use log nab 2
    Completed redo scan
     read 4427 KB redo, 500 data blocks need recovery
    Started redo application at
     Thread 1: logseq 5731, block 391398
     Thread 2: logseq 4252, block 520815
    Recovery of Online Redo Log: Thread 1 Group 2 Seq 5731 Reading mem 0
      Mem# 0: +DATA/xifenfei/onlinelog/group_2.266.835331047
    Recovery of Online Redo Log: Thread 2 Group 8 Seq 4252 Reading mem 0
      Mem# 0: +DATA/xifenfei/onlinelog/group_8.331.835330421
    Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13770.trc  (incident=197486):
    ORA-00600: internal error code, arguments: [kdxlin:psno out of range], [], [], [], [], [], [], [], [], [], [], []
    Incident details in:/home/app/oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_197486/xifenfei1_ora_13770_i197486.trc
    Fri Nov 06 15:03:09 2015
    Use ADRCI or Support Workbench to package the incident.
    See Note 411.1 at My Oracle Support for error and packaging details.
    Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13770.trc  (incident=197487):
    ORA-01578: ORACLE data block corrupted (file # 2, block # 65207)
    ORA-01110: data file 2: '+DATA/xifenfei/datafile/sysaux.257.835324753'
    ORA-10564: tablespace SYSAUX
    ORA-01110: data file 2: '+DATA/xifenfei/datafile/sysaux.257.835324753'
    ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 81045
    ORA-00600: internal error code, arguments: [kdxlin:psno out of range], [], [], [], [], [], [], [], [], [], [], []
    Incident details in:/home/app/oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_197487/xifenfei1_ora_13770_i197487.trc
    Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13770.trc:
    ORA-01578: ORACLE data block corrupted (file # 2, block # 65207)
    ORA-01110: data file 2: '+DATA/xifenfei/datafile/sysaux.257.835324753'
    ORA-10564: tablespace SYSAUX
    ORA-01110: data file 2: '+DATA/xifenfei/datafile/sysaux.257.835324753'
    ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 81045
    ORA-00600: internal error code, arguments: [kdxlin:psno out of range], [], [], [], [], [], [], [], [], [], [], []
    Recovery of Online Redo Log: Thread 2 Group 3 Seq 4253 Reading mem 0
      Mem# 0: +DATA/xifenfei/onlinelog/group_3.332.835330505
    Hex dump of (file 14, block 62536) in trace file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13770.trc
    Reading datafile '+DATA/xifenfei/datafile/ht01.dbf' for corruption at rdba: 0x0380f448 (file 14, block 62536)
    Reread (file 14, block 62536) found same corrupt data (logically corrupt)
    RECOVERY OF THREAD 1 STUCK AT BLOCK 62536 OF FILE 14
    Fri Nov 06 15:03:13 2015
    Abort recovery for domain 0
    Aborting crash recovery due to error 1172
    Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13770.trc:
    ORA-01172: recovery of thread 1 stuck at block 62536 of file 14
    ORA-01151: use media recovery to recover block, restore backup if needed
    Abort recovery for domain 0
    Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13770.trc:
    ORA-01172: recovery of thread 1 stuck at block 62536 of file 14
    ORA-01151: use media recovery to recover block, restore backup if needed
    ORA-1172 signalled during: ALTER DATABASE OPEN...
    

    查看资料发现和Bug 14301592 – Several errors by corrupt blocks shifted by 2 bytes in buffer cache during recovery caused by INDEX redo apply,可以通过ALLOW 1 CORRUPTION临时解决

    使用ALLOW 1 CORRUPTION进行恢复,出现ORA-07445[kdxlin]错误

    Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
    +DATA/xifenfei/onlinelog/group_3.332.835330505     
    ORA-00279: change 700860458 generated at 11/05/2015 21:20:15 needed for thread
    1
    ORA-00289: suggestion : +ARCHIVE/xifenfei/xifenfei_1_5731_835324843.arc
    ORA-00280: change 700860458 for thread 1 is in sequence #5731
    
    
    Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
    +DATA/xifenfei/onlinelog/group_2.266.835331047
    ORA-00283: recovery session canceled due to errors
    ORA-10562: Error occurred while applying redo to data block (file# 2, block#
    70104)
    ORA-10564: tablespace SYSAUX
    ORA-01110: data file 2: '+DATA/xifenfei/datafile/sysaux.257.835324753'
    ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 82289
    ORA-00607: Internal error occurred while making a change to a data block
    ORA-00602: internal programming exception
    ORA-07445: exception encountered: core dump [kdxlin()+4088] [SIGSEGV]
    [ADDR:0xC] [PC:0x95FB572] [Address not mapped to object] []
    
    
    ORA-01112: media recovery not started
    

    ORA-07445[kdxlin()+4088]未找到类似说明,到了这一步,无法简单的恢复成功,只能通过设置隐含参数跳过实例恢复,尝试resetlog库

    通过设置_allow_resetlogs_corruption参数继续恢复

    SQL> startup pfile='/tmp/pfile.ora' mount;
    ORACLE instance started.
    
    Total System Global Area 7315603456 bytes
    Fixed Size                  2267384 bytes
    Variable Size            2566915848 bytes
    Database Buffers         4731174912 bytes
    Redo Buffers               15245312 bytes
    Database mounted.
    SQL> alter database open resetlogs;
    alter database open resetlogs
    *
    ERROR at line 1:
    ORA-01092: ORACLE instance terminated. Disconnection forced
    ORA-00600: internal error code, arguments: [kclchkblk_4], [0], [700869927],
    [0], [700860464], [], [], [], [], [], [], []
    Process ID: 13563
    Session ID: 157 Serial number: 3
    

    alert日志报错

    Fri Nov 06 19:26:39 2015
    SMON: enabling cache recovery
    Instance recovery: looking for dead threads
    Instance recovery: lock domain invalid but no dead threads
    Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13563.trc  (incident=319140):
    ORA-00600: internal error code, arguments: [kclchkblk_4], [0], [700869927], [0], [700860464], [], [], [], [], [], [], []
    Incident details in:/home/app/oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_319140/xifenfei1_ora_13563_i319140.trc
    Use ADRCI or Support Workbench to package the incident.
    See Note 411.1 at My Oracle Support for error and packaging details.
    Redo thread 2 internally disabled at seq 1 (CKPT)
    ARC1: Archiving disabled thread 2 sequence 1
    Archived Log entry 9956 added for thread 2 sequence 1 ID 0x0 dest 1:
    ARC3: Archival started
    ARC0: STARTING ARCH PROCESSES COMPLETE
    Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13563.trc:
    ORA-00600: internal error code, arguments: [kclchkblk_4], [0], [700869927], [0], [700860464], [], [], [], [], [], [], []
    Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13563.trc:
    ORA-00600: internal error code, arguments: [kclchkblk_4], [0], [700869927], [0], [700860464], [], [], [], [], [], [], []
    Error 600 happened during db open, shutting down database
    USER (ospid: 13563): terminating the instance due to error 600
    Fri Nov 06 19:26:42 2015
    Instance terminated by USER, pid = 13563
    ORA-1092 signalled during: alter database open resetlogs...
    opiodr aborting process unknown ospid (13563) as a result of ORA-1092
    Fri Nov 06 19:26:42 2015
    ORA-1092 : opitsk aborting process
    

    这里是比较熟悉的ora-600[kclchkblk_4]错误,和ora-600[2662]错误类似,需要调整scn,由于数据库版本为11.2.0.4,无法使用常规方法调整scn,在修改控制文件,oradebug,bbed方法可供选择

    使用oradebug方法处理
    因为是asm环境,其他方法处理起来都相对麻烦

    [oracle@wisetop1 ~]$ sqlplus / as sysdba
    
    SQL*Plus: Release 11.2.0.4.0 Production on Fri Nov 6 19:30:59 2015
    
    Copyright (c) 1982, 2013, Oracle.  All rights reserved.
    
    Connected to an idle instance.
    
    SQL> startup pfile='/tmp/pfile.ora' mount;
    ORACLE instance started.
    
    Total System Global Area 7315603456 bytes
    Fixed Size                  2267384 bytes
    Variable Size            2566915848 bytes
    Database Buffers         4731174912 bytes
    Redo Buffers               15245312 bytes
    Database mounted.
    SQL> oradebug setmypid
    Statement processed.
    SQL> oradebug poke 0x06001AE70 4 0x2FAF0800
    BEFORE: [06001AE70, 06001AE74) = 00000000
    AFTER:  [06001AE70, 06001AE74) = 2FAF0800
    SQL> alter database open;
    
    Database altered.
    
    

    至此数据库open成功,后续就是处理一些坏块的工作,并建议客户逻辑重建库.

    • 存储异常导致ORA-10562故障恢复
    • ORA-00600[17182],ORA-00600[25027],ORA-00600[kghfrempty:ds]故障处理
    • Oracle bug ORA-600 k2vcbk_2故障恢复
    • 分享一次ORA-01113 ORA-01110故障处理过程
    • 记录一次 ORA-600 2663 故障恢复
    • 一起ORA-600 3020故障恢复的大体思路
    • 记录一次ORA-00600[kdxlin:psno out of range]/ORA-00600[3020]/ORA-00600[4000]/ORA-00600[4193]的数据库恢复
    • 异常断电导致current redo损坏处理
    • 记录一次数据库异常导致ipc未释放案例
    • ORA-600 kcratr_nab_less_than_odr故障解决
    • 记录一次由于坏块和不恰当恢复引起各种ORA-600案例
    • ORACLE 12C ORA-07445[ktuHistRecUsegCrtMain()+1173]恢复
    • ORA-01115 ORA-01110 ORA-27067故障恢复案例
    • ORA-600 kclchkblk_4 故障恢复
    • 乱用_allow_resetlogs_corruption参数导致悲剧


沪ICP备19023445号-2号
友情链接