
    Troubleshooting ORA-600 krhpfh_03-1210

    Posted by 惜分飞 on 2024-08-26 04:12:01

    Contact: mobile/WeChat (+86 17813235971), QQ (107644445)


    Author: 惜分飞 © All rights reserved [Reproduction in any form without my consent is prohibited; I reserve the right to pursue legal liability.]

    Several nodes of a RAC database were all in the open state and queries worked normally, but application inserts occasionally failed with errors such as ORA-01187: cannot read from file because it failed verification tests.


    The fault originally stemmed from a bad disk. After the disk was replaced, the database instances on two nodes crashed:

    Mon Aug 19 21:16:47 2024
    Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
    Rereading datafile 1399 header failed with ORA-01207
    Errors in file /u01/app/oracle/diag/rdbms/xff/xff5/trace/xff5_ckpt_75779.trc:
    ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
    ORA-01122: database file 1399 failed verification check
    ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
    ORA-01207: file is more recent than control file - old control file
    Errors in file /u01/app/oracle/diag/rdbms/xff/xff5/trace/xff5_ckpt_75779.trc:
    ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
    ORA-01122: database file 1399 failed verification check
    ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
    ORA-01207: file is more recent than control file - old control file
    CKPT (ospid: 75779): terminating the instance due to error 1242
    Mon Aug 19 21:16:47 2024
    System state dump requested by (instance=5, osid=75779 (CKPT)), summary=[abnormal instance termination].
    System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff5/trace/xff5_diag_75725.trc
    Mon Aug 19 21:16:52 2024
    ORA-1092 : opitsk aborting process
    Mon Aug 19 21:16:53 2024
    ORA-1092 : opitsk aborting process
    Mon Aug 19 21:16:53 2024
    License high water mark = 131
    Termination issued to instance processes. Waiting for the processes to exit
    Mon Aug 19 21:17:02 2024
    Instance termination failed to kill one or more processes
    Instance terminated by CKPT, pid = 75779
    Mon Aug 19 21:17:03 2024
    USER (ospid: 33495): terminating the instance
    Termination issued to instance processes. Waiting for the processes to exit
    Mon Aug 19 21:17:13 2024
    Instance termination failed to kill one or more processes
    Instance terminated by USER, pid = 33495
    

    The database nevertheless started successfully when brought up manually, and a query showed every data file in the ONLINE state.


    However, some of the insert processes were very slow, with many sessions waiting on enq: HW - contention.

    The alert logs on all database nodes occasionally reported ORA-01186: file 1399 failed verification tests and related errors:

    Tue Aug 20 21:30:02 2024
    Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
    Rereading datafile 1399 header failed with ORA-01207
    Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_dbw0_43828.trc:
    ORA-01186: file 1399 failed verification tests
    ORA-01122: database file 1399 failed verification check
    ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
    ORA-01207: file is more recent than control file - old control file
    File 1399 not verified due to error ORA-01122
    Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
    Rereading datafile 1399 header failed with ORA-01207
    Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_dbw0_43828.trc:
    ORA-01186: file 1399 failed verification tests
    ORA-01122: database file 1399 failed verification check
    ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
    ORA-01207: file is more recent than control file - old control file
    File 1399 not verified due to error ORA-01122
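
To see how often each file hits this header check across the nodes, the alert logs can be scanned for the "Read of datafile ... header failed" line. The following is only a minimal sketch: the pattern is copied from the log excerpt above, the function name `count_header_failures` is hypothetical, and the sample lines stand in for a real alert log.

```python
import re
from collections import Counter

# Pattern taken from the alert-log line in this incident; other Oracle
# versions may word the message differently (assumption).
HEADER_FAIL = re.compile(
    r"Read of datafile '(?P<name>[^']+)' \(fno (?P<fno>\d+)\) "
    r"header failed with (?P<err>ORA-\d+)"
)

def count_header_failures(lines):
    """Count header-read failures per (file number, file name, error code)."""
    hits = Counter()
    for line in lines:
        m = HEADER_FAIL.search(line)
        if m:
            hits[(int(m.group("fno")), m.group("name"), m.group("err"))] += 1
    return hits

# Sample lines copied from the excerpt above.
sample = [
    "Tue Aug 20 21:30:02 2024",
    "Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207",
    "Rereading datafile 1399 header failed with ORA-01207",
    "ORA-01186: file 1399 failed verification tests",
    "Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207",
]

failures = count_header_failures(sample)
for (fno, name, err), n in failures.items():
    print(fno, name, err, n)
```

In practice the `sample` list would be replaced by the lines of each node's alert log.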
    

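    Given these symptoms, the initial assessment was:
    1. Because the cluster has multiple nodes (6 in total), as long as at least one node stays open, the other nodes can be shut down and restarted normally. Data cannot be written to the data file that reports ORA-01207, although it can still be read.
    2. If all nodes are shut down, the database will fail to start and report ORA-01207: file is more recent than control file.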

    In past experience, ORA-01207: file is more recent than control file can be resolved by re-creating the control file. First, shut down all nodes and try to start a single node:

    SQL> alter database open;
    alter database open
    *
    ERROR at line 1:
    ORA-01122: database file 1399 failed verification check
    ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
    ORA-01207: file is more recent than control file - old control file
    
    alter database open
    Wed Aug 21 14:14:22 2024
    SUCCESS: diskgroup REDO was mounted
    Wed Aug 21 14:14:22 2024
    NOTE: dependency between database xff and diskgroup resource ora.REDO.dg is established
    Wed Aug 21 14:14:27 2024
    Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_47884.trc:
    ORA-01122: database file 1399 failed verification check
    ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
    ORA-01207: file is more recent than control file - old control file
    ORA-1122 signalled during: alter database open...
    

    The open failed as expected. The next step was to re-create the control file, after which recovery reported ORA-00600 [krhpfh_03-1210]:

    SQL> shutdown immediate;
    ORA-01109: database not open
    
    
    Database dismounted.
    ORACLE instance shut down.
    SQL> startup nomount pfile='/tmp/xff/pfile';
    ORACLE instance started.
    
    Total System Global Area 1.3255E+11 bytes
    Fixed Size		    2244832 bytes
    Variable Size		 9.7442E+10 bytes
    Database Buffers	 3.4897E+10 bytes
    Redo Buffers		  208654336 bytes
    SQL> @rectl
    
    Control file created.
    
    SQL> 
    SQL> 
    SQL> 
    SQL> recover database;
    ORA-00283: recovery session canceled due to errors
    ORA-01610: recovery using the BACKUP CONTROLFILE option must be done
    
    
    SQL> recover database using backup controlfile;
    ORA-00283: recovery session canceled due to errors
    ORA-00600: internal error code, arguments: [krhpfh_03-1210], [fno =], [1399],
    [fhcpc =], [274968], [fhccc =], [274983], [], [], [], [], []
    ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
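
The ORA-00600 arguments arrive as label/value pairs, so they can be pulled out mechanically before comparing them with the file header. This is a small sketch only; `parse_krhpfh` is a hypothetical helper and the message text is copied from the session above.

```python
import re

def parse_krhpfh(message):
    """Extract labelled numeric arguments such as "[fhcpc =], [274968]"."""
    # \s* allows the line break that SQL*Plus inserts between arguments.
    pairs = re.findall(r"\[(\w+) =\],\s*\[(\d+)\]", message)
    return {name: int(value) for name, value in pairs}

msg = (
    "ORA-00600: internal error code, arguments: [krhpfh_03-1210], [fno =], [1399],\n"
    "[fhcpc =], [274968], [fhccc =], [274983], [], [], [], [], []"
)

args = parse_krhpfh(msg)
print(args)
print("fhccc - fhcpc =", args["fhccc"] - args["fhcpc"])
```

For this message the two counters differ by 15, with fhccc the larger of the two.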
    

    The message indicates that the fhcpc and fhccc values in the file header are inconsistent. Check the corresponding fields with bbed:

    BBED> set file 1399
    	FILE#          	1399
    
    BBED> p kcvfhccc
    ub4 kcvfhccc                                @148      0x00043227 ===>274983 (decimal)
    
    BBED> p kcvfhcpc
    ub4 kcvfhcpc                                @140      0x00043218 ===>274968 (decimal)
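
The hex-to-decimal annotations can be double-checked with a couple of lines. This sketch only converts the two values bbed printed above and confirms they are the same numbers that appear in the ORA-00600 arguments.

```python
# Values printed by bbed for file 1399 (copied from the output above).
kcvfhccc = int("0x00043227", 16)
kcvfhcpc = int("0x00043218", 16)

print("kcvfhccc =", kcvfhccc)
print("kcvfhcpc =", kcvfhcpc)

# Same numbers as [fhccc =] and [fhcpc =] in the ORA-00600 message.
header_matches_error = (kcvfhcpc, kcvfhccc) == (274968, 274983)
print("matches ORA-00600 arguments:", header_matches_error)
```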
    

    The error is fairly explicit, so modify these two values with bbed:

    BBED> m /x 2a390400 offset 148
    Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
     File: /tmp/xff/1399.dbf.header (1399)
     Block: 1                Offsets:  148 to  659           Dba:0x5dc00001
    ------------------------------------------------------------------------
     2a390400 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 0c000000 0f004441 
     5441315f 5442535f 45515f30 31000000 00000000 00000000 00000000 78010000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 cfebdd33 01000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 419333df 81001c0a 6ab13046 06000000 
     c1520400 02000000 10000000 7e000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 0d000d00 0d000100 00000000 00000000 
    
     <32 bytes per line>
    
    BBED> m /x 2b390400 offset 140
     File: /tmp/xff/1399.dbf.header (1399)
     Block: 1                Offsets:  140 to  651           Dba:0x5dc00001
    ------------------------------------------------------------------------
     2b390400 e6ef524d 2a390400 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     0c000000 0f004441 5441315f 5442535f 45515f30 31000000 00000000 00000000 
     00000000 78010000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 cfebdd33 01000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 419333df 81001c0a 
     6ab13046 06000000 c1520400 02000000 10000000 7e000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
     00000000 00000000 00000000 00000000 00000000 00000000 0d000d00 0d000100 
    
     <32 bytes per line>
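
The byte strings passed to `m /x` are in on-disk order. Assuming the header is stored little-endian (an assumption here, typical of x86-64 Linux), they can be decoded as below; `le_bytes_to_int` and `int_to_le_bytes` are hypothetical helper names.

```python
import struct

def le_bytes_to_int(hex_bytes):
    """Interpret 4 bytes, written as hex in on-disk order, as a little-endian ub4."""
    return struct.unpack("<I", bytes.fromhex(hex_bytes))[0]

def int_to_le_bytes(value):
    """Return the on-disk (little-endian) hex string for a ub4 value."""
    return struct.pack("<I", value).hex()

# Byte strings used in the two bbed modify commands above.
new_ccc = le_bytes_to_int("2a390400")  # written at offset 148 (kcvfhccc)
new_cpc = le_bytes_to_int("2b390400")  # written at offset 140 (kcvfhcpc)
print("new kcvfhccc =", new_ccc)
print("new kcvfhcpc =", new_cpc)

# On-disk form of the original values, for comparison.
old_ccc_bytes = int_to_le_bytes(274983)
old_cpc_bytes = int_to_le_bytes(274968)
print(old_ccc_bytes, old_cpc_bytes)
```

Under that assumption the edit sets kcvfhccc to 276778 and kcvfhcpc to 276779, so kcvfhcpc is no longer smaller than kcvfhccc as it was when the ORA-00600 was raised.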
    

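    After these values were modified, recover database and opening the database both succeeded. The dictionary check was clean and application reads and writes worked normally, which completed the recovery.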

    SQL> @hcheck
    HCheck Version 07MAY18 on 21-AUG-2024 15:13:02
    ----------------------------------------------
    Catalog Version 11.2.0.3.0 (1102000300)
    db_name: XFF
    
    				   Catalog	 Fixed
    Procedure Name			   Version    Vs Release    Timestamp
    Result
    ------------------------------ ... ---------- -- ---------- --------------
    ------
    .- LobNotInObj		       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
    .- MissingOIDOnObjCol	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
    .- SourceNotInObj	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
    .- OversizedFiles	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
    .- PoorDefaultStorage	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
    .- PoorStorage		       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
    .- TabPartCountMismatch        ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
    .- OrphanedTabComPart	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
    .- MissingSum$		       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
    .- MissingDir$		       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
    .- DuplicateDataobj	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
    .- ObjSynMissing	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
    .- ObjSeqMissing	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
    .- OrphanedUndo 	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
    .- OrphanedIndex	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
    .- OrphanedIndexPartition      ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
    .- OrphanedIndexSubPartition   ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- OrphanedTable	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- OrphanedTablePartition      ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- OrphanedTableSubPartition   ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- MissingPartCol	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- OrphanedSeg$ 	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- OrphanedIndPartObj#	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- DuplicateBlockUse	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- FetUet		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- Uet0Check		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- SeglessUET		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- BadInd$		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- BadTab$		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- BadIcolDepCnt	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- ObjIndDobj		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- TrgAfterUpgrade	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- ObjType0		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- BadOwner		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- StmtAuditOnCommit	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- BadPublicObjects	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- BadSegFreelist	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- BadDepends		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
    .- CheckDual		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
    .- ObjectNames		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
    .- BadCboHiLo		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
    .- ChkIotTs		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
    .- NoSegmentIndex	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
    .- BadNextObject	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
    .- DroppedROTS		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
    .- FilBlkZero		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
    .- DbmsSchemaCopy	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
    .- OrphanedObjError	       ... 1102000300 >  1102000000 08/21 15:13:05 PASS
    .- ObjNotLob		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
    .- MaxControlfSeq	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
    .- SegNotInDeferredStg	       ... 1102000300 >  1102000000 08/21 15:13:06 PASS
    .- SystemNotRfile1	       ... 1102000300 >   902000000 08/21 15:13:06 PASS
    .- DictOwnNonDefaultSYSTEM     ... 1102000300 <=  *All Rel* 08/21 15:13:07 PASS
    .- OrphanTrigger	       ... 1102000300 <=  *All Rel* 08/21 15:13:07 PASS
    .- ObjNotTrigger	       ... 1102000300 <=  *All Rel* 08/21 15:13:07 PASS
    ---------------------------------------
    21-AUG-2024 15:13:07  Elapsed: 5 secs
    ---------------------------------------
    Found 0 potential problem(s) and 0 warning(s)
    
    PL/SQL procedure successfully completed.
    
    Statement processed.
    
    Complete output is in trace file:
    /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_70961_HCHECK.trc
    

