本站文章除注明转载外,均为本站原创: 转载自love wife & love life —Roger 的Oracle技术博客
本文链接地址: oracle 9201 for windows 恢复一例
这是最近我们处理的某客户的数据恢复案例,客户环境为windows,Oracle 版本为9201。由于磁盘故障最后导致数据库无法打开。经过了解才知道,该客户在6月份就出故障了,期间找过多个厂家进行恢复,均未解决问题。最后客户通过很早之前的一个exp 备份恢复了业务,虽然目前业务已经正常,然而差了半年的数据,因此仍然需要进行恢复。
我们首先来看下数据库的告警日志信息,如下所示:
Tue May 26 18:00:18 2015 Database Characterset is US7ASCII replication_dependency_tracking turned off (no async multimaster replication found) Completed: ALTER DATABASE OPEN Wed May 27 07:59:36 2015 Thread 1 advanced to log sequence 1059 Current log# 3 seq# 1059 mem# 0: D:\ORACLE\ORADATA\SJYYKF\REDO03.LOG Wed May 27 09:39:21 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_5276.trc: Wed May 27 09:49:09 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_5212.trc: ORA-00600: internal error code, arguments: [qerrmOFBu1], [1013], [], [], [], [], [], [] Wed May 27 09:49:11 2015 Error 3120 trapped in 2PC on transaction 3.44.42846. Cleaning up. Error stack returned to user: ORA-03120: two-task conversion routine: integer overflow ORA-02063: preceding line from CTAIS Wed May 27 10:16:49 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_5212.trc: ORA-00600: internal error code, arguments: [729], [12224], [space leak], [], [], [], [], [] Wed May 27 10:16:49 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_5212.trc: ORA-00600: internal error code, arguments: [ncodeffsheap], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [729], [12224], [space leak], [], [], [], [], [] Wed May 27 11:40:16 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_2640.trc: ORA-24343: user defined callback error ORA-02063: preceding line from CTAIS Wed May 27 11:48:35 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_6096.trc: ORA-24343: user defined callback error ORA-02063: preceding line from CTAIS Wed May 27 13:57:57 2015 Error 2068 trapped in 2PC on transaction 7.42.40674. Cleaning up. Error stack returned to user: ORA-02068: following severe error from CTAIS ORA-03113: end-of-file on communication channel Wed May 27 17:42:48 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_5604.trc: ORA-02068: following severe error from CTAIS ORA-03113: end-of-file on communication channel 。。。。。 Wed May 27 18:00:18 2015 Database Characterset is US7ASCII replication_dependency_tracking turned off (no async multimaster replication found) Completed: ALTER DATABASE OPEN Thu May 28 06:29:49 2015 Thread 1 advanced to log sequence 1060 Current log# 1 seq# 1060 mem# 0: D:\ORACLE\ORADATA\SJYYKF\REDO01.LOG Thu May 28 10:59:05 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_5364.trc: ORA-00600: internal error code, arguments: [qerrmOFBu1], [1013], [], [], [], [], [], [] Thu May 28 11:28:03 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_3340.trc: ORA-00600: internal error code, arguments: [qerrmOFBu1], [1013], [], [], [], [], [], [] 。。。。。 。。。。。 Mon Jun 01 14:43:50 2015 Error 2068 trapped in 2PC on transaction 6.24.44274. Cleaning up. Error stack returned to user: ORA-02068: following severe error from CTAIS ORA-03113: end-of-file on communication channel Mon Jun 01 16:52:00 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_5424.trc: ORA-00600: internal error code, arguments: [qerrmOFBu1], [1013], [], [], [], [], [], [] Mon Jun 01 16:52:01 2015 Error 3106 trapped in 2PC on transaction 1.25.40549. Cleaning up. Error stack returned to user: ORA-03106: fatal two-task communication protocol error ORA-02063: preceding line from CTAIS Tue Jun 02 02:52:39 2015 Thread 1 advanced to log sequence 1064 Current log# 2 seq# 1064 mem# 0: D:\ORACLE\ORADATA\SJYYKF\REDO02.LOG Tue Jun 02 09:10:36 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_4596.trc: ORA-02068: following severe error from CTAIS ORA-03113: end-of-file on communication channel Tue Jun 02 09:16:01 2015 Error 2068 trapped in 2PC on transaction 6.22.44357. Cleaning up. Error stack returned to user: ORA-02068: following severe error from CTAIS ORA-03113: end-of-file on communication channel Shutting down instance: further logons disabled Shutting down instance (immediate)
从上面的信息可以看出,客户的环境其实在5月份就出现过多次问题,alert log报了不少的错误,其中关于ORA-00600 [qerrmOFBu1]这个错误,根据mos 文档来看,极有可能是9201版本的bug。另外我们还可以看到存在ORA-00600 [729] 错误,该错误通常跟内存泄露有关,根据这一点也可以大致判断,客户的数据库环境内存可能不足。
事实证明,后面到现场了解情况发现,该数据库环境在虚拟机环境中,CPU和内存都比较少
我们继续来看最近一次的故障信息,如下:
Tue Jun 02 18:00:21 2015 SMON: enabling cache recovery Tue Jun 02 18:00:22 2015 Undo Segment 1 Onlined Undo Segment 2 Onlined Undo Segment 3 Onlined Undo Segment 4 Onlined Undo Segment 5 Onlined Undo Segment 6 Onlined Undo Segment 7 Onlined Undo Segment 8 Onlined Undo Segment 9 Onlined Undo Segment 10 Onlined Successfully onlined Undo Tablespace 1. Tue Jun 02 18:00:22 2015 SMON: enabling tx recovery Tue Jun 02 18:00:22 2015 Database Characterset is US7ASCII replication_dependency_tracking turned off (no async multimaster replication found) Completed: ALTER DATABASE OPEN *** Corrupt block relative dba: 0x01058531 (file 4, block 361777) Fractured block found during buffer read Data in bad block - type: 6 format: 2 rdba: 0x01058531 last change scn: 0x0d21.3c28cf0a seq: 0x2 flg: 0x04 consistency value in tail: 0x480a0601 check value in block header: 0xd46e, computed block checksum: 0x8703 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** Reread of rdba: 0x01058531 (file 4, block 361777) found same corrupted data *** Corrupt block relative dba: 0x010683c9 (file 4, block 426953) Bad check value found during buffer read Data in bad block - type: 6 format: 2 rdba: 0x010683c9 last change scn: 0x0d21.b84eab23 seq: 0x2 flg: 0x04 consistency value in tail: 0xab230602 check value in block header: 0xca31, computed block checksum: 0x3454 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** Reread of rdba: 0x010683c9 (file 4, block 426953) found same corrupted data *** Corrupt block relative dba: 0x010fdf29 (file 4, block 1040169) Bad check value found during buffer read Data in bad block - type: 6 format: 2 rdba: 0x010fdf29 last change scn: 0x0d21.b851943f seq: 0x2 flg: 0x04 consistency value in tail: 0x943f0602 check value in block header: 0x92bd, computed block checksum: 0x300a spare1: 0x0, spare2: 0x0, spare3: 0x0 *** Reread of rdba: 0x010fdf29 (file 4, block 1040169) found same corrupted data *** Corrupt block relative dba: 0x010a4782 (file 4, block 673666) Fractured block found during buffer read Data in bad block - type: 6 format: 2 rdba: 0x010a4782 last change scn: 0x0d21.4c552c11 seq: 0x2 flg: 0x04 consistency value in tail: 0xeb550602 check value in block header: 0xb524, computed block checksum: 0xc744 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** Reread of rdba: 0x010a4782 (file 4, block 673666) found same corrupted data Wed Jun 03 05:22:16 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_4916.trc: ORA-00600: internal error code, arguments: [rworupo.1], [2560], [60], [], [], [], [], [] *** Corrupt block relative dba: 0x01051131 (file 4, block 332081) Fractured block found during buffer read Data in bad block - type: 6 format: 2 rdba: 0x01051131 last change scn: 0x0d21.3c252e20 seq: 0x2 flg: 0x04 consistency value in tail: 0x32840602 check value in block header: 0xcfd5, computed block checksum: 0x1ca4 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** Reread of rdba: 0x01051131 (file 4, block 332081) found same corrupted data Wed Jun 03 05:22:56 2015 Thread 1 advanced to log sequence 1065 Current log# 3 seq# 1065 mem# 0: D:\ORACLE\ORADATA\SJYYKF\REDO03.LOG Wed Jun 03 07:12:17 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_4748.trc: ORA-00600: internal error code, arguments: [15851], [8], [8], [1], [2], [], [], [] Wed Jun 03 10:00:55 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_2120.trc: *** Corrupt block relative dba: 0x011e6669 (file 4, block 1992297) Bad check value found during buffer read Data in bad block - type: 6 format: 2 rdba: 0x011e6669 last change scn: 0x0d21.e670ecbc seq: 0x2 flg: 0x04 consistency value in tail: 0xecbc0602 check value in block header: 0xc93, computed block checksum: 0x6090 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** Reread of rdba: 0x011e6669 (file 4, block 1992297) found same corrupted data Wed Jun 03 10:15:14 2015 Shutting down instance: further logons disabled Starting ORACLE instance (normal) Dump file d:\oracle\admin\sjyykf\bdump\alert_sjyykf.log Wed Jun 03 15:28:24 2015 ORACLE V9.2.0.1.0 - Production vsnsta=0 vsnsql=12 vsnxtr=3 Windows 2000 Version 5.2 Service Pack 2, CPU type 586 Wed Jun 03 15:28:24 2015 Starting ORACLE instance (normal) 。。。。。。 。。。。。。 Wed Jun 03 17:48:25 2015 Database mounted in Exclusive Mode. Completed: ALTER DATABASE MOUNT Wed Jun 03 17:48:51 2015 ALTER DATABASE RECOVER datafile 2 Wed Jun 03 17:48:51 2015 Media Recovery Datafile: 2 Media Recovery Start WARNING! Recovering data file 2 from a fuzzy backup. It might be an online backup taken without entering the begin backup command. Starting datafile 2 recovery in thread 1 sequence 1064 Datafile 2: 'D:\ORACLE\ORADATA\SJYYKF\UNDOTBS01.DBF' Media Recovery Log Recovery of Online Redo Log: Thread 1 Group 2 Seq 1064 Reading mem 0 Mem# 0 errs 0: D:\ORACLE\ORADATA\SJYYKF\REDO02.LOG *** Corrupt block relative dba: 0x008002c9 (file 2, block 713) Fractured block found during media/instance recovery Data in bad block - type: 2 format: 2 rdba: 0x008002c9 last change scn: 0x0d21.4d3eaf6b seq: 0x24 flg: 0x04 consistency value in tail: 0x2a090207 check value in block header: 0x19e5, computed block checksum: 0xfb49 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** Reread of rdba: 0x008002c9 (file 2, block 713) found same corrupted data Errors with log . Media Recovery failed with error 354 ORA-283 signalled during: ALTER DATABASE RECOVER datafile 2 ... Wed Jun 03 17:50:41 2015 alter database datafile 2 offline Wed Jun 03 17:50:45 2015 ORA-1145 signalled during: alter database datafile 2 offline... Wed Jun 03 17:54:07 2015 alter database datafile 2 offline drop Wed Jun 03 17:54:07 2015 Completed: alter database datafile 2 offline drop Wed Jun 03 17:54:50 2015 alter database datafile 2 offline Wed Jun 03 17:54:50 2015 Completed: alter database datafile 2 offline Wed Jun 03 17:56:01 2015 alter database open Wed Jun 03 17:56:01 2015 Beginning crash recovery of 1 threads Wed Jun 03 17:56:01 2015 Started first pass scan Wed Jun 03 17:56:01 2015 Completed first pass scan 3030 redo blocks read, 797 data blocks need recovery Wed Jun 03 17:56:01 2015 Started recovery at Thread 1: logseq 1065, block 2, scn 3361.3097022559 Recovery of Online Redo Log: Thread 1 Group 3 Seq 1065 Reading mem 0 Mem# 0 errs 0: D:\ORACLE\ORADATA\SJYYKF\REDO03.LOG *** Corrupt block relative dba: 0x01108411 (file 4, block 1082385) Fractured block found during crash/instance recovery Data in bad block - type: 32 format: 2 rdba: 0x01108411 last change scn: 0x0d21.4caf8ef6 seq: 0x2 flg: 0x04 consistency value in tail: 0xa1c72001 check value in block header: 0x8274, computed block checksum: 0x2f32 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** Reread of rdba: 0x01108411 (file 4, block 1082385) found same corrupted data *** Corrupt block relative dba: 0x0119bc11 (file 4, block 1686545) Fractured block found during crash/instance recovery Data in bad block - type: 32 format: 2 rdba: 0x0119bc11 last change scn: 0x0d21.4cc759c7 seq: 0x1 flg: 0x04 consistency value in tail: 0xba332001 check value in block header: 0xdc04, computed block checksum: 0xe3f4 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** Reread of rdba: 0x0119bc11 (file 4, block 1686545) found same corrupted data Wed Jun 03 17:56:02 2015 Ended recovery at Thread 1: logseq 1065, block 3032, scn 3361.3097685436 803 data blocks read, 2 data blocks written, 3030 redo blocks read Crash recovery completed successfully Wed Jun 03 17:56:02 2015 Thread 1 advanced to log sequence 1066 Thread 1 opened at log sequence 1066 Current log# 1 seq# 1066 mem# 0: D:\ORACLE\ORADATA\SJYYKF\REDO01.LOG Successful open of redo thread 1. Wed Jun 03 17:56:02 2015 SMON: enabling cache recovery Wed Jun 03 17:56:02 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_3728.trc: ORA-00600: internal error code, arguments: [4000], [6], [], [], [], [], [], [] Wed Jun 03 17:56:03 2015 Errors in file d:\oracle\admin\sjyykf\udump\sjyykf_ora_3728.trc: ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00600: internal error code, arguments: [4000], [6], [], [], [], [], [], [] Wed Jun 03 17:56:03 2015 Error 704 happened during db open, shutting down database USER: terminating instance due to error 704 Instance terminated by USER, pid = 3728 ORA-1092 signalled during: alter database open... Wed Jun 03 18:06:48 2015 Starting ORACLE instance (normal) LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 SCN scheme 2 Using log_archive_dest parameter default value LICENSE_MAX_USERS = 0 SYS auditing is disabled Starting up ORACLE RDBMS Version: 9.2.0.1.0. System parameters with non-default values:
我们可以发现,出问题之后,有工程师尝试进行了多次恢复,其中包括offline datafile的动作。最后多次打开数据库时都出现ORA-00600 [4000]错误,也正是卡在这个错误上,导致无法进行下一步。 由于这个错误,导致多个厂家都未能解决该问题。 我们首先来看下该错误所对应的trace 文件内容:
** 2015-06-04 11:42:37.468 ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [4000], [6], [], [], [], [], [], [] Current SQL statement for this session: select ctime, mtime, stime from obj$ where obj# = :1 ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- _ksedmp+327 CALLrel _ksedst+0 _ksfdmp.108+14 CALLrel _ksedmp+0 3 _kgeriv+137 CALLreg 00000000 49101D0 3 _kgeasi+189 CALLrel _kgeriv+0 49101D0 341219C FA0 1 490B1D4 _ktudba+541 CALLrel _kgeasi+0 49101D0 341219C FA0 2 1 4 6 __VInfreq__ktrget+1 CALLrel _ktudba+0 6 490B69C 0 0 647 _ktrgtc+339 CALLrel _ktrgcm.98+0 _kdsgrp.51+432 CALLrel _ktrgtc+0 342CAD0 342CA24 599948 490B800 200 598608 599598 _kdsfbrcb+420 CALLrel _kdsgrp.51+0 342CACC 0 342CACC ..1.15_3.filter.81+ CALLrel _kdsfbrcb+0 342CACC 342CE3C 0 1 0 0 949 342CA24 599F28 490B90C 0 _kpofrws+223 CALL??? 00000000 3B79D128 5240BC 490BB74 1 ..1.3_1.filter.30+7 CALLrel _kpofrws+0 343C11C 3B79ECA0 5240BC 42 490BB74 1 _opifch+76 CALLrel _opifch2+0 89 5 490BCAC _opiodr+1398 CALLreg 00000000 5 2 490C3BC _rpidrus.43+153 CALLrel _opiodr+0 5 2 490C3BC 5 _skgmstack+113 CALLreg 00000000 490BF84 _rpidru+109 CALLrel _skgmstack+0 490BF9C 4910058 F618 565D98 490BF84 _rpiswu2+382 CALLreg 00000000 490C2C0 _rpidrv+298 CALLrel _rpiswu2+0 _rpifch+28 CALLrel _rpidrv+0 5 5 490C3BC 8 _kqdpts+178 CALLrel _rpifch+0 5 5 ..1.56_6.filter.13+ CALLrel _kqdpts+0 43 _kqlbplc+113 CALLrel _kqrlfc+0 _kqlblfc+181 CALLrel _kqlbplc+0 0 _adbdrv+10013 CALLrel _kqlblfc+0 0 490D590 490D590 ..1.5_1.filter.29+6 CALLrel _adbdrv+0 55 _opiosq0+2507 CALLrel _opiexe+0 4 0 490D9C8 _kpooprx+236 CALLrel _opiosq0+0 3 E 490DA60 24 _kpoal8+561 CALLrel _kpooprx+0 490E338 490E278 13 1 0 24 _opiodr+1398 CALLreg 00000000 5E 14 490E334 _ttcpip+3024 CALLreg 00000000 5E 14 490E334 0 _opitsk+1907 CALLrel _ttcpip+0 _opiino+1480 CALLrel _opitsk+0 0 0 49164B0 441784C F4 0 _opiodr+1398 CALLreg 00000000 3C 4 490FBD8 _opidrv+563 CALLrel _opiodr+0 3C 4 490FBD8 0 _sou2o+25 CALLrel _opidrv+0 _opimai+266 CALLrel _sou2o+0 _OracleThreadStart@ CALLrel _opimai+0 4+961 7C82482C CALLreg 00000000
我们可以看到,在open 数据库时,Oracle 递归SQL执行失败,导致无法打开数据库。该错误本质上是因为在执行递归SQL的时候需要利用undo来构造一致性读,但是却发现undo存在损坏。由于是非归档环境,因此这里也就无法通过常规手段进行恢复。
该错误中,其中[6] 表示回滚段编号,也就是说访问出错的undo块属于第6号回滚段。
有人或许会说,如果使用隐含参数强制offline 第6号回滚段是否可以打开数据库呢? 实际上这里是不行的。
我们接着来看下该SQL访问的数据块为什么需要构造一致性读呢 ?
Block header dump: 0x0040007a Object id on Block? Y seg/obj: 0x12 csc: 0xd21.e67ade95 itc: 1 flg: - typ: 1 - DATA fsl: 0 fnx: 0x0 ver: 0x01 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x0006.00e.0000ad3c 0x008005d8.1e96.3c --U- 1 fsc 0x0000.e67ade96 data_block_dump,data header at 0x10fc0044 =============== tsiz: 0x1fb8 hsiz: 0xea pbl: 0x10fc0044 bdba: 0x0040007a 76543210 flag=-------- ntab=1 nrow=108 frre=-1 fsbo=0xea fseo=0x110 avsp=0x35c tosp=0x35c 0xe:pti[0] nrow=108 offs=0
从上面信息可以看出,object id=18的对象上,确实存在一个block,事务状态为U,虽然事务已经commit,然而还存在一行锁定记录。Oracle的块清除分为2种:快速块清除和延迟块清除。很明显,这里属于快速块清除的情况,大多数情况之下的数据块清除都是快速清除,因为这样的效率比较高。
这里如果要把数据库open,那么其实通过bbed修改该block的ITL信息,标记事务为提交,同时修改lck等信息即可。
由于我通过DBV 检查主要的业务数据表空间文件时,发现存在上百个坏块,而且DBV检测到一定位置后就终止了,这可能是物理坏道的问题,如下:
D:\oracle\oradata\sjyykf>dbv file=SJYY_DAT.DBF blocksize=8192 DBVERIFY: Release 9.2.0.8.0 - Production on Mon Aug 10 14:31:35 2015 Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved. DBVERIFY - Verification starting : FILE = SJYY_DAT.DBF Page 332081 is influx - most likely media corrupt *** Corrupt block relative dba: 0x01051131 (file 4, block 332081) Fractured block found during dbv: Data in bad block - type: 6 format: 2 rdba: 0x01051131 last change scn: 0x0d21.3c252e20 seq: 0x2 flg: 0x04 consistency value in tail: 0x32840602 check value in block header: 0xcfd5, computed block checksum: 0x1ca4 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** Page 361777 is influx - most likely media corrupt *** Corrupt block relative dba: 0x01058531 (file 4, block 361777) Fractured block found during dbv: Data in bad block - type: 6 format: 2 rdba: 0x01058531 last change scn: 0x0d21.3c28cf0a seq: 0x2 flg: 0x04 consistency value in tail: 0x480a0601 check value in block header: 0xd46e, computed block checksum: 0x8703 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** Page 387785 is marked corrupt *** Corrupt block relative dba: 0x0105eac9 (file 4, block 387785) Bad check value found during dbv: Data in bad block - type: 6 format: 2 rdba: 0x0105eac9 last change scn: 0x0d21.b83e4972 seq: 0x2 flg: 0x04 consistency value in tail: 0x49720602 check value in block header: 0x49ea, computed block checksum: 0xf2c2 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** 。。。。。。。 。。。。。。。 Page 517425 is influx - most likely media corrupt *** Corrupt block relative dba: 0x0107e531 (file 4, block 517425) Fractured block found during dbv: Data in bad block - type: 6 format: 2 rdba: 0x0107e531 last change scn: 0x0d21.b85cf83a seq: 0x1 flg: 0x04 consistency value in tail: 0xf3ca0602 check value in block header: 0x1010, computed block checksum: 0xbf3 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** DBV-00102: File I/O error on FILE (SJYY_DAT.DBF) during verification read operation (-2) D:\oracle\oradata\sjyykf>
同时考虑到数据量不大,不到30G,因此直接通过ODU恢复即可,因为system 检查发现只有1个坏块,数据字典是完好的。
备注:数据库的物理备份是必须的,否则一旦出现问题,将是灾难性的。
Related posts: