环境介绍:11g, linux 11.2.0.1
1. 检查数据库alert日志;
Thu Aug 27 10:28:12 2015
ALTER DATABASE OPEN
Errors in file /oracle/diag/rdbms/testoa/testoa/trace/testoa_dbw0_6392.trc:
ORA-01157: cannot identify/lock data file 3 – see DBWR trace file
ORA-01110: data file 3: '/oracle/oradata/testoa/undotbs01.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /oracle/diag/rdbms/testoa/testoa/trace/testoa_ora_6418.trc:
ORA-01157: cannot identify/lock data file 3 – see DBWR trace file
ORA-01110: data file 3: '/oracle/oradata/testoa/undotbs01.dbf'
ORA-1157 signalled during: ALTER DATABASE OPEN…
从上面看,我们可以发现数据库因UNDO文件丢失而导致数据库不能启动,询问最近重启主机还有就是存储异常,导致。
2、因无备份和归档,尝试加隐藏参数将数据库拉起;
create pfile=’/home/oracle/pfileoa.ora’ from spfile。
在pfileoa.ora中增加如下参数:
rollback_segments = "SYSTEM"
_offline_rollback_segments= "_SYSSMU10_3550978943"
_offline_rollback_segments= "_SYSSMU10_3904554333"
_offline_rollback_segments= "_SYSSMU11_286947212"
_offline_rollback_segments= "_SYSSMU12_3068564564"
_offline_rollback_segments= "_SYSSMU13_2761193625"
_offline_rollback_segments= "_SYSSMU1_3780397527"
_offline_rollback_segments= "_SYSSMU14_2421411996"
_offline_rollback_segments= "_SYSSMU15_1683924174"
_offline_rollback_segments= "_SYSSMU16_2313212396"
_offline_rollback_segments= "_SYSSMU17_2041439332"
_offline_rollback_segments= "_SYSSMU1_783380902"
_offline_rollback_segments= "_SYSSMU18_2800789714"
_offline_rollback_segments= "_SYSSMU19_53723967"
_offline_rollback_segments= "_SYSSMU20_3850939844"
_offline_rollback_segments= "_SYSSMU2_2232571081"
_offline_rollback_segments= "_SYSSMU2_3138176977"
_offline_rollback_segments= "_SYSSMU3_1645411166"
_offline_rollback_segments= "_SYSSMU3_2097677531"
_offline_rollback_segments= "_SYSSMU4_1152005954"
_offline_rollback_segments= "_SYSSMU4_870421980"
_offline_rollback_segments= "_SYSSMU5_1527469038"
_offline_rollback_segments= "_SYSSMU5_2525172762"
_offline_rollback_segments= "_SYSSMU6_2443381498"
_offline_rollback_segments= "_SYSSMU6_3753507049"
_offline_rollback_segments= "_SYSSMU7_1260614213"
_offline_rollback_segments= "_SYSSMU7_3286610060"
_offline_rollback_segments= "_SYSSMU8_2012382730"
_offline_rollback_segments= "_SYSSMU8_2806087761"
_offline_rollback_segments= "_SYSSMU9_1424341975"
_offline_rollback_segments= "_SYSSMU9_973944058"
event = "10513 trace name context forever,level 2"
undo_management = "MANUAL"
3、通过此参数尝试拉起数据库,拉起失败,报如下错误;
Thu Aug 27 14:22:54 2015
SMON: enabling cache recovery
Errors in file /oracle/diag/rdbms/testoa/testoa/trace/testoa_ora_9122.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/oracle/oradata/testoa/undotbs01.dbf'
Errors in file /oracle/diag/rdbms/testoa/testoa/trace/testoa_ora_9122.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/oracle/oradata/testoa/undotbs01.dbf'
Error 704 happened during db open, shutting down database
USER (ospid: 9122): terminating the instance due to error 704
Instance terminated by USER, pid = 9122
从上面的错误,我们可以看到数据库还是去找Undo,通过通过10046跟踪数据库不能启动的原因(我这里同时设置了10200逻辑读跟踪事件,确认问题块)
4、检查trc文件发现system01.dbf的225数据库存在问题,这个块为undo$对应数据块;
5、到此,我们只能通过BBED(oracle内部工具,不建议一般DBA使用,避免引起数据库更大的损害)来修改块,操作前,我已经备份了相应的数据文件到oradata.bak目录;
BBED> set file 1 block 225
FILE# 1
BLOCK# 225
BBED> map /v
File: /oracle/oradata/testoa/system01.dbf (1)
Block: 225 Dba:0x004000e1
————————————————————
KTB Data Block (Table/Cluster)
struct kcbh, 20 bytes @0
ub1 type_kcbh @0
ub1 frmt_kcbh @1
ub1 spare1_kcbh @2
ub1 spare2_kcbh @3
ub4 rdba_kcbh @4
ub4 bas_kcbh @8
ub2 wrp_kcbh @12
ub1 seq_kcbh @14
ub1 flg_kcbh @15
ub2 chkval_kcbh @16
ub2 spare3_kcbh @18
struct ktbbh, 48 bytes @20
ub1 ktbbhtyp @20
union ktbbhsid, 4 bytes @24
struct ktbbhcsc, 8 bytes @28
sb2 ktbbhict @36
ub1 ktbbhflg @38
ub1 ktbbhfsl @39
ub4 ktbbhfnx @40
struct ktbbhitl[1], 24 bytes @44
struct kdbh, 14 bytes @68
ub1 kdbhflag @68
sb1 kdbhntab @69
sb2 kdbhnrow @70
sb2 kdbhfrre @72
sb2 kdbhfsbo @74
sb2 kdbhfseo @76
sb2 kdbhavsp @78
sb2 kdbhtosp @80
struct kdbt[1], 4 bytes @82
sb2 kdbtoffs @82
sb2 kdbtnrow @84
sb2 kdbr[21] @86 ——表示有21条记录,从0开始一直到20,从1开始修改,需要修改20次!!
ub1 freespace[5167] @128
ub1 rowdata[2893] @5295
ub4 tailchk @8188
BBED> p kdbr
sb2 kdbr[0] @86 8078
sb2 kdbr[1] @88 6142
sb2 kdbr[2] @90 6880
sb2 kdbr[3] @92 6947
sb2 kdbr[4] @94 5227
sb2 kdbr[5] @96 7080
sb2 kdbr[6] @98 7147
sb2 kdbr[7] @100 7214
sb2 kdbr[8] @102 7280
sb2 kdbr[9] @104 6008
sb2 kdbr[10] @106 7413
sb2 kdbr[11] @108 5946
sb2 kdbr[12] @110 5880
sb2 kdbr[13] @112 5817
sb2 kdbr[14] @114 5751
sb2 kdbr[15] @116 5685
sb2 kdbr[16] @118 5622
sb2 kdbr[17] @120 5556
sb2 kdbr[18] @122 5490
sb2 kdbr[19] @124 5426
sb2 kdbr[20] @126 5360
BBED> p *kdbr[1]
rowdata[915]
————
ub1 rowdata[915] @6210 0x2c
6、修改好后,尝试通过pfile启动数据库,数据库启动成功;
7、收尾工作;
创建新的undo表空间UNDOTBS02,删除老的UNDOTBS1。
8、检查alert日志,发现有600 [kdsgrp1]错误,检查为坏块导致,检查坏块为索引,对其重建后,日志不再出现600错误。
9、修改数据库的Undo为自动管理,数据库重启,恢复正常。