In this Document
Applies to:
Oracle Database – Enterprise Edition – Version 11.1.0.7 to 11.1.0.7 [Release 11.1]
Information in this document applies to any platform.
***Checked for relevance on 26-Nov-2012***
Symptoms
Alert log keeps reporting:
Sat Feb 21 19:23:25 2009
Errors in file b:\oracle\diag\rdbms\<dbs>\<sid>\trace\<sid>_smon_4364.trc (incident=692119):
ORA-00600: internal error code, arguments: [ktecgeb-2], [1572], [0], [], [], [], [], [], [], [],[], []
Non-fatal internal error happenned while SMON was doing temporary segment drop.
Because SMON crashed, the database crashed and will crash again during database startup.
Stacktrace of SMON is like:
… <- ktecgeb <- kteopshrink <- ktssdrbm_segment <- ktssdro_segment <- ktssdt_segs <- ktmmon <- …
Cause
There is a problem during the cleanup of temporary segment(s).
SMON is responsible for the cleanup of temporary segments, and in this specific case the bitmap of the temporary segment(s) is corrupt.
The likely cause of the error is:
Oracle 11.1 and 11.2:
Bug 9495481 – ORA-600 [KTECGEB-2] WHILE SMON WAS DOING TEMPORARY SEGMENT DROP
Oracle 10.2:
Unpublished Bug 5743780 – ORA-600 [KTECGEB-2] BAD EXTENT MAP BLOCK IN ASSM APPLYING 9I REDO IN 10.2
The error occurred at the moment 9i redo was applied in a 10.2/11.1 RDBMS (the issue was hit during the upgrade of the database).
Solution
The solution of the problem consists of 2 actions:
a) actions to prevent the problem from reoccurring
b) resolving the current problem
a) actions to prevent the problem from reoccurring
Bug 9495481, referred to above, has been logged to address the problem as hit in Oracle 11.1.0.7.0 (but is also applicable to 11.2).
Reviewing this bug shows that the error can occur in Oracle 10.2 as well, where it is addressed by unpublished Bug 5743780.
The error occurs when 9i redo is applied against a 10.2/11.1/11.2 version of the database, which will/might be the case during an upgrade.
In case of an upgrade the action would be to ensure that the version being upgraded to contains the fix for this issue before the upgrade is done (see the check sketched after this list):
- in case of Oracle 10.2 install patchset 10.2.0.5.0
- in case of Oracle 11.1 install one-off fix for Bug 9495481 (on top of 11.1.0.7)
- in case of Oracle 11.2 install patchset 11.2.0.3.0
The fix is included in 12.1
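Before upgrading (or before restarting after a repair) you can verify the exact RDBMS version and whether the one-off fix is already present in the Oracle home. This check is not part of the original note; it is a minimal sketch, and the exact opatch options available depend on your OPatch version:
$> sqlplus /nolog << EOF
connect / as sysdba
select banner from v\$version;
EOF
$> $ORACLE_HOME/OPatch/opatch lsinventory -bugs_fixed | grep 9495481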
If redoing the upgrade is not an option, the action would be to resolve the current problem and install the fix afterwards.
b) resolving the current problem:
- thorough diagnosis of the problem as hit ==> ensuring that the repair steps are the correct ones
- repair actions
DIAGNOSE
=========
The 1st action is to start up the database without SMON trying to clean up the temporary segment(s):
==> only meant to be used for this diagnose/repair action!!
A PFILE will be used for all repair actions, which makes it easy to back out all needed parameters.
$> cp $ORACLE_HOME/dbs/init$ORACLE_SID.ora /tmp
OR
$> sqlplus /nolog << EOF
connect / as sysdba
startup mount;
create pfile='/tmp/init$ORACLE_SID.ora' from spfile;
shutdown immediate
EOF
Add the following parameter to the PFILE in order to prevent SMON from cleaning up temporary segments:
event="10061 trace name context forever, level 10"
Startup the database in restricted mode using the PFILE:
$> sqlplus /nolog << EOF
connect / as sysdba
startup restrict pfile=/tmp/init$ORACLE_SID.ora
EOF
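Optionally, verify that event 10061 has been picked up at instance level. This check is not part of the original note; it is a minimal sketch using oradebug (SYSDBA connection required), and event 10061 should appear in the output:
SQL> oradebug setmypid
SQL> oradebug eventdump system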
Check which temporary segments are present in the database:
$> sqlplus /nolog << EOF
connect / as sysdba
set pagesize 9999
select * from dba_segments where segment_type='TEMPORARY';
EOF
Outcome of the above will be something like (output trimmed to the relevant columns):
SEGMENT_NAME  SEGMENT_TYPE  TABLESPACE_NAME  HEADER_FILE  HEADER_BLOCK       BYTES   BLOCKS  EXTENTS
------------  ------------  ---------------  -----------  ------------  ----------  -------  -------
8.140238      TEMPORARY     DISCH_SUM                  8        140238  8471969792  1034176    16159
8.148174      TEMPORARY     DISCH_SUM                  8        148174  2564292608   313024     4891
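The default 'select *' output contains many more columns than shown above; a query selecting only the relevant columns (not part of the original steps, shown here for convenience only) keeps the output readable:
$> sqlplus /nolog << EOF
connect / as sysdba
set pagesize 9999 linesize 200
select segment_name, segment_type, tablespace_name,
       header_file, header_block, bytes, blocks, extents
from dba_segments
where segment_type='TEMPORARY';
EOF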
At this point we know which temporary segment is possibly causing a problem and in which tablespace the segment is located.
Rechecking the SMON tracefile to see which blocks were last visited shows:
…
1: waited for 'db file sequential read'
file#=8, block#=223ce, blocks=1 <…
==> block# 223ce is hexadecimal; 0x223ce = 140238 decimal, so the likely segment causing the problem in this case is 8.140238
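The hex-to-decimal conversion can also be done in SQL (a convenience query, not part of the original note):
SQL> select to_number('223ce','xxxxx') as block_dec from dual;

 BLOCK_DEC
----------
    140238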
Check the tablespace with dbms_space_admin to dump the bitmap:
$> sqlplus /nolog << EOF
connect / as sysdba
alter session set tracefile_identifier='dbms_space_admin';
exec dbms_space_admin.assm_tablespace_verify(upper('&tablespace_name'),20);
EOF
*tracefile_identifier is used to easily identify the tracefile as created
The above command has created a tracefile located in user_dump_dest; check the tracefile for errors
(depending on the number of objects it might be a large tracefile, but you can use the block# collected earlier to quickly search for the temporary segments, as shown below).
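To quickly locate the relevant blocks in the (potentially large) tracefile, a simple search can be used. The sketch below assumes the tracefile name contains the tracefile_identifier set earlier and uses the block number from this example; adapt both to your environment:
$> cd <user_dump_dest>
$> grep -n "140238" *dbms_space_admin*.trc
$> grep -Ein "error|corrupt" *dbms_space_admin*.trc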
Be aware of possible falsely reported errors, for instance Bug 7665990 which incorrectly reports:
The datablock [dba: 0x002001bcf, (file 8, block 7119)] has wrong L1 parent [dba: 0x000, (file 0, block 0)]
Example of tracefile:
*** 2009-02-23 11:31:17.716
Examining segment with segment header (8,78)
*********
verifying extent map and tablespace bitmap consistency
———
Verifying consistency of segment header, L3, L2 and L1 bitmap blocks
Verifying datablocks and L1 BMB consistency
———
Verifying Segment HWM Consistency
Verifying High HWM information
Verifying Low HWM information
———
Verifying consistency of hwm information in segment header and L1 bitmap block
———
Verifying extent map and auxilliary extent map consistency in the segment
Verifying extent map and L1 bitmap blocks consistency in the segment
———
Verifying segment header and seg$ information consistency of the segment
———
The temporary segments will NOT have an entry in the tracefile as created!
The tablespace check was performed to verify whether any errors were present, and thus to exclude other issues.
A rebuild of the bitmap of the tablespace will not solve this issue.
REPAIR
======
Before doing any destructive action, always make sure that there is a valid backup to rely on, which can be restored if needed.
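As an illustration only (not part of the original note), a basic whole-database backup with RMAN could look like the sketch below; adapt it to your own backup strategy, and note that backing up an open database requires ARCHIVELOG mode:
$> rman target /
RMAN> backup database plus archivelog;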
In order to resolve this problem we need to:
- find the relative_fno of the datafile in question
- corrupt the segment(s)
- drop the segment(s)
- rebuild the bitmap of the tablespace
- recheck the bitmap of the tablespace
- check if temporary segments have been cleaned up
- restart the database without event 10061
ActionPlan:
find the relative_fno
SQL> select relative_fno from dba_data_files where file_id=&file_id;
corrupt segment(s)
SQL> exec dbms_space_admin.segment_corrupt(upper('&tablespace_name'),&header_relative_file,&header_block,3);
…
drop segment(s)
SQL> exec dbms_space_admin.segment_drop_corrupt(upper('&tablespace_name'),&header_relative_file,&header_block);
…
rebuild the bitmap
SQL> exec dbms_space_admin.tablespace_rebuild_bitmaps(upper('&tablespace_name'));
==> check the tracefile as created!!
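recheck the bitmap of the tablespace
(this step is listed in the plan above but not spelled out; the sketch below simply re-uses the same verify call as in the DIAGNOSE section)
SQL> exec dbms_space_admin.assm_tablespace_verify(upper('&tablespace_name'),20);
==> check the tracefile as created for any remaining errors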
check if temporary segments are all cleaned up:
SQL> select * from dba_segments where segment_type='TEMPORARY';
restart the database without event 10061 so without PFILE:
SQL> shutdown immediate
SQL> startup
System is up and running again!
*In the specific TAR both temporary segments have been dropped using the above steps; according to the SMON tracefile the problem was with segment 8.140238.
I advise dropping only the identified segment and then checking whether the problem has been resolved; SMON will clean up the other segment automatically if it is not corrupt.
Here I took the safer approach: used select owner,segment_name from dba_segments where segment_type='TEMPORARY'; to find the corresponding files, then had the application move the contents out of that tablespace and dropped the tablespace.
Finally the database was restarted with event 10061 removed, and the database recovered.