IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    分享某客户存储异常恢复之后oracle故障恢复—ORA-600 4155

    惜分飞发表于 2015-11-20 16:27:24
    love 0

    联系:手机(13429648788) QQ(107644445)QQ咨询惜分飞

    标题:分享某客户存储异常恢复之后oracle故障恢复—ORA-600 4155

    作者:惜分飞©版权所有[未经本人同意,请不得以任何形式转载,否则有进一步追究法律责任的权利.]

    某客户使用win 2003,Oracle 11.2.0.1+ASM架构方式,由于存储异常并且做了存储恢复之后,ASM可以正常mount起来,但是数据库无法打开
    使用dbv检查system发现有少量坏块

    DBVERIFY - 开始验证: FILE = +DATA/xifenfei/datafile/system.256.764288125
    页 3117 标记为损坏
    Corrupt block relative dba: 0x00400c2d (file 1, block 3117)
    Bad header found during dbv: 
    Data in bad block:
     type: 11 format: 2 rdba: 0x00400001
     last change scn: 0x0000.00000000 seq: 0x1 flg: 0x04
     spare1: 0x0 spare2: 0x0 spare3: 0x0
     consistency value in tail: 0x00000b01
     check value in block header: 0xfeec
     computed block checksum: 0x0
    
    Corrupt block relative dba: 0x0042002d (file 1, block 131117)
    Bad header found during dbv: 
    Data in bad block:
     type: 11 format: 2 rdba: 0x00400001
     last change scn: 0x0000.00000000 seq: 0x1 flg: 0x04
     spare1: 0x0 spare2: 0x0 spare3: 0x0
     consistency value in tail: 0x00000b01
     check value in block header: 0xfeec
     computed block checksum: 0x0
    
    Corrupt block relative dba: 0x0042003d (file 1, block 131133)
    Bad header found during dbv: 
    Data in bad block:
     type: 11 format: 2 rdba: 0x00400001
     last change scn: 0x0000.00000000 seq: 0x1 flg: 0x04
     spare1: 0x0 spare2: 0x0 spare3: 0x0
     consistency value in tail: 0x00000b01
     check value in block header: 0xfeec
     computed block checksum: 0x0
    
    
    DBVERIFY - 验证完成
    
    检查的页总数: 222208
    处理的页总数 (数据): 188939
    失败的页总数 (数据): 19
    处理的页总数 (索引): 17375
    失败的页总数 (索引): 0
    处理的页总数 (其他): 3190
    处理的总页数 (段)  : 1
    失败的总页数 (段)  : 0
    空的页总数: 12701
    标记为损坏的总页数: 3
    流入的页总数: 0
    加密的总页数        : 0
    最高块 SCN            : 0 (0.0)
    

    很多”页 131125 失败, 校验代码为 6125″类似错误忽略.
    我们对于这些坏块进行分析,这些坏块未涉及oracle 最核心的基表数据,从理论上可以open数据库

    尝试打开数据库

    ALTER DATABASE OPEN
    Beginning crash recovery of 1 threads
     parallel recovery started with 32 processes
    Started redo scan
    Mon Nov 16 14:12:45 2015
    NOTE: dependency between database xifenfei and diskgroup resource ora.DATA.dg is established
    Errors in file d:\oracle\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_5672.trc  (incident=937262):
    ORA-00353: 日志损坏接近块 12509 更改 14199034494312 时间 02/05/2015 03:09:12
    ORA-00312: 联机日志 2 线程 1: '+DATA/xifenfei/onlinelog/group_2.265.764288315'
    ORA-00312: 联机日志 2 线程 1: '+DATA/xifenfei/onlinelog/group_2.264.764288315'
    Incident details in: d:\oracle\diag\rdbms\xifenfei\xifenfei\incident\incdir_937262\xifenfei_ora_5672_i937262.trc
    Media Recovery failed with error 399
    ORA-355 signalled during: ALTER DATABASE RECOVER  DATABASE  ...
    

    可以确定存储恢复的redo有问题(ORA-00353,ORA-00312),数据库无法直接打开

    使用参数_allow_resetlogs_corruption屏蔽redo异常resetlogs库

    Mon Nov 16 15:26:41 2015
    alter database open resetlogs
    Mon Nov 16 15:26:41 2015
    Starting background process ASMB
    Mon Nov 16 15:26:41 2015
    ASMB started with pid=25, OS id=6612 
    Starting background process RBAL
    Mon Nov 16 15:26:41 2015
    RBAL started with pid=26, OS id=6940 
    NOTE: initiating MARK startup 
    Starting background process MARK
    Mon Nov 16 15:26:41 2015
    MARK started with pid=27, OS id=1720 
    NOTE: MARK has subscribed 
    NOTE: Loaded library: System 
    SUCCESS: diskgroup DATA was mounted
    RESETLOGS is being done without consistancy checks. This may result
    in a corrupted database. The database should be recreated.
    Mon Nov 16 15:26:44 2015
    NOTE: dependency between database xifenfei and diskgroup resource ora.DATA.dg is established
    Archived Log entry 1 added for thread 1 sequence 1390429 ID 0xf6320db5 dest 1:
    Archived Log entry 2 added for thread 1 sequence 1390427 ID 0xf6320db5 dest 1:
    ARCH: Log corruption near block 16350 change 14199035261082 time ?
    CORRUPTION DETECTED: thread 1 sequence 1390428 log 3 at block 16350. Arch found corrupt blocks
    Errors in file d:\oracle\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_8132.trc  (incident=951271):
    ORA-00353: 日志损坏接近块 16350 更改 14199035261082 时间 02/05/2015 03:12:49
    ORA-00312: 联机日志 3 线程 1: '+DATA/xifenfei/onlinelog/group_3.267.764288315'
    ORA-00312: 联机日志 3 线程 1: '+DATA/xifenfei/onlinelog/group_3.266.764288315'
    Incident details in: d:\oracle\diag\rdbms\xifenfei\xifenfei\incident\incdir_951271\xifenfei_ora_8132_i951271.trc
    Errors in file d:\oracle\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_8132.trc:
    ORA-00354: 损坏重做日志块标头
    ORA-00353: 日志损坏接近块 16350 更改 14199035261082 时间 02/05/2015 03:12:49
    ORA-00312: 联机日志 3 线程 1: '+DATA/xifenfei/onlinelog/group_3.267.764288315'
    ORA-00312: 联机日志 3 线程 1: '+DATA/xifenfei/onlinelog/group_3.266.764288315'
    ARCH: All Archive destinations made inactive due to error 354
    Committing creation of archivelog '+DATA/xifenfei/archivelog/2015_11_16/thread_1_seq_1390428.276.895937207' (error 354)
    Deleted Oracle managed file +DATA/xifenfei/archivelog/2015_11_16/thread_1_seq_1390428.276.895937207
    ******************************************************
    Detected premature EOF of log 3 at block 16350; re-trying archival
    ******************************************************
    Mon Nov 16 15:26:49 2015
    Sweep [inc][951271]: completed
    Mon Nov 16 15:26:49 2015
    Trace dumping is performing id=[cdmp_20151116152649]
    Archived Log entry 3 added for thread 1 sequence 1390428 ID 0xf6320db5 dest 1:
    RESETLOGS after incomplete recovery UNTIL CHANGE 14199033899179
    Resetting resetlogs activation ID 4130475445 (0xf6320db5)
    Errors in file d:\oracle\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_m001_800.trc  (incident=951319):
    ORA-00353: log corruption near block 2270 change 14199035131016 time 02/05/2015 03:12:58
    ORA-00334: archived log: '+DATA/xifenfei/archivelog/2015_11_16/thread_1_seq_1390428.276.895937209'
    Incident details in: d:\oracle\diag\rdbms\xifenfei\xifenfei\incident\incdir_951319\xifenfei_m001_800_i951319.trc
    Errors in file d:\oracle\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_m001_800.trc  (incident=951320):
    ORA-00355: change numbers out of order
    ORA-00353: log corruption near block 2270 change 14199035131016 time 02/05/2015 03:12:58
    ORA-00334: archived log: '+DATA/xifenfei/archivelog/2015_11_16/thread_1_seq_1390428.276.895937209'
    Incident details in: d:\oracle\diag\rdbms\xifenfei\xifenfei\incident\incdir_951320\xifenfei_m001_800_i951320.trc
    Trace dumping is performing id=[cdmp_20151116152651]
    Mon Nov 16 15:26:51 2015
    Sweep [inc][951320]: completed
    Sweep [inc][951319]: completed
    Sweep [inc2][951320]: completed
    Sweep [inc2][951319]: completed
    Sweep [inc2][951271]: completed
    Trace dumping is performing id=[cdmp_20151116152653]
    Checker run found 1 new persistent data failures
    Mon Nov 16 15:26:53 2015
    Setting recovery target incarnation to 2
    Mon Nov 16 15:26:54 2015
    Assigning activation ID 4262085362 (0xfe0a42f2)
    LGWR: STARTING ARCH PROCESSES
    Mon Nov 16 15:26:54 2015
    ARC0 started with pid=29, OS id=2896 
    ARC0: Archival started
    LGWR: STARTING ARCH PROCESSES COMPLETE
    ARC0: STARTING ARCH PROCESSES
    Mon Nov 16 15:26:55 2015
    ARC1 started with pid=30, OS id=1748 
    Mon Nov 16 15:26:55 2015
    ARC2 started with pid=31, OS id=1920 
    ARC1: Archival started
    ARC1: Becoming the 'no FAL' ARCH
    ARC1: Becoming the 'no SRL' ARCH
    Thread 1 opened at log sequence 1
      Current log# 1 seq# 1 mem# 0: +DATA/xifenfei/onlinelog/group_1.262.764288315
      Current log# 1 seq# 1 mem# 1: +DATA/xifenfei/onlinelog/group_1.263.764288315
    Successful open of redo thread 1
    MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
    Mon Nov 16 15:26:55 2015
    SMON: enabling cache recovery
    Mon Nov 16 15:26:55 2015
    ARC3 started with pid=32, OS id=7236 
    ARC2: Archival started
    ARC3: Archival started
    ARC0: STARTING ARCH PROCESSES COMPLETE
    ARC0: Becoming the heartbeat ARCH
    Errors in file d:\oracle\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_8132.trc  (incident=951272):
    ORA-00600: 内部错误代码, 参数: [4155], [], [], [], [], [], [], [], [], [], [], []
    Incident details in: d:\oracle\diag\rdbms\xifenfei\xifenfei\incident\incdir_951272\xifenfei_ora_8132_i951272.trc
    Errors in file d:\oracle\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_8132.trc:
    ORA-00600: 内部错误代码, 参数: [4155], [], [], [], [], [], [], [], [], [], [], []
    Errors in file d:\oracle\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_8132.trc:
    ORA-00600: 内部错误代码, 参数: [4155], [], [], [], [], [], [], [], [], [], [], []
    Error 600 happened during db open, shutting down database
    Trace dumping is performing id=[cdmp_20151116152658]
    USER (ospid: 8132): terminating the instance due to error 600
    Mon Nov 16 15:27:07 2015
    Instance terminated by USER, pid = 8132
    ORA-1092 signalled during: alter database open resetlogs...
    

    在resetlogs过程中由于遭遇了ORA-600[4155]导致数据库无法正常打开.

    分析相关trace文件

    *** 2015-11-16 15:26:56.921
    *** SESSION ID:(145.3) 2015-11-16 15:26:56.921
    *** CLIENT ID:() 2015-11-16 15:26:56.921
    *** SERVICE NAME:(SYS$USERS) 2015-11-16 15:26:56.921
    *** MODULE NAME:(sqlplus.exe) 2015-11-16 15:26:56.921
    *** ACTION NAME:() 2015-11-16 15:26:56.921
     
    Dump continued from file: d:\oracle\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_8132.trc
    ORA-00600: 内部错误代码, 参数: [4155], [], [], [], [], [], [], [], [], [], [], []
    
    ========= Dump for incident 951272 (ORA 600 [4155]) ========
    ----- Beginning of Customized Incident Dump(s) -----
    XID passed in =xid: 0x000b.001.00fcfb45
    XID from Undo block = xid: 0x000b.009.00fcc561
    ----- End of Customized Incident Dump(s) -----
    
    *** 2015-11-16 15:26:57.203
    dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
    ----- Current SQL Statement for this session (sql_id=7j16t46cacjt9) -----
    alter database open resetlogs
    
    ----- Call Stack Trace -----
    calling              call     entry                argument values in hex      
    location             type     point                (? means dubious value)     
    -------------------- -------- -------------------- ----------------------------
    ksedst1()+129        CALL???  skdstdst()           009233DA2 000000000 000000000
                                                       000000000
    ksedst()+69          CALL???  ksedst1()            000000002 000000000 006F605E0
                                                       000000000
    dbkedDefDump()+4536  CALL???  ksedst()             000000287 000000000 000000000
                                                       000000000
    ksedmp()+43          CALL???  dbkedDefDump()       000000003 000000002 000000000
                                                       000468E71
    ksfdmp()+87          CALL???  ksedmp()             000000000 000000000 000000000
                                                       000000000
    dbgexPhaseII()+1819  CALL???  ksfdmp()             000000000 000000000 000000000
                                                       000000000
    dbgexExplicitEndInc  CALL???  dbgexPhaseII()       000000000 000000000 000000000
    ()+755                                             000000000
    dbgeEndDDEInvocatio  CALL???  dbgexExplicitEndInc  00CFB0570 00CFB7540 01F9012D8
    nImpl()+748                   ()                   01F901300
    dbgeEndDDEInvocatio  CALL???  dbgeEndDDEInvocatio  00CFB0570 00CFB7540 01F903130
    n()+47                        nImpl()              00000000A
    ktugce()+610         CALL???  dbgeEndDDEInvocatio  006E24498 01F8FF91E 00000002E
                                  n()                  035636366
    ktdgti()+609         CALL???  ktugce()             000000000 000001F68 000001F68
                                                       000000001
    k2vGetCollectingInf  CALL???  ktdgti()             000000000 008A34E71 01F902470
    o()+324                                            000000018
    k2vcbk()+182         CALL???  k2vGetCollectingInf  000000000 000000000 000000000
                                  o()                  000000008
    kturRecoverTxn()+82  CALL???  k2vcbk()             000000000 FCFB450000000B
    67                                                 000610001 7FFDF055860
    kturRecoverUndoSegm  CALL???  kturRecoverTxn()     01F903448 009460001 000000000
    ent()+1371                                         00147AE14
    ktuiup()+1520        CALL???  kturRecoverUndoSegm  7FF0000000B 01F903628
                                  ent()                000000000 00147AE14
    ktuini()+80          CALL???  ktuiup()             000000001 00000000E 01F9038A0
                                                       003CB3D06
    adbdrv()+44263       CALL???  ktuini()             000000000 01F9090C8 000000000
                                                       000000000
    opiexe()+20842       CALL???  adbdrv()             000000023 000000003
                                                       7FF00000102 000000000
    opiosq0()+5129       CALL???  opiexe()+16981       000000004 000000000 01F90A8E0
                                                       009361AB3
    kpooprx()+357        CALL???  opiosq0()            000000003 00000000E 01F90ABB0
                                                       0000000A4
    kpoal8()+940         CALL???  kpooprx()            000020C80 01E65CCD0 00CE91AD8
                                                       000000001
    opiodr()+1662        CALL???  kpoal8()             00000005E 00000001C 01F90E120
                                                       00A4EF224
    ttcpip()+1325        CALL???  opiodr()             480000000000005E
                                                       49004D000000001C 01F90E120
                                                       4100200000000000
    opitsk()+2040        CALL???  ttcpip()             01E735200 000000000 000000000
                                                       000000000
    opiino()+1258        CALL???  opitsk()             00000001E 000000000 000000000
                                                       01F90FA18
    opiodr()+1662        CALL???  opiino()             00000003C 000000004 01F90FAD0
                                                       000000000
    opidrv()+864         CALL???  opiodr()             00000003C 000000004 01F90FAD0
                                                       6F5C3A6400000000
    sou2o()+98           CALL???  opidrv()+150         00000003C 000000004 01F90FAD0
                                                       000000000
    opimai_real()+158    CALL???  sou2o()              01F90FB00 01F90FBC4
                                                       F0010000B07DF 1009C002B0019
    opimai()+191         CALL???  opimai_real()        00000001B 01F90FC88 000000036
                                                       000000000
    OracleThreadStart()  CALL???  opimai()             01F90FE90 01F60FF38 000000002
    +724                                               01F90FC88
    0000000078D3B6DA     CALL???  OracleThreadStart()  01F60FF38 000000000 000000000
                                                       01F90FFA8
     
    
    --------------------- Binary Stack Dump ---------------------
    
    UNDO BLK:  
    xid: 0x000b.009.00fcc561  seq: 0x513d cnt: 0x2f  irb: 0x2f  icl: 0x0   flg: 0x0000
     
     Rec Offset      Rec Offset      Rec Offset      Rec Offset      Rec Offset
    ---------------------------------------------------------------------------
    0x01 0x1f64     0x02 0x1ee4     0x03 0x1e30     0x04 0x1d7c     0x05 0x1cc8     
    0x06 0x1c14     0x07 0x1b60     0x08 0x1aac     0x09 0x19f8     0x0a 0x1944     
    0x0b 0x1890     0x0c 0x17dc     0x0d 0x1728     0x0e 0x1674     0x0f 0x15c0     
    0x10 0x150c     0x11 0x1458     0x12 0x13a4     0x13 0x12f0     0x14 0x123c     
    0x15 0x1188     0x16 0x10d4     0x17 0x1050     0x18 0x0fd0     0x19 0x0f1c     
    0x1a 0x0e98     0x1b 0x0de4     0x1c 0x0d30     0x1d 0x0c7c     0x1e 0x0bc8     
    0x1f 0x0b44     0x20 0x0ac4     0x21 0x0a10     0x22 0x095c     0x23 0x08a8     
    0x24 0x07f4     0x25 0x0770     0x26 0x06f0     0x27 0x063c     0x28 0x0588     
    0x29 0x04d4     0x2a 0x0420     0x2b 0x039c     0x2c 0x031c     0x2d 0x0298     
    0x2e 0x0218     0x2f 0x0164     
    

    ORA-600 4155是由于在恢复过程中发现事务的id和undo segment中的事务表中id不匹配从而出现此类问题.针对此问题,可以通过bbed修改事务表记录,或者直接丢弃该事务,从而绕过该错误.

    处理掉异常事务id后,继续open库

    Mon Nov 16 16:16:07 2015
    ALTER DATABASE OPEN
    Beginning crash recovery of 1 threads
     parallel recovery started with 32 processes
    Started redo scan
    Completed redo scan
     read 8 KB redo, 0 data blocks need recovery
    Started redo application at
     Thread 1: logseq 1, block 2, scn 14199161880578
    Recovery of Online Redo Log: Thread 1 Group 1 Seq 1 Reading mem 0
      Mem# 0: +DATA/xifenfei/onlinelog/group_1.262.764288315
      Mem# 1: +DATA/xifenfei/onlinelog/group_1.263.764288315
    Completed redo application of 0.00MB
    Completed crash recovery at
     Thread 1: logseq 1, block 19, scn 14199161900601
     0 data blocks read, 0 data blocks written, 8 redo k-bytes read
    Current SCN is not changed: _minimum_giga_scn (scn 14199161880576) is too small
    Mon Nov 16 16:16:08 2015
    LGWR: STARTING ARCH PROCESSES
    Mon Nov 16 16:16:08 2015
    ARC0 started with pid=62, OS id=7612 
    ARC0: Archival started
    LGWR: STARTING ARCH PROCESSES COMPLETE
    ARC0: STARTING ARCH PROCESSES
    Mon Nov 16 16:16:09 2015
    ARC1 started with pid=63, OS id=5620 
    Mon Nov 16 16:16:09 2015
    ARC2 started with pid=64, OS id=7308 
    ARC1: Archival started
    ARC1: Becoming the 'no FAL' ARCH
    ARC1: Becoming the 'no SRL' ARCH
    Thread 1 advanced to log sequence 2 (thread open)
    Thread 1 opened at log sequence 2
      Current log# 2 seq# 2 mem# 0: +DATA/xifenfei/onlinelog/group_2.264.764288315
      Current log# 2 seq# 2 mem# 1: +DATA/xifenfei/onlinelog/group_2.265.764288315
    Successful open of redo thread 1
    Archived Log entry 1 added for thread 1 sequence 1 ID 0xfe09f6df dest 1:
    MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
    Mon Nov 16 16:16:10 2015
    SMON: enabling cache recovery
    Dictionary check beginning
    Tablespace 'TEMP' #3 found in data dictionary,
    but not in the controlfile. Adding to controlfile.
    Dictionary check complete
    Verifying file header compatibility for 11g tablespace encryption..
    Verifying 11g file header compatibility for tablespace encryption completed
    SMON: enabling tx recovery
    *********************************************************************
    WARNING: The following temporary tablespaces contain no files.
             This condition can occur when a backup controlfile has
    Mon Nov 16 16:16:09 2015
    ARC3 started with pid=65, OS id=7064 
             been restored.  It may be necessary to add files to these
    ARC2: Archival started
             tablespaces.  That can be done using the SQL statement:
    ARC3: Archival started
     
    ARC0: STARTING ARCH PROCESSES COMPLETE
             ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
    ARC0: Becoming the heartbeat ARCH
     
             Alternatively, if these temporary tablespaces are no longer
             needed, then they can be dropped.
               Empty temporary tablespace: TEMP
    *********************************************************************
    Database Characterset is ZHS16GBK
    No Resource Manager plan active
    **********************************************************
    WARNING: Files may exists in db_recovery_file_dest
    that are not known to the database. Use the RMAN command
    CATALOG RECOVERY AREA to re-catalog any such files.
    If files cannot be cataloged, then manually delete them
    using OS command.
    One of the following events caused this:
    1. A backup controlfile was restored.
    2. A standby controlfile was restored.
    3. The controlfile was re-created.
    4. db_recovery_file_dest had previously been enabled and
       then disabled.
    **********************************************************
    replication_dependency_tracking turned off (no async multimaster replication found)
    WARNING: AQ_TM_PROCESSES is set to 0. System operation                     might be adversely affected.
    LOGSTDBY: Validating controlfile with logical metadata
    LOGSTDBY: Validation complete
    Completed: ALTER DATABASE OPEN
    

    因为该数据库是经过了存储恢复,除system之外,其他文件也有大量坏块,因为恢复过程相对比较麻烦,除了上面列出来的ORA-00353,ORA-00312,ORA-600 4155等各种错误之外,还有大量的ORA-01578,ORA-01110.由于11G比较常见的ORA-00283,ORA-16433问题。

    • ORA-600[2037]与ORA-07445[kcbs_dump_adv_state]错误
    • ORA-600 [LibraryCacheNotEmptyOnClose] on shutdown
    • dbca创建数据库报ORA-00443
    • 一起ORA-600 3020故障恢复的大体思路
    • 分享一例由于主库逻辑坏块导致dataguard容灾失效
    • truncate table强制终止导致ORA-00600[ktspfundo-2]
    • 通过多次resetlogs规避类似ORA-01248: file N was created in the future of incomplete recovery错误
    • 某医院存储掉线导致Oracle数据库故障恢复
    • 重建控制文件丢失undo异常恢复—ORA-600 25025模拟与恢复
    • ORA-600 k2vcbk_2 故障恢复
    • ORA-00600[kgscLogOff-notempty]
    • 处理fast_recovery_area无剩余空间案例


沪ICP备19023445号-2号
友情链接