IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    某客户一套15TB的数据库恢复小记

    roger发表于 2016-03-02 15:36:19
    love 0

    本站文章除注明转载外,均为本站原创: 转载自love wife & love life —Roger 的Oracle技术博客

    本文链接地址: 某客户一套15TB的数据库恢复小记

    该客户数据库在春节之前就出现故障,后面经过多人尝试恢复后,均为打开数据库;数据库在open时报如下错误:

    Wed Jan 13 17:03:25 2016
    ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, Query Duration=1452675805 sec, SCN: 0x0d6a.46c6524f):
    Wed Jan 13 17:03:25 2016
    select ctime, mtime, stime from obj$ where obj# = :1
    Wed Jan 13 17:03:25 2016
    Errors in file /u01/app/oracle/admin/xxxx/udump/xxxx1_ora_18274.trc:
    ORA-00704: bootstrap process failure
    ORA-00704: bootstrap process failure
    ORA-00604: error occurred at recursive SQL level 1
    ORA-01555: snapshot too old: rollback segment number 20 with name "_SYSSMU20$" too small
    Error 704 happened during db open, shutting down database
    USER: terminating instance due to error 704
    Instance terminated by USER, pid = 18274
    ORA-1092 signalled during: alter database open resetlogs...
    Wed Jan 13 17:06:34 2016

    该错误其实很场景,也恢复过太多这种情况了,这里不再过多描述。不过这里让我感觉很诧异的是该SQL的Query Duration 太大了。
    根据经验,这种情况下可以直接推进SCN。可是当我们进行如下操作,发现不起作用:

    alter session set events ’10015 trace name adjust_scn level 13740′;

    进一步通过10046 trace跟踪发现该sql访问了如下几个block:

    PARSING IN CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=1422682994194207 hv=429618617 ad='395fa870'
    select ctime, mtime, stime from obj$ where obj# = :1
    END OF STMT
    PARSE #5:c=0,e=225,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1422682994194205
    BINDS #5:
    kkscoacd
     Bind#0
      oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
      oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
      kxsbbbfp=2b7694a4aea8  bln=22  avl=02  flg=05
      value=20
    EXEC #5:c=0,e=398,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1422682994194653
    WAIT #5: nam='db file sequential read' ela= 20378 file#=1 block#=218 blocks=1 obj#=-1 tim=1422682994215120
    WAIT #5: nam='db file sequential read' ela= 480   file#=1 block#=219 blocks=1 obj#=-1 tim=1422682994215712
    WAIT #5: nam='db file sequential read' ela= 18990 file#=1 block#=122 blocks=1 obj#=-1 tim=1422682994234841
    。。。。。。。
    EXEC #6:c=0,e=141,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=3,tim=1422682994267351
    FETCH #6:c=0,e=34,p=0,cr=2,cu=0,mis=0,r=1,dep=2,og=3,tim=1422682994267413
    STAT #6 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=0 pw=0 time=28 us)'
    STAT #6 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=0 pw=0 time=16 us)'
    WAIT #5: nam='db file sequential read' ela= 12312 file#=7 block#=4993 blocks=1 obj#=-1 tim=1422682994279882
    WAIT #5: nam='db file sequential read' ela= 18776 file#=7 block#=4965 blocks=1 obj#=-1 tim=1422682994298789
    WAIT #5: nam='db file sequential read' ela= 13157 file#=7 block#=4801 blocks=1 obj#=-1 tim=1422682994312081
    WAIT #5: nam='db file sequential read' ela= 12519 file#=7 block#=4954 blocks=1 obj#=-1 tim=1422682994324726
    WAIT #5: nam='db file sequential read' ela= 410 file#=7 block#=4952 blocks=1 obj#=-1 tim=1422682994325259
    WAIT #5: nam='db file sequential read' ela= 5447 file#=7 block#=4778 blocks=1 obj#=-1 tim=1422682994330830
    WAIT #5: nam='db file sequential read' ela= 12349 file#=7 block#=5184 blocks=1 obj#=-1 tim=1422682994343291
    WAIT #5: nam='db file sequential read' ela= 11874 file#=5 block#=8645 blocks=1 obj#=-1 tim=1422682994355283
    WAIT #5: nam='db file sequential read' ela= 4925 file#=5 block#=8595 blocks=1 obj#=-1 tim=1422682994360323
    FETCH #5:c=4999,e=165865,p=15,cr=18,cu=0,mis=0,r=0,dep=1,og=4,tim=1422682994360535
    ORA-00704: bootstrap process failure
    ORA-00704: bootstrap process failure
    ORA-00604: error occurred at recursive SQL level 1
    ORA-01555: snapshot too old: rollback segment number 20 with name "_SYSSMU20$" too small

    通过分别dump上述block,我们发现file 1 block 122 有点小问题,如下:

     seg/obj: 0x12  csc: 0xd6c.1abf4d18  itc: 1  flg: -  typ: 1 - DATA
         fsl: 0  fnx: 0x0 ver: 0x01
    
     Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
    0x01   0x003f.02e.001cc47d  0x01c012f4.8562.29  --U-    1  fsc 0x0000.1abf4d19
    
    data_block_dump,data header at 0x8b3da44
    ===============
    tsiz: 0x1fb8
    hsiz: 0xea
    pbl: 0x08b3da44
    bdba: 0x0040007a
         76543210
    flag=--------
    ntab=1
    nrow=108

    通过脚本将该block copy到文件系统,bbed进行修改之后,再copy回asm diskgroup。
    接着再次进行scn的推进,可以很顺利打开数据库。

    这里需要注意的是,虽然打开了数据库,但是后面还有很多善后处理工作,比如我们dbv发现undo有坏块,那么就需要重建undo;同时检查alert log是否伴随其他的错误。

    其次,对于强制open的数据库,我们建议通过mos脚本检查下数据字典是否存在异常;如果数据字典有明显异常,那么通常是需要通过逻辑导出来重建数据库的;否则一般不需要重建库。

    对于是否需要重建库,我认为没有定论,安全起见是通常是建议重建数据库;或者数据库很小的时候也可以考虑重建;否则通过检查数据库告警,或者数据库运行一段时间没有其他异常,那么完全可以不重建数据库。

    Related posts:

    1. 非归档恢复的一个模拟例子
    2. 某客户的5TB RAC 恢复小记
    3. 11.2.0.4 ASM RAC 恢复一个例子
    4. 某大学的数据库恢复过程
    5. 某网友的数据库TB 数据库恢复


沪ICP备19023445号-2号
友情链接