IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    ORA-01171: datafile N going offline due to error advancing checkpoint

    惜分飞发表于 2025-05-30 14:31:29
    love 0

    联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

    标题:ORA-01171: datafile N going offline due to error advancing checkpoint

    作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

    最近接到一个客户有一个数据文件offline的恢复咨询,通过分析日志,当时是由于在启动的时候数据文件被占用导致后续数据库open之后,该文件被强制offline掉

    Fri May 16 20:01:05 2025
    Database mounted in Exclusive Mode
    Completed: ALTER DATABASE   MOUNT
    Fri May 16 20:01:05 2025
    ALTER DATABASE OPEN
    Fri May 16 20:01:06 2025
    LGWR: STARTING ARCH PROCESSES
    ARC0 started with pid=70, OS id=4628
    Fri May 16 20:01:06 2025
    ARC0: Archival started
    ARC1 started with pid=74, OS id=4840
    Fri May 16 20:01:06 2025
    ARC1: Archival started
    LGWR: STARTING ARCH PROCESSES COMPLETE
    Fri May 16 20:01:06 2025
    Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_lgwr_4080.trc:
    ORA-01110: data file 14: 'D:\ORADATA\XIFENFEI105_DAT_1.DBF'
    ORA-01114: IO error writing block to file 14 (block # 1)
    ORA-27041: unable to open file
    OSD-04002: 无法打开文件
    O/S-Error: (OS 32) 另一个程序正在使用此文件,进程无法访问。
    
    Thread 1 opened at log sequence 172421
      Current log# 1 seq# 172421 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO01.LOG
    Fri May 16 20:01:06 2025
    ARC1: STARTING ARCH PROCESSES
    Fri May 16 20:01:06 2025
    Successful open of redo thread 1
    Fri May 16 20:01:06 2025
    ARC0: Becoming the 'no FAL' ARCH
    ARC0: Becoming the 'no SRL' ARCH
    Fri May 16 20:01:06 2025
    ARC2: Archival started
    ARC1: STARTING ARCH PROCESSES COMPLETE
    ARC2 started with pid=78, OS id=4056
    Fri May 16 20:01:06 2025
    ARC1: Becoming the heartbeat ARCH
    Fri May 16 20:01:06 2025
    MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
    Fri May 16 20:01:06 2025
    SMON: enabling cache recovery
    Fri May 16 20:01:07 2025
    Successfully onlined Undo Tablespace 1.
    Fri May 16 20:01:07 2025
    SMON: enabling tx recovery
    Fri May 16 20:01:08 2025
    Database Characterset is ZHS16GBK
    replication_dependency_tracking turned off (no async multimaster replication found)
    Starting background process QMNC
    QMNC started with pid=86, OS id=4492
    Fri May 16 20:01:12 2025
    db_recovery_file_dest_size of 51200 MB is 1.97% used. This is a
    user-specified limit on the amount of space that will be used by this
    database for recovery-related files, and does not reflect the amount of
    space available in the underlying filesystem or ASM diskgroup.
    Fri May 16 20:01:13 2025
    Completed: ALTER DATABASE OPEN
    Fri May 16 20:06:44 2025
    Restarting dead background process MMON
    MMON started with pid=98, OS id=4232
    Fri May 16 20:07:06 2025
    Shutting down archive processes
    Fri May 16 20:07:11 2025
    ARCH shutting down
    ARC2: Archival stopped
    Fri May 16 20:10:32 2025
    Thread 1 advanced to log sequence 172422
      Current log# 2 seq# 172422 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO02.LOG
    Fri May 16 20:15:33 2025
    Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_ckpt_2496.trc:
    ORA-01171: datafile 14 going offline due to error advancing checkpoint
    ORA-01122: database file 14 failed verification check
    ORA-01110: data file 14: 'D:\ORADATA\XIFENFEI105_DAT_1.DBF'
    ORA-01208: data file is an old version - not accessing current version
    
    Fri May 16 20:23:09 2025
    Starting background process EMN0
    EMN0 started with pid=82, OS id=2660
    

    通过dbv检查报错文件,确认被offline文件本身正常
    dbv


    本身这个故障相对比较简单,只要归档存在直接recover datafile,然后online即可,但是由于备份软件定时工作,导致对应的归档被备份走

    Fri May 16 21:55:10 2025
    Control autobackup written to SBT_TAPE device
    	comment 'API Version 2.0,MMS Version 10.0.0.116',
    	media 'V_6746190_6959024'
    	handle 'c-1300253653-20250516-00'
    Fri May 16 21:56:03 2025
    Thread 1 cannot allocate new log, sequence 172423
    Private strand flush not complete
      Current log# 2 seq# 172422 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO02.LOG
    

    而且被异常的数据文件不是核心业务文件,导致客户没有及时发现,等到发现之时尝试recover datafile,提示缺少归档

    Wed May 28 17:26:01 2025
    alter database recover datafile list clear
    Wed May 28 17:26:01 2025
    Completed: alter database recover datafile list clear
    Wed May 28 17:26:01 2025
    alter database recover if needed
     datafile 14
    
    Media Recovery Start
     parallel recovery started with 16 processes
    ORA-279 signalled during: alter database recover if needed
     datafile 14
    ...
    Wed May 28 17:26:11 2025
    alter database recover cancel
    Wed May 28 17:26:13 2025
    Media Recovery Canceled
    Completed: alter database recover cancel
    Wed May 28 17:38:58 2025
    ALTER DATABASE RECOVER  datafile 'D:\ORADATA\XIFENFEI105_DAT_1.DBF'  
    Wed May 28 17:38:58 2025
    Media Recovery Start
     parallel recovery started with 16 processes
    ORA-279 signalled during: ALTER DATABASE RECOVER  datafile 'D:\ORADATA\XIFENFEI105_DAT_1.DBF'  ...
    Wed May 28 18:26:37 2025
    ALTER DATABASE RECOVER    CONTINUE DEFAULT  
    Wed May 28 18:26:38 2025
    Media Recovery Log D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2025_05_28\O1_MF_1_172421_%U_.ARC
    Errors with log D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2025_05_28\O1_MF_1_172421_%U_.ARC
    ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
    Wed May 28 18:26:38 2025
    ALTER DATABASE RECOVER    CONTINUE DEFAULT  
    Wed May 28 18:26:38 2025
    Media Recovery Log D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2025_05_28\O1_MF_1_172421_%U_.ARC
    Errors with log D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2025_05_28\O1_MF_1_172421_%U_.ARC
    ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
    Wed May 28 18:26:38 2025
    ALTER DATABASE RECOVER CANCEL 
    Wed May 28 18:26:40 2025
    Media Recovery Canceled
    Completed: ALTER DATABASE RECOVER CANCEL 
    

    这个客户运气还不错,带库中的需要恢复的归档日志都还在,通过指定带库通道,直接recover datafile成功

    RUN {
      ALLOCATE CHANNEL ch1 DEVICE TYPE 'sbt_tape' 
      PARMS="BLKSIZE=262144,ENV=(CV_mmsApiVsn=2,CV_channelPar=ch1)";
      ALLOCATE CHANNEL ch2 DEVICE TYPE 'sbt_tape' 
      PARMS="BLKSIZE=262144,ENV=(CV_mmsApiVsn=2,CV_channelPar=ch2)";
     recover datafile 14;
    }
    

    rec
    ok


    至此完美解决该问题,通过这个case,的出来的经验有:
    1. 数据库重启之后,要检查数据库日志和查询数据库数据文件状态(主要防止一些不太常用的文件异常,不能及时发现)
    2. 需要需要数据库的基本情况,比如备份,容灾,asm磁盘组冗余,存储冗余,网络冗余等情况,这样出现问题好排查解决

    • 删除redo导致ORA-00313 ORA-00312故障处理
    • linux rm -rf 删除数据文件恢复
    • .pzpq扩展名勒索恢复
    • 不当使用_allow_resetlogs_corruption参数引起ORA-600 2662错误
    • pg误删除数据恢复(PostgreSQL delete数据恢复)
    • 可能是 tempdb 空间用尽或某个系统表不一致故障处理
    • fio测试io,导致磁盘文件系统损坏故障恢复
    • 中文环境显示AR8MSWIN1256(阿拉伯语字符集)
    • Oracle Recovery Tools工具一键解决ORA-00376 ORA-01110故障(文件offline)
    • 近1万个数据文件的恢复case
    • 19c非归档数据库断电导致ORA-00742故障恢复
    • Oracle Recovery Tools修复ORA-600 6101/kdxlin:psno out of range故障


沪ICP备19023445号-2号
友情链接