
    Hardware failure causing ORA-01242, ORA-01122 and related errors

    Posted by 惜分飞 on 2024-08-20 10:46:00



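    Author: 惜分飞 © All rights reserved [No reproduction in any form without the author's consent; the author reserves the right to pursue legal action.]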

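    A customer runs a multi-node RAC. In the morning they reported that the instances on two nodes had failed and asked for a root-cause analysis. The database alert log on one of those nodes shows that access to datafile 1399 failed with ORA-01242, ORA-01122 and related errors, which crashed the instance: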

    Mon Aug 19 20:48:02 2024
    Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
    Rereading datafile 1399 header failed with ORA-01207
    Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
    ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
    ORA-01122: database file 1399 failed verification check
    ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
    ORA-01207: file is more recent than control file - old control file
    Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
    ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
    ORA-01122: database file 1399 failed verification check
    ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
    ORA-01207: file is more recent than control file - old control file
    CKPT (ospid: 75582): terminating the instance due to error 1242
    Mon Aug 19 20:48:02 2024
    System state dump requested by (instance=6, osid=75582 (CKPT)), summary=[abnormal instance termination].
    System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_diag_75520.trc
    Termination issued to instance processes. Waiting for the processes to exit
    Mon Aug 19 20:48:13 2024
    ORA-1092 : opitsk aborting process
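
    When an alert log is long, it can help to tally the ORA- codes and find which background process terminated the instance. The following is a minimal Python sketch, not part of the original analysis; the function name `summarize_alert` and both regular expressions are assumptions written against the line formats shown in the excerpt above.

```python
import re
from collections import Counter

# Matches "ORA-01242" as well as the unpadded "ORA-1092 :" form seen above.
ORA_RE = re.compile(r"\bORA-(\d{1,5})\b")
# Matches lines such as:
#   CKPT (ospid: 75582): terminating the instance due to error 1242
TERMINATE_RE = re.compile(
    r"^(\w+) \(ospid: (\d+)\): terminating the instance due to error (\d+)"
)


def summarize_alert(lines):
    """Return (Counter of ORA codes, list of (process, ospid, error))."""
    codes = Counter()
    terminations = []
    for raw in lines:
        line = raw.strip()
        for num in ORA_RE.findall(line):
            # Normalize to the five-digit form, so ORA-1092 becomes ORA-01092.
            codes[f"ORA-{int(num):05d}"] += 1
        match = TERMINATE_RE.match(line)
        if match:
            terminations.append(
                (match.group(1), int(match.group(2)), int(match.group(3)))
            )
    return codes, terminations


if __name__ == "__main__":
    # A few lines copied from the alert log excerpt above.
    sample = [
        "Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207",
        "ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode",
        "ORA-01122: database file 1399 failed verification check",
        "CKPT (ospid: 75582): terminating the instance due to error 1242",
        "ORA-1092 : opitsk aborting process",
    ]
    codes, terminations = summarize_alert(sample)
    print(dict(codes))
    print(terminations)
```

    In practice the function would be fed the lines of the real alert log file. The sample list only reuses lines from the excerpt.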
    

    Further down the log, the cluster tried to restart the instance, but startup failed with ORA-01186 and ORA-01122:

    ALTER DATABASE OPEN /* db agent *//* {0:6:39} */
    Mon Aug 19 20:49:34 2024
    SUCCESS: diskgroup DATA was mounted
    Mon Aug 19 20:49:34 2024
    NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is established
    Mon Aug 19 20:50:41 2024
    Picked broadcast on commit scheme to generate SCNs
    Mon Aug 19 20:50:42 2024
    Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
    Rereading datafile 1399 header failed with ORA-01207
    Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_dbw0_29208.trc:
    ORA-01186: file 1399 failed verification tests
    ORA-01122: database file 1399 failed verification check
    ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
    ORA-01207: file is more recent than control file - old control file
    File 1399 not verified due to error ORA-01122
    

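    These errors indicate a failure accessing the datafile. In my experience, this kind of problem is usually caused by a fault in the underlying layers. The system messages log does show disk hardware errors: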

    Aug 19 20:41:58 xff6 fcoemon: FC_HOST_EVENT 6894 at 1724071318 secs on host1:code 65535=vendor_unique datalen 32 data=512
    Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]  
    Aug 19 20:41:58 xff6 kernel: Sense Key : Recovered Error [current] 
    Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]  
    Aug 19 20:41:58 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
    Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]  
    Aug 19 20:42:03 xff6 kernel: Sense Key : Recovered Error [current] 
    Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]  
    Aug 19 20:42:03 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
    Aug 19 20:42:03 xff6 fcoemon: FC_HOST_EVENT 6895 at 1724071323 secs on host1:code 65535=vendor_unique datalen 32 data=512
    Aug 19 20:42:07 xff6 fcoemon: FC_HOST_EVENT 6896 at 1724071327 secs on host1:code 65535=vendor_unique datalen 32 data=512
    Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]  
    Aug 19 20:42:07 xff6 kernel: Sense Key : Recovered Error [current] 
    Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]  
    Aug 19 20:42:07 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
    Aug 19 20:42:12 xff6 fcoemon: FC_HOST_EVENT 6897 at 1724071332 secs on host1:code 65535=vendor_unique datalen 32 data=512
    Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]  
    Aug 19 20:42:12 xff6 kernel: Sense Key : Recovered Error [current] 
    Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]  
    Aug 19 20:42:12 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
    Aug 19 20:42:25 xff6 fcoemon: FC_HOST_EVENT 6898 at 1724071345 secs on host1:code 65535=vendor_unique datalen 32 data=512
    Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]  
    Aug 19 20:42:25 xff6 kernel: Sense Key : Recovered Error [current] 
    Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]  
    Aug 19 20:42:25 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
    Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6899 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
    Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]  
    Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
    Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]  
    Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
    Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6900 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
    Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
    Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
    Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
    Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
    Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
    Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
    Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
    Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
    Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6901 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
    Aug 19 20:42:53 xff6 fcoemon: FC_HOST_EVENT 6902 at 1724071373 secs on host1:code 65535=vendor_unique datalen 32 data=512
    Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]  
    Aug 19 20:42:53 xff6 kernel: Sense Key : Recovered Error [current] 
    Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]  
    Aug 19 20:42:53 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
    Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]  
    Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current] 
    Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]  
    Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
    Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6903 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
    Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6904 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
    Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6905 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
    Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]  
    Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current] 
    Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]  
    Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
    Aug 19 20:49:26 xff6 kernel: scsi_verify_blk_ioctl: 683 callbacks suppressed
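
    To see which devices reported sense errors and when, the kernel lines can be grouped per device. This is a minimal Python sketch under stated assumptions: the function name `sense_events_by_device` is hypothetical, and it assumes each "Sense Key" line belongs to the most recent `sd H:C:T:L: [device]` line before it, as in the excerpt above. If messages from several devices interleave, that attribution may be wrong.

```python
import re
from collections import defaultdict

# e.g. "... kernel: sd 1:0:0:43: [sdas]"
DEVICE_RE = re.compile(r"kernel: sd (\d+:\d+:\d+:\d+): \[(\w+)\]")
# e.g. "... kernel: Sense Key : Recovered Error [current]"
SENSE_RE = re.compile(r"kernel: Sense Key : (.+?) \[current\]")


def sense_events_by_device(lines):
    """Map device name -> list of (timestamp, sense key) tuples."""
    events = defaultdict(list)
    last_device = None
    for line in lines:
        device = DEVICE_RE.search(line)
        if device:
            # Remember the device so the next sense line can be attributed.
            last_device = device.group(2)
            continue
        sense = SENSE_RE.search(line)
        if sense and last_device is not None:
            timestamp = line[:15]  # syslog prefix such as "Aug 19 20:41:58"
            events[last_device].append((timestamp, sense.group(1)))
    return dict(events)


if __name__ == "__main__":
    # Lines copied from the messages log excerpt above.
    sample = [
        "Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]  ",
        "Aug 19 20:41:58 xff6 kernel: Sense Key : Recovered Error [current] ",
        "Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]  ",
        "Aug 19 20:41:58 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1",
        "Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]  ",
        "Aug 19 20:42:07 xff6 kernel: Sense Key : Recovered Error [current] ",
    ]
    for device, items in sense_events_by_device(sample).items():
        print(device, items)
```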
    

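    The customer's further analysis found that a disk in the storage array had failed the previous day and a hot spare had taken over. For reasons that are still unclear, file access errors followed, possibly related to the rebuild that was in progress at the time. Because this is a RAC environment and some of the other nodes were still running normally, starting the database directly on the failed nodes succeeded.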
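
    The timing in the two logs supports the link between the storage events and the crash: the sense errors in the messages excerpt run from 20:41:58 to 20:43:03, and CKPT terminated the instance at 20:48:02. The Python sketch below works out those gaps. The helper names are hypothetical, the year for the syslog timestamps is supplied by hand because syslog lines do not carry one, and parsing the month and weekday names assumes an English (C) locale.

```python
from datetime import datetime


def parse_alert_ts(text):
    """Parse an alert log timestamp such as 'Mon Aug 19 20:48:02 2024'."""
    return datetime.strptime(text, "%a %b %d %H:%M:%S %Y")


def parse_syslog_ts(text, year):
    """Parse a syslog timestamp such as 'Aug 19 20:41:58' for a given year."""
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def seconds_between(earlier, later):
    """Elapsed whole seconds from earlier to later."""
    return int((later - earlier).total_seconds())


if __name__ == "__main__":
    # Timestamps taken from the messages log and alert log excerpts.
    first_sense = parse_syslog_ts("Aug 19 20:41:58", 2024)
    last_sense = parse_syslog_ts("Aug 19 20:43:03", 2024)
    crash = parse_alert_ts("Mon Aug 19 20:48:02 2024")

    print(seconds_between(first_sense, crash))  # 364
    print(seconds_between(last_sense, crash))   # 299
```

    So the instance went down roughly five to six minutes after the sense errors in the excerpt.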


    Writes on the node then failed with ORA-01187: cannot read from file because it failed verification tests.

    After ALTER SYSTEM CHECK DATAFILES was run on every node, all nodes operated normally.
