本站文章除注明转载外,均为本站原创: 转载自love wife & love life —Roger 的Oracle技术博客
刚到酒店,就接到客户电话说某数据库的一个数据文件报IO错误,通过vpn查看发现如下:
Sun Oct 30 23:19:27 BEIST 2016 Trace dumping is performing id=[cdmp_20161030231927] Sun Oct 30 23:19:27 BEIST 2016 Errors in file /oracle/app/10.2/admin/xxxx/bdump/xxxx2_smon_11863216.trc: ORA-00376: file 595 cannot be read at this time ORA-01110: data file 595: '/dev/rdata05vg_8g_48' Sun Oct 30 23:19:31 BEIST 2016 ORACLE Instance xxxx2 (pid = 22) - Error 376 encountered while recovering transaction (160, 1). Sun Oct 30 23:19:31 BEIST 2016 Errors in file /oracle/app/10.2/admin/xxxx/bdump/xxxx2_smon_11863216.trc: ORA-00376: file 595 cannot be read at this time ORA-01110: data file 595: '/dev/rdata05vg_8g_48' Sun Oct 30 23:19:32 BEIST 2016 Errors in file /oracle/app/10.2/admin/xxxx/bdump/xxxx2_smon_11863216.trc: ORA-00376: file 595 cannot be read at this time ORA-01110: data file 595: '/dev/rdata05vg_8g_48' Sun Oct 30 23:19:33 BEIST 2016
我们不难看出,报错文件无法读;实际上我登录2个节点ls -tr检查发现权限都是正确的,同时通过dbv 检查该文件发现也无坏块;因此我断定直接recover该文件即可。在recover时,发现居然报错nfs挂载有问题:
SQL> recover datafile 595; ORA-00279: change 15125505612642 generated at 10/30/2016 18:06:07 needed for thread 1 ORA-00289: suggestion : /arch2/1_108445_815539661.dbf ORA-00280: change 15125505612642 for thread 1 is in sequence #108445 Specify log: {<RET>=suggested | filename | AUTO | CANCEL} /arch1/1_108445_815539661.dbf ORA-00308: cannot open archived log '/arch1/1_108445_815539661.dbf' ORA-27054: NFS file system where the file is created or resides is not mounted with correct options Additional information: 6
由此可见该环境问题还不少。既然本地节点无法读取,为了短时间内恢复正常,直接将部分归档cp到相应的归档目录中,再次进行recover即可成功online该文件,如下:
Sun Oct 30 23:20:23 BEIST 2016 alter database datafile 595 online Sun Oct 30 23:20:23 BEIST 2016 Completed: alter database datafile 595 online Sun Oct 30 23:20:28 BEIST 2016 SMON: Parallel transaction recovery tried Sun Oct 30 23:23:02 BEIST 2016 Thread 2 advanced to log sequence 164113 (LGWR switch) Current log# 7 seq# 164113 mem# 0: /dev/rora_redo2_01
我们可以看到,确实顺利online文件了,还好是归档的数据库。那么我们继续来分析一下,为什么会出现这个问题呢? 进一步搜索alert log发现该在下午18:15分开始出现I欧错误:
Sun Oct 30 18:15:04 BEIST 2016 KCF: write/open error block=0x29790 online=1 file=595 /dev/rdata05vg_8g_48 error=27063 txt: 'IBM AIX RISC System/6000 Error: 5: I/O error Additional information: -1 Additional information: 131072' Automatic datafile offline due to write error on file 595: /dev/rdata05vg_8g_48 Sun Oct 30 18:15:28 BEIST 2016 Thread 2 advanced to log sequence 164100 (LGWR switch) Current log# 12 seq# 164100 mem# 0: /dev/rora_redo2_06 Sun Oct 30 18:15:28 BEIST 2016 Errors in file /oracle/app/10.2/admin/xxxx/udump/xxxx2_ora_28705020.trc: ORA-00372: file 595 cannot be modified at this time ORA-01110: data file 595: '/dev/rdata05vg_8g_48' ORA-00372: file 595 cannot be modified at this time ORA-01110: data file 595: '/dev/rdata05vg_8g_48'
我们可以看到,因为出现错误,Oracle自动将数据文件offline了,这其实是数据库的一直保护机制(没有相关隐含参数来控制)。到这里我怀疑多半是操作系统哪儿出问题了,果然errpt 查看发现在18:15出现了path error错误。
oracle:xxx$(/oracle)errpt IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION 3D32B80D 1030181516 P S topsvcs NIM thread blocked 3D32B80D 1030181516 P S topsvcs NIM thread blocked E86653C3 1030181516 P H LVDD I/O ERROR DETECTED BY LVM B6267342 1030181516 P H hdisk46 DISK OPERATION ERROR DE3B8540 1030181516 P H hdisk46 PATH HAS FAILED DE3B8540 1030181416 P H hdisk46 PATH HAS FAILED oracle:xxxx$(/oracle/app/10.2/admin/xxxx/bdump)errpt -aj DE3B8540 --------------------------------------------------------------------------- LABEL: SC_DISK_ERR7 IDENTIFIER: DE3B8540 Date/Time: Sun Oct 30 18:15:00 BEIST 2016 Sequence Number: 921 Machine Id: 00F7A4904C00 Node Id: sti50l02 Class: H Type: PERM WPAR: Global Resource Name: hdisk46 Resource Class: disk Resource Type: Hitachi ...... ...... Description PATH HAS FAILED Probable Causes ADAPTER HARDWARE OR CABLE DASD DEVICE Failure Causes UNDETERMINED Recommended Actions PERFORM PROBLEM DETERMINATION PROCEDURES CHECK PATH Detail Data PATH ID
不难看出,由于下午出现了相关错误,导致数据库出现了IO异常,oracle自动将文件offline了。然而我在刚刚lspath检查发现都ok,实际上也应该这样,否则recover datafile还会继续报IO错误。夜深了,到这里结束吧!简单记录一下!
PS:
1) NFS挂载的相关参数说明
Operating System | Mount options for Binaries ## | Mount options for Oracle Datafiles | Mount options for CRS Voting Disk and OCR |
Sun Solaris * | rw,bg,hard,nointr,rsize=32768, wsize=32768,proto=tcp,noac, vers=3,suid |
rw,bg,hard,nointr,rsize=32768, wsize=32768,proto=tcp,noac, forcedirectio, vers=3 |
rw,bg,hard,nointr,rsize=32768, wsize=32768,proto=tcp,vers=3, noac,forcedirectio |
AIX (5L) ** | rw,bg,hard,nointr,rsize=32768, wsize=32768,proto=tcp, vers=3,timeo=600 |
cio,rw,bg,hard,nointr,rsize=32768, wsize=32768,proto=tcp,noac, vers=3,timeo=600 |
cio,rw,bg,hard,intr,rsize=32768, wsize=32768,tcp,noac, vers=3,timeo=600 |
HPUX 11.23 *** – | rw,bg,vers=3,proto=tcp,noac, hard,nointr,timeo=600, rsize=32768,wsize=32768,suid |
rw,bg,vers=3,proto=tcp,noac, forcedirectio,hard,nointr,timeo=600, rsize=32768,wsize=32768 |
rw,bg,vers=3,proto=tcp,noac, forcedirectio,hard,nointr,timeo=600 ,rsize=32768,wsize=32768 |
Windows | Not Supported | Not Supported | Not Supported |
Linux x86 # **** |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp, vers=3, timeo=600, actimeo=0 |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,actimeo=0, vers=3,timeo=600 |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,noac,actimeo=0, vers=3,timeo=600 |
Linux x86-64 # **** |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,vers=3, timeo=600, actimeo=0 |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,actimeo=0, vers=3,timeo=600 |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,noac,vers=3, timeo=600,actimeo=0 |
Linux – Itanium | rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,vers=3, timeo=600, actimeo=0 |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,actimeo=0, vers=3,timeo=600 |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,noac,vers=3, timeo=600,actimeo=0 |
* NFS mount option “forcedirectio” is required on Solaris platforms when mounting the OCR/CRS files when using Oracle 10.1.0.4 or 10.2.0.2 or later (Oracle unpublished bug 4466428)
** AIX is only supported with NAS on AIX 5.3 TL04 and higher with Oracle 10.2.0.1 and later
*** NAS devices are only supported with HPUX 11.23 or higher ONLY
# These mount options are for Linux kernels 2.6 and above. For older kernels please check Note 279393.1
## The stated mount options for binaries are applicable only if the ORACLE HOME is shared.
2) 如果是Oracle 11.2.0.2版本开始,在没有安装Patch 7691270的情况之下,Oracle在遇到IO错误之后,会自动将数据库crash掉,其中有个相关的隐含参数:_datafile_write_errors_crash_instance
该参数在11.2.0.2版本之后默认为true,包含最新的11.2.0.4版本。
Enter value for par: datafile_write old 3: WHERE x.indx = y.indx AND x.ksppinm LIKE '%&par%' new 3: WHERE x.indx = y.indx AND x.ksppinm LIKE '%datafile_write%' NAME VALUE DESCRIB -------------------------------------------------- -------------------- ------------------------------------------------------------ _datafile_write_errors_crash_instance TRUE datafile write errors crash instance
Related posts: