Database crash caused by alter system set events

Posted by 惜分飞 on 2016-09-05 11:52:29

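While a large volume of data was being imported, the database showed high I/O waits. The new storage could not be attached to the host directly, so the plan was to mount it over NFS and place the redo logs there to relieve the I/O pressure.
Database version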

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Productio
NLSRTL Version 10.2.0.4.0 - Production

Mount options (as shown by the mount command)

10.240.10.1 /top/data4/nfs   /back1            nfs3   
Aug 29 13:40 cio,rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,noac,vers=3,timeo=600

Attempting to create the redo log

SQL> alter database add logfile group 13 ('/back1/newxff/redo13.log') size 2048m;
alter database add logfile group 13 ('/back1/newxff/redo13.log') size 2048m
*
ERROR at line 1:
ORA-00301: error in adding log file '/back1/newxff/redo13.log' - file cannot be
created
ORA-27054: NFS file system where the file is created or resides is not mounted
with correct options
Additional information: 6

Following the MOS note
ORA-27054 ERRORS WHEN RUNNING RMAN WITH NFS (Doc ID 387700.1), set event 10298

SQL> Alter system set events '10298 trace name context forever,level 32';

System altered.

Mon Sep  5 10:10:18 2016
Thread 1 advanced to log sequence 109 (LGWR switch)
  Current log# 1 seq# 109 mem# 0: +DATA/xff/onlinelog/group_1.257.921671023
Mon Sep  5 10:12:19 2016
OS Pid: 160710 executed alter system set events '10298 trace name context forever,level 32'

The redo log is created successfully

SQL> alter database add logfile group 13 ('/back1/newxff/redo13.log') size 2048m;

Database altered.

Mon Sep  5 10:18:13 2016
alter database add logfile group 13 ('/back1/newxff/redo13.log') size 2048m
Mon Sep  5 10:18:43 2016
Completed: alter database add logfile group 13 ('/back1/newxff/redo13.log') size 2048m

The database crashes

Mon Sep  5 10:19:06 2016
Errors in file /opt/oracle/admin/xff/bdump/xff1_lgwr_246566.trc:
ORA-00313: open failed for members of log group 13 of thread 1
ORA-00312: online log 13 thread 1: '/back1/newxff/redo13.log'
ORA-27054: NFS file system where the file is created or resides is not mounted with correct options
Additional information: 6
Mon Sep  5 10:19:06 2016
Errors in file /opt/oracle/admin/xff/bdump/xff1_lgwr_246566.trc:
ORA-00313: open failed for members of log group 13 of thread 1
ORA-00312: online log 13 thread 1: '/back1/newxff/redo13.log'
ORA-27054: NFS file system where the file is created or resides is not mounted with correct options
Additional information: 6
Mon Sep  5 10:19:06 2016
LGWR: terminating instance due to error 313
Mon Sep  5 10:19:06 2016
System state dump is made for local instance
System State dumped to trace file /opt/oracle/admin/xff/bdump/xff1_diag_299654.trc

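The errors make it clear that the instance went down for the same reason the redo log could not be created at first: ORA-27054. So why did creating the redo log succeed while using it failed?
The key is that the command used was alter system set events. It takes effect for the current session and for sessions created afterwards; it does not affect background processes that already exist. That explains the behavior: I created the redo log in the same SQL*Plus session that ran the events command, so the creation succeeded. LGWR, however, has been running since the instance started, so the newly set event did not apply to it. When LGWR tried to open the new redo log, it hit ORA-27054 and terminated the instance. To make an event take effect in processes that already exist, either set it in each process with oradebug (in this case several background processes access the redo logs, not only LGWR, so each of them would need it), or set it through the event= initialization parameter and restart the database.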
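The two alternatives can be sketched roughly as follows. This is a minimal illustration rather than a tested procedure for this system: it assumes a SYSDBA connection in SQL*Plus, <spid> is a placeholder for the OS process ID returned by the query, and only LGWR is shown even though other background processes that open the redo logs would need the same treatment.

```sql
-- Option 1: set event 10298 inside the already running LGWR process.
-- Look up the OS process ID of LGWR first.
select p.spid
  from v$process p, v$bgprocess b
 where p.addr = b.paddr
   and b.name = 'LGWR';

-- Attach oradebug to that OS pid and set the event in that process only.
-- <spid> is a placeholder for the value returned by the query above.
oradebug setospid <spid>
oradebug event 10298 trace name context forever, level 32

-- Option 2: record the event in the spfile, then restart the instance so that
-- every process, background processes included, starts with the event set.
alter system set event='10298 trace name context forever, level 32' scope=spfile;
```

With either option, the event should be in place for the background processes before a redo log group on the NFS mount is added; otherwise the first attempt to use that group can end the same way as above.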