开始本文的前提是你需要有一套正常运行的Oracle Data Guard,在这里,我使用的版本是11g。
一、环境声明
首先,有几个脚本需要说明:
1. 自动根据次数去执行日志切换的脚本
这个脚本稍后会在主库端执行,用以产生大量的归档序列
# make_some_archive_log_gap.sh #代码还没有写完,目前只写了一个需要自动的刷日志切换的方法 archive_log_dir="/u01/app/oracle/product/11.2.0/dbhome_1/dbs" str_thread="1" str_last="_939308144.dbf" duplicate_dir="/home/oracle/arch_log_dup" int_s="1" int_e="$1" while [ $int_s -lt $int_e ] do echo "---------- [$int_s]" sqlplus / as sysdba<
2. 根据日志编号移动归档日志的脚本
该脚本稍后会在备库端执行,用以方便的移动拿到的归档日志,从而手动造成GAP的现象
# file: move_file_by_sequence.sh str_search="$1" source_dir="/u01/app/oracle/dg_stdbylog" target_dir="/home/oracle/arch_log_dup" echo "search string is: [$str_search]" for item in $(ls $source_dir | grep "_"$str_search"_") do echo "----> item: $item" func_full_path="$source_dir/$item" echo "Full path is: [$func_full_path]" echo "Target dir is: [$target_dir]" echo "@ do move" mv $func_full_path $target_dir echo "" done # finished
在开始前,你还可以给自己的主库创建几张大表:
create table big_data as select * from dba_objects insert into big_data select * from big_data
二、关于Data Guard的GAP自动修复
1) 开始手动制造日志序列
在主库端执行手动日志切换的脚本:
sh auto_switch_logfile.sh 320
表示自动执行日志切换【320】次。
2) 在备库端
查看日志应用情况:
SYS@enmy --> / NAME THREAD# SEQUENCE# First Next APPLIED ----------------------------------------------------------------- ---------- ---------- ------------------- ------------------- --------- /u01/app/oracle/dg_archlog/1_1091_928630070.archive 1 1091 2017-03-25 12:21:41 2017-03-25 12:21:42 YES /u01/app/oracle/dg_archlog/1_1092_928630070.archive 1 1092 2017-03-25 12:21:42 2017-03-25 12:21:44 YES /u01/app/oracle/dg_archlog/1_1093_928630070.archive 1 1093 2017-03-25 12:21:44 2017-03-25 12:21:45 YES /u01/app/oracle/dg_archlog/1_1094_928630070.archive 1 1094 2017-03-25 12:21:45 2017-03-25 12:21:46 YES /u01/app/oracle/dg_archlog/1_1095_928630070.archive 1 1095 2017-03-25 12:21:46 2017-03-25 12:21:50 YES /u01/app/oracle/dg_archlog/1_1096_928630070.archive 1 1096 2017-03-25 12:21:50 2017-03-25 12:21:52 YES /u01/app/oracle/dg_archlog/1_1097_928630070.archive 1 1097 2017-03-25 12:21:52 2017-03-25 12:21:53 IN-MEMORY /u01/app/oracle/dg_archlog/1_1098_928630070.archive 1 1098 2017-03-25 12:21:53 2017-03-25 12:21:56 NO /u01/app/oracle/dg_archlog/1_1099_928630070.archive 1 1099 2017-03-25 12:21:56 2017-03-25 12:21:57 NO /u01/app/oracle/dg_archlog/1_1100_928630070.archive 1 1100 2017-03-25 12:21:57 2017-03-25 12:21:58 NO 10 rows selected. SYS@enmy --> SYS@enmy --> list 1* select name,thread#,sequence#,to_char(first_time,'yyyy-mm-dd hh24:mi:ss') "First",to_char(next_time,'yyyy-mm-dd hh24:mi:ss') "Next",applied from v$archived_log,(select max(sequence#) "SEQ#" from v$archived_log where applied='YES') b where sequence# between b.seq#-5 and b.seq#+9 order by sequence# SYS@enmy -->
可以看到,当前SEQUENCE# 【1099】的归档还没有被应用。
用移动脚本移动它:
[oracle@ora11gdg2 ~]$ sh move_arch.sh 1099 search string is: [1099] ----> item: 1_1099_928630070.archive Full path is: [/u01/app/oracle/dg_archlog/1_1099_928630070.archive] Target dir is: [/home/oracle/arch_log_dup] @ do move [oracle@ora11gdg2 ~]$
因为是移动,所以这时候归档目录中应该就没有1099对应的数据文件了。
这时候,GAP就出现了。
这时候备库的DG会被卡住吗?
并不会,因为10g开始的DG有自动的GAP解决机制。
通过备库端的ALERT日志可以更清楚的看到这个过程:
最初的时候:
Sat Mar 25 12:26:49 2017 Media Recovery Waiting for thread 1 sequence 1098 (in transit) Archived Log entry 490 added for thread 1 sequence 1098 rlc 928630070 ID 0xb7b27973 dest 2: RFS[3]: No standby redo logfiles created for thread 1 RFS[3]: Opened log for thread 1 sequence 1099 dbid -1213006989 branch 928630070 Archived Log entry 491 added for thread 1 sequence 1099 rlc 928630070 ID 0xb7b27973 dest 2: RFS[3]: No standby redo logfiles created for thread 1 RFS[3]: Opened log for thread 1 sequence 1100 dbid -1213006989 branch 928630070 Archived Log entry 492 added for thread 1 sequence 1100 rlc 928630070 ID 0xb7b27973 dest 2: RFS[3]: No standby redo logfiles created for thread 1 RFS[3]: Opened log for thread 1 sequence 1101 dbid -1213006989 branch 928630070 Media Recovery Log /u01/app/oracle/dg_archlog/1_1098_928630070.archive
可以看到,1099的归档通过RFS拿到备库了,并且,日志应用已经到了1098。
接下来,就是1099了,但我们知道1099对应的数据文件已经不在了。
看看Oracle会怎么表现:
Sat Mar 25 12:26:54 2017 Archived Log entry 493 added for thread 1 sequence 1101 rlc 928630070 ID 0xb7b27973 dest 2: Media Recovery Log /u01/app/oracle/dg_archlog/1_1099_928630070.archive Error opening /u01/app/oracle/dg_archlog/1_1099_928630070.archive Attempting refetch RFS[3]: No standby redo logfiles created for thread 1 RFS[3]: Opened log for thread 1 sequence 1102 dbid -1213006989 branch 928630070 Media Recovery Waiting for thread 1 sequence 1099
可以看到,Oracle发现1099对应的数据文件访问有问题,并且确定了GAP【Media Recovery Waiting for thread 1 sequence 1099】
接下来,Oracle让备库向主库请求1099的数据文件:【Attempting refetch】
Fetching gap sequence in thread 1, gap sequence 1099-1099 Archived Log entry 494 added for thread 1 sequence 1102 rlc 928630070 ID 0xb7b27973 dest 2: RFS[3]: No standby redo logfiles created for thread 1 RFS[3]: Opened log for thread 1 sequence 1103 dbid -1213006989 branch 928630070 Sat Mar 25 12:26:57 2017 RFS[4]: Allowing overwrite of partial archivelog for thread 1 sequence 1099 RFS[4]: Opened log for thread 1 sequence 1099 dbid -1213006989 branch 928630070 Archived Log entry 495 added for thread 1 sequence 1099 rlc 928630070 ID 0xb7b27973 dest 2: Archived Log entry 496 added for thread 1 sequence 1103 rlc 928630070 ID 0xb7b27973 dest 2: RFS[3]: No standby redo logfiles created for thread 1 RFS[3]: Opened log for thread 1 sequence 1104 dbid -1213006989 branch 928630070 Archived Log entry 497 added for thread 1 sequence 1104 rlc 928630070 ID 0xb7b27973 dest 2: RFS[3]: No standby redo logfiles created for thread 1 RFS[3]: Opened log for thread 1 sequence 1105 dbid -1213006989 branch 928630070 Archived Log entry 498 added for thread 1 sequence 1105 rlc 928630070 ID 0xb7b27973 dest 2: RFS[3]: No standby redo logfiles created for thread 1 RFS[3]: Opened log for thread 1 sequence 1106 dbid -1213006989 branch 928630070 Sat Mar 25 12:27:06 2017 Archived Log entry 499 added for thread 1 sequence 1106 rlc 928630070 ID 0xb7b27973 dest 2: RFS[3]: No standby redo logfiles created for thread 1 RFS[3]: Opened log for thread 1 sequence 1107 dbid -1213006989 branch 928630070 Sat Mar 25 12:27:07 2017 Media Recovery Log /u01/app/oracle/dg_archlog/1_1099_928630070.archive Media Recovery Log /u01/app/oracle/dg_archlog/1_1100_928630070.archive
可以看到,最后备库重新拿到了1099,并且也应用成功了。
而在备库中查阅这一块的记录的时候,你也会发现有两条1099有关的记录:
SYS@enmy --> select name,thread#,sequence#,to_char(first_time,'yyyy-mm-dd hh24:mi:ss') "First",to_char(next_time,'yyyy-mm-dd hh24:mi:ss') "Next",applied from v$archived_log where sequence#=1099; NAME THREAD# SEQUENCE# First Next APPLIED ----------------------------------------------------------------- ---------- ---------- ------------------- ------------------- --------- /u01/app/oracle/dg_archlog/1_1099_928630070.archive 1 1099 2017-03-25 12:21:56 2017-03-25 12:21:57 NO /u01/app/oracle/dg_archlog/1_1099_928630070.archive 1 1099 2017-03-25 12:21:56 2017-03-25 12:21:57 YES SYS@enmy -->
这是一个正常的DG在遇到GAP的时候,会发生的事情。
————————————————————————
三、还可能出现的问题
但很多时候,GAP的自动解决机制并不会正常的发生效用,原因有很多,但主要涉及两个大的方面:网络与存储。
存储方面:
1. 主库端是,否还有备库请求的归档日志
2. 备库端的归档目录,是否有足够空间存放即将传过来的归档数据
网络:
0. 主库与备库之间的网络是否连通,带宽是否稳定、顺畅
1. 备库端到主库端的TNS配置是否正确
2. 主库端监听器的SERVICE_NAME与GLOBAL_NAME是否正确
3. 主库端监听器如果不是启用默认的LISTENER与1521端口,那么实例中是否正确配置了LOCAL_LISTENER
4. 主库端实例配置的DB_UNIQUE_NAME是否正确
5. 主库端的服务器/etc/hosts中是否声明了备库端的IP与主机名;或者主库端是否可以解析到备库端的主机名
6. 主库与备库之间是否有防火墙设备,拦下了需要开放的端口?主备库所在的服务器是否也启用了防火墙类的服务拦下了需要开放的端口?
————————————————————————
如果能注意到以上这些问题,应该可以规避掉很多错误。
————————————————————————
Done。