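A colleague ran into a rather odd problem: a customer runs a 4-node RAC where olsnodes lists every node, but crsctl check cluster -all shows only some of them and reports CRS-4404.
I searched MOS, and the notes that mention CRS-4404 all point to gpnp: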
“crsctl check cluster -all” command gives CRS-4404, CRS-4405 errors (Doc ID 1392934.1)
CRSCTL CHECK CLUSTER -ALL errors out with CRS-4404 & CRS-2332 seen in GI Alert Log (Doc ID 1620503.1)
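However, I think the culprit is not the gpnpd process but the mdnsd process. I tested this on a 3-node RAC: after killing gpnpd the checks still returned normally, whereas killing mdnsd reproduced the symptom, with only some nodes visible and CRS-4404 reported:
Kill the gpnpd process on node 3: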
[root@12102-rac3 ~]# ps -ef |grep d.bin
root 2034 1 6 21:15 ? 00:00:19 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
oracle 2349 1 0 21:16 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/oraagent.bin
oracle 2361 1 0 21:16 ? 00:00:01 /u01/app/12.1.0.2/grid/bin/mdnsd.bin
oracle 2364 1 1 21:16 ? 00:00:03 /u01/app/12.1.0.2/grid/bin/evmd.bin
oracle 2377 1 0 21:16 ? 00:00:01 /u01/app/12.1.0.2/grid/bin/gpnpd.bin
oracle 2387 1 3 21:16 ? 00:00:09 /u01/app/12.1.0.2/grid/bin/gipcd.bin
oracle 2402 2364 0 21:16 ? 00:00:01 /u01/app/12.1.0.2/grid/bin/evmlogger.bin -o /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.log
root 2415 1 1 21:16 ? 00:00:03 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
root 2664 1 0 21:16 ? 00:00:01 /u01/app/12.1.0.2/grid/bin/cssdmonitor
root 2681 1 0 21:16 ? 00:00:01 /u01/app/12.1.0.2/grid/bin/cssdagent
oracle 2692 1 5 21:16 ? 00:00:14 /u01/app/12.1.0.2/grid/bin/ocssd.bin
root 2850 1 1 21:16 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/octssd.bin reboot
root 2891 1 1 21:16 ? 00:00:04 /u01/app/12.1.0.2/grid/bin/osysmond.bin
root 2898 1 2 21:16 ? 00:00:05 /u01/app/12.1.0.2/grid/bin/crsd.bin reboot
root 3011 1 0 21:17 ? 00:00:01 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
oracle 3117 1 1 21:19 ? 00:00:00 /u01/app/12.1.0.2/grid/bin/oraagent.bin
oracle 3138 1 0 21:19 ? 00:00:00 /u01/app/12.1.0.2/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
root 3307 3259 0 21:20 pts/1 00:00:00 grep d.bin
[root@12102-rac3 ~]# kill -9 2377
[root@12102-rac3 ~]# ps -ef |grep d.bin
root 2034 1 5 21:15 ? 00:00:26 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
oracle 2349 1 0 21:16 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/oraagent.bin
oracle 2361 1 0 21:16 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/mdnsd.bin
oracle 2364 1 1 21:16 ? 00:00:05 /u01/app/12.1.0.2/grid/bin/evmd.bin
oracle 2387 1 3 21:16 ? 00:00:15 /u01/app/12.1.0.2/grid/bin/gipcd.bin
oracle 2402 2364 0 21:16 ? 00:00:01 /u01/app/12.1.0.2/grid/bin/evmlogger.bin -o /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.log
root 2415 1 1 21:16 ? 00:00:05 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
root 2664 1 0 21:16 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/cssdmonitor
root 2681 1 0 21:16 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/cssdagent
oracle 2692 1 5 21:16 ? 00:00:20 /u01/app/12.1.0.2/grid/bin/ocssd.bin
root 2850 1 1 21:16 ? 00:00:04 /u01/app/12.1.0.2/grid/bin/octssd.bin reboot
root 2891 1 1 21:16 ? 00:00:06 /u01/app/12.1.0.2/grid/bin/osysmond.bin
root 2898 1 2 21:16 ? 00:00:08 /u01/app/12.1.0.2/grid/bin/crsd.bin reboot
root 3011 1 0 21:17 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
oracle 3117 1 0 21:19 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/oraagent.bin
oracle 3138 1 0 21:19 ? 00:00:00 /u01/app/12.1.0.2/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
oracle 3839 1 3 21:23 ? 00:00:00 /u01/app/12.1.0.2/grid/bin/gpnpd.bin ####### the gpnpd process was restarted automatically.
root 3878 3259 0 21:23 pts/1 00:00:00 grep d.bin
[root@12102-rac3 ~]#
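The two listings above differ in one detail that is easy to miss: gpnpd.bin had PID 2377 before the kill and PID 3839 afterwards. A minimal sketch of how that comparison could be scripted follows. daemon_pid and respawn_status are hypothetical helper names, not part of Grid Infrastructure, and the sketch assumes the usual ps -ef layout in which the command is the eighth field.

```shell
# Hypothetical helper: print the PID of a daemon found in `ps -ef` text
# read from stdin. $1 is the binary name, for example gpnpd.bin.
daemon_pid() {
    awk -v name="$1" '
        # Field 8 of `ps -ef` is the command path; compare its basename.
        { n = split($8, parts, "/"); if (n > 0 && parts[n] == name) print $2 }
    '
}

# Hypothetical helper: compare the PIDs captured before and after a kill.
respawn_status() {
    before="$1"
    after="$2"
    if [ -z "$after" ]; then
        echo "not running"
    elif [ "$before" != "$after" ]; then
        echo "respawned ($before -> $after)"
    else
        echo "unchanged ($after)"
    fi
}
```

On a live node the two values would come from ps -ef | daemon_pid gpnpd.bin, captured once before the kill and once after. With the PIDs from the listings above, respawn_status 2377 3839 prints respawned (2377 -> 3839).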
Meanwhile, the following loop was running on each node:
while true; do olsnodes; crsctl check cluster -all; crsctl stat res -t |grep ons; echo "================`date`================="; sleep 3; done
Its output:
On node 1:
================Mon Nov 16 21:42:39 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
ora.ons
================Mon Nov 16 21:42:43 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
ora.ons
================Mon Nov 16 21:42:47 CST 2015=================
On node 2:
================Mon Nov 16 21:42:05 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
ora.ons
================Mon Nov 16 21:42:11 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
ora.ons
================Mon Nov 16 21:42:14 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
ora.ons
================Mon Nov 16 21:42:18 CST 2015=================
On node 3:
================Mon Nov 16 21:43:55 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
ora.ons
================Mon Nov 16 21:43:58 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
ora.ons
================Mon Nov 16 21:44:02 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
ora.ons
================Mon Nov 16 21:44:05 CST 2015=================
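To compare nodes at a glance, each crsctl check cluster -all report can be condensed into two lines. The following is only a sketch: summarize_check is a hypothetical helper, and it assumes the report layout seen in this post, namely a line ending in a colon for each node that answered and the silent nodes listed on the line after CRS-4404.

```shell
# Hypothetical helper: condense `crsctl check cluster -all` text (stdin)
# into one "replied:" line and one "no reply:" line.
summarize_check() {
    awk '
        # A node section starts with a line such as "12102-rac1:".
        /^[^ ]+:$/ {
            sub(/:$/, "")
            ok = (ok == "" ? $0 : ok " " $0)
            next
        }
        # The line after CRS-4404 lists the silent nodes, comma separated.
        /^CRS-4404:/ { want = 1; next }
        want { gsub(/, */, " "); bad = $0; want = 0 }
        END {
            print "replied: " (ok == "" ? "none" : ok)
            print "no reply: " (bad == "" ? "none" : bad)
        }
    '
}
```

It would be used as crsctl check cluster -all | summarize_check. For the healthy runs above it prints all three nodes after replied and none after no reply.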
Kill the mdnsd process on node 3:
[root@12102-rac3 ~]# ps -ef |grep d.bin
root 2143 1 5 21:37 ? 00:00:26 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
oracle 2413 1 0 21:37 ? 00:00:03 /u01/app/12.1.0.2/grid/bin/oraagent.bin
oracle 2433 1 0 21:37 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/mdnsd.bin
oracle 2436 1 1 21:37 ? 00:00:05 /u01/app/12.1.0.2/grid/bin/evmd.bin
oracle 2456 1 0 21:37 ? 00:00:03 /u01/app/12.1.0.2/grid/bin/gpnpd.bin
oracle 2472 1 3 21:37 ? 00:00:16 /u01/app/12.1.0.2/grid/bin/gipcd.bin
oracle 2478 2436 0 21:37 ? 00:00:01 /u01/app/12.1.0.2/grid/bin/evmlogger.bin -o /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.log
root 2495 1 1 21:37 ? 00:00:05 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
root 2803 1 0 21:37 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/cssdmonitor
root 2820 1 0 21:37 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/cssdagent
oracle 2831 1 4 21:37 ? 00:00:21 /u01/app/12.1.0.2/grid/bin/ocssd.bin
root 2958 1 1 21:37 ? 00:00:04 /u01/app/12.1.0.2/grid/bin/octssd.bin reboot
root 2979 1 1 21:37 ? 00:00:07 /u01/app/12.1.0.2/grid/bin/osysmond.bin
root 2986 1 2 21:37 ? 00:00:08 /u01/app/12.1.0.2/grid/bin/crsd.bin reboot
root 3086 1 0 21:38 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
oracle 3163 1 0 21:39 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/oraagent.bin
root 4082 3274 0 21:45 pts/0 00:00:00 grep d.bin
[root@12102-rac3 ~]#
[root@12102-rac3 ~]#
[root@12102-rac3 ~]#
[root@12102-rac3 ~]# kill -9 2433
[root@12102-rac3 ~]#
[root@12102-rac3 ~]#
[root@12102-rac3 ~]#
[root@12102-rac3 ~]# ps -ef |grep d.bin
root 2143 1 5 21:37 ? 00:00:26 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
oracle 2413 1 0 21:37 ? 00:00:03 /u01/app/12.1.0.2/grid/bin/oraagent.bin
oracle 2436 1 1 21:37 ? 00:00:05 /u01/app/12.1.0.2/grid/bin/evmd.bin
oracle 2456 1 0 21:37 ? 00:00:03 /u01/app/12.1.0.2/grid/bin/gpnpd.bin
oracle 2472 1 3 21:37 ? 00:00:17 /u01/app/12.1.0.2/grid/bin/gipcd.bin
oracle 2478 2436 0 21:37 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/evmlogger.bin -o /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.log
root 2495 1 1 21:37 ? 00:00:05 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
root 2803 1 0 21:37 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/cssdmonitor
root 2820 1 0 21:37 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/cssdagent
oracle 2831 1 4 21:37 ? 00:00:21 /u01/app/12.1.0.2/grid/bin/ocssd.bin
root 2958 1 1 21:37 ? 00:00:04 /u01/app/12.1.0.2/grid/bin/octssd.bin reboot
root 2979 1 1 21:37 ? 00:00:07 /u01/app/12.1.0.2/grid/bin/osysmond.bin
root 2986 1 2 21:37 ? 00:00:09 /u01/app/12.1.0.2/grid/bin/crsd.bin reboot
root 3086 1 0 21:38 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
oracle 3163 1 0 21:39 ? 00:00:02 /u01/app/12.1.0.2/grid/bin/oraagent.bin
oracle 4127 1 1 21:45 ? 00:00:00 /u01/app/12.1.0.2/grid/bin/mdnsd.bin
root 4136 3274 0 21:45 pts/0 00:00:00 grep d.bin
[root@12102-rac3 ~]#
Meanwhile, the same loop was running on each node:
while true; do olsnodes; crsctl check cluster -all; crsctl stat res -t |grep ons; echo "================`date`================="; sleep 3; done
Its output:
On node 1:
================Mon Nov 16 21:47:18 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac3
ora.ons
================Mon Nov 16 21:48:22 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac3
ora.ons
================Mon Nov 16 21:49:25 CST 2015=================
On node 2:
================Mon Nov 16 21:50:28 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac3
ora.ons
================Mon Nov 16 21:51:31 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac3
ora.ons
================Mon Nov 16 21:52:35 CST 2015=================
On node 3:
================Mon Nov 16 21:50:28 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac1, 12102-rac2
ora.ons
================Mon Nov 16 21:51:31 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac1, 12102-rac2
ora.ons
================Mon Nov 16 21:52:34 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac1, 12102-rac2
ora.ons
================Mon Nov 16 21:53:38 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac1, 12102-rac2
ora.ons
================Mon Nov 16 21:54:41 CST 2015=================
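The banner timestamps carry one more hint. After gpnpd was killed, successive banners on node 1 are about 4 seconds apart, while after mdnsd was killed they are 63 to 64 seconds apart, so most of each iteration was presumably spent waiting before CRS-4404 was reported. A rough sketch for pulling those gaps out of a saved log follows. banner_gaps is a hypothetical helper, and it assumes the banner format produced by the echo in the loop above.

```shell
# Hypothetical helper: print the seconds elapsed between consecutive
# "=====<date>=====" banner lines read from stdin.
banner_gaps() {
    awk '
        /^=+.*=+$/ {
            # Locate the HH:MM:SS token inside the banner.
            if (match($0, /[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/)) {
                split(substr($0, RSTART, RLENGTH), t, ":")
                now = t[1] * 3600 + t[2] * 60 + t[3]
                if (seen) {
                    gap = now - prev
                    if (gap < 0) gap += 86400   # the log crossed midnight
                    print gap
                }
                prev = now
                seen = 1
            }
        }
    '
}
```

Run over the node 1 output captured after the mdnsd kill, it prints 64 and 63. Over the node 1 output captured after the gpnpd kill, it prints 4 and 4.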