
    olsnodes shows all nodes, but crsctl check cluster shows only some of them

    Posted by 小荷 on 2015-12-01 03:51:54

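    A colleague ran into a rather odd problem: a customer runs a 4-node RAC on which olsnodes lists every node, but crsctl check cluster -all reports only some of them and also raises CRS-4404.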

    A search on MOS for CRS-4404 turns up notes that all point to gpnp:
    "crsctl check cluster -all" command gives CRS-4404, CRS-4405 errors (Doc ID 1392934.1)
    CRSCTL CHECK CLUSTER -ALL errors out with CRS-4404 & CRS-2332 seen in GI Alert Log (Doc ID 1620503.1)

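    However, I believe the culprit is not the gpnpd process but the mdnsd process. I tested this on a 3-node RAC: after killing gpnpd the output stayed normal, whereas killing mdnsd reproduced the symptom, with only some nodes reported and CRS-4404 raised: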

    Killing the gpnpd process on node 3:

    [root@12102-rac3 ~]# ps -ef |grep d.bin
    root      2034     1  6 21:15 ?        00:00:19 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
    oracle    2349     1  0 21:16 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/oraagent.bin
    oracle    2361     1  0 21:16 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/mdnsd.bin
    oracle    2364     1  1 21:16 ?        00:00:03 /u01/app/12.1.0.2/grid/bin/evmd.bin
    oracle    2377     1  0 21:16 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/gpnpd.bin
    oracle    2387     1  3 21:16 ?        00:00:09 /u01/app/12.1.0.2/grid/bin/gipcd.bin
    oracle    2402  2364  0 21:16 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/evmlogger.bin -o /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.log
    root      2415     1  1 21:16 ?        00:00:03 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
    root      2664     1  0 21:16 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/cssdmonitor
    root      2681     1  0 21:16 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/cssdagent
    oracle    2692     1  5 21:16 ?        00:00:14 /u01/app/12.1.0.2/grid/bin/ocssd.bin
    root      2850     1  1 21:16 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/octssd.bin reboot
    root      2891     1  1 21:16 ?        00:00:04 /u01/app/12.1.0.2/grid/bin/osysmond.bin
    root      2898     1  2 21:16 ?        00:00:05 /u01/app/12.1.0.2/grid/bin/crsd.bin reboot
    root      3011     1  0 21:17 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
    oracle    3117     1  1 21:19 ?        00:00:00 /u01/app/12.1.0.2/grid/bin/oraagent.bin
    oracle    3138     1  0 21:19 ?        00:00:00 /u01/app/12.1.0.2/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
    root      3307  3259  0 21:20 pts/1    00:00:00 grep d.bin
    [root@12102-rac3 ~]# kill -9 2377
    [root@12102-rac3 ~]# ps -ef |grep d.bin
    root      2034     1  5 21:15 ?        00:00:26 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
    oracle    2349     1  0 21:16 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/oraagent.bin
    oracle    2361     1  0 21:16 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/mdnsd.bin
    oracle    2364     1  1 21:16 ?        00:00:05 /u01/app/12.1.0.2/grid/bin/evmd.bin
    oracle    2387     1  3 21:16 ?        00:00:15 /u01/app/12.1.0.2/grid/bin/gipcd.bin
    oracle    2402  2364  0 21:16 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/evmlogger.bin -o /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.log
    root      2415     1  1 21:16 ?        00:00:05 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
    root      2664     1  0 21:16 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/cssdmonitor
    root      2681     1  0 21:16 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/cssdagent
    oracle    2692     1  5 21:16 ?        00:00:20 /u01/app/12.1.0.2/grid/bin/ocssd.bin
    root      2850     1  1 21:16 ?        00:00:04 /u01/app/12.1.0.2/grid/bin/octssd.bin reboot
    root      2891     1  1 21:16 ?        00:00:06 /u01/app/12.1.0.2/grid/bin/osysmond.bin
    root      2898     1  2 21:16 ?        00:00:08 /u01/app/12.1.0.2/grid/bin/crsd.bin reboot
    root      3011     1  0 21:17 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
    oracle    3117     1  0 21:19 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/oraagent.bin
    oracle    3138     1  0 21:19 ?        00:00:00 /u01/app/12.1.0.2/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
    oracle    3839     1  3 21:23 ?        00:00:00 /u01/app/12.1.0.2/grid/bin/gpnpd.bin  ####### gpnpd was restarted automatically
    root      3878  3259  0 21:23 pts/1    00:00:00 grep d.bin
    [root@12102-rac3 ~]#
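
The before/after comparison above was done by eye. As a rough sketch, the same check can be run against two saved `ps -ef` snapshots. The function names `daemon_pids` and `daemon_restarted` are made up for this illustration, and the code assumes the standard `ps -ef` column layout in which the command path is the eighth field.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Grid Infrastructure.

# Print the PID(s) of a daemon found in a saved `ps -ef` snapshot.
# $1 = snapshot file, $2 = binary name such as gpnpd.bin
daemon_pids() {
    awk -v name="$2" '
        # Field 8 of `ps -ef` is the command path; compare its basename exactly,
        # so the "grep d.bin" line itself is never counted.
        { n = split($8, parts, "/"); if (parts[n] == name) print $2 }
    ' "$1"
}

# Report whether the daemon kept its PID between two snapshots.
# $1 = snapshot before, $2 = snapshot after, $3 = binary name
daemon_restarted() {
    local before after
    before=$(daemon_pids "$1" "$3")
    after=$(daemon_pids "$2" "$3")
    if [ -z "$after" ]; then
        echo "$3: not running"
    elif [ "$before" != "$after" ]; then
        echo "$3: restarted (pid $before -> $after)"
    else
        echo "$3: unchanged (pid $after)"
    fi
}
```

With the two listings above saved to files, `daemon_restarted before.txt after.txt gpnpd.bin` would report the PID change from 2377 to 3839 (the file names are placeholders).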

    Output at this point of the following loop:
    while true; do olsnodes; crsctl check cluster -all; crsctl stat res -t | grep ons; echo "================`date`================="; sleep 3; done

    On node 1:
    ================Mon Nov 16 21:42:39 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac1:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac2:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac3:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    ora.ons
    ================Mon Nov 16 21:42:43 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac1:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac2:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac3:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    ora.ons
    ================Mon Nov 16 21:42:47 CST 2015=================
     
     
    On node 2:
    ================Mon Nov 16 21:42:05 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac1:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac2:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac3:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    ora.ons
    ================Mon Nov 16 21:42:11 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac1:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac2:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac3:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    ora.ons
    ================Mon Nov 16 21:42:14 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac1:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac2:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac3:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    ora.ons
    ================Mon Nov 16 21:42:18 CST 2015=================
     
     
    On node 3:
    ================Mon Nov 16 21:43:55 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac1:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac2:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac3:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    ora.ons
    ================Mon Nov 16 21:43:58 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac1:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac2:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac3:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    ora.ons
    ================Mon Nov 16 21:44:02 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac1:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac2:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac3:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    ora.ons
    ================Mon Nov 16 21:44:05 CST 2015=================
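
A healthy result like the one above can also be checked mechanically instead of by reading it. The sketch below counts the "is online" messages under each node header. It assumes the output layout shown in this post (a `<node>:` header followed by `CRS-nnnn:` lines), and `online_count_per_node` is a name invented here.

```shell
# Hypothetical helper: read `crsctl check cluster -all` output on stdin and
# print, for each node header, how many services reported "is online".
online_count_per_node() {
    awk '
        $1 ~ /^[*]+$/ { next }                      # separator rows
        $1 ~ /^CRS-[0-9]+:$/ {                      # a status message line
            if (node != "" && /is online/) count[node]++
            next
        }
        NF == 1 && $1 ~ /:$/ {                      # node header, e.g. "12102-rac1:"
            node = substr($1, 1, length($1) - 1)
            order[++n] = node
            count[node] += 0
            next
        }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    '
}
```

Piping `crsctl check cluster -all` into this function should yield one line per responding node. In the runs above each node would show 3, for Cluster Ready Services, Cluster Synchronization Services and Event Manager.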

    Killing the mdnsd process on node 3:

    [root@12102-rac3 ~]# ps -ef |grep d.bin
    root      2143     1  5 21:37 ?        00:00:26 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
    oracle    2413     1  0 21:37 ?        00:00:03 /u01/app/12.1.0.2/grid/bin/oraagent.bin
    oracle    2433     1  0 21:37 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/mdnsd.bin
    oracle    2436     1  1 21:37 ?        00:00:05 /u01/app/12.1.0.2/grid/bin/evmd.bin
    oracle    2456     1  0 21:37 ?        00:00:03 /u01/app/12.1.0.2/grid/bin/gpnpd.bin
    oracle    2472     1  3 21:37 ?        00:00:16 /u01/app/12.1.0.2/grid/bin/gipcd.bin
    oracle    2478  2436  0 21:37 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/evmlogger.bin -o /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.log
    root      2495     1  1 21:37 ?        00:00:05 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
    root      2803     1  0 21:37 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/cssdmonitor
    root      2820     1  0 21:37 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/cssdagent
    oracle    2831     1  4 21:37 ?        00:00:21 /u01/app/12.1.0.2/grid/bin/ocssd.bin
    root      2958     1  1 21:37 ?        00:00:04 /u01/app/12.1.0.2/grid/bin/octssd.bin reboot
    root      2979     1  1 21:37 ?        00:00:07 /u01/app/12.1.0.2/grid/bin/osysmond.bin
    root      2986     1  2 21:37 ?        00:00:08 /u01/app/12.1.0.2/grid/bin/crsd.bin reboot
    root      3086     1  0 21:38 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
    oracle    3163     1  0 21:39 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/oraagent.bin
    root      4082  3274  0 21:45 pts/0    00:00:00 grep d.bin
    [root@12102-rac3 ~]#
    [root@12102-rac3 ~]#
    [root@12102-rac3 ~]#
    [root@12102-rac3 ~]# kill -9 2433
    [root@12102-rac3 ~]#
    [root@12102-rac3 ~]#
    [root@12102-rac3 ~]#
    [root@12102-rac3 ~]# ps -ef |grep d.bin
    root      2143     1  5 21:37 ?        00:00:26 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
    oracle    2413     1  0 21:37 ?        00:00:03 /u01/app/12.1.0.2/grid/bin/oraagent.bin
    oracle    2436     1  1 21:37 ?        00:00:05 /u01/app/12.1.0.2/grid/bin/evmd.bin
    oracle    2456     1  0 21:37 ?        00:00:03 /u01/app/12.1.0.2/grid/bin/gpnpd.bin
    oracle    2472     1  3 21:37 ?        00:00:17 /u01/app/12.1.0.2/grid/bin/gipcd.bin
    oracle    2478  2436  0 21:37 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/evmlogger.bin -o /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.log
    root      2495     1  1 21:37 ?        00:00:05 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
    root      2803     1  0 21:37 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/cssdmonitor
    root      2820     1  0 21:37 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/cssdagent
    oracle    2831     1  4 21:37 ?        00:00:21 /u01/app/12.1.0.2/grid/bin/ocssd.bin
    root      2958     1  1 21:37 ?        00:00:04 /u01/app/12.1.0.2/grid/bin/octssd.bin reboot
    root      2979     1  1 21:37 ?        00:00:07 /u01/app/12.1.0.2/grid/bin/osysmond.bin
    root      2986     1  2 21:37 ?        00:00:09 /u01/app/12.1.0.2/grid/bin/crsd.bin reboot
    root      3086     1  0 21:38 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
    oracle    3163     1  0 21:39 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/oraagent.bin
    oracle    4127     1  1 21:45 ?        00:00:00 /u01/app/12.1.0.2/grid/bin/mdnsd.bin
    root      4136  3274  0 21:45 pts/0    00:00:00 grep d.bin
    [root@12102-rac3 ~]#

    Output at this point of the following loop:
    while true; do olsnodes; crsctl check cluster -all; crsctl stat res -t | grep ons; echo "================`date`================="; sleep 3; done

    On node 1:
    ================Mon Nov 16 21:47:18 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac1:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac2:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    CRS-4404: The following nodes did not reply within the allotted time:
    12102-rac3
    ora.ons
    ================Mon Nov 16 21:48:22 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac1:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac2:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    CRS-4404: The following nodes did not reply within the allotted time:
    12102-rac3
    ora.ons
    ================Mon Nov 16 21:49:25 CST 2015=================
     
    On node 2:
    ================Mon Nov 16 21:50:28 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac1:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac2:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    CRS-4404: The following nodes did not reply within the allotted time:
    12102-rac3
    ora.ons
    ================Mon Nov 16 21:51:31 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac1:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    12102-rac2:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    CRS-4404: The following nodes did not reply within the allotted time:
    12102-rac3
    ora.ons
    ================Mon Nov 16 21:52:35 CST 2015=================
     
    On node 3:
    ================Mon Nov 16 21:50:28 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac3:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    CRS-4404: The following nodes did not reply within the allotted time:
    12102-rac1, 12102-rac2
    ora.ons
    ================Mon Nov 16 21:51:31 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac3:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    CRS-4404: The following nodes did not reply within the allotted time:
    12102-rac1, 12102-rac2
    ora.ons
    ================Mon Nov 16 21:52:34 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac3:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    CRS-4404: The following nodes did not reply within the allotted time:
    12102-rac1, 12102-rac2
    ora.ons
    ================Mon Nov 16 21:53:38 CST 2015=================
    12102-rac1
    12102-rac2
    12102-rac3
    **************************************************************
    12102-rac3:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    CRS-4404: The following nodes did not reply within the allotted time:
    12102-rac1, 12102-rac2
    ora.ons
    ================Mon Nov 16 21:54:41 CST 2015=================
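
The mismatch seen above (olsnodes lists three nodes, the cluster check answers for fewer) can be detected from saved output. This is only a sketch under the assumption that the output looks like the transcripts in this post, with the node list on the single line following CRS-4404. `silent_nodes` and `crs4404_nodes` are hypothetical names.

```shell
# Hypothetical helper: print every node that appears in the saved `olsnodes`
# listing ($1) but has no "<node>:" header in the saved
# `crsctl check cluster -all` output ($2).
silent_nodes() {
    awk '
        FILENAME == ARGV[1] { if (NF) nodes[++n] = $1; next }   # olsnodes file
        NF == 1 && $1 ~ /:$/ { seen[substr($1, 1, length($1) - 1)] = 1 }
        END { for (i = 1; i <= n; i++) if (!(nodes[i] in seen)) print nodes[i] }
    ' "$1" "$2"
}

# Hypothetical helper: read check output on stdin and print the nodes named
# on the line that follows a CRS-4404 message, one per line.
crs4404_nodes() {
    awk '
        grab { gsub(/,/, " "); for (i = 1; i <= NF; i++) print $i; grab = 0 }
        $1 == "CRS-4404:" { grab = 1 }
    '
}
```

For example, after `olsnodes > nodes.txt` and `crsctl check cluster -all > check.txt` on node 3 (the file names are placeholders), `silent_nodes nodes.txt check.txt` would be expected to list 12102-rac1 and 12102-rac2, matching what `crs4404_nodes < check.txt` extracts from the error message.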

