
    init.cssd startcheck: CRS fails to start because HP Service Guard is not running

    Posted by 惜分飞 on 2015-05-25 04:26:20

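    Contact: mobile (13429648788), QQ (107644445)

    Link: http://www.xifenfei.com/5898.html

    Title: init.cssd startcheck: CRS fails to start because HP Service Guard is not running

    Author: 惜分飞 © All rights reserved [Reposting is permitted, but the source URL must be credited with a link; otherwise legal action will be pursued.]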

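    This morning I arrived at a customer site and was told that, after the OCR and voting disks of one environment had been replaced, CRS would no longer start. The customer asked me to take a look. Environment: HP RAC (only one node in use) + Oracle Database 10.2.0.5.
    crsctl start crs returns normally, but the stack never actually comes up: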

    # /app/oracle/product/10.2.0/crs/bin/crsctl start crs
    Attempting to start CRS stack 
    The CRS stack will be started shortly
    
    # ps -ef|grep crs
        root  6461     1  0  May 19  ?         0:00 /bin/sh /sbin/init.d/init.crsd run
        root 29719 23678  0 10:04:51 pts/tc    0:00 grep crs
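
To tell "the command was accepted" apart from "the stack is really up", it can help to look for the clusterware daemons themselves rather than the init.* wrapper scripts. The following is only a minimal sketch: the helper name missing_crs_daemons is made up for illustration, and the daemon names crsd.bin, ocssd.bin and evmd.bin are the usual 10.2 ones, assumed here rather than taken from the session above.

```shell
# Hypothetical helper: reads `ps -ef` output on stdin and prints every
# clusterware daemon that is NOT present in it.
# Assumption: the daemons are named crsd.bin, ocssd.bin and evmd.bin.
missing_crs_daemons() {
    ps_out=$(cat)
    for d in crsd.bin ocssd.bin evmd.bin; do
        # -F: fixed string, so the dot is not treated as a regex wildcard
        if ! printf '%s\n' "$ps_out" | grep -F "$d" >/dev/null; then
            echo "$d"
        fi
    done
}

# Usage on the affected node:
#   ps -ef | missing_crs_daemons
```

Fed the ps output shown above, which contains only the init.crsd wrapper, this sketch would report all three daemons as missing.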
    

    Nothing new is written to the logs either:

    [xifenfei01][orawj][/root/xifenfei]#ls -ltr
    total 148
    drwxr-x---   2 oracle     dba             96 May 15  2014 admin
    drwxr-x---   2 root       dba             96 May 15  2014 crsd
    drwxr-x---   2 oracle     dba             96 May 15  2014 evmd
    drwxrwxr-t   5 oracle     dba           1024 Jun  4  2014 racg
    drwxr-x---   5 oracle     dba           1024 May 17 22:50 cssd
    -rw-rw-r--   1 root       dba          61568 May 24 15:26 alertxifenfei01.log
    drwxr-x---   2 oracle     dba           3072 May 24 15:43 client
    [xifenfei01][orawj][/root/xifenfei]#date
    Mon, May 25, 2015 11:30:09 AM
    

    Voting disk and OCR information:

    [xifenfei01][orawj][/root/xifenfei]#ocrcheck
    Status of Oracle Cluster Registry is as follows :
             Version                  :          2
             Total space (kbytes)     :    1441492
             Used space (kbytes)      :       5972
             Available space (kbytes) :    1435520
             ID                       : 1714667730
             Device/File Name         : /dev/vgc01/rCMPR_VGC01_OCR1
                                        Device/File integrity check succeeded
             Device/File Name         : /dev/vgc02/rCMPR_VGC02_OCR2
                                        Device/File integrity check succeeded
    
             Cluster registry integrity check succeeded
    
    [xifenfei01][orawj][/root/xifenfei]#crsctl query css votedisk
     0.     0    /dev/vgc01/rCMPR_VGC01_VOTE1
     1.     0    /dev/vgc02/rCMPR_VGC02_VOTE2
     2.     0    /dev/vgc03/rCMPR_VGC03_VOTE3
    
    located 3 votedisk(s).
    

    Contents of the ocr.loc file:

    # more /var/opt/oracle/ocr.loc
    #Device/file /dev/vgc02/rCMPR_VGC02_OCR2 getting replaced by device /dev/vgc02/rCMPR_VGC02_OCR2 
    ocrconfig_loc=/dev/vgc01/rCMPR_VGC01_OCR1
    ocrmirrorconfig_loc=/dev/vgc02/rCMPR_VGC02_OCR2
    local_only=false
    

    From this output, the voting disk and OCR configuration all look normal.
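
To cross-check ocr.loc against what ocrcheck reports, the configured device paths can be pulled out of the file. This is a small sketch under the assumption that the file uses the key=value layout shown above; ocr_devices is a hypothetical helper name.

```shell
# Hypothetical helper: reads an ocr.loc file on stdin and prints the
# device paths of the ocrconfig_loc and ocrmirrorconfig_loc entries.
ocr_devices() {
    # Comment lines start with '#', so anchoring the key at ^ skips them
    awk -F= '/^ocr(mirror)?config_loc=/ { print $2 }'
}

# Usage:
#   ocr_devices < /var/opt/oracle/ocr.loc
```

The printed paths should be the same two devices that ocrcheck lists under Device/File Name.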

    The process list shows init.cssd startcheck processes:

    [xifenfei01][orawj][/root/xifenfei]#ps -ef|grep init
        root     1     0  0  May 19  ?         0:03 init
        root   119     0  0  May 19  ?         0:00 pagetable_init_daemon
        root   115     0  0  May 19  ?         0:00 mdep_initiator_thread
        root 26820 26792  0 10:49:53 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
        root 26791     1  0 10:49:53 ?         0:00 /bin/sh /sbin/init.d/init.crsd run
        root 27183 23698  0 10:50:23 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
        root 26792     1  0 10:49:53 ?         0:00 /bin/sh /sbin/init.d/init.cssd fatal
        root 23698     1  0 10:45:23 ?         0:00 /bin/sh /sbin/init.d/init.evmd run
        root 26816 26791  0 10:49:53 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
      oracle 20534 11033  0 11:30:35 pts/ta    0:00 grep init
    

    In most cases, init.cssd startcheck processes that hang around like this mean that either the storage or the third-party clusterware cannot be accessed.
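
To see which wrapper each startcheck belongs to, the PID and parent PID can be extracted from the ps output. A minimal sketch; startcheck_procs is a hypothetical helper name, and it assumes the standard ps -ef column order (UID, PID, PPID, ...).

```shell
# Hypothetical helper: reads `ps -ef` output on stdin and prints
# "PID PPID" for every init.cssd startcheck process.
startcheck_procs() {
    # \. matches a literal dot in the script name
    awk '/init\.cssd startcheck/ { print $2, $3 }'
}

# Usage:
#   ps -ef | startcheck_procs
```

In the listing above, the parent PIDs map back to init.cssd fatal, init.evmd run and init.crsd run, so all three wrappers are waiting on a startcheck.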

    Check the volume group (VG) status:

    VG Name                     /dev/vgc01
    VG Write Access             read/write     
    VG Status                   available                 
    Max LV                      255    
    Cur LV                      9      
    Open LV                     9      
    Max PV                      255    
    Cur PV                      1      
    Act PV                      1      
    Max PE per PV               3200         
    VGDA                        2   
    PE Size (Mbytes)            32              
    Total PE                    3199    
    Alloc PE                    736     
    Free PE                     2463    
    Total PVG                   0        
    Total Spare PVs             0              
    Total Spare PVs in use      0                     
    VG Version                  1.0       
    VG Max Size                 25500g     
    VG Max Extents              816000        
    
    VG Name                     /dev/vgc02
    VG Write Access             read/write     
    VG Status                   available                 
    Max LV                      255    
    Cur LV                      9      
    Open LV                     9      
    Max PV                      255    
    Cur PV                      1      
    Act PV                      1      
    Max PE per PV               3200         
    VGDA                        2   
    PE Size (Mbytes)            32              
    Total PE                    3199    
    Alloc PE                    736     
    Free PE                     2463    
    Total PVG                   0        
    Total Spare PVs             0              
    Total Spare PVs in use      0                     
    VG Version                  1.0       
    VG Max Size                 25500g     
    VG Max Extents              816000        
    
    VG Name                     /dev/vgc03
    VG Write Access             read/write     
    VG Status                   available                 
    Max LV                      255    
    Cur LV                      6      
    Open LV                     6      
    Max PV                      255    
    Cur PV                      1      
    Act PV                      1      
    Max PE per PV               3200         
    VGDA                        2   
    PE Size (Mbytes)            32              
    Total PE                    3199    
    Alloc PE                    448     
    Free PE                     2751    
    Total PVG                   0        
    Total Spare PVs             0              
    Total Spare PVs in use      0                     
    VG Version                  1.0       
    VG Max Size                 25500g     
    VG Max Extents              816000        
    

    This shows that all three VGs holding the voting disks and OCR are available.
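
Scanning three long listings by eye is error-prone, so the same check can be condensed to one line per VG. This is a sketch only: vg_status is a hypothetical helper, and it assumes the listings above are vgdisplay output with the "VG Name" and "VG Status" labels exactly as shown.

```shell
# Hypothetical helper: reads vgdisplay-style output on stdin, prints
# "<vg name> <status>" per VG, and returns non-zero if any VG is not
# reported as available.
vg_status() {
    awk '
        /^ *VG Name/   { name = $3 }
        /^ *VG Status/ {
            status = $3
            sub(/,$/, "", status)   # e.g. "available, exclusive" -> "available"
            print name, status
            if (status != "available") bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage (assumes vgdisplay accepts the VG names as arguments):
#   vgdisplay /dev/vgc01 /dev/vgc02 /dev/vgc03 | vg_status
```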

    Check the permissions on the voting disk and OCR devices:

    # ls -l /dev/vgc0*/rCMPR*|grep -v .dbf|grep -v .log|grep -v .ctl
    crw-r-----   1 oracle     dba         64 0x020008 May 24 14:40 /dev/vgc01/rCMPR_VGC01_OCR1
    crw-r-----   1 oracle     dba         64 0x020009 May 24 14:41 /dev/vgc01/rCMPR_VGC01_VOTE1
    crw-r-----   1 oracle     dba         64 0x030008 May 24 14:41 /dev/vgc02/rCMPR_VGC02_OCR2
    crw-r-----   1 oracle     dba         64 0x030009 May 24 14:41 /dev/vgc02/rCMPR_VGC02_VOTE2
    crw-r-----   1 oracle     dba         64 0x040006 May 24 14:41 /dev/vgc03/rCMPR_VGC03_VOTE3
    

    Change the permissions straight to 777 and try again:

    # chmod 777 /dev/vgc0*/rCMPR*|grep -v .dbf|grep -v .log|grep -v .ctl
    #  ls -l /dev/vgc0*/rCMPR*|grep -v .dbf|grep -v .log|grep -v .ctl
    crwxrwxrwx   1 oracle     dba         64 0x020008 May 24 14:40 /dev/vgc01/rCMPR_VGC01_OCR1
    crwxrwxrwx   1 oracle     dba         64 0x020009 May 24 14:41 /dev/vgc01/rCMPR_VGC01_VOTE1
    crwxrwxrwx   1 oracle     dba         64 0x030008 May 24 14:41 /dev/vgc02/rCMPR_VGC02_OCR2
    crwxrwxrwx   1 oracle     dba         64 0x030009 May 24 14:41 /dev/vgc02/rCMPR_VGC02_VOTE2
    crwxrwxrwx   1 oracle     dba         64 0x040006 May 24 14:41 /dev/vgc03/rCMPR_VGC03_VOTE3
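
One detail worth noting about the chmod command above: the grep -v filters come after the pipe, so they only filter chmod's (empty) output, and chmod itself is applied to every path matched by /dev/vgc0*/rCMPR*. An unescaped dot in grep -v .dbf also matches any character. A minimal sketch of selecting the names first; non_db_devices is a hypothetical helper name.

```shell
# Hypothetical helper: reads path names on stdin and drops the ones that
# look like datafile, redo log or control file devices, leaving the
# OCR and voting disk devices.
non_db_devices() {
    # \. matches a literal dot instead of "any character"
    grep -v -e '\.dbf' -e '\.log' -e '\.ctl'
}

# Usage: select first, then act only on the selected devices
#   ls /dev/vgc0*/rCMPR* | non_db_devices | xargs ls -l
```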
    

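    Kill the related processes and retry (the respawn entries in /etc/inittab are commented out first so that init does not restart them immediately):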

    # ps -ef|grep init
        root     1     0  0  May 19  ?         0:03 init
        root   119     0  0  May 19  ?         0:00 pagetable_init_daemon
        root   115     0  0  May 19  ?         0:00 mdep_initiator_thread
        root  6458     1  0  May 19  ?         0:00 /bin/sh /sbin/init.d/init.evmd run
        root 20975     1  0 10:40:11 ?         0:00 /bin/sh /sbin/init.d/init.crsd run
        root 20976     1  0 10:40:11 ?         0:00 /bin/sh /sbin/init.d/init.cssd fatal
        root 21006 20976  0 10:40:11 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
        root 20997 20975  0 10:40:11 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
        root 21152 23678  0 10:40:18 pts/tc    0:00 grep init
    
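    vi /etc/inittab
    #h1:3:respawn:/sbin/init.d/init.evmd run >/dev/null 2>&1 </dev/null
    #h2:3:respawn:/sbin/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
    #h3:3:respawn:/sbin/init.d/init.crsd run >/dev/null 2>&1 </dev/null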

    Restart the init.* processes by restoring the entries:

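    vi /etc/inittab
    h1:3:respawn:/sbin/init.d/init.evmd run >/dev/null 2>&1 </dev/null
    h2:3:respawn:/sbin/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
    h3:3:respawn:/sbin/init.d/init.crsd run >/dev/null 2>&1 </dev/null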

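    The problem persisted after changing the LV permissions, which shows that it is not caused by the permissions or ownership of the voting disks and OCR. Reading the devices with dd and strings also worked fine.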
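
The exact dd and strings commands are not shown in the session, so the following is only a sketch of what such a read test might look like. read_check is a hypothetical helper, and the block size and count are arbitrary values chosen for illustration.

```shell
# Hypothetical helper: reads the first blocks of a device (or file) to
# confirm that it can be opened and read, without writing anything.
read_check() {
    dev=$1
    # dd exits non-zero if the device cannot be opened or read;
    # bs and count are arbitrary (16 blocks of 8 KB)
    if dd if="$dev" of=/dev/null bs=8192 count=16 2>/dev/null; then
        echo "OK $dev"
    else
        echo "FAIL $dev"
        return 1
    fi
}

# Usage:
#   read_check /dev/vgc01/rCMPR_VGC01_OCR1
# To look at printable content as well:
#   dd if=/dev/vgc01/rCMPR_VGC01_OCR1 bs=8192 count=16 2>/dev/null | strings | head
```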

    Debug the /sbin/init.d/init.cssd startcheck process:

    [xifenfei01][orawj][/root/xifenfei]#sh -x  /sbin/init.d/init.cssd startcheck
    + ORA_CRS_HOME=/app/oracle/product/10.2.0/crs
    + ORACLE_USER=oracle
    + ORACLE_HOME=/app/oracle/product/10.2.0/crs
    + export ORACLE_HOME
    + export ORA_CRS_HOME
    + export ORACLE_USER
    + DISABLE_OPROCD=false
    + OPROCD_DEFAULT_TIMEOUT=1000
    + OPROCD_DEFAULT_MARGIN=500
    + OPROCD_CHECK_TIMEOUT=2000
    + OPROCD_STOP_TIMEOUT=2000
    + OPROCD_DEFAULT_HISTORGRAM=
    + HOSTN=/bin/hostname
    + EXPRN=/usr/bin/expr
    + CUT=/usr/bin/cut
    + AWK=/bin/awk
    + ECHO=echo
    + TR=/bin/tr
    + /bin/uname
    + [ SunOS = HP-UX ]
    + /bin/uname
    + [ Linux = HP-UX ]
    + + /bin/hostname
    HOST=xifenfei01
    + + /usr/bin/expr xifenfei01 : .*
    len1=8
    + + /usr/bin/expr match xifenfei01 [0-9]*\.[0-9]*\.[0-9]*\.[0-9]*
    len2=0
    + [ 8 != 0 ]
    + + echo xifenfei01
    + /usr/bin/cut -d. -f1
    HOST=xifenfei01
    + + echo xifenfei01
    + /bin/tr [:upper:] [:lower:]
    HOST=xifenfei01
    + PS=/bin/ps
    + PSE=/bin/ps -e
    + PSEF=/bin/ps -ef
    + HEAD=/bin/head
    + GREP=/bin/grep
    + KILL=/bin/kill
    + KILLTERM=/bin/kill -TERM
    + KILLDIE=/bin/kill -9
    + KILLCHECK=/bin/kill -0 5852
    + SLEEP=/bin/sleep
    + NULL=/dev/null
    + UNAME=/bin/uname
    + CAT=/bin/cat
    ………………
    + eval /bin/true
    + /bin/true
    + [ 0 != 0 ]
    + eval /bin/ps -ef | /bin/grep '/usr/lbin/cm[g]msd' 1>/dev/null 2>/dev/null
    + /bin/grep /usr/lbin/cm[g]msd
    + /bin/ps -ef
    + 1> /dev/null 2> /dev/null
    + RC=1
    + [ 1 -ne 0 ]
    + /bin/logger -puser.err Oracle Cluster Ready Services waiting for HP-UX Service Guard to start.
    + /bin/sleep 60
    

    Tracing the shell script with -x shows that CRS is waiting for HP-UX Service Guard to start, which pins the problem on HP-UX Service Guard not running.
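
The test that the trace stops on can be reproduced by hand. The sketch below wraps the same grep pattern in a function that reads ps output from stdin; the function name sg_daemon_running is hypothetical, while the pattern itself is copied from the trace.

```shell
# Sketch of the check seen in the trace: reads `ps -ef` output on stdin
# and succeeds only if a /usr/lbin/cmgmsd process is listed.
# [g] is a one-character class, so the pattern matches "cmgmsd" but not
# the literal text "cm[g]msd" in grep's own command line.
sg_daemon_running() {
    grep '/usr/lbin/cm[g]msd' >/dev/null 2>&1
}

# Usage:
#   if ps -ef | sg_daemon_running; then
#       echo "Service Guard daemon found"
#   else
#       echo "Service Guard daemon not found"
#   fi
```

Note that the script reports the wait through logger -puser.err, so the message goes to syslog rather than to the CRS log directory, which would explain why that directory stayed quiet. On HP-UX the syslog file is typically /var/adm/syslog/syslog.log, though that depends on the local syslog configuration.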

    Check whether HP-UX Service Guard is running:

    [xifenfei01][orawj][/root/xifenfei]#cmviewcl 
    
    CLUSTER           STATUS       
    crmdb_b_cluster   down         
      
      NODE           STATUS       STATE        
      xifenfei01       down         unknown      
      crmdbb02       down         unknown      
        
    UNOWNED_PACKAGES
    
        PACKAGE        STATUS           STATE            AUTO_RUN    NODE        
        pkg1           down             halted           enabled     unowned     
        pkg2           down             halted           enabled     unowned     
    

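    Combined with the customer's description (only one node had been started, and the VGs on the other node were not activated), the conclusion is this: because only one node was in use, the VGs were activated directly without starting Service Guard, and CRS cannot start while Service Guard is down.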
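
As a pre-flight check before starting CRS on such a system, the cluster status can be read out of cmviewcl. This is a minimal sketch based only on the output layout shown above; cluster_status is a hypothetical helper name.

```shell
# Hypothetical helper: reads cmviewcl output on stdin and prints the
# status of the cluster, taken from the first non-empty line after the
# CLUSTER / STATUS header.
cluster_status() {
    awk '
        $1 == "CLUSTER" && $2 == "STATUS" { hdr = 1; next }
        hdr && NF >= 2                    { print $2; exit }
    '
}

# Usage:
#   [ "$(cmviewcl | cluster_status)" = "up" ] || echo "Service Guard is not up"
```

For the output above the sketch prints down. Bringing the cluster up (cmruncl is the usual Serviceguard command for that) is outside the scope of this sketch and should follow the site's own procedure.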
