IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    多cpu环境中运行root.sh失败,asm报ORA-04031

    惜分飞发表于 2015-07-24 17:16:22
    love 0

    联系:手机(13429648788) QQ(107644445)

    链接:http://www.xifenfei.com/5985.html

    标题:多cpu环境中运行root.sh失败,asm报ORA-04031

    作者:惜分飞©版权所有[文章允许转载,但必须以链接方式注明源地址,否则追究法律责任.]

    有朋友和我反馈,说他们在装linux 6.5上面装11.2.0.3的rac出现异常,root.sh在第一个节点执行就失败了,请求帮助
    root.sh-asm-fail


    根据上面记录,查看asmca日志
    [main] [ 2015-07-24 12:49:35.885 CST ] [SQLEngine.reInitialize:738]  Reinitializing SQLEngine...
    [main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
    [main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
    [main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
    [main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
    [main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
    [main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
    [main] [ 2015-07-24 12:49:35.886 CST ] [SQLPlusEngine.getCmmdParams:222]  m_home 11.2.0.3.0
    [main] [ 2015-07-24 12:49:35.887 CST ] [SQLPlusEngine.getCmmdParams:223]  version > 112 true
    [main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:555]  Default NLS_LANG: AMERICAN_AMERICA.AL32UTF8
    [main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:565]  NLS_LANG: AMERICAN_AMERICA.AL32UTF8
    [main] [ 2015-07-24 12:49:35.888 CST ] [SQLEngine.initialize:325]  Execing SQLPLUS/SVRMGR process...
    [main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:362]  m_bReaderStarted: false
    [main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:366]  Starting Reader Thread... 
    [main] [ 2015-07-24 12:49:35.901 CST ] [SQLEngine.initialize:415]  Waiting for m_bReaderStarted to be true 
    [main] [ 2015-07-24 12:49:35.972 CST ] [SQLEngine.done:2189]  Done called
    [main] [ 2015-07-24 12:49:35.972 CST ] [UsmcaLogger.logException:173]  SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM
    [main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:174]  ORA-01012: not logged on
    
    [main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:175]  oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-01012: not logged on
    
    oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeImpl(SQLEngine.java:1658)
    oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeQuery(SQLEngine.java:831)
    oracle.sysman.assistants.usmca.backend.USMInstance.configureLocalASM(USMInstance.java:3036)
    oracle.sysman.assistants.usmca.service.UsmcaService.configureLocalASM(UsmcaService.java:1049)
    oracle.sysman.assistants.usmca.model.UsmcaModel.performConfigureLocalASM(UsmcaModel.java:944)
    oracle.sysman.assistants.usmca.model.UsmcaModel.performOperation(UsmcaModel.java:797)
    oracle.sysman.assistants.usmca.Usmca.execute(Usmca.java:174)
    oracle.sysman.assistants.usmca.Usmca.main(Usmca.java:369)
    [main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:173]  SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM
    [main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:174]  ORA-03113: end-of-file on communication channel
    
    [main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:175]  oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-03113: end-of-file on communication channel
    

    这里可以看出来,asm实例无法登陆(ORA-01012和ORA-03113),根据这样的错误,分析asm日志

    Reconfiguration complete
    Fri Jul 24 12:49:29 2015
    LCK0 started with pid=22, OS id=46913 
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmd0_46887.trc  (incident=81):
    ORA-04031: unable to allocate 7072 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","ges resource ")
    Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_81/+ASM1_lmd0_46887_i81.trc
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc  (incident=177):
    ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss")
    Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_177/+ASM1_lck0_46913_i177.trc
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_46885.trc  (incident=73):
    ORA-04031: unable to allocate 632 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","name-service ")
    Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_73/+ASM1_lmon_46885_i73.trc
    Use ADRCI or Support Workbench to package the incident.
    See Note 411.1 at My Oracle Support for error and packaging details.
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc:
    ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss")
    System state dump requested by (instance=1, osid=46913 (LCK0)), summary=[abnormal instance termination].
    System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_diag_46879.trc
    Use ADRCI or Support Workbench to package the incident.
    See Note 411.1 at My Oracle Support for error and packaging details.
    LCK0 (ospid: 46913): terminating the instance due to error 4031
    Fri Jul 24 12:49:35 2015
    ORA-1092 : opitsk aborting process
    Instance terminated by LCK0, pid = 46913
    

    进一步分析asm日志,发现是大家熟悉的asm的ORA-4031问题,那就是说明数据库在执行root.sh的时候使用默认参数文件启动asm的时候shared pool不够大(根据ORACLE最佳实践,建议memory_target=1536M及其以上值),从而出现该问题。类似Bug 14292825 ORA-4031 in ASM as default memory parameters values for 11.2 ASM instances low,根据官方描述该问题在11.2.0.4中修复
    BUG-14292825


    通过asm日志发现相关默认值配置
    Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
    With the Real Application Clusters and Automatic Storage Management options.
    ORACLE_HOME = /u01/app/11.2.0/grid
    System name:	Linux
    Node name:	RAC01
    Release:	2.6.32-358.el6.x86_64
    Version:	#1 SMP Tue Jan 29 11:47:41 EST 2013
    Machine:	x86_64
    Using parameter settings in client-side pfile /u01/app/11.2.0/grid/dbs/init+ASM1.ora on machine RAC01
    System parameters with non-default values:
      large_pool_size          = 16M
      instance_type            = "asm"
      remote_login_passwordfile= "EXCLUSIVE"
      asm_power_limit          = 1
      diagnostic_dest          = "/u01/app/grid"
    Cluster communication is configured to use the following interface(s) for this instance
      10.10.10.31
    cluster interconnect IPC version:Oracle UDP/IP (generic)
    IPC Vendor 1 proto 2
    Fri Jul 24 12:49:27 2015
    

    通过查询/proc/cpuinfo,检查cpu数量

    processor	: 191
    vendor_id	: GenuineIntel
    cpu family	: 6
    model		: 62
    model name	: Intel(R) Xeon(R) CPU E7-8850 v2 @ 2.30GHz
    stepping	: 7
    cpu MHz		: 1200.000
    cache size	: 24576 KB
    physical id	: 7
    siblings	: 24
    core id		: 13
    cpu cores	: 12
    apicid		: 251
    initial apicid	: 251
    fpu		: yes
    fpu_exception	: yes
    cpuid level	: 13
    wp		: yes
    

    而根据How To Determine The Default Number Of Subpools Allocated During Startup (Doc ID 455179.1)中描述
    最多7个subpool(这里一共有192个cpu,因此subpool数量为7)
    1


    每个suppool最少512m内存,因此shared pool最小需要3.5G(而默认值几百M,远远不够)
    2

    由于cpu多,导致shared pool的Subpools 更加多,使得shared pool的需求量更加大。至此本次故障原因可以总结:
    由于cpu较多,需要更多的shared pool,而11.2.0.3中由于asm默认内存分配较少,导致在asm启动之时出现shared pool不足(本身默认值小,而且shared pool需求大,从而出现了ORA-04031就不奇怪了),因为运行root.sh过程中asm无法正常启动,从而使得root.sh运行失败。
    处理办法:临时disable部分cpu,然后重新执行root.sh,修改asm内存分配,再enable cpu.
    特别说明:此故障acs的兄弟遇到过,所以这次我能够快速反应,感谢acs兄弟们的帮忙,另外有权限的朋友可以看看:3-10479952701和3-7976215751等sr描述

    • 因asm sga_target设置不当导致11gr2 rac无法正常启动
    • 查询v$session报ORA-04031错误
    • 误修改/u01权限/所有者的故障恢复
    • 通过Administration Assistant for Windows配置win服务和实例关联性
    • large pool太小导致shared server异常
    • Oracle 小升级 opatch apply使用
    • 在数据库open过程中常遇到ORA-01555汇总
    • Enterprise Manager Database Express 12c 欣赏


沪ICP备19023445号-2号
友情链接