IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    ORA-00600[kjhn_post_ha_alert0-862]原因分析

    惜分飞发表于 2015-06-23 15:52:27
    love 0

    联系:手机(13429648788) QQ(107644445)

    标题:ORA-00600[kjhn_post_ha_alert0-862]原因分析

    作者:惜分飞©版权所有[文章允许转载,但必须以链接方式注明源地址,否则追究法律责任.]

    数据库版本和平台信息
    数据库版本为10.2.0.1版本,而且是32位的win 2003 sp2之上

    ORACLE V10.2.0.1.0 - Production vsnsta=0
    vsnsql=14 vsnxtr=3
    Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
    With the Partitioning, OLAP and Data Mining options
    Windows Server 2003 Version V5.2 Service Pack 2
    CPU                 : 2 - type 586, 1 Physical Cores
    Process Affinity    : 0x00000000
    Memory (Avail/Total): Ph:2608M/3990M, Ph+PgF:4511M/5871M, VA:1242M/2047M
    Instance name: orcl
    

    数据库报大量ORA-600[kjhn_post_ha_alert0-862]错误
    数据库的mmon进程报大量ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []错误

    Wed Jun 03 21:50:40 2015
    Restarting dead background process MMON
    MMON started with pid=11, OS id=3804
    Wed Jun 03 21:50:43 2015
    Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc:
    ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
    
    Wed Jun 03 21:50:49 2015
    Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc:
    ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
    
    Wed Jun 03 21:55:44 2015
    Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc:
    ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
    
    Wed Jun 03 21:55:49 2015
    Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc:
    ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
    
    Wed Jun 03 22:00:40 2015
    Thread 1 advanced to log sequence 476
      Current log# 1 seq# 476 mem# 0: E:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO01.LOG
    Wed Jun 03 22:00:44 2015
    Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc:
    ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
    

    查询对应trace文件发现

    ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [] , [], [], [], [] 
    Current SQL statement for this session: 
    BEGIN :success := dbms_ha_alerts_prvt.check_ha_resources; END;
    

    人工执行该过程

    SQL> var success varchar2
    SQL> begin
      2  :success := sys.dbms_ha_alerts_prvt.check_ha_resources;
      3  end;
      4  /
     
    PL/SQL procedure successfully completed.
     
    SQL> print success
     
    SUCCESS                                                                       
      
    --------------------------------                                              
      
    N      
    

    通过查询相关资料得到如下说明

    @ This check is triggered with FAN enabled at this instance and it seems to be 
    @ associated with a startup action. From the procedure itself which is called 
    @ this is a run-once MMON (startup) action which supports instance down 
    @ notification reliability. It does the folowing a) registers the current 
    @ instance incarnation in recent_resource_incarnations$ if it's not already 
    @ there b) deletes recent_resource_incarnations$ records that don't apply to 
    @ this database. They may, e.g., have been copied from seed db or from a former 
    @ DataGuard primary c) scans recent_resource_incarnations$ for instance 
    @ incarnations that are no longer alive, and submits instance down alerts for 
    @ them . If all is good then return 'Y' else 'N' (or error) if there is a 
    @ failure. That failure is to get back to MMON, so that it may retry this 
    @ action later. In the local instance I get a 'Y' but in the customer's system 
    @ it fails with a 'N' which seems related to the ORA-600 assert.
    
    @ This function is kjhn_post_ha_alert0() which is internal and does the real work of 
    @ posting HA alerts. It is used by both kjhn_post_ha_alert and 
    @ kjn_post_ha_alert_plsql. Its parameters are basically the same as those of 
    @ kjhn_post_ha_alert,other than the fact that it uses individual parameters 
    @ rather than the more easily extensible structure. Also the parameters passed 
    @ to it are the instance_name and the host_name which is the kernelized 
    @ implementation for posting HA alerts. Without actually having the arguments 
    @ the guess is that either the host_name or the instance_name raised in the 
    @ assert is null which triggered it.
    

    mmon进程尝试调用相关程序,然后无法得出正确值,返回N,然后会一直尝试,如果不能得到返回Y,就会一直报ORA-600,错误.通过上述的三种情况来说,都和recent_resource_incarnations$表有关系.
    该故障原因是由于:mmon在调用kjhn_post_ha_alert0函数在执行的时候,如果发现参数host_name或者instance_name为null,就会报该错误出来.

    处理方法
    This problem has been documented as Bug 5173066 REPEATED ORA-600 [KJHN_POST_HA_ALERT0-862] FROM MMON PROCESS.
    The bug is fixed in 11.1.0.6. A workaround is available for the problem.
    该bug在11.1.0.6中得以修复

    To implement the workaround, please execute the following steps as the SYS user: 
    
    1. Collect the following information and spool it to a file for your records. 
    
    a. output of select * from v$instance 
    b. show parameter instance_name 
    c. set pages 1000 
    d. select * from recent_resource_incarnations$ 
    
    2. Create a backup table of recent_resource_incarnations$. 
    
    SQL> create table recent_resource_inc$bk as select * from recent_resource_incarnations$; 
    
    
    3. Truncate recent_resource_incarnations$. Be sure to do this while the instance is up and running.
        Do not issue this statement if a shutdown is pending. 
    
    SQL> truncate table recent_resource_incarnations$; 
    
    
    4. Perform a clean shutdown, followed by a startup.
    

    具体参考:
    ORA-600 [kjhn_post_ha_alert0-862] Continuously Repeated in the Alert Log (Doc ID 401640.1)
    Bug 5173066 : REPEATED ORA-600 [KJHN_POST_HA_ALERT0-862] FROM MMON PROCESS

    • ORA-00600[kcfrbd_3]故障解决
    • spfile被覆盖导致ORA-600[kmgs_parameter_update_timeout_1]
    • ORA-600[4194]/[4193]解决
    • 记录另一起ORA-00600[13013]处理
    • ORA-600 kghstack_free2异常恢复
    • 记录一次ORA-600 3004 恢复过程和处理思路
    • ORA-607/ORA-600[4194]不一定是重大灾难
    • ORA-600[6749] 发生在 SYSMAN.MGMT_METRICS_RAW表
    • undo异常总结和恢复思路
    • 记录一次ORA-00600[kdxlin:psno out of range]/ORA-00600[3020]/ORA-00600[4000]/ORA-00600[4193]的数据库恢复
    • rac redo log file被意外覆盖数据库恢复
    • undo segment header坏块异常恢复


沪ICP备19023445号-2号
友情链接