IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    RESMGR:cpu quantum等待事件处理过程

    老A的自留地,欢迎加微信交流,微信号zhoul777发表于 2017-02-27 18:23:22
    love 0
    由于数据库上线过程中出现大量的RESMGR:cpu quantum等待事件,出现性能问题,关闭了resource manager功能,关闭过程如下:
    ALTER SYSTEM SET “_resource_manager_always_on”=FALSE SCOPE=SPFILE SID='*';

       execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN','');
       execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN','');
       execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN','');
       execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN','');
       execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN','');
       execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN','');
       execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');

    #4 重启数据库使生效

    重启后,resource manager 关闭,数据库不在出现RESMGR:cpu quantum等待事件。
    但是数据库自动任务计划调度开始报错,后台报错如下:
    ORA-12012: error on auto execute of job "SYS"."ORA$AT_SQ_SQL_SW_186"
    ORA-29373: resource manager is not on

    视图检查报错:
    SQL> select client_name,window_name,job_name,job_status,job_start_time from dba_autotask_job_history ;

    CLIENT_NAME          WINDOW_NAME          JOB_NAME                       JOB_STATUS                     JOB_START_TIME
    -------------------- -------------------- ------------------------------ ------------------------------ ----------------------------------------
    auto optimizer stats WEDNESDAY_WINDOW     ORA$AT_OS_OPT_SY_102           FAILED                         09-APR-14 10.00.01.582006 PM PRC
    collection

    auto optimizer stats FRIDAY_WINDOW        ORA$AT_OS_OPT_SY_105           FAILED                         11-APR-14 10.00.06.999466 PM PRC
    collection

    auto optimizer stats TUESDAY_WINDOW       ORA$AT_OS_OPT_SY_99            FAILED                         08-APR-14 10.00.07.885416 PM PRC
    collection


    三 前期故障排查
    为了确认故障原因,我们对该报错进行打开29373 errorstack,收集了报错中的详细跟踪日志:
    后台alert报错如下:
    Sun May 04 06:00:04 2014
    Dumping diagnostic data in directory=[cdmp_20140504060004], requested by (instance=1, osid=55167 (J001)), summary=[abnormal process termination].
    Errors in file /oracle/database/diag/rdbms/nfdb/nfdb1/trace/nfdb1_j001_55167.trc:
    ORA-29373: resource manager is not on
    Errors in file /oracle/database/diag/rdbms/nfdb/nfdb1/trace/nfdb1_j001_55167.trc:
    ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_271"
    ORA-29373: resource manager is not on
    Dumping diagnostic data in directory=[cdmp_20140504060009], requested by (instance=1, osid=55167 (J001)), summary=[abnormal process termination].


    详细的trace跟踪报告如下:
    dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
    ----- Error Stack Dump -----
    ORA-29373: resource manager is not on
    ----- SQL Statement (None) -----
    Current SQL information unavailable - no cursor.

    从trace文件中我们也只能确认是因为resource manager造成JOB无法调度

    不过在检测中我们发现,管理resource manager的进程DBRM依旧在数据库中存在,
    #ps –ef | grep dbrm
    oracle 34920 1 0 Apr18 ? 00:00:59  ora_dbrm_nfdb1


    这从一定程度上给予我们一定的怀疑方向,可能存在resource并没有完全关闭,而且从详细的trace文件中的从call stack trace里来看, 当前进程也没有没有关于resource manager的函数调用,而是一直在向另外一个进程post message。
    所以,我们怀疑错误很有可能是由于DBRM没有正常关闭造成。

    现对以上分析结果进行判断,获取可能依据

    四 故障模拟

    故障模拟一共分为如下几个部分:
    步骤 操作 结果
    Step1 和故障操作一样,设置“_resource_manager_always_on隐含参数,关闭resource manager windows 调用计划 后台报错
    Step2 删除隐含参数,只设置resource manager windows 调用计划关闭 后台不报错
    Step3 添加2个隐含参数,关闭resource manager windows 调用计划 后台不报错

    详细测试过程如下:
    测试我们采用将数据库时间设置到22点,让其自动调度JOB计划
    4.1 模拟管理库故障环境
    关闭数据库,调整时间,设置隐含参数,关闭windows 计划
    关闭数据库:
    SQL> shutdown immediate
    Database closed.
    Database dismounted.
    ORACLE instance shut down.

    调整时间:
    [root@rhel6 ~]# date -s 2014/05/14
    [root@rhel6 ~]# date -s 21:57:00
    [root@rhel6 ~]# clock -w
    [root@rhel6 ~]# date
    Wed May 14 21:57:35 CST 2014

    添加隐含参数,启动至open:
    SQL> ALTER SYSTEM SET "_resource_manager_always_on"=FALSE SCOPE=SPFILE;
    SQL> startup force(测试环境,直接force启动,生产环境勿如此操作)

    设置resource manager plan:
    execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN','');
    execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN','');
    execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN','');
    execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN','');
    execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN','');
    execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN','');
    execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');

    我们观察22点的alert信息,确实开始报错:
    Wed May 14 22:00:03 2014
    Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j003_4452.trc:
    ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_27"
    ORA-29373: resource manager is not on
    Wed May 14 22:00:03 2014
    Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j004_4454.trc:
    ORA-12012: error on auto execute of job "SYS"."ORA$AT_SA_SPC_SY_28"
    ORA-29373: resource manager is not on
    Wed May 14 22:00:03 2014
    Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j005_4456.trc:
    ORA-12012: error on auto execute of job "SYS"."ORA$AT_SQ_SQL_SW_29"
    ORA-29373: resource manager is not on
    Wed May 14 22:00:04 2014
    XDB installed.
    XDB initialized.

    检查DBRM进程已经存在:
    [root@rhel6 ~]# ps -ef | grep dbrm
    ora11g    4346     1  0 21:57 ?        00:00:00 ora_dbrm_ora11g


    检查后台JOB执行记录视图:
    SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;

    CLIENT_NAME                         WINDOW_NAME                    JOB_NAME                       JOB_STATUS                     JOB_START_TIME
    ----------------------------------- ------------------------------ ------------------------------ ------------------------------ ----------------------------------------
    auto optimizer stats collection     WEDNESDAY_WINDOW               ORA$AT_OS_OPT_SY_27            FAILED                         14-MAY-14 10.00.03.233570 PM PRC

    模拟过程与生产管理库一致,结果也一致,DBRM进程已经存在。


    4.2 模拟去除隐含参数
    关闭数据库,调整时间,去除隐含参数,关闭windows 计划
    关闭数据库:
    SQL> shutdown immediate
    Database closed.
    Database dismounted.
    ORACLE instance shut down.

    调整时间:
    [root@rhel6 ~]# date -s 2014/05/14
    [root@rhel6 ~]# date -s 21:57:00
    [root@rhel6 ~]# clock -w
    [root@rhel6 ~]# date
    Wed May 14 21:57:35 CST 2014

    去除隐含参数,启动至open:
    隐含参数去除采用create pfile from spfile;
    删除spfile,编辑pfile文件,删除隐含参数,以pfile启动数据库

    设置resource manager plan:(由于前面已经设置过,无需再设置)

    我们观察22点的alert信息,发现没有报错
    Tue May 13 22:00:03 2014
    Begin automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
    End automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
    Tue May 13 22:00:05 2014
    XDB installed.
    XDB initialized.


    检查 DBRM进程:
    [root@rhel6 ~]# ps -ef | grep dbrm
    ora11g    3844     1  0 21:55 ?        00:00:00 ora_dbrm_ora11g

    说明:此时resource manager由于只是关闭了resource manager plan计划,没有真正关闭resource manager 因此该进程依旧存在。

    检查后台JOB执行视图信息:
    SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;

    CLIENT_NAME                              WINDOW_NAME                    JOB_NAME                  JOB_STATUS           JOB_START_TIME
    ---------------------------------------- ------------------------------ ------------------------- -------------------- ----------------------------------------
    auto optimizer stats collection          TUESDAY_WINDOW                 ORA$AT_OS_OPT_SY_24       SUCCEEDED            13-MAY-14 10.00.02.102741 PM PRC

    说明在隐含参数除掉的情况下,JOB可以正常执行,后台没有报错。

    4.3 模拟隐含参数及resource manager plan均存在的情况
    关闭数据库,调整时间,添加隐含参数,关闭windows 计划
    关闭数据库:
    SQL> shutdown immediate
    Database closed.
    Database dismounted.
    ORACLE instance shut down.
    调整时间:
    [root@rhel6 ~]# date -s 2014/05/15
    [root@rhel6 ~]# date -s 21:57:00
    [root@rhel6 ~]# clock -w
    [root@rhel6 ~]# date
    Thu May 15 21:57:35 CST 2014
    添加隐含参数,启动至open:
    SQL> alter system set "_resource_manager_always_off"=true scope=spfile;
    SQL> ALTER SYSTEM SET "_resource_manager_always_on"=FALSE SCOPE=SPFILE;
    SQL> startup force(测试环境,直接force启动,生产环境勿如此操作)
    设置resource manager plan:(由于前面已经设置过,无需再设置)

    我们观察22点的alert信息,发现没有报错
    Thu May 15 22:00:03 2014
    Begin automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
    Thu May 15 22:00:05 2014
    XDB installed.
    XDB initialized.
    End automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"

    检查 DBRM进程:
    [root@rhel6 ~]# ps -ef | grep dbrm
    root      4650  3647  0 21:58 pts/2    00:00:00 grep dbrm

    说明:此时DBRM进程消失

    检查后台JOB执行视图信息:
    SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;

    CLIENT_NAME                         WINDOW_NAME                    JOB_NAME                       JOB_STATUS                     JOB_START_TIME
    ----------------------------------- ------------------------------ ------------------------------ ------------------------------ ----------------------------------------
    auto optimizer stats collection     THURSDAY_WINDOW                ORA$AT_OS_OPT_SY_30            SUCCEEDED                      15-MAY-14 10.00.02.115232 PM PRC

    说明在隐含参数两个都添加的情况下,完全屏蔽resource manager的的情况下,JOB可以正常执行,后台没有报错。


    五 总结说明
    以上测试结果证明,后台报错JOB执行失败原因应该是DBRM进程依旧活动,而DBRM进程是管理RESOURCE manager 当去除"_resource_manager_always_off"=true及"_resource_manager_always_on"=FALSE
    或者将两个参数全部添加,均可避免该错误,统计信息自动收集也可以自动执行
    由于管理数据库相对重要,且上线时候出现过严重的RESMGR:cpu quantum等待事件,建议不要移除隐含参数,而是添加"_resource_manager_always_off"=true隐含参数,重启数据库

    当然,可以先在另外两套库上进行确认,再在管理库上进行操作


    已有 0 人发表留言,猛击->>这里<<-参与讨论


    ITeye推荐
    • —软件人才免语言低担保 赴美带薪读研!—





沪ICP备19023445号-2号
友情链接