IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    [原]ORACLE 11g 通过ASH结合AWR实战解决cpu高负载的详细过程

    mchdba发表于 2016-09-07 22:11:17
    love 0

     

     

    ASH结合AWR实战解决oracle高负载

     

                线上oracle数据库负载比较高,所以需要优化,一般进去看负载高的进程,如果是oracle进程,那么就是oracle运行不健康出异常了,我们通常可以用ash来分析问题,如果ash还无法解决,我们就需要扩大分析范围,需要采用awr分析日志来进行分析。


     

    1、ASH分析日志分析Events

    Top SQL with Top Events

    SQL ID

    Planhash

    Sampled # of Executions

    % Activity

    Event

    % Event

    Top Row Source

    % RwSrc

    SQL Text

    2xg4n5sj1sx5h

    2047223008

    5066

    11.18

    latch: cache buffers chains

    5.88

    TABLE ACCESS - FULL

    2.34

    select * from (select a.bis_pr...

     

     

     

     

    CPU + Wait for CPU

    5.24

    TABLE ACCESS - FULL

    2.35

    19gkjf4r3zkpw

    2939541073

    1013

    2.23

    latch: cache buffers chains

    1.27

    TABLE ACCESS - FULL

    1.00

    select round(sum(t.current_fac...

    6qhrc4ax6zr1a

    2939541073

    960

    2.12

    latch: cache buffers chains

    1.21

    TABLE ACCESS - FULL

    0.96

    select round(sum(t.current_fac...

    6nmdjjxfv7aqp

    2939541073

    847

    1.87

    latch: cache buffers chains

    1.03

    TABLE ACCESS - FULL

    0.78

    select round(sum(t.current_fac...

    1qq2h1qkxkaha

    2939541073

    776

    1.71

    latch: cache buffers chains

    1.02

    TABLE ACCESS - FULL

    0.83

    select round(sum(t.current_fac...

     

    Top SQL with Top Row Sources

    SQL ID

    PlanHash

    Sampled # of Executions

    % Activity

    Row Source

    % RwSrc

    Top Event

    % Event

    SQL Text

    2xg4n5sj1sx5h

    2047223008

    5066

    11.18

    TABLE ACCESS - FULL

    4.70

    CPU + Wait for CPU

    2.35

    select * from (select a.bis_pr...

     

     

     

     

    TABLE ACCESS - FULL

    3.90

    latch: cache buffers chains

    2.30

     

     

     

     

    TABLE ACCESS - FULL

    1.13

    latch: cache buffers chains

    0.66

    19gkjf4r3zkpw

    2939541073

    1013

    2.23

    TABLE ACCESS - FULL

    1.73

    latch: cache buffers chains

    1.00

    select round(sum(t.current_fac...

    6qhrc4ax6zr1a

    2939541073

    960

    2.12

    TABLE ACCESS - FULL

    1.59

    latch: cache buffers chains

    0.96

    select round(sum(t.current_fac...

    6nmdjjxfv7aqp

    2939541073

    847

    1.87

    TABLE ACCESS - FULL

    1.39

    latch: cache buffers chains

    0.78

    select round(sum(t.current_fac...

    1qq2h1qkxkaha

    2939541073

    776

    1.71

    TABLE ACCESS - FULL

    1.34

    latch: cache buffers chains

    0.83

    select round(sum(t.current_fac...

     

    从这里分析,可以看到2xg4n5sj1sx5h的SQL引发了性能瓶颈,因为短短的20分钟内执行了5066次,而且诱发了latch: cache buffers chains以及TABLE ACCESS – FULL全表扫描,所以针对这个sql,分析后,建立优化索引:

    createindex IDX_PROJECT_ID_BRRQon BIS_REPORT_RENT_QUICK(BIS_PROJECT_ID, SEQUNCE_NO, YEAR, MONTH);

     

    之后继续观察cpu负载,已经降低了一半,但是还不够,还没有达到正常值,而看ash报告已经不能提供帮助了。接下来我们需要更加全面的awr报告。

     

     

    2、Awr看以下个主要方向

    (1)SQL ordered by Sharable Memory

    (2)SQL ordered by CPU Tim

     

     

     

    3、先看SQL ordered by Sharable Memory

    • Only Statements with Sharable Memory greater than 1048576 are displayed

    Sharable Mem (b)

    Executions

    % Total

    SQL Id

    SQL Module

    SQL Text

    5,890,119

    19

    0.04

    8jfndnfr9hct8

     

    select * from ( select * from ...

    4,980,367

    3

    0.04

    3gnf4zamgyb7x

     

    select * from ( select * from ...

    3,816,036

    8

    0.03

    277rt4gknr76h

     

    update POWERDESK.cont_ledger s...

    2,966,255

    21

    0.02

    f2b3a59mdr7wb

     

    select * from ( select * from ...

    2,694,091

    12

    0.02

    a8y6k4gt09xxn

     

    select * from ( select * from ...

    2,316,487

    21

    0.02

    amyymj7a06gum

     

    select * from ( select * from ...

    2,003,767

    18

    0.01

    7z4hxbs26uz8d

     

    select * from ( select * from ...

    2,003,743

    6

    0.01

    96264xfwmxcm7

     

    select * from ( select * from ...

    1,979,443

    12

    0.01

    a8y6k4gt09xxn

     

    select * from ( select * from ...

    1,626,167

    18

    0.01

    7z4hxbs26uz8d

     

    select * from ( select * from ...

    1,354,011

    9

    0.01

    g371ncnjwg9zx

     

    select * from ( select * from ...

    1,325,857

    1,827

    0.01

    fm2hxa2nb9gzx

     

    select count(*) from ( select ...

    1,303,125

    192

    0.01

    dq8xug6mzj7dj

     

    select count(*) from (select r...

    1,277,136

    11

    0.01

    1jubuhx6cdxw8

     

    select sum(rent_square) from( ...

    1,147,400

    4

    0.01

    6gj7zqb64m29q

     

    select count(*) from ( select ...

    1,135,232

    4

    0.01

    6gj7zqb64m29q

     

    select count(*) from ( select ...

    1,126,672

    1

    0.01

    242g0k9qdh0q0

     

    select sum(rent_square) from( ...

    1,064,607

    3,723

    0.01

    7sum7dh4hcmbh

     

    select * from ( select jbpmtas...

     

     

    看到有fm2hxa2nb9gzx这条sql记录,执行次数比较多,有优化的空间:

    SELECT

      COUNT(*)

    FROM

      (SELECT

        res.*,

        1 rec_status

      FROM

        Res_Approve_Info res

      WHEREEXISTS

        (SELECT

          1

        FROM

          Res_Approve_User u

        WHERE u.res_Approve_Info_Id = res.res_Approve_Info_Id

          AND u.user_Cd = :1)

          

        UNION

        SELECT

          res.*,

          9 rec_status

        FROM

          Res_Approve_Info res

        WHEREEXISTS

          (SELECT

            1

          FROM

            Res_Approve_User u,

            Res_Accredit_Info a

          WHERE u.res_Approve_Info_Id = res.res_Approve_Info_Id

            AND U.USER_CD = A.USER_CD

            AND A.ACC_USER_CD = :2)) res

      WHERE1 = 1

        AND res.status_cd = '1'

     

     

    解决办法,添加索引:

    createindex IXU_RES_USER_CD_2 on RES_APPROVE_USER(USER_CD);

     

     

     

    4、再看SQL ordered by CPU Time

    • Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
    • %Total - CPU Time as a percentage of Total DB CPU
    • %CPU - CPU Time as a percentage of Elapsed Time
    • %IO - User I/O Time as a percentage of Elapsed Time
    • Captured SQL account for 23.1% of Total CPU Time (s): 11,858
    • Captured PL/SQL account for 0.0% of Total CPU Time (s): 11,858

    CPU Time (s)

    Executions

    CPU per Exec (s)

    %Total

    Elapsed Time (s)

    %CPU

    %IO

    SQL Id

    SQL Module

    SQL Text

    146.67

    11,055

    0.01

    1.24

    192.16

    76.33

    0.00

    4198t22zum3dr

     

    select t1.module_cd, count(t1....

    140.39

    65

    2.16

    1.18

    300.45

    46.73

    0.00

    a8s9kjk7mk6yc

     

    select nvl(sum(nvl(sales_money...

    133.99

    63

    2.13

    1.13

    365.39

    36.67

    0.00

    15uv0hbq9dmm7

     

    select nvl(sum(nvl(sales_money...

    130.97

    65

    2.01

    1.10

    286.23

    45.75

    0.00

    bhnjxfa4av3kw

     

    select nvl(sum(nvl(sales_money...

    127.93

    4,215

    0.03

    1.08

    157.88

    81.03

    0.00

    aajc7r9y7z1qs

     

    select * from ( select this_.d...

    124.72

    63

    1.98

    1.05

    317.81

    39.24

    0.00

    1na6hypkn1q2f

     

    select nvl(sum(nvl(sales_money...

    112.56

    347

    0.32

    0.95

    444.87

    25.30

    0.00

    cpw6nx6gdv937

     

    select sum(rent_square) from( ...

    110.81

    19,013

    0.01

    0.93

    156.71

    70.71

    0.00

    ckkvz3r38xa4k

     

    select * from ( select kmpush0...

    103.00

    19,013

    0.01

    0.87

    136.82

    75.28

    0.00

    8n23w37kwdn3p

     

    select count(*) as col_0_0_ fr...

    97.71

    7,400

    0.01

    0.82

    130.32

    74.98

    0.00

    f1z0k5gx90tqz

     

    select (case when t1.module_cd...

     

     

     

    依据以上awr统计分析cpu time(s)得出的结论,点击进去,找出sql记录,并给出优化方案如下:

     

    属于sql写法不规范的优化:

     

    4198t22zum3dr

    select t1.module_cd, count(t1.jbpm_task_id) num from jbpm_task t1, jbpm_task_candidate t2 where t1.jbpm_task_id=t2.jbpm_task_id and (t2.user_cd=:1 or t2.user_cd like :2 ) group by t1.module_cd order by decode(t1.module_cd, 'mesMeetingInfo', '0', 'resApprove', '1', 'planTarget', '2') asc

    优化建议:

    将order by后面的decode(t1.module_cd, 'mesMeetingInfo', '0', 'resApprove', '1', 'planTarget', '2') asc移到from前面,在外层做order by操作

     

    bhnjxfa4av3kw

    select nvl(sum(nvl(sales_money, 0)), 0) from bis_sales_day where to_char(sales_date, 'yyyy-MM')='2016-06' and bis_cont_id in (select bis_cont_id from bis_cont bc where cont_type_cd in ('1', '2') and ((bc.status_cd = '2' and '2016-06' >= to_char(bc.cont_start_date, 'yyyy-MM') and to_char(bc.cont_to_fail_date, 'yyyy-MM')>='2016-06') or (bc.status_cd in ('1', '3') and '2016-06' >= to_char(bc.cont_start_date, 'yyyy-MM') and to_char(bc.cont_end_date, 'yyyy-MM')>='2016-06')) and bc.bis_project_id in (select bis_project_id from bis_project where is_business_project ='1' ) and bc.store_type='1' )

    优化建议:

    将where后面 in的子查询判断字句变成 exists判断。

    15uv0hbq9dmm7

    select nvl(sum(nvl(sales_money, 0)), 0) from bis_sales_day where to_char(sales_date, 'yyyy-MM')='2016-07' and bis_cont_id in (select bis_cont_id from bis_cont bc where cont_type_cd in ('1', '2') and ((bc.status_cd = '2' and '2016-07' >= to_char(bc.cont_start_date, 'yyyy-MM') and to_char(bc.cont_to_fail_date, 'yyyy-MM')>='2016-07') or (bc.status_cd in ('1', '3') and '2016-07' >= to_char(bc.cont_start_date, 'yyyy-MM') and to_char(bc.cont_end_date, 'yyyy-MM')>='2016-07')) and bc.bis_project_id in (select bis_project_id from bis_project where is_business_project ='1' ) and bc.store_type in ('1', '2') )

    优化建议:

    将where后面 in的子查询判断字句变成 exists判断。

    bhnjxfa4av3kw

    select nvl(sum(nvl(sales_money, 0)), 0) from bis_sales_day where to_char(sales_date, 'yyyy-MM')='2016-06' and bis_cont_id in (select bis_cont_id from bis_cont bc where cont_type_cd in ('1', '2') and ((bc.status_cd = '2' and '2016-06' >= to_char(bc.cont_start_date, 'yyyy-MM') and to_char(bc.cont_to_fail_date, 'yyyy-MM')>='2016-06') or (bc.status_cd in ('1', '3') and '2016-06' >= to_char(bc.cont_start_date, 'yyyy-MM') and to_char(bc.cont_end_date, 'yyyy-MM')>='2016-06')) and bc.bis_project_id in (select bis_project_id from bis_project where is_business_project ='1' ) and bc.store_type='1' )

    优化建议:

    将where后面 in的子查询判断字句变成 exists判断。

     

     

     

     

    属于sql中添加索引的优化:

    aajc7r9y7z1qs

    select * from ( select this_.dly_note_id as dly1_396_0_, this_.created_center_cd as created2_396_0_, this_.created_date as created3_396_0_, this_.created_dept_cd as created4_396_0_, this_.created_position_cd as created5_396_0_, this_.creator as creator396_0_, this_.note_biz_type_cd as note7_396_0_, this_.note_content as note8_396_0_, this_.note_title as note9_396_0_, this_.record_version as record10_396_0_, this_.remark as remark396_0_, this_.send_mail_flg as send12_396_0_, this_.seq as seq396_0_, this_.tip_end_date as tip14_396_0_, this_.tip_flg as tip15_396_0_, this_.tip_period_cd as tip16_396_0_, this_.tip_start_date as tip17_396_0_, this_.tip_type_cd as tip18_396_0_, this_.updated_center_cd as updated19_396_0_, this_.updated_date as updated20_396_0_, this_.updated_dept_cd as updated21_396_0_, this_.updated_position_cd as updated22_396_0_, this_.updator as updator396_0_ from POWERDESK.dly_note this_ where this_.creator=:1 order by this_.seq asc ) where rownum <= :2

    优化方案,添加索引:createindex IDX_CREATOR on DLY_NOTE (creator);

     

     

     

     

    在添加好IDX_CREATOR、IXU_RES_USER_CD_2 这2个索引后,cpu负载已经再次降低了一半,虽然没有到正常值,但是已经接近最低级别的告警线,暂时数据库脱离危险了,就等其它的几个需要修改sql的优化继续完成后,cpu负载基本就可以恢复到正常值了。

     

    所以,一般oracle服务器负载飙升,90%都是由不规范的sql、性能差的sql造成的,而ash和awr分析报告是我们解决的最有效的途径。



沪ICP备19023445号-2号
友情链接