ASH结合AWR实战解决oracle高负载
线上oracle数据库负载比较高,所以需要优化,一般进去看负载高的进程,如果是oracle进程,那么就是oracle运行不健康出异常了,我们通常可以用ash来分析问题,如果ash还无法解决,我们就需要扩大分析范围,需要采用awr分析日志来进行分析。
1、ASH分析日志分析Events
Top SQL with Top Events
SQL ID | Planhash | Sampled # of Executions | % Activity | Event | % Event | Top Row Source | % RwSrc | SQL Text |
2047223008 | 5066 | 11.18 | latch: cache buffers chains | 5.88 | TABLE ACCESS - FULL | 2.34 | select * from (select a.bis_pr... | |
|
|
|
| CPU + Wait for CPU | 5.24 | TABLE ACCESS - FULL | 2.35 | |
2939541073 | 1013 | 2.23 | latch: cache buffers chains | 1.27 | TABLE ACCESS - FULL | 1.00 | select round(sum(t.current_fac... | |
2939541073 | 960 | 2.12 | latch: cache buffers chains | 1.21 | TABLE ACCESS - FULL | 0.96 | select round(sum(t.current_fac... | |
2939541073 | 847 | 1.87 | latch: cache buffers chains | 1.03 | TABLE ACCESS - FULL | 0.78 | select round(sum(t.current_fac... | |
2939541073 | 776 | 1.71 | latch: cache buffers chains | 1.02 | TABLE ACCESS - FULL | 0.83 | select round(sum(t.current_fac... |
Top SQL with Top Row Sources
SQL ID | PlanHash | Sampled # of Executions | % Activity | Row Source | % RwSrc | Top Event | % Event | SQL Text |
2047223008 | 5066 | 11.18 | TABLE ACCESS - FULL | 4.70 | CPU + Wait for CPU | 2.35 | select * from (select a.bis_pr... | |
|
|
|
| TABLE ACCESS - FULL | 3.90 | latch: cache buffers chains | 2.30 | |
|
|
|
| TABLE ACCESS - FULL | 1.13 | latch: cache buffers chains | 0.66 | |
2939541073 | 1013 | 2.23 | TABLE ACCESS - FULL | 1.73 | latch: cache buffers chains | 1.00 | select round(sum(t.current_fac... | |
2939541073 | 960 | 2.12 | TABLE ACCESS - FULL | 1.59 | latch: cache buffers chains | 0.96 | select round(sum(t.current_fac... | |
2939541073 | 847 | 1.87 | TABLE ACCESS - FULL | 1.39 | latch: cache buffers chains | 0.78 | select round(sum(t.current_fac... | |
2939541073 | 776 | 1.71 | TABLE ACCESS - FULL | 1.34 | latch: cache buffers chains | 0.83 | select round(sum(t.current_fac... |
从这里分析,可以看到2xg4n5sj1sx5h的SQL引发了性能瓶颈,因为短短的20分钟内执行了5066次,而且诱发了latch: cache buffers chains以及TABLE ACCESS – FULL全表扫描,所以针对这个sql,分析后,建立优化索引:
createindex IDX_PROJECT_ID_BRRQon BIS_REPORT_RENT_QUICK(BIS_PROJECT_ID, SEQUNCE_NO, YEAR, MONTH);
之后继续观察cpu负载,已经降低了一半,但是还不够,还没有达到正常值,而看ash报告已经不能提供帮助了。接下来我们需要更加全面的awr报告。
2、Awr看以下个主要方向
(1)SQL ordered by Sharable Memory
(2)SQL ordered by CPU Tim
3、先看SQL ordered by Sharable Memory
Sharable Mem (b) | Executions | % Total | SQL Id | SQL Module | SQL Text |
5,890,119 | 19 | 0.04 |
| select * from ( select * from ... | |
4,980,367 | 3 | 0.04 |
| select * from ( select * from ... | |
3,816,036 | 8 | 0.03 |
| update POWERDESK.cont_ledger s... | |
2,966,255 | 21 | 0.02 |
| select * from ( select * from ... | |
2,694,091 | 12 | 0.02 |
| select * from ( select * from ... | |
2,316,487 | 21 | 0.02 |
| select * from ( select * from ... | |
2,003,767 | 18 | 0.01 |
| select * from ( select * from ... | |
2,003,743 | 6 | 0.01 |
| select * from ( select * from ... | |
1,979,443 | 12 | 0.01 |
| select * from ( select * from ... | |
1,626,167 | 18 | 0.01 |
| select * from ( select * from ... | |
1,354,011 | 9 | 0.01 |
| select * from ( select * from ... | |
1,325,857 | 1,827 | 0.01 |
| select count(*) from ( select ... | |
1,303,125 | 192 | 0.01 |
| select count(*) from (select r... | |
1,277,136 | 11 | 0.01 |
| select sum(rent_square) from( ... | |
1,147,400 | 4 | 0.01 |
| select count(*) from ( select ... | |
1,135,232 | 4 | 0.01 |
| select count(*) from ( select ... | |
1,126,672 | 1 | 0.01 |
| select sum(rent_square) from( ... | |
1,064,607 | 3,723 | 0.01 |
| select * from ( select jbpmtas... |
看到有fm2hxa2nb9gzx这条sql记录,执行次数比较多,有优化的空间:
SELECT COUNT(*) FROM (SELECT res.*, 1 rec_status FROM Res_Approve_Info res WHEREEXISTS (SELECT 1 FROM Res_Approve_User u WHERE u.res_Approve_Info_Id = res.res_Approve_Info_Id AND u.user_Cd = :1)
UNION SELECT res.*, 9 rec_status FROM Res_Approve_Info res WHEREEXISTS (SELECT 1 FROM Res_Approve_User u, Res_Accredit_Info a WHERE u.res_Approve_Info_Id = res.res_Approve_Info_Id AND U.USER_CD = A.USER_CD AND A.ACC_USER_CD = :2)) res WHERE1 = 1 AND res.status_cd = '1' |
解决办法,添加索引:
createindex IXU_RES_USER_CD_2 on RES_APPROVE_USER(USER_CD);
4、再看SQL ordered by CPU Time
CPU Time (s) | Executions | CPU per Exec (s) | %Total | Elapsed Time (s) | %CPU | %IO | SQL Id | SQL Module | SQL Text |
146.67 | 11,055 | 0.01 | 1.24 | 192.16 | 76.33 | 0.00 |
| select t1.module_cd, count(t1.... | |
140.39 | 65 | 2.16 | 1.18 | 300.45 | 46.73 | 0.00 |
| select nvl(sum(nvl(sales_money... | |
133.99 | 63 | 2.13 | 1.13 | 365.39 | 36.67 | 0.00 |
| select nvl(sum(nvl(sales_money... | |
130.97 | 65 | 2.01 | 1.10 | 286.23 | 45.75 | 0.00 |
| select nvl(sum(nvl(sales_money... | |
127.93 | 4,215 | 0.03 | 1.08 | 157.88 | 81.03 | 0.00 |
| select * from ( select this_.d... | |
124.72 | 63 | 1.98 | 1.05 | 317.81 | 39.24 | 0.00 |
| select nvl(sum(nvl(sales_money... | |
112.56 | 347 | 0.32 | 0.95 | 444.87 | 25.30 | 0.00 |
| select sum(rent_square) from( ... | |
110.81 | 19,013 | 0.01 | 0.93 | 156.71 | 70.71 | 0.00 |
| select * from ( select kmpush0... | |
103.00 | 19,013 | 0.01 | 0.87 | 136.82 | 75.28 | 0.00 |
| select count(*) as col_0_0_ fr... | |
97.71 | 7,400 | 0.01 | 0.82 | 130.32 | 74.98 | 0.00 |
| select (case when t1.module_cd... |
依据以上awr统计分析cpu time(s)得出的结论,点击进去,找出sql记录,并给出优化方案如下:
属于sql写法不规范的优化:
4198t22zum3dr | select t1.module_cd, count(t1.jbpm_task_id) num from jbpm_task t1, jbpm_task_candidate t2 where t1.jbpm_task_id=t2.jbpm_task_id and (t2.user_cd=:1 or t2.user_cd like :2 ) group by t1.module_cd order by decode(t1.module_cd, 'mesMeetingInfo', '0', 'resApprove', '1', 'planTarget', '2') asc | 优化建议: 将order by后面的decode(t1.module_cd, 'mesMeetingInfo', '0', 'resApprove', '1', 'planTarget', '2') asc移到from前面,在外层做order by操作 |
| ||
bhnjxfa4av3kw | select nvl(sum(nvl(sales_money, 0)), 0) from bis_sales_day where to_char(sales_date, 'yyyy-MM')='2016-06' and bis_cont_id in (select bis_cont_id from bis_cont bc where cont_type_cd in ('1', '2') and ((bc.status_cd = '2' and '2016-06' >= to_char(bc.cont_start_date, 'yyyy-MM') and to_char(bc.cont_to_fail_date, 'yyyy-MM')>='2016-06') or (bc.status_cd in ('1', '3') and '2016-06' >= to_char(bc.cont_start_date, 'yyyy-MM') and to_char(bc.cont_end_date, 'yyyy-MM')>='2016-06')) and bc.bis_project_id in (select bis_project_id from bis_project where is_business_project ='1' ) and bc.store_type='1' ) | 优化建议: 将where后面 in的子查询判断字句变成 exists判断。 |
15uv0hbq9dmm7 | select nvl(sum(nvl(sales_money, 0)), 0) from bis_sales_day where to_char(sales_date, 'yyyy-MM')='2016-07' and bis_cont_id in (select bis_cont_id from bis_cont bc where cont_type_cd in ('1', '2') and ((bc.status_cd = '2' and '2016-07' >= to_char(bc.cont_start_date, 'yyyy-MM') and to_char(bc.cont_to_fail_date, 'yyyy-MM')>='2016-07') or (bc.status_cd in ('1', '3') and '2016-07' >= to_char(bc.cont_start_date, 'yyyy-MM') and to_char(bc.cont_end_date, 'yyyy-MM')>='2016-07')) and bc.bis_project_id in (select bis_project_id from bis_project where is_business_project ='1' ) and bc.store_type in ('1', '2') ) | 优化建议: 将where后面 in的子查询判断字句变成 exists判断。 |
bhnjxfa4av3kw | select nvl(sum(nvl(sales_money, 0)), 0) from bis_sales_day where to_char(sales_date, 'yyyy-MM')='2016-06' and bis_cont_id in (select bis_cont_id from bis_cont bc where cont_type_cd in ('1', '2') and ((bc.status_cd = '2' and '2016-06' >= to_char(bc.cont_start_date, 'yyyy-MM') and to_char(bc.cont_to_fail_date, 'yyyy-MM')>='2016-06') or (bc.status_cd in ('1', '3') and '2016-06' >= to_char(bc.cont_start_date, 'yyyy-MM') and to_char(bc.cont_end_date, 'yyyy-MM')>='2016-06')) and bc.bis_project_id in (select bis_project_id from bis_project where is_business_project ='1' ) and bc.store_type='1' ) | 优化建议: 将where后面 in的子查询判断字句变成 exists判断。 |
|
属于sql中添加索引的优化:
aajc7r9y7z1qs | select * from ( select this_.dly_note_id as dly1_396_0_, this_.created_center_cd as created2_396_0_, this_.created_date as created3_396_0_, this_.created_dept_cd as created4_396_0_, this_.created_position_cd as created5_396_0_, this_.creator as creator396_0_, this_.note_biz_type_cd as note7_396_0_, this_.note_content as note8_396_0_, this_.note_title as note9_396_0_, this_.record_version as record10_396_0_, this_.remark as remark396_0_, this_.send_mail_flg as send12_396_0_, this_.seq as seq396_0_, this_.tip_end_date as tip14_396_0_, this_.tip_flg as tip15_396_0_, this_.tip_period_cd as tip16_396_0_, this_.tip_start_date as tip17_396_0_, this_.tip_type_cd as tip18_396_0_, this_.updated_center_cd as updated19_396_0_, this_.updated_date as updated20_396_0_, this_.updated_dept_cd as updated21_396_0_, this_.updated_position_cd as updated22_396_0_, this_.updator as updator396_0_ from POWERDESK.dly_note this_ where this_.creator=:1 order by this_.seq asc ) where rownum <= :2 | 优化方案,添加索引:createindex IDX_CREATOR on DLY_NOTE (creator);
|
在添加好IDX_CREATOR、IXU_RES_USER_CD_2 这2个索引后,cpu负载已经再次降低了一半,虽然没有到正常值,但是已经接近最低级别的告警线,暂时数据库脱离危险了,就等其它的几个需要修改sql的优化继续完成后,cpu负载基本就可以恢复到正常值了。
所以,一般oracle服务器负载飙升,90%都是由不规范的sql、性能差的sql造成的,而ash和awr分析报告是我们解决的最有效的途径。