长版本:我有一个大表,我想执行如下查询:
-- original
select IH.* from ITEM_HISTORY IH
join ITEM_PACKAGE IP on IP.PACKAGE_NAME = IH.PACKAGE_NAME
where IP.OPERATOR_ID = ?
and (
IH.OPERATION != 'CHANGE_OWNER' OR IH.EVENT_DATE = IH.INSTALLATION_DATE
) and IH.EXTERNAL_SERVICE_ACTION != 'NOT_APPLICABLE'
and IH.EVENT_DATE >= ? and IH.EVENT_DATE < ?
and ROWNUM <= 500000
order by IH.EVENT_DATE
这是JPA的< code>@NamedNativeQuery中定义的本地查询。每行代表一个项目发生的变更事件。有很多操作员可以修改项目,所以< code>ITEM_HISTORY表是一个巨大的表,经常给我们带来麻烦。它包含数百万条记录,并且经常超时。
最近,我们发生了一个事件,当生产 Pod 运行此查询时,Oracle 突然将执行计划更改为错误的执行计划,走一条不太优化的“路线”,并减慢了 Pod 的速度,最终导致应用程序无响应。我们必须重新启动 pod 才能使其恢复正常。客户不满意,DB团队只是将执行计划固定为通常的,更好的计划。但是他们问我们,作为DEV,在应用程序端可以做什么。
乍一看,我想:啊,这是错误的,因为在Oracle数据库中,我们应该使用“内联视图”或获取前X行
,因为这样,Oracle数据库就知道如何使用停止排序或窗口排序推送排名来优化它,所以,这是一个简单的!(我从这里和这里学到了这些)
所以我改成了:
-- version 1
select * from (
select IH.* from ITEM_HISTORY IH
join ITEM_PACKAGE IP on IP.PACKAGE_NAME = IH.PACKAGE_NAME
where IP.OPERATOR_ID = ?
and (
IH.OPERATION != 'CHANGE_OWNER' OR IH.EVENT_DATE = IH.INSTALLATION_DATE
) and IH.EXTERNAL_SERVICE_ACTION != 'NOT_APPLICABLE'
and IH.EVENT_DATE >= ? and IH.EVENT_DATE < ?
order by IH.EVENT_DATE
) where ROWNUM <= 500000
还有,这个:
-- version 2
select IH.* from ITEM_HISTORY IH
join ITEM_PACKAGE IP on IP.PACKAGE_NAME = IH.PACKAGE_NAME
where IP.OPERATOR_ID = ?
and (
IH.OPERATION != 'CHANGE_OWNER' OR IH.EVENT_DATE = IH.INSTALLATION_DATE
) and IH.EXTERNAL_SERVICE_ACTION != 'NOT_APPLICABLE'
and IH.EVENT_DATE >= ? and IH.EVENT_DATE < ?
and ROWNUM <= 500000
order by IH.EVENT_DATE
fetch first 500000 rows only;
但是,我没有发现太多的性能改进。我看到版本1比原来的版本更慢,版本2更快,但是执行计划显示了相同的成本。(测试是在staging env中完成的,其中范围过滤器将获取40万行)
-- original 21789 ms / 34598 ms
explain plan for
select * from ITEM_HISTORY IH
join PACKAGE P on P.PACKAGE_NAME = IH.PACKAGE_NAME
where OPERATOR_ID = '88000001' and (IH.OPERATION != 'CHANGE_OWNER' OR IH.EVENT_DATE = IH.INSTALLATION_DATE)
and IH.EXTERNAL_SERVICE_ACTION != 'NOT_APPLICABLE'
and IH.EVENT_DATE >= TO_DATE('2018/07/01', 'yyyy/mm/dd') and IH.EVENT_DATE < TO_DATE('2020/05/01', 'yyyy/mm/dd')
and rownum < 500000
order by IH.EVENT_DATE;
SELECT * FROM TABLE(DBMS_XPLAN.DIP.AY(NULL));
-- Plan hash.value: 1529757427
--
-- ----------------------------------------------------------------------------------------------------------------------------
-- | Id | Operation | Name | Rows | Bytes |TempP.| Cost (%CPU)| Time | Pstart| Pstop |
-- ----------------------------------------------------------------------------------------------------------------------------
-- | 0 | SELECT STATEMENT | | 66280 | 29M| | 133K (1)| 00:00:06 | | |
-- | 1 | SORT ORDER BY | | 66280 | 29M| 34M| 133K (1)| 00:00:06 | | |
-- |* 2 | COUNT STOPKEY | | | | | | | | |
-- |* 3 | hash.JOIN | | 66280 | 29M| | 126K (1)| 00:00:05 | | |
-- |* 4 | TABLE ACCESS FULL | PACKAGE | 545 | 120K| | 25 (0)| 00:00:01 | | |
-- | 5 | PARTITION RANGE ITERATOR| | 287K| 64M| | 126K (1)| 00:00:05 | 44 | 65 |
-- |* 6 | TABLE ACCESS FULL | ITEM_HISTORY | 287K| 64M| | 126K (1)| 00:00:05 | 44 | 65 |
-- ----------------------------------------------------------------------------------------------------------------------------
--
-- Predicate Information (identified by operation id):
-- ---------------------------------------------------
--
-- 2 - filter(ROWNUM<500000)
-- 3 - access("P"."PACKAGE_NAME"="IH"."PACKAGE_NAME")
-- 4 - filter("P"."OPERATOR_ID"='88000001')
-- 6 - filter("IH"."EXTERNAL_SERVICE_ACTION"<>'NOT_APPLICABLE' AND ("IH"."OPERATION"<>'CHANGE_OWNER' OR
-- "IH"."EVENT_DATE"="IH"."INSTALLATION_DATE"))
--
-- Note
-- -----
-- - this is an adaptive plan
-- final query(new) 33342 ms / 26423 ms
select * from (
select * from ITEM_HISTORY IH
join PACKAGE P on P.PACKAGE_NAME = IH.PACKAGE_NAME
where OPERATOR_ID = '88000001' and (IH.OPERATION != 'CHANGE_OWNER' OR IH.EVENT_DATE = IH.INSTALLATION_DATE)
and IH.EXTERNAL_SERVICE_ACTION != 'NOT_APPLICABLE'
and IH.EVENT_DATE >= TO_DATE('2018/07/01', 'yyyy/mm/dd') and IH.EVENT_DATE < TO_DATE('2020/05/01', 'yyyy/mm/dd')
order by IH.EVENT_DATE
) where rownum < 500000;
SELECT * FROM TABLE(DBMS_XPLAN.DIP.AY(NULL));
-- Plan hash.value: 3376840570
--
-- -----------------------------------------------------------------------------------------------------------------------------
-- | Id | Operation | Name | Rows | Bytes |TempP.| Cost (%CPU)| Time | Pstart| Pstop |
-- -----------------------------------------------------------------------------------------------------------------------------
-- | 0 | SELECT STATEMENT | | 66280 | 412M| | 133K (1)| 00:00:06 | | |
-- |* 1 | COUNT STOPKEY | | | | | | | | |
-- | 2 | VIEW | | 66280 | 412M| | 133K (1)| 00:00:06 | | |
-- |* 3 | SORT ORDER BY STOPKEY | | 66280 | 29M| 34M| 133K (1)| 00:00:06 | | |
-- |* 4 | hash.JOIN | | 66280 | 29M| | 126K (1)| 00:00:05 | | |
-- |* 5 | TABLE ACCESS FULL | PACKAGE | 545 | 120K| | 25 (0)| 00:00:01 | | |
-- | 6 | PARTITION RANGE ITERATOR| | 287K| 64M| | 126K (1)| 00:00:05 | 44 | 65 |
-- |* 7 | TABLE ACCESS FULL | ITEM_HISTORY | 287K| 64M| | 126K (1)| 00:00:05 | 44 | 65 |
-- -----------------------------------------------------------------------------------------------------------------------------
--
-- Predicate Information (identified by operation id):
-- ---------------------------------------------------
--
-- 1 - filter(ROWNUM<500000)
-- 3 - filter(ROWNUM<500000)
-- 4 - access("P"."PACKAGE_NAME"="IH"."PACKAGE_NAME")
-- 5 - filter("P"."OPERATOR_ID"='88000001')
-- 7 - filter("IH"."EXTERNAL_SERVICE_ACTION"<>'NOT_APPLICABLE' AND ("IH"."OPERATION"<>'CHANGE_OWNER' OR
-- "IH"."EVENT_DATE"="IH"."INSTALLATION_DATE"))
-- final query 2(fetch X rows only) 19662 ms / 19437 ms
explain plan for
select * from ITEM_HISTORY IH
join PACKAGE P on P.PACKAGE_NAME = IH.PACKAGE_NAME
where OPERATOR_ID = '88000001' and (IH.OPERATION != 'CHANGE_OWNER' OR IH.EVENT_DATE = IH.INSTALLATION_DATE)
and IH.EXTERNAL_SERVICE_ACTION != 'NOT_APPLICABLE'
and IH.EVENT_DATE >= TO_DATE('2018/07/01', 'yyyy/mm/dd') and IH.EVENT_DATE < TO_DATE('2020/05/01', 'yyyy/mm/dd')
order by IH.EVENT_DATE
fetch first 500000 rows only;
SELECT *
FROM TABLE(DBMS_XPLAN.DIP.AY(NULL));
--Plan hash.value: 3207167953
--
------------------------------------------------------------------------------------------------------------------------------
--| Id | Operation | Name | Rows | Bytes |TempP.| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
--| 0 | SELECT STATEMENT | | 500K| 3120M| | 133K (1)| 00:00:06 | | |
--|* 1 | VIEW | | 500K| 3120M| | 133K (1)| 00:00:06 | | |
--|* 2 | WINDOW SORT PUIH.D RANK | | 66280 | 29M| 34M| 133K (1)| 00:00:06 | | |
--|* 3 | hash.JOIN | | 66280 | 29M| | 126K (1)| 00:00:05 | | |
--|* 4 | TABLE ACCESS FULL | PACKAGE | 545 | 120K| | 25 (0)| 00:00:01 | | |
--| 5 | PARTITION RANGE ITERATOR| | 287K| 64M| | 126K (1)| 00:00:05 | 44 | 65 |
--|* 6 | TABLE ACCESS FULL | ITEM_HISTORY | 287K| 64M| | 126K (1)| 00:00:05 | 44 | 65 |
------------------------------------------------------------------------------------------------------------------------------
--
--Predicate Information (identified by operation id):
-----------------------------------------------------
--
-- 1 - filter("from$_subquery$_004"."rowlimit_$$_rownumber"<=500000)
-- 2 - filter(ROW_NUMBER() OVER ( ORDER BY "IH"."EVENT_DATE")<=500000)
-- 3 - access("P"."PACKAGE_NAME"="IH"."PACKAGE_NAME")
-- 4 - filter("P"."OPERATOR_ID"='88000001')
-- 6 - filter("IH"."EXTERNAL_SERVICE_ACTION"<>'NOT_APPLICABLE' AND ("IH"."OPERATION"<>'CHANGE_OWNER' OR
-- "IH"."EVENT_DATE"="IH"."INSTALLATION_DATE"))
所以,问题是:
编辑:我设法在事件中使用了执行计划。通常使用第一个和第二个。坏的是第三个。时间戳1和2是事件发生前几天。时间戳3是事件时间。
根据评论部分的要求,让我总结一下:
sql_id
并行运行,那么很明显要么运行该语句,要么生成 SQL 配置文件作为建议,以使用并行查询改进执行。一个正常的场景是所谓的bind peeking
问题。我不打算解释它,你在Internet上有很多关于它的文章。修复它的一种方法是创建一个sql计划基线
。
如何创建 SQL 计划基线?
从缓存
步骤1
手动计划加载可以与自动计划捕获结合使用,也可以作为自动计划捕获的替代方法。装入操作是使用 DBMS_SPM
包执行的,该包允许从 SQL 调整集或游标高速缓存中的特定 SQL 语句装入 SQL 计划基线。默认情况下,手动加载的语句标记为已接受。如果 SQL 语句存在 SQL 计划基线,则会将该计划添加到基线中,否则将创建新的基线。
DECLARE
l_plans_loaded PLS_INTEGER;
BEGIN
l_plans_loaded := DBMS_SPM.load_plans_from_sqlset(
sqlset_name => 'my_sqlset');
END;
/
DECLARE
l_plans_loaded PLS_INTEGER;
BEGIN
l_plans_loaded := DBMS_SPM.load_plans_from_cursor_cache(
sql_id => 'yoursqlid', plan_hash_value => 'yourplanhasvalue');
END;
/
LOAD_PLANS_FROM_SQLSET
和LOAD_PLANS_FROM_CURSOR_CACHE
函数的返回值表示函数调用加载的计划数。
SELECT sql_handle, plan_name, enabled, accepted FROM dba_sql_plan_baselines
验证是否接受并启用该计划。
如果sql_id
和plan_hash_value
不再在内存中,则需要使用 AWR 创建基线。如果是这种情况,这里有一个非常好的逐步指南
使用 AWR 的 SQL 基线