I had been going pity busy in recent time with several interviews scheduled. Every interview i took, Otherwise different in their questions had a common Key question, which was asked over and over again. Before long i decided to jot it down and try to illustrate with some use cases my approach on slow running queries. - "What are the things you would check should a query that has a long history of good performance start experiencing extremely long run times. " Whenever we are going tough with our queries we go and check the explain plan most of us do so, we don't bother to check the exact execution path taken by a query. once we open up the plan we thump our brain to find the possible reasons of the dreadful performance. In this post i have tried to illustrate the causes of the long lasting queries with use cases. I tried to show the affect of data skew, multiple single column predicate, index column wrapped with function in a predicate and some other factors on query performance. I did try to figure out the slowness with examples.
"What are the things you would check should a query that has a long history of good performance start experiencing extremely long run times. "
The major factors causing a performance degradation to a query from a developer point of view could be:
DATA SKEW
I am not sure about the definition which you ofcourse, you can Google but lets try understanding with use cases. Lets say i have a table ALL_CUSTOMERS it has some data say 10000 records now all of a sudden some 20k records has been added to a table. The histogram on these column are not available so optimizer can mislead us with the cardanility and performance will be impacted this could be a member of stale data. Any query using this table would be slowed down - lets see
create table ALL_CUSTOMERS
(
id NUMBER not null,
first_name VARCHAR2(100) not null,
last_name VARCHAR2(100) not null,
description VARCHAR2(100)
)
create index X_CUSTOMER_ID_NAME on ALL_CUSTOMERS (ID, FIRST_NAME);
----------------------------------------------------------------------
--Say i already have 10k records in the table and 20K records we will --add now
---------------------------------------------------------------------
BEGIN
FOR i in 1..20000 LOOP
INSERT/*+APPEND*/ INTO ALL_CUSTOMERS
(ID,
FIRST_NAME,
LAST_NAME,
DESCRIPTION)
VALUES(customersequence.nextval,
dbms_random.string('A',10),
dbms_random.string('A',15),
dbms_random.string('A',20));
END LOOP;
END;
If we now execute our query that involve ALL_CUSTOMERS table it would run SLOW. Why not check its execution plan.
EXPLAIN PLAN FOR
SELECT COUNT(*) FROM ALL_CUSTOMERS;
Explained
SQL> SELECT * FROM table(DBMS_XPLAN.DISPLAY (FORMAT=>'ALL +OUTLINE'));
--------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2532418596
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 (0)| 00:00:
| 1 | SORT AGGREGATE | | 1 | |
| 2 | INDEX FAST FULL SCAN| X_CUSTOMER_ID_NAME | 10000 | 11 (0)| 00:00:
--------------------------------------------------------------------------------
The above execution Plan shows only 10k records but we have 30k records now, So we can say optimizer does not have all information and can lead to a sketchy plan degrading the performance. We can gather the statistics at this point if we find our explain plan behaving different to our assumption.
BEGIN
dbms_stats.Gather_table_stats('HR', 'ALL_CUSTOMERS',
method_opt => 'for all columns size skewonly');
END;
CHECK THE EXECUTION PLAN NOW for the same query select count(*) from ALL_CUSTOMERS
-------------------------------------------------------------
We can see now the optimizer has all the information and will perform well.
SELECT *
FROM table(DBMS_XPLAN.DISPLAY (FORMAT=>'ALL +OUTLINE'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2532418596
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 29 (0)| 00:00:
| 1 | SORT AGGREGATE | | 1 | |
| 2 | INDEX FAST FULL SCAN| X_CUSTOMER_ID_NAME | 30000 | 29 (0)| 00:00:
--------------------------------------------------------------------------------
There could be other factors also causing performance issues in Oracle lets see another with use cases of course.
Multiple Single column Predicate
We hardly bother about performance while using multiple column predicate in our query. Could this impact performance - its likely to have an impact but again its all on the data. Let's take an example with Customers table.
Why not go ahead and try it out.
Please find the script of table creation and insertion attached.
Now when you have the table in your database try running a simple query just to check the number of records on a particular condition
select count(*)
from customers_obe
where country_id = 'US' and cust_state_province = 'CA';
This is just to check the number of records which is 29
-------------------------
As we don't want the optimizer to do a dynamic sampling(we can talk about it later), let us do a gather statistics again but i will do it with a skew option so that optimizer gets all the information.
BEGIN
dbms_stats.Gather_table_stats('HR', 'CUSTOMERS_OBE',
method_opt => 'for all columns size skewonly');
END;
With this lets now try with the execution plan. Now we expect the execution plan to hold all the information.
Explain Plan For
select count(*) from customers_obe where country_id = 'US' and cust_state_province = 'CA';
Explained
SELECT *
FROM table(DBMS_XPLAN.DISPLAY (FORMAT=>'ALL +OUTLINE'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3488102334
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 15 | 5 (0)| 00:00:
| 1 | SORT AGGREGATE | | 1 | 15 | |
|* 2 | TABLE ACCESS FULL| CUSTOMERS_OBE | 8 | 120 | 5 (0)| 00:00:
Lol its even now showing just 8 rows: This at times could be misleading oracle does not know the number of rows california region could have.
lets see the number of rows in every region in our table
select count(cust_id), cust_state_province from customers_obe group by cust_state_province order by 1 desc ;
--------------------------------------------
We could see just 18 out of 120 records in the distinct region wise count in double digit with only max of 30 rows for Nordrhein-Westfalen, If we analyze more we find that there are 78 region out of 120 which has count not more than 4. With all these details Oracle optimizer has mislead and even after the histograms on the column it provides a skethy plan which can potentially degrade the performance.
For such type of real world scenarios oracle has introduced Extended statistics. lets try implementing the extended stats in here.
select dbms_stats.create_extended_stats(null,'customers_obe',
'(country_id, cust_state_province)')
from dual;
BEGIN
dbms_stats.Gather_table_stats('HR', 'CUSTOMERS_OBE',
method_opt => 'for all columns size skewonly');
END;
Now as the extended stats are present we anticipate a better performance of the query which we ran above.
Plan Hash Value : 3488102334
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 15 | 5 | 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 15 | | |
| * 2 | TABLE ACCESS FULL | CUSTOMERS_OBE | 29 | 435 | 5 | 00:00:01 |
-------------------------------------------------------------------------------
We can see the number of records is now matching the one which we executed at the start of this section. Another thing did you notice a Full acess of Table here when i started my career i always believed FTS are not as good compared to an index scan but its not true CBO has come up with the best plan for this query. We can talk about the FTS better than index scan at times later but cut it short i would say : Whenever you want to go for a fair amount of records say more than 5-10% from a table full table sacan can perform better in this example we are accessing allmost 30% of the total records so FTS is better here. Why FTS is better in this case is because for generating one record in index scan oracle reads one block at a time like it would read first block then would read second and so one but for FTS it would read many blocks at a time there by making the i0 less.
Function wrapped column in predicate
Lets try this on the same table customers_obe.
Let me issue a simple command on indexed column wrapped with a function.
select count(*)
from customers_obe
where lower(country_id) = 'us';
SELECT *
FROM table(DBMS_XPLAN.DISPLAY (FORMAT=>'ALL +OUTLINE'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 4290551856
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 2 (0)| 00:00
| 1 | SORT AGGREGATE | | 1 | 3 | |
|* 2 | INDEX FAST FULL SCAN| IDX_CTRY_ID | 6 | 18 | 2 (0)| 00:00
--------------------------------------------------------------------------------
This again misleads we were expecting the plan to show 165 instead it shows just 6
Lets try and fix this one as we did for last two scenarios
BEGIN
dbms_stats.gather_table_stats('HR','customers_obe',
method_opt =>
'for all columns size skewonly for columns (lower(country_id))');
END;
Check the execution path again.
SELECT *
2 FROM table(DBMS_XPLAN.DISPLAY (FORMAT=>'ALL +OUTLINE'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 4290551856
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 2 (0)| 00:00
| 1 | SORT AGGREGATE | | 1 | 3 | |
|* 2 | INDEX FAST FULL SCAN| IDX_CTRY_ID | 165 | 495 | 2 (0)| 00:00
We can see the plan has now the number of rows which we expected from it initially.
There could be many reasons of a query down on performance. I tried to list some of it nay only few of it but everything more or less depends on data we have another reason for slow performance of a query could be resource used by a query being acquired by another transaction for long.
#Performancetuning # Oracledevelopment #PLSQL #Oracle #Oracledatabase #Slowquery
Comments