CBO | Mauro Pagano's Blog

May 10, 2017
by Mauro Pagano Leave a comment

Bushy Joins – a closer look

When 12.2 came out most of the (optimizer) focus was around SPD and how to avoid the challenges from 12.1. Still 12.2 introduced several (less acclaimed) optimizations including “Bushy Join” transformation, which is interesting since (I think, corrections welcome) Bushy Join concept isn’t necessarily tied to query transformation in general, especially before 12.2 (some reference about “manual” bushy joins here and here) or in other RDBMS (a manual example on SQL Server here).
Anyway being the CBO way of improving our code query transformations here we go again.

There isn’t much on the internet about Bushy Joins and 12.2 beside this article so I decided to take a closer look. All the tests are from a 12.2 vanilla installation with bushy joins enabled

SQL> @hparam bushy
NAME                              DESCRIPTION                         SESSION_VA
--------------------------------- ----------------------------------- ----------
_optimizer_bushy_cost_factor      cost factor for bushy join          100       
_optimizer_bushy_fact_dim_ratio   bushy join dimension to fact ratio  20        
_optimizer_bushy_fact_min_size    minimumm fact size for bushy join   100000    
_optimizer_bushy_join             enables bushy join                  ON

and the DDL to create the objects are the following

create table f1 as select a.* from dba_objects a, (select 1 from dual connect by rownum <= 2);
create table d1 as select object_id, object_type from dba_objects;
create table f2 as select a.* from dba_objects a, (select 1 from dual connect by rownum <= 2);
create table d2 as select object_id, object_type from dba_objects;

exec dbms_stats.gather_table_stats(user,'F1');
exec dbms_stats.gather_table_stats(user,'D1');
exec dbms_stats.gather_table_stats(user,'D2');
exec dbms_stats.gather_table_stats(user,'F2');

create index f1_idx1 on f1(object_id);
create index f2_idx1 on f2(object_id);
create index f1_idx2 on f1(object_type);
create index f2_idx2 on f2(object_type);

select table_name, num_rows from user_tables where table_name like 'F_' or table_name like 'D_';

TABLE_NAME      NUM_ROWS
------------- ----------
F1                147200
D1                 73601
F2                147204
D2                 73603

The DUAL to duplicate the number of rows in DBA_OBJECTS is just to have more than 100k rows in the two fact tables F1 and F2 (also indexes *IDX1 are never used in my examples but I created them so in the spirit of full disclosure I included them).

select f1.*, f2.* 
  from f1, f2, d1, d2 
 where f1.object_type = f2.object_type 
   and d1.object_type = f1.object_type 
   and f2.object_type = d2.object_type 
   and d1.object_id = 123 
   and d2.object_id = 456;

---------------------------------------------------------------------
| Id |Operation                      |Name             | Rows | Cost|
---------------------------------------------------------------------
|   0|SELECT STATEMENT               |                 |  208K|  414|
|*  1| HASH JOIN                     |                 |  208K|  414|
|   2|  NESTED LOOPS                 |                 | 3132 |  207|
|   3|   NESTED LOOPS                |                 | 3132 |  207|
|*  4|    TABLE ACCESS FULL          |D1               |    1 |   58|
|*  5|    INDEX RANGE SCAN           |F1_IDX2          | 3132 |    9|
|   6|   TABLE ACCESS BY INDEX ROWID |F1               | 3132 |  148|
|   7|  VIEW                         |VW_BUSHY_A9E4AA31| 3132 |  207|
|   8|   NESTED LOOPS                |                 | 3132 |  207|
|   9|    NESTED LOOPS               |                 | 3132 |  207|
|* 10|     TABLE ACCESS FULL         |D2               |    1 |   58|
|* 11|     INDEX RANGE SCAN          |F2_IDX2          | 3132 |    9|
|  12|    TABLE ACCESS BY INDEX ROWID|F2               | 3132 |  148|
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("F1"."OBJECT_TYPE"="ITEM_1")
   4 - filter("D1"."OBJECT_ID"=123)
   5 - access("D1"."OBJECT_TYPE"="F1"."OBJECT_TYPE")
  10 - filter("D2"."OBJECT_ID"=456)
  11 - access("F2"."OBJECT_TYPE"="D2"."OBJECT_TYPE")

From the execution plan D1 and F1 are joined together first and the result is then joined with [the result of the join between D2 and F2].
In this case this is a good idea since the two large fact tables are joined together without any dimension in between, thus the filtering introduced by the two dimensions cannot be applied before a large (and filtered) fact table is joined with another large (and not filtered) fact table. Btw what I just said is a bit incorrect since a merge join cartesian between the two dimensions wouldn’t be a too bad idea in this case (it’s actually what the CBO does once Bushy Joins are disabled).

Let’s take a look under the hood, aka 10053 trace 🙂
As mentioned before Bushy Joins is a CBO transformation, plugged into CBQT framework.
For some more details about the whole transformation stuff you can look here
(Notice I skipped a few lines in the output just to make it shorter, the skipped content was either blank or not important here)
The first interesting bit is the query block is copied (1066), it’s nothing new but it’s worth noticing. Also interesting are lines 1074 and 1075 with the same message and a bit of grammar puzzle.

 1064 BJ: Checking validity for bushy join for query block SEL$1 (#1)
 1066 Registered qb: SEL$1 0x22686940 (COPY SEL$1)
 1071 ****************************************
 1072  Cost-Based Bushy Join
 1073 ****************************************
 1074 BJ: Checking validity of bushy join for query block SEL$1 (#1)
 1075 BJ: Checking validity for bushy join for query block SEL$1 (#1)

In this case a linear search type (details here) is used for the Bushy Joins, starting from the transformation NOT applied. Notice all the four tables are involved at this point (0,0,0,0) and they don’t seem to be “qualified” yet, also this is a bit different than usual since the 4 placeholder here seem to be for tables and not for query blocks (like in other transformations, for example subquery unnesting).
The interesting part is at the end of the costing (physical optimizer) the tables are “identified” as dimensions and facts, including validating stats and structure (the “sructure” typo isn’t mine 😛 ).

 1082 BJ: Starting iteration 1, state space = (0,0,0,0) : (0,0,0,0)
 1083 BJ: Original query
 (physical optimizer here)
 3787 BJ: fact stats valid: F2 [F2]
 3788 BJ: dim stats valid: D2 [D2]
 3789 BJ: dim structure valid: D2 [D2]
 3790 BJ: fact sructure valid: F2 [F2]
 3791 BJ: fact stats valid: F1 [F1]
 3792 BJ: dim stats valid: D1 [D1]
 3793 BJ: dim structure valid: D1 [D1]
 3794 BJ: fact sructure valid: F1 [F1]
 3798 BJ: Updated best state, Cost = 1063.107718

At this point something interesting happens, the CBQT framework (my guess here, easily wrong) starts to focus only on a subset of the objects to consider for the search space / transformation, notice the (4,3).
I couldn’treliably match those 4 and 3 with something in the 10053 (they aren’t query block numbers since there is only one starting query block SEL$1) but an educated guess is 4 is the “substree D1,F1” while 3 is “substree D2,F2”.

 3799 BJ: Starting iteration 2, state space = (4,3) : (0,1)
 3800 Registered qb: SEL$A9E4AA31 0x22425a70 (QUERY BLOCK TABLES CHANGED SEL$1)
 3804   signature (): qb_name=SEL$A9E4AA31 nbfros=3 flg=0
 3805     fro(0): flg=0 objn=83772 hint_alias="D1"@"SEL$1"
 3806     fro(1): flg=0 objn=83771 hint_alias="F1"@"SEL$1"
 3807     fro(2): flg=5 objn=0 hint_alias="VW_BUSHY_A9E4AA31"@"SEL$A9E4AA31"
 3808
 3809 Registered qb: SEL$9F959E4D 0x22420780 (SPLIT/MERGE QUERY BLOCKS SEL$A9E4AA31)
 3813   signature (): qb_name=SEL$9F959E4D nbfros=2 flg=0
 3814     fro(0): flg=0 objn=83774 hint_alias="D2"@"SEL$1"
 3815     fro(1): flg=0 objn=83773 hint_alias="F2"@"SEL$1"
 3816
 3817 Registered qb: SEL$F04E7C56 0x22425a70 (QUERY BLOCK HAS BUSHY JOIN SEL$1; SEL$1; LIST)
 3821   signature (): qb_name=SEL$F04E7C56 nbfros=3 flg=0
 3822     fro(0): flg=0 objn=83772 hint_alias="D1"@"SEL$1"
 3823     fro(1): flg=0 objn=83771 hint_alias="F1"@"SEL$1"
 3824     fro(2): flg=1 objn=0 hint_alias="VW_BUSHY_A9E4AA31"@"SEL$A9E4AA31"

and the transformed SQL becomes (after reintroducind the crappy “select *” I have in my SQL and doing a little formatting), which is nothing surprising if you looked at the previous linked articles.

 3828 SELECT *  
        FROM  (SELECT * 
                 FROM "MPAGANO"."F2" "F2",
                      "MPAGANO"."D2" "D2" 
                WHERE "D2"."OBJECT_ID"=456 
                  AND "F2"."OBJECT_TYPE"="D2"."OBJECT_TYPE") "VW_BUSHY_A9E4AA31",
              "MPAGANO"."F1" "F1",
              "MPAGANO"."D1" "D1" 
       WHERE "F1"."OBJECT_TYPE"="VW_BUSHY_A9E4AA31"."ITEM_1" 
         AND "D1"."OBJECT_TYPE"="F1"."OBJECT_TYPE" 
         AND "D1"."OBJECT_ID"=123

Then the classic approach of any other transformation under CBQT is used (how cool is that? 🙂 ), some more details on how to interpret all this is in the link provided above.

 4617 BJ: Updated best state, Cost = 413.807609
 4618 BJ: Starting iteration 3, state space = (4,3) : (1,0)
 5430 BJ: Not update best state, Cost = 2148.874633
 5431 BJ: Starting iteration 4, state space = (4,3) : (1,1)
 6182 BJ: Not update best state, Cost = 560.862165
 6183 BJ: transformed final query

So Bushy Join transformation will be applied, in details “grouping” together just F2 and D2 (that is (4,3) = (0,1)) since the cost was the lowest, 413.
Just for the sake of completeness here is the transformed SQL (formatted and injected “*” again to shorten it) when both “subtrees” go under Bush Joins

 5458 SELECT * 
        FROM  (SELECT * 
                 FROM "MPAGANO"."F1" "F1",
                      "MPAGANO"."D1" "D1" 
                WHERE "D1"."OBJECT_ID"=123 
                  AND "D1"."OBJECT_TYPE"="F1"."OBJECT_TYPE") "VW_BUSHY_B144F3C9", 
              (SELECT * 
                 FROM "MPAGANO"."F2" "F2",
                      "MPAGANO"."D2" "D2" 
                WHERE "D2"."OBJECT_ID"=456 
                  AND "F2"."OBJECT_TYPE"="D2"."OBJECT_TYPE") "VW_BUSHY_A9E4AA31" 
       WHERE "VW_BUSHY_B144F3C9"."ITEM_1"="VW_BUSHY_A9E4AA31"."ITEM_1"

Few other things worth mentioning below.

From the 10053 it seems Star Transformation is an interleaved transformation for Bushy Joins, even though in my case it was never considered, likely due to the simple nature of my SQL

The transformation can be controlled by the BUSHY_JOIN hint and the syntax is (surprisingly) pretty trivial for simple SQL statement. You can provide within parenthesis the grouping you want Oracle to apply as parameter of the hint. For example in order to force the transformation on both F2,D2 and F1,D1 I can use BUSHY_JOIN((d1 f1) (d2 f2)). Extended and more accurate syntax (including target query block as well as source query block for each table) would be better, but still this is much easier than write an CONCAT hint 🙂

There are several “heuristic” to make Bushy Joins considered, for example the size of the fact table (100k rows according to the parameters above, just guessing here), the number of tables the fact needs to be joined to (another 2 at least), the size ratio between dimensions and fact, etc. I don’t have a complete list but I assume the 10053 would list the reasoning for ignoring Bushy Joins for your SQL (as it did in my case when I looked into this transformation the first time).

There is a lot of guessing in this blog post so if anybody has any correction please let me know and I’ll be happy to make adjustments (and very happy to learn!).
I’m not sure why the feature is disable by default, maybe it’s not super-solid yet, but I think it can have some nice impact on specific cases.

April 29, 2017
by Mauro Pagano 4 Comments

SQLTXPLAIN vs SQLd360, differences and similarities

When talking tools to help with SQL Tuning the question on SQLTXPLAIN vs SQLd360 comes out very often. “What’s the difference?”, “Which one should I use?”, “Why another tool?” are probably the top ones 🙂

I’ll try to make a fair and (as much as possible) unbiased comparison between the two on some general aspects. If you want to know of any specific area just let me know in the comments.

Aspect	SQLTXPLAIN	SQLd360
Installation	Requires installation. Creates two schemas that serve as metadata and code repository, they need to be present every time the tool is executed	No installation, at the end of the execution no evidence is left behind
Source code	Mainly PL/SQL, unwrapped (clear text). Code is Oracle property	SQL scripts, clear text, true open source released under GPL. Code available on GitHub
Main Goal	Automate every aspect of diagnostic collection for SQL tuning	Same as SQLT, but less “obsessive” 🙂
Main Focus	Execution plan generation, why CBO generated plan X,Y,Z (includes custom testcase, very flexible)	Execution plan generation (but a little less than SQLT) and how plans execute at runtime
Main advantage	Includes every bit of info needed to drill into CBO decisions, unlikely the need to go back and collect anything else	Makes diagnostic consumption faster and easier because of data visualization, lots of focus on runtime
Main disadvantage	Requires installation. Not much focus on execution time (“same plan different performance” type of cases harder to deal with)	Might miss some information under rare circumstances

Outside the comparison table, what we use today: SQLd360
Reason(s):

No installation.
Root Cause Analysis of CBO decisions usually belongs to Oracle Support (for the most part), SQL Tuning in the real world focuses on easier aspects of plan generation, thus lower need for strong focus on it.
Performance data available in Oracle grows by release, charting allows quick and way more effective consumption of large amount of such data.

January 14, 2016
by Mauro Pagano Leave a comment

Top Executions SQL Monitoring style reports in SQLd360

I think SQL Monitoring is an amazing tool when it comes to SQL Tuning but I often find that for one reason or another the report is almost never around for post-mortem investigation.
Historical SQL Monitoring reports have been introduced in 12c but still the decision to collect or no the report for the SQL ID we are interested in depends on several factors we have no control on after the issue happened 😦

SQLd360 tried to alleviate this “issue” including ASH-based charts that provided similar information, those have been available for a long time in the Plan Details page, organized by Plan Hash Value (PHV).
The main difference between SQL Monitoring (SM) and SQLd360 is the scope. SM provides info for a single execution while SQLd360 aggregated info from all the executions active at a specific point in time. Info for recent executions are (V$ACTIVE_SESSION_HISTORY) are aggregated by minute while historical executions (DBA_HIST_ACTIVE_SESS_HISTORY) get aggregated by hour.
That section of the SQLd360 looks like this:

Screen Shot 2016-01-14 at 9.46.54 AM.png

Starting SQLd360 v1601 a new set of reports is provided for the Top-N executions per PHV, where a “top execution” is one of those with the highest number of samples.
The goal is to replicate, as close as possible, the SQL Monitoring functionalities using just the ASH data, which tend to be around for much longer than a SQL Monitoring report 🙂

The data is not aggregated so the granularity is 1s for samples from memory (V$) and 10s for samples from ASH (DBA_HIST).
With this level of granularity combined with the non aggregation other types of data visualizations make sense, like i.e. timelines to identify when a specific section of the execution plan was active (the which section is the “bad guy” can be answered from the tree chart), that’s the “Plan Step IDs timeline” in the figure below that will need its own blog post 😀 .
This new section of the report looks like this:

Screen Shot 2016-01-14 at 9.58.46 AM.png

So i.e. for each execution we can see the active sessions (plural in case of PX and not “Average Active Session” since there is no timeframe aggregation) with associated CPU/wait events over time, just like in SQL Monitoring (to be fair SQL Monitoring is able to provide sub-second details, which are not available in ASH).

Screen Shot 2016-01-14 at 10.04.50 AM.png

Hopefully you’ll find this new section useful, specially when ASH is all you got 😀

Final note: the number Top-N executions is configurable in file sql/sqld360_00_config.sql altering the value for sqld360_conf_num_top_execs (default is 3).

As usual feedback, comments, suggestions are all more than welcome!

November 11, 2015
by Mauro Pagano 1 Comment

Histograms come histograms go

Actually the title is likely incorrect, in the sense that I think histograms don’t “go” once they come. Or at least I’ve never seen a case where consistently gathering statistics using method_opt “SIZE AUTO” makes a histogram go away sometime after the same syntax made it show up (while it’s common for a histogram to just show up sometime in the future because of column usage).

Recently I’ve seen several systems where custom statistics gathering procedures were coupled with the automatic stats gathering job (voluntarily or not is another story… 😀 ) triggering all sort of weirdness in the statistics. Reason was a syntax mismatch between the custom job (method_opt “SIZE 1”) and the automatic job (default is method_opt “SIZE AUTO”).

A reliable way to figure out if histograms popped up / disappeared is to look at WRI$_OPTSTAT_HISTGRM_HISTORY, too bad this table is usually large and expensive to access (and not partitioned).
Another table is available though, table WRI$_OPTSTAT_HISTHEAD_HISTORY, which is the historical version of the dictionary table HIST_HEAD$ (1 row per histogram) but while in the dictionary there are BUCKET_CNT and ROW_CNT columns (so it’s easy to spot a histogram) such columna are gone once the info are stored in WRI$. Wouldn’t it be nicer to just have a column in WRI$_OPTSTAT_HISTHEAD_HISTORY saying “histogram was in place” ?

Let’s see if we can figure out a (approximate) way that can save us a trip to WRI$_OPTSTAT_HISTGRM_HISTORY.

Table WRI$_OPTSTAT_HISTHEAD_HISTORY has a column FLAGS that is populated using the following expression in 11.2.0.4 (extracted from a 10046 trace of DBMS_STATS.GATHER_TABLE_STATS)


bitand(h.spare2,7) + 8 + decode(h.cache_cnt,0,0,64)

while it’s tricky and risky to guess what SPARE2 might store it should be pretty easy to play with CACHE_CNT, we just need to gather stats with and without histograms to see how the value changes.

SQL> create table test_flag (n1 number);
SQL> insert into test_flag select mod(rownum,200) n1 from dual connect by rownum <= 10000; SQL> insert into test_flag select * from test_flag;
--repeat it a few times to make the data grow
SQL> commit;
SQL> exec dbms_stats.gather_table_stats('MPAGANO','TEST_FLAG',method_opt=>'FOR ALL COLUMNS SIZE 254');
SQL> select obj#, col#, bucket_cnt, row_cnt, cache_cnt, null_cnt, sample_size from sys.hist_head$ where obj# in (select object_id from dba_objects where object_name = 'TEST_FLAG');

      OBJ#       COL# BUCKET_CNT    ROW_CNT  CACHE_CNT   NULL_CNT SAMPLE_SIZE
---------- ---------- ---------- ---------- ---------- ---------- -----------
     24897          1       5487        200         10          0        5487

SQL> exec dbms_stats.gather_table_stats('MPAGANO','TEST_FLAG',method_opt=>'FOR ALL COLUMNS SIZE 1');
SQL> select obj#, col#, bucket_cnt, row_cnt, cache_cnt, null_cnt, sample_size from sys.hist_head$ where obj# in (select object_id from dba_objects where object_name = 'TEST_FLAG');

      OBJ#       COL# BUCKET_CNT    ROW_CNT  CACHE_CNT   NULL_CNT SAMPLE_SIZE
---------- ---------- ---------- ---------- ---------- ---------- -----------
     24897          1          1          0          0          0     2560000

In the first run we gathered stats with “SIZE AUTO” and a 200 buckets histogram was created, sample size ~=5500 is not surprising and it matches with AUTO_SAMPLE_SIZE used. In this case CACHE_CNT was 10.

In the second run we gathered stats with “SIZE 1” and no histogram was created. In this case CACHE_CNT was 0.

I couldn’t find with 100% accuracy what CACHE_CNT represents (actually if any reader knows it please comment this post) but my educated guess is that’s number of entries to be cached in the row cache; unfortunately this is really just a guess since I didn’t even find a parameter to change this value (again, whoever knows better please let me know).

I ran multiple tests and it seems like CACHE_CNT has a value larger than 0 only when a histogram is present (assuming my guess of what is means is accurate), thus the value of WRI$_OPTSTAT_HISTHEAD_HISTORY.FLAG is bumped by 64 only when a histogram is present.
With this info in mind it’s now pretty easy to spot if a histogram was present in the past just looking at WRI$_OPTSTAT_HISTHEAD_HISTORY.FLAG without having to access the larger WRI$_OPTSTAT_HISTGRM_HISTORY.

SQLd360 v1526 has a new column HAD_HISTOGRAM added to the Column Statistics Version report built exactly on the assumptions made here.

Feedbacks welcome as usual, especially if they can prove wrong what I just said 😀

October 26, 2015
by Mauro Pagano 12 Comments

Introducing Pathfinder, is there a better plan for my SQL?

Pathfinder is a new free tool that provides an easy way to execute a SQL statement under multiple optimizer environments in order to generate different execution plans, potentially discovering better plans. The tool can also be used to quickly identify workarounds for wrong result bugs as well as slow parse issues.

Pathfinder uses the same brute-force approach of SQLT XPLORE, executing the SQL for every single CBO parameter and fix_control present in the database, with no installation required. This make Pathfinder easy to run in any environment, including a production one (assuming you understand *WELL* what the tool does, how it works and what it means for your database).

Each test adds approximately 1 second overhead to the time the SQL takes to complete and the amount of tests considered is pretty high, in 11.2.0.4 it’s around 1100 and around 1500 in 12.1.0.2, thus I suggest to use Pathfinder on SQLs that take at most a few seconds to run (or just be ready to leave Pathfinder run for a loooong time).

The tool executes the SQL statement present in file script.sql (provided), just modify the script and replace the seeded SQL with the one you want to execute. In the same script you can also add ALTER SESSION commands that will be executed before the desired SQL, this is helpful in case you want to influence the analysis providing a different starting point.

To execute the tool just download it from the Download section on the right side of this page (or from here, also the tool will be released as standalone script in the same zip file as SQLd360) and follow these steps:

Unzip pathfinder-master.zip, navigate to the root pathfinder directory, and connect as SYS to the database.
$ unzip pathfinder-master.zip
$ cd pathfinder-master
$ sqlplus / as sysdba
Open file script.sql and add your SQL in there. Make sure to add the mandatory comment /* ^^pathfinder_testid */. The file name must be script.sql, if you wish to change the name then just ping me.
Execute pathfinder.sql and provide the connect string to connect as the user that is supposed to run the script.
Unzip output file pathfinder_<dbname>_<date>.zip into a directory on your PC and review the results starting from file 00001_pathfinder_<dbname>_<date>_index.html


SQL> @pathfinder

Parameter 1:
Full connect string of the database to run the SQL into
If the database is remote or a PDB then you must include
the TNS alias i.e. scott/tiger@orcl

Enter value for 1: mpagano/mpagano@orcl
mpagano/mpagano@orcl
Building Pathfinder driver scripts
Connected.

1) "pathfinder_{ 20151026_1906 (00001)" 19:06:40 BASELINE

2) "pathfinder_{ 20151026_1906 (00002)" 19:06:42 "_add_stale_mv_to_dependency_list" = FALSE

.....
File pathfinder_orcl_20151023_1256.zip created.

For each test Pathfinder will show the setting considered as well as some basic statistics like Plan Hash Value, Elapsed Time, CPU Time, Buffer Gets and Rows processed. Also two links are present, one points to the details of the execution plan generated while the other points to the details of V$SQL.

The main page will look something like this:

Pathfinder also considers the effect of Cardinality Feedback executing the SQL multiple times until the plan stops changing (or CFB gives up after the fifth parse), for all those settings that lead to a first execution plan different than the baseline.
This is why for some settings you will see a Test# with an additional digit, the “reparse” number:

In example for Test# 117 above the CBO generated a different plan (PHV 1837274416) than the baseline (PHV 1838229974) and Cardinality Feedback kicked in 3 times generating a different plan each time, until the 3rd parse when the CBO ended up with the same plan as the first execution (and no different plan was generated after).

This is the first release of the tool so I expect it to be far from perfect but I’ve already used it several times with success. Hopefully with time (and your feedbacks :D) the tool will get better and better.

I hope you enjoy it and please don’t hesitate to get in touch with me for feedbacks, suggestions and bugs!!!

October 23, 2015
by Mauro Pagano 1 Comment

Using SQLd360 – introduction video

This is just a quick introduction video on what SQLd360 is, where to download it (Download section here on the right :D), how to execute it and what’s the output you should expect.

Hope you like it and feedbacks are welcome as usual!

Using SQLd360 from Mauro Pagano on Vimeo.

August 6, 2015
by Mauro Pagano Leave a comment

How to quickly identify if a SQL running in parallel has been downgraded using SQLd360

This is just a quick note on how to leverage a couple reports in order to identify if a PX has been downgraded. Those reports have been around for quite some time so this isn’t about any new feature but rather how to use what’s already there 😉

An execution is downgraded when the number of PX slaves used by the SQL at runtime is lower than the requested number, there are few reasons why it can happen with the most common one being the lack of available PX slaves at the time the SQL starts. Such a downgrade can cause from small to catastrophic performance degradation depending on the severity of the downgrades (measured in %) since the CBO generates an execution plan expecting to have some specific horsepower (aka number of PX slaves) at disposal but then such horsepower is reduced. It’s like planning how long it will take to drive from A to B with a Ferrari and then go to the garage and find out you only have a Jetta 😀

SQLd360 aims at making it easier (and quicker) to identify if an execution was downgraded, let’s see how.

The requested DoP for a SQL is stored in the OTHER_XML column, for each PHV/source SQLd360 provides in the “SQL Performance Summary” the MIN and MAX DoP requested, which is likely to stay stable overtime (exception made for Adaptive Degree Policy or Adaptive Multi User, which aren’t that common anyway).
Here is an example of a SQL with a requested DoP of 4:

Starting from 11gR2 the DoP the SQL executed with can be extracted from column PX_FLAGS in ASH (TRUNC(px_flags / 2097152) as already reported by Randolf here) so for each execution that made it into ASH SQLd360 reports the “execution DoP”.
Here is the output from report “Elapsed Time for Recent Execs”:

Column MAX_PX_DEGREE is the MAX(DoP) this specific execution used (see note below about why MAX).
Column NUM_PROCESSES is the distinct number of ASH sampled sessions that collaborated to this execution (up to 2*DoP).

From the report above it’s easy to spot how the execution from July 30 was indeed executed at the requested DoP (4) while the one from August 6 was downgraded (requested DoP 4, execution DoP 2).

Note: ASH reports the DoP by DFO tree so if the plan had multiple DFO trees with different DoP (uncommon but possible) just keep in mind the number you see in the report is the MAX.

July 29, 2015
by Mauro Pagano 5 Comments

AutoDOP in 12c, what’s new and what’s old

The New Features guide for 12c reports enhancements to AutoDOP but unfortunately there is not much details so let’s try to dig a little to find out which are (some of) those enhancements.

The focus on this blog post is on how the Degree of Parallelism (DoP) is computed for three SQLs in 11.2.0.4 vs 12.1.0.2 (so not considering ADAPTIVE degree policy) reading a single table, joins will come in a separate post.

First requirement to play with AutoDOP is to have IO Calibration statistics in place (UPDATE: the requirement is in 11.2.0.x, in 12c default values are used when IO Calibration are missing, 200MB/s is the default for MAX_PMBPS).
Since all my tests are on VMs on my personal machine the IO response time is very unstable so for the sake of consistent results I’ll manually set the IO Calibration stats (AFAIK this isn’t recommended, also the INSERT requires an instance restart to take effect).
We’ll use 10 as value for PMBPS (Max MBPS during parallel scan).

SQL> desc dba_rsrc_io_calibrate
 Name                  Null?    Type
 --------------------- -------- --------------
 START_TIME                     TIMESTAMP(6)
 END_TIME                       TIMESTAMP(6)
 MAX_IOPS                       NUMBER
 MAX_MBPS                       NUMBER
 MAX_PMBPS                      NUMBER
 LATENCY                        NUMBER
 NUM_PHYSICAL_DISKS             NUMBER

SQL> insert into dba_rsrc_io_calibrate values (systimestamp, systimestamp, 10000, 5, 10, 5, 10);
SQL> commit;

And here are the three tables with 10k, 50k and 16M rows each used in the tests.
Each row is created to account for pretty much 1 block (in a 8k block_size database) in the first two tables so those tables will grow large in size even with just a few rows, the third table is designed to be very small.

SQL> drop table t10k purge;
SQL> drop table t50k purge;
SQL> drop table t16m purge;
SQL> create table t10k (n1 number, c1 char(2000), c2 char(2000), c3 char(2000));
SQL> create table t50k (n1 number, c1 char(2000), c2 char(2000), c3 char(2000));
SQL> create table t16m (n1 number, c1 char(1), c2 char(1), c3 char(1));
SQL> insert into t10k select rownum, 'a','b','c' from dual connect by rownum <= 10000;
SQL> insert into t50k select rownum, 'a','b','c' from dual connect by rownum <= 50000;
SQL> insert into t16m select rownum, 'a', 'b', 'c' from dual connect by rownum <= 250000; 
SQL> insert into t16m select * from t16m;  
SQL> /  
SQL> /  
SQL> /
SQL> /
SQL> /  
SQL> exec dbms_stats.gather_table_stats(user,'T10K');
SQL> exec dbms_stats.gather_table_stats(user,'T50K');
SQL>  exec dbms_stats.gather_table_stats(user,'T16M');
SQL> select table_name, num_rows, blocks from user_tables where table_name like 'T%' order by 2;
TABLE_NAME                       NUM_ROWS     BLOCKS
------------------------------ ---------- ----------
T10K                                10000      10097
T50K                                50000      50897
T16M				 16000000      35472

We’ll use three SQLs to check different scenarios and how they show up in the 10053 trace.
EXPLAIN PLAN has been used just to force a hard parse each time (it won’t have any “negative” side-effect here).

explain plan for select /* Q1 */ * from t10k;   
explain plan for select /* Q2 */ * from t50k;   
explain plan for select /* Q3 */ count(distinct n1) from t16m;

As we already known (nice post here) in 11.2.0.x the DoP under AutoDOP is based on “IO cost” for the operation and the calculation is performed for each table individually at the time the CBO calculates cardinality and best access method for the table (aka you’ll find it under SINGLE TABLE ACCESS PATH section of the 10053).

##### Q1 #####

The CBO decides to execute Q1 in serial in both versions, let’s look into the details.
From 11.2.0.4 trace, the first 4 lines in green are the input values used to perform the math, the next two are about In-Memory Parallel Execution (nothing to do with In-Memory Option) and the last three lines are the most interesting ones.

SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for T10K[T10K] 
  Table: T10K  Alias: T10K
    Card: Original: 10000.000000  Rounded: 10000  Computed: 10000.00  Non Adjusted: 10000.00
  Access Path: TableScan
    Cost:  2737.52  Resp: 2737.52  Degree: 0
      Cost_io: 2736.00  Cost_cpu: 74005180
      Resp_io: 2736.00  Resp_cpu: 74005180
kkeCostToTime: using io calibrate stats 
 maxmbps=5(MB/s) maxpmbps=10(MB/s) 
 block_size=8192 mb_io_count=4 mb_io_size=30247 (bytes) 
 tot_io_size=79(MB) time=7897(ms)
AutoDOP: Consider caching for T10K[T10K](obj#70600) 
cost:2737.52 blkSize:8192 objSize:10097.00 marObjSize:9592.15 bufSize:37240.00 affPercent:80 smallTab:NO affinitized:YES
AutoDOP:parallel operation is set to NO
AutoDOP:parallel operation is set to NO
AutoDOP: Table/Index(#70600) access (scan) cost=2737.52 estTime=7896.70 unit=10000.00 dop=2 -> maxdop=2
  Best:: AccessPath: TableScan
         Cost: 2737.52  Degree: 1  Resp: 2737.52  Card: 10000.00  Bytes: 0

The following is based on my understanding and tests so it could easily be wrong.

The first “AutoDOP: parallel operation is set to NO” refers to the lack of PARALLEL hint in the SQL, if you add the hint then the message becomes YES prefixed by another one with the degree specified in the hint (if any).

The second “AutoDOP: parallel operation is set to NO” is based on the fact this SQL is not “expensive enough” to qualify for parallel execution, basically the E-Time is lower than PARALLEL_MIN_TIME_THRESHOLD.

The last line shows the details of the math. In 11.2.0.4 the DoP is indeed computed based on how large is the segment (tot_io_size=79MB) and how fast we can read from disk (maxpmbps=10MB/s) so 79 (MB) / 10 (MB/s) = (time=) 7897ms. The value is then divided by 10k (unit=, not sure where the value comes from since it doesn’t seem related to PARALLEL_MIN_TIME_THRESHOLD even though it does represent 10s) and the result (0.7) considered as DoP.
DoP 0.7 means this execution will be serial.
The value of dop=2 -> maxdop=2 is a little misleading here (maybe it’s a default in the code), PX is rejected here and the SQL goes serial.
The trace confirms it just few lines below

AUTO DOP PLANS EVALUATION
***************************************
Compilation completed with Dop: 1.
  Cost_io:  2736.00  Cost_cpu: 74005179.68
  Card:     10000.00  Bytes:    60070000.00
  Cost:     2737.52  Est_time:  7897ms
kkopqSetDopReason: Reason why we chose this DOP is: no expensive parallel operation.
No parallel scan operations
kkopqCombineDop: Dop:1 Hint:no 
Query: compute:yes forced:no  computedDop:2 forceDop:0
kkopqSetDopReason: Reason why we chose this DOP is: parallel threshold.
Serial phase is inexpensive (2738), stick to it.
Costing completed. DOP chosen: 1.

Here we can see the DoP chosen was 1 because no operation was expensive enough to warrant for PX as well as the E-Time for the SQL (7.8s) is below the PARALLEL_MIN_TIME_THRESHOLD (10s).

Nothing new so far, let’s look into the 10053 from 12.1.0.2 for the same SQL, Q1 as see how is that different.

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T10K[T10K]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Table: T10K  Alias: T10K
    Card: Original: 10000.000000  Rounded: 10000  Computed: 10000.000000  Non Adjusted: 10000.000000
  Scan IO  Cost (Disk) =   2736.000000
  Scan CPU Cost (Disk) =   74005179.680000
  Total Scan IO  Cost  =   2736.000000 (scan (Disk))
                       =   2736.000000
  Total Scan CPU  Cost =   74005179.680000 (scan (Disk))
                       =   74005179.680000
  Access Path: TableScan
    Cost:  2737.397801  Resp: 2737.397801  Degree: 0
      Cost_io: 2736.000000  Cost_cpu: 74005180
      Resp_io: 2736.000000  Resp_cpu: 74005180
  Best:: AccessPath: TableScan
         Cost: 2737.397801  Degree: 1  Resp: 2737.397801  Card: 10000.000000  Bytes: 0.000000

The first difference is the AutoDOP computation does not happen during the access path decision, that’s why no AutoDOP tag is present in the previous extract.

The calculations are performed at the end of the join selection for the query block (no join in this case and just one query block) and here is where the major changes shows up (steps are marked with different colors to make it easier to reference them):

kkecComputeAPDopCS: CPUtabBytes: 82714624.000000 CPUtabRows: 10000.000000  rowSize: 6007
kkecComputeAPDopCS: Basic Stats Level
kkecComputeDOP: Proc Rate: 1048576000.000000 - Cost: 82714624.000000 - Time: 78.882812
kkecComputeDOP: Proc Rate: 1000000.000000 - Cost: 10000.000000 - Time: 10.000000
kkecComputeAPDopCS: Before Filter Preds - CPU Dop: 0.007888
kkecComputeAPDopCS: Final Filter Predicate Bytes: 0.000000 Rows: 0.000000
kkecComputeAPDop: Final -  CPU Dop: 0.007888 Rate: 1000.000000
AutoDOP: Consider caching for T10K[T10K](obj#138272)
cost:2737.397801 blkSize:8192 objSize:10097.00 marObjSize:9592.15 bufSize:41160.00 affPercent:80 smallTab:NO affinitized:YES
kkeCostToTime: using io calibrate stats maxpmbps=10(MB/s)
 block_size=8192 mb_io_count=4 mb_io_size=30247 (bytes)
 tot_io_size=79(MB) time=7896(ms)
AutoDOP:parallel operation is set to NO
AutoDOP:parallel operation is set to NO
AutoDOP: Table/Index(#138272) access (scan) cost=2737.397801 estTime=7896.34 unit=10000.00 dop=2.000 -> maxdop=2
kkecComputeAPDop: IO Dop: 2.000000 - CPU Dop: 0.007888

The first big difference is the AutoDOP in 12c also includes the CPU DoP for the operations in the query block, including not only those operations that have an IO cost (i.e. a FTS) but also those operations that have just a CPU cost (i.e. a Join).
The way the operations are normalized is via “processing rates”, in 12c there is a new view that provides to the CBO with such numbers, V$OPTIMIZER_PROCESSING_RATE.

SQL> select operation_name, to_number(manual_value) manual, to_number(calibration_value) calibration, to_number(default_value) default_ from v$optimizer_processing_rate order by 1;

OPERATION_NAME                MANUAL CALIBRATION   DEFAULT_
------------------------- ---------- ----------- ----------
AGGR                             560                   1000
ALL                                                     200
CPU                                                     200
CPU_ACCESS                                              200
CPU_AGGR                                                200
CPU_BYTES_PER_SEC                560                   1000
CPU_FILTER                                              200
CPU_GBY                                                 200
CPU_HASH_JOIN                                           200
CPU_IMC_BYTES_PER_SEC                                  2000
CPU_IMC_ROWS_PER_SEC                                2000000
CPU_JOIN                                                200
CPU_NL_JOIN                                             200
CPU_RANDOM_ACCESS                                       200
CPU_ROWS_PER_SEC                                    1000000
CPU_SEQUENTIAL_ACCESS                                   200
CPU_SM_JOIN                                             200
CPU_SORT                                                200
HASH                                                    200
IO                               560                     10
IO_ACCESS                        560                     10
IO_BYTES_PER_SEC                                         10
IO_IMC_ACCESS                                          1000
IO_RANDOM_ACCESS                3000                     10
IO_ROWS_PER_SEC                                     1000000
IO_SEQUENTIAL_ACCESS             560                     10
MEMCMP                                                  500
MEMCPY                           560                   1000

Values can be manually set using DBMS_STATS.SET_PROCESSING_RATE but they are silently ignored until the hidden parameter _OPTIMIZER_PROC_RATE_SOURCE is not set to MANUAL. Also they can be gathered using DBMS_STATS.GATHER_PROCESSING_RATE.
The value for an operation is inherited from its parent in case is missing (the hierarchy is visible in X$OPTIM_CALIB_STATS where each STATID_KKECSTATS has its parent is PSTATID_KKECSTATS).

The first line in green (two sections above) provides info about the table we want to scan: size in bytes, number of rows and avg row lenght.
The lines in orange is where the CPU DoP is computed, my understanding is here the CPU DoP is the greatest value (expressed in ms) between the “size in bytes / CPU_BYTES_PER_SEC (expressed in MB)” and “number of rows / CPU_ROWS_PER_SEC”, so in this case it’s 10 vs 78.8 = 78.8, which is then divided by 10k (unit=) giving a CPU DoP of 0.0078. The values used for the computation are also reported in the OTHER_XML column.
The lines in blue are the same as the 11.2.0.4 one, where the IO DoP is computed (again with the misleading value of 2).
The final line in red is where IO DoP (misleading here) and CPU DoP are combined together.

As for the 11.2.0.4 case, this SQL doesn’t quality for parallel execution because too fast (the second NO in the blue lines).

AUTO DOP PLANS EVALUATION
***************************************
Compilation completed with Dop: 1.
  Cost_io:  2736.000000  Cost_cpu: 74005180
  Card:     10000.000000  Bytes:    60070000.000000
  Cost:     2737.397801  Est_time:  7896ms
kkopqSetDopReason: Reason why we chose this DOP is: no expensive parallel operation.
No parallel scan operations
kkopqCombineDop: Dop:1 Hint:no
Query: compute:yes forced:no  scanDop:2 cpuDop:1 forceDop:0
kkopqSetDopReason: Reason why we chose this DOP is: parallel threshold.
Serial phase is inexpensive (2737.397801), stick to it.
Costing completed. DOP chosen: 1.

##### Q2 #####

Let’s now focus on Q2, which runs on the larger table T50K.
From 11.2.0.4

SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for T50K[T50K] 
  Table: T50K  Alias: T50K
    Card: Original: 50000.000000  Rounded: 50000  Computed: 50000.00  Non Adjusted: 50000.00
  Access Path: TableScan
    Cost:  13793.68  Resp: 13793.68  Degree: 0
      Cost_io: 13786.00  Cost_cpu: 372959932
      Resp_io: 13786.00  Resp_cpu: 372959932
kkeCostToTime: using io calibrate stats 
 maxmbps=5(MB/s) maxpmbps=10(MB/s) 
 block_size=8192 mb_io_count=4 mb_io_size=30247 (bytes) 
 tot_io_size=398(MB) time=39789(ms)
AutoDOP: Consider caching for T50K[T50K](obj#70601) 
cost:13793.68 blkSize:8192 objSize:50897.00 marObjSize:48352.15 bufSize:37240.00 affPercent:80 smallTab:NO affinitized:NO
AutoDOP:parallel operation is set to NO
AutoDOP:parallel operation is set to YES
AutoDOP: Table/Index(#70601) access (scan) cost=13793.68 estTime=39789.45 unit=10000.00 dop=3 -> maxdop=3
  Best:: AccessPath: TableScan
         Cost: 13793.68  Degree: 1  Resp: 13793.68  Card: 50000.00  Bytes: 0

Table this time is around 398MB so 398 / 10 = 39789ms which is larger than 10 seconds (PARALLEL_MIN_TIME_THRESHOLD) hence the SQL is candidate to run using PX, reported in the second “AutoDOP: parallel operation is set to YES” (again this is a guess). The DoP is computed as usual with time/unit that is 3.9 truncated to 3.

AUTO DOP PLANS EVALUATION
***************************************
Compilation completed with Dop: 1.
  Cost_io:  13786.00  Cost_cpu: 372959931.68
  Card:     50000.00  Bytes:    300400000.00
  Cost:     13793.68  Est_time:  39789ms
kkopqCombineDop: Dop:3 Hint:no 
Query: compute:yes forced:no  computedDop:3 forceDop:0
Serial plan is expensive enough to be a candidate for parallelism (13794)
Signal reparse with DOP 3.

the SQL is then re-parsed and the plan with DoP 3 is found as cheaper than the serial one, hence accepted.

AUTO DOP PLANS EVALUATION
***************************************
Compilation completed with Dop: 3.
  Cost_io:  5105.93  Cost_cpu: 3888888.89
  Card:     50000.00  Bytes:    300400000.00
  Cost:     5106.01  Est_time:  14729ms
Comparing plan with dop=3 and plan with dop=1:
  dop=3 io=5105.93 cpu=3888889 cost=5106.01 card=50000 bytes=300400000 -> est=14729ms, scaled cost=7659.01
  dop=1 io=13786.00 cpu=372959932 cost=13793.68 card=50000 bytes=300400000 -> est=39789ms, scaled cost=13793.68
Plan with dop 3 is better. Scalability: 50
Current plan with dop=3 is better than best plan with dop=1
Costing completed. DOP chosen: 3.

Again, nothing new so far. Let’s now take a look at 12.1.0.2 for Q2:

kkecComputeAPDopCS: CPUtabBytes: 416948224.000000 CPUtabRows: 50000.000000  rowSize: 6008
kkecComputeAPDopCS: Basic Stats Level
kkecComputeDOP: Proc Rate: 1048576000.000000 - Cost: 416948224.000000 - Time: 397.632812
kkecComputeDOP: Proc Rate: 1000000.000000 - Cost: 50000.000000 - Time: 50.000000
kkecComputeAPDopCS: Before Filter Preds - CPU Dop: 0.039763
kkecComputeAPDopCS: Final Filter Predicate Bytes: 0.000000 Rows: 0.000000
kkecComputeAPDop: Final -  CPU Dop: 0.039763 Rate: 1000.000000
AutoDOP: Consider caching for T50K[T50K](obj#138273) 
cost:13793.044423 blkSize:8192 objSize:50897.00 marObjSize:48352.15 bufSize:41160.00 affPercent:80 smallTab:NO affinitized:NO
kkeCostToTime: using io calibrate stats maxpmbps=10(MB/s) 
 block_size=8192 mb_io_count=4 mb_io_size=30247 (bytes) 
 tot_io_size=398(MB) time=39788(ms)
AutoDOP:parallel operation is set to NO
AutoDOP:parallel operation is set to YES
AutoDOP: Table/Index(#138273) access (scan) cost=13793.044423 estTime=39787.63 unit=10000.00 dop=3.000 -> maxdop=3
kkecComputeAPDop: IO Dop: 3.000000 - CPU Dop: 0.039763

Little higher CPU DoP but still extremely low (0.039) and same IO DoP as 11.2.0.4, hence same DoP computed here as well.

AUTO DOP PLANS EVALUATION
***************************************
Compilation completed with Dop: 1.
  Cost_io:  13786.000000  Cost_cpu: 372959932
  Card:     50000.000000  Bytes:    300400000.000000
  Cost:     13793.044423  Est_time:  39788ms
kkopqCombineDop: Dop:3 Hint:no 
Query: compute:yes forced:no  scanDop:3 cpuDop:1 forceDop:0
Serial plan is expensive enough to be a candidate for parallelism (13793)
Signal reparse with DOP 3.
.....
AUTO DOP PLANS EVALUATION
***************************************
Compilation completed with Dop: 3.
  Cost_io:  5105.925926  Cost_cpu: 3888889
  Card:     50000.000000  Bytes:    300400000.000000
  Cost:     5105.999379  Est_time:  14729ms
kkopqCombineDop: Dop:3 Hint:no 
Query: compute:yes forced:no  scanDop:3 cpuDop:1 forceDop:0
Adopt new plan with dop=3
Costing completed. DOP chosen: 3.

So the new algorithm lead to the same DoP (which is good, no surprises) because the CPU DoP was too little to make a dent.

##### Q3 #####

Let’s now look into Q3 that is where things start to change and get a little more tricky (to make things easier to report I turned off _optimizer_distinct_agg_transform).

From 11.2.0.4

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T16M[T16M]
  Table: T16M  Alias: T16M
    Card: Original: 16000000.000000  Rounded: 16000000  Computed: 16000000.00  Non Adjusted: 16000000.00
  Access Path: TableScan
    Cost:  9697.62  Resp: 9697.62  Degree: 0
      Cost_io: 9643.00  Cost_cpu: 2653501900
      Resp_io: 9643.00  Resp_cpu: 2653501900
kkeCostToTime: using io calibrate stats
 maxmbps=5(MB/s) maxpmbps=10(MB/s)
 block_size=8192 mb_io_count=4 mb_io_size=30247 (bytes)
 tot_io_size=280(MB) time=27974(ms)
AutoDOP: Consider caching for T16M[T16M](obj#70603)
cost:9697.62 blkSize:8192 objSize:35597.00 marObjSize:33817.15 bufSize:37240.00 affPercent:80 smallTab:NO affinitized:NO
AutoDOP:parallel operation is set to NO
AutoDOP:parallel operation is set to YES
AutoDOP: Table/Index(#70603) access (scan) cost=9697.62 estTime=27973.89 unit=10000.00 dop=2 -> maxdop=2
  Best:: AccessPath: TableScan
         Cost: 9697.62  Degree: 1  Resp: 9697.62  Card: 16000000.00  Bytes: 0

so IO DoP is 2 and E-Time larger than 10 secs so we decide to go PX

AUTO DOP PLANS EVALUATION
***************************************
Compilation completed with Dop: 1.
  Cost_io:  9643.00  Cost_cpu: 2653501899.68
  Card:     1.00  Bytes:    5.00
  Cost:     9697.62  Est_time:  27974ms
kkopqCombineDop: Dop:2 Hint:no
Query: compute:yes forced:no  computedDop:2 forceDop:0
Serial plan is expensive enough to be a candidate for parallelism (9698)
Signal reparse with DOP 2.
....
AUTO DOP PLANS EVALUATION
***************************************
Compilation completed with Dop: 2.
  Cost_io:  5357.22  Cost_cpu: 1333333333.33
  Card:     1.00  Bytes:    5.00
  Cost:     5384.67  Est_time:  15533ms
Comparing plan with dop=2 and plan with dop=1:
  dop=2 io=5357.22 cpu=1333333333 cost=5384.67 card=1 bytes=5 -> est=15533ms, scaled cost=5384.67
  dop=1 io=9643.00 cpu=2653501900 cost=9697.62 card=1 bytes=5 -> est=27974ms, scaled cost=9697.62
Plan with dop 2 is better. Scalability: 50
Current plan with dop=2 is better than best plan with dop=1
Costing completed. DOP chosen: 2.

so 11.2.0.4 re-parsed the SQL and decided to go with DoP = 2 because of the IO DoP.

In 12.1.0.2 this time the CPU DoP makes a difference, from the 10053

kkecComputeAPDopCS: CPUtabBytes: 290586624.000000 CPUtabRows: 16000000.000000  rowSize: 5
kkecComputeAPDopCS: Basic Stats Level
kkecComputeDOP: Proc Rate: 1048576000.000000 - Cost: 290586624.000000 - Time: 277.125000
kkecComputeDOP: Proc Rate: 1000000.000000 - Cost: 16000000.000000 - Time: 16000.000000
kkecComputeAPDopCS: Before Filter Preds - CPU Dop: 1.600000
kkecComputeAPDopCS: Final Filter Predicate Bytes: 0.000000 Rows: 0.000000
kkecComputeAPDop: Final -  CPU Dop: 1.600000 Rate: 1000.000000
AutoDOP: Consider caching for T16M[T16M](obj#138277) 
cost:9658.102216 blkSize:8192 objSize:35472.00 marObjSize:33698.40 bufSize:41160.00 affPercent:80 smallTab:NO affinitized:NO
kkeCostToTime: using io calibrate stats maxpmbps=10(MB/s) 
 block_size=8192 mb_io_count=4 mb_io_size=30247 (bytes) 
 tot_io_size=279(MB) time=27860(ms)
AutoDOP:parallel operation is set to NO
AutoDOP:parallel operation is set to YES
AutoDOP: Table/Index(#138277) access (scan) cost=9658.102216 estTime=27859.91 unit=10000.00 dop=2.000 -> maxdop=2
kkecComputeAPDop: IO Dop: 2.000000 - CPU Dop: 1.600000
Transfer optimizer annotations for T16M[T16M]
kkecComputeGbyObyAggrDOP: rowSize: 5.000000
kkecComputeGbyObyAggrDOP: gbyRows: 16000000.000000 gbyBytes: 80000000.000000
kkecComputeGbyObyAggrDOP: obyRows: 16000000.000000 obyBytes: 80000000.000000
kkecComputeGbyObyAggrDOP: before aggr Gby - IO Dop: 0.000000  CPU Dop: 0.000000
kkecComputeDOP: Proc Rate: 1048576000.000000 - Cost: 80000000.000000 - Time: 76.293945
kkecComputeDOP: Proc Rate: 1000000.000000 - Cost: 16000000.000000 - Time: 16000.000000
kkecComputeGbyObyAggrDOP: Basic level Aggr DOP 1.600000
kkecComputeGbyObyAggrDOP: Gby - IO Dop: 0.000000 CPU Dop: 1.600000 GbyRate: 1000.000000
kkecComputeGbyObyAggrDOP: Oby - IO Dop: 0.000000 CPU Dop: 0.000000  ObyRate: 0.000000

the important parts from the extract above are a CPU DoP of 1.6 and an IO DoP of 2 because of the access to table T16M plus a Aggr DoP of 1.6 because of the count(distinct ).
Since E-Time > 10 secs we decide to go PX and trigger a reparse

AUTO DOP PLANS EVALUATION
***************************************
Compilation completed with Dop: 1.
  Cost_io:  9608.000000  Cost_cpu: 2652611720
  Card:     1.000000  Bytes:    5.000000
  Cost:     9658.102216  Est_time:  27860ms
kkopqCombineDop: Dop:2 Hint:no 
Query: compute:yes forced:no  scanDop:2 cpuDop:1 forceDop:0
Serial plan is expensive enough to be a candidate for parallelism (9658)
Signal reparse with DOP 2.

Now this is where things get a little tricky and my understanding could easily be even more wrong than before 😀

AUTO DOP PLANS EVALUATION
***************************************
Compilation completed with Dop: 2.
  Cost_io:  5337.777778  Cost_cpu: 1333333333
  Card:     1.000000  Bytes:    5.000000
  Cost:     5362.961620  Est_time:  15470ms
kkopqCombineDop: Dop:3 Hint:no 
Query: compute:yes forced:no  scanDop:2 cpuDop:3 forceDop:0
Adopt new plan with dop=2
Signal reparse with DOP 3.
....
AUTO DOP PLANS EVALUATION
***************************************
Compilation completed with Dop: 3.
  Cost_io:  3558.518519  Cost_cpu: 888888889
  Card:     1.000000  Bytes:    5.000000
  Cost:     3575.307747  Est_time:  10313ms
kkopqCombineDop: Dop:3 Hint:no 
Query: compute:yes forced:no  scanDop:2 cpuDop:3 forceDop:0
Adopt new plan with dop=3
Costing completed. DOP chosen: 3.

I omitted the details of the DoP trying to keep this extremely long post just a little shorter.
The three green lines in the previous two sections show the IO DoP and the CPU DoP, the IO DoP is consistently 2 but the CPU DoP starts as 1 and then it becomes 3.
Starting from the first re-parse the IO DoP is not computed anymore as it would provide no benefit since the table T16M is still large the same, the CBO is carrying on annotations (some details of what an annotation is is explained here, but nothing to do with that bug) on the pre-computed DoPs

Transfer optimizer annotations for T16M[T16M]
AutoDOP: dop(io):2.000
AutoDOP: rate(cpu):1000.000000 dop(cpu):1.600

and I think at this point the Aggr DoP (1.6) is added to the CPU DoP from T16M to come up with a CPU DoP of 3.
I’ve to admit I’m not sure why this isn’t done at the first iteration, maybe just to save time since a large IO DoP is enough “to get started” with a reparse.

This is where the new AutoDOP starts to behave differently and return a different DoP for a SQL (3 instead of 2).

I haven’t seen enough real life cases on AutoDOP in 12c to say if it works better than before but for sure it seems more aware of the whole plan and it factors in additional steps (i.e. those CPU only) trying to come up with a more accurate DoP.
I suspect most of the SQLs won’t have a different DoP just because most of the large parallel execution are constrained by IO scan rather than large CPU operations but it’s exactly on those environments with large sorts or aggregation that probably the new AutoDOP can surprise you, for the good or the bad 😉

April 27, 2015
by Mauro Pagano 2 Comments

Poor man SQL Monitor with SQLd360…kind of!

One of the most interesting section of a SQL Monitor report is IMHO the Metrics page where we can see on a timeline the impact / requirement on an execution in terms of CPU, IOPS, MBPS, PGA and TEMP.
Starting 11gR2, we can get something similar (not exactly the same) from ASH, this is from DBA_HIST_ACTIVE_SESS_HISTORY

For each TM_DELTA_TIME we know how much CPU and DB Time has been spent, as well as how many RW IOPS and RW and Interconnect (M)BPS per DELTA_TIME. The metrics are accumulated and reported at the time of the ASH sample “over the Delta Time” that roughly matches with the ASH sampling one, so ie. from V$ACTIVE_SESSION_HISTORY we are able to see how many RW IOPS per second (since the DELTA_TIME is roughly going to be the second) are accounted by the session we focus on. Also each time the ASH sample is taken the PGA and TEMP consumed by the session at that time is tracked.

Starting SQLd360 v1511 those columns are now collected and aggregated/charted over time so we can look at the single consumption via SQL Monitor (or from raw ASH data) and at the same time evaluate the historical trend from SQLd360. The PlansAnalysis page in SQLd360 now looks something like this

Screen Shot 2015-04-27 at 10.54.29 AM

Other new additions since the last post (v1507) are:

SQLd360 now uses a configuration file to determine how many days to collect info for and which “major” external APIs to collect / ignore
Several reports have been rewritten to better represent info from Parallel Executions
New sections provide info on
- Fix Controls (default and non-default) settings
- NLS settings
- Adaptive Cursor Sharing info
- SQL Monitoring info
- Partition Statistics history info
- SQLs with the same Force Matching Signature

plus as usual bug fixes here and there.

Thanks to everybody who reported issues and provided feedback for new features! 🙂

March 26, 2015
by Mauro Pagano Leave a comment

SQLd360 and ORA-7445 [evaopn3()]

A few kind people reported to me that SQLd360 disconnected in the middle of the execution. Looking into the details the cause was an ORA-7445 evaopn3() but such errors are a little tricky to track down because they only show up with a combination of faulty plan (that’s the bug) with specific data, that’s why the error never reproduced in my labs as well as at several other folks’ systems.

Galo Balda was kind and patience enough to report the error and test my suspects (Thanks Galo!!!) so we were able to identify a way to avoid the error.

The ORA-7445 was caused by bug 12672969 (I suggest you check the details in MOS, the bug is pretty “common”) and starting today SQLd360 works it around so that your execution shouldn’t disconnect anymore!

If you tried SQLd360 in the past but ignored it because of the unexpected disconnect then I suggest you give it one more shot now that the error should be taken care!