The New Features guide for 12c reports enhancements to AutoDOP but unfortunately there is not much details so let’s try to dig a little to find out which are (some of) those enhancements.
The focus on this blog post is on how the Degree of Parallelism (DoP) is computed for three SQLs in 11.2.0.4 vs 12.1.0.2 (so not considering ADAPTIVE degree policy) reading a single table, joins will come in a separate post.
First requirement to play with AutoDOP is to have IO Calibration statistics in place (UPDATE: the requirement is in 11.2.0.x, in 12c default values are used when IO Calibration are missing, 200MB/s is the default for MAX_PMBPS).
Since all my tests are on VMs on my personal machine the IO response time is very unstable so for the sake of consistent results I’ll manually set the IO Calibration stats (AFAIK this isn’t recommended, also the INSERT requires an instance restart to take effect).
We’ll use 10 as value for PMBPS (Max MBPS during parallel scan).
SQL> desc dba_rsrc_io_calibrate Name Null? Type --------------------- -------- -------------- START_TIME TIMESTAMP(6) END_TIME TIMESTAMP(6) MAX_IOPS NUMBER MAX_MBPS NUMBER MAX_PMBPS NUMBER LATENCY NUMBER NUM_PHYSICAL_DISKS NUMBER SQL> insert into dba_rsrc_io_calibrate values (systimestamp, systimestamp, 10000, 5, 10, 5, 10); SQL> commit;
And here are the three tables with 10k, 50k and 16M rows each used in the tests.
Each row is created to account for pretty much 1 block (in a 8k block_size database) in the first two tables so those tables will grow large in size even with just a few rows, the third table is designed to be very small.
SQL> drop table t10k purge; SQL> drop table t50k purge; SQL> drop table t16m purge; SQL> create table t10k (n1 number, c1 char(2000), c2 char(2000), c3 char(2000)); SQL> create table t50k (n1 number, c1 char(2000), c2 char(2000), c3 char(2000)); SQL> create table t16m (n1 number, c1 char(1), c2 char(1), c3 char(1)); SQL> insert into t10k select rownum, 'a','b','c' from dual connect by rownum <= 10000; SQL> insert into t50k select rownum, 'a','b','c' from dual connect by rownum <= 50000; SQL> insert into t16m select rownum, 'a', 'b', 'c' from dual connect by rownum <= 250000; SQL> insert into t16m select * from t16m; SQL> / SQL> / SQL> / SQL> / SQL> / SQL> exec dbms_stats.gather_table_stats(user,'T10K'); SQL> exec dbms_stats.gather_table_stats(user,'T50K'); SQL> exec dbms_stats.gather_table_stats(user,'T16M'); SQL> select table_name, num_rows, blocks from user_tables where table_name like 'T%' order by 2; TABLE_NAME NUM_ROWS BLOCKS ------------------------------ ---------- ---------- T10K 10000 10097 T50K 50000 50897 T16M 16000000 35472
We’ll use three SQLs to check different scenarios and how they show up in the 10053 trace.
EXPLAIN PLAN has been used just to force a hard parse each time (it won’t have any “negative” side-effect here).
explain plan for select /* Q1 */ * from t10k; explain plan for select /* Q2 */ * from t50k; explain plan for select /* Q3 */ count(distinct n1) from t16m;
As we already known (nice post here) in 11.2.0.x the DoP under AutoDOP is based on “IO cost” for the operation and the calculation is performed for each table individually at the time the CBO calculates cardinality and best access method for the table (aka you’ll find it under SINGLE TABLE ACCESS PATH section of the 10053).
##### Q1 #####
The CBO decides to execute Q1 in serial in both versions, let’s look into the details.
From 11.2.0.4 trace, the first 4 lines in green are the input values used to perform the math, the next two are about In-Memory Parallel Execution (nothing to do with In-Memory Option) and the last three lines are the most interesting ones.
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for T10K[T10K]
Table: T10K Alias: T10K
Card: Original: 10000.000000 Rounded: 10000 Computed: 10000.00 Non Adjusted: 10000.00
Access Path: TableScan
Cost: 2737.52 Resp: 2737.52 Degree: 0
Cost_io: 2736.00 Cost_cpu: 74005180
Resp_io: 2736.00 Resp_cpu: 74005180
kkeCostToTime: using io calibrate stats
maxmbps=5(MB/s) maxpmbps=10(MB/s)
block_size=8192 mb_io_count=4 mb_io_size=30247 (bytes)
tot_io_size=79(MB) time=7897(ms)
AutoDOP: Consider caching for T10K[T10K](obj#70600)
cost:2737.52 blkSize:8192 objSize:10097.00 marObjSize:9592.15 bufSize:37240.00 affPercent:80 smallTab:NO affinitized:YES
AutoDOP:parallel operation is set to NO
AutoDOP:parallel operation is set to NO
AutoDOP: Table/Index(#70600) access (scan) cost=2737.52 estTime=7896.70 unit=10000.00 dop=2 -> maxdop=2
Best:: AccessPath: TableScan
Cost: 2737.52 Degree: 1 Resp: 2737.52 Card: 10000.00 Bytes: 0
The following is based on my understanding and tests so it could easily be wrong.
The first “AutoDOP: parallel operation is set to NO” refers to the lack of PARALLEL hint in the SQL, if you add the hint then the message becomes YES prefixed by another one with the degree specified in the hint (if any).
The second “AutoDOP: parallel operation is set to NO” is based on the fact this SQL is not “expensive enough” to qualify for parallel execution, basically the E-Time is lower than PARALLEL_MIN_TIME_THRESHOLD.
The last line shows the details of the math. In 11.2.0.4 the DoP is indeed computed based on how large is the segment (tot_io_size=79MB) and how fast we can read from disk (maxpmbps=10MB/s) so 79 (MB) / 10 (MB/s) = (time=) 7897ms. The value is then divided by 10k (unit=, not sure where the value comes from since it doesn’t seem related to PARALLEL_MIN_TIME_THRESHOLD even though it does represent 10s) and the result (0.7) considered as DoP.
DoP 0.7 means this execution will be serial.
The value of dop=2 -> maxdop=2 is a little misleading here (maybe it’s a default in the code), PX is rejected here and the SQL goes serial.
The trace confirms it just few lines below
AUTO DOP PLANS EVALUATION *************************************** Compilation completed with Dop: 1. Cost_io: 2736.00 Cost_cpu: 74005179.68 Card: 10000.00 Bytes: 60070000.00 Cost: 2737.52 Est_time: 7897ms kkopqSetDopReason: Reason why we chose this DOP is: no expensive parallel operation. No parallel scan operations kkopqCombineDop: Dop:1 Hint:no Query: compute:yes forced:no computedDop:2 forceDop:0 kkopqSetDopReason: Reason why we chose this DOP is: parallel threshold. Serial phase is inexpensive (2738), stick to it. Costing completed. DOP chosen: 1.
Here we can see the DoP chosen was 1 because no operation was expensive enough to warrant for PX as well as the E-Time for the SQL (7.8s) is below the PARALLEL_MIN_TIME_THRESHOLD (10s).
Nothing new so far, let’s look into the 10053 from 12.1.0.2 for the same SQL, Q1 as see how is that different.
SINGLE TABLE ACCESS PATH Single Table Cardinality Estimation for T10K[T10K] SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE Table: T10K Alias: T10K Card: Original: 10000.000000 Rounded: 10000 Computed: 10000.000000 Non Adjusted: 10000.000000 Scan IO Cost (Disk) = 2736.000000 Scan CPU Cost (Disk) = 74005179.680000 Total Scan IO Cost = 2736.000000 (scan (Disk)) = 2736.000000 Total Scan CPU Cost = 74005179.680000 (scan (Disk)) = 74005179.680000 Access Path: TableScan Cost: 2737.397801 Resp: 2737.397801 Degree: 0 Cost_io: 2736.000000 Cost_cpu: 74005180 Resp_io: 2736.000000 Resp_cpu: 74005180 Best:: AccessPath: TableScan Cost: 2737.397801 Degree: 1 Resp: 2737.397801 Card: 10000.000000 Bytes: 0.000000
The first difference is the AutoDOP computation does not happen during the access path decision, that’s why no AutoDOP tag is present in the previous extract.
The calculations are performed at the end of the join selection for the query block (no join in this case and just one query block) and here is where the major changes shows up (steps are marked with different colors to make it easier to reference them):
kkecComputeAPDopCS: CPUtabBytes: 82714624.000000 CPUtabRows: 10000.000000 rowSize: 6007 kkecComputeAPDopCS: Basic Stats Level kkecComputeDOP: Proc Rate: 1048576000.000000 - Cost: 82714624.000000 - Time: 78.882812 kkecComputeDOP: Proc Rate: 1000000.000000 - Cost: 10000.000000 - Time: 10.000000 kkecComputeAPDopCS: Before Filter Preds - CPU Dop: 0.007888 kkecComputeAPDopCS: Final Filter Predicate Bytes: 0.000000 Rows: 0.000000 kkecComputeAPDop: Final - CPU Dop: 0.007888 Rate: 1000.000000 AutoDOP: Consider caching for T10K[T10K](obj#138272) cost:2737.397801 blkSize:8192 objSize:10097.00 marObjSize:9592.15 bufSize:41160.00 affPercent:80 smallTab:NO affinitized:YES kkeCostToTime: using io calibrate stats maxpmbps=10(MB/s) block_size=8192 mb_io_count=4 mb_io_size=30247 (bytes) tot_io_size=79(MB) time=7896(ms) AutoDOP:parallel operation is set to NO AutoDOP:parallel operation is set to NO AutoDOP: Table/Index(#138272) access (scan) cost=2737.397801 estTime=7896.34 unit=10000.00 dop=2.000 -> maxdop=2 kkecComputeAPDop: IO Dop: 2.000000 - CPU Dop: 0.007888
The first big difference is the AutoDOP in 12c also includes the CPU DoP for the operations in the query block, including not only those operations that have an IO cost (i.e. a FTS) but also those operations that have just a CPU cost (i.e. a Join).
The way the operations are normalized is via “processing rates”, in 12c there is a new view that provides to the CBO with such numbers, V$OPTIMIZER_PROCESSING_RATE.
SQL> select operation_name, to_number(manual_value) manual, to_number(calibration_value) calibration, to_number(default_value) default_ from v$optimizer_processing_rate order by 1; OPERATION_NAME MANUAL CALIBRATION DEFAULT_ ------------------------- ---------- ----------- ---------- AGGR 560 1000 ALL 200 CPU 200 CPU_ACCESS 200 CPU_AGGR 200 CPU_BYTES_PER_SEC 560 1000 CPU_FILTER 200 CPU_GBY 200 CPU_HASH_JOIN 200 CPU_IMC_BYTES_PER_SEC 2000 CPU_IMC_ROWS_PER_SEC 2000000 CPU_JOIN 200 CPU_NL_JOIN 200 CPU_RANDOM_ACCESS 200 CPU_ROWS_PER_SEC 1000000 CPU_SEQUENTIAL_ACCESS 200 CPU_SM_JOIN 200 CPU_SORT 200 HASH 200 IO 560 10 IO_ACCESS 560 10 IO_BYTES_PER_SEC 10 IO_IMC_ACCESS 1000 IO_RANDOM_ACCESS 3000 10 IO_ROWS_PER_SEC 1000000 IO_SEQUENTIAL_ACCESS 560 10 MEMCMP 500 MEMCPY 560 1000
Values can be manually set using DBMS_STATS.SET_PROCESSING_RATE but they are silently ignored until the hidden parameter _OPTIMIZER_PROC_RATE_SOURCE is not set to MANUAL. Also they can be gathered using DBMS_STATS.GATHER_PROCESSING_RATE.
The value for an operation is inherited from its parent in case is missing (the hierarchy is visible in X$OPTIM_CALIB_STATS where each STATID_KKECSTATS has its parent is PSTATID_KKECSTATS).
The first line in green (two sections above) provides info about the table we want to scan: size in bytes, number of rows and avg row lenght.
The lines in orange is where the CPU DoP is computed, my understanding is here the CPU DoP is the greatest value (expressed in ms) between the “size in bytes / CPU_BYTES_PER_SEC (expressed in MB)” and “number of rows / CPU_ROWS_PER_SEC”, so in this case it’s 10 vs 78.8 = 78.8, which is then divided by 10k (unit=) giving a CPU DoP of 0.0078. The values used for the computation are also reported in the OTHER_XML column.
The lines in blue are the same as the 11.2.0.4 one, where the IO DoP is computed (again with the misleading value of 2).
The final line in red is where IO DoP (misleading here) and CPU DoP are combined together.
As for the 11.2.0.4 case, this SQL doesn’t quality for parallel execution because too fast (the second NO in the blue lines).
AUTO DOP PLANS EVALUATION *************************************** Compilation completed with Dop: 1. Cost_io: 2736.000000 Cost_cpu: 74005180 Card: 10000.000000 Bytes: 60070000.000000 Cost: 2737.397801 Est_time: 7896ms kkopqSetDopReason: Reason why we chose this DOP is: no expensive parallel operation. No parallel scan operations kkopqCombineDop: Dop:1 Hint:no Query: compute:yes forced:no scanDop:2 cpuDop:1 forceDop:0 kkopqSetDopReason: Reason why we chose this DOP is: parallel threshold. Serial phase is inexpensive (2737.397801), stick to it. Costing completed. DOP chosen: 1.
##### Q2 #####
Let’s now focus on Q2, which runs on the larger table T50K.
From 11.2.0.4
SINGLE TABLE ACCESS PATH Single Table Cardinality Estimation for T50K[T50K] Table: T50K Alias: T50K Card: Original: 50000.000000 Rounded: 50000 Computed: 50000.00 Non Adjusted: 50000.00 Access Path: TableScan Cost: 13793.68 Resp: 13793.68 Degree: 0 Cost_io: 13786.00 Cost_cpu: 372959932 Resp_io: 13786.00 Resp_cpu: 372959932 kkeCostToTime: using io calibrate stats maxmbps=5(MB/s) maxpmbps=10(MB/s) block_size=8192 mb_io_count=4 mb_io_size=30247 (bytes) tot_io_size=398(MB) time=39789(ms) AutoDOP: Consider caching for T50K[T50K](obj#70601) cost:13793.68 blkSize:8192 objSize:50897.00 marObjSize:48352.15 bufSize:37240.00 affPercent:80 smallTab:NO affinitized:NO AutoDOP:parallel operation is set to NO AutoDOP:parallel operation is set to YES AutoDOP: Table/Index(#70601) access (scan) cost=13793.68 estTime=39789.45 unit=10000.00 dop=3 -> maxdop=3 Best:: AccessPath: TableScan Cost: 13793.68 Degree: 1 Resp: 13793.68 Card: 50000.00 Bytes: 0
Table this time is around 398MB so 398 / 10 = 39789ms which is larger than 10 seconds (PARALLEL_MIN_TIME_THRESHOLD) hence the SQL is candidate to run using PX, reported in the second “AutoDOP: parallel operation is set to YES” (again this is a guess). The DoP is computed as usual with time/unit that is 3.9 truncated to 3.
AUTO DOP PLANS EVALUATION
***************************************
Compilation completed with Dop: 1.
Cost_io: 13786.00 Cost_cpu: 372959931.68
Card: 50000.00 Bytes: 300400000.00
Cost: 13793.68 Est_time: 39789ms
kkopqCombineDop: Dop:3 Hint:no
Query: compute:yes forced:no computedDop:3 forceDop:0
Serial plan is expensive enough to be a candidate for parallelism (13794)
Signal reparse with DOP 3.
the SQL is then re-parsed and the plan with DoP 3 is found as cheaper than the serial one, hence accepted.
AUTO DOP PLANS EVALUATION
***************************************
Compilation completed with Dop: 3.
Cost_io: 5105.93 Cost_cpu: 3888888.89
Card: 50000.00 Bytes: 300400000.00
Cost: 5106.01 Est_time: 14729ms
Comparing plan with dop=3 and plan with dop=1:
dop=3 io=5105.93 cpu=3888889 cost=5106.01 card=50000 bytes=300400000 -> est=14729ms, scaled cost=7659.01
dop=1 io=13786.00 cpu=372959932 cost=13793.68 card=50000 bytes=300400000 -> est=39789ms, scaled cost=13793.68
Plan with dop 3 is better. Scalability: 50
Current plan with dop=3 is better than best plan with dop=1
Costing completed. DOP chosen: 3.
Again, nothing new so far. Let’s now take a look at 12.1.0.2 for Q2:
kkecComputeAPDopCS: CPUtabBytes: 416948224.000000 CPUtabRows: 50000.000000 rowSize: 6008 kkecComputeAPDopCS: Basic Stats Level kkecComputeDOP: Proc Rate: 1048576000.000000 - Cost: 416948224.000000 - Time: 397.632812 kkecComputeDOP: Proc Rate: 1000000.000000 - Cost: 50000.000000 - Time: 50.000000 kkecComputeAPDopCS: Before Filter Preds - CPU Dop: 0.039763 kkecComputeAPDopCS: Final Filter Predicate Bytes: 0.000000 Rows: 0.000000 kkecComputeAPDop: Final - CPU Dop: 0.039763 Rate: 1000.000000 AutoDOP: Consider caching for T50K[T50K](obj#138273) cost:13793.044423 blkSize:8192 objSize:50897.00 marObjSize:48352.15 bufSize:41160.00 affPercent:80 smallTab:NO affinitized:NO kkeCostToTime: using io calibrate stats maxpmbps=10(MB/s) block_size=8192 mb_io_count=4 mb_io_size=30247 (bytes) tot_io_size=398(MB) time=39788(ms) AutoDOP:parallel operation is set to NO AutoDOP:parallel operation is set to YES AutoDOP: Table/Index(#138273) access (scan) cost=13793.044423 estTime=39787.63 unit=10000.00 dop=3.000 -> maxdop=3 kkecComputeAPDop: IO Dop: 3.000000 - CPU Dop: 0.039763
Little higher CPU DoP but still extremely low (0.039) and same IO DoP as 11.2.0.4, hence same DoP computed here as well.
AUTO DOP PLANS EVALUATION *************************************** Compilation completed with Dop: 1. Cost_io: 13786.000000 Cost_cpu: 372959932 Card: 50000.000000 Bytes: 300400000.000000 Cost: 13793.044423 Est_time: 39788ms kkopqCombineDop: Dop:3 Hint:no Query: compute:yes forced:no scanDop:3 cpuDop:1 forceDop:0 Serial plan is expensive enough to be a candidate for parallelism (13793) Signal reparse with DOP 3. ..... AUTO DOP PLANS EVALUATION *************************************** Compilation completed with Dop: 3. Cost_io: 5105.925926 Cost_cpu: 3888889 Card: 50000.000000 Bytes: 300400000.000000 Cost: 5105.999379 Est_time: 14729ms kkopqCombineDop: Dop:3 Hint:no Query: compute:yes forced:no scanDop:3 cpuDop:1 forceDop:0 Adopt new plan with dop=3 Costing completed. DOP chosen: 3.
So the new algorithm lead to the same DoP (which is good, no surprises) because the CPU DoP was too little to make a dent.
##### Q3 #####
Let’s now look into Q3 that is where things start to change and get a little more tricky (to make things easier to report I turned off _optimizer_distinct_agg_transform).
From 11.2.0.4
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for T16M[T16M]
Table: T16M Alias: T16M
Card: Original: 16000000.000000 Rounded: 16000000 Computed: 16000000.00 Non Adjusted: 16000000.00
Access Path: TableScan
Cost: 9697.62 Resp: 9697.62 Degree: 0
Cost_io: 9643.00 Cost_cpu: 2653501900
Resp_io: 9643.00 Resp_cpu: 2653501900
kkeCostToTime: using io calibrate stats
maxmbps=5(MB/s) maxpmbps=10(MB/s)
block_size=8192 mb_io_count=4 mb_io_size=30247 (bytes)
tot_io_size=280(MB) time=27974(ms)
AutoDOP: Consider caching for T16M[T16M](obj#70603)
cost:9697.62 blkSize:8192 objSize:35597.00 marObjSize:33817.15 bufSize:37240.00 affPercent:80 smallTab:NO affinitized:NO
AutoDOP:parallel operation is set to NO
AutoDOP:parallel operation is set to YES
AutoDOP: Table/Index(#70603) access (scan) cost=9697.62 estTime=27973.89 unit=10000.00 dop=2 -> maxdop=2
Best:: AccessPath: TableScan
Cost: 9697.62 Degree: 1 Resp: 9697.62 Card: 16000000.00 Bytes: 0
so IO DoP is 2 and E-Time larger than 10 secs so we decide to go PX
AUTO DOP PLANS EVALUATION *************************************** Compilation completed with Dop: 1. Cost_io: 9643.00 Cost_cpu: 2653501899.68 Card: 1.00 Bytes: 5.00 Cost: 9697.62 Est_time: 27974ms kkopqCombineDop: Dop:2 Hint:no Query: compute:yes forced:no computedDop:2 forceDop:0 Serial plan is expensive enough to be a candidate for parallelism (9698) Signal reparse with DOP 2. .... AUTO DOP PLANS EVALUATION *************************************** Compilation completed with Dop: 2. Cost_io: 5357.22 Cost_cpu: 1333333333.33 Card: 1.00 Bytes: 5.00 Cost: 5384.67 Est_time: 15533ms Comparing plan with dop=2 and plan with dop=1: dop=2 io=5357.22 cpu=1333333333 cost=5384.67 card=1 bytes=5 -> est=15533ms, scaled cost=5384.67 dop=1 io=9643.00 cpu=2653501900 cost=9697.62 card=1 bytes=5 -> est=27974ms, scaled cost=9697.62 Plan with dop 2 is better. Scalability: 50 Current plan with dop=2 is better than best plan with dop=1 Costing completed. DOP chosen: 2.
so 11.2.0.4 re-parsed the SQL and decided to go with DoP = 2 because of the IO DoP.
In 12.1.0.2 this time the CPU DoP makes a difference, from the 10053
kkecComputeAPDopCS: CPUtabBytes: 290586624.000000 CPUtabRows: 16000000.000000 rowSize: 5 kkecComputeAPDopCS: Basic Stats Level kkecComputeDOP: Proc Rate: 1048576000.000000 - Cost: 290586624.000000 - Time: 277.125000 kkecComputeDOP: Proc Rate: 1000000.000000 - Cost: 16000000.000000 - Time: 16000.000000 kkecComputeAPDopCS: Before Filter Preds - CPU Dop: 1.600000 kkecComputeAPDopCS: Final Filter Predicate Bytes: 0.000000 Rows: 0.000000 kkecComputeAPDop: Final - CPU Dop: 1.600000 Rate: 1000.000000 AutoDOP: Consider caching for T16M[T16M](obj#138277) cost:9658.102216 blkSize:8192 objSize:35472.00 marObjSize:33698.40 bufSize:41160.00 affPercent:80 smallTab:NO affinitized:NO kkeCostToTime: using io calibrate stats maxpmbps=10(MB/s) block_size=8192 mb_io_count=4 mb_io_size=30247 (bytes) tot_io_size=279(MB) time=27860(ms) AutoDOP:parallel operation is set to NO AutoDOP:parallel operation is set to YES AutoDOP: Table/Index(#138277) access (scan) cost=9658.102216 estTime=27859.91 unit=10000.00 dop=2.000 -> maxdop=2 kkecComputeAPDop: IO Dop: 2.000000 - CPU Dop: 1.600000 Transfer optimizer annotations for T16M[T16M] kkecComputeGbyObyAggrDOP: rowSize: 5.000000 kkecComputeGbyObyAggrDOP: gbyRows: 16000000.000000 gbyBytes: 80000000.000000 kkecComputeGbyObyAggrDOP: obyRows: 16000000.000000 obyBytes: 80000000.000000 kkecComputeGbyObyAggrDOP: before aggr Gby - IO Dop: 0.000000 CPU Dop: 0.000000 kkecComputeDOP: Proc Rate: 1048576000.000000 - Cost: 80000000.000000 - Time: 76.293945 kkecComputeDOP: Proc Rate: 1000000.000000 - Cost: 16000000.000000 - Time: 16000.000000 kkecComputeGbyObyAggrDOP: Basic level Aggr DOP 1.600000 kkecComputeGbyObyAggrDOP: Gby - IO Dop: 0.000000 CPU Dop: 1.600000 GbyRate: 1000.000000 kkecComputeGbyObyAggrDOP: Oby - IO Dop: 0.000000 CPU Dop: 0.000000 ObyRate: 0.000000
the important parts from the extract above are a CPU DoP of 1.6 and an IO DoP of 2 because of the access to table T16M plus a Aggr DoP of 1.6 because of the count(distinct ).
Since E-Time > 10 secs we decide to go PX and trigger a reparse
AUTO DOP PLANS EVALUATION
***************************************
Compilation completed with Dop: 1.
Cost_io: 9608.000000 Cost_cpu: 2652611720
Card: 1.000000 Bytes: 5.000000
Cost: 9658.102216 Est_time: 27860ms
kkopqCombineDop: Dop:2 Hint:no
Query: compute:yes forced:no scanDop:2 cpuDop:1 forceDop:0
Serial plan is expensive enough to be a candidate for parallelism (9658)
Signal reparse with DOP 2.
Now this is where things get a little tricky and my understanding could easily be even more wrong than before 😀
AUTO DOP PLANS EVALUATION *************************************** Compilation completed with Dop: 2. Cost_io: 5337.777778 Cost_cpu: 1333333333 Card: 1.000000 Bytes: 5.000000 Cost: 5362.961620 Est_time: 15470ms kkopqCombineDop: Dop:3 Hint:no Query: compute:yes forced:no scanDop:2 cpuDop:3 forceDop:0 Adopt new plan with dop=2 Signal reparse with DOP 3. .... AUTO DOP PLANS EVALUATION *************************************** Compilation completed with Dop: 3. Cost_io: 3558.518519 Cost_cpu: 888888889 Card: 1.000000 Bytes: 5.000000 Cost: 3575.307747 Est_time: 10313ms kkopqCombineDop: Dop:3 Hint:no Query: compute:yes forced:no scanDop:2 cpuDop:3 forceDop:0 Adopt new plan with dop=3 Costing completed. DOP chosen: 3.
I omitted the details of the DoP trying to keep this extremely long post just a little shorter.
The three green lines in the previous two sections show the IO DoP and the CPU DoP, the IO DoP is consistently 2 but the CPU DoP starts as 1 and then it becomes 3.
Starting from the first re-parse the IO DoP is not computed anymore as it would provide no benefit since the table T16M is still large the same, the CBO is carrying on annotations (some details of what an annotation is is explained here, but nothing to do with that bug) on the pre-computed DoPs
Transfer optimizer annotations for T16M[T16M] AutoDOP: dop(io):2.000 AutoDOP: rate(cpu):1000.000000 dop(cpu):1.600
and I think at this point the Aggr DoP (1.6) is added to the CPU DoP from T16M to come up with a CPU DoP of 3.
I’ve to admit I’m not sure why this isn’t done at the first iteration, maybe just to save time since a large IO DoP is enough “to get started” with a reparse.
This is where the new AutoDOP starts to behave differently and return a different DoP for a SQL (3 instead of 2).
I haven’t seen enough real life cases on AutoDOP in 12c to say if it works better than before but for sure it seems more aware of the whole plan and it factors in additional steps (i.e. those CPU only) trying to come up with a more accurate DoP.
I suspect most of the SQLs won’t have a different DoP just because most of the large parallel execution are constrained by IO scan rather than large CPU operations but it’s exactly on those environments with large sorts or aggregation that probably the new AutoDOP can surprise you, for the good or the bad 😉
July 30, 2015 at 2:14 pm
Hi Mauro,
thanks for posting this and digging into it. Many details to go through and think through!
Cheers,
Randolf
LikeLiked by 1 person
July 30, 2015 at 2:56 pm
Mauro
Thanks for the effort you’ve done in order to share with us your investigations
Mohamed Houri
LikeLike
August 17, 2015 at 5:38 pm
HI Mauro,
one thing I forgot to mention: The 11.2 Auto DOP algorithm already includes some component that considers the number of rows estimated to be processed for Parallel DML statements – which means that the higher the number of estimated rows to process the higher the DOP will be for Parallel DML operations. So there has to be some part of the Auto DOP computation that includes this – not sure how this is handled in the new 12c Auto DOP code, though.
Randolf
LikeLiked by 1 person
August 18, 2015 at 9:15 am
Hi Randolf,
I didn’t look into the details for the algorithm with PDML in place but I makes sense to me, would you agree?
Did you blog about it or do you have any small example I can play with? I’m asking as I’d like to play with it in 12c too 🙂
Thanks,
Mauro
LikeLike
Pingback: Yes, You Must Use CALIBRATE_IO. No, You Mustn’t Use It To Test Storage Performance. | Kevin Closson's Blog: Platforms, Databases and Storage