12c brought several new features in the SQL performance area, and the most advertised one is probably Adaptive Query Optimization. The Adaptive Plans feature has been described in a lot of detail (especially the adaptive join methods part), and most of the focus in Adaptive Statistics has been on SQL Plan Directives and Automatic Re-optimization; not much has been said about Dynamic Statistics, aka the new dynamic sampling (DS).
Since DS is also helpful outside of the whole Adaptive Query Optimization area, let’s just focus on it.
The original DS would kick in “by default” in the absence of object statistics; increasing the level would make the CBO use it under a few other conditions too (details here).
Starting with 11.2 (ER 7452863), DS can dynamically auto-adjust its own level at query level (hence kicking in even when it otherwise would not) when the SQL runs in parallel and large tables are involved.
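As a minimal illustration of the parallel case (BIG_T is a hypothetical large table with fresh stats; at the default level 2 the CBO can still decide on its own to sample it because the statement runs in parallel):

select /*+ parallel(8) */ count(*)
  from big_t
 where status = 'OLD';

Whether the sampling actually kicked in is then visible in the Note section of the execution plan.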
Basically the old DS usually does a good job of helping the CBO get good estimations *for single tables*
- in the absence of stats
- when the level is “high” (from 4 up; see the hint sketch after this list)
- when the SQL runs in parallel and tables are large
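For example (a sketch, using a hypothetical table T with stats in place and correlated columns), the level can be raised for a single statement, or a single table, via the dynamic_sampling hint:

select /*+ dynamic_sampling(t 4) */ count(*)
  from t
 where n1 = 1
   and n2 = 1;

The hint makes the CBO sample T even though stats exist, so the correlated single-table predicates get a more accurate combined selectivity.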
What we just said implies DS doesn’t help much if
- the table has stats but the level is default
- the SQL runs in serial
- we are estimating anything other than single-table selectivity
The “new” dynamic sampling was introduced in 12.1.0.1 and backported to 11.2.0.4; it’s “hidden” behind the new level 11, or AUTO (the value AUTO is accepted only in 12c), and it brings a lot of very interesting features
- DS can kick in automatically when the CBO realizes it doesn’t have solid enough estimations
- DS can be applied to joins, group by and whole query blocks (I haven’t looked too much into the query blocks one yet)
- DS results can be cached using the result cache and exported transparently using TCB
Let’s play a little with an example; the test is from a 12.1.0.2.
Two tables with uniform data distribution and strong correlation across columns within the same table and between tables.
SQL> create table t1 as select mod(rownum, 50) n1, mod(rownum, 50) n2 from dual connect by rownum <= 1000;
SQL> create table t2 as select mod(rownum, 50) n1, mod(rownum, 50) n2 from dual connect by rownum <= 1000;
SQL> exec dbms_stats.gather_table_stats(user,'T1');
SQL> exec dbms_stats.gather_table_stats(user,'T2');
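A quick sanity check makes the correlation visible: each column has 50 distinct values, but so does the (N1, N2) pair, instead of the 2500 combinations the CBO would expect under independence

select count(distinct n1) dst_n1,
       count(distinct n2) dst_n2,
       count(distinct n1 || ',' || n2) dst_n1_n2
  from t1;

    DST_N1     DST_N2  DST_N1_N2
---------- ---------- ----------
        50         50         50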
By default the join selectivity/cardinality is underestimated, since there is correlation but nothing to inform the CBO about it
SQL> select count(*) from t1,t2 where t1.n1 = t2.n1 and t1.n2 = t2.n2;

Plan hash value: 906334482

---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |
|*  2 |   HASH JOIN         |      |      1 |   1000 |  20000 | <---
|   3 |    TABLE ACCESS FULL| T1   |      1 |   1000 |   1000 |
|   4 |    TABLE ACCESS FULL| T2   |      1 |   1000 |   1000 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."N1"="T2"."N1" AND "T1"."N2"="T2"."N2")
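The actual 20000 is easy to verify by hand: every (N1, N2) pair appears 20 times in each table (1000 rows / 50 pairs), so each of the 1000 rows in T1 matches 20 rows in T2, that is 1000 * 20 = 20000 rows. The CBO instead combines the selectivities of the two join predicates as if N1 and N2 were independent, landing on the much lower 1000.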
With such a simple join condition we can use a column group to fix the estimation, and indeed it works fine
SQL> exec dbms_stats.gather_table_stats(user,'T1', method_opt => 'FOR COLUMNS (N1,N2) SIZE AUTO');
SQL> exec dbms_stats.gather_table_stats(user,'T2', method_opt => 'FOR COLUMNS (N1,N2) SIZE AUTO');

SQL> select count(*) from t1,t2 where t1.n1 = t2.n1 and t1.n2 = t2.n2;

---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |
|*  2 |   HASH JOIN         |      |      1 |  20000 |  20000 |
|   3 |    TABLE ACCESS FULL| T1   |      1 |   1000 |   1000 |
|   4 |    TABLE ACCESS FULL| T2   |      1 |   1000 |   1000 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."N1"="T2"."N1" AND "T1"."N2"="T2"."N2")
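As a side note, the column group just created can be seen in USER_STAT_EXTENSIONS (output abridged, the extension name is system-generated):

select extension_name, extension
  from user_stat_extensions
 where table_name = 'T1';

EXTENSION_NAME                 EXTENSION
------------------------------ ---------------
SYS_STU...                     ("N1","N2")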
Anyway, a column group can only be used with equality conditions (a column group is basically a hash of all the columns in the group), so as soon as we change our SQL we are back to the underestimated selectivity/cardinality
SQL> select count(*) from t1,t2 where t1.n1 = t2.n1 and t1.n2 > t2.n2;

---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |
|*  2 |   HASH JOIN         |      |      1 |   9800 |      0 |
|   3 |    TABLE ACCESS FULL| T1   |      1 |   1000 |   1000 |
|   4 |    TABLE ACCESS FULL| T2   |      1 |   1000 |   1000 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."N1"="T2"."N1")
       filter("T1"."N2">"T2"."N2")
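Incidentally, the actual cardinality of 0 is a direct consequence of the test data: N1 = N2 on every row of both tables, so whenever T1.N1 = T2.N1 holds, T1.N2 = T2.N2 holds too and the strict inequality T1.N2 > T2.N2 can never be satisfied. The CBO, blind to the correlation, still expects 9800 rows.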
So let’s turn on the new DS and check how it helps
alter session set optimizer_dynamic_sampling = 11;

select count(*) from t1,t2 where t1.n1 = t2.n1 and t1.n2 > t2.n2;

---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |
|*  2 |   HASH JOIN         |      |      1 |      1 |      0 |
|   3 |    TABLE ACCESS FULL| T1   |      1 |   1000 |   1000 |
|   4 |    TABLE ACCESS FULL| T2   |      1 |   1000 |   1000 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."N1"="T2"."N1")
       filter("T1"."N2">"T2"."N2")

Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
Notice how the join cardinality is now very accurate (and the Note section confirms dynamic sampling at level AUTO was used).
From the 10053 and 10046 traces (I wrapped the 10046 output to avoid a long scrollbar)
Join Card:  9800.000000 = outer (1000.000000) * inner (1000.000000) * sel (0.009800)
Join cardinality for HJ/SMJ (no post filters):  20000.000000, outer: 1000.000000, inner: 1000.000000, sel: 0.009800
>> Join Card adjusted from 9800.000000 to 1.000000 due to adaptive dynamic sampling, prelen=2
Adjusted Join Cards: adjRatio=0.000102 cardHjSmj=1.000000 cardHjSmjNPF=2.040816 cardNlj=1.000000 cardNSQ=1.000000 cardNSQ_na=9800.000000
Join Card - Rounded: 1 Computed: 1.000000

PARSING IN CURSOR #140546644531144 len=332 dep=1 uid=118 oct=3 lid=118 tim=173271358375 hv=1119371978 ad='c6e539a0' sqlid='f814kux1bhhqa'
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
  optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) */
  SUM(C1)
  FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "T1#1") */ 1 AS C1
          FROM "T2" "T2#0", "T1" "T1#1"
         WHERE ("T1#1"."N1"="T2#0"."N1") AND ("T1#1"."N2">"T2#0"."N2")) innerQuery
Notice how the recursive DS SQL joins the two tables together; this wasn’t possible before.
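Tracing aside, the recursive DS statements are easy to spot in the shared pool thanks to their DS_SVC comment; an illustrative query:

select sql_id, executions, sql_text
  from v$sql
 where sql_text like 'SELECT /* DS_SVC */%';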
At the same time we want to avoid triggering those recursive SQLs over and over, so the result is also cached
SELECT id, type, creation_timestamp, block_count, column_count, pin_count, row_count, name
  2    FROM V$RESULT_CACHE_OBJECTS
  3   WHERE cache_id = 'b1ytyjnksv3up3yjtn35bq96jx';

        ID TYPE       CREATION_ BLOCK_COUNT COLUMN_COUNT  PIN_COUNT  ROW_COUNT NAME
---------- ---------- --------- ----------- ------------ ---------- ---------- ------------------------------------------------------------------
         4 Result     03-DEC-14           1            1          0          1 SELECT /* DS_SVC */ /*+ cursor_sharing_exact dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) re
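One testing caveat: because of this caching, re-running the statement may not trigger the recursive SQL again. A blunt way to start from a clean slate is to flush the whole result cache (this wipes all cached results, not just the DS ones):

SQL> exec dbms_result_cache.flush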
The previous test shows how the new DS can help in more complex scenarios (joins, group by, etc.) than before, but at the same time it means more time could potentially be spent on DS. For this reason the recursive SQL gets a limited amount of time, after which the execution aborts, as well as a default cap on the number of tables to join.
Those conditions, as well as the result cache usage, can be configured by playing with the following parameters
NAME                             DESCRIPTION                                          SESSION_VALUE
-------------------------------- ---------------------------------------------------- --------------
_optimizer_ads_max_table_count   maximum number of tables in a join under ADS         0
_optimizer_ads_time_limit        maximum time limit (seconds) under ADS               0
_optimizer_ads_use_result_cache  use result cache for ADS queries                     TRUE
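The listing above comes from the fixed tables that hold the hidden parameters; connected as SYS, something along these lines returns it:

select i.ksppinm  name,
       i.ksppdesc description,
       v.ksppstvl session_value
  from x$ksppi i, x$ksppcv v
 where i.indx = v.indx
   and i.ksppinm like '_optimizer_ads%';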
Adaptive Plans can help avoid picking an incorrect join method, but the join order cannot be changed at runtime; getting a more accurate estimation at parse time could encourage a more efficient join order, and that’s why I like the new DS so much.
IMHO this new feature has a lot of potential because it helps in addressing exactly those areas (joins, group by) where it’s particularly challenging for the CBO to get consistently accurate estimations.
April 21, 2015 at 10:33 pm
Mauro – thanks for the post on this, it was recently very helpful in researching a problem. Enjoyed your training day at HOTSOS 2015 as well! You might be interested in this post I just made since you mentioned the new AUTO/11 setting for dynamic stats: https://dban00b.wordpress.com/2015/04/21/311/ … interesting behavior and side effect I ran into.