Mauro Pagano's Blog


Introducing TUNAs360 – Tuning with Active Sessions without Diagnostic Pack


I’m happy to introduce a new free tool that allows to get initial insights about a database using eDB360/SQLd360 like charts without Diagnostic Pack!!!

(TUN)ing with (A)ctive (s)essions, aka TUNAs360, requires no installation and no parameter when executed; it observes the workload for few minutes and then collects a set of reports on such load.

The idea behind the tool is to help getting started in those situations where there is a concern with database performance but the reason is unknown, the classic “the database is slow but I don’t know why”.
In case further investigation is necessary then eDB360 or SQLd360 can be leveraged.

Once executed TUNAs360 provides several reports for the whole database (cluster and node specific) plus additional reports to help drill into the details of the top 5 sessions and top 5 SQLs during the time the load was observed.

The tool belongs to the “360 family” so it has the same look and many of the charts of its bigger brothers, the main page looks like this

Screen Shot 2016-03-21 at 8.00.05 PM

To execute the tool:

  1. Download the tool from the link on the right side of the screen.
  2. Unzip file
  3. Navigate into tunas360-master folder
  4. Connect as a DBA user and run script tunas360.sql

The tool will run for a few minutes and it will generate a zip file named tunas360_<dbname>_<host>

As every other tool on its first version I don’t expect it to be perfect so please please please let me know if you run into problems so that I can fix them 😀

Feedbacks and suggestions are welcome as usual too!!!



Leave a comment

Top Executions SQL Monitoring style reports in SQLd360

I think SQL Monitoring is an amazing tool when it comes to SQL Tuning but I often find that for one reason or another the report is almost never around for post-mortem investigation.
Historical SQL Monitoring reports have been introduced in 12c but still the decision to collect or no the report for the SQL ID we are interested in depends on several factors we have no control on after the issue happened 😦

SQLd360 tried to alleviate this “issue” including ASH-based charts that provided similar information, those have been available for a long time in the Plan Details page, organized by Plan Hash Value (PHV).
The main difference between SQL Monitoring (SM) and SQLd360 is the scope. SM provides info for a single execution while SQLd360 aggregated info from all the executions active at a specific point in time. Info for recent executions are (V$ACTIVE_SESSION_HISTORY) are aggregated by minute while historical executions (DBA_HIST_ACTIVE_SESS_HISTORY) get aggregated by hour.
That section of the SQLd360 looks like this:

Screen Shot 2016-01-14 at 9.46.54 AM.png


Starting SQLd360 v1601 a new set of reports is provided for the Top-N executions per PHV, where a “top execution” is one of those with the highest number of samples.
The goal is to replicate, as close as possible, the SQL Monitoring functionalities using just the ASH data, which tend to be around for much longer than a SQL Monitoring report 🙂

The data is not aggregated so the granularity is 1s for samples from memory (V$) and 10s for samples from ASH (DBA_HIST).
With this level of granularity combined with the non aggregation other types of data visualizations make sense, like i.e. timelines to identify when a specific section of the execution plan was active (the which section is the “bad guy” can be answered from the tree chart), that’s the “Plan Step IDs timeline” in the figure below that will need its own blog post 😀 .
This new section of the report looks like this:

Screen Shot 2016-01-14 at 9.58.46 AM.png

So i.e. for each execution we can see the active sessions (plural in case of PX and not “Average Active Session” since there is no timeframe aggregation) with associated CPU/wait events over time, just like in SQL Monitoring (to be fair SQL Monitoring is able to provide sub-second details, which are not available in ASH).

Screen Shot 2016-01-14 at 10.04.50 AM.png

Hopefully you’ll find this new section useful, specially when ASH is all you got 😀

Final note: the number Top-N executions is configurable in file sql/sqld360_00_config.sql altering the value for sqld360_conf_num_top_execs (default is 3).

As usual feedback, comments, suggestions are all more than welcome!



Execution plan tree temperature


During the Xmas holidays I made several additions to SQLd360 I had on my TODO list for quite a while, I’ll try to blog about the most important ones in the coming days.

Something I wanted to do for a long time was to make understanding execution plan easier, I hope the tree representation introduced here achieved such goal.

SQLd360 v1601 takes this chart a step further, marking nodes with different colors depending on how often such execution plan step shows up in ASH. Basically depending on “how hot” (the temperature) each step is a color between yellow and red is used to color the node, making it easier to determine in which section of the plan you should put your attention.
All those steps that never show up in ASH are represented in white.

So in example a silly SQL like the following takes 21 secs in my database

select count(*) 
  from (select rownum n1 from dual connect by rownum <= 10000),   
       (select rownum n1 from dual connect by rownum <= 100000)

| Id  | Operation                         | Name | E-Rows | Cost (%CPU)|
|   0 | SELECT STATEMENT                  |      |        |     4 (100)|
|   1 |  SORT AGGREGATE                   |      |      1 |            |
|   2 |   MERGE JOIN CARTESIAN            |      |      1 |     4   (0)|
|   3 |    VIEW                           |      |      1 |     2   (0)|
|   4 |     COUNT                         |      |        |            |
|   5 |      CONNECT BY WITHOUT FILTERING |      |        |            |
|   6 |       FAST DUAL                   |      |      1 |     2   (0)|
|   7 |    BUFFER SORT                    |      |      1 |     4   (0)|
|   8 |     VIEW                          |      |      1 |     2   (0)|
|   9 |      COUNT                        |      |        |            |
|  10 |       CONNECT BY WITHOUT FILTERING|      |        |            |
|  11 |        FAST DUAL                  |      |      1 |     2   (0)|

18 ASH samples captured the execution on the BUFFER SORT step and 3 samples captured the SORT AGGREGATE.

The execution plan tree temperature looks like this

Screen Shot 2016-01-06 at 5.56.28 PM

Hopefully this will make it easier for people who don’t look into execution plans all the day to quickly spot where they should focus their attention 😀

Leave a comment

OR-expansion and Parallel Execution

DISCLAIMER 😀 : probably because of my previous job but I REALLY don’t like to write blog posts about potential or confirmed bugs in the Oracle code. This time is no exception, the focus is on the curious plan and not the bugs around it.

I have a SQL with an OR chain that undergoes OR-expansion transformation, executed in parallel. In the real case where I noticed it both decision (OR-expansion and parallel) were made by the optimizer, I’ll force them both to keep the testcase very simple.

Here is the single table:

drop table t1;
create table t1 as select * from dba_objects;
exec dbms_stats.gather_table_stats(user,'T1');
create index t_idx on t1(status);

and my SQL is

select /*+ USE_CONCAT(@SEL$1) PARALLEL(4) */ owner, count(*) mycount
  from t1 a
 where status between 'A' and 'B' or status = 'VALID'
 group by owner;

The USE_CONCAT hint will force the OR-expansion, it works in this case even though it’s VERY hard to manually write it properly (specially after it was enhanced to be semantically correct, bug fix 8429273).

The execution plan is way more complex than I would expect

| Id  | Operation                        | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
|   0 | SELECT STATEMENT                 |          |    15 |   195 |    47   (7)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                  |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)            | :TQ20002 |    15 |   195 |            |          |  Q2,02 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                 |          |    15 |   195 |            |          |  Q2,02 | PCWP |            |
|   4 |     PX RECEIVE                   |          |    15 |   195 |            |          |  Q2,02 | PCWP |            |
|   5 |      PX SEND HASH                | :TQ20001 |    15 |   195 |            |          |  Q2,01 | P->P | HASH       |
|   6 |       HASH GROUP BY              |          |    15 |   195 |            |          |  Q2,01 | PCWP |            |
|   7 |        CONCATENATION             |          |       |       |            |          |  Q2,01 | PCWP |            |
|   8 |         PX BLOCK ITERATOR        |          |       |       |            |          |  Q2,01 | PCWC |            |
|*  9 |          TABLE ACCESS FULL       | T1       | 10385 |   131K|    23   (5)| 00:00:01 |  Q2,01 | PCWP |            |
|  10 |         BUFFER SORT              |          |       |       |            |          |  Q2,01 | PCWC |            |
|  11 |          PX RECEIVE              |          |  8072 |   102K|    23   (5)| 00:00:01 |  Q2,01 | PCWP |            |
|  12 |           PX SEND ROUND-ROBIN    | :TQ20000 |  8072 |   102K|    23   (5)| 00:00:01 |        | S->P | RND-ROBIN  |
|  13 |            BUFFER SORT           |          |    15 |   195 |            |          |        |      |            |
|  14 |             PX COORDINATOR       |          |       |       |            |          |        |      |            |
|  15 |              PX SEND QC (RANDOM) | :TQ10000 |  8072 |   102K|    23   (5)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|  16 |               BUFFER SORT        |          |    15 |   195 |            |          |  Q1,00 | PCWP |            |
|  17 |                PX BLOCK ITERATOR |          |  8072 |   102K|    23   (5)| 00:00:01 |  Q1,00 | PCWC |            |
|* 18 |                 TABLE ACCESS FULL| T1       |  8072 |   102K|    23   (5)| 00:00:01 |  Q1,00 | PCWP |            |

Predicate Information (identified by operation id):
   9 - filter("STATUS"<='B' AND "STATUS">='A')
  18 - filter("STATUS"='VALID' AND (LNNVL("STATUS"<='B') OR LNNVL("STATUS">='A')))

The most surprising part IMHO is the second DFO tree under step 14.
Basically the first branch of the CONCATENATION is executed under one DFO tree while the second under another DFO tree.
The main side effect of this choice is a likely inferior performance, mostly caused by:

  1. The slaves that execute step 17-18 will have to send the data to the QC at step 14 that in turn will distribute the data again to the other PX processes. Basically the execute will be P->S->P.
  2. Having two DFO trees means that each of them can potentially be downgraded, further slowing down the performance.

This plan seems to be the consequence of the fix for Wrong Result bug 9732434 (superseded by 10411726, superseded by 17006727) . Strange enough the fix for 9732434 has a fix_control to turn the fix off (WR fixes, as well as ORA-600/7445 don’t have reasons to be turned off usually), probably just some small parts rather than the whole fix.

Turning the fix_control for 9732434 OFF (OPT_PARAM(‘_fix_control’ ‘9732434:0’)) then the plan has a more familiar shape

| Id  | Operation                 | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
|   0 | SELECT STATEMENT          |          |    15 |   195 |    47   (7)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR           |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)     | :TQ10001 |    15 |   195 |            |          |  Q1,01 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY          |          |    15 |   195 |            |          |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE            |          |    15 |   195 |            |          |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH         | :TQ10000 |    15 |   195 |            |          |  Q1,00 | P->P | HASH       |
|   6 |       HASH GROUP BY       |          |    15 |   195 |            |          |  Q1,00 | PCWP |            |
|   7 |        CONCATENATION      |          |       |       |            |          |  Q1,00 | PCWP |            |
|   8 |         PX BLOCK ITERATOR |          |       |       |            |          |  Q1,00 | PCWC |            |
|*  9 |          TABLE ACCESS FULL| T1       | 10385 |   131K|    23   (5)| 00:00:01 |  Q1,00 | PCWP |            |
|  10 |         PX BLOCK ITERATOR |          |  8072 |   102K|    23   (5)| 00:00:01 |  Q1,00 | PCWC |            |
|* 11 |          TABLE ACCESS FULL| T1       |  8072 |   102K|    23   (5)| 00:00:01 |  Q1,00 | PCWP |            |

Predicate Information (identified by operation id):
   9 - filter("STATUS"<='B' AND "STATUS">='A')
  11 - filter("STATUS"='VALID' AND (LNNVL("STATUS"<='B') OR LNNVL("STATUS">='A'))) 

So if you have some SQLs that use OR-expansion and run parallel you may want to consider taking a closer look at them.

Leave a comment

Partial fetch from SQL*Plus

This post is more a reminder to myself on how to perform “partial” FETCH in SQL*Plus with no need to use PL/SQL. I’ll need this post in the future for sure since I seem to forget it over time.

By partial I mean FETCH N rows and then wait, leaving the cursor open. This is similar to PL/SQL when the LIMIT parameter is used in a BULK COLLECT or to what the Siebel Connector does when fetching row to fill up a page on the screen. This is also the behavior Oracle expects your application to have when you set FIRST_ROWS_n optimization mode (and this is the whole point of simulating something like this 🙂

Assuming you want to FETCH 10 rows at a time, in SQL*Plus

SQL> set arraysize 10 pagesize 13 pau on

In details:

  • arraysize determines how many rows to fetch at a time
  • pagesize determines the number of lines in each page
  • pause instructs SQL*Plus to stop and wait for you to press ENTER for the next

The trick is to have each row from the rowset account for 1 row in the page otherwise such row will consume more “space” out of the pagination and FETCH will bring more rows than actually displayed (which is not a big deal but over time you will get a page with no need to execute any FETCH). The value 13 in this case accounts for 10 rows from the FETCH plus 1 row for the columns names, one for the line below the column name and one for the empty line above.

SQL> select object_id from dba_objects;




and from the SQL trace

PARSING IN CURSOR #140058741813400 len=33 dep=0 uid=94 oct=3 lid=94 tim=501040793433 hv=1491819215 ad='83d99250' sqlid='ch9x9ppcfqqqg'
select object_id from dba_objects
FETCH #140058741813400:c=1036,e=1367,p=0,cr=9,cu=0,mis=0,r=1,dep=0,og=1,plh=2969894235,tim=501040794966
FETCH #140058741813400:c=0,e=72,p=0,cr=7,cu=0,mis=0,r=10,dep=0,og=1,plh=2969894235,tim=501040795575
FETCH #140058741813400:c=0,e=64,p=0,cr=1,cu=0,mis=0,r=10,dep=0,og=1,plh=2969894235,tim=501041831892
FETCH #140058741813400:c=0,e=68,p=0,cr=1,cu=0,mis=0,r=10,dep=0,og=1,plh=2969894235,tim=501045599796

Little disclaimer: because SQL*Plus fetches 1 row only on the first FETCH call we get 1 row “leftover” from one call that we carry to the next. Basically the first page needs 2 calls to fill up the screen, while from the second on we fetch 10 rows but fill the screen with the 1 from the previous call plus 9 from the current call. This is why there are 4 FETCH calls to fill up 3 pages.

1 Comment

Histograms come histograms go

Actually the title is likely incorrect, in the sense that I think histograms don’t “go” once they come. Or at least I’ve never seen a case where consistently gathering statistics using method_opt “SIZE AUTO” makes a histogram go away sometime after the same syntax made it show up (while it’s common for a histogram to just show up sometime in the future because of column usage).

Recently I’ve seen several systems where custom statistics gathering procedures were coupled with the automatic stats gathering job (voluntarily or not is another story… 😀 ) triggering all sort of weirdness in the statistics. Reason was a syntax mismatch between the custom job (method_opt “SIZE 1”) and the automatic job (default is method_opt “SIZE AUTO”).

A reliable way to figure out if histograms popped up / disappeared is to look at WRI$_OPTSTAT_HISTGRM_HISTORY, too bad this table is usually large and expensive to access (and not partitioned).
Another table is available though, table WRI$_OPTSTAT_HISTHEAD_HISTORY, which is the historical version of the dictionary table HIST_HEAD$ (1 row per histogram) but while in the dictionary there are BUCKET_CNT and ROW_CNT columns (so it’s easy to spot a histogram) such columna are gone once the info are stored in WRI$. Wouldn’t it be nicer to just have a column in WRI$_OPTSTAT_HISTHEAD_HISTORY saying “histogram was in place” ?

Let’s see if we can figure out a (approximate) way that can save us a trip to WRI$_OPTSTAT_HISTGRM_HISTORY.

Table WRI$_OPTSTAT_HISTHEAD_HISTORY has a column FLAGS that is populated using the following expression in (extracted from a 10046 trace of DBMS_STATS.GATHER_TABLE_STATS)

bitand(h.spare2,7) + 8 + decode(h.cache_cnt,0,0,64)

while it’s tricky and risky to guess what SPARE2 might store it should be pretty easy to play with CACHE_CNT, we just need to gather stats with and without histograms to see how the value changes.

SQL> create table test_flag (n1 number);
SQL> insert into test_flag select mod(rownum,200) n1 from dual connect by rownum <= 10000; SQL> insert into test_flag select * from test_flag;
--repeat it a few times to make the data grow
SQL> commit;
SQL> exec dbms_stats.gather_table_stats('MPAGANO','TEST_FLAG',method_opt=>'FOR ALL COLUMNS SIZE 254');
SQL> select obj#, col#, bucket_cnt, row_cnt, cache_cnt, null_cnt, sample_size from sys.hist_head$ where obj# in (select object_id from dba_objects where object_name = 'TEST_FLAG');

---------- ---------- ---------- ---------- ---------- ---------- -----------
     24897          1       5487        200         10          0        5487

SQL> exec dbms_stats.gather_table_stats('MPAGANO','TEST_FLAG',method_opt=>'FOR ALL COLUMNS SIZE 1');
SQL> select obj#, col#, bucket_cnt, row_cnt, cache_cnt, null_cnt, sample_size from sys.hist_head$ where obj# in (select object_id from dba_objects where object_name = 'TEST_FLAG');

---------- ---------- ---------- ---------- ---------- ---------- -----------
     24897          1          1          0          0          0     2560000

In the first run we gathered stats with “SIZE AUTO” and a 200 buckets histogram was created, sample size ~=5500 is not surprising and it matches with AUTO_SAMPLE_SIZE used. In this case CACHE_CNT was 10.

In the second run we gathered stats with “SIZE 1” and no histogram was created. In this case CACHE_CNT was 0.

I couldn’t find with 100% accuracy what CACHE_CNT represents (actually if any reader knows it please comment this post) but my educated guess is that’s number of entries to be cached in the row cache; unfortunately this is really just a guess since I didn’t even find a parameter to change this value (again, whoever knows better please let me know).

I ran multiple tests and it seems like CACHE_CNT has a value larger than 0 only when a histogram is present (assuming my guess of what is means is accurate), thus the value of WRI$_OPTSTAT_HISTHEAD_HISTORY.FLAG is bumped by 64 only when a histogram is present.
With this info in mind it’s now pretty easy to spot if a histogram was present in the past just looking at  WRI$_OPTSTAT_HISTHEAD_HISTORY.FLAG without having to access the larger WRI$_OPTSTAT_HISTGRM_HISTORY.

SQLd360 v1526 has a new column HAD_HISTOGRAM added to the Column Statistics Version report built exactly on the assumptions made here.

Feedbacks welcome as usual, especially if they can prove wrong what I just said 😀


Introducing Pathfinder, is there a better plan for my SQL?

Pathfinder is a new free tool that provides an easy way to execute a SQL statement under multiple optimizer environments in order to generate different execution plans, potentially discovering better plans. The tool can also be used to quickly identify workarounds for wrong result bugs as well as slow parse issues.

Pathfinder uses the same brute-force approach of SQLT XPLORE, executing the SQL for every single CBO parameter and fix_control present in the database, with no installation required. This make Pathfinder easy to run in any environment, including a production one (assuming you understand *WELL* what the tool does, how it works and what it means for your database).

Each test adds approximately 1 second overhead to the time the SQL takes to complete and the amount of tests considered is pretty high, in it’s around 1100 and around 1500 in, thus I suggest to use Pathfinder on SQLs that take at most a few seconds to run (or just be ready to leave Pathfinder run for a loooong time).

The tool executes the SQL statement present in file script.sql (provided), just modify the script and replace the seeded SQL with the one you want to execute. In the same script you can also add ALTER SESSION commands that will be executed before the desired SQL, this is helpful in case you want to influence the analysis providing a different starting point.

To execute the tool just download it from the Download section on the right side of this page (or from here, also the tool will be released as standalone script in the same zip file as SQLd360) and follow these steps:

  1. Unzip, navigate to the root pathfinder directory, and connect as SYS to the database.
    $ unzip
    $ cd pathfinder-master
    $ sqlplus / as sysdba
  2. Open file script.sql and add your SQL in there. Make sure to add the mandatory comment /* ^^pathfinder_testid */. The file name must be script.sql, if you wish to change the name then just ping me.
  3. Execute pathfinder.sql and provide the connect string to connect as the user that is supposed to run the script.
  4. Unzip output file pathfinder_<dbname>_<date>.zip into a directory on your PC and review the results starting from file 00001_pathfinder_<dbname>_<date>_index.html

SQL> @pathfinder

Parameter 1:
Full connect string of the database to run the SQL into
If the database is remote or a PDB then you must include
the TNS alias i.e. scott/tiger@orcl

Enter value for 1: mpagano/mpagano@orcl
Building Pathfinder driver scripts

1) "pathfinder_{ 20151026_1906 (00001)" 19:06:40 BASELINE

2) "pathfinder_{ 20151026_1906 (00002)" 19:06:42 "_add_stale_mv_to_dependency_list" = FALSE

File created.

For each test Pathfinder will show the setting considered as well as some basic statistics like Plan Hash Value, Elapsed Time, CPU Time, Buffer Gets and Rows processed. Also two links are present, one points to the details of the execution plan generated while the other points to the details of V$SQL.

The main page will look something like this:

Screen Shot 2015-10-26 at 9.18.44 PM

Pathfinder also considers the effect of Cardinality Feedback executing the SQL multiple times until the plan stops changing (or CFB gives up after the fifth parse), for all those settings that lead to a first execution plan different than the baseline.
This is why for some settings you will see a Test# with an additional digit, the “reparse” number:

Screen Shot 2015-10-26 at 9.22.23 PM

In example for Test# 117 above the CBO generated a different plan (PHV 1837274416) than the baseline (PHV 1838229974) and Cardinality Feedback kicked in 3 times generating a different plan each time, until the 3rd parse when the CBO ended up with the same plan as the first execution (and no different plan was generated after).

This is the first release of the tool so I expect it to be far from perfect but I’ve already used it several times with success. Hopefully with time (and your feedbacks :D) the tool will get better and better.

I hope you enjoy it and please don’t hesitate to get in touch with me for feedbacks, suggestions and bugs!!!