Mauro Pagano's Blog

Missing a Bloom filter…

3 Comments

The other day I stumbled into an execution plan that made me pause for a while so hopefully if you see something similar tomorrow you will save some time 🙂

The plan was about 100 steps long but a specific join is where I want to focus on so stripping all the “noise” I reproduce it in house with the following steps and got to a plan that looked like this .

create table mydim (filter1 number, col1 number);
create table myfact (col1 number, col2 number)
partition by range(col1)
subpartition by hash(col2)
subpartitions 16
(partition p1 values less than (5),
 partition p2 values less than (10),
 partition p3 values less than (15),
 partition p4 values less than (20),
 partition p5 values less than (25));
insert into mydim select mod(rownum,10), rownum from dual connect by rownum <= 100;
insert into myfact select mod(rownum,25), rownum from dual connect by rownum <= 100000;
commit;
exec dbms_Stats.gather_table_stats(user,'MYDIM');
exec dbms_Stats.gather_table_stats(user,'MYFACT');
alter table myfact parallel 2;

select count(*) 
  from myfact, mydim 
 where mydim.filter1 = 3 
   and mydim.col1 = myfact.col2;

--------------------------------------------------------------------------------------
| Id |Operation                          |Name    |Rows  |Cost (%CPU)| Pstart| Pstop |
--------------------------------------------------------------------------------------
|   0|SELECT STATEMENT                   |        |    1 |  558   (1)|       |       |
|   1| SORT AGGREGATE                    |        |    1 |           |       |       |
|   2|  PX COORDINATOR                   |        |      |           |       |       |
|   3|   PX SEND QC (RANDOM)             |:TQ10001|    1 |           |       |       |
|   4|    SORT AGGREGATE                 |        |    1 |           |       |       |
|*  5|     HASH JOIN                     |        |   10 |  558   (1)|       |       |
|   6|      BUFFER SORT                  |        |      |           |       |       |
|   7|       PART JOIN FILTER CREATE     |:BF0000 |   10 |    3   (0)|       |       |
|   8|        PX RECEIVE                 |        |   10 |    3   (0)|       |       |
|   9|         PX SEND BROADCAST         |:TQ10000|   10 |    3   (0)|       |       |
|* 10|          TABLE ACCESS FULL        |MYDIM   |   10 |    3   (0)|       |       |
|  11|      PX PARTITION HASH JOIN-FILTER|        |  100K|  555   (1)|:BF0000|:BF0000|
|* 12|       TABLE ACCESS FULL           |MYFACT  |  100K|  555   (1)|     1 |    80 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("MYDIM"."COL1"="MYFACT"."COL2")
  10 - filter("MYDIM"."FILTER1"=3)
  12 - filter(SYS_OP_BLOOM_FILTER(:BF0001,"MYFACT"."COL2"))

So it’s a two table join where a Bloom filter (BF0000) is created based on the data extracted from MYDIM and it’s later used to prune on MYFACT, the strange part is the FTS at step 12 uses another Bloom Filter (BF0001) that is nowhere to be found in the plan.

MYFACT is RANGE partitioned on COL1 and HASH subpartitioned on COL2 and BF0000 is used to determine which subpartitions to scan. Being the table composite partitioned combined with BF pruning the risk of getting into something like this was high, but in my case I did have an evidence of a BF applied in Pstart/Pstop, I just didn’t have any evidence of where the other BF came from.
One of the things I usually do when I can’t figure out something is prove those things that look legit first to determine what can be trusted and what not (it helps a lot when working wrong results).
Is it just a display glitch in the filter section and BF0001 should be BF0000? or is it a display glitch in the plan that is missing some steps? Is BF0000 even used to prune? Is BF0001 really used to skip rows? etc etc.

First step was to prove BF0000 was indeed used to prune. Event 10128 tracks down pruning info during execution so it’s a great way to confirm things if you can run the SQL, details here.
From one of the slaves trace (call time RUN is the only one in the slaves trace, the others are in the QC one):

Partition Iterator Information:
  partition level = ABSOLUTE
  call time = RUN
  order = ASCENDING
  Partition iterator for level 1:
   iterator = RANGE [0, 4]
   index = 0
partition iterator for level 2:
   iterator = ARRAY [count= 8, max = 16] = { 2 3 4 5 6 13 14 15 }
   index = 2

Level 1/2 is the partition/subpartition level so from the trace above we accessed all the partitions (no surprise here because there is no filter on col1) but accessed only 8 subpartitions out of 16 so looks like BF0000 was indeed used to skip partitions. When you use this trace just remember partition “IDs” are counted from 0 while partition_position in the dictionary starts from 1 so the numbers are all shifted by 1 (ie. the subpartition 2 above is indeed the 3rd one).

Next step was to verify if BF0001 is really used to filter out rows on the FTS or not so I added a comment on the SQL to make the SQL ID different and ran it again so that I could use format=>’ALLSTATS ALL’ to get aggregated stats from all the slaves+QC.

select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS ALL'));
select /* B */ count(*) from myfact, mydim where mydim.filter1 = 3 and mydim.col1 = myfact.col2

-----------------------------------------------------------------------------------------
| Id |Operation                          |Name    |Starts|E-Rows |Pstart |Pstop  |A-Rows|
-----------------------------------------------------------------------------------------
|   0|SELECT STATEMENT                   |        |     1|       |       |       |     1|
|   1| SORT AGGREGATE                    |        |     1|     1 |       |       |     1|
|   2|  PX COORDINATOR                   |        |     1|       |       |       |     2|
|   3|   PX SEND QC (RANDOM)             |:TQ10001|     0|     1 |       |       |     0|
|   4|    SORT AGGREGATE                 |        |     2|     1 |       |       |     2|
|*  5|     HASH JOIN                     |        |     2|    10 |       |       |    10|
|   6|      BUFFER SORT                  |        |     2|       |       |       |    20|
|   7|       PART JOIN FILTER CREATE     |:BF0000 |     2|    10 |       |       |    20|
|   8|        PX RECEIVE                 |        |     2|    10 |       |       |    20|
|   9|         PX SEND BROADCAST         |:TQ10000|     0|    10 |       |       |     0|
|* 10|          TABLE ACCESS FULL        |MYDIM   |     1|    10 |       |       |    10|
|  11|      PX PARTITION HASH JOIN-FILTER|        |     2|   100K|:BF0000|:BF0000|    10|
|* 12|       TABLE ACCESS FULL           |MYFACT  |    80|   100K|     1 |    80 |    10|
-----------------------------------------------------------------------------------------

Each partition has 20k rows and each subpart around 1250 so in absence of BF applied the FTS would return 50k rows (1250 rows per subpart * 8 subpart * 5 part) but the A-Rows shows the number of rows was 10 (this time, during other runs sometime little higher) so it really looks like BF0001 existed and was applied.

So BFs are real and are used, time to doubt the exec plan.
The easiest way to do it was to test in a future version, the plan above came from 11.2.0.4 so it was natural to test 12.1.0.2 and surprise surprise, the plan made much more sense over there:

---------------------------------------------------------------------------------------
| Id |Operation                           |Name    |Rows  |Cost (%CPU)| Pstart| Pstop |
---------------------------------------------------------------------------------------
|   0|SELECT STATEMENT                    |        |    1 |12114   (1)|       |       |
|   1| SORT AGGREGATE                     |        |    1 |           |       |       |
|   2|  PX COORDINATOR                    |        |      |           |       |       |
|   3|   PX SEND QC (RANDOM)              |:TQ10001|    1 |           |       |       |
|   4|    SORT AGGREGATE                  |        |    1 |           |       |       |
|*  5|     HASH JOIN                      |        |   10 |12114   (1)|       |       |
|   6|      BUFFER SORT                   |        |      |           |       |       |
|   7|       JOIN FILTER CREATE           |:BF0001 |   10 |    3   (0)|       |       |
|   8|        PART JOIN FILTER CREATE     |:BF0000 |   10 |    3   (0)|       |       |
|   9|         PX RECEIVE                 |        |   10 |    3   (0)|       |       |
|  10|          PX SEND BROADCAST         |:TQ10000|   10 |    3   (0)|       |       |
|* 11|           TABLE ACCESS FULL        |MYDIM   |   10 |    3   (0)|       |       |
|  12|      JOIN FILTER USE               |:BF0001 |  100K|12111   (1)|       |       |
|  13|       PX PARTITION HASH JOIN-FILTER|        |  100K|12111   (1)|:BF0000|:BF0000|
|* 14|        TABLE ACCESS FULL           |MYFACT  |  100K|12111   (1)|     1 |    80 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("MYDIM"."COL1"="MYFACT"."COL2")
  11 - filter("MYDIM"."FILTER1"=3)
  14 - filter(SYS_OP_BLOOM_FILTER(:BF0001,"MYFACT"."COL2"))

and setting OFE=11.2.0.4 then plan switched back to the “strange” one so it was time for SQLT XPLORE starting from 12.1.0.2 and testing all the settings (parameter+fix_control) looking for one that would cause the plan “to go back to strange”. Outcome was fix for (internal) bug 12569316, unfortunately there is no backport on top of 11.2.0.4 so I couldn’t test if the fix itself is enough or it needs some other code change to display the missing step.

At least I’m now positive such step should indeed be present in the plan and in the future (12c) it’s going to be there 😀

3 thoughts on “Missing a Bloom filter…

  1. Thanks Mauro. Excellent example of combined normal and pruning Bloom filters.

    I do have an unrelated question about the execution plan: why is there a need to perform a ‘BUFFER SORT’ at line 6?

    Like

    • That should be because MYDIM is defined as DEGREE = 1 so the table is scanned once and the rows buffered to avoid reading it twice (or N times).
      If I alter MYDIM to a higher degree then the BUFFER SORT goes away and a PX BLOCK ITERATOR pops up right on top of the FTS of MYDIM.

      This is handled differently in 12c where a PX SELECTOR is used to scan MYDIM so the BUFFER SORT goes away “transparently”.
      The plan above marked as 12c is 12.1.0.2 with OFE=11.2.0.4 but _fix_control=’12569316:1′, the out of the box one is (hopefully it will look nicely formatted)

      ---------------------------------------------------------------------------------------------
      | Id  | Operation                           | Name     | Rows  | Cost (%CPU)| Pstart| Pstop |
      ---------------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT                    |          |     1 | 12114   (1)|       |       |
      |   1 |  SORT AGGREGATE                     |          |     1 |            |       |       |
      |   2 |   PX COORDINATOR                    |          |       |            |       |       |
      |   3 |    PX SEND QC (RANDOM)              | :TQ10001 |     1 |            |       |       |
      |   4 |     SORT AGGREGATE                  |          |     1 |            |       |       |
      |*  5 |      HASH JOIN                      |          |   101 | 12114   (1)|       |       |
      |   6 |       JOIN FILTER CREATE            | :BF0001  |    10 |     3   (0)|       |       |
      |   7 |        PART JOIN FILTER CREATE      | :BF0000  |    10 |     3   (0)|       |       |
      |   8 |         PX RECEIVE                  |          |    10 |     3   (0)|       |       |
      |   9 |          PX SEND BROADCAST          | :TQ10000 |    10 |     3   (0)|       |       |
      |  10 |           PX SELECTOR               |          |       |            |       |       |
      |* 11 |            TABLE ACCESS FULL        | MYDIM    |    10 |     3   (0)|       |       |
      |  12 |       JOIN FILTER USE               | :BF0001  |   100K| 12111   (1)|       |       |
      |  13 |        PX PARTITION HASH JOIN-FILTER|          |   100K| 12111   (1)|:BF0000|:BF0000|
      |* 14 |         TABLE ACCESS FULL           | MYFACT   |   100K| 12111   (1)|     1 |    80 |
      ---------------------------------------------------------------------------------------------
      

      Like

  2. The Event 10128 trace in the post is from a slave generated at “RUN” time. The COMPILE is generated in the QC.
    It is at RUN time because the bloom filter needs to be created by reading the source table to decide then which partitions to scan on the destination table.

    Liked by 1 person

Leave a comment