
Exadata Smart Scan Projection Limitation

Here is an interesting limitation of Exadata Smart Scans: if more than 254 columns from a table (not HCC compressed, more on that in a moment) need to be projected, Smart Scans for that particular segment will be disabled and Exadata will fall back to conventional I/O.

Why is my Exadata smart scan not offloading?

After receiving the excellent “Expert Oracle Exadata” book, I decided to spend some time looking into Exadata performance, having previously spent most of my time on infrastructure-related questions such as “how can I prevent someone from overwriting my DR system with UAT data”, patching, etc.

Now there is one thing to keep in mind with Exadata: you need lots of data before it breaks into even a light sweat. Luckily for me, one of my colleagues had performed some testing on this environment. For reasons unknown, the Swingbench Order Entry benchmark (version 2.3) had been chosen. For those who don’t know the OE benchmark: it’s a heavy OLTP-style workload simulating a web shop where users browse products, place orders, etc. OE is optimised for single block I/O, and despite what you may have heard, Exadata doesn’t provide a noticeable benefit for these queries.

Anyway, what I liked was the fact that the order_items table had about 350 million rows organised in about 8GB. From discussions with Frits Hoogland I know that a full scan of such a table takes between 40 and 60 seconds depending on system load.

SOE.ORDER_ITEMS

Here are some of the interesting facts around the table:

SQL> select partition_name,num_rows,blocks,avg_space,chain_cnt,global_stats,user_stats,stale_stats
2  from dba_tab_statistics
3  where table_name = 'ORDER_ITEMS'
4  /

PARTITION_NAME                   NUM_ROWS     BLOCKS  AVG_SPACE  CHAIN_CNT GLO USE STA
------------------------------ ---------- ---------- ---------- ---------- --- --- ---
                                349990815    1144832          0          0 YES NO  NO
SYS_P341                         21866814      71297          0          0 YES NO  NO
SYS_P342                         21889112      72317          0          0 YES NO  NO
SYS_P343                         21877726      71297          0          0 YES NO  NO
SYS_P344                         21866053      71297          0          0 YES NO  NO
SYS_P345                         21870127      71297          0          0 YES NO  NO
SYS_P346                         21887971      72317          0          0 YES NO  NO
SYS_P347                         21875056      71297          0          0 YES NO  NO
SYS_P348                         21891454      72317          0          0 YES NO  NO
SYS_P349                         21883576      72317          0          0 YES NO  NO
SYS_P350                         21859704      71297          0          0 YES NO  NO
SYS_P351                         21866820      71297          0          0 YES NO  NO
SYS_P352                         21865681      71297          0          0 YES NO  NO
SYS_P353                         21865239      71297          0          0 YES NO  NO
SYS_P354                         21870373      71297          0          0 YES NO  NO
SYS_P355                         21882656      71297          0          0 YES NO  NO
SYS_P356                         21872453      71297          0          0 YES NO  NO

17 rows selected.

It has four indexes, two of which are reverse key indexes:

SQL> select index_name,index_type from user_indexes where table_name = 'ORDER_ITEMS';

INDEX_NAME                     INDEX_TYPE
------------------------------ ---------------------------
ITEM_ORDER_IX                  NORMAL/REV
ORDER_ITEMS_UK                 NORMAL
ORDER_ITEMS_PK                 NORMAL
ITEM_PRODUCT_IX                NORMAL/REV

SQL>

This is going to be interesting (I have to admit I didn’t initially check for reverse key indexes). All I wanted was a huge table to see if a smart scan really is so mind-blowingly fast. These examples can easily be reproduced by generating the SOE schema with the oewizard; just make sure you select the use of partitioning (you should have a license for it).

Performance Testing

My plan was to start off with serial execution, then use parallel query and compare execution times. As with all performance tuning of this kind, you should have a copy of Session Snapper from Tanel Poder available. At the time of this writing the latest version was 3.52, available from Tanel’s blog.

I also wanted to see when a smart scan kicked in. Here’s the first test with serial execution:

10:44:37 SQL> select count(*) from order_items
10:44:40   2  /

COUNT(*)
----------
350749016

Elapsed: 00:00:47.54

OK, that doesn’t look like a smart scan has happened: 47 seconds is a little too slow. As always, check using snapper to confirm:

SQL> @snapper all 5 1 243
Sampling SID 243 with interval 5 seconds, taking 1 snapshots...
setting stats to all due option = all

-- Session Snapper v3.52 by Tanel Poder @ E2SN ( http://tech.e2sn.com )

-------------------------------------------------------------------------------------------------------------------------------------
SID, USERNAME  , TYPE, STATISTIC                                                 ,     HDELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
243, SOE       , STAT, session logical reads                                     ,     81.07k,     16.21k,
243, SOE       , STAT, user I/O wait time                                        ,        299,       59.8,
243, SOE       , STAT, non-idle wait time                                        ,        300,         60,
243, SOE       , STAT, non-idle wait count                                       ,      1.26k,        251,
243, SOE       , STAT, physical read total IO requests                           ,        634,      126.8,
243, SOE       , STAT, physical read total multi block requests                  ,        634,      126.8,
243, SOE       , STAT, physical read total bytes                                 ,    664.14M,    132.83M,
243, SOE       , STAT, cell physical IO interconnect bytes                       ,    664.14M,    132.83M,
243, SOE       , STAT, consistent gets                                           ,     81.07k,     16.21k,
243, SOE       , STAT, consistent gets from cache                                ,          1,         .2,
243, SOE       , STAT, consistent gets from cache (fastpath)                     ,          1,         .2,
243, SOE       , STAT, consistent gets direct                                    ,     81.07k,     16.21k,
243, SOE       , STAT, physical reads                                            ,     81.07k,     16.21k,
243, SOE       , STAT, physical reads direct                                     ,     81.07k,     16.21k,
243, SOE       , STAT, physical read IO requests                                 ,        634,      126.8,
243, SOE       , STAT, physical read bytes                                       ,    664.14M,    132.83M,
243, SOE       , STAT, calls to kcmgcs                                           ,          1,         .2,
243, SOE       , STAT, file io wait time                                         ,        395,         79,
243, SOE       , STAT, Number of read IOs issued                                 ,        635,        127,
243, SOE       , STAT, no work - consistent read gets                            ,      81.1k,     16.22k,
243, SOE       , TIME, DB CPU                                                    ,      1.19s,   237.96ms,    23.8%, |@@@       |
243, SOE       , TIME, sql execute elapsed time                                  ,      4.01s,   801.37ms,    80.1%, |@@@@@@@@  |
243, SOE       , TIME, DB time                                                   ,      4.01s,   801.37ms,    80.1%, |@@@@@@@@  |
243, SOE       , WAIT, direct path read                                          ,      2.99s,   598.63ms,    59.9%, |@@@@@@    |
243, SOE       , WAIT, kfk: async disk IO                                        ,     4.25ms,    849.4us,      .1%, |          |
--  End of Stats snap 1, end=2011-08-17 10:30:47, seconds=5

-----------------------------------------------------------------------
Active% | SQL_ID          | EVENT                     | WAIT_CLASS
-----------------------------------------------------------------------
62% | b0hcgjs21yrq9   | direct path read          | User I/O
38% | b0hcgjs21yrq9   | ON CPU                    | ON CPU

--  End of ASH snap 1, end=2011-08-17 10:30:47, seconds=5, samples_taken=42

PL/SQL procedure successfully completed.

Right, no smart scan; this puzzled me. To recap, smart scans happen only if:

  • Direct path reads are used
  • The row source can be offloaded
  • The parameter cell_offload_processing is set to true (I think it’s “always” in 11.2)
  • There are no chained or migrated rows

Now let’s check these.
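
Two of these are easy to verify from SQL*Plus before digging deeper; a minimal sketch (chained rows would show up in the CHAIN_CNT column queried earlier, which is 0 for every partition):

show parameter cell_offload_processing

select name, value
  from v$parameter
 where name = 'cell_offload_processing';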

I can clearly see from the snapper output that direct path reads have happened: check. It’s also worth remembering that the decision to perform direct path reads is made on a per-segment basis, and each partition is its own segment: keep that in mind!

You can check the execution plan to find out if the row source can be offloaded as in this example:

SQL> select * from table(dbms_xplan.display(format=>'+projection'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2209137760

-----------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |     1 |   249K  (1)| 00:49:52 |
|   1 |  SORT AGGREGATE               |                 |     1 |            |          |
|   2 |   INDEX STORAGE FAST FULL SCAN| ITEM_PRODUCT_IX |   349M|   249K  (1)| 00:49:52 |
-----------------------------------------------------------------------------------------

Column Projection Information (identified by operation id):

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------

1 - (#keys=0) COUNT(*)[22]

14 rows selected.

You can see that the INDEX STORAGE FAST FULL SCAN operation is used; in other words, the row source can be offloaded. Strange that it didn’t happen, though. The parameter cell_offload_processing was set to true on my system. Do you remember that the index is a reverse key index? I’m wondering if that has anything to do with it.

To rule out a problem with the direct path reads, I set “_serial_direct_read”=true and tried again, but it didn’t make a difference.
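
For reference, the statement looks like this (a sketch; underscore parameters should only be changed on the advice of Oracle Support):

alter session set "_serial_direct_read" = true;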

Another way to check for smart scans on a live system is to use V$SQL: the columns IO_CELL_UNCOMPRESSED_BYTES and IO_CELL_OFFLOAD_RETURNED_BYTES are cumulative counters for smart scan activity. If they are 0, as in my case, they indicate some sort of issue.
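
A minimal sketch of such a check (V$SQL also exposes IO_CELL_OFFLOAD_ELIGIBLE_BYTES, which shows how many bytes were eligible for offload in the first place):

select sql_id,
       io_cell_offload_eligible_bytes,
       io_cell_uncompressed_bytes,
       io_cell_offload_returned_bytes
  from v$sql
 where sql_id = '&sql_id';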

I continued with parallel query: 8, then 64 slaves, and still no smart scan. I even traced the execution, but there was not a single pxxx trace file containing the word “smart” (and I made sure I captured the waits).

What was going on?

SQL_ID  4vkkk105tny60, child number 0
-------------------------------------
select /*+parallel(64)*/ count(*) from order_items

Plan hash value: 544438321

--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name            | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                 |       |  4306 (100)|          |        |      |            |
|   1 |  SORT AGGREGATE                   |                 |     1 |            |          |        |      |            |
|   2 |   PX COORDINATOR                  |                 |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)            | :TQ10000        |     1 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE                |                 |     1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR            |                 |   349M|  4306   (1)| 00:00:52 |  Q1,00 | PCWC |            |
|*  6 |       INDEX STORAGE FAST FULL SCAN| ITEM_PRODUCT_IX |   349M|  4306   (1)| 00:00:52 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

1 - SEL$1
6 - SEL$1 / ORDER_ITEMS@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------

6 - storage(:Z>=:Z AND :Z<=:Z)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

1 - (#keys=0) COUNT()[22]
2 - SYS_OP_MSR()[10]
3 - (#keys=0) SYS_OP_MSR()[10]
4 - (#keys=0) SYS_OP_MSR()[10]

Note
-----
- automatic DOP: Computed Degree of Parallelism is 64

OK, then I got fed up with that ITEM_PRODUCT_IX, forced a full table scan, and voilà: a smart scan happened.

SQL> select * from table(dbms_xplan.display_cursor('5a1x1v72ujf8s', format=>'ALL'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  5a1x1v72ujf8s, child number 0
-------------------------------------
select /*+ full(t) parallel(t, 8) */ count(*) from order_items t

Plan hash value: 661298821

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name        | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |             |       | 43358 (100)|          |       |       |        |      |            |
|   1 |  SORT AGGREGATE                |             |     1 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR               |             |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)         | :TQ10000    |     1 |            |          |       |       |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE             |             |     1 |            |          |       |       |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR         |             |   349M| 43358   (1)| 00:08:41 |     1 |    16 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS STORAGE FULL| ORDER_ITEMS |   349M| 43358   (1)| 00:08:41 |     1 |    16 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

1 - SEL$1
6 - SEL$1 / T@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------

6 - storage(:Z>=:Z AND :Z<=:Z)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

1 - (#keys=0) COUNT()[22]
2 - SYS_OP_MSR()[10]
3 - (#keys=0) SYS_OP_MSR()[10]
4 - (#keys=0) SYS_OP_MSR()[10]

37 rows selected.

To be sure that was correct, I retried with serial execution (check the time):

11:44:10 SQL> select /*+ full(t) single */ count(*) from order_items t;

COUNT(*)
----------
350749016

Elapsed: 00:00:12.02

@snapper all 5 1 243

Sampling SID 243 with interval 5 seconds, taking 1 snapshots...
setting stats to all due option = all

-- Session Snapper v3.52 by Tanel Poder @ E2SN ( http://tech.e2sn.com )

-------------------------------------------------------------------------------------------------------------------------------------
SID, USERNAME  , TYPE, STATISTIC                                                 ,     HDELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
243, SOE       , STAT, session logical reads                                     ,    482.88k,     96.58k,
243, SOE       , STAT, application wait time                                     ,          1,         .2,
243, SOE       , STAT, user I/O wait time                                        ,        158,       31.6,
243, SOE       , STAT, non-idle wait time                                        ,        159,       31.8,
243, SOE       , STAT, non-idle wait count                                       ,      3.72k,      744.8,
243, SOE       , STAT, enqueue waits                                             ,         15,          3,
243, SOE       , STAT, enqueue requests                                          ,         14,        2.8,
243, SOE       , STAT, enqueue conversions                                       ,         21,        4.2,
243, SOE       , STAT, enqueue releases                                          ,         14,        2.8,
243, SOE       , STAT, global enqueue gets sync                                  ,         35,          7,
243, SOE       , STAT, global enqueue releases                                   ,         14,        2.8,
243, SOE       , STAT, physical read total IO requests                           ,      5.23k,      1.05k,
243, SOE       , STAT, physical read total multi block requests                  ,      4.91k,        982,
243, SOE       , STAT, physical read total bytes                                 ,      3.95G,    790.99M,
243, SOE       , STAT, cell physical IO interconnect bytes                       ,      1.72G,    343.34M,
243, SOE       , STAT, ges messages sent                                         ,         21,        4.2,
243, SOE       , STAT, consistent gets                                           ,    482.88k,     96.58k,
243, SOE       , STAT, consistent gets from cache                                ,         97,       19.4,
243, SOE       , STAT, consistent gets from cache (fastpath)                     ,         97,       19.4,
243, SOE       , STAT, consistent gets direct                                    ,    482.78k,     96.56k,
243, SOE       , STAT, physical reads                                            ,    482.78k,     96.56k,
243, SOE       , STAT, physical reads direct                                     ,    482.78k,     96.56k,
243, SOE       , STAT, physical read IO requests                                 ,      5.23k,      1.05k,
243, SOE       , STAT, physical read bytes                                       ,      3.95G,    790.99M,
243, SOE       , STAT, calls to kcmgcs                                           ,         97,       19.4,
243, SOE       , STAT, file io wait time                                         ,      19.7k,      3.94k,
243, SOE       , STAT, cell physical IO bytes eligible for predicate offload     ,      3.95G,    790.89M,
243, SOE       , STAT, cell smart IO session cache lookups                       ,          7,        1.4,
243, SOE       , STAT, cell smart IO session cache hits                          ,          7,        1.4,
243, SOE       , STAT, cell physical IO interconnect bytes returned by smart scan,      1.72G,    343.38M,
243, SOE       , STAT, table scans (long tables)                                 ,          7,        1.4,
243, SOE       , STAT, table scans (direct read)                                 ,          7,        1.4,
243, SOE       , STAT, table scan rows gotten                                    ,    147.19M,     29.44M,
243, SOE       , STAT, table scan blocks gotten                                  ,    480.07k,     96.01k,
243, SOE       , STAT, cell scans                                                ,          7,        1.4,
243, SOE       , STAT, cell blocks processed by cache layer                      ,    576.41k,    115.28k,
243, SOE       , STAT, cell blocks processed by txn layer                        ,    576.41k,    115.28k,
243, SOE       , STAT, cell blocks processed by data layer                       ,    485.06k,     97.01k,
243, SOE       , STAT, cell blocks helped by minscn optimization                 ,    576.28k,    115.26k,
243, SOE       , STAT, cell simulated session smart scan efficiency              ,      3.97G,     794.6M,
243, SOE       , STAT, cell IO uncompressed bytes                                ,      3.97G,    794.93M,
243, SOE       , TIME, DB CPU                                                    ,      2.63s,   526.72ms,    52.7%, |@@@@@@    |
243, SOE       , TIME, sql execute elapsed time                                  ,      4.01s,   801.13ms,    80.1%, |@@@@@@@@  |
243, SOE       , TIME, DB time                                                   ,      4.01s,   801.13ms,    80.1%, |@@@@@@@@  |
243, SOE       , WAIT, enq: KO - fast object checkpoint                          ,     2.19ms,    438.8us,      .0%, |          |
243, SOE       , WAIT, cell smart table scan                                     ,      1.58s,   316.49ms,    31.6%, |@@@@      |
243, SOE       , WAIT, events in waitclass Other                                 ,     8.54ms,     1.71ms,      .2%, |          |
--  End of Stats snap 1, end=2011-08-17 12:53:04, seconds=5

-----------------------------------------------------------------------
Active% | SQL_ID          | EVENT                     | WAIT_CLASS
-----------------------------------------------------------------------
66% | cfhpsq29gb49m   | ON CPU                    | ON CPU
34% | cfhpsq29gb49m   | cell smart table scan     | User I/O

--  End of ASH snap 1, end=2011-08-17 12:53:04, seconds=5, samples_taken=47

PL/SQL procedure successfully completed.

Confusing, confusing. It was time to ask the experts: Frits Hoogland, who reproduced the behaviour on his environment (11.2.0.1 and 11.2.0.2), and Kerry Osborne, both of whom thought a smart scan should have happened even with the index. An index fast full scan certainly used to be offloadable, and we think there is a regression bug visible with the cells. My version of the cell software is 11.2.2.2 with 11.2.0.1 BP 6. I first thought it might be a problem with 11.2.0.1 and the version of the cell software, but it probably isn’t: Kerry tested on 11.2.0.2 BP 10 and cellsrv 11.2.2.3.2, and Frits tested on 11.2.0.1 and 11.2.0.2 BP 6 with cellsrv 11.2.2.2.0_LINUX.X64_101206.2-1.

Summary

What we are seeing isn’t something we should be seeing. I am trying to get this raised as an SR to see what happens. From my testing it seems to potentially impact anyone with indexes on their tables.

Making the Most of Oracle Exadata – A Technical Review

class="tweetmeme_button" style="float: right; margin-left: 10px;"> /> />

Over the past few weeks several people have asked me about an Exadata article entitled “Making the Most of Oracle Exadata” by Marc Fielding of Pythian. Overall it’s an informative article and touches on many of the key points of Exadata. However, even though I read (skimmed is a much better word) and briefly commented on the article back in August, on further review I found some technical inaccuracies, so I wanted to take the time to clarify this information for the Exadata community.

Exadata Smart Scans

Marc writes:

style="text-decoration: underline;">Smart scans: Smart scans are Exadata’s headline feature. They provide three main benefits: reduced data transfer volumes from storage servers to databases, CPU savings on database servers as workload is transferred to storage servers, and improved buffer cache efficiency thanks to column projection. Smart scans use helper processes that function much like parallel query processes but run directly on the storage servers. Operations off-loadable through smart scans include the following:

  • Predicate filtering – processing WHERE clause comparisons to literals, including logical operators and most SQL functions.
  • Column projection – by looking at a query’s SELECT clause, storage servers return only the columns requested, which is a big win for wide tables.
  • Joins – storage servers can improve join performance by using Bloom filters to recognize rows matching join criteria during the table scan phase, avoiding most of the I/O and temporary space overhead involved in the join processing.
  • Data mining model scoring – for users of Oracle Data Mining, scoring functions like PREDICT() can be evaluated on storage servers.

I personally would not enumerate a specific number of benefits from Exadata Smart Scan. Simply stated, the design goal behind Smart Scan is to reduce the amount of data that is sent from the storage nodes (or storage arrays) to the database nodes (why move data that is not needed?). Smart Scan does this in two ways: it applies the appropriate column projection and row restriction rules to the data as it streams off of disk. However, projection is not limited to just the columns in the SELECT clause, as Marc implies; it also includes columns in the WHERE clause. Obviously, JOIN columns need to be projected to perform the JOIN in the database nodes. The one area that Smart Scan does not help with at all is buffer cache efficiency. The reason for this is quite simple: Smart Scan returns data in blocks that were created on the fly just for that given query; they contain only the needed columns (projections) and have rows filtered out by the predicates (restrictions). Those blocks could not be reused unless someone ran the exact same query (think of them as custom built just for that query). In addition, Smart Scans use direct path reads (cell smart table scan), and these reads go into the PGA, not the shared SGA (buffer cache).

As most know, Exadata can easily push down simple predicate filters (WHERE c1 = ‘FOO’) that can be applied as restrictions with Smart Scan. In addition, Bloom Filters can be applied as restrictions for simple JOINs, like those commonly found in Star Schemas (dimensional data models). These operations can be observed in the query execution plan via the JOIN FILTER CREATE and JOIN FILTER USE row sources. What is very cool is that Bloom Filters can also pass their list of values to Storage Indexes to aid in further I/O reduction if there is natural clustering on those columns, or if the filter eliminates significant amounts of data (as with a highly selective set of values). Even if there isn’t significant data elimination via Storage Indexes, a Smart Scan Bloom Filter can still be applied post-scan to prevent unneeded data from being sent to the database servers.
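
As an illustrative sketch of where these row sources appear (the table and column names are hypothetical, and the optimizer decides whether a Bloom filter is used at all):

select /*+ parallel(4) */ count(*)
  from sales s, customers c
 where s.cust_id = c.cust_id
   and c.country = 'DE';

-- in the execution plan, look for the row sources
--   JOIN FILTER CREATE  :BF0000   (built from the CUSTOMERS rows)
--   JOIN FILTER USE     :BF0000   (applied during the SALES scan)
-- and, when offloaded, a storage() predicate of the form
--   SYS_OP_BLOOM_FILTER(:BF0000,"S"."CUST_ID")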

Exadata Storage Indexes

Marc writes:

style="text-decoration: underline;">Storage indexes: Storage indexes reduce disk I/O volumes by tracking high and low values in memory for each 1-megabyte storage region. They can be used to give partition pruning benefits without requiring the partition key in the WHERE clause, as long as one of these columns is correlated with the partition key. For example, if a table has order_date and processed_date columns, is partitioned on order_date, and if orders are processed within 5 days of receipt, the storage server can track which processed_date values are included in each order partition, giving partition pruning for queries referring to either order_date or processed_date. Other data sets that are physically ordered on disk, such as incrementing keys, can also benefit.

In Marc’s example he states there is a correlation between the two columns PROCESSED_DATE and ORDER_DATE, where PROCESSED_DATE = ORDER_DATE + [0..5 days]. That’s fine and all, but to claim that partition pruning takes place when specifying ORDER_DATE (the partition key column) or PROCESSED_DATE (a non-partition-key column) in the WHERE clause, because the Storage Index can be used for PROCESSED_DATE, is inaccurate. The reality is that partition pruning can only take place when the partition key, ORDER_DATE, is specified, regardless of whether a Storage Index is used for PROCESSED_DATE.
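
This is easy to verify from the Pstart/Pstop columns of the execution plan; a sketch, assuming an ORDERS table partitioned on ORDER_DATE as in Marc’s example:

explain plan for
select count(*) from orders
 where order_date = date '2011-08-01';
-- Pstart/Pstop identify a single partition: pruning took place

explain plan for
select count(*) from orders
 where processed_date = date '2011-08-03';
-- Pstart/Pstop span all partitions: no pruning; any I/O savings
-- here come from Storage Indexes at the cells, not from pruning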

Partition Pruning and Storage Indexes are completely independent of each other, and Storage Indexes know absolutely nothing about partitions, even if the partition key column and another column have some type of correlation, as in Marc’s example. The Storage Index simply tracks which Storage Regions do or do not have rows that match the predicate filters and eliminates reading the unneeded Storage Regions.
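
The effect of Storage Indexes can be observed from the database side as a statistic; a minimal sketch:

select n.name, s.value
  from v$statname n, v$mystat s
 where n.statistic# = s.statistic#
   and n.name = 'cell physical IO bytes saved by storage index';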

Exadata Hybrid Columnar Compression

Marc writes:

style="text-decoration: underline;">Columnar compression: Hybrid columnar compression (HCC) introduces a new physical storage concept, the compression unit. By grouping many rows together in a compression unit, and by storing only unique values within each column, HCC provides storage savings in the range of 80 90% based on the compression level selected. Since data from full table scans remains compressed through I/O and buffer cache layers, disk savings translate to reduced I/O and buffer cache work as well. HCC does, however, introduce CPU and data modification overhead that will be discussed in the next section.

The Compression Unit (CU) in Exadata Hybrid Columnar Compression (EHCC) is actually a logical construct, not a physical storage concept. The compression gains from EHCC come from the column-major organization of the rows contained in the CU and from the encoding and transformations (compression) that this organization makes possible (like values are more common within the same column across rows than across different columns in the same row). To say EHCC stores only unique values within each column is inaccurate; rather, the encoding and transformation algorithms use various techniques that yield very good compression by attempting to represent the column values with as few bytes as possible.

Data from EHCC full table scans remains fully compressed only if the table scan is not a Smart Scan, in which case the compressed CUs are passed directly up to the buffer cache and decompression is done by the database servers. If the EHCC full table scan is a Smart Scan, however, only the columns and rows being returned to the database nodes are decompressed by the Exadata servers; predicate evaluations can be performed directly on the EHCC compressed data.

Read more: Exadata Hybrid Columnar Compression Technical White Paper

Marc also writes:

Use columnar compression judiciously: Hybrid columnar compression (HCC) in Exadata has the dual advantages of reducing storage usage and reducing I/O for large reads by storing data more densely. However, HCC works only when data is inserted using bulk operations. If non-compatible operations like single-row inserts or updates are attempted, Exadata reverts transparently to the less restrictive OLTP compression method, losing the compression benefits of HCC. When performing data modifications such as updates or deletes, the entire compression unit must be uncompressed and written in OLTP-compressed form, involving an additional disk I/O penalty as well.

EHCC does require bulk direct path load operations to work, because the compression algorithms used for EHCC need sets of rows as input, not single rows. What is incorrect in Marc’s comments is the claim that when a row in a CU is modified (UPDATE or DELETE), the entire CU is uncompressed and changed to non-EHCC compression; in fact, only the rows that are UPDATED are migrated to non-EHCC compression. For DELETEs, no row migrations take place at all. This is easily demonstrated by tracking ROWIDs, as in the example at the bottom of this post.
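
The compression type of individual rows can also be checked with DBMS_COMPRESSION.GET_COMPRESSION_TYPE; a minimal sketch against the ORDER_ITEMS1 table from the experiment below (the decoded constants are from the 11.2 DBMS_COMPRESSION documentation):

select rnum,
       decode(dbms_compression.get_compression_type(user, 'ORDER_ITEMS1', rowid),
              1, 'NONE',
              2, 'OLTP',
              4, 'QUERY HIGH',
              8, 'QUERY LOW',
              16, 'ARCHIVE HIGH',
              32, 'ARCHIVE LOW') as compression_type
  from order_items1
 where rnum in (1, 100, 1000, 10000);

-- run before and after the UPDATE: the modified rows should no
-- longer report QUERY HIGH, while untouched rows remain QUERY HIGH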

Exadata Smart Flash Cache

Marc writes:

style="text-decoration: underline;">Flash cache: Exadata s flash cache supplements the database servers buffer caches by providing a large cache of 384 GB per storage server and up to 5 TB in a full Oracle Exadata Database Machine, considerably larger than the capacity of memory caches. Unlike generic caches in traditional SAN storage, the flash cache understands database-level operations, preventing large non-repeated operations such as backups and large table scans from polluting the cache. Since flash storage is nonvolatile, it can cache synchronous writes, providing performance benefits to commit-intensive applications.

While flash (SSD) storage is indeed non-volatile, the Exadata Smart Flash Cache is volatile: it loses all of its contents if the Exadata server is power cycled. Also, since the Exadata Smart Flash Cache is currently a write-through cache, it offers no direct performance advantage to commit-intensive applications; it does, however, offer an indirect advantage by servicing read requests that would otherwise be serviced by the HDDs, thus freeing the HDDs to service more write operations.
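
The read-side benefit is visible from the database in the ‘cell flash cache read hits’ statistic; a minimal sketch:

select name, value
  from v$sysstat
 where name = 'cell flash cache read hits';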

Read more: Exadata Smart Flash Cache Technical White Paper

EHCC UPDATE and DELETE Experiment

--
-- EHCC UPDATE example - only modified rows migrate
--

SQL> create table order_items1
  2  compress for query high
  3  as
  4  select rownum as rnum, x.*
  5  from order_items x
  6  where rownum <= 10000;

Table created.

SQL> create table order_items2
  2  as
  3  select rowid as rid, x.*
  4  from order_items1 x;

Table created.

SQL> update order_items1
  2  set quantity=10000
  3  where rnum in (1,100,1000,10000);

4 rows updated.

SQL> commit;

Commit complete.

SQL> select b.rnum, b.rid before_rowid, a.rowid after_rowid
  2  from order_items1 a, order_items2 b
  3  where a.rnum(+) = b.rnum
  4  and (a.rowid != b.rid
  5    or a.rowid is null)
  6  order by b.rnum
  7  ;

           RNUM BEFORE_ROWID       AFTER_ROWID
--------------- ------------------ ------------------
              1 AAAWSGAAAAAO1aTAAA AAAWSGAAAAAO1aeAAA
            100 AAAWSGAAAAAO1aTABj AAAWSGAAAAAO1aeAAB
           1000 AAAWSGAAAAAO1aTAPn AAAWSGAAAAAO1aeAAC
          10000 AAAWSGAAAAAO1aXBEv AAAWSGAAAAAO1aeAAD

--
-- EHCC DELETE example - no rows migrate
--

SQL> create table order_items1
  2  compress for query high
  3  as
  4  select rownum as rnum, x.*
  5  from order_items x
  6  where rownum <= 10000;

Table created.

SQL> create table order_items2
  2  as
  3  select rowid as rid, x.*
  4  from order_items1 x;

Table created.

SQL> delete from order_items1
  2  where rnum in (1,100,1000,10000);

4 rows deleted.

SQL> commit;

Commit complete.

SQL> select b.rnum, b.rid before_rowid, a.rowid after_rowid
  2  from order_items1 a, order_items2 b
  3  where a.rnum(+) = b.rnum
  4  and (a.rowid != b.rid
  5    or a.rowid is null)
  6  order by b.rnum
  7  ;

           RNUM BEFORE_ROWID       AFTER_ROWID
--------------- ------------------ ------------------
              1 AAAWSIAAAAAO1aTAAA
            100 AAAWSIAAAAAO1aTABj
           1000 AAAWSIAAAAAO1aTAPn
          10000 AAAWSIAAAAAO1aXBEv
name="googleone_share_1" style="position:relative;z-index:5;float: right; margin-left: 10px;">{lang: 'ar'} class='wpfblike' >