Search

Top 60 Oracle Blogs

Recent comments

Negative Offload

At the Trivadis Performance Days 2019 I did a presentation on using execution plans to understand what a query was doing. One of the examples I showed was a plan from an Exadata system (using 11.2.0.4) that needed to go faster. The plan was from the SQL Monitor report and all I want to show you is one line that’s reporting a tablescan. To fit the screen comfortably I’ve removed a number of columns from the output.

The report had been generated while the statement was still running (hence the “->” at the left hand edge) and the query had scanned 166 segments (with no partition elimination) of a table with 4,500 data segments (450 range partitions and 10 hash sub-partitions – note the design error, by the way, hash partitioning in Oracle should always hash for a powert of 2).


SQL Plan Monitoring Details (Plan Hash Value=3764612084)  
============================================================================================================================================
| Id   |           Operation            | Name  | Read  | Read  | Write | Write |   Cell   | Mem  | Activity |       Activity Detail       |  
|      |                                |       | Reqs  | Bytes | Reqs  | Bytes | Offload  |      |   (%)    |         (# samples)         |   
============================================================================================================================================
| -> 5 |      TABLE ACCESS STORAGE FULL | TXN   |  972K | 235GB |       |       | -203.03% |   7M |    63.43 | Cpu (1303)                  | 
|      |                                |       |       |       |       |       |          |      |          | cell smart table scan (175) | 
============================================================================================================================================

In the presentation I pointed out that for a “cell smart table scan” (note the Activity Detail colum) this line was using a surprisingly large amount of CPU.

We had been told that the table was using hybrid columnar compression (HCC) and had been given some figures that showed the compression factor was slightly better than 4. I had also pointed out that the typical size of a read request was 256KB. (Compare Read Reqs with Read Bytes)

To explain the excessive CPU I claimed that we were seeing “double decompression” – the cell was decompressing (uncompressing) compression units (CUs), finding that the resulting decompressed data was larger than the 1MB unit that Exadata allows and sending the original compressed CU to the database server where it was decompressed again – and the server side decompression was burning up the CPU.

This claim is (almost certainly) true – but the justification I gave for the claim was at best incomplete (though, to be brutally honest, I have to admit that I’d made a mistake): I pointed out that the Cell Offload was negative 200% and that this was what told us about the double decompression. While double decompression was probably happening the implication I had made was that a negative offload automatically indicated double decompression – and that’s was an incorrect assumption on my part. Fortunately Maurice Müller caught up with me after the session was over and pointed out the error then emailed me a link to a relevant article by Ahmed Aangour.

The Cell Offload is a measure of the difference between the volume of data read and the volume of data returned to the server. If the cell reads 256KB from disc, but the column and row selection means the cell returns 128KB the Cell Offload would be 50%; if the cell returns 64KB the Cell Offload would be 75% (100 * (1 – 64KB/256KB)). But what if you select all the rows and columns from a compressed table – the volume of data after decompression would be larger than the compressed volume the cell had read from disc – and in this case we knew that we were reading 256KB at a time and the compression factor was slightly greater than 4, so the uncompressed data would probably be around 1MB, giving us a Cell Offload of 100 * (1 – 1024KB / 256KB) = negative 300%

Key Point: Any time that decompression, combined with the row and column selection, produces more data than the volume of data read from disc the Cell Offload will go negative. A negative Cell Offload is not inherently a problem (though it might hint at a suboptimal use of compression).

Follow-up Analysis

Despite the error in my initial understanding the claim that we were seeing double decompression was still (almost certainly) true – but we need to be a little more sophisticated in the analysis. The clue is in the arithmetic a few lines further up the page. We can see that we are basically reading 256KB chunks of the table, and we know that 256KB will expand to roughly 1MB so we ought to see a Cell Offload of about -300%; but the Cell Offload is -200%. This suggests fairly strongly that on some of the reads the decompressed data is slightly less than 1MB, which allows the cell to return the decompressed data to the database server, while some of the time the decompressed data is greater than 1MB, forcing the cell to send the original (compressed) CU to the databsae server.

We may even be able work the arithmetic backwards to estimate the number of times double decompression appeared.  Assume that two-thirds of the time the cell decompressed the data and successfully sent (just less than) 1MB back to the database server and one-third of the time the cell decompressed the data and found that the result was too large and sent 256KB of compressed data back to the server, and let’s work with the 972,000 read requests reported to see what drops out of the arithmetic:

  • Total data read: 972,000 * 256KB = 243,000 MB
  • Data sent to db server:  648,000 * 1MB + 324,000 * 256KB = 729,000 MB
  • Cell Offload = 100 * (1 – 729/243) = -200%   Q.E.D.

Of course it would be nice to avoid guessing – and if we were able to check the session activity stats (v$sessstat) while the query was running (or after it had completed) we could pick up several numbers that confirmed our suspicion. For 11.2.0.4, for example, we would keep an eye on:

	cell CUs sent uncompressed
	cell CUs processed for uncompressed
	EHCC {class} CUs Decompressed

Differences between these stats allows you to work out the number of compression units that failed the 1MB test on the cell server and were sent to the database server to be decompressed. There is actually another statistic named “cell CUs sent compressed” which would make life easy for us, but I’ve not seen it populated in my tests – so maybe it doesn’t mean what it seems to say.

Here’s an example from an 11.2.0.4 system that I presented a few years ago showing some sample numbers.

cell CUs sent uncompressed              5,601
cell CUs processed for uncompressed     5,601

EHCC CUs Decompressed                  17,903
EHCC Query High CUs Decompressed       12,302 

This reveals an annoying feature of 11g (continued in 12.1) that results in double counting of the statistics, confusing the issue when you’re trying to analyze what’s going on. In this case the table consisted of 12,302 compression units, and the query was engineered to cause the performance problem to appear. The first two statistics show us how many CUs were decompressed successfully (we’ll see a change appearing there in 12.1). We then see that all 12,302 of the table’s “query high” compression units were decompressed – but the “total” of all CUs decompressed was 17.903.

It’s not a coincidence that 12,302 + 5,601 = 17,903; there’s some double counting going on. I don’t know how many of the statistics are affected in this way, but Oracle has counted the CUs that passsed decompression once as they were processed at the cell server and again as they arrived at the database server. In this example we can infer that 12,302 – 5,601 = 6,701 compression units failed decompression at the cell server and were sent to the database server in compressed form to be decompressed again.

Here’s a couple of sets of figures from some similar tests run on 12.1.0.2 – one with a table compressed to query high another compressed to query low. There is one critical difference from the 11g figures but the same double-counting seems to have happened. In both cases the “EHCC Query [Low|High] CUs Decompressed” show the correct number of CUs in each table. Note, though that the “cell CUs processed for uncompress” in 12.1 appear to report the number of attempted decompressions rather than 11g’s number of successful decompressions.


=========================

cell CUs sent uncompressed                     19,561	-- successful decompressions at cell server
cell CUs processed for uncompressed            19,564	=> 3 failures

EHCC CUs Decompressed                          39,125	=  2 * 19,561 successes + 3 db server decompression
EHCC Query High CUs Decompressed               19,564

=========================

cell CUs sent uncompressed                     80,037	-- successful decompressions at cell server
cell CUs processed for uncompressed            82,178	=> 2,141 failures

EHCC CUs Decompressed                         162,215	=  2 * 80,037 successes + 2,141 db server decompressions
EHCC Query Low CUs Decompressed                82,178

=========================

I’ve annotated the figures to explain the arithmetic.

There has been some significant renaming and separation of statistics in 12.2, as described in this post by Roger MacNicol, and the problems of double-counting should have disappeared. I haven’t yet tested my old models in the latest versions of Oracle, though, so can’t show you anyy figures to demonstrate the change.

Takeaways

There are 4 key points to note in this posting.

  • Hash (sub)partitioning should be based on powers of 2, otherwise some partitions will be twice size of others.
  • There is a 1MB limit on the “data packet” sent between the cell server and database server in Exadata.
  • If you select a large fraction of the rows and columns from an HCC compressed table you may end up decompressing a lot of your data twice if the decompressed data for a read request is larger than the 1MB unit (and the cost will be highly visible at the database server as CPU usage).
  • The Cell Offload figure for a tablescan (in particular) will go negative if the volume of data sent from the cell server to the database server is larger than the volume of data read from the disk- even if double decompression hasn’t been happening.

A little corollary to the third point: if you are writing to a staging table with the expectation of doing an unfiltered tablescan (or a select *), then you probably don’t want to use hybrid columnar compression on the table as you will probably end up using a lot of CPU at the database server to compress it, then do double-decompression using even more CPU on the database server.  It’s only if you really need to minimise disk usage and have lots of CPU capacity to spare that you have a case for using hybrid columnar compression for the table (and Oracle In-Memory features may also change the degree of desirability).

Footnote

I haven’t said anything about accessing table data by index when the table is subject to HCC compression. I haven’t tested the mechanism in recent versions of Oracle but it used to be the case that the cell server would supply the whole compression unit (CU) to the database server which would decompress it to construct the relevant row. One side effect of this was that the same CU could be decompressed (with a high CPU load) many times in the course of a single query.