VLDB

Linux large pages and non-uniform memory distribution

In my last post about large pages in 11.2.0.3 I promised a little more background information on how large pages and NUMA are related.
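As a quick, concrete taster of that relationship, the way the configured huge pages are spread across the NUMA nodes can be read straight out of sysfs. The sketch below is my own illustration rather than anything from the post, and it assumes a Linux host using 2 MB huge pages; the sysfs layout can differ between kernels and page sizes.

    # Report how huge pages are distributed across NUMA nodes by reading sysfs.
    # Assumes Linux with 2 MB huge pages; paths may differ on other setups.
    from pathlib import Path

    def hugepages_per_node(page_size_kb=2048):
        result = {}
        for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
            hp_dir = node / "hugepages" / f"hugepages-{page_size_kb}kB"
            if hp_dir.is_dir():
                result[node.name] = {
                    "total": int((hp_dir / "nr_hugepages").read_text()),
                    "free": int((hp_dir / "free_hugepages").read_text()),
                }
        return result

    if __name__ == "__main__":
        for node, counts in hugepages_per_node().items():
            print(f"{node}: {counts['total']} huge pages configured, {counts['free']} free")

On a multi-socket machine the counts are often not identical across nodes, which is presumably the sort of non-uniform distribution the title hints at.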

Background and some history about processor architecture

#optimizerbumperstickers Oracle VLDB Mary Poppins and me

In 1990, when Ken Jacobs hosted the RDBMS campground talks at the Anaheim International Oracle User Week appreciation event, one of the topic areas was whether we (some users representing the Very Large DataBase (VLDB) end of the Oracle world, which meant anything north of about 7 GB back then) thought that the rule based optimizer (RBO) was good enough, or whether we needed a cost based optimizer (CBO) for the real applications we were running at enterprise scale to work well. “Oracle’s optimizer is like Mary Poppins. It’s practically perfect in every way. But we do have some cases where it would be helpful for the optimizer to consider the relative sizes of tables and whether a table was local or remote when the plan for joining and filtering is constructed.”

OUGN 2012 Third Day

The last day of the three (and the second on the ferry and of the conference proper) had a lot of talks I wanted to see, especially Dan Morgan talking about “Insanely Large Databases”. It was a good talk, with an interesting interlude when a very loud announcement let us know we had docked at Kiel. Dan handled it with aplomb. Dan was talking about one specific experience he had suffered recently and he covered quite a few things I did and some I had planned to cover but never got that far with – but he had more technical detail about the issues he had encountered, so all in all it was probably of more immediate use to the audience than my talk. It was a very good session. I have to confess, there were times I laughed out loud as memories flooded in, prompted by his wise words – I fear others may have interpreted that differently but, honestly, I was laughing tears of shared pain.

Sail Away, Sail Away, Sail Away

I’m just doing some last minute preparation for the Norwegian User Group Spring Seminar. This is quite a large conference with 5 concurrent streams and a very good line-up of presenters. You can see the agenda here.

What is also a little unusual about this conference is that it is mostly on a boat, or rather a ship. When I was first asked if I would put forward a talk or two I declined – as I am very prone to being sea-sick. Truls Bergersen, who helps organize the event, got in touch with me a few weeks later and said, “We are doing a day on land first, do you want to present then?”. Sure! That would be great! So I am: I’ll be doing my talk on Index Organized Tables once more {and then I think I’ll put it to bed for a couple of years}.

Database Sizing – How much Disk do I need? (The Easy Way)

How much Disk do I need for my new Oracle database? Answer:-

  • 8-10 times the volume of raw data for an OLTP system
  • 2-4 times the raw data volume for a Data Warehouse.
  • The bigger the database, the nearer you will be to the lower multiplication factors.

{Disclaimer. This is of course just my opinion, based on some experience. If you use the above figures for a real project and the total disc space you end up with is wrong, don’t blame me. If you use them and they turn out right, then of course you now owe me a beer.}

Many of us have probably had to calculate the expected size of a database before, but the actual database is only one component of all the things you need to run the Oracle component of your system. You need to size the other components too – archived redo logs, backup staging area, dataload staging area, external files, the operating system, swap space, the Oracle binaries {which generally get bigger every year but shrink in comparison to the average size of an Oracle DB} etc…

In a similar way to my thoughts on how much database space you need for a person, I also used to check the total disk space taken up by every database I created and every one I came across. {A friend emailed me after my earlier posting to ask if I had an obsession about size. I think the answer must be “yes”}.

First of all, you need to know how much “raw data” you have. By this I mean what will become the table data. Back in the early 90s this could be the total size of the flat files the old system was using, or even the size of the data as it was in spreadsheets. An Oracle export file of the system gives a pretty good idea of the raw data volume too. Lacking all of these, you need to roughly size your raw data yourself: do a calculation of “number_of_rows*sum_of_columns” for your biggest 10 tables (I might blog more on this later). Don’t be tempted to overestimate; my multipliers allow for the padding.
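If it helps, that back-of-an-envelope calculation is trivial to script. This is only a sketch; the table names, row counts and column widths below are invented for illustration:

    # Rough raw-data estimate: rows * summed average column widths (bytes)
    # for the biggest tables. All figures are invented for illustration.
    biggest_tables = {
        # table name: (estimated_rows, summed_avg_column_bytes)
        "orders":      (50_000_000, 180),
        "order_lines": (200_000_000, 120),
        "customers":   (5_000_000, 350),
    }

    raw_bytes = sum(rows * row_bytes for rows, row_bytes in biggest_tables.values())
    print(f"Estimated raw data: {raw_bytes / 1024**3:.0f} GB")  # ~32 GB here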

Let us say you have done this and it is 60GB of raw data for an OLTP system. Let the storage guys know you will probably want about 500GB of space. They will then mentally put it down as “of no consequence”, since if you have dedicated storage guys you probably have many terabytes of storage. {Oh, I should mention that I am not considering redundancy at all, only the usable space that is provided. The amount of actual spinning disk is down to the level and type of RAID your storage guys make you use. That is a whole other discussion.}

If you come up with 5TB of raw data for a DW system then you need around 12-15TB of disk storage.

If you come up with more than a terabyte or so of raw data for an OLTP system, or 10 to 20 terabytes for a DW, then when you give your figures to the storage guys/procurement people they may well go pale and say something like “you have got to be kidding!”. This is part of why the multiplication factor for Data Warehouses, and larger systems in general, is lower: you are forced to be more careful about the space you allocate and how you use it.
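Expressed as code, the rule of thumb is nothing more than a pair of multipliers. A minimal sketch, assuming you simply quote the resulting range and lean towards the lower end as systems get bigger:

    # Apply the rule-of-thumb multipliers: 8-10x raw data for OLTP, 2-4x for
    # a data warehouse. Bigger systems should land nearer the low end.
    def disk_estimate_gb(raw_gb, system_type):
        factors = {"OLTP": (8, 10), "DW": (2, 4)}
        low, high = factors[system_type]
        return raw_gb * low, raw_gb * high

    print(disk_estimate_gb(60, "OLTP"))      # (480, 600) -> ask for ~500GB
    print(disk_estimate_gb(5 * 1024, "DW"))  # (10240, 20480) -> expect ~12-15TB in practice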

The overhead of total disk space over Raw data reduces as the database gets bigger for a number of reasons:

  • The size of the Oracle binaries and the OS does not change as the database gets bigger.
  • The size of swap space does not increase in line with the database as, generally speaking, if you increase the database size from 100GB to 1TB you do not have the luxury of increasing the system memory of your server tenfold – it probably only doubles.
  • Very large databases tend to have something making them big, like images or embedded documents, which are not indexed. Thus the ratio of table segments to index segments increases.
  • If you have a very large database you start removing indexes (often those that support constraints) to aid performance of data load and management, again improving the ratio of table segments to index segments.
  • Backups become partial or incremental to reduce the size and duration of the backup.
  • As mentioned before, the sheer size of the system is such that you just take more care over cleaning up work areas, paring down the archived redo log areas (those files do compress well) and other areas.
  • If things get extreme, or you have been doing this for donkey’s years {note to non-UK people, this means many, many years}, you start altering PCTFREE and checking over extent sizes.

My best ever ratio of database size to raw data was around 1.6 and it took an awful lot of effort and planning to get there. And an IT manager who made me very, very aware of how much the storage was costing him (it is not the disks, it’s all the other stuff).

I should also just mention that the amount of disk you need is only one consideration. If you want your database to perform well you need to consider the number of spindles. After all, you can create a very large database indeed using a single 2TB disc – but any actual IO will perform terribly.
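To put a number on that last point, a capacity-only purchase can leave you badly short of random I/O. The arithmetic below is a sketch with invented figures (a nominal 180 random IOPS per spindle), not a benchmark:

    # Capacity is only half the story: estimate the spindles needed for a
    # random-I/O target. 180 IOPS per spindle is an invented, nominal figure.
    def min_spindles(target_iops, iops_per_spindle=180):
        return -(-target_iops // iops_per_spindle)  # ceiling division

    # One 2TB disc holds the data but gives ~180 IOPS; a modest 5,000 IOPS
    # OLTP workload needs on the order of 28 spindles (before RAID overhead).
    print(min_spindles(5_000))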

Reading Parallel Execution Plans With Bloom Pruning And Composite Partitioning

You’ve probably heard sayings like “sometimes things aren’t always what they seem” and “people lie”. Well, sometimes execution plans lie. It’s not really by intent, but it is sometimes difficult (or impossible) to represent everything in a query execution tree in nice tabular format like dbms_xplan gives. One of the optimizations that was introduced back in 10gR2 was the use of bloom filters. Bloom filters can be used in two ways: 1) for filtering or 2) for partition pruning (bloom pruning) starting with 11g. Frequently the data models used in data warehousing are dimensional models (star or snowflake) and most Oracle warehouses use simple range (or interval) partitioning on the fact table date key column as that is the filter that yields the largest I/O reduction from partition pruning (most queries in a time series star schema include a time window, right!). As a result, it is imperative that the join between the date dimension and the fact table results in partition pruning. Let’s consider a basic two table join between a date dimension and a fact table. For these examples I’m using STORE_SALES and DATE_DIM which are TPC-DS tables (I frequently use TPC-DS for experiments as it uses a [...]
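As a conceptual aside (this is an illustration of the data structure itself, not Oracle’s internal implementation): a Bloom filter is a compact bit array built from the keys on one side of the join, which can then answer “definitely not present” or “possibly present” for each probe value – false positives are possible, false negatives are not, which is why it is safe to use for filtering and for skipping partitions. A minimal sketch:

    # Toy Bloom filter - conceptual only, not Oracle's implementation.
    import hashlib

    class BloomFilter:
        def __init__(self, size_bits=1024, num_hashes=3):
            self.size = size_bits
            self.num_hashes = num_hashes
            self.bits = [False] * size_bits

        def _positions(self, item):
            for i in range(self.num_hashes):
                digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
                yield int(digest, 16) % self.size

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos] = True

        def might_contain(self, item):
            return all(self.bits[pos] for pos in self._positions(item))

    # Build from the date keys that survive the dimension filter...
    bf = BloomFilter()
    for date_key in (20120101, 20120102, 20120103):
        bf.add(date_key)

    # ...then cheaply skip fact rows/partitions whose key cannot possibly match.
    print(bf.might_contain(20120102))  # True
    print(bf.might_contain(20251231))  # almost certainly False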

The Core Performance Fundamentals Of Oracle Data Warehousing – Set Processing vs Row Processing

[back to Introduction] In over six years of doing data warehouse POCs and benchmarks for clients there is one area that I frequently see as problematic: “batch jobs”. Most of the time these “batch jobs” take the form of some PL/SQL procedures and packages that generally perform some data load, transformation, processing or something similar. The reason these are so problematic is that developers have hard-coded “slow” into them. I’m fairly certain these developers didn’t know they had done this when they coded their PL/SQL, but nonetheless it happened.

So How Did “Slow” Get Hard-Coded Into My PL/SQL?

Generally “slow” gets hard-coded into PL/SQL because the PL/SQL developer(s) took the business requirements and did a “literal translation” of each rule/requirement one at a time, instead of looking at the “before picture” and the “after picture” and determining the most efficient way to make those data changes. Many times this surfaces as cursor-based row-by-row processing, but it can also appear as PL/SQL just running a series of often poorly thought out SQL commands.

Hard-Coded Slow Case Study

The following is based on a true story. Only the facts and names have been changed to protect the innocent. Here is [...]
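The case study itself is cut off in the excerpt, but the contrast it sets up is easy to demonstrate outside Oracle. The sketch below uses SQLite from Python’s standard library, with an invented table and rule, purely to show the same change applied row by row versus as one set-based statement:

    # Row-by-row versus set-based processing (SQLite illustration, invented data).
    import sqlite3, time

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL, status TEXT)")
    conn.executemany(
        "INSERT INTO sales (amount, status) VALUES (?, 'NEW')",
        [(i % 500,) for i in range(200_000)],
    )

    # "Literal translation" style: fetch each row, decide, issue an individual UPDATE.
    t0 = time.time()
    for row_id, amount in conn.execute("SELECT id, amount FROM sales").fetchall():
        if amount > 250:
            conn.execute("UPDATE sales SET status = 'BIG' WHERE id = ?", (row_id,))
    row_by_row = time.time() - t0

    # Set-based: express the whole rule as one statement.
    conn.execute("UPDATE sales SET status = 'NEW'")
    t0 = time.time()
    conn.execute("UPDATE sales SET status = 'BIG' WHERE amount > 250")
    set_based = time.time() - t0

    print(f"row-by-row: {row_by_row:.2f}s  set-based: {set_based:.2f}s")

The absolute numbers are irrelevant; the point is that the set-based form hands the database the whole problem at once instead of hard-coding a row-at-a-time algorithm around it.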

The Core Performance Fundamentals Of Oracle Data Warehousing – Data Loading

[back to Introduction] Getting flat file data into your Oracle data warehouse is likely a daily (or possibly more frequent) task, but it certainly does not have to be a difficult one. Bulk loading data rates are governed by the following operations and hardware resources:

  • How fast can the data be read
  • How fast can the data be written out
  • How much CPU power is available

I’m always a bit amazed (and depressed) when I hear people complain that their data loading rates are slow and they proceed to tell me things like:

  • The source files reside on a shared NFS filer (or similar) and it has just a single GbE (1 Gigabit Ethernet) network path to the Oracle database host(s).
  • The source files reside on this internal disk volume which consists of a two disk mirror (or a volume with very few spindles).

Maybe it’s not entirely obvious so let me spell it out (as I did in this tweet): one cannot load data into a database faster than it can be delivered from the source. Database systems must obey the laws of physics! Or, putting it another way: don’t fall victim to slow data loading because of a slow performing data source. [...]
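The physics argument is just a min() over the links in the chain. The figures below are illustrative assumptions (roughly 1 GbE wire speed less protocol overhead, a small mirrored volume, a capable database layer), not measurements:

    # The sustained load rate cannot exceed the slowest link in the chain.
    # All throughput figures are illustrative assumptions.
    bottlenecks_mb_per_s = {
        "source NFS filer over a single 1 GbE path": 1000 / 8 * 0.9,  # ~112 MB/s usable
        "two-disk mirror holding the source files": 150,
        "database write bandwidth": 2000,
    }

    limit = min(bottlenecks_mb_per_s, key=bottlenecks_mb_per_s.get)
    rate = bottlenecks_mb_per_s[limit]
    print(f"Best case ~{rate:.0f} MB/s, limited by: {limit}")
    print(f"A 1 TB flat-file load therefore takes at least {1_000_000 / rate / 3600:.1f} hours")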

The Core Performance Fundamentals Of Oracle Data Warehousing – Parallel Execution

[back to Introduction] Leveraging Oracle’s Parallel Execution (PX) in your Oracle data warehouse is probably the most important feature/technology one can use to speed up operations on large data sets. PX is not, however, “go fast” magic pixie dust for any old operation (if that’s what you think, you probably don’t understand the parallel computing paradigm). With Oracle PX, a large task is broken up into smaller parts, sub-tasks if you will, and each sub-task is then worked on in parallel. The goal of Oracle PX: divide and conquer. This allows a significant amount of hardware resources to be engaged in solving a single problem and is what allows the Oracle database to scale up and out when working with large data sets. I thought I’d touch on some basics and add my observations, but this is by far not an exhaustive write-up on Oracle’s Parallel Execution. There is an entire chapter in the Oracle Database documentation on PX as well as several white papers. I’ve listed all these in the Resources section at the bottom of this post. Read them, but as always, feel free to post questions/comments here. Discussion adds great value.

A Basic Example of Parallel Execution [...]
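The basic example itself is cut off above, but the divide-and-conquer idea can be shown in miniature with plain Python multiprocessing (this is an analogy for the paradigm, not Oracle PX; the degree and workload are invented):

    # Divide and conquer in miniature: split the input into sub-tasks,
    # work on them in parallel, then combine the partial results.
    from multiprocessing import Pool

    def sum_chunk(chunk):
        # Each worker (think "PX slave") handles its own sub-range of the data.
        return sum(chunk)

    if __name__ == "__main__":
        data = range(10_000_000)
        degree = 4  # analogous to the degree of parallelism
        chunk_size = len(data) // degree
        chunks = [data[i * chunk_size:(i + 1) * chunk_size] for i in range(degree)]

        with Pool(degree) as pool:
            partials = pool.map(sum_chunk, chunks)

        # The coordinator combines the partial answers into the final result.
        print(sum(partials))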

The Core Performance Fundamentals Of Oracle Data Warehousing – Partitioning

[back to Introduction] Partitioning is an essential performance feature for an Oracle data warehouse because partition elimination (or partition pruning) generally removes a significant amount of table data from what has to be scanned. This results in a need for fewer system resources and improved query performance. Someone once told me “the fastest I/O is the one that never happens.” This is precisely the reason that partitioning is a must for Oracle data warehouses – it’s a huge I/O eliminator. I frequently refer to partition elimination as the anti-index: an index is used to find a small amount of data that is required; partitioning is used to eliminate vast amounts of data that are not required.

Main Uses For Partitioning

I would classify the main reasons to use partitioning in your Oracle data warehouse into these four areas:

  • Data Elimination
  • Partition-Wise Joins
  • Manageability (Partition Exchange Load, Local Indexes, etc.)
  • Information Lifecycle Management (ILM)

Partitioning Basics

The most common partitioning design pattern found in Oracle data warehouses is to partition the fact tables by range (or interval) on the event date/time column. This allows queries to eliminate all the partitions whose data lies outside the desired time window. For example: If I have a [...]
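The excerpt is truncated, but the pruning idea it describes can be sketched conceptually (this is plain Python, not Oracle partitioning syntax; the fact rows and time window are invented):

    # Conceptual partition pruning: fact rows grouped into monthly "partitions";
    # a date-bounded query only touches partitions inside the window.
    from collections import defaultdict
    from datetime import date

    partitions = defaultdict(list)  # keyed by (year, month)

    def insert_fact(sale_date, amount):
        partitions[(sale_date.year, sale_date.month)].append((sale_date, amount))

    def total_sales(start, end):
        total, scanned = 0.0, 0
        for (year, month), rows in partitions.items():
            # Partition elimination: skip whole months outside the window.
            if (year, month) < (start.year, start.month) or (year, month) > (end.year, end.month):
                continue
            scanned += 1
            total += sum(amount for sale_date, amount in rows if start <= sale_date <= end)
        print(f"scanned {scanned} of {len(partitions)} partitions")
        return total

    insert_fact(date(2012, 1, 15), 10.0)
    insert_fact(date(2012, 2, 3), 20.0)
    insert_fact(date(2012, 6, 30), 5.0)
    print(total_sales(date(2012, 1, 1), date(2012, 2, 29)))  # prunes the June partition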