Oakies Blog Aggregator

Little Things Doth Crabby Make – Part XXII. It’s All About Permissions, Dummy. I Mean yum(8).

Good grief. This is short and sweet, I know, but this installment in the Little Things Doth Crabby Make series is just that–short and sweet. Or, well, maybe short and sour?

Not root? Ok, yum(8), spew out a bunch of silliness at me. Thanks.

Sometimes, little things doth, well, crabby make!

#000000;" src="https://kevinclosson.files.wordpress.com/2017/08/fio-perms.png?w=500&h=299" alt="" width="500" height="299" srcset="https://kevinclosson.files.wordpress.com/2017/08/fio-perms.png?w=500&h=299 500w, https://kevinclosson.files.wordpress.com/2017/08/fio-perms.png?w=1000&h=598 1000w, https://kevinclosson.files.wordpress.com/2017/08/fio-perms.png?w=150&h=90 150w, https://kevinclosson.files.wordpress.com/2017/08/fio-perms.png?w=300&h=179 300w, https://kevinclosson.files.wordpress.com/2017/08/fio-perms.png?w=768&h=459 768w" sizes="(max-width: 500px) 100vw, 500px" />

Hey, yum(8), That is Ridiculous User Feedback


Exadata Capacity on Demand and Elastic Rack

Since the X4, we can use Capacity on Demand (CoD) on Exadata: disabling some CPU cores to lower the cost of Oracle Database licenses. The minimums depend on the model and the configuration, and here is a recap table about those.

Here is the summary of the Capacity on Demand minimum, maximum and increment. Those numbers come from the configuration file of OEDA, the Oracle Exadata Deployment Assistant (es.properties), and you can see that it already includes an option for the Exadata X7-2.

| Exadata model | Sockets | Cores per socket | Cores per server | Threads per core | CoD minimum        | CoD maximum | CoD increment |
|---------------|---------|------------------|------------------|------------------|--------------------|-------------|---------------|
| X2-2          | 2       | 6                | 12               | 2                |                    |             |               |
| X3-2          | 2       | 8                | 16               | 2                |                    |             |               |
| X4-2          | 2       | 12               | 24               | 2                | 12 (not for 1/8th) | 24          | 2             |
| X5-2          | 2       | 18               | 36               | 2                | 14                 | 36          | 2             |
| X6-2          | 2       | 22               | 44               | 2                | 14 (8 for 1/8th)   | 44          | 2             |
| X7-2          | 2       | 24               | 48               | 2                | 14 (8 for 1/8th)   | 48          | 2             |
| X2-8          | 8       | 8                | 12               | 2                |                    |             |               |
| X3-8          | 8       | 10               | 12               | 2                |                    |             |               |
| X4-8          | 8       | 15               | 32               | 2                | 48                 | 120         | 8             |
| X5-8          | 8       | 18               | 32               | 2                | 56                 | 144         | 8             |
| X6-8          | 8       | 18               | 32               | 2                | 56                 |             |               |
| X7-8          | 8       | 24               | 32               | 2                |                    |             |               |
| SL6           | 2       | 32               | 64               | 8                | 14 (8 for 1/8th)   | 64          | 2             |
| T7-2          | 2       | 32               | 62               | 8                |                    |             |               |

 

Special minimums for 1/8th of Rack

The smallest configuration (1/8th of Rack) is a bit special. First, because it is physically identical to the 1/4th Rack, with just some processors and disks disabled. But also because, for this entry-level configuration, the required minimum is lower: 8 cores per node on the X6.

Here is the Oracle Exadata Deployment Assistant for X6-2 1/8th of Rack:

[Screenshot: OEDA, X6-2 1/8th Rack configuration]

When 1/8th of Rack is selected, we are allowed to enable a minimum of 8 cores per node, as mentioned in the table above:

[Screenshot: OEDA, minimum of 8 cores per node allowed for the 1/8th Rack]

Elastic Rack

An Elastic Rack configuration allows you to configure any combination of database nodes and storage cells:

[Screenshot: OEDA, Elastic Rack selection]

With an Elastic Rack configuration, the next screen not only displays the configuration: you can also customize it.
Here I define the same configuration as a 1/8th Rack:

[Screenshot: OEDA, Elastic Rack customized to match a 1/8th Rack]

However, because it is not a 1/8th Rack configuration, the minimum is 14 cores per node, not 8:

[Screenshot: OEDA, minimum of 14 cores per node for the Elastic Rack]

So be careful: an Elastic configuration gives more flexibility, but the CoD minimums differ from those of the equivalent fixed configuration.

/opt/oracle.SupportTools/resourcecontrol

As I’m talking about elastic configuration, here is how the cores are enabled. The configuration assistant calls /opt/oracle.SupportTools/resourcecontrol, which displays or updates the BIOS configuration. You may wonder why you can do that here and not on your own servers? Because here Oracle can trace what happened. You will find the log in /var/log/oracleexa/systemconfig.log, and here is an example where the Elastic Rack has been deployed with 16 cores per database node as Capacity on Demand:

Fri Aug 04 16:12:18 CEST 2017
Executing command: /opt/oracle.SupportTools/resourcecontrol -show
[INFO] Validated hardware and OS. Proceed.
[SHOW] Number of physical cores active per socket: 22
[SHOW] Total number of cores active: 44
 
Mon Aug 07 11:24:31 CEST 2017
Executing command: /opt/oracle.SupportTools/resourcecontrol -core 16 -force
[INFO] Validated hardware and OS. Proceed.
[INFO] Enabling 8 cores on each socket.
[INFO] Import all bios settings
[INFO] All bios settings have been imported with success
[ACTION] Reboot server for settings to take effect
[SHOW] Number of physical cores active per socket: 8
[SHOW] Total number of cores active: 16
 
Mon Aug 07 11:31:24 CEST 2017
Executing command: /opt/oracle.SupportTools/resourcecontrol -show
[INFO] Validated hardware and OS. Proceed.
[SHOW] Number of physical cores active per socket: 8
[SHOW] Total number of cores active: 16

This does not stay only on your server. There is a rule that you can use Capacity on Demand only if you have configured Platinum support, Oracle Configuration Manager, or Enterprise Manager. All of those may store a history of the CPU count, which means that it is auditable.

 

This article, Exadata Capacity on Demand and Elastic Rack, appeared first on Blog dbi services.

Friday Philosophy – Sometime The Solution Has To Not Only Match The Problem But Also…

…The People!

When you design a system for end users, a good designer/developer considers the “UX” – User eXperience. The system has to be acceptable to the end user. This is often expressed as “easy to use”, “fun”, or “quick”. But in reality the system can fail in all sorts of ways and still be a success if the end user gets something out of using it. I’ve said it before and I’ll say it again and again until I give up on this career. In my opinion:

User Acceptance is the number one aim of any I.T. system.

OK, you probably all know about UX. But what about solutions that have no end users? I’m thinking of when you create a technical solution or fix for an internal system, to be used by fellow I.T. professionals. How often have you considered the skills and temperament of the people who are going to house-keep the solution you create? I suppose I have had more opportunity to think about this than some of you, due to how I work: I’m a consultant who gets called in to fix things and then leave. At times I have chosen a solution that was influenced by the people who would be looking after it.

I’ll give you an example. At one site that I worked at for about 9 months, I did a lot of work for one system. The developer/systems administrator who looked after the system was… stupid. I don’t really like saying that; we all vary in our skill sets, experience, intelligence, and *type* of intelligence (I know some people who can speak 3 languages or know a lot about history but could not wire a plug). But this guy really seemed to struggle with logic, cause-and-effect, or learning anything new. And I had to help him look after this database application with one main, huge, hulking table. It had to be partitioned, those partitions maintained, and the data archived. I implemented the partitioning, and I explained partitions to him several times: what was needed to maintain them, and where to look in the data dictionary for information. It was like talking to my mum about it. He just seemed not to understand, and his efforts to code something to do what needed to be done were woeful.

I knew it was not me; I’ve run enough training sessions and presented so often that I know I can explain myself (well, I hope so! Maybe I am deluded). He just was not getting it. Maybe he was in the wrong job. So I wrote him a set of SQL-generating scripts to get him going. He kept messing up running them. In the end, I knew I was about to leave, and that within 3 months of my leaving the real customer would have a broken system. So I wrote a mini-application in PL/SQL for him to do what needed to be done, and set it to email a central team if it failed: the team he would call when he broke it all again. I also simplified the solution. My original system had some bells and whistles to help with future changes, such as overriding where new partitions went or how old ones were compressed. I stripped those out to keep it as simple as possible. I altered the solution to suit the person that would run it.

I’ve done something like this a few times over the years. Usually it is more to do with the skill set of the team than with actual ability. I’ve on occasion worked with people who were new to Oracle while my time was limited, so rather than give them a solution written in PL/SQL, which none of them knew, I gave them one in SQL plus cookery instructions/shell scripts. It’s not the best solution but it is something they can live with.

More recently I had to look at fixing the performance of some SQL statements. Baselines would have done the job perfectly. However, the team were all Java experts and had no desire at all to learn about database administration. (To be frank, they had no time to learn either; it was the usual situation of them each having 75 hours of work every week, as management thought just shouting would get things fixed rather than hiring enough people.) I strongly suspected that they would forget about the baselines, and if they had a problem they would be confused as to what was going on. So I fixed the key SQL statements with a set of hints to force both the overall structure of the execution plans and which indexes to use – and said over and over and over and over that if they ever changed indexes or migrated to a later version of Oracle, those hints would need reviewing. They were, in effect, part of their code base. A big advantage of the hints was that the team would see them in their code and be reminded of what had been done. They seemed happy with that.

My point is, sometimes the “best” solution is not the correct one, even when you are keeping within the walls of the computing department(s). Sometimes you need to think about who you are giving the solution to and change the solution accordingly.

Successful Evangelism

I’ve been asked what it takes to be a successful evangelist, and I realize that pinning down what makes one successful is like holding sand in your hands: no matter how tightly you clench your fists, it’s difficult to contain the grains.

The term evangelist is one that receives either very positive or very negative responses.  I’m not a fan of the term, but whether you use it or call them advocates, representatives, or influencers, it doesn’t matter: they are essential to the business, product or technology they become the voice for.

Who do I view as successful evangelists in the communities that I am part of?

I’m sure there are a number of folks I’ve missed whom I also admire as I interact with them and observe their contributions, but these are a few that come to mind when I think of fellow evangelists.

What makes an evangelist successful?  It may not be what you think.

1. It’s Not Just About the Company

Most companies think they hire an evangelist to promote and market the company, and yet when all you do is push out company info and company marketing, people STOP listening to you.  What you say, do, and are interested in should drive people to want to know more about you, including the company you work for and what that company does.

All of these folks talk about interests outside of work.  They post about their lives and their interests, and they contribute to their communities.  This is what it means to be really authentic and to set an example.  People want to be more like them because they see the value these folks add to the world beyond just talking points.

2.  They’re Authentic

Authenticity is something most find very elusive.  If you’re just copying what another does, there’s nothing authentic about that.  There’s nothing wrong with finding a tip or tidbit that someone else is doing and adopting it, but it has to WORK for you.  I was just part of a conversation yesterday where Jeff and I discussed that he doesn’t use Buffer (a social media scheduling tool), whereas I live by it.  It doesn’t work for Jeff, and there’s nothing wrong with that.  We are individuals, and what makes us powerful evangelists is that we each figured out what works for us.

3.  In the Know

As a technical evangelist, you can’t just read the docs and expect to be received well.  Theory is not practice, and I’ve had a couple of disagreements with managers explaining why I needed to work with the product.  I’ve had to battle for hardware to build out what I’ve been expected to talk on; only once did I not fight for it, and I paid for that drastically.  I won’t write on a topic unless I can test it out on my own.  Being in the trenches provides a point of view no document can provide.

Documentation is secondary to experience.

4.  Your View is Outward

This is a difficult one for most companies when they’re trying to create evangelists from internal employees.  Those that are deeply involved at the company level may interact well with others, but won’t redirect to an external view.  I’ve had people ask me why my husband isn’t doing as much as I am in the community.  Due to his position, he must be more internally and customer facing.  My job is very separate from those of my fellow employees: I must always be focused outward and interact at least 95% of my time with the community.  You’ll notice all of the folks listed are continually interacting with people outside of their company and are considered very “approachable.”

We volunteer our time in the community: user groups, boards of directors, events, and partnerships with companies.  We socialize, as we know our network is essential to the companies we represent.

5.  We Promote

I wish I did more public promotion like I see some of these other folks do.  I’m like my parents: I stand up for others and support them on initiatives and goals.  I do a lot of mentoring, but less of it when I’m blogging.  My mother was never about empty compliments, and I did take after her on this.  I’m just not very good at remembering to compliment people on social media, and I feel I lack in this area, but I continually watch others do this for folks in the community, and it is so important.

We make sure to work with those that may need introductions in our network, support them in the community, and reach out to offer our help.  In the public view this is quite transparent, so when others pay this forward or return the favor, it can appear that people just bend over backwards for us; but we have often been there for the folks in question in the past, with no expectations, and people remembered this.

We do promote our company, but for the right reasons: the company has done something good for the community, or has something special going on.  Rarely do we push out pure marketing, as it just doesn’t come across very well from us.  It’s not authentic.

Additional Recommendations

  • Refrain from internet arguments and social media confrontations

I’m not saying to be a pushover.  I literally have friends muted and even blocked.  There’s nothing wrong with NOT being connected to individuals that have very different beliefs or social media behavior.  You shouldn’t take it personally: this is professional, and you should treat it as such.

You may find (especially for women and people of color) that certain individuals will challenge you on ridiculous topics and battle you on little details.  This is just the standard over-scrutinizing that we go through, and if it’s not too bad, I tell people to just ignore it and not respond.  If it escalates, don’t hesitate to mute or block the person.  You’re not there to entertain them, and by removing your contributions from their feed – “out of sight, out of mind” – you offer peace to both of you.


Postgres vs. Oracle access paths III – Partial Index

In the previous post I said that an Index Only Access needs to find all rows in the index. Here is a case where, with similar data, Postgres can find all rows in its index but Oracle needs additional considerations.

In the previous post I executed:
select sum(n) from demo1
The execution plan was:

Aggregate (cost=295.29..295.30 rows=1 width=8) (actual time=2.192..2.193 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=30
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..270.29 rows=10000 width=4) (actual time=0.150..1.277 rows=10000 loops=1)
Output: n
Heap Fetches: 0
Buffers: shared hit=30

Basically, this reads all values of the column N and then aggregates them into the sum.
If I remove the SUM(), I have only the part that reads all values of N:

explain (analyze,verbose,costs,buffers) select n from demo1 ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n on public.demo1 (cost=0.29..270.29 rows=10000 width=4) (actual time=0.150..1.284 rows=10000 loops=1)
Output: n
Heap Fetches: 0
Buffers: shared hit=30
Planning time: 0.440 ms
Execution time: 1.972 ms

Oracle

This sounds logical. Now let’s run the same query, a simple ‘select n from demo1’, in Oracle:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID ad4z7tpt0dkta, child number 0
-------------------------------------
select /*+ */ n from demo1
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 397 (100)| 10000 |00:00:00.01 | 1451 |
| 1 | TABLE ACCESS FULL| DEMO1 | 1 | 10000 | 397 (0)| 10000 |00:00:00.01 | 1451 |
--------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Here the access path is different: a full table scan instead of an index only access (Index Fast Full Scan). It is not a cost decision: if we try to force an index access with the INDEX_FFS() or INDEX() hints, the query will still do a full table scan. The reason is that an index only access is possible only if all columns and all rows are present in the index. But Oracle does not always index all rows: an Oracle index has no entry for rows where all the indexed columns are null.
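To make this concrete, here is a minimal sketch (assuming the demo1 table from this series, and that its other columns accept nulls – this is my illustration, not from the original post). A row whose only indexed column is null gets no entry in DEMO1_N, so a query that must count every row cannot rely on the index alone:

-- hypothetical extra row: the indexed column is null,
-- so this row has no entry in the index DEMO1_N
insert into demo1(n) values (null);
commit;
-- count(*) must also count that row: the optimizer falls back to a full table scan
explain plan for select count(*) from demo1;
select * from table(dbms_xplan.display);
-- count(n) ignores nulls, so the index is sufficient again
explain plan for select count(n) from demo1;
select * from table(dbms_xplan.display);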

Where n is not null

If I run the same query with the purpose of showing only non-null values, with a ‘where n is not null’ predicate, then an index only access is possible:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 2gbjpw5u0v9cw, child number 0
-------------------------------------
select /*+ */ n from demo1 where n is not null
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 (100)| 10000 |00:00:00.01 | 28 |
| 1 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 7 (0)| 10000 |00:00:00.01 | 28 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("N" IS NOT NULL)

Constraints

An alternative, if we know that we will never have null values here, is to give the optimizer the information that there are no null values in the column N.
In Oracle:
alter table demo1 modify n not null;
This is the equivalent of the PostgreSQL:
alter table demo1 alter column n set not null;
Then, in addition to enforcing the constraint, this informs the optimizer that there are no null values and that all rows can be found in the index:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID ad4z7tpt0dkta, child number 0
-------------------------------------
select /*+ */ n from demo1
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 (100)| 10000 |00:00:00.01 | 28 |
| 1 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 7 (0)| 10000 |00:00:00.01 | 28 |
-------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Additional columns

Even if the column can have some null values, it is easy to index the null values in Oracle, simply by adding a non-null column or expression. And if you don’t need an additional column, you can even add a constant, as in the following index definition:

create unique index demo1_n on demo1(n,0);

This works because every index entry now has at least one non-null value. But looking at the buffers, you can see that this additional byte (the constant 0 is stored in 1 byte) has a little overhead (31 blocks read here instead of 28):

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID ad4z7tpt0dkta, child number 0
-------------------------------------
select /*+ */ n from demo1
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 8 (100)| 10000 |00:00:00.01 | 31 |
| 1 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 8 (0)| 10000 |00:00:00.01 | 31 |
-------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Oracle Partial Indexes

In Oracle, any index in which all columns are nullable is a partial index: not all rows are indexed, and an index access is possible only if the WHERE clause, or a constraint, guarantees that we don’t need the non-indexed rows. Combined with expressions, this can be a way to implement partial indexes: the expression returns null for the rows that should not be indexed. Oracle even provides computed columns (aka virtual columns) so that the expression does not have to be coded in the WHERE clause of the query.

As an example with expressions, the following index has entries only for the values lower than or equal to 10:
create index demo1_n_top10 on demo1(case when n<=10 then n end);

However, to use it, we must mention the expression explicitly:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 863drbjwayrt7, child number 0
-------------------------------------
select /*+ */ (case when n<=10 then n end) from demo1 where (case when
n<=10 then n end)<=5
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 4 |00:00:00.01 | 2 |
|* 1 | INDEX RANGE SCAN| DEMO1_N_TOP10 | 1 | 5 | 1 (0)| 4 |00:00:00.01 | 2 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("DEMO1"."SYS_NC00004$"<=5)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "DEMO1"."SYS_NC00004$"[NUMBER,22]

We can see that, internally, a virtual column (“SYS_NC00004$”) has been created for the indexed expression, and it is used for both the predicate and the projection, which use the same expression. There is another possibility with the ‘partial index’ feature introduced in 12c, but it does not have the flexibility of a predicate: it is based on partitioning, where only some partitions are indexed.
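For reference, the 12c feature looks roughly like this (a sketch with hypothetical partition names, assuming demo1 were partitioned):

-- choose which partitions participate in partial indexes
alter table demo1 modify partition p_old indexing off;
alter table demo1 modify partition p_new indexing on;
-- only the partitions with INDEXING ON get entries in this index
create index demo1_n_partial on demo1(n) indexing partial;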

Postgres Partial Indexes

Postgres does not need those workarounds. An index contains all rows, including null entries, and partial indexes can be defined with a WHERE clause:
create index demo1_n_top10 on demo1(n) where n<=10;

No need to change the query. As long as the result can come from the partial index, we can use the column without an expression on it:

explain (analyze,verbose,costs,buffers) select n from demo1 where n<=5 ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n_top10 on public.demo1 (cost=0.14..4.21 rows=4 width=4) (actual time=0.114..0.114 rows=5 loops=1)
Output: n
Index Cond: (demo1.n <= 5)
Heap Fetches: 0
Buffers: shared hit=2
Planning time: 0.557 ms
Execution time: 0.129 ms

Here the smaller partial index (demo1_n_top10) has been chosen by the query planner.

As you can see, I did not use exactly the same condition: the query planner understood that n<=5 (in the WHERE clause) is a subset of n<=10 (in the index definition). However, if the predicate is too different, it cannot use the index:

fpa=# explain (analyze,verbose,costs,buffers) select n from demo1 where 2*n<=10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n on public.demo1 (cost=0.29..320.29 rows=3333 width=4) (actual time=0.020..1.086 rows=5 loops=1)
Output: n
Filter: ((2 * demo1.n) <= 10)
Rows Removed by Filter: 9995
Heap Fetches: 0
Buffers: shared hit=30

Here, instead of an “Index Cond” we have a simple “Filter”: the Index Only Scan has read all the rows, and they were filtered afterward (“Rows Removed by Filter”).

Index condition

With the VERBOSE option of EXPLAIN we see the condition used by the index access:
Index Cond: (demo1.n <= 5)
‘Index Cond’ is not a simple filter removing rows after an operation; it is the condition used for fast access to the index entries in the sorted index structure. We have the equivalent in Oracle with the ‘+predicate’ format of dbms_xplan:

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("N"<=5)

Before going further on index access for WHERE clause predicates, the next post will show the major characteristic of indexes (besides the fact that they store a redundant subset of columns and rows): they are maintained sorted, and they may return the resulting rows in order.

 

This article, Postgres vs. Oracle access paths III – Partial Index, appeared first on Blog dbi services.

Random null value in plsql generated data

I have added another small feature to the testdata library. The ability to create random data is good, but sometimes you also need “gaps” in that data. So I made it possible to randomly generate null instead of a value for any generated field: simply surround the generator function in parentheses and, after the parentheses, add the percentage chance of a null value.
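Leaving the library’s exact syntax aside, the underlying idea can be sketched as a small standalone PL/SQL wrapper (hypothetical names, not the library’s actual API):

create or replace function maybe_null(
  p_value    in varchar2,
  p_null_pct in number  -- chance of returning null, 0 to 100
) return varchar2
is
begin
  -- dbms_random.value returns a number in [0,1)
  if dbms_random.value * 100 < p_null_pct then
    return null;
  end if;
  return p_value;
end;
/
-- roughly 30% of the generated values should come back null
select maybe_null(dbms_random.string('U',8), 30) as val
from dual connect by level <= 10;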

Rebuilding Indexes

One of the special events that can make it necessary to rebuild an index is the case of the “massive DML”, typically a bulk delete that purges old data from a table. You may even find cases where it’s a good idea to mark a couple of your indexes as unusable before doing a massive delete and then rebuild them after the delete.
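The pattern looks something like this (a sketch with hypothetical table and index names; whether it pays off depends on how much you delete):

-- stop maintaining the indexes during the purge
alter index t1_i1 unusable;
alter index t1_i2 unusable;
-- the bulk delete no longer does row-by-row index maintenance
delete from t1 where created < add_months(sysdate, -36);
commit;
-- rebuild once, from the remaining data
alter index t1_i1 rebuild online;
alter index t1_i2 rebuild online;

Note that while the indexes are unusable, sessions depend on skip_unusable_indexes (true by default since 10g) and will fall back to other access paths.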

Despite the fact that a massive delete is an obvious special case, in many cases it’s still not necessary to worry about a rebuild afterwards, because the space freed by the delete will be smoothly reused over time with very little variation in performance. There is, however, one particular feature that increases the probability of a rebuild becoming necessary: global (or globally partitioned) indexes on partitioned tables. The problem (and the absence of the problem in non-partitioned tables) lies in the nature of the rowid.

For non-partitioned tables, and partitioned tables with local indexes, the rowid stored in an index is (assuming we’re thinking only of heap tables) stored as a sequence of 6 bytes consisting, in order, of: (tablespace relative file number, block number within file, row number within block). If the table is non-partitioned, or if this is an index segment from a locally partitioned index, all the index entries will be pointing to the same table segment and Oracle knows which segment that is from the data dictionary information – so Oracle can derive the data_object_id of the table segment and convert the tablespace relative file number into the absolute file number to navigate to the right row in the table.

When the index is global or globally partitioned, any index entry may point to any of the table’s segments, so the rowid stored in the index is expanded by a further 4 bytes to hold the data_object_id of the table segment it points to – and the data_object_id is the leading component: (data_object_id, tablespace relative file number, block number within file, row number within block). Think about what this means when you start to drop “old” partitions and add new partitions, and compare it with what happens when you simply delete a large volume of old data from a table and start inserting new data. There’s an important difference in the way the indexes will evolve.
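You can decompose these components for yourself with the dbms_rowid package (an illustrative query against any heap table, here a hypothetical t1):

select rowid,
       dbms_rowid.rowid_object(rowid)       as data_object_id,
       dbms_rowid.rowid_relative_fno(rowid) as rel_file_no,
       dbms_rowid.rowid_block_number(rowid) as block_no,
       dbms_rowid.rowid_row_number(rowid)   as row_no
from t1 where rownum <= 5;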

Purging data

When you delete a large volume of data from a (simple, heap) table you will create a lot of empty space in a lot of existing table blocks. If that delete is supposed to get rid of “old” data (and to keep the description simple we’ll assume it’s the first time you’ve done this) it’s very likely that the delete will result in lots of empty blocks near the start of the table – rows that were inserted at the same time will tend to be physically close to each other in the table. This means that future inserts will soon start to reuse those table blocks. Think about what this means for index entries – especially for non-unique keys.

Assume you have 100 rows with the value ‘XXX’ for an indexed column. Breaking the rowid into its component parts, the index entries will be (‘XXX’,{file_id, block_id, row_number}).  Now assume you delete the oldest 10 rows and then, over time, insert 10 more rows with the same key value. You’ve deleted the 10 index entries with the lowest values for (file_id, block_id), but the space that has become available in the table will be in and around exactly that range of blocks – so the new index entries will probably end up looking very similar to the deleted ones, inserted in and around the existing index entries for the value ‘XXX’. Over time, the index is unlikely to allocate much new space.

Now think about what happens when your table is partitioned but the index is global; your index entries are (‘XXX’,{data_object_id, file_id, block_id, row_number}). When you drop the oldest partition you will probably[1] delete all the index entries with the lowest data_object_id. When you start inserting new rows for ‘XXX’, the new table partition will have a data_object_id that is higher than any previous data_object_id – which means you’re going to be inserting rows into the right-hand (high-value) edge of this section of the index. In some cases – typically those where you have a couple of leaf blocks per key value – the index may end up growing significantly, because the insertion point for rows in the new partition isn’t in the leaf block with the available space; and it won’t be until you’ve done a few more bulk deletes, and the leaf blocks relating to the oldest table partitions have become completely empty, that the space can be reused.

An example of this type of behaviour probably appeared on the OTN database forum quite recently.  Of course, there are various reasons why indexes can become inefficient, and the degree of inefficiency may only become noticeable over a fairly long period of time; moreover there are various reasons why global indexes are a little problematic, and various reasons why a bulk delete (which is what executing “alter table drop partition” does to a global index) has unpleasant side effects dependent somewhat on the number (and size) of the partitions and on how many you try to drop in one go.

There’s not a lot you can do about this quirk of global indexes, but it’s always worth taking extra care with partitioned tables and focusing even more carefully on a strategic review of indexes:

  • Does this index really need to exist at all?
  • Could this index be replaced by a selective function-based index?
  • Does this index really need to be global / globally partitioned?
  • How big is this index compared to the size it ought to be?
  • Should this index be (basic) compressed?
  • Is this index likely to be disrupted by a historic purge – or is there another reason for its undesirable behaviour?

 

[1] “probably delete entries with the lowest data_object_id” – I have to say “probably” because if you’ve executed a “move partition” at any time, a new data_object_id will have been generated for the partition, so the oldest partition could, in principle, have the highest data_object_id. The issue of changing data_object_ids brings a whole new level of complexity to global indexes – but fortunately only in a few special cases.

 

 

Postgres vs. Oracle access paths II – IndexOnlyScan

In the previous post I explained a sequential scan that happened by accident: my query needed only one column, which was indexed, and I expected it to read the index rather than the table. I also had to hint the Oracle example to get the same full scan, because the Oracle optimizer chooses the index access over the table scan in that case. This is where I learned a big difference between Postgres and Oracle: they both use MVCC to query without locking, but Postgres MVCC is for table rows (tuples) only, whereas Oracle MVCC covers all blocks – indexes as well as tables.

So this second post is about Index Only Scan and the second constant you find in the documentation for the query planner:
random_page_cost (floating point)
Sets the planner’s estimate of the cost of a non-sequentially-fetched disk page. The default is 4.0.


I am here in the situation left by the previous post: table and index created, and a query that did a sequential scan on the table:

explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1554.00..1554.01 rows=1 width=8) (actual time=17.430..17.430 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=1429
-> Seq Scan on public.demo1 (cost=0.00..1529.00 rows=10000 width=4) (actual time=0.031..13.011 rows=10000 loops=1)
Output: n, a, x
Buffers: shared hit=1429
Planning time: 1.791 ms
Execution time: 17.505 ms

Index Only Scan

I want to understand why the query planner did not choose an index-only access. This is where hints are useful: force a plan that the optimizer has not chosen, in order to check whether that plan is possible, and then check its cost:

/*+ IndexOnlyScan(demo1) */
explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1727.29..1727.30 rows=1 width=8) (actual time=5.424..5.425 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=1429 read=29
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..1702.29 rows=10000 width=4) (actual time=0.177..4.613 rows=10000 loops=1)
Output: n
Heap Fetches: 10000
Buffers: shared hit=1429 read=29
Planning time: 0.390 ms
Execution time: 5.448 ms

From there you see that an Index Only Scan is possible but more expensive: its estimated cost is higher than the Seq Scan’s (cost=0.29..1702.29 instead of cost=0.00..1529.00), and the execution statistics show that I’ve read the 1429 table pages in addition to the 29 pages of the index.

From the hit/read statistics we can note that the create table left all the table pages in the buffer cache, but this is not the case for the create index. That’s another story, though. My concern is why an index only access reads all the table blocks in addition to the index ones, which brings the cost 1727.30-1554.01=173.29 higher than the sequential scan.

The clue is in the line showing that all my rows were fetched from the heap pages, i.e. the table: Heap Fetches: 10000

Tuple visibility

In ACID databases, a modification must not be visible to others until the transaction completes (commit). There are two ways to achieve that. The first is to read the latest version of the data, locking what you read in share mode so that no concurrent update can happen. The other is to query a previous version of the data (MVCC – Multi Version Concurrency Control) in which uncommitted changes are not visible. Both Oracle and Postgres use MVCC, which is great because you can run transactions and queries on the same database. But they do the versioning at different levels.

Oracle MVCC is physical, at block level, so everything is versioned: tables as well as indexes, with their transaction information (ITL) which, with the help of the transaction table, gives all the information about visibility: committed or not, and with which commit SCN. With this architecture, a modified block can be written to disk even with uncommitted changes, and there is no need to revisit it later once the transaction is committed.

Postgres MVCC is logical, at row (‘tuple’) level: a new version is a new row, and committed changes set the visibility of the row. The table row is versioned, but the index entry is not. If you access by index, you still need to go to the table to see whether the row is visible to you. This is why I had heap fetches here, and why the table blocks were read.

This explains why the cost of the Index Only Scan is high here: in addition to about 30 index blocks, I’ve read about 1429 table blocks. But it can be worse. For each index entry, and I have 10000 of them, we need to go to the table row, which is exactly what the 10000 heap fetches are. I’m lucky because I have a very good clustering factor: I created the table with increasing values for the column N (generated by generate_series). With a bad clustering factor (physical storage of rows in the table not correlated with the order of the index) you would see up to 10000 additional shared hits. Thankfully, the query planner estimates this and switched to the table scan, which is cheaper in this case.

Vacuum and Visibility Map

Always going to the table rows to check whether they are committed would make an index-only access systematically more expensive than a table scan. The Postgres vacuum process therefore maintains a Visibility Map: a bitmap of pages that have been vacuumed and have no more tuples to vacuum, which means that all rows in those pages are visible to all transactions. When there is an update on a page, the flag is unset, and it remains unset until the modification is committed and the vacuum runs on the page again. This visibility flag is used by the Index Only Scan to know whether it needs to go to the heap page.

Let’s run the vacuum and try again the same query:

vacuum demo1;
VACUUM
 
explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=295.29..295.30 rows=1 width=8) (actual time=2.192..2.193 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=30
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..270.29 rows=10000 width=4) (actual time=0.150..1.277 rows=10000 loops=1)
Output: n
Heap Fetches: 0
Buffers: shared hit=30
Planning time: 0.450 ms
Execution time: 2.213 ms

Here, without any hint, the query planner has chosen the Index Only Scan, which is now less expensive than a Seq Scan: cost=0.29..270.29.
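As a side note, the visibility map itself can be inspected with the pg_visibility contrib extension (a quick check, assuming the extension is available). After the vacuum, the pages of demo1 should be flagged all-visible, which is what allows the zero heap fetches above:

create extension if not exists pg_visibility;
-- one row per page; all_visible=true means an Index Only Scan
-- can skip the heap fetch for that page
select blkno, all_visible, all_frozen
from pg_visibility_map('demo1')
limit 5;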

Cost of Index Only Scan

The initial cost of 0.29 is calculated from cpu_operator_cost, which defaults to 0.0025; this means that about 0.29/0.0025=116 operations were charged here. This cost is minimal and I don’t go into its details.

[Screenshot: Index Only Scan costing, Postgres vs. Oracle]

Then, to get the rows, we have to:

  • read 30 blocks from the index. These are costed as random reads (random_page_cost=4), so the cost for all rows is 4*30=120
  • process the index entries (with cpu_index_tuple_cost=0.005) and then the cost for all 10000 rows is 0.005*10000=50
  • process the result rows (with cpu_tuple_cost=0.01) and then the cost for all 10000 rows is 0.01*10000=100

This brings the cost to the total of 0.29+120+50+100=270.29.

On top of that, the SUM(N) is costed exactly as in the previous post on Seq Scan: 25 (cpu_operator_cost=0.0025 for 10000 rows), counted as start-up cost because the sum is returned only once all rows have been processed, plus an additional 0.01 for the result row.
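The planner constants used in this arithmetic can be checked from pg_settings:

select name, setting
from pg_settings
where name in ('random_page_cost','cpu_index_tuple_cost','cpu_tuple_cost','cpu_operator_cost');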

Oracle

In the previous post I used the FULL() hint to compare Oracle’s Full Table Scan to the Postgres Seq Scan, but by default Oracle chooses an index only access here, because the index covers all the rows and columns we need.

All columns that we need:

In the previous post we saw the column projection (from the +projection format of dbms_xplan):

Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) SUM("N")[22]
2 - (rowset=256) "N"[NUMBER,22]

I need only the column N from the table DEMO1, and this column is in the index DEMO1_N.

All rows that we need:

In Oracle, an index does not have an entry for every row, but only for rows where at least one of the indexed columns is not null. Here, because we have no WHERE clause predicate on N, and because we have not declared the column N as NOT NULL, an access by index might not return all rows. However, the SUM() function does not need to know about the null values, because they don’t change the sum, so the optimizer can safely choose an index only access.

Here is the query without hints:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6z194712fvcfu, child number 0
-------------------------------------
select /*+ */ sum(n) from demo1
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 (100)| 1 |00:00:00.01 | 26 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 26 |
| 2 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 7 (0)| 10000 |00:00:00.01 | 26 |
--------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) SUM("N")[22]
2 - "N"[NUMBER,22]

This plan looks very similar to the Postgres one after the vacuum: 26 buffers, which is approximately the number of blocks in my index here. However, Oracle does not have the ‘vacuum’ requirement, because its MVCC applies to the index as well, and Oracle does not need to go to the table to roll back uncommitted changes. But there is something else here. If you remember the previous post, the Oracle cost=1 is equivalent to the cost of a random read (single block), and the cost of reading one block through a larger I/O (multiblock read) is, with default statistics, about 0.278 times cheaper. Here, 7/26=0.2692, which shows that the cost is based on multiblock reads. Oracle can read indexes with INDEX FAST FULL SCAN the same way it reads tables with FULL TABLE SCAN: with larger I/O. We don’t need any ordering of the rows here, because we just compute the sum, so we don’t need to follow the chain of leaf blocks scattered within the index segment: we just read all the blocks as they come, with fast I/O.

Index Fast Full Scan is possible in Oracle because MVCC is at block level for indexes as well as tables: you can just read the blocks as of the point in time of the query, without being concerned by concurrent operations that update the index entries or split the blocks. The Postgres Index Only Scan is more limited because MVCC covers tables only: it must scan the index in the order of the leaves, and it must read the visibility map and maybe the table pages.

In Oracle, an index can be used to vertically partition a table, as redundant storage of a few columns, in order to avoid full table scans on large rows: queries can avoid reading the table entirely when the index covers all the required rows and columns. We will see more about the ‘all rows’ requirement in the next post.

 

This article, Postgres vs. Oracle access paths II – IndexOnlyScan, appeared first on Blog dbi services.

Oracle Cloud: script to stop all PaaS services

With metered cloud services, keeping all your instances running may become expensive. The goal is to start them only when you need them. Here is a script that stops all instances you have on the Oracle Cloud PaaS. You can schedule it, for example, to stop them at the end of the business day, or when they have not been active for a long time. The script calls the REST API with curl, parses the JSON output with jq, and explains the HTTP statuses with links.

In the first part of the script I set the variables. Set them to your user:password, identity domain, cloud service URL, and ssh key:

u="MyEmail@Domain.net:MyPassword"
i=a521642
r=https://dbcs.emea.oraclecloud.com
k="ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxAEm1WHYbJa50t61YhM53u4sljjSFGK458fgdljjkNqfihcRxSf2ENw6iaYhiBTPogG9IDaEqW+SbwpororD2/bep16/hHybGswD34jU7bf9kaaKi5gOgASChid4e322zrnwOtlzfHiiquhiUDgLwpQxCYVV5zU1RQ2NS3F3a45bepqkn/GuPY5x/KSn576+7HBCYVbao/PTwZAeTVbo6Xb4ZQQrOIsLJxxDCQsr0/g7ZS8/OJHt8kotydu13n3rANB2y312XtTsW9mAwpfXuCuqDM5+dIjUdmtflkYtqsfrqSpLevVfVt1L7hqo+JGo7OBO0doVs6lQSCblZhYHh Me@MyLaptop"

Here is the script. It starts by downloading the CA certificate bundle if it is not already there. Then it queries all services that are not stopped and stops them. Finally, the last line displays the status of all services.


# download the CA certificate bundle (once) to verify the https connections
[ -f cacert.pem ] || curl --remote-name --time-cond cacert.pem https://curl.haxx.se/ca/cacert.pem
 
# list all services that are not already in the "Stopped" state
for s in $( curl -s --request GET --cacert cacert.pem --user $u --header "X-ID-TENANT-NAME:$i" $r/paas/service/dbcs/api/v1.1/instances/$i | jq -r '.services[]|select(.status!="Stopped")|.service_name' )
do
# call the 'Stop service' REST API and get the http status
httpstatus=$(curl --include --request POST --cacert cacert.pem --user $u --header "X-ID-TENANT-NAME:$i" --header "Content-Type:application/json" --data '{"lifecycleState":"Stop"}' $r/paas/service/dbcs/api/v1.1/instances/$i/$s | awk '{print >"/dev/stderr"} /^HTTP/{print $2}')
# look for http status in documentation
links -dump -width 300 https://docs.oracle.com/en/cloud/paas/java-cloud/jsrmr/Status%20Codes.html | grep -B 1 -A 1 " $httpstatus "
done
 
# give the stop requests a moment, then display the status of all services
sleep 1
curl -s --request GET --cacert cacert.pem --user $u --header "X-ID-TENANT-NAME:$i" $r/paas/service/dbcs/api/v1.1/instances/$i | jq .

The script requires:

  • curl to call the REST API
  • jq to format and extract the returned JSON
  • links to get the HTTP status description from the documentation

The Cloud is all about automation, and the REST API makes automation very easy from the command line or a script.

 

This article, Oracle Cloud: script to stop all PaaS services, appeared first on Blog dbi services.

Basic Index Compression Made Simple (It Ain’t Easy)

I’ve discussed Index Compression a number of times, as it’s an excellent way of minimizing the size of indexes without unnecessary, expensive index rebuilds. One of the nice features of Advanced Index Compression is that not only does it potentially compress indexes more effectively than is possible with Basic Index Compression, but it also eliminates […]