Search

Top 60 Oracle Blogs

Recent comments

Oakies Blog Aggregator

nVision Bug in PeopleTools 8.55/8.56 Impacts Performance

I have recently come across an interesting bug in nVision that has a significant performance impact on nVision reports in particular and can impact the database as a whole.

Problem nVision SQL

This is an example of the problematic SQL generated by nVision.  The problem is that all of the SQL looks like this. There is never any group by clause, nor any grouping columns in the select clause in from of the SUM().

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">SELECT SUM(A.POSTED_BASE_AMT) 
FROM PS_LEDGER A, PSTREESELECT10 L2, PSTREESELECT10 L1
WHERE A.LEDGER='ACTUAL' AND A.FISCAL_YEAR=2018 AND A.ACCOUNTING_PERIOD BETWEEN 1 AND 8
AND L2.SELECTOR_NUM=159077 AND A.ACCOUNT=L2.RANGE_FROM_10
AND (A.BUSINESS_UNIT='10000')
AND L1.SELECTOR_NUM=159075 AND A.DEPTID=L1.RANGE_FROM_10
AND A.CURRENCY_CD='GBP' AND A.STATISTICS_CODE=' '

Each query only returns a single row, that only populates a single cell in the report, and therefore a different SQL statement is generated and executed for every cell in the report.  Therefore, more statements are parsed and executed, and more scans of the ledger indexes and look-ups of the ledger table and performed.  This consumes more CPU, more logical I/O.

Normal nVision SQL

This is how I would expect normal nVision SQL to look.  This example, although obfuscated, came from a real customer system.  Note how the query is grouped by TREE_NODE_NUM from two of the tree selector tables, so this one query now populates a block of cells.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">SELECT L2.TREE_NODE_NUM,L3.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT) 
FROM PS_LEDGER A, PSTREESELECT05 L2, PSTREESELECT10 L3
WHERE A.LEDGER='S_UKMGT'
AND A.FISCAL_YEAR=2018
AND A.ACCOUNTING_PERIOD BETWEEN 0 AND 12
AND (A.DEPTID BETWEEN 'A0000' AND 'A8999' OR A.DEPTID BETWEEN 'B0000' AND 'B9149'
OR A.DEPTID='B9156' OR A.DEPTID='B9158' OR A.DEPTID BETWEEN 'B9165' AND 'B9999'
OR A.DEPTID BETWEEN 'C0000' AND 'C9999' OR A.DEPTID BETWEEN 'D0000' AND 'D9999'
OR A.DEPTID BETWEEN 'G0000' AND 'G9999' OR A.DEPTID BETWEEN 'H0000' AND 'H9999'
OR A.DEPTID='B9150' OR A.DEPTID=' ')
AND L2.SELECTOR_NUM=10228
AND A.BUSINESS_UNIT=L2.RANGE_FROM_05
AND L3.SELECTOR_NUM=10231
AND A.ACCOUNT=L3.RANGE_FROM_10
AND A.CHARTFIELD1='0012345'
AND A.CURRENCY_CD='GBP'
GROUP BY L2.TREE_NODE_NUM,L3.TREE_NODE_NUM

The Bug

This Oracle note details an nVision bug:

"UPTO SET2A-C Fixes - Details-only nPlosion not happening for Single Chart-field nPlosion Criteria.
And also encountered a performance issue when enabled details-only nPlosion for most of the row criteria in the same layout
Issue was introduced on build 8.55.19.
Condition: When most of the row filter criteria enabled Details-only nPlosion. This is solved in 8.55.22 & 8.56.07.
UPTO SET3 Fixes - Performance issue due to the SET2A-C fixes has solved but encountered new one. Performance issue when first chart-field is same for most of the row criteria in the same layout.
Issue was introduced on builds 8.55.22 & 8.56.07.
Condition: When most of the filter criteria’s first chart-field is same. The issue is solved in 8.55.25 & 8.56.10."

In summary

  • Bug introduced in PeopleTools 8.55.19, fully resolved in 8.55.25.
  • Bug introduced in PeopleTools 8.56.07, fully resolved in 8.56.10.

Consumer Group Mapping Rules Use Pattern Matching from 12.1

I recently noticed a small, but I think significant, change in the way consumer group mapping rules behave from Oracle 11.2.04.  Session attributes can be matched to resource groups using LIKE expressions and simple regular expressions specified in the matching rules, though only for certain attributes.
(Updated 12.11.2019) I am grateful to Mikhail Velikikh for his comment.  It depends on which version of Oracle's documentation for 11.2 you read.  Pattern matching does work in 11.2.0.4 for the attributes listed in the 12.1 documentation. My testing indicates that pattern matching does not happen in 11.2.0.3.
You cannot pattern match the SERVICE_NAME in 11.2.  The attribute value is validated against the list of valid services.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">exec DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(DBMS_RESOURCE_MANAGER.SERVICE_NAME, 'WIBBLE', 'MYGROUP1');
ORA-00042: Unknown Service name WIBBLE

However, you can pattern match SERVICE_NAME in 12.1, although SERVICE_NAME is not in the list of attributes for which the documentation says pattern matching is available.  This may be documentation bug (see Oracle support note 1992704.1).
The parameters to procedure DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING are:

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
attribute IN VARCHAR2,
value IN VARCHAR2,
consumer_group IN VARCHAR2 DEFAULT NULL);

11.2 Documentation (c)2010

You use the SET_CONSUMER_GROUP_MAPPING procedure to map a session attribute/value pair to a consumer group. The parameters for this procedure are the following:

Parameter
Description
attribute The session attribute type, specified as a package constant
value The value of the attribute
consumer group The consumer group to map to for this attribute/value pair


11.2 Documentation (c)2015

This procedure adds, deletes, or modifies entries that map sessions to consumer groups, based on the session's login and runtime attributes.

Parameter
Description
attribute Mapping attribute to add or modify. It can be one of the Constants listed.
value Attribute value to match. This includes both absolute mapping and regular expressions.
consumer group Name of the mapped consumer group, or NULL to delete a mapping


Usage Notes 
If no mapping exists for the given attribute and value, a mapping to the given consumer group will be created. If a mapping already exists for the given attribute and value, the mapped consumer group will be updated to the one given. If the consumer_group argument is NULL, then any mapping from the given attribute and value will be deleted.
The subprogram supports simple regex expressions for the value parameter. It implements the same semantics as the SQL 'LIKE' operator. Specifically, it uses '%' as a multi-character wildcard and '_' as a single character wildcard. The '\' character can be used to escape the wildcards. Note that wildcards can only be used if the attribute is one of the following:

  • CLIENT_OS_USER
  • CLIENT_PROGRAM*
  • CLIENT_MACHINE
  • MODULE_NAME
  • MODULE_NAME_ACTION
  • SERVICE_MODULE
  • SERVICE_MODULE_ACTION 

*Consumer group mapping comparisons for DBMS_RESOURCE_MANAGER.CLIENT_PROGRAM are performed by stripping the @ sign and following characters from V$SESSION.PROGRAM before comparing it to the CLIENT_PROGRAM value supplied.

Autonomous Database Example

The mappings are reported in the view DBA_RSRC_GROUP_MAPPINGS. An example of using a LIKE expression in a mapping can be found in the standard configuration of autonomous cloud databases. There are 5 standard mappings for 5 services to 5 consumer groups. The automatically created service names are prefixed with the database name, then HIGH, LOW, MEDIUM, TP or TPURGENT., and are suffixed .atp.oraclecloud.com.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">SELECT * FROM v$services ORDER BY 1;

NAME
----------------------------------------------------------------
KZPKSBZK3RK4G1X_GOFASTER1A_high.atp.oraclecloud.com
KZPKSBZK3RK4G1X_GOFASTER1A_low.atp.oraclecloud.com
KZPKSBZK3RK4G1X_GOFASTER1A_medium.atp.oraclecloud.com
KZPKSBZK3RK4G1X_GOFASTER1A_tp.atp.oraclecloud.com
KZPKSBZK3RK4G1X_GOFASTER1A_tpurgent.atp.oraclecloud.com
kzpksbzk3rk4g1x_gofaster1a

However, the mappings are defined using pattern matching rather than referencing the explicit name of the service, thus exactly the same mappings can be defined on every autonomous database.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">select * from DBA_RSRC_GROUP_MAPPINGS order by 1,2;

Consumer
ATTRIBUTE VALUE Group STATUS
------------ --------------- ---------- ----------
ORACLE_USER GGADMIN LOW
SERVICE_NAME %\_HIGH.% HIGH
SERVICE_NAME %\_LOW.% LOW
SERVICE_NAME %\_MEDIUM.% MEDIUM
SERVICE_NAME %\_TP.% TP
SERVICE_NAME %\_TPURGENT.% TPURGENT

Note:

  • By default, the ORACLE_USER attribute's priority 6 takes precedence over SERVICE_NAME whose priority is 7. Therefore, GGADMIN, the GoldenGate user, always uses the LOW consumer group irrespective of the service that it connects to. The default priorities can be changed.
  • Wildcards cannot be used for the ORACLE_USER attribute.

Video : Oracle REST Data Services (ORDS) : Database Authentication

Today’s video is a run through the Database Authentication functionality in Oracle REST Data Services (ORDS).

As always, this is based on an article on the same subject.

There are better methods of authentication and authorization available from ORDS, which you can read about here.

The star of today’s video is Stew Ashton, who is barely audible over the noise of the bar. </p />
</p></div>

    	  	<div class=

Nested Tables

This note is a little side effect of answering a question about the total space associated with a table, including its indexes, LOB columns and nested tables. The first suggestion I published failed to take account of the fact that nested tables can contain their own nested tables so I had to modify the code.

The basic change was easy – in the original code I had joined to the view dba_nested_tables to translate between a table_name and its parent_table_name. The only change I needed to make was to replace the reference to the view with a different view that joined a table_name to its “oldest ancestor”. To achieve this I had to do two things: first, create a table with multiple levels of nesting, then create a suitable view definition.  For reference – because it may help somebody – I’ve published the two stages here.

A revolting nested table:

The following code creates three table types, but the second table type


rem
rem     Script:         nested_tables.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create or replace type jpl_item3 as object (v1 varchar2(10), n3 number);
/

create or replace type jpl_tab3_type as table of jpl_item3;
/

create or replace type jpl_item2 as object(v2 varchar2(10), jpl3 jpl_tab3_type);
/

create or replace type jpl_tab2_type as table of jpl_item2;
/

create or replace type jpl_item1 as object(v3 varchar2(10), jpl2 jpl_tab2_type);
/

create or replace type jpl_tab1_type as table of jpl_item1;
/

create table demo_nest_2 (
        id      number  constraint d2_pk primary key,
        jpl1    jpl_tab1_type
)
segment creation immediate
nested table jpl1 store as t_jpl1
        (
        nested table jpl2  store as t_jpl2
                (
                nested table jpl3 store as t_jpl3 
                return as locator
                )
        return as locator
        )
return as locator
/

I’ve never seen nested tables manually created in a production system though I believe they are likely to appear (along with varrays and LOBs) as a side effect of some XML or JSON mechanisms, but many years ago (in Practical Oracle 8i, possibly) I discussed the pros and cons of returning them “by value” or (as I have here) “by reference”. As you can see, you need to exercise some care with brackets and locating the text as you try to refine multiple levels of nesting.

Tne Ancestor View

We’ll take this in three steps – first, report from user_nested_tables. (The final script for reporting space used dba_nested_tables with a predicate available on the owner column, but I don’t give myself DBA privileges while writing quick and dirty bits of SQL.).


select 
        parent_table_name, parent_table_column, table_name, 
        ltrim(storage_spec) storage_spec,       -- the DBA view definition includes lpad( , 30) !!
        ltrim(return_type ) return_type         -- the DBA view definition includes lpad( , 20) !!
from 
        user_nested_tables
order by
        parent_table_name, parent_table_column
/

PARENT_TABLE_NAME    PARENT_TABLE_COLUMN  TABLE_NAME           STORAGE_SPEC         RETURN_TYPE
-------------------- -------------------- -------------------- -------------------- --------------------
DEMO_NEST_2          JPL1                 T_JPL1               USER_SPECIFIED       LOCATOR
T_JPL1               JPL2                 T_JPL2               USER_SPECIFIED       LOCATOR
T_JPL2               JPL3                 T_JPL3               DEFAULT              LOCATOR

You’ll notice the odd ltrim() – I have no idea why the view defines these columns to be left-padded the way they are, possibly it dates back to the time when something like cast(… as vachar2(30)) wasn’t a possible option.

Next a simple “connect by” query what uses the above list in a materialize “with subquery” (CTE):


with my_nested_tables as (
select
        /*+ materialize */
        parent_table_name, table_name
from
        user_nested_tables
)
select
        parent_table, child_table, pathname
from    (
        select
                level,
                sys_connect_by_path(table_name, '/')    pathname,
                connect_by_root parent_table_name parent_table,
                table_name child_table
        from
                my_nested_tables
        connect by
                parent_table_name = prior table_name
        )
order by
        parent_table, child_table, pathname
/

PARENT_TABLE         CHILD_TABLE          PATHNAME
-------------------- -------------------- ----------------------------------------
DEMO_NEST_2          T_JPL1               /T_JPL1
DEMO_NEST_2          T_JPL2               /T_JPL1/T_JPL2
DEMO_NEST_2          T_JPL3               /T_JPL1/T_JPL2/T_JPL3
T_JPL1               T_JPL2               /T_JPL2
T_JPL1               T_JPL3               /T_JPL2/T_JPL3
T_JPL2               T_JPL3               /T_JPL3


As required this shows me demo_nest_2 as the owning ancestor of t_jpl1, t_jpl2 and t_jpl3. Unfortunately it has also produced three rows that we don’t want to see in our final space-summing code. But it’s easy enough to get rid of those – the only rows we want are the rows with a parent_table that doesn’t appear as a child_table:


with my_nested_tables as (
select
        /*+ materialize */
        parent_table_name, table_name
from
        user_nested_tables
)
select  parent_table, child_table, pathname
from    (
        select
                level,
                sys_connect_by_path(table_name, '/')    pathname,
                connect_by_root parent_table_name parent_table,
                table_name child_table
        from
                my_nested_tables
        connect by
                parent_table_name = prior table_name
        )
where   (parent_table) not in (
                select table_name
                from   my_nested_tables
        )
order by
        parent_table, child_table, pathname
/

PARENT_TABLE         CHILD_TABLE          PATHNAME
-------------------- -------------------- ----------------------------------------
DEMO_NEST_2          T_JPL1               /T_JPL1
DEMO_NEST_2          T_JPL2               /T_JPL1/T_JPL2
DEMO_NEST_2          T_JPL3               /T_JPL1/T_JPL2/T_JPL3

3 rows selected.


And there’s the result we wanted to see. A first simple corroboration of the fact is that the output corresponds with the “NESTED TABLE” segments reported by user_segments.

Of course, having written a query that gets the right result from a table definition we used to help us define the query in the first place we now ought to create a few more tables with different structures to see if the query continues to give the right results in more complex cases.

What happens, for example, if I create a table with two columns of nested tables, and one of the nested table typed also contained two nested tables ? What happens if the base table is an index organized table ?

It’s easy to do the second test – just add “organization index” immediately after “segment creation immediate” in the table creation statement. The correct results drop out.

As for the first test – here’s the SQL to create one example (and the query still gets the right answers – even if you change the table to be index organized).


drop type jpl_tab1_type;
drop type jpl_item1;

drop type jpl_tab2_type;
drop type jpl_item2;

drop type jpl_tab3_type;
drop type jpl_item3;

purge recyclebin;

create or replace type jpl_item3 as object (v1 varchar2(10), n3 number);
/

create or replace type jpl_tab3_type as table of jpl_item3;
/

create or replace type jpl_item2 as object(v2 varchar2(10), jpl3 jpl_tab3_type, jpl3x jpl_tab3_type);
/

create or replace type jpl_tab2_type as table of jpl_item2;
/

create or replace type jpl_item1 as object(v3 varchar2(10), jpl2 jpl_tab2_type)
/

create or replace type jpl_tab1_type as table of jpl_item1;
/

create table demo_nest_3 (
        id      number  constraint d2_pk primary key,
        jpl1    jpl_tab1_type,
        jpl1a   jpl_tab1_type
)
segment creation immediate
-- organization index
nested table jpl1 store as t_jpl1
        (
        nested table jpl2  store as t_jpl2
                (
                nested table jpl3 store as t_jpl3 
                return as locator
                nested table jpl3x store as t_jpl3x 
                return as locator
                )
        return as locator
        )
return as locator
nested table jpl1a store as t_jpl1a
        (
        nested table jpl2  store as t_jpl2a
                (
                nested table jpl3 store as t_jpl3a
                return as locator
                nested table jpl3x store as t_jpl3xa
                return as locator
                )
        return as locator
        )
return as locator
/

All that remains now is to modify the code to use the equivalent DBA views, with the addition of the owner column, then slot the resulting code into the original query in place of the simple references to dba_nested_tables. If you go to the original posting you’ll see that I’ve done this by wrapping the code into a CTE so that the only changes to the rest of the code appear as two (flagged) changes where the CTE is then used.

PostgreSQL Invalid Page and Checksum Verification Failed

At the Seattle PostgreSQL User Group meetup this past Tuesday, we got onto the topic of invalid pages in PostgreSQL. It was a fun discussion and it made me realize that it’d be worth writing down a bunch of the stuff we talked about – it might be interesting to a few more people too!

Invalid Page In Block

You see an error message that looks like this:

ERROR: invalid page in block 1226710 of relation base/16750/27244

First and foremost – what does this error mean? I like to think of PostgreSQL as having a fairly strong “boundary” between (1) the database itself and (2) the operating system [and by extension everything else… firmware, disks, network, remote storage, etc]. PostgreSQL maintains page validity primarily on the way in and out of its buffer cache.

https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png?w=150 150w, https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png?w=300 300w, https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png?w=768 768w, https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png 1629w" sizes="(max-width: 1024px) 100vw, 1024px" />

What does this mean in practice? Suppose there’s a physical memory failure and somehow the ECC parity is unable to detect it. This means that a little bit of physical memory on the server now has incorrect garbage and the correct data from that memory is lost.

  • If the garbage bytes map to part of the kernel page cache, then when PostgreSQL tries to copy the page into it’s buffer cache then it will (if possible) detect that something is wrong, refuse to poison its buffer cache with this invalid 8k page, and error out any queries that require this page for processing with the ERROR message above.
  • If the garbage bytes map to part of PostgreSQL’s database buffer cache, then PostgreSQL will quite happily assume nothing is wrong and attempt to process the data on the page. Results are unpredictable; probably all sorts of ERROR messages, crashes and failure modes could result – or maybe even incorrect data returned with no ERROR message at all. (Note that this is probably the case for nearly all software… and also note that ECC is pretty good.)

How PostgreSQL Checks Page Validity

PostgreSQL has two main “validity checks” that it performs on pages. You can read the code in the function PageIsVerified() but I’ll summarize here. You can tell from your error message which validity check failed. It depends on whether you see a second additional WARNING right before the ERROR. The warning would look like this:

WARNING: page verification failed, calculated checksum 3482 but expected 32232
  1. If the above warning is not present, this means the page header failed a basic sanity check. This could conceivably be caused by both problems inside and outside of PostgreSQL.
  2. If you see the above warning (page verification failed), this means the checksum recorded in the block did not match the checksum calculated for the block. This most likely indicates that there was a problem outside of (underneath) the database – operating system, memory, networking, storage, etc.

About Basic Page Header Sanity Checks

As of when I’m writing this article in 2019, the following basic sanity checks are performed on the page header:

  • There are 32 bits reserved for page flag bits; at present only three are used and the other 29 bits should always be zero/off.
  • Every page is divided into four parts (header, free space, tuples, special space).  Offsets for the divisions are stored as 16-bit numbers in the page header; the offsets should go in order and should not have a value pointing off the page.
  • The offset of the special space should always be aligned.

About PostgreSQL Checksums

PostgreSQL version 9.3 (released in 2013) added the ability to calculate a checksum on data pages and store the checksum in the page. There are two inputs to the checksum: (1) every single byte of the data page, with zeros in the four bytes where the checksum will be stored later and (2) the page offset/address. This means that PostgreSQL doesn’t just detect if a byte is changed in the page – it also detects if a perfectly valid page gets somehow dropped into the wrong place.

Checksums are not maintained for blocks while they are in the shared buffers – so if you look at a buffer in the PostgreSQL page cache with pageinspect and you see a checksum value, note that it’s probably just leftover from the last read but wouldn’t have been maintained when the page was changed. The checksum is calculated and stamped onto the page when the page is written out of the buffer cache into the operating system page cache (remember the pages get flushed to disk later by a separate fsync call).

The checksum algorithm is specifically designed to take advantage of SIMD instructions. The slowest part of the algorithm is multiplication, so if possible PostgreSQL will be compiled to perform 32 multiplications at a time in parallel. In 2013 there were two platforms specifically documented to support this: x86 SSE4.1 and ARM NEON. The implementation is specifically tuned for optimal performance on x86 SSE. This is an important point actually – particularly for folks running PostgreSQL on embedded devices – PostgreSQL certainly compiles and works well on a lot of platforms, but evidently the checksum implementation is explicitly optimized to run the fastest on Intel. (To be clear… I think everyone should use checksums on every platform unless you have a really really good reason not to – just don’t be surprised if you start comparing benchmarks with Intel and you see a difference!)

For folks interested in digging a little more into the history… here’s the original commit using a CRC algorithm which never shipped in an actual PostgreSQL release (Simon Riggs, Jeff Davis and Greg Smith), here’s the subsequent commit introducing the FNV-1a algorithm instead of CRC which is what we still use today (Ants Aasma), and here’s the commit doing the major refactor which moved the algorithm into a header file for easier external use (Tom Lane).

More Ways To Check Validity

At the SEAPUG meetup this led to a discussion about checking validity. Do checksums cover everything? (hint: no) Are there more ways we can validate our databases? (hint: yes)

I haven’t yet made a full list of which files are covered by checksums and which ones aren’t, but I know that not everything is. For example: I’m pretty sure that the visiblity map and SLRU files aren’t covered with checksums. But for what it’s worth, there are two extra tools we can use verification in PostgreSQL databases:

  • The amcheck extension can scan a B-Tree index for a number of logical problems – for example, verifying that all B-Tree pages have items in “logical” order. (This could be useful, for example, if you’re not specifying ICU collation and you recently upgraded your operating system collation libraries… since PostgreSQL uses OS collation by default.)
  • The pg_visibility_map extension includes two functions to check for corruption in the visibility map – pg_check_frozen() and pg_check_visible().

Responding To Checksum Failures

Finally, what if you actually run into a checksum failure? What should you do, and are there any additional tools you should know about?

First of all – on any database – there are a few things you should always do immediately when you see something indicating that a data corruption may have occurred:

  • Verify that your backup retention and your log file retention are sufficiently long – I recommend at least a month (this is a Happiness Hint). You never know how long the investigation might take, or how long ago something important might have happened.
  • Take a moment to articulate and write out the impact to the business. (Are important queries failing right now? Is this causing an application outage?) This seems small but it can be very useful in making decisions later. Don’t exaggerate the impact but don’t understate it either. It can also be helpful to note important timelines that you already know about. For example: management is willing to use yesterday’s backup and take a day of data loss to avoid an 12 hour outage, but not to avoid a 4 hour outage …or… management needs a status update at 11:00am Eastern Time.
  • If there’s a larger team working on the system in question, communicate a freeze on changes until things are resolved.
  • Make a list or inventory of all copies of the data. Backups, physical replicas or hot standbys, logical replicas, copies, etc. Sometimes the very process of making this list can immediately give you ideas for simple solutions (like checking if a hot standby has the block intact). The next thing you probably want to do is check all items in this list to see if they have a valid copy of the data in question. Do not take any actions to remediate the problem right away, collect all of the information first. The data you collect now might useful during RCA investigation after you’re back online.
  • If there was one checksum failure, then you probably want to make sure there aren’t more.
    • If it’s a small database, consider whether you can scan the whole thing and verify the checksum on every single block.
    • If it’s a large database, consider whether you can at least scan all of the system/catalog tables and perhaps scan the tables which are throwing errors in their entirety. (PostgreSQL stops on the first error, so there isn’t an easy way to check if other blocks in the same table also have checksum problems.)
  • A few general best practices… (1) have a second person glance at your screen before you execute any actual changes, (2) never delete anything but always rename/move instead, (3) when investigating individual blocks, also look at the block before and the block after to verify they look like what you’d normally expect, (4) test the remediation plan before running it in production, and (5) document everything. If you’ve never seen Paul Vallée’s FIT-ACER checklist then it’s worth reviewing.

There’s no single prescriptive process for diagnosing the scope of the problem and finding the right path forward for you. It involves learning what you need to know about PostgreSQL, a little creative thinking about possible resolutions, and balancing the needs of the business (for example, how long can you be down and how much data loss can you tolerate).

That being said, there are a few tools I know about which can be very useful in the process. (And there are probably more that I don’t know about; please let me know and I can add them to this list!)


Data investigation:

Unix/Linux Commands
You might be surprised at what you can do with the tools already installed on your operating system. I’ve never been on a unix system that didn’t have dd and od installed, and I find that many Linux systems have hexdump and md5sum installed as well. A few examples of how useful these tools are: dd can extract the individual block with invalid data on the primary server and extract the same block on the hot standby, then od/hexdump can be used to create a human-readable textual dump of the binary data. You can even use diff to find the differences between the blocks. If you have a standby cluster with storage-level replication then you could use md5sum to see at a glance if the blocks match. (Quick word of caution on comparing hot standbys: last I checked, PostgreSQL doesn’t seem to maintain the free space identically on hot standbys, so the checksums might differ on perfectly healthy blocks. You can still look at the diff and verify whether free space is the only difference.) Drawbacks: low-level utilities can only do binary dumps but cannot interpret the data. Also, utilities like dd are “sharp knives” – powerful tools which can cause damage if misused!

For a great example of using dd and od, read the code in Bertrand Drouvot‘s pg_toolkit script collection.
Data investigation and checksum verification:

pg_filedump
This is a crazy awesome utility and I have no idea why it’s not in core PostgreSQL. It makes an easy-to-read textual dump of the binary contents of PostgreSQL data blocks. You can process a whole file or just specify a range of blocks to dump. It can verify checksums and it can even decode the contents of the tuples. As far as I can tell, it was originally written by Patrick Macdonald at Red Hat some time in the 2000’s and then turned over to the PostgreSQL Global Development Group for stewardship around 2011. It’s now in a dedicated repository at git.postgresql.org and it seems that Tom Lane, Christoph Berg and Teodor Sigaev keep it alive but don’t invest heavily in it. Drawbacks: be aware that it doesn’t address the race condition with a running server (see Credativ pg_checksums below). For dumping only a block with a checksum problem, this is not an issue since the server won’t let the block into its buffer cache anyway.
Checksum verification:

PostgreSQL pg_checksums
PostgreSQL itself starting in version 11 has a command-line utility to scan one relation or everything and verify the checksum on every single block. It’s called pg_verify_checksums in v11 and pg_checksums in v12. Drawbacks: first, this utility requires you to shut down the database before it will run. It will throw an error and refuse to run if the database is up. Second, you can scan a single relation but you can’t say which database it’s in… so if the OID exists in multiple databases, there’s no way to just scan the one you care about.
Checksum verification:

Credativ pg_checksums
The fine engineers of Credativ have published an enhanced version of pg_checksums which can verify checksums on a running database. It looks to me like the main case they needed to protect against was the race condition between pg_checksum reading blocks while the running PostgreSQL server was writing those same blocks. Linux of course work on a 4k page size; so if an 8k database page is half written when pg_checksum reads it then we will get a false positive. The version from credativ however is smart enough to deal with this. Drawbacks: check the github issues; there are a couple notable drawbacks but this project was only announced last week and all the drawbacks might be addressed by the time you read this article. Also, being based on the utility in PostgreSQL, the same limitation about scanning a single relation applies.

Note that both Credativ’s and PostgreSQL’s pg_checksums utilities access the control file, even when just verifying checksums. As a result, you need to make sure you compile against the same version of PostgreSQL code as the target database you’re scanning.
Checksum verification:

Satoshi Nagayasu postgres-toolkit
I’m not sure if this is still being maintained, but Satoshi Nagayasu wrote postgres-toolkit quite a long time ago which includes a checksum verification utility. It’s the oldest one I have seen so far – and it still compiles and works! (Though if you want to compile it on PostgreSQL 11 or newer then you need to use the patch in this pull request.) Satoshi’s utility also has the very useful capability of scanning an arbitrary file that you pass in – like pg_filedump but stripped down to just do the checksum verification. It’s clever enough to infer the segment number from the filename and scan the file, even if the file isn’t part of a PostgreSQL installation. This would be useful, for example, if you were on a backup server and wanted to extract a single file from your backup and check if the damaged block has valid checksum in the backup. Drawbacks: be aware that it doesn’t address the race condition with a running server.
Checksum verification:

Google pg_page_verification
Simple program; you pass in a data directory and it will scan every file in the data directory to verify the checksums on all blocks. Published to Github in early 2018. Drawbacks: be aware that it doesn’t address the race condition with a running server. Probably superseded by the built-in PostgreSQL utilities.
Mitigation:

PostgreSQL Developer Options
PostgreSQL has hundreds of “parameters” – knobs and button you can use to configure how it runs. There are 294 entries in the pg_settings table on version 11. Buried in these parameters are a handful of “Developer Options” providing powerful (and dangerous) tools for mitigating data problems – such as ignore_checksum_failure, zero_damaged_pages and ignore_system_indexes. Read very carefully and exercise great care with these options – when not fully understood, they can have unexpected side effects including unintended data loss. Exercise particular care with the ignore_checksum_failure option – even if you set that in an individual session, the page will be readable to all connections… think of it as poisoning the buffer cache. That being said, sometimes an option like zero_damaged_pages is the fastest way to get back up and running. (Just make sure you’ve saved a copy of that block!) By the way… a trick to trigger a read of one specific block is to SELECT * FROM table WHERE ctid='(blockno,1)'
Mitigation:

Unix/Linux Commands
I would discourage the use of dd to mitigate invalid data problems. It’s dangerous even for experienced engineers; simple mistakes can compound the problem. I can’t imagine a situation where this is a better approach than the zero_damaged_pages developer option and a query to read a specific block. That said, I have seen cases where dd was used to zero out a page.

More About Data Investigation

In order to put some of this together, I’ll just do a quick example session. I’m running PostgreSQL 11.5 an on EC2 instance and I used dd to write a few evil bytes into a couple blocks of my database.

First, lets start by just capturing the information from the log files:

$ grep "invalid page" ../log/postgresql.log|sed 's/UTC.*ERROR//'
 2019-10-15 19:53:37 :  invalid page in block 0 of relation base/16385/16493
 2019-10-16 22:26:30 :  invalid page in block 394216 of relation base/16385/16502
 2019-10-16 22:43:24 :  invalid page in block 394216 of relation base/16385/16502
 2019-11-05 23:55:33 :  invalid page in block 394216 of relation base/16385/16502
 2019-11-05 23:57:58 :  invalid page in block 394216 of relation base/16385/16502
 2019-11-05 23:59:14 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-05 23:59:21 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-05 23:59:22 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-05 23:59:23 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-06 00:01:12 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-06 00:01:16 :  invalid page in block 0 of relation base/16385/16493
 2019-11-06 00:02:05 :  invalid page in block 250 of relation base/16385/16492

With a little command-line karate we can list each distinct block and see the first time we got an error on that block:

$ grep "invalid page" ../log/postgresql.log |
sed 's/UTC.*ERROR//' |
awk '{print $1" "$2" "$11" invalid_8k_block "$8" segment "int($8/131072)" offset "($8%131072)}' |
sort -k3,5 -k1,2 |
uniq -f2
   2019-11-06 00:02:05 base/16385/16492 invalid_8k_block 250 segment 0 offset 250
   2019-10-15 19:53:37 base/16385/16493 invalid_8k_block 0 segment 0 offset 0
   2019-11-05 23:59:14 base/16385/16502 invalid_8k_block 262644 segment 2 offset 500
   2019-10-16 22:26:30 base/16385/16502 invalid_8k_block 394216 segment 3 offset 1000 

So we know that there are at least 4 blocks corrupt. Lets scan the whole data directory using Credativ’s pg_checksum (without shutting down the database) to see if there are any more blocks with bad checksums:

$ pg_checksums -D /var/lib/pgsql/11.5/data |& fold -s
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16502.3", block 1000: calculated checksum
 2ED4 but block contains 4EDF
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16502.3", block 1010: calculated checksum
 9ECF but block contains ACBE
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16502.2", block 500: calculated checksum
 5D6 but block contains E459
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16493", block 0: calculated checksum E7E4
 but block contains 78F9
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16492", block 250: calculated checksum
 44BA but block contains 3ABA
 Checksum operation completed
 Files scanned:  1551
 Blocks scanned: 624158
 Bad checksums:  5
 Data checksum version: 1

Ah-ha… there was one more bad checksum which didn’t show up in the logs! Next lets choose one of the bad blocks and dump the contents using unix command line tools.

$ dd status=none if=base/16385/16492 bs=8192 count=1 skip=250 | od -A d -t x1z -w16
0000000 00 00 00 00 e0 df 6b b0 ba 3a 04 00 0c 01 80 01  >......k..:......<
0000016 00 20 04 20 00 00 00 00 80 9f f2 00 00 9f f2 00  >. . ............<
0000032 80 9e f2 00 00 9e f2 00 80 9d f2 00 00 9d f2 00  >................<
0000048 80 9c f2 00 00 9c f2 00 80 9b f2 00 00 9b f2 00  >................<
0000064 80 9a f2 00 00 9a f2 00 80 99 f2 00 00 99 f2 00  >................<
0000080 80 98 f2 00 00 98 f2 00 80 97 f2 00 00 97 f2 00  >................<

Here we see the page header and the beginning of the line pointers. One thing I think it’s easy to remember is that the first 8 bytes are the page LSN and the next 2 bytes are the page checksum. Notice that the page checksum bytes contain “ba 3a” which matches the error message from the scan above (3ABA). Sometimes it can be useful to know just the very top of the page even if you don’t remember anything else!

This is useful, but lets try the pg_filedump utility next. This utility takes a lot of options. In this example I’m going to ask it to verify the checksum (-k), only scan one block at offset 250 (-R 250 250) and even to decode the tuples (table row data) to a human-readable format (-D int,int,int,charN). There’s another argument (-f) that can even tell pg_filedump to show hexdump/od style raw data inline but I won’t demonstrate that here.

$ pg_filedump -k -R 250 250 -D int,int,int,charN base/16385/16492

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 11.0
*
* File: base/16385/16492
* Options used: -k -R 250 250 -D int,int,int,charN
*
* Dump created on: Fri Nov  8 21:48:38 2019
*******************************************************************

Block  250 ********************************************************
----- Block Offset: 0x001f4000 Offsets: Lower 268 (0x010c) Block: Size 8192 Version 4 Upper 384 (0x0180) LSN: logid 0 recoff 0xb06bdfe0 Special 8192 (0x2000) Items: 61 Free Space: 116 Checksum: 0x3aba Prune XID: 0x00000000 Flags: 0x0004 (ALL_VISIBLE) Length (including item array): 268 Error: checksum failure: calculated 0x44ba. ------ Item 1 -- Length: 121 Offset: 8064 (0x1f80) Flags: NORMAL COPY: 15251 1 0 Item 2 -- Length: 121 Offset: 7936 (0x1f00) Flags: NORMAL COPY: 15252 1 0 Item 3 -- Length: 121 Offset: 7808 (0x1e80) Flags: NORMAL COPY: 15253 1 0

That was the block header and the first few item. (Item pointer data is displayed first, then the table row data itself is displayed on the following line after the word COPY.) Looking down a little bit, we can even see where I wrote the bytes “budstuff” into a random location in this block – it turns out those bytes landed in the middle of a character field. This means that without checksums, PostgreSQL would not have thrown any errors at all but just returned an incorrect string the next time that row was queried!

COPY: 15297	1	0
 Item  48 -- Length:  121  Offset: 2048 (0x0800)  Flags: NORMAL
COPY: 15298	1	0
 Item  49 -- Length:  121  Offset: 1920 (0x0780)  Flags: NORMAL
COPY: 15299	1	0	                                           badstuff
 Item  50 -- Length:  121  Offset: 1792 (0x0700)  Flags: NORMAL
COPY: 15300	1	0

It’s immediately clear how useful this is (and easier to read ). The part where it decodes the table row data into a human readable form is an especially cool trick. Two notes about this.

  • First, the lines are prefixed with the word COPY for a reason – they are actually intended to be formatted so you can grep on the word COPY and then use the “copy” command (or it’s psql cousin) to feed the data directly back into a staging table in the database for cleanup. How cool is that!
  • Second, it can decode only a set of fairly standard data types and relation types.
$ pg_filedump -h

Version 11.0 (for PostgreSQL 11.x)
Copyright (c) 2002-2010 Red Hat, Inc.
Copyright (c) 2011-2018, PostgreSQL Global Development Group

Usage: pg_filedump [-abcdfhikxy] [-R startblock [endblock]] 
          [-D attrlist] [-S blocksize] [-s segsize] [-n segnumber] file

Display formatted contents of a PostgreSQL heap/index/control file
Defaults are: relative addressing, range of the entire file, block
               size as listed on block 0 in the file

The following options are valid for heap and index files:
...
...
...
  -D  Decode tuples using given comma separated list of types
      Supported types:
        bigint bigserial bool char charN date float float4 float8 int
        json macaddr name oid real serial smallint smallserial text
        time timestamp timetz uuid varchar varcharN xid xml
      ~ ignores all attributes left in a tuple

Now you see what happens when I start having fun… a quick update about our SEAPUG meetup this past Tuesday turned into a blog article that’s way too long. :) Hope it’s useful, and as always let me know what I can improve!

PostgreSQL Invalid Page and Checksum Verification Failed

At the Seattle PostgreSQL User Group meetup this past Tuesday, we got onto the topic of invalid pages in PostgreSQL. It was a fun discussion and it made me realize that it’d be worth writing down a bunch of the stuff we talked about – it might be interesting to a few more people too!

Invalid Page In Block

You see an error message that looks like this:

ERROR: invalid page in block 1226710 of relation base/16750/27244

First and foremost – what does this error mean? I like to think of PostgreSQL as having a fairly strong “boundary” between (1) the database itself and (2) the operating system [and by extension everything else… firmware, disks, network, remote storage, etc]. PostgreSQL maintains page validity primarily on the way in and out of its buffer cache.

https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png?w=150 150w, https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png?w=300 300w, https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png?w=768 768w, https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png 1629w" sizes="(max-width: 1024px) 100vw, 1024px" />

What does this mean in practice? Suppose there’s a physical memory failure and somehow the ECC parity is unable to detect it. This means that a little bit of physical memory on the server now has incorrect garbage and the correct data from that memory is lost.

  • If the garbage bytes map to part of the kernel page cache, then when PostgreSQL tries to copy the page into it’s buffer cache then it will (if possible) detect that something is wrong, refuse to poison its buffer cache with this invalid 8k page, and error out any queries that require this page for processing with the ERROR message above.
  • If the garbage bytes map to part of PostgreSQL’s database buffer cache, then PostgreSQL will quite happily assume nothing is wrong and attempt to process the data on the page. Results are unpredictable; probably all sorts of ERROR messages, crashes and failure modes could result – or maybe even incorrect data returned with no ERROR message at all. (Note that this is probably the case for nearly all software… and also note that ECC is pretty good.)

How PostgreSQL Checks Page Validity

PostgreSQL has two main “validity checks” that it performs on pages. You can read the code in the function PageIsVerified() but I’ll summarize here. You can tell from your error message which validity check failed. It depends on whether you see a second additional WARNING right before the ERROR. The warning would look like this:

WARNING: page verification failed, calculated checksum 3482 but expected 32232
  1. If the above warning is not present, this means the page header failed a basic sanity check. This could conceivably be caused by both problems inside and outside of PostgreSQL.
  2. If you see the above warning (page verification failed), this means the checksum recorded in the block did not match the checksum calculated for the block. This most likely indicates that there was a problem outside of (underneath) the database – operating system, memory, networking, storage, etc.

About Basic Page Header Sanity Checks

As of when I’m writing this article in 2019, the following basic sanity checks are performed on the page header:

  • There are 32 bits reserved for page flag bits; at present only three are used and the other 29 bits should always be zero/off.
  • Every page is divided into four parts (header, free space, tuples, special space).  Offsets for the divisions are stored as 16-bit numbers in the page header; the offsets should go in order and should not have a value pointing off the page.
  • The offset of the special space should always be aligned.

About PostgreSQL Checksums

PostgreSQL version 9.3 (released in 2013) added the ability to calculate a checksum on data pages and store the checksum in the page. There are two inputs to the checksum: (1) every single byte of the data page, with zeros in the four bytes where the checksum will be stored later and (2) the page offset/address. This means that PostgreSQL doesn’t just detect if a byte is changed in the page – it also detects if a perfectly valid page gets somehow dropped into the wrong place.

Checksums are not maintained for blocks while they are in the shared buffers – so if you look at a buffer in the PostgreSQL page cache with pageinspect and you see a checksum value, note that it’s probably just leftover from the last read but wouldn’t have been maintained when the page was changed. The checksum is calculated and stamped onto the page when the page is written out of the buffer cache into the operating system page cache (remember the pages get flushed to disk later by a separate fsync call).

The checksum algorithm is specifically designed to take advantage of SIMD instructions. The slowest part of the algorithm is multiplication, so if possible PostgreSQL will be compiled to perform 32 multiplications at a time in parallel. In 2013 there were two platforms specifically documented to support this: x86 SSE4.1 and ARM NEON. The implementation is specifically tuned for optimal performance on x86 SSE. This is an important point actually – particularly for folks running PostgreSQL on embedded devices – PostgreSQL certainly compiles and works well on a lot of platforms, but evidently the checksum implementation is explicitly optimized to run the fastest on Intel. (To be clear… I think everyone should use checksums on every platform unless you have a really really good reason not to – just don’t be surprised if you start comparing benchmarks with Intel and you see a difference!)

For folks interested in digging a little more into the history… here’s the original commit using a CRC algorithm which never shipped in an actual PostgreSQL release (Simon Riggs, Jeff Davis and Greg Smith), here’s the subsequent commit introducing the FNV-1a algorithm instead of CRC which is what we still use today (Ants Aasma), and here’s the commit doing the major refactor which moved the algorithm into a header file for easier external use (Tom Lane).

More Ways To Check Validity

At the SEAPUG meetup this led to a discussion about checking validity. Do checksums cover everything? (hint: no) Are there more ways we can validate our databases? (hint: yes)

I haven’t yet made a full list of which files are covered by checksums and which ones aren’t, but I know that not everything is. For example: I’m pretty sure that the visiblity map and SLRU files aren’t covered with checksums. But for what it’s worth, there are two extra tools we can use verification in PostgreSQL databases:

  • The amcheck extension can scan a B-Tree index for a number of logical problems – for example, verifying that all B-Tree pages have items in “logical” order. (This could be useful, for example, if you’re not specifying ICU collation and you recently upgraded your operating system collation libraries… since PostgreSQL uses OS collation by default.)
  • The pg_visibility_map extension includes two functions to check for corruption in the visibility map – pg_check_frozen() and pg_check_visible().

Responding To Checksum Failures

Finally, what if you actually run into a checksum failure? What should you do, and are there any additional tools you should know about?

First of all – on any database – there are a few things you should always do immediately when you see something indicating that a data corruption may have occurred:

  • Verify that your backup retention and your log file retention are sufficiently long – I recommend at least a month (this is a Happiness Hint). You never know how long the investigation might take, or how long ago something important might have happened.
  • Take a moment to articulate and write out the impact to the business. (Are important queries failing right now? Is this causing an application outage?) This seems small but it can be very useful in making decisions later. Don’t exaggerate the impact but don’t understate it either. It can also be helpful to note important timelines that you already know about. For example: management is willing to use yesterday’s backup and take a day of data loss to avoid an 12 hour outage, but not to avoid a 4 hour outage …or… management needs a status update at 11:00am Eastern Time.
  • If there’s a larger team working on the system in question, communicate a freeze on changes until things are resolved.
  • Make a list or inventory of all copies of the data. Backups, physical replicas or hot standbys, logical replicas, copies, etc. Sometimes the very process of making this list can immediately give you ideas for simple solutions (like checking if a hot standby has the block intact). The next thing you probably want to do is check all items in this list to see if they have a valid copy of the data in question. Do not take any actions to remediate the problem right away, collect all of the information first. The data you collect now might useful during RCA investigation after you’re back online.
  • If there was one checksum failure, then you probably want to make sure there aren’t more.
    • If it’s a small database, consider whether you can scan the whole thing and verify the checksum on every single block.
    • If it’s a large database, consider whether you can at least scan all of the system/catalog tables and perhaps scan the tables which are throwing errors in their entirety. (PostgreSQL stops on the first error, so there isn’t an easy way to check if other blocks in the same table also have checksum problems.)
  • A few general best practices… (1) have a second person glance at your screen before you execute any actual changes, (2) never delete anything but always rename/move instead, (3) when investigating individual blocks, also look at the block before and the block after to verify they look like what you’d normally expect, (4) test the remediation plan before running it in production, and (5) document everything. If you’ve never seen Paul Vallée’s FIT-ACER checklist then it’s worth reviewing.

There’s no single prescriptive process for diagnosing the scope of the problem and finding the right path forward for you. It involves learning what you need to know about PostgreSQL, a little creative thinking about possible resolutions, and balancing the needs of the business (for example, how long can you be down and how much data loss can you tolerate).

That being said, there are a few tools I know about which can be very useful in the process. (And there are probably more that I don’t know about; please let me know and I can add them to this list!)


Data investigation:

Unix/Linux Commands
You might be surprised at what you can do with the tools already installed on your operating system. I’ve never been on a unix system that didn’t have dd and od installed, and I find that many Linux systems have hexdump and md5sum installed as well. A few examples of how useful these tools are: dd can extract the individual block with invalid data on the primary server and extract the same block on the hot standby, then od/hexdump can be used to create a human-readable textual dump of the binary data. You can even use diff to find the differences between the blocks. If you have a standby cluster with storage-level replication then you could use md5sum to see at a glance if the blocks match. (Quick word of caution on comparing hot standbys: last I checked, PostgreSQL doesn’t seem to maintain the free space identically on hot standbys, so the checksums might differ on perfectly healthy blocks. You can still look at the diff and verify whether free space is the only difference.) Drawbacks: low-level utilities can only do binary dumps but cannot interpret the data. Also, utilities like dd are “sharp knives” – powerful tools which can cause damage if misused!

For a great example of using dd and od, read the code in Bertrand Drouvot‘s pg_toolkit script collection.
Data investigation and checksum verification:

pg_filedump
This is a crazy awesome utility and I have no idea why it’s not in core PostgreSQL. It makes an easy-to-read textual dump of the binary contents of PostgreSQL data blocks. You can process a whole file or just specify a range of blocks to dump. It can verify checksums and it can even decode the contents of the tuples. As far as I can tell, it was originally written by Patrick Macdonald at Red Hat some time in the 2000’s and then turned over to the PostgreSQL Global Development Group for stewardship around 2011. It’s now in a dedicated repository at git.postgresql.org and it seems that Tom Lane, Christoph Berg and Teodor Sigaev keep it alive but don’t invest heavily in it. Drawbacks: be aware that it doesn’t address the race condition with a running server (see Credativ pg_checksums below). For dumping only a block with a checksum problem, this is not an issue since the server won’t let the block into its buffer cache anyway.
Checksum verification:

PostgreSQL pg_checksums
PostgreSQL itself starting in version 11 has a command-line utility to scan one relation or everything and verify the checksum on every single block. It’s called pg_verify_checksums in v11 and pg_checksums in v12. Drawbacks: first, this utility requires you to shut down the database before it will run. It will throw an error and refuse to run if the database is up. Second, you can scan a single relation but you can’t say which database it’s in… so if the OID exists in multiple databases, there’s no way to just scan the one you care about.
Checksum verification:

Credativ pg_checksums
The fine engineers of Credativ have published an enhanced version of pg_checksums which can verify checksums on a running database. It looks to me like the main case they needed to protect against was the race condition between pg_checksum reading blocks while the running PostgreSQL server was writing those same blocks. Linux of course work on a 4k page size; so if an 8k database page is half written when pg_checksum reads it then we will get a false positive. The version from credativ however is smart enough to deal with this. Drawbacks: check the github issues; there are a couple notable drawbacks but this project was only announced last week and all the drawbacks might be addressed by the time you read this article. Also, being based on the utility in PostgreSQL, the same limitation about scanning a single relation applies.

Note that both Credativ’s and PostgreSQL’s pg_checksums utilities access the control file, even when just verifying checksums. As a result, you need to make sure you compile against the same version of PostgreSQL code as the target database you’re scanning.
Checksum verification:

Satoshi Nagayasu postgres-toolkit
I’m not sure if this is still being maintained, but Satoshi Nagayasu wrote postgres-toolkit quite a long time ago which includes a checksum verification utility. It’s the oldest one I have seen so far – and it still compiles and works! (Though if you want to compile it on PostgreSQL 11 or newer then you need to use the patch in this pull request.) Satoshi’s utility also has the very useful capability of scanning an arbitrary file that you pass in – like pg_filedump but stripped down to just do the checksum verification. It’s clever enough to infer the segment number from the filename and scan the file, even if the file isn’t part of a PostgreSQL installation. This would be useful, for example, if you were on a backup server and wanted to extract a single file from your backup and check if the damaged block has valid checksum in the backup. Drawbacks: be aware that it doesn’t address the race condition with a running server.
Checksum verification:

Google pg_page_verification
Simple program; you pass in a data directory and it will scan every file in the data directory to verify the checksums on all blocks. Published to Github in early 2018. Drawbacks: be aware that it doesn’t address the race condition with a running server. Probably superseded by the built-in PostgreSQL utilities.
Mitigation:

PostgreSQL Developer Options
PostgreSQL has hundreds of “parameters” – knobs and button you can use to configure how it runs. There are 294 entries in the pg_settings table on version 11. Buried in these parameters are a handful of “Developer Options” providing powerful (and dangerous) tools for mitigating data problems – such as ignore_checksum_failure, zero_damaged_pages and ignore_system_indexes. Read very carefully and exercise great care with these options – when not fully understood, they can have unexpected side effects including unintended data loss. Exercise particular care with the ignore_checksum_failure option – even if you set that in an individual session, the page will be readable to all connections… think of it as poisoning the buffer cache. That being said, sometimes an option like zero_damaged_pages is the fastest way to get back up and running. (Just make sure you’ve saved a copy of that block!) By the way… a trick to trigger a read of one specific block is to SELECT * FROM table WHERE ctid='(blockno,1)'
Mitigation:

Unix/Linux Commands
I would discourage the use of dd to mitigate invalid data problems. It’s dangerous even for experienced engineers; simple mistakes can compound the problem. I can’t imagine a situation where this is a better approach than the zero_damaged_pages developer option and a query to read a specific block. That said, I have seen cases where dd was used to zero out a page.

More About Data Investigation

In order to put some of this together, I’ll just do a quick example session. I’m running PostgreSQL 11.5 an on EC2 instance and I used dd to write a few evil bytes into a couple blocks of my database.

First, lets start by just capturing the information from the log files:

$ grep "invalid page" ../log/postgresql.log|sed 's/UTC.*ERROR//'
 2019-10-15 19:53:37 :  invalid page in block 0 of relation base/16385/16493
 2019-10-16 22:26:30 :  invalid page in block 394216 of relation base/16385/16502
 2019-10-16 22:43:24 :  invalid page in block 394216 of relation base/16385/16502
 2019-11-05 23:55:33 :  invalid page in block 394216 of relation base/16385/16502
 2019-11-05 23:57:58 :  invalid page in block 394216 of relation base/16385/16502
 2019-11-05 23:59:14 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-05 23:59:21 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-05 23:59:22 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-05 23:59:23 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-06 00:01:12 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-06 00:01:16 :  invalid page in block 0 of relation base/16385/16493
 2019-11-06 00:02:05 :  invalid page in block 250 of relation base/16385/16492

With a little command-line karate we can list each distinct block and see the first time we got an error on that block:

$ grep "invalid page" ../log/postgresql.log |
sed 's/UTC.*ERROR//' |
awk '{print $1" "$2" "$11" invalid_8k_block "$8" segment "int($8/131072)" offset "($8%131072)}' |
sort -k3,5 -k1,2 |
uniq -f2
   2019-11-06 00:02:05 base/16385/16492 invalid_8k_block 250 segment 0 offset 250
   2019-10-15 19:53:37 base/16385/16493 invalid_8k_block 0 segment 0 offset 0
   2019-11-05 23:59:14 base/16385/16502 invalid_8k_block 262644 segment 2 offset 500
   2019-10-16 22:26:30 base/16385/16502 invalid_8k_block 394216 segment 3 offset 1000 

So we know that there are at least 4 blocks corrupt. Lets scan the whole data directory using Credativ’s pg_checksum (without shutting down the database) to see if there are any more blocks with bad checksums:

$ pg_checksums -D /var/lib/pgsql/11.5/data |& fold -s
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16502.3", block 1000: calculated checksum
 2ED4 but block contains 4EDF
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16502.3", block 1010: calculated checksum
 9ECF but block contains ACBE
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16502.2", block 500: calculated checksum
 5D6 but block contains E459
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16493", block 0: calculated checksum E7E4
 but block contains 78F9
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16492", block 250: calculated checksum
 44BA but block contains 3ABA
 Checksum operation completed
 Files scanned:  1551
 Blocks scanned: 624158
 Bad checksums:  5
 Data checksum version: 1

Ah-ha… there was one more bad checksum which didn’t show up in the logs! Next lets choose one of the bad blocks and dump the contents using unix command line tools.

$ dd status=none if=base/16385/16492 bs=8192 count=1 skip=250 | od -A d -t x1z -w16
0000000 00 00 00 00 e0 df 6b b0 ba 3a 04 00 0c 01 80 01  >......k..:......<
0000016 00 20 04 20 00 00 00 00 80 9f f2 00 00 9f f2 00  >. . ............<
0000032 80 9e f2 00 00 9e f2 00 80 9d f2 00 00 9d f2 00  >................<
0000048 80 9c f2 00 00 9c f2 00 80 9b f2 00 00 9b f2 00  >................<
0000064 80 9a f2 00 00 9a f2 00 80 99 f2 00 00 99 f2 00  >................<
0000080 80 98 f2 00 00 98 f2 00 80 97 f2 00 00 97 f2 00  >................<

Here we see the page header and the beginning of the line pointers. One thing I think it’s easy to remember is that the first 8 bytes are the page LSN and the next 2 bytes are the page checksum. Notice that the page checksum bytes contain “ba 3a” which matches the error message from the scan above (3ABA). Sometimes it can be useful to know just the very top of the page even if you don’t remember anything else!

This is useful, but lets try the pg_filedump utility next. This utility takes a lot of options. In this example I’m going to ask it to verify the checksum (-k), only scan one block at offset 250 (-R 250 250) and even to decode the tuples (table row data) to a human-readable format (-D int,int,int,charN). There’s another argument (-f) that can even tell pg_filedump to show hexdump/od style raw data inline but I won’t demonstrate that here.

$ pg_filedump -k -R 250 250 -D int,int,int,charN base/16385/16492

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 11.0
*
* File: base/16385/16492
* Options used: -k -R 250 250 -D int,int,int,charN
*
* Dump created on: Fri Nov  8 21:48:38 2019
*******************************************************************

Block  250 ********************************************************
----- Block Offset: 0x001f4000 Offsets: Lower 268 (0x010c) Block: Size 8192 Version 4 Upper 384 (0x0180) LSN: logid 0 recoff 0xb06bdfe0 Special 8192 (0x2000) Items: 61 Free Space: 116 Checksum: 0x3aba Prune XID: 0x00000000 Flags: 0x0004 (ALL_VISIBLE) Length (including item array): 268 Error: checksum failure: calculated 0x44ba. ------ Item 1 -- Length: 121 Offset: 8064 (0x1f80) Flags: NORMAL COPY: 15251 1 0 Item 2 -- Length: 121 Offset: 7936 (0x1f00) Flags: NORMAL COPY: 15252 1 0 Item 3 -- Length: 121 Offset: 7808 (0x1e80) Flags: NORMAL COPY: 15253 1 0

That was the block header and the first few item. (Item pointer data is displayed first, then the table row data itself is displayed on the following line after the word COPY.) Looking down a little bit, we can even see where I wrote the bytes “budstuff” into a random location in this block – it turns out those bytes landed in the middle of a character field. This means that without checksums, PostgreSQL would not have thrown any errors at all but just returned an incorrect string the next time that row was queried!

COPY: 15297	1	0
 Item  48 -- Length:  121  Offset: 2048 (0x0800)  Flags: NORMAL
COPY: 15298	1	0
 Item  49 -- Length:  121  Offset: 1920 (0x0780)  Flags: NORMAL
COPY: 15299	1	0	                                           badstuff
 Item  50 -- Length:  121  Offset: 1792 (0x0700)  Flags: NORMAL
COPY: 15300	1	0

It’s immediately clear how useful this is (and easier to read ). The part where it decodes the table row data into a human readable form is an especially cool trick. Two notes about this.

  • First, the lines are prefixed with the word COPY for a reason – they are actually intended to be formatted so you can grep on the word COPY and then use the “copy” command (or it’s psql cousin) to feed the data directly back into a staging table in the database for cleanup. How cool is that!
  • Second, it can decode only a set of fairly standard data types and relation types.
$ pg_filedump -h

Version 11.0 (for PostgreSQL 11.x)
Copyright (c) 2002-2010 Red Hat, Inc.
Copyright (c) 2011-2018, PostgreSQL Global Development Group

Usage: pg_filedump [-abcdfhikxy] [-R startblock [endblock]] 
          [-D attrlist] [-S blocksize] [-s segsize] [-n segnumber] file

Display formatted contents of a PostgreSQL heap/index/control file
Defaults are: relative addressing, range of the entire file, block
               size as listed on block 0 in the file

The following options are valid for heap and index files:
...
...
...
  -D  Decode tuples using given comma separated list of types
      Supported types:
        bigint bigserial bool char charN date float float4 float8 int
        json macaddr name oid real serial smallint smallserial text
        time timestamp timetz uuid varchar varcharN xid xml
      ~ ignores all attributes left in a tuple

Now you see what happens when I start having fun… a quick update about our SEAPUG meetup this past Tuesday turned into a blog article that’s way too long. :) Hope it’s useful, and as always let me know what I can improve!

PostgreSQL Invalid Page and Checksum Verification Failed

At the Seattle PostgreSQL User Group meetup this past Tuesday, we got onto the topic of invalid pages in PostgreSQL. It was a fun discussion and it made me realize that it’d be worth writing down a bunch of the stuff we talked about – it might be interesting to a few more people too!

Invalid Page In Block

You see an error message that looks like this:

ERROR: invalid page in block 1226710 of relation base/16750/27244

First and foremost – what does this error mean? I like to think of PostgreSQL as having a fairly strong “boundary” between (1) the database itself and (2) the operating system [and by extension everything else… firmware, disks, network, remote storage, etc]. PostgreSQL maintains page validity primarily on the way in and out of its buffer cache.

https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png?w=150 150w, https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png?w=300 300w, https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png?w=768 768w, https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png 1629w" sizes="(max-width: 1024px) 100vw, 1024px" />

What does this mean in practice? Suppose there’s a physical memory failure and somehow the ECC parity is unable to detect it. This means that a little bit of physical memory on the server now has incorrect garbage and the correct data from that memory is lost.

  • If the garbage bytes map to part of the kernel page cache, then when PostgreSQL tries to copy the page into it’s buffer cache then it will (if possible) detect that something is wrong, refuse to poison its buffer cache with this invalid 8k page, and error out any queries that require this page for processing with the ERROR message above.
  • If the garbage bytes map to part of PostgreSQL’s database buffer cache, then PostgreSQL will quite happily assume nothing is wrong and attempt to process the data on the page. Results are unpredictable; probably all sorts of ERROR messages, crashes and failure modes could result – or maybe even incorrect data returned with no ERROR message at all. (Note that this is probably the case for nearly all software… and also note that ECC is pretty good.)

How PostgreSQL Checks Page Validity

PostgreSQL has two main “validity checks” that it performs on pages. You can read the code in the function PageIsVerified() but I’ll summarize here. You can tell from your error message which validity check failed. It depends on whether you see a second additional WARNING right before the ERROR. The warning would look like this:

WARNING: page verification failed, calculated checksum 3482 but expected 32232
  1. If the above warning is not present, this means the page header failed a basic sanity check. This could conceivably be caused by both problems inside and outside of PostgreSQL.
  2. If you see the above warning (page verification failed), this means the checksum recorded in the block did not match the checksum calculated for the block. This most likely indicates that there was a problem outside of (underneath) the database – operating system, memory, networking, storage, etc.

About Basic Page Header Sanity Checks

As of when I’m writing this article in 2019, the following basic sanity checks are performed on the page header:

  • There are 32 bits reserved for page flag bits; at present only three are used and the other 29 bits should always be zero/off.
  • Every page is divided into four parts (header, free space, tuples, special space).  Offsets for the divisions are stored as 16-bit numbers in the page header; the offsets should go in order and should not have a value pointing off the page.
  • The offset of the special space should always be aligned.

About PostgreSQL Checksums

PostgreSQL version 9.3 (released in 2013) added the ability to calculate a checksum on data pages and store the checksum in the page. There are two inputs to the checksum: (1) every single byte of the data page, with zeros in the four bytes where the checksum will be stored later and (2) the page offset/address. This means that PostgreSQL doesn’t just detect if a byte is changed in the page – it also detects if a perfectly valid page gets somehow dropped into the wrong place.

Checksums are not maintained for blocks while they are in the shared buffers – so if you look at a buffer in the PostgreSQL page cache with pageinspect and you see a checksum value, note that it’s probably just leftover from the last read but wouldn’t have been maintained when the page was changed. The checksum is calculated and stamped onto the page when the page is written out of the buffer cache into the operating system page cache (remember the pages get flushed to disk later by a separate fsync call).

The checksum algorithm is specifically designed to take advantage of SIMD instructions. The slowest part of the algorithm is multiplication, so if possible PostgreSQL will be compiled to perform 32 multiplications at a time in parallel. In 2013 there were two platforms specifically documented to support this: x86 SSE4.1 and ARM NEON. The implementation is specifically tuned for optimal performance on x86 SSE. This is an important point actually – particularly for folks running PostgreSQL on embedded devices – PostgreSQL certainly compiles and works well on a lot of platforms, but evidently the checksum implementation is explicitly optimized to run the fastest on Intel. (To be clear… I think everyone should use checksums on every platform unless you have a really really good reason not to – just don’t be surprised if you start comparing benchmarks with Intel and you see a difference!)

For folks interested in digging a little more into the history… here’s the original commit using a CRC algorithm which never shipped in an actual PostgreSQL release (Simon Riggs, Jeff Davis and Greg Smith), here’s the subsequent commit introducing the FNV-1a algorithm instead of CRC which is what we still use today (Ants Aasma), and here’s the commit doing the major refactor which moved the algorithm into a header file for easier external use (Tom Lane).

More Ways To Check Validity

At the SEAPUG meetup this led to a discussion about checking validity. Do checksums cover everything? (hint: no) Are there more ways we can validate our databases? (hint: yes)

I haven’t yet made a full list of which files are covered by checksums and which ones aren’t, but I know that not everything is. For example: I’m pretty sure that the visiblity map and SLRU files aren’t covered with checksums. But for what it’s worth, there are two extra tools we can use verification in PostgreSQL databases:

  • The amcheck extension can scan a B-Tree index for a number of logical problems – for example, verifying that all B-Tree pages have items in “logical” order. (This could be useful, for example, if you’re not specifying ICU collation and you recently upgraded your operating system collation libraries… since PostgreSQL uses OS collation by default.)
  • The pg_visibility_map extension includes two functions to check for corruption in the visibility map – pg_check_frozen() and pg_check_visible().

Responding To Checksum Failures

Finally, what if you actually run into a checksum failure? What should you do, and are there any additional tools you should know about?

First of all – on any database – there are a few things you should always do immediately when you see something indicating that a data corruption may have occurred:

  • Verify that your backup retention and your log file retention are sufficiently long – I recommend at least a month (this is a Happiness Hint). You never know how long the investigation might take, or how long ago something important might have happened.
  • Take a moment to articulate and write out the impact to the business. (Are important queries failing right now? Is this causing an application outage?) This seems small but it can be very useful in making decisions later. Don’t exaggerate the impact but don’t understate it either. It can also be helpful to note important timelines that you already know about. For example: management is willing to use yesterday’s backup and take a day of data loss to avoid an 12 hour outage, but not to avoid a 4 hour outage …or… management needs a status update at 11:00am Eastern Time.
  • If there’s a larger team working on the system in question, communicate a freeze on changes until things are resolved.
  • Make a list or inventory of all copies of the data. Backups, physical replicas or hot standbys, logical replicas, copies, etc. Sometimes the very process of making this list can immediately give you ideas for simple solutions (like checking if a hot standby has the block intact). The next thing you probably want to do is check all items in this list to see if they have a valid copy of the data in question. Do not take any actions to remediate the problem right away, collect all of the information first. The data you collect now might useful during RCA investigation after you’re back online.
  • If there was one checksum failure, then you probably want to make sure there aren’t more.
    • If it’s a small database, consider whether you can scan the whole thing and verify the checksum on every single block.
    • If it’s a large database, consider whether you can at least scan all of the system/catalog tables and perhaps scan the tables which are throwing errors in their entirety. (PostgreSQL stops on the first error, so there isn’t an easy way to check if other blocks in the same table also have checksum problems.)
  • A few general best practices… (1) have a second person glance at your screen before you execute any actual changes, (2) never delete anything but always rename/move instead, (3) when investigating individual blocks, also look at the block before and the block after to verify they look like what you’d normally expect, (4) test the remediation plan before running it in production, and (5) document everything. If you’ve never seen Paul Vallée’s FIT-ACER checklist then it’s worth reviewing.

There’s no single prescriptive process for diagnosing the scope of the problem and finding the right path forward for you. It involves learning what you need to know about PostgreSQL, a little creative thinking about possible resolutions, and balancing the needs of the business (for example, how long can you be down and how much data loss can you tolerate).

That being said, there are a few tools I know about which can be very useful in the process. (And there are probably more that I don’t know about; please let me know and I can add them to this list!)


Data investigation:

Unix/Linux Commands
You might be surprised at what you can do with the tools already installed on your operating system. I’ve never been on a unix system that didn’t have dd and od installed, and I find that many Linux systems have hexdump and md5sum installed as well. A few examples of how useful these tools are: dd can extract the individual block with invalid data on the primary server and extract the same block on the hot standby, then od/hexdump can be used to create a human-readable textual dump of the binary data. You can even use diff to find the differences between the blocks. If you have a standby cluster with storage-level replication then you could use md5sum to see at a glance if the blocks match. (Quick word of caution on comparing hot standbys: last I checked, PostgreSQL doesn’t seem to maintain the free space identically on hot standbys, so the checksums might differ on perfectly healthy blocks. You can still look at the diff and verify whether free space is the only difference.) Drawbacks: low-level utilities can only do binary dumps but cannot interpret the data. Also, utilities like dd are “sharp knives” – powerful tools which can cause damage if misused!

For a great example of using dd and od, read the code in Bertrand Drouvot‘s pg_toolkit script collection.
Data investigation and checksum verification:

pg_filedump
This is a crazy awesome utility and I have no idea why it’s not in core PostgreSQL. It makes an easy-to-read textual dump of the binary contents of PostgreSQL data blocks. You can process a whole file or just specify a range of blocks to dump. It can verify checksums and it can even decode the contents of the tuples. As far as I can tell, it was originally written by Patrick Macdonald at Red Hat some time in the 2000’s and then turned over to the PostgreSQL Global Development Group for stewardship around 2011. It’s now in a dedicated repository at git.postgresql.org and it seems that Tom Lane, Christoph Berg and Teodor Sigaev keep it alive but don’t invest heavily in it. Drawbacks: be aware that it doesn’t address the race condition with a running server (see Credativ pg_checksums below). For dumping only a block with a checksum problem, this is not an issue since the server won’t let the block into its buffer cache anyway.
Checksum verification:

PostgreSQL pg_checksums
PostgreSQL itself starting in version 11 has a command-line utility to scan one relation or everything and verify the checksum on every single block. It’s called pg_verify_checksums in v11 and pg_checksums in v12. Drawbacks: first, this utility requires you to shut down the database before it will run. It will throw an error and refuse to run if the database is up. Second, you can scan a single relation but you can’t say which database it’s in… so if the OID exists in multiple databases, there’s no way to just scan the one you care about.
Checksum verification:

Credativ pg_checksums
The fine engineers of Credativ have published an enhanced version of pg_checksums which can verify checksums on a running database. It looks to me like the main case they needed to protect against was the race condition between pg_checksum reading blocks while the running PostgreSQL server was writing those same blocks. Linux of course work on a 4k page size; so if an 8k database page is half written when pg_checksum reads it then we will get a false positive. The version from credativ however is smart enough to deal with this. Drawbacks: check the github issues; there are a couple notable drawbacks but this project was only announced last week and all the drawbacks might be addressed by the time you read this article. Also, being based on the utility in PostgreSQL, the same limitation about scanning a single relation applies.

Note that both Credativ’s and PostgreSQL’s pg_checksums utilities access the control file, even when just verifying checksums. As a result, you need to make sure you compile against the same version of PostgreSQL code as the target database you’re scanning.
Checksum verification:

Satoshi Nagayasu postgres-toolkit
I’m not sure if this is still being maintained, but Satoshi Nagayasu wrote postgres-toolkit quite a long time ago which includes a checksum verification utility. It’s the oldest one I have seen so far – and it still compiles and works! (Though if you want to compile it on PostgreSQL 11 or newer then you need to use the patch in this pull request.) Satoshi’s utility also has the very useful capability of scanning an arbitrary file that you pass in – like pg_filedump but stripped down to just do the checksum verification. It’s clever enough to infer the segment number from the filename and scan the file, even if the file isn’t part of a PostgreSQL installation. This would be useful, for example, if you were on a backup server and wanted to extract a single file from your backup and check if the damaged block has valid checksum in the backup. Drawbacks: be aware that it doesn’t address the race condition with a running server.
Checksum verification:

Google pg_page_verification
Simple program; you pass in a data directory and it will scan every file in the data directory to verify the checksums on all blocks. Published to Github in early 2018. Drawbacks: be aware that it doesn’t address the race condition with a running server. Probably superseded by the built-in PostgreSQL utilities.
Mitigation:

PostgreSQL Developer Options
PostgreSQL has hundreds of “parameters” – knobs and button you can use to configure how it runs. There are 294 entries in the pg_settings table on version 11. Buried in these parameters are a handful of “Developer Options” providing powerful (and dangerous) tools for mitigating data problems – such as ignore_checksum_failure, zero_damaged_pages and ignore_system_indexes. Read very carefully and exercise great care with these options – when not fully understood, they can have unexpected side effects including unintended data loss. Exercise particular care with the ignore_checksum_failure option – even if you set that in an individual session, the page will be readable to all connections… think of it as poisoning the buffer cache. That being said, sometimes an option like zero_damaged_pages is the fastest way to get back up and running. (Just make sure you’ve saved a copy of that block!) By the way… a trick to trigger a read of one specific block is to SELECT * FROM table WHERE ctid='(blockno,1)'
Mitigation:

Unix/Linux Commands
I would discourage the use of dd to mitigate invalid data problems. It’s dangerous even for experienced engineers; simple mistakes can compound the problem. I can’t imagine a situation where this is a better approach than the zero_damaged_pages developer option and a query to read a specific block. That said, I have seen cases where dd was used to zero out a page.

More About Data Investigation

In order to put some of this together, I’ll just do a quick example session. I’m running PostgreSQL 11.5 an on EC2 instance and I used dd to write a few evil bytes into a couple blocks of my database.

First, lets start by just capturing the information from the log files:

$ grep "invalid page" ../log/postgresql.log|sed 's/UTC.*ERROR//'
 2019-10-15 19:53:37 :  invalid page in block 0 of relation base/16385/16493
 2019-10-16 22:26:30 :  invalid page in block 394216 of relation base/16385/16502
 2019-10-16 22:43:24 :  invalid page in block 394216 of relation base/16385/16502
 2019-11-05 23:55:33 :  invalid page in block 394216 of relation base/16385/16502
 2019-11-05 23:57:58 :  invalid page in block 394216 of relation base/16385/16502
 2019-11-05 23:59:14 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-05 23:59:21 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-05 23:59:22 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-05 23:59:23 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-06 00:01:12 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-06 00:01:16 :  invalid page in block 0 of relation base/16385/16493
 2019-11-06 00:02:05 :  invalid page in block 250 of relation base/16385/16492

With a little command-line karate we can list each distinct block and see the first time we got an error on that block:

$ grep "invalid page" ../log/postgresql.log |
sed 's/UTC.*ERROR//' |
awk '{print $1" "$2" "$11" invalid_8k_block "$8" segment "int($8/131072)" offset "($8%131072)}' |
sort -k3,5 -k1,2 |
uniq -f2
   2019-11-06 00:02:05 base/16385/16492 invalid_8k_block 250 segment 0 offset 250
   2019-10-15 19:53:37 base/16385/16493 invalid_8k_block 0 segment 0 offset 0
   2019-11-05 23:59:14 base/16385/16502 invalid_8k_block 262644 segment 2 offset 500
   2019-10-16 22:26:30 base/16385/16502 invalid_8k_block 394216 segment 3 offset 1000 

So we know that there are at least 4 blocks corrupt. Lets scan the whole data directory using Credativ’s pg_checksum (without shutting down the database) to see if there are any more blocks with bad checksums:

$ pg_checksums -D /var/lib/pgsql/11.5/data |& fold -s
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16502.3", block 1000: calculated checksum
 2ED4 but block contains 4EDF
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16502.3", block 1010: calculated checksum
 9ECF but block contains ACBE
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16502.2", block 500: calculated checksum
 5D6 but block contains E459
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16493", block 0: calculated checksum E7E4
 but block contains 78F9
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16492", block 250: calculated checksum
 44BA but block contains 3ABA
 Checksum operation completed
 Files scanned:  1551
 Blocks scanned: 624158
 Bad checksums:  5
 Data checksum version: 1

Ah-ha… there was one more bad checksum which didn’t show up in the logs! Next lets choose one of the bad blocks and dump the contents using unix command line tools.

$ dd status=none if=base/16385/16492 bs=8192 count=1 skip=250 | od -A d -t x1z -w16
0000000 00 00 00 00 e0 df 6b b0 ba 3a 04 00 0c 01 80 01  >......k..:......<
0000016 00 20 04 20 00 00 00 00 80 9f f2 00 00 9f f2 00  >. . ............<
0000032 80 9e f2 00 00 9e f2 00 80 9d f2 00 00 9d f2 00  >................<
0000048 80 9c f2 00 00 9c f2 00 80 9b f2 00 00 9b f2 00  >................<
0000064 80 9a f2 00 00 9a f2 00 80 99 f2 00 00 99 f2 00  >................<
0000080 80 98 f2 00 00 98 f2 00 80 97 f2 00 00 97 f2 00  >................<

Here we see the page header and the beginning of the line pointers. One thing I think it’s easy to remember is that the first 8 bytes are the page LSN and the next 2 bytes are the page checksum. Notice that the page checksum bytes contain “ba 3a” which matches the error message from the scan above (3ABA). Sometimes it can be useful to know just the very top of the page even if you don’t remember anything else!

This is useful, but lets try the pg_filedump utility next. This utility takes a lot of options. In this example I’m going to ask it to verify the checksum (-k), only scan one block at offset 250 (-R 250 250) and even to decode the tuples (table row data) to a human-readable format (-D int,int,int,charN). There’s another argument (-f) that can even tell pg_filedump to show hexdump/od style raw data inline but I won’t demonstrate that here.

$ pg_filedump -k -R 250 250 -D int,int,int,charN base/16385/16492

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 11.0
*
* File: base/16385/16492
* Options used: -k -R 250 250 -D int,int,int,charN
*
* Dump created on: Fri Nov  8 21:48:38 2019
*******************************************************************

Block  250 ********************************************************
----- Block Offset: 0x001f4000 Offsets: Lower 268 (0x010c) Block: Size 8192 Version 4 Upper 384 (0x0180) LSN: logid 0 recoff 0xb06bdfe0 Special 8192 (0x2000) Items: 61 Free Space: 116 Checksum: 0x3aba Prune XID: 0x00000000 Flags: 0x0004 (ALL_VISIBLE) Length (including item array): 268 Error: checksum failure: calculated 0x44ba. ------ Item 1 -- Length: 121 Offset: 8064 (0x1f80) Flags: NORMAL COPY: 15251 1 0 Item 2 -- Length: 121 Offset: 7936 (0x1f00) Flags: NORMAL COPY: 15252 1 0 Item 3 -- Length: 121 Offset: 7808 (0x1e80) Flags: NORMAL COPY: 15253 1 0

That was the block header and the first few item. (Item pointer data is displayed first, then the table row data itself is displayed on the following line after the word COPY.) Looking down a little bit, we can even see where I wrote the bytes “budstuff” into a random location in this block – it turns out those bytes landed in the middle of a character field. This means that without checksums, PostgreSQL would not have thrown any errors at all but just returned an incorrect string the next time that row was queried!

COPY: 15297	1	0
 Item  48 -- Length:  121  Offset: 2048 (0x0800)  Flags: NORMAL
COPY: 15298	1	0
 Item  49 -- Length:  121  Offset: 1920 (0x0780)  Flags: NORMAL
COPY: 15299	1	0	                                           badstuff
 Item  50 -- Length:  121  Offset: 1792 (0x0700)  Flags: NORMAL
COPY: 15300	1	0

It’s immediately clear how useful this is (and easier to read ). The part where it decodes the table row data into a human readable form is an especially cool trick. Two notes about this.

  • First, the lines are prefixed with the word COPY for a reason – they are actually intended to be formatted so you can grep on the word COPY and then use the “copy” command (or it’s psql cousin) to feed the data directly back into a staging table in the database for cleanup. How cool is that!
  • Second, it can decode only a set of fairly standard data types and relation types.
$ pg_filedump -h

Version 11.0 (for PostgreSQL 11.x)
Copyright (c) 2002-2010 Red Hat, Inc.
Copyright (c) 2011-2018, PostgreSQL Global Development Group

Usage: pg_filedump [-abcdfhikxy] [-R startblock [endblock]] 
          [-D attrlist] [-S blocksize] [-s segsize] [-n segnumber] file

Display formatted contents of a PostgreSQL heap/index/control file
Defaults are: relative addressing, range of the entire file, block
               size as listed on block 0 in the file

The following options are valid for heap and index files:
...
...
...
  -D  Decode tuples using given comma separated list of types
      Supported types:
        bigint bigserial bool char charN date float float4 float8 int
        json macaddr name oid real serial smallint smallserial text
        time timestamp timetz uuid varchar varcharN xid xml
      ~ ignores all attributes left in a tuple

Now you see what happens when I start having fun… a quick update about our SEAPUG meetup this past Tuesday turned into a blog article that’s way too long. :) Hope it’s useful, and as always let me know what I can improve!

PASS Summit 2nd Day Keynote

Woohoo, we’ve made it to the second day of PASS Summit!  After an awesome first day running from the keynote, to my first session with Denise McInerney, late for my executive meeting with less than a dozen other attendees with Rohan Kumar and sessions/networking/parties, here we are for the second day of PASS Summit keynote!

Info and Awards

We started the day with Wendy Pastrick to tell us about the financial status of PASS and she decided to make sure we were awake and serenade us with a lovely song first!

Tim Ford was next to talk about what the community and PASS means to all of us.  He announced the PASSion award, which was given to my wonderful friend and co-presenter in webinars and fellow Linux Learning Pathway sessions, Hamish Watson!  Congratulations, Hamish!!

2nd Day Keynote- TARAH WHEELER!!

Tarah Wheeler is a cybersecurity expert and has written a book on “Women in Tech, Take Your Career to the Next Level“.  She came to speak to us about the daily challenges of data breaches in companies and why we are an important part of working to solve it.  She wants DBAs and others in our industry to be diplomats.  We should recognize the vulnerabilities and work with our companies to make them more cyber-secure.

She discussed the isolation and separation of the internet into three functional internets due to GDPR and other data privacy and government policies in the world today.

https://dbakevlar.com/wp-content/uploads/2019/11/7ADFA9F8-14FA-4CE3-A1DE... 300w, https://dbakevlar.com/wp-content/uploads/2019/11/7ADFA9F8-14FA-4CE3-A1DE... 768w" sizes="(max-width: 800px) 100vw, 800px" />
She discussed how data is simply collected in countries like China vs. the rest of the world, along with GDPR and how most data retention isn’t designed for GDPR policies.  Data retention, backup and collection all has implications depending on these.  How we have to think about data has changed drastically in recent years.  Some of this has come from capabilities in technology, some has come from regulations/policies and others through breaches.

AWESOMESAUCE

I really, really enjoyed this keynote-  as much as I enjoyed hearing about the future of our own platform of products from Microsoft after the MSIgnite announcements, Tarah’s presentation was timely and dramatically important to us as data professionals.

The redundancy we put into our systems to protect our data also makes us vulnerable and this talk resonated with me deeply.  The constant conflict between retain the data and remove it when policy/regulations/individuals rights state we must.  It’s a constant push and pull of data professionals today.

She discussed the importance of future investment in data retention policies, cybersecurity and what I know many of us knew was coming.  We have a tendency to procrastinate.  We think it’s always tomorrow that we need to worry about it because there is so much else that we need to worry about, but critical data and policies around GDPR and cybersecurity is an essential topic TODAY.

I did a lot of GDPR talks, including an intro talk last year at PASS Summit.  Her use cases on China regulations, GDPR and California combinations, etc. are similar to ones I created in my own examples.  I was able to quickly demonstrate to any attendee of the talk who told me they weren’t susceptible to GDPR regulations how they were.  The ease at which I was able to make them susceptible zwas quite mind-blowing, so I was thrilled she was able to demonstrate this to the entire attendance at Summit.  I’m all @TeamTarah now.

Thank you to PASS for having Tarah as the second day keynote-  it was a great keynote today and with yesterday’s Rohan yesterday, I’m looking forward to tomorrow’s even more!

 

 



Tags:  , ,


Del.icio.us



Facebook

TweetThis

Digg

StumbleUpon




Copyright ©  [PASS Summit 2nd Day Keynote], All Right Reserved. 2019.

Oracle database and grid home patches to install

This blogpost is about Oracle database and grid infrastructure software homes, which patches should be applied to which homes, and what it then looks like. This is fully documented by MyOracleSupport notes, but you will see that with version 18 and up this is unclear.

I keep a script-set that automatically installs and patches the Oracle database software and creates a database. This script-set is called vagrant-builder, and it can install any version with any PSU applied between 11.2.0.2 up to 19.5, which is the latest PSU of the latest version, with a few exceptions: for 11.2.0.2 and 11.2.0.3 I only created an install for the base version and the latest PSU for the database, and version 12.1.0.1 is left out entirely.

I recently reviewed my installs and verified everything is carried out correctly. First a simple overview of what I think should be applied on the database and grid infrastructure install:

Version  Grid               Database
-------- ------             ----------
11.2.0.2 -                  DB PSU
11.2.0.3 -                  DB PSU+OJVM
11.2.0.4 GI PSU+JDBC patch  DB PSU+OJVM
12.1.0.2 GI PSU+JDBC patch  DB PSU+OJVM
12.2.0.1 GI PSU             DB PSU+OJVM
18       GI PSU             DB PSU+OJVM
19       GI PSU             DB PSU+OJVM

(‘-‘ means not investigated)
My idea of what should be applied is based on MOS note 1929745.1: Oracle recommended patches.

Grid patches, JDBC patch
The JDBC patch for grid infrastructure 11.2.0.4 and 12.1.0.2 is a patch that updates java classes. Therefore the patch is a generic one, the java classes do not contain operating system dependent machine code. The table in the MOS note also show differences for the JDBC patch between = januari 2014 = july 2016 and higher. No surprise there. It’s especially easy because all versions in premier support as of the date of this blogpost do not need the JDBC patch.

Grid patches, GI PSU
For the GI PSU, there are a lot of patches that contain the GI PSU, because outside of the GI patch itself, there are also combination patches that for example contain both the GI and the DB PSU. I like to keep it as simple as I can. Therefore, I stick to MOS note 2118136.2: Assistant: Download Reference for Oracle Database/GI Update, Revision, PSU, SPU(CPU), bundle patches, Patchsets and Base releases, and look at the following:
– versions 11.2.0.4/12.1.0.2: Oracle Database PSU, SPU(CPU), Bundle Patches (Versions 12.1 & lower), version (11.2.0.4/12.1.0.2), GI PSU column.
– versions 12.2.0.1 and up: Oracle Database Updates, version (12.2.0.1,18.0.0.0,19.0.0.0), GI Update column.

Database patches, DB PSU
Here too there are multiple patches that can be used to apply the DB PSU, and I stick with MOS note 2118136.2: Assistant: Download Reference for Oracle Database/GI Update, Revision, PSU, SPU(CPU), bundle patches, Patchsets and Base releases, and look at the following:
– versions 11.2.0.4/12.1.0.2: Oracle Database PSU, SPU(CPU), Bundle Patches (Versions 12.1 & lower), version (11.2.0.4/12.1.0.2), PSU column.
– versions 12.2.0.1 and up: Oracle Database Updates, version (12.2.0.1,18.0.0.0,19.0.0.0), DB Update column.

Database patches, OJVM
There are multiple MOS documents talking about the database JavaVM patch, and there are multiple patches, but here I stick with MOS note 2118136.2: Assistant: Download Reference for Oracle Database/GI Update, Revision, PSU, SPU(CPU), bundle patches, Patchsets and Base releases once again, and look at the following:
– OJVM Update/PSU/Bundle Patches, 11.2.0.4/12.1.0.2/12.2.0.1/18.0.0.0/19.0.0.0, OJVM Update.

Okay. So we got the table above that is based on MOS note 1929745.1, and we got all the patches organised in MOS note 2118136.2. So that’s nice and simple, right?

Well, not entirely…

As I said, I was checking up on the latest PSU installs. This is the ‘opatch lspatches’ overview of 12.1.0.2 and 12.2.0.1:

-- 12.1.0.2
GI 190716
---------
23727148;
29509318;OCW PATCH SET UPDATE 12.1.0.2.190716 (29509318)
29494060;Database Patch Set Update : 12.1.0.2.190716 (29494060)
29423125;ACFS PATCH SET UPDATE 12.1.0.2.190716 (29423125)
26983807;WLM Patch Set Update: 12.1.0.2.180116 (26983807)
DB 190716
---------
29774383;Database PSU 12.1.0.2.190716, Oracle JavaVM Component (JUL2019)
29494060;Database Patch Set Update : 12.1.0.2.190716 (29494060)
-- 12.2.0.1
GI 191015
---------
30138470;Database Oct 2019 Release Update : 12.2.0.1.191015 (30138470)
30122828;ACFS OCT 2019 RELEASE UPDATE 12.2.0.1.0 (30122828)
30122814;OCW OCT 2019 RELEASE UPDATE 12.2.0.1.191015 (30122814)
30093408;TOMCAT RELEASE UPDATE 12.2.0.1.0(ID:190208.0920) (30093408)
26839277;DBWLM RELEASE UPDATE 12.2.0.1.0(ID:170913) (26839277
DB 191015
---------
30133625;OJVM RELEASE UPDATE: 12.2.0.1.191015 (30133625)
30138470;Database Oct 2019 Release Update : 12.2.0.1.191015 (30138470)

For the database (DB), we see two patches, which is exactly what is expected.
– The database patch, which is called release update with one and patch set update with the other.
– The OJVM update which also named differently between the two versions, but very much recognisable as such.
I think it’s bad that the names vary, but this is totally expected.

For the grid infrastructure (GI), we see 5 patches in both situations, but these are not the same patches between the homes!
– The unnamed patch in the 12.1.0.2 home is the JDBC patch, which should only be applied to 12.1.0.2, not to higher versions.
– The OCW patch (oracle clusterware).
– The database patch.
– The ACFS patch. The version indication in the patch name changed.
– The WLM/DBWLM patch. Sadly the name changed, and the name with 12.2.0.1 is actually quite useless, I can’t tell the actual version, I have to look up the patch number.
– Starting from version 12.2.0.1, there is a tomcat installation in the grid home, as this patch indicates. The name here is not helpful because it doesn’t indicate the actual version, like with the DBWLM patch.

So, outside of in my opinion bad naming, and a weird inclusion of a competing product of Oracle (Tomcat versus Weblogic), this still follows the rules of logic.

Now let’s look at the same output for version 18 and 19:

--18
GI 18.8
30116128;ACFS RELEASE UPDATE 18.8.0.0.0 (30116128)
30113775;OCW RELEASE UPDATE 18.8.0.0.0 (30113775)
30112122;Database Release Update : 18.8.0.0.191015 (30112122)
30093398;TOMCAT RELEASE UPDATE 18.0.0.0.0 (30093398)
28655963;DBWLM RELEASE UPDATE 18.4.0.0.0 (28655963)
27923415;OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)
DB 18.8
30133603;OJVM RELEASE UPDATE: 18.8.0.0.191015 (30133603)
30112122;Database Release Update : 18.8.0.0.191015 (30112122)
28090553;OCW RELEASE UPDATE 18.3.0.0.0 (28090553)
--19
GI 19.5
30125133;Database Release Update : 19.5.0.0.191015 (30125133)
30122167;ACFS RELEASE UPDATE 19.5.0.0.0 (30122167)
30122149;OCW RELEASE UPDATE 19.5.0.0.0 (30122149)
29401763;TOMCAT RELEASE UPDATE 19.0.0.0.0 (29401763)
DB 19.5
30128191;OJVM RELEASE UPDATE: 19.5.0.0.191015 (30128191)
30125133;Database Release Update : 19.5.0.0.191015 (30125133)
29585399;OCW RELEASE UPDATE 19.3.0.0.0 (29585399)

I must say that the naming in general looks more consistent, that is a good thing!

For the database (DB) we see THREE patches (I suspected 2):
– The database release update patch, which nicely reports its version.
– The OVM patch, which also reports its version.
– This is weird. One of the grid infrastructure patches, the “OCW” patch, is applied to the database home. In fact, this is applied to the base release. Because it’s not a patch that is documented to be needed to be applied, this will sit at this version and never be updated. However, this unexpected patch is consistently applied to the base release for both version 18 and 19.

For the grid home (GI) we see an inconsistent number of patches (!) between 18.8 and 19.5. This is partly as expected, but I found an weird patch applied too. Let’s go over the patches:
– The ACFS patch is totally expected and appropriately named.
– The OCW patch is expected and appropriately named too.
– The Database Release Update patch is expected and appropriately named.
– The TOMCAT patch is expected. I don’t understand why it can’t have the RU numbering, but at least it’s consistent between 18 and 19.
– The DBWLM patch is only applied to the 18 home. As far as I understand, this is how it’s supposed to be, DBWLM is not regularly updated like the other ones above, so it’s okay to have an older version of it with the other patches, and if there isn’t a patch to apply, it can simply not be there, like with version 19 in this case.
– The OJVM patch puzzles me. I don’t know what to think of it. Also, it’s only applied to the base release of version 18, not to the base release of version 19. This, very much like the OCW patch, will never be updated. But I just don’t understand, this patches the java virtual machine in the database, which for GI is the ASM instance, for which, as far as I know, the java virtual machine isn’t used.

Conclusion.
The naming of the patches as visible with “opatch lspatches” has certainly improved with version 18 and higher. Still it would be helpful if the grid infrastructure tomcat patch would follow the same naming of the other patches.

I am in doubts about the inclusion of two patches in the base releases of versions 18 and 19:
The OJVM patch inclusion in the base release of grid infrastructure of version 18 only.
The OCW patch inclusion in the base release of the database of version 18 and 19.

Addendum.
After debating this on twitter and with my colleagues, I found that my OCW assumptions were incorrect. The grid infrastructure patch versions 12.2 and up reasonably clearly describes that the OCW sub-patch that is part of the GI PSU/RU patch should be applied to the database home too (if cluster ware is used for that home). So that means that if you got another version of the database home than the grid infrastructure home and it is used with grid infrastructure, you should download the grid infrastructure PSU/RU patch and apply the OCW sub patch to the oracle database home, because the OCW patch is not in the database PSU/RU patch.
Because the database home patch itself is also in the grid infrastructure PSU/RU patch, I see no reason to bother downloading the database patch, and now only download and use the grid infrastructure PSU/RU patch, because that contains the database home patch as well as the OCW patch.

Addendum 2.
The OCW patch being installed into the grid infrastructure home and additionally in the database home turns out to be a change with PSU 11.2.0.4.5. Before that, the OCW patch did exist with the grid infrastructure home patch, but couldn’t be applied to the database home. This change was documented in the patch readme, but wasn’t really heavily marketed by Oracle.

Addendum 3.
Despite OCW being documented as being additionally applied to the database home for 11.2.0.4(.5) and higher, in my tests it was not possible to apply the 11.2.0.4 OCW patch to a database home for any 11.2.0.4 PSU; opatch fails with a dependency problem. It does succeed starting from 12.1.0.2 (12.1.0.1 not tested) and higher.

I have to say that when looking in the 11.2.0.4 OCW patch metadata, it says ‘rac’, so the OCW patch might succeed when the home is explicitly installed for a cluster database. I tested with grid infrastructure installed for a single machine (“siha”).

Docker Birmingham – November 2019

https://oracle-base.com/blog/wp-content/uploads/2019/09/docker-birmingha... 300w, https://oracle-base.com/blog/wp-content/uploads/2019/09/docker-birmingha... 768w" sizes="(max-width: 265px) 85vw, 265px" />

Yesterday evening I went to the Docker Birmingham meetup, sponsored by Black Cat Technology Solutions.

This event was a single longer session by Matt Todd called “Make Data Science Great Again (Part 2)” I didn’t see part 1, but he gave a summary, so that wasn’t too much of a problem.

Matt started off by introducing the scientific method and discussed reducing variables when testing, so you know the impact of a change. The suggestion being that development and data science should be the same. What better way to reduce variables than to package up a data science lab to make sure everyone is working on the same thing, so there is no/less variability between researchers, and they can focus on their work, not piecing together the kit.

He then went on to discuss Cloud Native Application Bundles (CNAB), and how they can be used to more reliably package multi-container applications, guaranteeing dependencies to a level greater than that possible by using Docker Compose alone. As an example he demoed his Digital Scientists Lab, which is a CNAB bundle containing a bunch of common kit used by data scientists (Jupyter, Spark, Flink, Kafka, RabbitMQ, Nifi, Elastic Stack etc.), which he could in theory give to several people to run experiments. It’s just his playground, but it gives you an idea of what’s possible. Using Nifi to link stuff together looked cool!

I started to make a few mental connections when he discussed the relationship to experimental data, because I look after the infrastructure for a research publishing system, and being able to keep not only the published research, but also the experimental data and potentially a way to reproduce the research findings is pretty important. It keeps those academics honest, if you know what I mean. It’s funny how just a few links to something you know a little bit about, and other stuff starts falling into place.

It was all a bit mind-blowing, but in a good way. I’m still only scratching the surface of this stuff, but it’s really good to see what else is going on in this space. I’ve added a few more things to my list of things to play around with.

It’s a couple of months until the next event, but there’s a CNCF event next month, so watch out for that!

Thanks to the folks at Black Cat Technology Solutions for sponsoring and organising the event, and to the Matt Todd for doing a great session. See you soon!

Cheers

Tim…


Docker Birmingham – November 2019 was first posted on November 7, 2019 at 8:58 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.