Oakies Blog Aggregator

Power BI 101- Logging and Tracing, Part III

Power BI, like many Microsoft products, is multi-threaded.  This can be seen from the logs and even the Task Manager.  I know, I know…you’ve probably heard this part all before…

The importance of this information is that the logs display Process IDs (PIDs) that are separate from the main Power BI Desktop executable, including the secondary processes.  Moving from the Power BI logs that reside in the Performance folder, (see Part I here) we can connect the PIDs and TIDs (thread IDs) to the information displayed in the Task Manager:

[Image: Powerbi_tm1.png – Task Manager showing Power BI Desktop with the thread count highlighted]

Note that I’ve highlighted the thread count in the image above.  We can see the total resource usage, but if we want it broken down, we can simply expand the arrow to the left of the application name:

[Image: powerbi_tm2.png – the Power BI Desktop entry expanded to show its subprocesses]

We can see that there are numerous threads, with a few taking considerably more memory than others.  The CefSharp.BrowserSubprocess entries can be a bit misleading:  this is Power BI using Chromium to render the visuals that are part of the current Power BI Desktop session.  Chromium (CefSharp.BrowserSubprocess) subprocesses always come in pairs, one for rendering and one for messaging.

In the Task Manager Details tab, we can see each of the PIDs that corresponds to the process IDs listed in the logs.  By updating the viewable columns, (right-click, choose “Threads” and click OK) you can now view how many threads are associated with a given PID.

[Image: TM1.png – Task Manager Details tab with the Threads column added]

In the main view of the Task Manager, you can do something similar: right-click on the column headers and choose to display the PID, the process type, (to verify what is what) and the executable, so you can pair up entries in the log with the Task Manager.

[Image: TM2.png – Task Manager Processes view with PID, type and executable columns displayed]

The SQL Server Analysis Services engine, the Windows console host and the Power BI application subprocess are visible in the list as well.  Different types of Power BI data models, depending on the data sources, features and functions used, will affect which subprocesses are required to satisfy the demand.  Viewing them in the Task Manager helps identify which processing requires heavier resources.  This is just another step, another view into what’s going on behind the scenes with Power BI.
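
Since the msmdsrv subprocess is a local Analysis Services instance, a complementary way to see what it is doing is to point a tool such as DAX Studio or SSMS at that local instance (DAX Studio displays the port when it attaches to Power BI Desktop) and query its dynamic management views. A minimal sketch, with the caveat that the exact columns available vary by version:

-- list the sessions the local Analysis Services (msmdsrv) instance is servicing
SELECT SESSION_SPID,
       SESSION_USER_NAME,
       SESSION_LAST_COMMAND
FROM $SYSTEM.DISCOVER_SESSIONS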

 

 



Copyright © DBAKevlar [Power BI 101- Logging and Tracing, Part III], All Right Reserved. 2018.

Observations About the Scalability of Data Loads in ADWC

Over the last few days I have been running a number of tests based on the TPC-DS benchmark against Oracle’s Autonomous Data Warehouse Cloud service (ADWC). One of the first things I did was, of course, to create the TPC-DS schema and populate it. The aim of this blog post is to share some observations related to the population step.

I started by generating the data with the tool provided by TPC-DS: dsdgen. With the following command, I generated 1TB of data:

$ dsdgen -scale 1000 -dir /data/tpcdsdata

The tool generated, in about 24 hours, the following files:

$ ls -1s --block-size=M /data/tpcdsdata/*.dat
     1M /data/tpcdsdata/call_center.dat
     5M /data/tpcdsdata/catalog_page.dat
 22375M /data/tpcdsdata/catalog_returns.dat
302796M /data/tpcdsdata/catalog_sales.dat
   642M /data/tpcdsdata/customer_address.dat
  1560M /data/tpcdsdata/customer.dat
    77M /data/tpcdsdata/customer_demographics.dat
    10M /data/tpcdsdata/date_dim.dat
     1M /data/tpcdsdata/dbgen_version.dat
     1M /data/tpcdsdata/household_demographics.dat
     1M /data/tpcdsdata/income_band.dat
 16373M /data/tpcdsdata/inventory.dat
    83M /data/tpcdsdata/item.dat
     1M /data/tpcdsdata/promotion.dat
     1M /data/tpcdsdata/reason.dat
     1M /data/tpcdsdata/ship_mode.dat
     1M /data/tpcdsdata/store.dat
 34016M /data/tpcdsdata/store_returns.dat
399328M /data/tpcdsdata/store_sales.dat
     5M /data/tpcdsdata/time_dim.dat
     1M /data/tpcdsdata/warehouse.dat
     1M /data/tpcdsdata/web_page.dat
 10349M /data/tpcdsdata/web_returns.dat
151044M /data/tpcdsdata/web_sales.dat
     1M /data/tpcdsdata/web_site.dat

There are two main ways to load that data into ADWC: SQL*Loader or the DBMS_CLOUD package. Since I decided to use the latter, I had to load those files into an object storage cloud service first.

I settled on loading them into Oracle’s Object Storage with the following command:

$ oci os object bulk-upload --bucket-name tpcdsdata --src-dir /data/tpcdsdata --include "*.dat"

Since I used the defaults, the upload was carried out by ten parallel threads. In case you wonder, it took almost 6 hours, and during that time two CPU cores were fully used.
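
For completeness: the DBMS_CLOUD calls further down reference a credential named CHRIS. Such a credential is created once with DBMS_CLOUD.CREATE_CREDENTIAL; the following is a minimal sketch in which the username and password (an OCI auth token) are placeholders rather than the values actually used:

BEGIN
  -- one-time setup: store the object storage credentials in the database
  dbms_cloud.create_credential(
    credential_name => 'CHRIS',
    username        => 'oci_user',      -- placeholder
    password        => 'my_auth_token'  -- placeholder: an OCI auth token
  );
END;
/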

Once all the data was available in Oracle’s Object Storage, to decide how many CPU cores to allocate for the full load, I ran a number of loads of a single table of average size (CATALOG_RETURNS, ca. 22GB) with an increasing number of CPU cores allocated in ADWC. A simplified version of the PL/SQL block I used for one run is the following:

DECLARE
  l_field_list CLOB;
  l_table_name VARCHAR2(128) := 'CATALOG_RETURNS';
BEGIN
  -- get rid of old data   
  EXECUTE IMMEDIATE 'TRUNCATE TABLE '||l_table_name;

  -- generate field list
  SELECT listagg(column_name || decode(data_type, 'DATE', ' DATE "YYYY-MM-DD"', ''), ',') WITHIN GROUP (ORDER BY column_id)
  INTO l_field_list
  FROM user_tab_cols
  WHERE table_name = l_table_name;

  -- load data
  dbms_cloud.copy_data(
    schema_name => user,
    table_name => l_table_name,
    credential_name => 'CHRIS',
    file_uri_list => 'https://swiftobjectstorage.eu-frankfurt-1.oraclecloud.com/v1/chris/tpcdsdata/catalog_returns.dat',
    field_list => l_field_list
  );
END;
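
The outcome of each run can also be checked after the fact through the USER_LOAD_OPERATIONS view that accompanies DBMS_CLOUD; a minimal sketch (the column selection is an assumption, adjust it to what your version exposes):

-- status and row counts of recent DBMS_CLOUD load operations
SELECT table_name, status, rows_loaded, start_time, update_time
FROM user_load_operations
ORDER BY start_time DESC;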

The following chart summarizes what I observed:

As you can see, as the number of CPU cores doubles, the load time decreases by a factor of 1.7-1.8, which, in my opinion, is rather good. Note that I did not test with 128 CPU cores (the maximum ADWC supports), because I was not able to allocate that many CPU cores for a single service.

What I found interesting is to have a look at the way the database engine parallelizes the loads. By the way, if you are asking yourself how the data is loaded, it is done with a simple SQL statement like INSERT /*+ append enable_parallel_dml */ INTO "TPCDS1"."CATALOG_RETURNS" SELECT * FROM COPY$RD0ZOZY5DB25HSH0CB24. For that purpose, for every load, I generated a real-time SQL monitoring report for the INSERT statement (a sketch of how to produce such a report is shown after the lists below). Here is, for each of them, the information about the parallel execution:

  • 2 CPU cores – DOP 4, one instance involved

  • 4 CPU cores – DOP 8, one instance involved

  • 8 CPU cores – DOP 16, one instance involved

  • 16 CPU cores – DOP 32, one instance involved

  • 32 CPU cores – DOP 64, two instances involved (distribution of parallel execution processes: 32/32)

  • 64 CPU cores – DOP 128, three instances involved (distribution of parallel execution processes: 43/43/42)

As you can see:

  • The database engine selects a DOP which is two times the number of allocated CPU cores.
  • Up to a DOP of 32, all parallel execution processes are started on the same instance (5) as the query coordinator.
  • With a DOP of 64, half of the parallel execution processes are started on instance 5 and the other half is started on instance 8.
  • With a DOP of 128, one third of the parallel execution processes are started on instance 5, another third on instance 7, and the others on instance 8.
  • Since the processing is CPU bound and the number of involved parallel execution processes is two times the number of allocated CPU cores, the Resource Manager kicks in. As a result, a lot of time was spent on the “resmgr: cpu quantum” wait event.
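
For reference, this is roughly how such a real-time SQL monitoring report can be produced; a minimal sketch in which the SQL_ID is first looked up in V$SQL_MONITOR (the sql_text filter is only illustrative):

-- find the monitored INSERT statement
SELECT sql_id, status
FROM v$sql_monitor
WHERE sql_text LIKE 'INSERT /*+ append enable_parallel_dml */%';

-- produce the report (type can be TEXT, HTML or ACTIVE)
SELECT dbms_sqltune.report_sql_monitor(
         sql_id       => '&sql_id',   -- substitute the SQL_ID found above
         type         => 'TEXT',
         report_level => 'ALL')
FROM dual;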

The last test I did was loading the full 1TB of data. The PL/SQL code I used was similar to the one used for the CATALOG_RETURNS table. The only major difference is an additional loop to process all tables/files. With 32 CPU cores allocated, the load took 67 minutes.

In summary, the loads scale well with an increasing number of CPU cores. But be careful: because of the high DOP, they can also use all the CPU resources of the service. It is therefore essential to start them wisely, e.g. by associating them with a sensible Resource Manager consumer group.
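
In ADWC one simple option, offered here only as a suggestion and not something tested in this post, is to run the load through one of the predefined MEDIUM or LOW services instead of HIGH, since they map to consumer groups with a lower degree of parallelism. Which group a session ends up in can be checked with a sketch like this:

-- which service / resource manager consumer group is this session using?
SELECT sid, service_name, resource_consumer_group
FROM v$session
WHERE sid = sys_context('userenv', 'sid');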

Humble pie made with NULL strings

I was helping out a client a while back with an issue where a panicked email came into my inbox along the lines of “SELECT IS BROKEN IN ORACLE!!!”, which seemed perhaps a little extreme. So I pursued it further, asking for some concrete details, and I must concede it had me a little bamboozled for a while. I’ve simplified the example to keep it easy to digest, but the premise is the same.

My colleague had a table with a couple of VARCHAR2 columns:

[Image: the table definition with two VARCHAR2 columns]

and the data inside the two columns for the single row was identical:

[Image: the single row, with apparently identical data in both columns]

The panicky email was sent because a simple query to check that the two column were identical was not working as expected:

[Image: the equality check not returning the expected match]

At this point, I was pretty relaxed about the situation, because this is a “problem” I have seen many, many times over the years, especially as people come to Oracle from other database management systems. So I responded to the email:

“Yeah…I know what the issue is. You have trailing spaces in one of the columns. Try RTRIM”

and filed the original email in that special folder where emails never resurface.

[Image]

But as quickly as I had hit “Send”, a reply came back with “That doesn’t work either”. Working on the assumption that my email had not been explicit enough (and perhaps I was being a little smug about it), I logged in to the system to run a query on the same table, and lo and behold – my smugness was wiped off my face.

[Image: my own query against the table, showing the same unexpected result]

So then I tried a variation on a theme and concatenated a known value to the end of each column to see if I was missing anything:

[Image: each column concatenated with a known value]

So everything looks OK, but obviously something is awry here. When you have doubts about the data, a good option is to use the DUMP function to see exactly what bytes are stored:

[Image: DUMP output for both columns]

and the problem is revealed.  The second column in the table had a trailing ASCII code zero after the word SUCCESS. This can be a nasty (and common) issue when developers are using C or similar languages to store data in an Oracle database. In many languages, ASCII zero is used to terminate an arbitrary-length string, and hence it can easily end up being incorrectly stored along with the string bytes in a VARCHAR2 column, especially if you are not using the delivered string-handling functions in the pre-compiler layer (such as Pro*C). This can also be due to confusion between the concept of NULL (two L’s) in the database and the concept of the NUL (one L) character in a programming language.
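
A minimal sketch that reproduces the situation from scratch (the table and column names are made up for the demo):

-- two values that render identically but differ by a trailing NUL byte
create table t_nul ( c1 varchar2(20), c2 varchar2(20) );

insert into t_nul values ('SUCCESS', 'SUCCESS' || chr(0));
commit;

select count(*) from t_nul where c1 = c2;                  -- 0: no match
select dump(c1), dump(c2) from t_nul;                      -- reveals the extra trailing byte 0
select count(*) from t_nul where c1 = rtrim(c2, chr(0));   -- 1: match once the NUL is stripped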

Compounding the issue is that a casual glance at the data (as per the above screenshots from SQL Developer) typically suggests all is well – the NUL character is not visible. Things can get even worse – I’ve seen some GUI database tools interpret the NUL character as a true NULL, and hence when a column contains only a single NUL character, those tools will report the column as being empty (i.e. NULL), which just makes for even more confusion. Thankfully SQL Developer does not do that, and the null indicator column can be used to see the difference between the two:

[Image: SQL Developer null indicator column distinguishing the NUL character from a true NULL]

So I sent a terse email back to the developer reminding them about NUL versus NULL and that if they are using the pre-compiler correctly, then everything will take care of itself. But I must admit, I had to chuckle when they replied with a link to an AskTOM question I answered recently describing the following:

We made the same mistake ourselves! Albeit in a very specific circumstance. Look what happens when you force a STOP command onto a running scheduler job:



SQL> create table t ( x timestamp, y int );

Table created.

SQL> create or replace
  2  procedure myproc is
  3  begin
  4    for i in 1 .. 20
  5    loop
  6      insert into t values (systimestamp,i);commit;
  7      dbms_lock.sleep(2);
  8    end loop;
  9  end;
 10  /

Procedure created.

SQL>
SQL> begin
  2    dbms_scheduler.create_job (
  3      job_name        => 'myjob',
  4      job_type        => 'plsql_block',
  5      job_action      => 'begin myproc; end;',
  6      start_date      => systimestamp,
  7      repeat_interval => 'freq=minutely; bysecond=0;',
  8      enabled         => true);
  9  end;
 10  /

PL/SQL procedure successfully completed.
--
-- waited for 1 successful execution, and then stop the second one in flight
--
SQL> exec dbms_scheduler.stop_job('MYJOB',force=>true)

PL/SQL procedure successfully completed.

SQL> select status, session_id, dump(session_id)
  2  from USER_SCHEDULER_JOB_RUN_DETAILS;

STATUS
------------------------------
SESSION_ID
-------------------------------------------------------------
DUMP(SESSION_ID)
-------------------------------------------------------------
SUCCEEDED
984,53037
Typ=1 Len=9: 57,56,52,44,53,51,48,51,55

STOPPED
983,28542
Typ=1 Len=10: 57,56,51,44,50,56,53,52,50,0  <<=== whoops!

I’ve logged this as a bug, but in the interim, if you need the SESSION_ID from the scheduler views, you might want to wrap the column in RTRIM(SESSION_ID, chr(0)).
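
In other words, something along these lines:

select status,
       rtrim(session_id, chr(0)) as session_id
from   user_scheduler_job_run_details;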

Hoist by my own NULL petard.

Hyper-partitioned index avoidance thingamajig

As you can tell, I have no idea what to name what I am about to describe. So let me start from the beginning, and set the scene for an idea I have to utilize a cool new 18c feature.

Often in a transactional-style system the busiest table (let us call it SALES for the sake of this discussion) is also

  • the biggest table (after all, it has all of our sales in it), and
  • the most in-demand table, in that almost every query in our application wants to access it in some way, shape or form.

This is in effect the database version of the Pareto Principle. Everyone wants a slice of that SALES “pie”, and the piece of that pie that is in most demand is typically the most recent data. Your application may have pages that will be showing:

  • sales in the past hour,
  • sales in the past day,
  • products sold in the past hour,
  • largest purchase amount for sales in the past few hours,
  • suspicious activity for today,
  • a customer’s own sales for the last few days.

The list goes on and on, the common theme being that the data being requested is bound by a range of time in the recent past. Our SALES table will already have a primary key, probably some sort of unique transaction ID for each purchase, but to satisfy the style of requests in the list above we will probably need (a DDL sketch follows the list):

  • an index on the time(stamp) of the sale,
  • an index on the product ID that was sold,
  • an index on the customer ID who made the purchase,
  • potentially even an index on the sale amount
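
In conventional terms that would mean something like the following DDL (the index names are illustrative only):

create index sales_ts_ix       on sales ( ts );
create index sales_product_ix  on sales ( product );
create index sales_customer_ix on sales ( customer );
create index sales_amt_ix      on sales ( amt );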

You can see the troubled waters into which we are sailing here. It is our biggest and busiest table, and here we are, adding index after index after index to improve query performance, whilst at the same time:

  1. adding overhead/contention to DML operations on the SALES table,
  2. increasing the size of the database,
  3. increasing the duration of the backups,
  4. increasing time for maintenance and copies to Development and Test

None of this is looking great but we might be thinking “What choice do we have?”

Here is perhaps an alternate strategy: Let’s not create any indexes.  The challenges (1) through (4) above evaporate to non-existence. But of course, now we have a remaining challenge in getting those queries to run efficiently.

I’ll tackle that in a different way: I will partition the SALES table very “aggressively”, hence my blog post title “hyper-partitioning”. Considering the typical query requirements I listed above, I will partition SALES down to something as small as one partition for every hour.


SQL> create table sales
  2    ( ts   timestamp,
  3      id   int,
  4      amt  number,
  5      product int,
  6      customer int,
  7      item_cnt int,
  8      terminal int,
  9      operator int,
 10      credit_card int,
         ...
         ...
 17    )
 18  partition by range ( ts )
 19  interval ( numtodsinterval(1,'HOUR') )
 20  (
 21    partition p1 values less than ( timestamp '2018-07-01 00:00:00' )
 22  );

Table created.

SQL>
SQL> insert /*+ APPEND */ into sales (ts,id,amt,product,customer)
  2  select date '2018-07-01' + rownum / 240, rownum, dbms_random.value(1,100),
  3        dbms_random.value(1,100),dbms_random.value(1,100)
  4  from dual
  5  connect by level 
SQL> set serverout on
SQL> declare
  2    h varchar2(1000);
  3  begin
  4  for i in (
  5    select partition_name, high_value
  6    from user_tab_partitions
  7    where table_name = 'SALES'
  8    and   interval = 'YES'
  9    and   partition_name like 'SYS_P%'
 10    order by partition_position
 11  ) loop
 12    h := i.high_value;
 13    execute immediate 'select to_char('||h||'-0.01,''yyyymmdd_hh24'') from dual' into h;
 14    execute immediate 'alter table sales rename partition '||i.partition_name||' to p'||h;
 15  end loop;
 16  end;
 17  /

PL/SQL procedure successfully completed.

SQL>
SQL> col high_value format a60
SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'SALES'
  4  order by partition_position;

PARTITION_NAME                 HIGH_VALUE
------------------------------ ------------------------------------------------------------
P1                             TIMESTAMP' 2018-07-01 00:00:00'
P20180701_00                   TIMESTAMP' 2018-07-01 01:00:00'
P20180701_01                   TIMESTAMP' 2018-07-01 02:00:00'
P20180701_02                   TIMESTAMP' 2018-07-01 03:00:00'
P20180701_03                   TIMESTAMP' 2018-07-01 04:00:00'
P20180701_04                   TIMESTAMP' 2018-07-01 05:00:00'
P20180701_05                   TIMESTAMP' 2018-07-01 06:00:00'
P20180701_06                   TIMESTAMP' 2018-07-01 07:00:00'
P20180701_07                   TIMESTAMP' 2018-07-01 08:00:00'
P20180701_08                   TIMESTAMP' 2018-07-01 09:00:00'
P20180701_09                   TIMESTAMP' 2018-07-01 10:00:00'
P20180701_10                   TIMESTAMP' 2018-07-01 11:00:00'
P20180701_11                   TIMESTAMP' 2018-07-01 12:00:00'
P20180701_12                   TIMESTAMP' 2018-07-01 13:00:00'
P20180701_13                   TIMESTAMP' 2018-07-01 14:00:00'
P20180701_14                   TIMESTAMP' 2018-07-01 15:00:00'
P20180701_15                   TIMESTAMP' 2018-07-01 16:00:00'
P20180701_16                   TIMESTAMP' 2018-07-01 17:00:00'
P20180701_17                   TIMESTAMP' 2018-07-01 18:00:00'
P20180701_18                   TIMESTAMP' 2018-07-01 19:00:00'
P20180701_19                   TIMESTAMP' 2018-07-01 20:00:00'
P20180701_20                   TIMESTAMP' 2018-07-01 21:00:00'
P20180701_21                   TIMESTAMP' 2018-07-01 22:00:00'
P20180701_22                   TIMESTAMP' 2018-07-01 23:00:00'
P20180701_23                   TIMESTAMP' 2018-07-02 00:00:00'
P20180702_00                   TIMESTAMP' 2018-07-02 01:00:00'
P20180702_01                   TIMESTAMP' 2018-07-02 02:00:00'
P20180702_02                   TIMESTAMP' 2018-07-02 03:00:00'
P20180702_03                   TIMESTAMP' 2018-07-02 04:00:00'
P20180702_04                   TIMESTAMP' 2018-07-02 05:00:00'
P20180702_05                   TIMESTAMP' 2018-07-02 06:00:00'
P20180702_06                   TIMESTAMP' 2018-07-02 07:00:00'
P20180702_07                   TIMESTAMP' 2018-07-02 08:00:00'
P20180702_08                   TIMESTAMP' 2018-07-02 09:00:00'
P20180702_09                   TIMESTAMP' 2018-07-02 10:00:00'
P20180702_10                   TIMESTAMP' 2018-07-02 11:00:00'
P20180702_11                   TIMESTAMP' 2018-07-02 12:00:00'
P20180702_12                   TIMESTAMP' 2018-07-02 13:00:00'
P20180702_13                   TIMESTAMP' 2018-07-02 14:00:00'
P20180702_14                   TIMESTAMP' 2018-07-02 15:00:00'
P20180702_15                   TIMESTAMP' 2018-07-02 16:00:00'

41 rows selected.

I’ve run a small anonymous block to rename the (system-named) interval partitions to sensible names that reflect the date range each partition covers. Let’s now look at the typical queries we will be performing on the SALES table:


SQL> set autotrace traceonly explain
SQL> select * from sales
  2  where ts > timestamp '2018-07-02 15:00:00';

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     9 |   387 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     9 |   387 |     7   (0)| 00:00:01 |    41 |1048575|
|*  2 |   TABLE ACCESS FULL      | SALES |     9 |   387 |     7   (0)| 00:00:01 |    41 |1048575|
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL> select * from sales
  2  where ts > timestamp '2018-07-02 15:00:00'
  3  and product = 12;

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    43 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
|*  2 |   TABLE ACCESS FULL      | SALES |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PRODUCT"=12 AND "TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL>
SQL> select * from sales
  2  where ts > timestamp '2018-07-02 15:00:00'
  3  and customer = 25;

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    43 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
|*  2 |   TABLE ACCESS FULL      | SALES |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("CUSTOMER"=25 AND "TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL>
SQL> select max(amt) from sales
  2  where ts > timestamp '2018-07-02 15:00:00';

---------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |       |     1 |    33 |     7   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE           |       |     1 |    33 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|       |     9 |   297 |     7   (0)| 00:00:01 |    41 |1048575|
|*  3 |    TABLE ACCESS FULL      | SALES |     9 |   297 |     7   (0)| 00:00:01 |    41 |1048575|
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL> set autotrace off

All of them scan only a tiny portion of the data, namely just the hours of sales data relevant to the query, and the query response times will be relatively consistent in all cases, no matter which customer, product or other predicate is passed, because the data to be scanned is always a fixed number of hours.

But there’s a problem here. If I am partitioning to the hour, or even to the minute, then it won’t be long before I have a lot of partitions. In the latter case (minutes) I will be up to over 500,000 partitions in the first year of SALES alone! That is a lot of database metadata to store: there are the partitions themselves, plus optimizer statistics on them, plus historical optimizer statistics, plus potentially histograms on every column.  A lot of optimizer data might lead to expensive parse times, because there is just so much information to wade through when optimizing queries.
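
To put a number on it: one partition per minute is 60 * 24 * 365 = 525,600 partitions in a year, and the accumulating metadata is easy to keep an eye on with a query such as this sketch:

-- how many partitions (and hence how much dictionary metadata) so far?
select table_name, count(*) as partition_count
from   user_tab_partitions
where  table_name = 'SALES'
group  by table_name;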

But we only need the extreme granularity of partitions for the SALES table for today. Once today ticks over and becomes “yesterday”, then we might only need a partition for the entire day.  And once “yesterday” ticks over to “last week”, then maybe only weekly partitions are needed and so forth.

One of the cool things in 18c is the ability to do this style of maintenance with negligible disruption to service. Because I have named my partitions in a logical fashion, here is a simple routine to merge yesterday’s hourly partitions into a single partition for the day.


SQL> set serverout on
SQL> declare
  2    d date := date '2018-07-01';
  3    ddl varchar2(4000);
  4  begin
  5    select listagg(partition_name||chr(10),',') within group ( order by partition_position )
  6    into   ddl
  7    from   user_tab_partitions
  8    where  table_name = 'SALES'
  9    and    partition_name like 'P'||to_char(d,'yyyymmdd')||'%';
 10
 11    ddl := 'alter table sales merge partitions '||ddl||' into partition p'||to_char(d,'yyyymmdd')||' online';
 12
 13    dbms_output.put_line(ddl);
 14    execute immediate ddl;
 15  end;
 16  /
alter table sales merge partitions
 P20180701_00
,P20180701_01
,P20180701_02
,P20180701_03
,P20180701_04
,P20180701_05
,P20180701_06
,P20180701_07
,P20180701_08
,P20180701_09
,P20180701_10
,P20180701_11
,P20180701_12
,P20180701_13
,P20180701_14
,P20180701_15
,P20180701_16
,P20180701_17
,P20180701_18
,P20180701_19
,P20180701_20
,P20180701_21
,P20180701_22
,P20180701_23
 into partition p20180701 online

PL/SQL procedure successfully completed.

Now I have a single partition for yesterday’s data, and hourly partitions for today’s data.


SQL> col high_value format a60
SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'SALES'
  4  order by partition_position;

PARTITION_NAME                 HIGH_VALUE
------------------------------ -------------------------------
P1                             TIMESTAMP' 2018-07-01 00:00:00'
P20180701                      TIMESTAMP' 2018-07-02 00:00:00'
P20180702_00                   TIMESTAMP' 2018-07-02 01:00:00'
P20180702_01                   TIMESTAMP' 2018-07-02 02:00:00'
P20180702_02                   TIMESTAMP' 2018-07-02 03:00:00'
P20180702_03                   TIMESTAMP' 2018-07-02 04:00:00'
P20180702_04                   TIMESTAMP' 2018-07-02 05:00:00'
P20180702_05                   TIMESTAMP' 2018-07-02 06:00:00'
P20180702_06                   TIMESTAMP' 2018-07-02 07:00:00'
P20180702_07                   TIMESTAMP' 2018-07-02 08:00:00'
P20180702_08                   TIMESTAMP' 2018-07-02 09:00:00'
P20180702_09                   TIMESTAMP' 2018-07-02 10:00:00'
P20180702_10                   TIMESTAMP' 2018-07-02 11:00:00'
P20180702_11                   TIMESTAMP' 2018-07-02 12:00:00'
P20180702_12                   TIMESTAMP' 2018-07-02 13:00:00'
P20180702_13                   TIMESTAMP' 2018-07-02 14:00:00'
P20180702_14                   TIMESTAMP' 2018-07-02 15:00:00'
P20180702_15                   TIMESTAMP' 2018-07-02 16:00:00'

18 rows selected.

I stress – this is not my recommendation to race out and partition every transactional table you have and drop all of the indexes. But with so many online partitioning operations in 18c, there are some exciting new opportunities that were not available in previous releases. So start thinking about how you can exploit this to get advantages with the partitioning option that might sit “outside the box” of standard usage.

18.3 As easy as 1…2…3

Well, finally it’s here! 18c is available for on-premises installation, so the world can get stuck into the cool new features of the latest release on their own laptops.  At least that is what I’ll be doing!

Naturally as soon as I heard the news, I downloaded the software and got ready to set aside the day for installation and creation of an 18c database. But I didn’t need that long – I didn’t need that long at all. Just a few clicks and a few commands and there it was – my 18c database up and running.

Check out how easy it is with my three videos.

Software Installation

Listener Creation

Database Creation

It really is as easy as 1…2…3

Enjoy 18c !

Little things worth knowing: Creating a RAC One Node database on the command line

This post is going to be super short, and mostly just a note to myself as I constantly forget how to create a RAC One database on the command line. This post is for 12.2.0.1 but should be similar on 12.1 (although I didn’t test!).

Provided you are licensed appropriately, this is probably the most basic way to create an admin-managed RAC One Node database on Linux for use in a lab environment:

dbca -silent -createDatabase -gdbName RON -templateName gold_image01.dbc \
 -createAsContainerDatabase false -databaseConfigType RACONENODE \
 -RACOneNodeServiceName RON_SVC -totalMemory 1024 \
 -nodelist rac122node1,rac122node2 -storageType ASM \
 -datafileDestination '+DATA' -recoveryAreaDestination '+RECO' \
 -recoveryAreaSize 10240

This works for me, but most likely not for you :) And it’s certainly not suitable for a production deployment. Make sure to adapt the command as needed; I tend to create gold images for use with dbca, and this is one example.

The command itself should be fairly self-explanatory. If you are unsure about the meaning of the various options, have a look at the output of “dbca -help -createDatabase” and the official documentation/My Oracle Support. I learned the hard way that forgetting “-nodelist” results in the creation of a single-instance database instead of an error message.

I didn’t find too many examples on the net, so I hope someone finds this useful.

Release 18.0.0.0.0 Version 18.3.0.0.0 On-Premises binaries

By Franck Pachot

Good news: the latest patchset for Oracle 12cR2 (which is no longer called a patchset, but is actually named release 18c and numbered 18.0.0.0.0) is available for download on OTN. It is great because an OTN download does not require access to Support and Software Updates. It is available to anybody under the Free Developer License Terms (basically development, testing, prototyping and demonstrating for an application that is not in production, and for non-commercial use). We all complained about the ‘Cloud First’ strategy because we were eager to download the latest version. But the positive aspect of it is that we now have on OTN a release that has been stabilized after a few release updates. In the past, only the first version of the latest release was available there. Now we have one with many bugs fixed.

Of course, I didn’t wait, and I tested 18c as soon as it was available on the Oracle Cloud, thanks to the ACE Director program that provided me with some Cloud Credits. In this post, I’ll update my Cloud database to run with the on-premises binaries. Because that’s the big strength of Oracle: we can run the same software, 100% compatible, in the Cloud and on our own servers. There are some limitations in the features available, but technically it is the same software.

Oracle Cloud First

Here is my Cloud version of Oracle 18c, installed in February (18.1) and updated in April (18.2) and July (18.3):

SQLcl: Release 18.2 Production on Tue Jul 24 11:02:56 2018
 
Copyright (c) 1982, 2018, Oracle. All rights reserved.
 
Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0
 
SQL> host $ORACLE_HOME/OPatch/opatch lspatches
28090523;Database Release Update : 18.3.0.0.180717 (28090523)
OPatch succeeded.
 
SQL> select banner from v$version;
 
BANNER
--------------------------------------------------------------------------------
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
 
SQL> select banner_full from v$version;
 
BANNER_FULL
----------------------------------------------------------------------------------------------------
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0
 
SQL> select banner_legacy from v$version;
 
BANNER_LEGACY
--------------------------------------------------------------------------------
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
 
SQL> exec dbms_qopatch.get_sqlpatch_status;
 
Patch Id : 27676517
Action : APPLY
Action Time : 18-APR-2018 20:44:50
Description : Database Release Update : 18.2.0.0.180417 (27676517)
Logfile : /u01/app/oracle/cfgtoollogs/sqlpatch/27676517/22097537/27676517_apply_CDB1_CDBROOT_2018Apr18_20_43_27.log
Status : SUCCESS
 
Patch Id : 28090523
Action : APPLY
Action Time : 18-JUL-2018 11:38:20
Description : Database Release Update : 18.3.0.0.180717 (28090523)
Logfile : /u01/app/oracle/cfgtoollogs/sqlpatch/28090523/22329768/28090523_apply_CDB1_CDBROOT_2018Jul18_11_36_38.log
Status : SUCCESS
 
PL/SQL procedure successfully completed.

LINUX.X64_180000_db_home.zip

I have installed the on-premises 18c available on OTN. The good things with the new releases are:

  • No need to extract installer files. Just unzip the Oracle Home and link the executable
  • This Oracle Home image already includes the latest Release Updates


SQLcl: Release 18.2 Production on Tue Jul 24 11:02:56 2018
 
Copyright (c) 1982, 2018, Oracle. All rights reserved.
 
Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0
 
SQL> host $ORACLE_HOME/OPatch/opatch lspatches
27908644;UPDATE 18.3 DATABASE CLIENT JDK IN ORACLE HOME TO JDK8U171
27923415;OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)
28090553;OCW RELEASE UPDATE 18.3.0.0.0 (28090553)
28090523;Database Release Update : 18.3.0.0.180717 (28090523)
OPatch succeeded.
 

We have 4 updates from July here for the following components:

  • The Database (28090523)
  • The Java in the Oracle Home, aka JDK (27908644)
  • The Java in the database, aka OJVM (27923415)
  • The clusterware component for the database to match the CRS, aka OCW (28090553)

So, now we have an image of the Oracle Home which already contains all the latest updates… except one:

$ cat $ORACLE_HOME/sqldeveloper/sqldeveloper/bin/version.properties
 
COMPANY=Oracle
PRODUCT=SQL Developer
VERSION=17.30003410937f
VER=17.3.2
VER_FULL=17.3.2.341.0937
BUILD_LABEL=341.0937
BUILD_NUM=341.0937
EDITION=

Unfortunately, that’s an old version of SQL Developer, and there is no SQLcl. So just download this additional one and unzip it into the Oracle Home.

DataPatch

So, what happens when I open the database that I created on 18.1 and patched with the 18.2 and 18.3 RUs on the Oracle Cloud? There are two updates for the database (DBRU and OJVM). The DBRU is already there, so DataPatch only has to apply the OJVM:

[oracle@VM183x dbhome_1]$ $ORACLE_HOME/OPatch/datapatch
SQL Patching tool version 18.0.0.0.0 Production on Tue Jul 24 10:57:55 2018
Copyright (c) 2012, 2018, Oracle. All rights reserved.
 
Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_11104_2018_07_24_10_57_55/sqlpatch_invocation.log
 
Connecting to database...OK
Gathering database info...done
 
Note: Datapatch will only apply or rollback SQL fixes for PDBs
that are in an open state, no patches will be applied to closed PDBs.
Please refer to Note: Datapatch: Database 12c Post Patch SQL Automation
(Doc ID 1585822.1)
 
Bootstrapping registry and package to current versions...done
Determining current state...done
 
Current state of interim SQL patches:
Interim patch 27923415 (OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)):
Binary registry: Installed
PDB CDB$ROOT: Not installed
PDB PDB$SEED: Not installed
PDB PDB1: Not installed
 
Current state of release update SQL patches:
Binary registry:
18.3.0.0.0 Release_Update 1806280943: Installed
PDB CDB$ROOT:
Applied 18.3.0.0.0 Release_Update 1806280943 successfully on 18-JUL-18 11.38.20.321353 AM
PDB PDB$SEED:
Applied 18.3.0.0.0 Release_Update 1806280943 successfully on 18-JUL-18 11.38.21.802495 AM
PDB PDB1:
Applied 18.3.0.0.0 Release_Update 1806280943 successfully on 18-JUL-18 11.38.23.230513 AM
 
Adding patches to installation queue and performing prereq checks...done
Installation queue:
For the following PDBs: CDB$ROOT PDB$SEED PDB1
No interim patches need to be rolled back
No release update patches need to be installed
The following interim patches will be applied:
27923415 (OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415))
 
Installing patches...
Patch installation complete. Total patches installed: 3
 
Validating logfiles...done
Patch 27923415 apply (pdb CDB$ROOT): SUCCESS
logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/27923415/22239273/27923415_apply_CDB1_CDBROOT_2018Jul24_10_58_08.log (no errors)
Patch 27923415 apply (pdb PDB$SEED): SUCCESS
logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/27923415/22239273/27923415_apply_CDB1_PDBSEED_2018Jul24_10_58_56.log (no errors)
Patch 27923415 apply (pdb PDB1): SUCCESS
logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/27923415/22239273/27923415_apply_CDB1_PDB1_2018Jul24_10_58_56.log (no errors)
SQL Patching tool complete on Tue Jul 24 10:59:21 2018

Now here is the history of patches:

SQL> exec dbms_qopatch.get_sqlpatch_status;
 
Patch Id : 27676517
Action : APPLY
Action Time : 18-APR-2018 20:44:50
Description : Database Release Update : 18.2.0.0.180417 (27676517)
Logfile : /u01/app/oracle/cfgtoollogs/sqlpatch/27676517/22097537/27676517_apply_CDB1_CDBROOT_2018Apr18_20_43_27.log
Status : SUCCESS
 
Patch Id : 28090523
Action : APPLY
Action Time : 18-JUL-2018 11:38:20
Description : Database Release Update : 18.3.0.0.180717 (28090523)
Logfile : /u01/app/oracle/cfgtoollogs/sqlpatch/28090523/22329768/28090523_apply_CDB1_CDBROOT_2018Jul18_11_36_38.log
Status : SUCCESS
 
Patch Id : 27923415
Action : APPLY
Action Time : 24-JUL-2018 10:59:19
Description : OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)
Logfile : /u01/app/oracle/cfgtoollogs/sqlpatch/27923415/22239273/27923415_apply_CDB1_CDBROOT_2018Jul24_10_58_08.log
Status : SUCCESS
 
PL/SQL procedure successfully completed.
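
If you prefer a plain query over the PL/SQL call, an equivalent check is to read the patch registry directly; a minimal sketch:

select patch_id, description, action, status, action_time
from   dba_registry_sqlpatch
order  by action_time;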

This is all good. Despite the different release schedules, the level of software is exactly the same. And on-premises we can now start on a release with low regression risk (18c is like a patchset) but with many fixes (several release updates). For the moment, only the Linux port is there. The other platforms should come this summer.

 

This article, Release 18.0.0.0.0 Version 18.3.0.0.0 On-Premises binaries, first appeared on Blog dbi services.

That demned elusive archive log!

Introduction

With apologies to Emma Orczy again for stealing a line from “The Scarlet Pimpernel” …

Power BI 101- Logging and Tracing, Part II

So we went over locations and the basics of logging and tracing in Power BI.  I now want to know how to make more sense of the data.  In Oracle, we use a utility called TKProf, (along with others and a number of third-party tools) to make sense of what comes from the logs.  SQL Server has Log Analytics and the Profiler, but what can I do with Power BI?

First, let’s discuss what happens when we have actual activity.  In my first post, the system was pretty static.  This time I chose to open a file with larger data refreshes from multiple sources, added tables, calculated columns and measures.  One Access DB, with over 10 million rows, is refreshed when I first open the PBIX file:

[Image: logging.png]

Post loading, there’s a significant increase in MS Mashup Container (calculations and measures) and msmdsrv (data loading) logging:

[Image: logging1.png – the Performance folder after the load, showing the additional MS Mashup Container and msmdsrv trace files]

Do I really want to go through all this data by hand?  Power BI is a reporting tool, so what if I bring the trace files into Power BI?  Let’s start with the first MS Mashup Container log.

In Power BI, click on Get Data -> Text, change the file type to “All Files” in the explorer, and go to the directory that contains the trace files:

C:\Users\\AppData\Local\Microsoft\Power BI Desktop\Traces\Performance

Remember that you will need to have “hidden items” displayed in order to browse down to this folder.  Choose the files you wish to load from the directory, and in Power BI choose a custom delimiter of a quote (“) to split the file.  This will load a table with a few columns you’ll need to remove, as they contain data like colons, nulls and other syntax from the file.  Once you’ve done this, you will most likely have a table with 15 columns of valuable data:

[Image: columns_logging.png – the resulting column list after the import]

I’ve renamed the columns to something more descriptive and I can now apply these changes and pull some value from the data.

Using the provided data, I can then produce a report that tells me which types of processes are the largest consumers of resources and time, and I can provide reports that give a visual of what’s going on in a Power BI environment.  The report is pretty straightforward: wait events against percentage of waits, memory allocation over time, time waited and wait count.  These reports may seem foreign to most data scientists, but for a DBA they should resonate and provide ways to offer assistance to the Power BI group in optimization.

[Image: logging_bi.png – Power BI report built on the trace data]

I can add a hierarchy to this to drill down into interesting areas of waits, and going forward I can add more files, identifying each table by the file’s unique identifier and the date it came from.  I expect my reports and my direction to look different from the direction many have taken with Power BI performance, but I wanted to demonstrate that optimization is always about time.  I admit fully that I’m still learning, but I am also approaching this from a database optimization perspective.  Please let me know your thoughts!

Happy hunting, folks!

 

 

 



Copyright © DBAKevlar [Power BI 101- Logging and Tracing, Part II], All Right Reserved. 2018.