Friday, 3 February 2017

This week 2/2017

One hopeless project I have taken part in has taught me a few tricks in distributed database architecture, where each database has its own specific role in my organisation.

Remote queries
My solution involves three Oracle 11g+ databases:
1. DB A - stores all changes applied to the source system database,
2. DB D - stores reports - the data warehouse,
3. DB B - the target database where I collect specific data from the other databases and generate reports.
To get data from A and D I use database links (dblinks).
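For reference, the remote databases are reached through a database link created more or less like this (the user, password and TNS alias below are placeholders only):

CREATE DATABASE LINK REMOTE_DB
  CONNECT TO report_user IDENTIFIED BY some_password
  USING 'REMOTE_TNS_ALIAS';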
At the beginning I created plain PL/SQL queries with the business logic. When I executed them for the first time, I exceeded the database temp area (128 GB). It was impossible to get the data without a few tricks.
First let me describe what happened. So... if I have a join like this:


SELECT A.ID,
       A.NAME,
       B.NAMEB
  FROM TAB_A@REMOTE_DB A
  LEFT JOIN TAB_B@REMOTE_DB B
    ON (A.ID = B.ID)
 WHERE B.ID IN (SELECT C.FK_ID
                  FROM TAB_C@REMOTE_DB C
                 WHERE C.NAME = 'Tom')

then database B executes three queries, transfers the data from the remote database to the local area and joins the rows locally.
If the tables are huge but the select returns far fewer rows, the query should be executed remotely and only the results should be transferred to the client.
To force the database to do this, there is a hint:


/*+ DRIVING_SITE(remote_table_alias) */
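Applied to the join above it could look like this; the hint names the alias of one of the remote tables, so the query is evaluated on that remote site and only the result set travels back:

SELECT /*+ DRIVING_SITE(A) */
       A.ID,
       A.NAME,
       B.NAMEB
  FROM TAB_A@REMOTE_DB A
  LEFT JOIN TAB_B@REMOTE_DB B
    ON (A.ID = B.ID)
 WHERE B.ID IN (SELECT C.FK_ID
                  FROM TAB_C@REMOTE_DB C
                 WHERE C.NAME = 'Tom')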

Unfortunately, my experience shows that it doesn't work with materialized views, CREATE TABLE AS SELECT, and so on. Anyway, I found a workaround for this problem using a procedure with a declared cursor. This way I could insert data coming from a query executed fully on the remote site.
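A minimal sketch of that workaround, assuming a local staging table LOCAL_RESULTS with columns matching the cursor (the procedure and table names are made up): the cursor query carries the DRIVING_SITE hint, so it runs on the remote site, and the fetched rows are bulk-inserted into the local table.

CREATE OR REPLACE PROCEDURE LOAD_REMOTE_DATA IS  -- hypothetical name
  CURSOR c_remote IS
    SELECT /*+ DRIVING_SITE(A) */
           A.ID, A.NAME, B.NAMEB
      FROM TAB_A@REMOTE_DB A
      LEFT JOIN TAB_B@REMOTE_DB B
        ON (A.ID = B.ID)
     WHERE B.ID IN (SELECT C.FK_ID
                      FROM TAB_C@REMOTE_DB C
                     WHERE C.NAME = 'Tom');
  TYPE t_rows IS TABLE OF c_remote%ROWTYPE;
  l_rows t_rows;
BEGIN
  OPEN c_remote;
  LOOP
    -- fetch the remotely evaluated result in batches
    FETCH c_remote BULK COLLECT INTO l_rows LIMIT 1000;
    EXIT WHEN l_rows.COUNT = 0;
    FORALL i IN 1 .. l_rows.COUNT
      INSERT INTO LOCAL_RESULTS VALUES l_rows(i); -- hypothetical local staging table
  END LOOP;
  CLOSE c_remote;
  COMMIT;
END;
/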
In my case this solution was useful only during testing. In the end I had to move the logic to the server side and create some views there.

Parallel execution
The second useful hint forces the database to execute a query in parallel mode.

/*+ PARALLEL(4) */

Like most hints, this one isn't propagated through a dblink, but it is useful locally.
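For example, a statement-level parallel hint on a purely local query could look like this (LOCAL_RESULTS is the hypothetical staging table from the sketch above):

SELECT /*+ PARALLEL(4) */
       COUNT(*)
  FROM LOCAL_RESULTS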


Materialized view

I had a chance to test the performance of creating and refreshing materialized views for the remote queries described above. The results of those tests are below:

Description of case | Estimated time [s]
create the materialized view | 100
refresh the materialized view with DBMS_MVIEW.REFRESH(MVIEW_NAME) | 3000
truncate the container table and refresh with DBMS_MVIEW.REFRESH(MVIEW_NAME, ATOMIC_REFRESH => FALSE) | 104
drop and create the materialized view | 110
refresh with DBMS_MVIEW.REFRESH(MVIEW_NAME, ATOMIC_REFRESH => FALSE) | 98
refresh with DBMS_MVIEW.REFRESH('MVIEW_NAME', PARALLELISM => 4, ATOMIC_REFRESH => FALSE) | 94

By default a materialized view refresh has ATOMIC_REFRESH set to TRUE and all operations are done in a single transaction, one by one. In my case this is not required.
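A minimal sketch of a non-atomic complete refresh for a single view (MV_REPORT is a made-up name):

BEGIN
  -- complete refresh; with ATOMIC_REFRESH => FALSE the container table
  -- is truncated and reloaded instead of being refreshed in one transaction
  DBMS_MVIEW.REFRESH(LIST           => 'MV_REPORT',
                     METHOD         => 'C',
                     PARALLELISM    => 4,
                     ATOMIC_REFRESH => FALSE);
END;
/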

When refreshing more than one materialized view it is possible to do it in parallel by passing a list of materialized views, e.g.:

DBMS_MVIEW.REFRESH(LIST => 'MV_A,MV_B,MV_C', PARALLELISM => 4, ATOMIC_REFRESH => FALSE);

If there are dependencies between the materialized views, it is possible to turn on a dedicated dependency analyser for the refresh, but I didn't use it.


Grouping into partitions
By the way, Oracle SQL allows grouping rows into subgroups with analytic (window) functions. For example, if there is a table with non-unique customer numbers and you need the last inserted name for a customer, you can get it with a query like this:


SELECT GC.*
  FROM (SELECT P.ID,
               P.NAME,
               P.CUSTOMER_NO,
               ROW_NUMBER() OVER (PARTITION BY P.CUSTOMER_NO
                                      ORDER BY P.ID DESC) AS RN
          FROM MY_CUSTOMERS P) GC
 WHERE GC.CUSTOMER_NO = 12345
   AND GC.RN = 1