Introduction
In the initial example, we loaded the global financial market data files (MIC). One of the fields in the data file was the ISO country code in the 2-characters format. The international encoding of nations also provides a 3-characters format and a numeric format.It 'important for a Data Warehouse, have all the standard codes, above all for information of a general nature as well as the codes of nations. This allows the Data Warehouse to receive and send data files to the outside world using a common coding language.
We can find on the internet the data file that contains these encodings (I used the http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html site but you can find others. I used this example because with a simple copy and paste, I created a txt format data file with fields at fixed lenght.
This data file, after being loaded into the Staging Area, will be a good candidate to become the dimension table of nations. This dimension must have a unique key field on which to build the dimensional key. In the case where, like this, there are more ways to identify the nation uniquely, we need to decide on a single one and use it as a standard for the whole Data Warehouse. We decide to define as a unique key the ISO code the 3-characters format.
In the sample of financial markets (MIC), there is the 2-characters format code. Suppose that, since the MIC will become a dimension of analysis, we want to include the ISO 3-characters format code and its related description.
The objective is to show how to use MEF for
• Load the data file of the nations in format with fixed-length fields.
• Set up a trasfomazion rule on a field
• Configure the fields that are not in the MIC data file.
• Use the post processing
Do all this, it is very simple. It takes a little work.
1 - It starts as usual, by configuring the COUNTRY data file and placing the sample file in the \dat.
2 - Run the load.
3 - Change the configuration file of the MIC structure, adding dummy fields. These fields will not be present in the file date, but will be created in the Staging Area table.
4 - We use the post processing package, which always runs even if it has nothing to do, to write the plsql code that will fill the added fields.
The data file and the related configuration files, you can download from https://drive.google.com/open?id=0B2dQ0EtjqAOTQzZSaUlyUmxpT1k. Under the folder of Use Case. Follow the read_me instructions.
We see in detail these steps.
The COUNTRY configuration
The configuration of the data file is performed by opening the io_country.txt file located in the \cft, and setting the following values.IO_COD:COUNTRY
IO_DEB:ISO 3166 Codes (Countries)
TYPE_COD:FIN
SEC_COD:ISO
FRQ_COD:M
FILE_LIKE_TXT:countries%.dat
FILE_EXT_TXT:countries_22-02-2016.dat
HOST_NC:.,
HEAD_CNT:3
FOO_CNT:0
SEP_TXT:
START_NUM:11
SIZE_NUM:10
MASK_TXT:DD-MM-YYYY
TRUNC_COD:1
Since the data reference date is inside the file name (countries_22-02-2016.dat), we need to set the size properly MASK_TXT: DD-MM-YYYY. The date starts at the eleventh character (START_NUM:11) and is long 10 (SIZE_NUM: 10).
I plugged in the data file 3 header lines, so there is HEAD_CNT:3
Since it is a data file with fixed length fields, leave blank the information of the SEP_TXT separator.
The configuration of the structure of the data file, was performed by opening the file country.csv under the folder \cft
We see in the figure only the columns that need to be setted.
The first 4 lines setup the fields according as they are present in the data file. They have a fixed length which is affected by the amplitude of the column headers (HOST_LENGTH_NUM), but in the Staging Area table, I want to have the real length, ie the one that is obtained with the sum of
HOST_LENGTH_NUM with STA_OFF_NUM.
We set the control of the congruence for all fields that do not undergo the rules (CHK_FLG) and we set the fields that make up the unique index (UI_FLG). After loading the Staging Area table, you will create the unique index. This is to prevent, immediately, that may be present duplicate keys.
At line 5, we insert the field in more that it is not found in the data file. (The COLUMN_COD FILE is null). Also, because we want the nation numerical code is just a number (not a number with leading zeros) we set a rule change. This rule will be used in the creation of the view that the processing module will use for loading the staging table.
Now, open a shell and :
Microsoft Windows [Version 6.0.6002]
Copyright (c) 2006 Microsoft Corporation. All rights reserved.
E:\>cd projects\dwh\mef
E:\projects\dwh\mef>sqlplus etl/etl
SQL*Plus: Release 12.1.0.1.0 Production on Wed Feb 24 10:41:25 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Last Successful login time: Tue Feb 23 2016 09:28:53 +01:00
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL> @sta_conf_io COUNTRY
PL/SQL procedure successfully completed.
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
E:\projects\dwh\mef>
Load the COUNTRY data file
Reconnect to SQL*Plus and run the load:
E:\projects\dwh\mef>sqlplus etl/etl
SQL*Plus: Release 12.1.0.1.0 Production on Wed Feb 24 11:08:01 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Last Successful login time: Wed Feb 24 2016 10:41:26 +01:00
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL> exec mef_job.p_run('sta_iso_country');
PL/SQL procedure successfully completed.
SQL>
The MIC reconfiguration
We must change the structure of the MIC data file by inserting the extra fields, as we have done in the fifth row of the COUNTRY configuration. Doing so is very simple, because you just do a copy and paste of the lines 3 and 1of COUNTRY. (With FILE_COLUMN_COD and UI_FLG changed to null).After changing the configuration file of MIC, run the configuration script.
SQL> @sta_conf_io MIC
PL/SQL procedure successfully completed.
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
E:\projects\dwh\mef>
Setting of the new fields PAE_DEB and PAEISO3_COD with the post-processing
Needless to say that in order to update these fields, you must first be loaded the COUNTRY data file. This setting is a typical post-processing operation. I have opened the temp_dwh_sta_iso_mic_ppk.sql file, present under the \sql folder, and modify it as follows.
create or replace package dwh_STA_ISO_MIC_PPK as
procedure p_main;
end;
/
create or replace package body dwh_STA_ISO_MIC_PPK as
pv_pkg varchar2(30) := 'dwh_STA_ISO_MIC_PPK.';
pv_error exception;
pragma exception_init (pv_error, -20058);
procedure p_upd_iso is
v_mod varchar2(61) := pv_pkg||'p_upd_iso';
v_count number;
v_upd number;
v_msg varchar2(500);
begin
select count(*) into v_count
from dwh_sta_iso_mic_stt;
update dwh_sta_iso_mic_stt d
set (paeiso3_cod,pae_deb) = (
select paeiso3_cod,pae_deb
from dwh_sta_iso_country_stt s
where d.paeiso2_cod = s.paeiso2_cod
)
where exists (
select paeiso3_cod,pae_deb
from dwh_sta_iso_country_stt s
where d.paeiso2_cod = s.paeiso2_cod
);
v_upd := sql%rowcount;
v_msg := 'Updated '||v_upd||' countries on '||v_count;
if (v_upd <> v_count) then
v_msg := v_msg||chr(10)||'Attention ! Some contries ('||
to_char(v_count-v_upd)||') without description and ISO3 code';
end if;
mef.p_send(v_mod,v_msg);
mef.p_mail('INFO','Staging Area Informational ('||v_mod||')',v_msg);
exception
when pv_error then raise;
when others then mef.p_rae(sqlerrm,v_mod);
end;
procedure p_main is
v_mod varchar2(61) := pv_pkg||'p_main';
begin
if (mef_job.pv_wr.repeat_cnt > 0) then
mef.p_send(v_mod,'Start');
p_upd_iso;
mef.p_send(v_mod,'End');
end if;
exception
when others then mef_unit.p_exc(v_mod,sqlerrm);
end;
end;
/
I have closed the file and I have renamed it, without the initial "temp_".
I have already done for you: you must only download the file, move it under the \sql folder, and compile it.
E:\projects\dwh\mef>sqlplus etl/etl
SQL*Plus: Release 12.1.0.1.0 Production on Wed Feb 24 11:16:35 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Last Successful login time: Wed Feb 24 2016 11:08:01 +01:00
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL> @..\sql\dwh_sta_iso_mic_ppk.sql
Package created.
Package body created.
SQL>
As you can see, the code performs a simple update of the Staging Area Table, taking the values from the dwh_sta_iso_country_stt table. Because we want to be notified if there are nations who have not found a description, add a new email code in the configuration table MEF_EMAIL_CFT, which will be used by mef.p_mail call.From SQL * Plus give:
Insert into MEF_EMAIL_CFT (EMAIL_COD, FROM_TXT, TO_TXT, CC_TXT, SUBJ_TXT, STATUS_COD)
Values ('INFO', 'mef@microetlfoundation.com', 'massimocenci@outlook.com', 'massimo_cenci@yahoo.it', null,1);
COMMIT;
Of course, change the email address with yours. We can now reload the MIC file. We enter SQL * Plus and give:
SQL> exec mef_job.p_run('sta_iso_mic');
PL/SQL procedure successfully completed.
SQL>
We see (in this image fragment) that the MIC Staging table has new values in the added columns.
Since there are some cases of country codes not found, we received the informational email: