As part of batch matching, there is no concept of driver vs. candidates. Due to this reflexive matches are identified through batch matching and the number of duplicated identified is much greater than the actual number. (ex: Record 1 is identified as a duplicate of Record 2 and vice versa). This can increase the time to resolve duplicates and hold up key activities, especially during conversion.
Purpose: Configuring OEDQ to remove reflexive batch matches. This will reduce the amount of duplicates identified during the batch match process and make the job for the data steward easier.
Components Needed:
- CDS Batch Matching Services
Steps:
1. Navigate to the CDS-Individual Match -Batch process
2. Remove the "Generate Reverse Relationships for Driver-Driver matches & Ensure Driver on LHS of Candidate-Driver matches" group
3. Add a Merge Attributes processor and connect it
4. Configure the Merge Attributes process to merge the individual id with the related id
5. Repeat steps 1-4 for CDS-Entity Match -Batch process
Things to consider:
1. This configuration needs to be thoroughly tested to ensure that the correct candidates are returned and the process does not break.
Tuesday, July 23, 2013
Monday, April 8, 2013
Publish Processor Results into Custom Log
Purpose: Configuring OEDQ to publish results to an external source which will act as a log. This will allow the user to view the results of a process without taking it down. Innately, OEDQ does not create/store any logs that document results.
Components Needed:
- Database table reserved for logging
- Purge mechanism for database table reserved for logging
Steps:
1. Add new data store to database table reserved for logging
2. Add an additional writer processor
3. Configure writer to write to database table reserved for logging
Things to consider:
1. Add an additional Date/Time processor to record a timestamp. This will help you index and catalog the log
2. In the case of transformations or standardizations to data, be sure to add the original values to track the changed made by OEDQ
3. Setup a purge on the database table reserved for logging to prevent this log from growing to large. These custom made logs can grow very quick and take up much more space than traditional logging.
Components Needed:
- Database table reserved for logging
- Purge mechanism for database table reserved for logging
Steps:
1. Add new data store to database table reserved for logging
2. Add an additional writer processor
3. Configure writer to write to database table reserved for logging
Things to consider:
1. Add an additional Date/Time processor to record a timestamp. This will help you index and catalog the log
2. In the case of transformations or standardizations to data, be sure to add the original values to track the changed made by OEDQ
3. Setup a purge on the database table reserved for logging to prevent this log from growing to large. These custom made logs can grow very quick and take up much more space than traditional logging.
Sunday, March 24, 2013
Calling external web services
Purpose: Configuring OEDQ to invoke external web services, send payloads and publish respons from web services.
Components Needed:
- Custom Script Processor
- Web Service WSDL
- Web Service Endpoint URL
Steps:
1. Add OEDQ Script processor
2. Create script using Goovy to do the following:
a. Try/Catch statement
b. Instantiate new XMLHttpReqest
c. Create xml payload as variable
d. Send xml payload variable
e. Extract value of response tag(s)
f. Publish extracted response tag value(s) as output value
Example Script:
addLibrary("http");
try {
var url = "http://xxxx";
var xmlHttp = new XMLHttpRequest();
var params = "<?xml version='1.0' encoding='UTF-8' ?>" + "<soapenv:Envelope xmlns:v1='http://xxxx'>" + "<soapenv:Body>" + "<RootTag>" + "<Tag1>Iutput1[0]</Tag1>" + "<Tag2>Input1[1]</Tag2>" ...etc + "</RootTag>" + "</soapenv:Body>" + "</soapenv:Envelope>";
var request = params;
var result = new String();
xmlHttp.open("POST", url, false);
xmlHttp.send(request);
request = xmlHttp.responseText;
request = JSON.parse(request);
result = document.getElementById("ResponseTag1").value
}
catch (e) {
result = "Error:";
}
output1 = result;
Components Needed:
- Custom Script Processor
- Web Service WSDL
- Web Service Endpoint URL
Steps:
1. Add OEDQ Script processor
2. Create script using Goovy to do the following:
a. Try/Catch statement
b. Instantiate new XMLHttpReqest
c. Create xml payload as variable
d. Send xml payload variable
e. Extract value of response tag(s)
f. Publish extracted response tag value(s) as output value
Example Script:
addLibrary("http");
try {
var url = "http://xxxx";
var xmlHttp = new XMLHttpRequest();
var params = "<?xml version='1.0' encoding='UTF-8' ?>" + "<soapenv:Envelope xmlns:v1='http://xxxx'>" + "<soapenv:Body>" + "<RootTag>" + "<Tag1>Iutput1[0]</Tag1>" + "<Tag2>Input1[1]</Tag2>" ...etc + "</RootTag>" + "</soapenv:Body>" + "</soapenv:Envelope>";
var request = params;
var result = new String();
xmlHttp.open("POST", url, false);
xmlHttp.send(request);
request = xmlHttp.responseText;
request = JSON.parse(request);
result = document.getElementById("ResponseTag1").value
}
catch (e) {
result = "Error:";
}
output1 = result;
Sunday, February 24, 2013
Extending OEDQ to Oracle DB AQ
OEDQ can be characterized as a re-active application. Subscribing or Publishing systems interact through web services. The problem with web services is that they are synchronous. This means that a request is sent and a response is recieved by the same application. There are scenarios when data quality results need to be pulbished to multiple systems, not just the application calling the web service. The below article describes how to publish the results of a request payload to multiple applications through the use of an Advanced Queue (AQ).
Purpose: Configuring OEDQ to publish results to an Oracle Database Advanced Queue (AQ)
Components Needed:
- Oracle Data Base Advanced Queue
- Custom Script Processor
- JDBC connection string to Oracle DB
Steps:
1. Create PL/SQL script to insert into Oracle DB AQ. Script needs to do the following:
a. Recieve Input as part of call
b. Create payload with input
c. Insert payload to AQ
d. Commit
2. Add OEDQ Script processor
3. Create script using Goovy to do the following:
a. Import groovy.sql syntax/library
b. Define input variable
c. Create new instance to db using JDBC connection
d. Call PL/SQL package and/or stored procedure
e. Pass input variable as part of previous step
Example PL/SQL to AQ Script:
create or replace
PROCEDURE ( p_payload IN CLOB ) as
o_payload XXX;
enq_msgid RAW(16);
eopt dbms_aq.enqueue_options_t;
mprop dbms_aq.message_properties_t;
err_message varchar2(2000);
v_payload VARCHAR2(5000);
len BINARY_INTEGER;
Begin
o_payload := XXX( p_payload );
eopt.sequence_deviation := NULL;
mprop.priority :=1;
dbms_aq.enqueue( 'AQ Name', eopt, mprop, o_payload , enq_msgid);
Commit;
END;
Example Groovy to AQ Script:
#! language : groovy
import groovy.sql.Sql
import groovy.xml.MarkupBuilder
def inputs =
'<?xml version="1.0" encoding="UTF-8" ?>' + '<Payload xmlns="http.xxx.xxx.com' + "<Root Node>" + "<Node1>" + input1[0] + "</Node1>" + "<Node2>" + input1[1] + "</Node2>" + ...etc. +"</RootNode>"
sql = Sql.newInstance("jdbc:oracle:thin:@hostname:port:SID, "Username", "Password", "oracle.jdbc.OracleDriver" )
sql = sql.call("call Procedure to insert into AQ('$inputs')")
Purpose: Configuring OEDQ to publish results to an Oracle Database Advanced Queue (AQ)
Components Needed:
- Oracle Data Base Advanced Queue
- Custom Script Processor
- JDBC connection string to Oracle DB
Steps:
1. Create PL/SQL script to insert into Oracle DB AQ. Script needs to do the following:
a. Recieve Input as part of call
b. Create payload with input
c. Insert payload to AQ
d. Commit
2. Add OEDQ Script processor
3. Create script using Goovy to do the following:
a. Import groovy.sql syntax/library
b. Define input variable
c. Create new instance to db using JDBC connection
d. Call PL/SQL package and/or stored procedure
e. Pass input variable as part of previous step
Example PL/SQL to AQ Script:
create or replace
PROCEDURE ( p_payload IN CLOB ) as
o_payload XXX;
enq_msgid RAW(16);
eopt dbms_aq.enqueue_options_t;
mprop dbms_aq.message_properties_t;
err_message varchar2(2000);
v_payload VARCHAR2(5000);
len BINARY_INTEGER;
Begin
o_payload := XXX( p_payload );
eopt.sequence_deviation := NULL;
mprop.priority :=1;
dbms_aq.enqueue( 'AQ Name', eopt, mprop, o_payload , enq_msgid);
Commit;
END;
Example Groovy to AQ Script:
#! language : groovy
import groovy.sql.Sql
import groovy.xml.MarkupBuilder
def inputs =
'<?xml version="1.0" encoding="UTF-8" ?>' + '<Payload xmlns="http.xxx.xxx.com' + "<Root Node>" + "<Node1>" + input1[0] + "</Node1>" + "<Node2>" + input1[1] + "</Node2>" + ...etc. +"</RootNode>"
sql = Sql.newInstance("jdbc:oracle:thin:@hostname:port:SID, "Username", "Password", "oracle.jdbc.OracleDriver" )
sql = sql.call("call Procedure to insert into AQ('$inputs')")
Monday, February 18, 2013
Overview of Oracle Enterprise Data Quality (OEDQ)
For those of you who are new to Oracle Enterprise Data Quality (OEDQ). Rittman Mead Consulting has published a great article which provides an overview of the tool and functionality. In addition to my blog I would strongly encourage subscribing to a few other blogs on the web to build a holistic understanding and knowledge base of OEDQ and Data Quality. Provided below are links to the Rittman Mead Consulting article as well as a few of the blogs I subscribe to. Enjoy!
Rittman Mead Consulting OEDQ Overview: http://www.rittmanmead.com/2012/08/introducing-edq/
OEDQ Overview and Tutorial Blog: http://oracleedq.com/
Oracle Master Data Management Official Blog: https://blogs.oracle.com/mdm/
Oracle Data Integration Official Blog: https://blogs.oracle.com/dataintegration/
Rittman Mead Consulting OEDQ Overview: http://www.rittmanmead.com/2012/08/introducing-edq/
OEDQ Overview and Tutorial Blog: http://oracleedq.com/
Oracle Master Data Management Official Blog: https://blogs.oracle.com/mdm/
Oracle Data Integration Official Blog: https://blogs.oracle.com/dataintegration/
Monday, February 4, 2013
Dimensions of Data Quality
Data Quality is an integral part to any system that deals with data. The quality, reliability and usability of data is paramount to its value. Unfortunately, data quality issues run all too rampant within the data infrastructure of an organization. To be able to correlate data quality issues to business impacts, one must be able to classify data quality exceptions and assess the impact. As such, one of the most valuable exercises when implementing a data quality solution is to profile the data across the four common dimensions of data quality. Listed below is a description of each dimensions along with a set of checks to profile the data:
Completeness: Determination of what data is missing and incomplete. Checks in this areas identify what data values are missing or are in an unusable state. The incompleteness of data can be a result of omitted records, accidental deletion or non-required fields. Completeness can be critical to business processes and the accuracy of information.
Checks:
-Nulls/Blanks
-Empty Records
-Missing or Incomplete Rows
-Incomplete Address
Uniqueness: Determination if data is deduplicated and unique. Checks in this areas identify duplicated data. The uniqueness of data can be a result of merged systems, data re-entry and multiple source systems. Uniqueness is a key functionality of Master Data Management and data sharing across an organization.
Checks:
-Distinct Data
-Duplicate Records
-Cross Applied Values
-Referential Integrity
Conformity: Determination if data abides by the standards of the organization and industry regulations. Checks in this areas identify what data values do not conform to integrity, column or row formatting/standards. The non-conformity of data can be a result of non enforced data standards,
imported data from unstructured sources or lack of data integrity constraints. Conformity can be vital to the usability and sharing of data across platforms.
Checks:
-Data abides by Domain Values
-Data Formatting
-ASCII Data Standards
-Whitespaces/Leading Spaces
Integrity: Determination if data is corrupted or decayed. Checks in this areas identify what data values are include erroneous, incorrect and junk information. Bad data can be a result of non enforced data standards, dummy data left in the system and lazy data entry. Integrity is very important to the reliability, accuracy and usability of data in key business processes..
Checks:
-Bad or Incorrect Values
-Data from other fields
-Multiple entries in the same field
-Trailing/Leading white spaces
Completeness: Determination of what data is missing and incomplete. Checks in this areas identify what data values are missing or are in an unusable state. The incompleteness of data can be a result of omitted records, accidental deletion or non-required fields. Completeness can be critical to business processes and the accuracy of information.
Checks:
-Nulls/Blanks
-Empty Records
-Missing or Incomplete Rows
-Incomplete Address
Uniqueness: Determination if data is deduplicated and unique. Checks in this areas identify duplicated data. The uniqueness of data can be a result of merged systems, data re-entry and multiple source systems. Uniqueness is a key functionality of Master Data Management and data sharing across an organization.
Checks:
-Distinct Data
-Duplicate Records
-Cross Applied Values
-Referential Integrity
Conformity: Determination if data abides by the standards of the organization and industry regulations. Checks in this areas identify what data values do not conform to integrity, column or row formatting/standards. The non-conformity of data can be a result of non enforced data standards,
imported data from unstructured sources or lack of data integrity constraints. Conformity can be vital to the usability and sharing of data across platforms.
Checks:
-Data abides by Domain Values
-Data Formatting
-ASCII Data Standards
-Whitespaces/Leading Spaces
Integrity: Determination if data is corrupted or decayed. Checks in this areas identify what data values are include erroneous, incorrect and junk information. Bad data can be a result of non enforced data standards, dummy data left in the system and lazy data entry. Integrity is very important to the reliability, accuracy and usability of data in key business processes..
Checks:
-Bad or Incorrect Values
-Data from other fields
-Multiple entries in the same field
-Trailing/Leading white spaces
Subscribe to:
Posts (Atom)