Oracle Enterprise Data Quality (OEDQ) Blog
Tuesday, July 23, 2013
In batch matching, there is no concept of driver versus candidate records. As a result, reflexive matches are identified during batch matching and the number of duplicates reported is much greater than the actual number (e.g., Record 1 is identified as a duplicate of Record 2 and vice versa). This can increase the time needed to resolve duplicates and hold up key activities, especially during conversion.
Purpose: Configuring OEDQ to remove reflexive batch matches. This reduces the number of duplicates identified during the batch match process and makes the data steward's job easier; the pairing logic involved is illustrated in the sketch after the steps below.
Components Needed:
- CDS Batch Matching Services
Steps:
1. Navigate to the CDS-Individual Match -Batch process
2. Remove the "Generate Reverse Relationships for Driver-Driver matches & Ensure Driver on LHS of Candidate-Driver matches" group
3. Add a Merge Attributes processor and connect it
4. Configure the Merge Attributes processor to merge the individual id with the related id
5. Repeat steps 1-4 for the CDS-Entity Match -Batch process
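Conceptually, removing the reverse-relationship group means each pair of matching records is reported once rather than twice. The following is a minimal, purely illustrative Groovy sketch of that pairing logic (the record IDs are made up, and this is not OEDQ configuration):
// Relationships as reported when reverse (reflexive) matches are generated
def matches = [[1, 2], [2, 1], [3, 4], [4, 3]]
// Canonicalise each pair by sorting its two IDs so (1,2) and (2,1) become the same key,
// then keep only the distinct pairs - the effect the configuration above achieves
def distinctPairs = matches.collect { it.sort() }.unique()
assert distinctPairs == [[1, 2], [3, 4]]   // half the reported count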
Things to consider:
1. This configuration needs to be thoroughly tested to ensure that the correct candidates are returned and the process does not break.
Monday, April 8, 2013
Publish Processor Results into Custom Log
Purpose: Configuring OEDQ to publish results to an external source that acts as a log. This allows the user to view the results of a process without taking the process down. Out of the box, OEDQ does not create or store any logs that document results.
Components Needed:
- Database table reserved for logging
- Purge mechanism for database table reserved for logging
Steps:
1. Add a new data store pointing to the database table reserved for logging
2. Add an additional Writer processor
3. Configure the Writer to write to the database table reserved for logging (a sketch of such a table follows these steps)
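For reference, a minimal Groovy sketch of creating such a logging table over JDBC; the connection details, table name and column names are hypothetical and should be adapted to your environment:
import groovy.sql.Sql
// Hypothetical connection details and table/column names
def sql = Sql.newInstance("jdbc:oracle:thin:@hostname:port:SID", "Username", "Password", "oracle.jdbc.OracleDriver")
// One row per processed record: a timestamp, the record key, and the original and new values
sql.execute('''
    CREATE TABLE EDQ_PROCESS_LOG (
        LOG_TIME       TIMESTAMP,
        RECORD_ID      VARCHAR2(100),
        ORIGINAL_VALUE VARCHAR2(4000),
        NEW_VALUE      VARCHAR2(4000)
    )
''')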
Things to consider:
1. Add an additional Date/Time processor to record a timestamp. This will help you index and catalog the log.
2. In the case of transformations or standardizations to data, be sure to add the original values so you can track the changes made by OEDQ.
3. Set up a purge on the database table reserved for logging to prevent the log from growing too large (a minimal purge sketch follows this list). These custom-made logs can grow very quickly and take up much more space than traditional logging.
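A minimal Groovy sketch of such a purge, again using hypothetical connection details and table name, and an arbitrary 30-day retention period:
import groovy.sql.Sql
// Hypothetical connection details and table name
def sql = Sql.newInstance("jdbc:oracle:thin:@hostname:port:SID", "Username", "Password", "oracle.jdbc.OracleDriver")
// Delete log rows older than 30 days; schedule this regularly to keep the table small
sql.execute("DELETE FROM EDQ_PROCESS_LOG WHERE LOG_TIME < SYSDATE - 30")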
Sunday, March 24, 2013
Calling external web services
Purpose: Configuring OEDQ to invoke external web services, send request payloads, and publish the responses returned by the web service.
Components Needed:
- Custom Script Processor
- Web Service WSDL
- Web Service Endpoint URL
Steps:
1. Add OEDQ Script processor
2. Create a script (JavaScript, as in the example script below) to do the following:
a. Try/Catch statement
b. Instantiate a new XMLHttpRequest
c. Create xml payload as variable
d. Send xml payload variable
e. Extract value of response tag(s)
f. Publish extracted response tag value(s) as output value
Example Script:
addLibrary("http");
try {
    // Endpoint URL of the target web service (placeholder)
    var url = "http://xxxx";
    var xmlHttp = new XMLHttpRequest();
    // Build the SOAP request payload, substituting the processor input values
    var params = "<?xml version='1.0' encoding='UTF-8' ?>" +
        "<soapenv:Envelope xmlns:soapenv='http://schemas.xmlsoap.org/soap/envelope/' xmlns:v1='http://xxxx'>" +
        "<soapenv:Body>" +
        "<RootTag>" +
        "<Tag1>" + input1[0] + "</Tag1>" +
        "<Tag2>" + input1[1] + "</Tag2>" +
        /* ...additional tags as needed... */
        "</RootTag>" +
        "</soapenv:Body>" +
        "</soapenv:Envelope>";
    var result = "";
    // Send the request synchronously and capture the response
    xmlHttp.open("POST", url, false);
    xmlHttp.send(params);
    var response = xmlHttp.responseText;
    // Extract the value of the response tag (placeholder tag name) from the XML response
    var match = response.match(/<ResponseTag1>([\s\S]*?)<\/ResponseTag1>/);
    if (match != null) {
        result = match[1];
    }
}
catch (e) {
    result = "Error: " + e;
}
output1 = result;
Components Needed:
- Custom Script Processor
- Web Service WSDL
- Web Service Endpoint URL
Steps:
1. Add OEDQ Script processor
2. Create script using Goovy to do the following:
a. Try/Catch statement
b. Instantiate new XMLHttpReqest
c. Create xml payload as variable
d. Send xml payload variable
e. Extract value of response tag(s)
f. Publish extracted response tag value(s) as output value
Example Script:
addLibrary("http");
try {
var url = "http://xxxx";
var xmlHttp = new XMLHttpRequest();
var params = "<?xml version='1.0' encoding='UTF-8' ?>" + "<soapenv:Envelope xmlns:v1='http://xxxx'>" + "<soapenv:Body>" + "<RootTag>" + "<Tag1>Iutput1[0]</Tag1>" + "<Tag2>Input1[1]</Tag2>" ...etc + "</RootTag>" + "</soapenv:Body>" + "</soapenv:Envelope>";
var request = params;
var result = new String();
xmlHttp.open("POST", url, false);
xmlHttp.send(request);
request = xmlHttp.responseText;
request = JSON.parse(request);
result = document.getElementById("ResponseTag1").value
}
catch (e) {
result = "Error:";
}
output1 = result;
Sunday, February 24, 2013
Extending OEDQ to Oracle DB AQ
OEDQ can be characterized as a reactive application: subscribing or publishing systems interact with it through web services. The problem with web services is that they are synchronous, meaning a request is sent and the response is received by the same application. There are scenarios in which data quality results need to be published to multiple systems, not just the application calling the web service. The article below describes how to publish the results of a request payload to multiple applications through the use of an Advanced Queue (AQ).
Purpose: Configuring OEDQ to publish results to an Oracle Database Advanced Queue (AQ)
Components Needed:
- Oracle Data Base Advanced Queue
- Custom Script Processor
- JDBC connection string to Oracle DB
Steps:
1. Create PL/SQL script to insert into Oracle DB AQ. Script needs to do the following:
a. Receive input as part of the call
b. Create payload with input
c. Insert payload to AQ
d. Commit
2. Add OEDQ Script processor
3. Create a script using Groovy to do the following:
a. Import groovy.sql syntax/library
b. Define input variable
c. Create a new connection to the database using the JDBC connection string
d. Call PL/SQL package and/or stored procedure
e. Pass input variable as part of previous step
Example PL/SQL to AQ Script:
-- Note: the procedure name, payload object type (XXX) and queue name below are placeholders
CREATE OR REPLACE
PROCEDURE enqueue_payload ( p_payload IN CLOB ) AS
  o_payload   XXX;                           -- AQ payload object type
  enq_msgid   RAW(16);
  eopt        dbms_aq.enqueue_options_t;
  mprop       dbms_aq.message_properties_t;
  err_message VARCHAR2(2000);
  v_payload   VARCHAR2(5000);
  len         BINARY_INTEGER;
BEGIN
  -- Wrap the incoming payload in the queue's payload object type
  o_payload := XXX( p_payload );
  eopt.sequence_deviation := NULL;
  mprop.priority := 1;
  -- Enqueue the message onto the Advanced Queue
  dbms_aq.enqueue( 'AQ Name', eopt, mprop, o_payload, enq_msgid );
  COMMIT;
END;
Example Groovy to AQ Script:
#! language : groovy
import groovy.sql.Sql
// Build the XML payload from the processor inputs (the namespace URL is a placeholder)
def inputs = '<?xml version="1.0" encoding="UTF-8" ?>' +
    '<Payload xmlns="http://xxx.xxx.com">' + '<RootNode>' +
    '<Node1>' + input1[0] + '</Node1>' +
    '<Node2>' + input1[1] + '</Node2>' +
    /* ...additional nodes as needed... */
    '</RootNode>' + '</Payload>'
// Connect to the database and call the PL/SQL procedure that enqueues the payload
// (procedure name is a placeholder matching the PL/SQL example above)
def sql = Sql.newInstance("jdbc:oracle:thin:@hostname:port:SID", "Username", "Password", "oracle.jdbc.OracleDriver")
sql.call("{call enqueue_payload(?)}", [inputs])
Monday, February 18, 2013
Overview of Oracle Enterprise Data Quality (OEDQ)
For those of you who are new to Oracle Enterprise Data Quality (OEDQ), Rittman Mead Consulting has published a great article which provides an overview of the tool and its functionality. In addition to my blog, I would strongly encourage subscribing to a few other blogs to build a holistic understanding and knowledge base of OEDQ and data quality. Provided below are links to the Rittman Mead Consulting article as well as a few of the blogs I subscribe to. Enjoy!
Rittman Mead Consulting OEDQ Overview: http://www.rittmanmead.com/2012/08/introducing-edq/
OEDQ Overview and Tutorial Blog: http://oracleedq.com/
Oracle Master Data Management Official Blog: https://blogs.oracle.com/mdm/
Oracle Data Integration Official Blog: https://blogs.oracle.com/dataintegration/
Monday, February 4, 2013
Dimensions of Data Quality
Data Quality is an integral part of any system that deals with data. The quality, reliability and usability of data are paramount to its value. Unfortunately, data quality issues run all too rampant within the data infrastructure of an organization. To be able to correlate data quality issues to business impacts, one must be able to classify data quality exceptions and assess their impact. As such, one of the most valuable exercises when implementing a data quality solution is to profile the data across the four common dimensions of data quality. Listed below is a description of each dimension along with a set of checks to profile the data; a small illustrative sketch of such checks follows the lists:
Completeness: Determination of what data is missing or incomplete. Checks in this area identify data values that are missing or in an unusable state. Incomplete data can be a result of omitted records, accidental deletion or non-required fields. Completeness can be critical to business processes and the accuracy of information.
Checks:
-Nulls/Blanks
-Empty Records
-Missing or Incomplete Rows
-Incomplete Address
Uniqueness: Determination of whether data is deduplicated and unique. Checks in this area identify duplicated data. Duplicate data can be a result of merged systems, data re-entry and multiple source systems. Uniqueness is a key function of Master Data Management and of data sharing across an organization.
Checks:
-Distinct Data
-Duplicate Records
-Cross Applied Values
-Referential Integrity
Conformity: Determination of whether data abides by the standards of the organization and industry regulations. Checks in this area identify data values that do not conform to integrity, column or row formatting standards. Non-conformity can be a result of unenforced data standards, data imported from unstructured sources, or a lack of data integrity constraints. Conformity can be vital to the usability and sharing of data across platforms.
Checks:
-Data abides by Domain Values
-Data Formatting
-ASCII Data Standards
-Whitespaces/Leading Spaces
Integrity: Determination of whether data is corrupted or decayed. Checks in this area identify data values that contain erroneous, incorrect or junk information. Bad data can be a result of unenforced data standards, dummy data left in the system and lazy data entry. Integrity is very important to the reliability, accuracy and usability of data in key business processes.
Checks:
-Bad or Incorrect Values
-Data from other fields
-Multiple entries in the same field
-Trailing/Leading white spaces
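To make these checks concrete, below is a small, purely illustrative Groovy sketch that profiles a handful of made-up records against one check from each dimension (the field names, records and rules are hypothetical):
// Hypothetical records: a key, a name and a postcode
def records = [
    [id: 1, name: 'Alice Smith', postcode: 'AB1 2CD'],
    [id: 2, name: '',            postcode: 'AB1 2CD'],
    [id: 1, name: 'Alice Smith', postcode: 'AB1 2CD'],
    [id: 3, name: 'TEST TEST',   postcode: '99999999']
]
// Completeness: null or blank values
def incomplete = records.findAll { !it.name?.trim() }
// Uniqueness: duplicate record keys
def duplicateIds = records.countBy { it.id }.findAll { it.value > 1 }.keySet()
// Conformity: postcode does not match the expected format
def nonConforming = records.findAll { !(it.postcode ==~ /[A-Z0-9]{2,4} [A-Z0-9]{3}/) }
// Integrity: obvious junk or dummy values
def junk = records.findAll { it.name == 'TEST TEST' }
println "Incomplete: ${incomplete.size()}, duplicate ids: ${duplicateIds}, " +
        "non-conforming: ${nonConforming.size()}, junk: ${junk.size()}"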