No, the Stork Did NOT Bring Your Data

Six Immediate Business Benefits From Data Lineage

 

If you are reading this, I trust that you know what data is. If the stork didn’t bring your data, do you know where it comes from?

Data lineage is not about how data is created but about how it gets to you. The Data Management Association (DAMA) dictionary defines data lineage as “a description of the pathway from the data source to their current location and the alterations made to the data along the pathway.” Rather than picturing data simply appearing the way storks “bring babies,” think of data lineage as the storks themselves: magnificent wilderness pathfinders.

Data lineage maps the flow of data from its origin, through the downstream processes that handle it, to its destination. We can liken it to GPS tracking for data: users can trace back pathways and checkpoints, understand travel conditions, and identify landmarks along the way.
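To make the GPS analogy concrete, lineage can be modelled as a directed graph whose edges record each hop and the transformation applied along it (the “alterations” in the DAMA definition). The Python below is a minimal sketch under that assumption: the dataset names and transformations in `LINEAGE`, and the helpers `upstream_index` and `trace_back`, are hypothetical and not taken from any particular lineage tool. `trace_back` walks from a report back to its origins, breadth-first.

```python
from collections import deque

# Hypothetical lineage edges: (source, target, transformation applied on that hop).
# All dataset names are illustrative only.
LINEAGE = [
    ("crm.customers", "lake.raw_customers", "nightly extract"),
    ("erp.orders", "lake.raw_orders", "API pull"),
    ("lake.raw_customers", "warehouse.dim_customer", "deduplicate, standardize names"),
    ("lake.raw_orders", "warehouse.fact_sales", "currency conversion"),
    ("warehouse.dim_customer", "mart.revenue_report", "join on customer_id"),
    ("warehouse.fact_sales", "mart.revenue_report", "aggregate by month"),
]


def upstream_index(edges):
    """Map each target to the (source, transformation) pairs that feed it."""
    index = {}
    for source, target, transform in edges:
        index.setdefault(target, []).append((source, transform))
    return index


def trace_back(node, edges):
    """Walk breadth-first from `node` back towards its origins.

    Returns the hops visited as (source, target, transformation) tuples,
    nearest hops first.
    """
    index = upstream_index(edges)
    hops, seen, queue = [], {node}, deque([node])
    while queue:
        current = queue.popleft()
        for source, transform in index.get(current, []):
            hops.append((source, current, transform))
            if source not in seen:  # avoid revisiting shared upstream nodes
                seen.add(source)
                queue.append(source)
    return hops


if __name__ == "__main__":
    for source, target, transform in trace_back("mart.revenue_report", LINEAGE):
        print(f"{target} <- {source} ({transform})")
```

Running it prints six hops, the last two ending at `crm.customers` and `erp.orders`, the systems of origin. The same backward walk is what a data steward performs when tracing a data quality issue to its root cause.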

In today’s data landscape, tracing your data is nearly impossible given the myriad of sources, tools, and transformations each data set passes through, as the graph below suggests.

 

Data lineage, powered by Machine Learning (ML) data discovery capabilities, straightens out and highlights the pathway that your data takes.

 

Data from source systems (e.g., CRM, ERP, APIs, spreadsheets) is managed before being pushed or pulled into data lakes, warehouses, marts, or other operational systems, where it is transformed, curated, stored, and readied for reporting and operational needs. To meet increasingly stringent reporting and quality requirements, data must be mapped back to its source to assess ownership, authenticity, quality, and accuracy. This mapping is the “Data Lineage.”

Imagine traversing the data landscape without a map in hand. Not only will you be lost, but you will also have no means of retracing your footsteps. Tracking how data moves from point to point gives us only a limited map from an IT perspective, much like navigating in real life with only what we can see on the horizon and what we remember of where we have been. Innovations like GPS navigation give us much-needed traceability: where we have been, how we get from place to place, and other critical information along our journey.

Similarly, it is critical to trace data back to its business requirements, which gives us the context for how the data is applied and the value it delivers. From a business perspective, we also discover what data elements mean, who owns or produces the data, which subject area it belongs to, and its commercial impact.

Having this information available enables the business to capitalize on its data assets and to increase data usage and acceptance.

 

Drivers and Benefits of Data Lineage

There are many drivers for implementing and enforcing a data lineage approach, and many business benefits from doing so. Here are my six picks:

  1. Meeting Regulatory Requirements – Regulatory and legislative requirements are among the main drivers for data lineage, particularly in the banking and financial services industry. Examples include BCBS 239, PERDARR, the General Data Protection Regulation (GDPR), IFRS 9, and TRIM (Targeted Review of Internal Models). Even though “data lineage” is never explicitly mentioned in these regulatory documents, together with other data management approaches it answers many of their requirements.
  2. Consistency in Business Terminology – Because application development has traditionally been isolated within each functional silo, the same or similar names used in different data models may not carry the definitions and meanings their developers and designers intended. Readily available data lineage allows data stewards to find standard business terms, review their definitions, and identify inconsistencies in how the terms are used.
  3. Changing Business Requirements – Business users often want changes to their reports and dashboards, which in turn change the metadata, for example renaming an attribute or modifying its data type. Before making such a change, it is imperative to use data lineage to identify its impact throughout the application landscape. Lineage helps answer questions such as which data sources feed the most significant downstream systems.
  4. Complying with Audit and Supervisory Requirements – Risk Management and Finance functions increasingly need to explain how figures and critical metrics are derived. Doing so requires tracing back the full chain of transformed or manipulated data elements. Data lineage provides visibility into data pipelines and information flows, which can then be audited.
  5. Maintaining Data Quality – Data lineage plays a critical role in root cause analysis of data quality issues. Most often, issues traced through data lineage turn out to originate at the source, and it is much more efficient to fix them there than downstream or at the reporting layer. Using a data lineage map, a steward can trace back through the information flow, examine the standardizations and transformations applied to the data, and either confirm that they were performed correctly or identify the one (or more) performed incorrectly that caused the data flaw.
  6. Assessing and Optimizing Data – Data pipeline stages with many incoming paths are potential bottlenecks, and data lineage makes them easier to identify. With a set of lineage mappings, it is possible to profile execution times across different pipelines and redistribute processing to reduce the lag.
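Two of these benefits reduce to simple queries over a lineage graph. As a minimal sketch, assuming attribute-level lineage is already available as a list of source-to-target edges, the Python below shows impact analysis for a changed attribute (“Changing Business Requirements”: find everything downstream of it) and a fan-in count that flags stages with many incoming paths (“Assessing and Optimizing Data”). The edge list, attribute names, function names, and the `threshold` default of 3 are hypothetical. Fan-in only flags candidate bottlenecks; confirming one still requires profiling actual execution times.

```python
from collections import defaultdict, deque

# Hypothetical attribute-level lineage edges (source -> target).
# All names are illustrative, not taken from any real system.
EDGES = [
    ("crm.customer.region", "warehouse.dim_customer.region"),
    ("erp.orders.amount", "warehouse.fact_sales.amount_aud"),
    ("erp.fx_rates.rate", "warehouse.fact_sales.amount_aud"),
    ("billing.invoices.amount", "warehouse.fact_sales.amount_aud"),
    ("warehouse.fact_sales.amount_aud", "mart.revenue.monthly_total"),
    ("warehouse.dim_customer.region", "mart.revenue.monthly_total"),
    ("mart.revenue.monthly_total", "dashboard.revenue_kpi"),
    ("mart.revenue.monthly_total", "report.board_pack"),
]


def downstream_impact(changed, edges):
    """Impact analysis: return every node reachable downstream of `changed`."""
    children = defaultdict(list)
    for source, target in edges:
        children[source].append(target)
    impacted, queue = set(), deque([changed])
    while queue:
        for target in children[queue.popleft()]:
            if target not in impacted:  # visit each node once
                impacted.add(target)
                queue.append(target)
    return impacted


def fan_in_hotspots(edges, threshold=3):
    """Return (node, source_count) for nodes fed by at least `threshold`
    distinct sources, highest fan-in first."""
    sources = defaultdict(set)
    for source, target in edges:
        sources[target].add(source)
    hotspots = [(node, len(feeds)) for node, feeds in sources.items()
                if len(feeds) >= threshold]
    return sorted(hotspots, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    print(sorted(downstream_impact("erp.orders.amount", EDGES)))
    print(fan_in_hotspots(EDGES))
```

Here, changing `erp.orders.amount` flags four downstream attributes and reports, including `dashboard.revenue_kpi` and `report.board_pack`, while only `warehouse.fact_sales.amount_aud` meets the fan-in threshold.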

All of the above are immediate business benefits of enabling data lineage in your organization.

As with many data programs, start small with a single strategic project:

  • Understand the story that needs to be told.
  • The story will lead you to the data essential to telling it.
  • Knowing the critical data elements will narrow the scope of the data lineage requirements.
  • Focused requirements promote a high return on investment, a greater probability of success, and significant value to the business; and
  • The result is a proven template you can repeat for the next key stakeholder and decision-maker.

 

Ascention Data As A Language series

Please see previous topics:

 

Ascention Shares Experience

Ascention wishes to impart skills and knowledge. The team at Ascention is always willing to share our experiences to assist your team’s progress – simply contact us to start an informal, no-obligation discussion.

 

Ascention Contact:

Ben Elters, Client Director

E: ben.elters@ascention.com

M: +61 415 500 563