The Difference between Lineage and Traceability
(Editor’s note: Because data lineage continues to be a hot topic in the data governance world, we’ve updated and republished this post. Read the updated post.)
When it comes to gaining insight into data, where it comes from, and how it is used, data lineage is often put forward as a crucial feature. However, there are some problems with the way lineage is often depicted.
Lineage shows facts: a flow of how data moves, or will move, and transforms between systems, tables, and data domains. These lineage diagrams often produce wall-to-wall flows that non-technical users find unusable, because they show ‘as built’ transformations, staging tables, lookups, and so on. That is great for technical purposes, but not for users looking to answer questions like “Where does my data come from? What policies were used? What standards are applied?”
This is where traceability comes in.
Different views for different users
First of all, a traceability view is made for a certain role within the organization. Policy managers will want to see the impact of their security policy on the different data domains, ideally before they enforce the policy. Analysts will want a high-level overview of where the data comes from, which systems it passed through, and what rules were applied. An auditor might want to trace a Data Issue to the impacted systems and business processes.
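As a rough illustration of the analyst's high-level view, the sketch below collapses an as-built lineage graph into the list of source systems behind one table. Everything in it is hypothetical: the table names, the "system.table" naming convention, and the choice to hide a system called staging are assumptions made for the example, not features of any particular tool.

```python
from collections import defaultdict

# Hypothetical as-built lineage: (source, target) edges between tables.
# Names follow an assumed "system.table" convention.
EDGES = [
    ("crm.customers", "staging.stg_customers"),
    ("billing.invoices", "staging.stg_invoices"),
    ("staging.stg_customers", "staging.customer_lookup"),
    ("staging.customer_lookup", "warehouse.customer_360"),
    ("staging.stg_invoices", "warehouse.customer_360"),
]

# Systems the analyst does not need to see (an assumption for this sketch).
HIDDEN_SYSTEMS = {"staging"}


def system_of(table):
    """Return the system part of a 'system.table' name."""
    return table.split(".", 1)[0]


def source_systems(target, edges, hidden=HIDDEN_SYSTEMS):
    """Return the visible systems that feed `target`, skipping hidden hops."""
    upstream = defaultdict(set)
    for src, dst in edges:
        upstream[dst].add(src)

    found, seen, stack = set(), set(), [target]
    while stack:
        node = stack.pop()
        for parent in upstream[node]:
            if parent in seen:
                continue
            seen.add(parent)
            if system_of(parent) in hidden:
                stack.append(parent)  # keep walking through hidden tables
            else:
                found.add(system_of(parent))  # stop at the first visible system
    return found


print(sorted(source_systems("warehouse.customer_360", EDGES)))
# prints ['billing', 'crm']
```

The full graph still exists for technical users; the analyst's view simply answers "where does my data come from?" without the staging tables and lookups in between.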
Traceability views don’t have to be generated from the technical layer. They can be used to study the impact of introducing a new Data Asset or Governance Asset (such as a policy) on the rest of the business.
What is the impact of my new policies and standards?
Adding a business layer over the technical view
Any traceability view will have most of its components coming in from the data management stack. Systems, profiling rules, and table and column information will be taken in from their relevant systems or from a technical metadata layer. The true power of traceability (and of data governance in general) lies in the information business users can add on top of it.
As an example, envision a program manager in charge of a set of Customer 360 projects who wants to govern data assets from an agile, project point of view. By adding projects and their relations to data domains to the view, this user can see which technical data elements relate to which business projects.
An agile view of data: from project to system
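The project-to-system view described above can be sketched as two small mappings: a technical layer that groups columns by data domain, and a business layer in which users link projects to domains. The project names, domains, and column names are all invented for illustration, and a real governance platform would store these relations very differently.

```python
# Technical layer: hypothetical columns grouped by data domain.
DOMAIN_COLUMNS = {
    "Customer": ["crm.customers.email", "crm.customers.name"],
    "Billing": ["billing.invoices.amount"],
    "Marketing": ["campaigns.sends.opened_at"],
}

# Business layer: hypothetical project-to-domain links added by business users.
PROJECT_DOMAINS = {
    "Customer 360 onboarding": ["Customer"],
    "Customer 360 churn model": ["Customer", "Billing"],
}


def project_view(project):
    """Map each domain linked to `project` to its technical columns."""
    return {
        domain: DOMAIN_COLUMNS.get(domain, [])
        for domain in PROJECT_DOMAINS.get(project, [])
    }


def projects_using(column):
    """Reverse trace: list the projects that depend on a given column."""
    return sorted(
        project
        for project, domains in PROJECT_DOMAINS.items()
        if any(column in DOMAIN_COLUMNS.get(d, []) for d in domains)
    )


print(project_view("Customer 360 churn model"))
print(projects_using("crm.customers.email"))
```

The same links can be read in either direction: from a project down to its columns, or from a column up to every project that would be affected by a change to it.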
Good technical lineage is a necessity for any enterprise data management program. It does not, however, fulfil the need of business users to trace and link their data assets through their non-technical world. The right solution will cherry-pick technical assets and allow different lines of business to add and link business terms, processes, policies, and any other data concept modelled by the organization. Enabling customizable views that combine business and technical information is critical to understanding data and using it effectively, and it is the next step toward establishing data as a trusted asset in the organization.
Koen is responsible for delivering real-world data governance solutions to the problems raised by different regulations such as BCBS239, Solvency II, and GDPR. Before joining Collibra, he worked as principal consultant for Wolters Kluwer Financial Services and Financial Architects on Basel, Compliance, and Solvency projects worldwide.