When it comes to understanding data, where it comes from and how it is used, data lineage is often put forward as a crucial feature. However, the way lineage is usually depicted has some problems. In this post, I aim to clarify the difference between lineage and traceability.
Lineage shows facts: how data moves, or will move, and is transformed between systems, tables, and data domains. These data lineage diagrams often produce wall-to-wall flows that non-technical users find unusable, because they show ‘as built’ transformations, staging tables, lookups, and so on. This is great for technical purposes, but not for users looking to answer questions like “Where does my data come from? What policies were used? What standards are applied?”
This is where traceability comes in.
Different views for different users
First of all, a traceability view is made for a specific role within the organization. Policy managers will want to see the impact of their security policy on the different data domains, ideally before they enforce the policy. Analysts will want a high-level overview of where the data comes from: which systems it passed through and which rules were applied. An auditor might want to trace a data issue to the impacted systems and business processes.
Traceability views don’t have to be generated from the technical layer. They can be used to study the impact of introducing a new data asset or governance asset (such as a policy) on the rest of the business.
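To make the impact question concrete, here is a minimal Python sketch of how a view could answer "what is affected if I attach a policy to this asset?" by walking downstream through known flows. The asset names, the LINEAGE mapping, and the impacted_assets function are all hypothetical, made up for illustration; a real governance tool would read these relations from its own metadata store.

```python
from collections import deque

# Hypothetical lineage edges: upstream asset -> list of downstream assets.
LINEAGE = {
    "crm.customers": ["staging.customers_clean"],
    "staging.customers_clean": ["dw.customer_360"],
    "erp.orders": ["dw.customer_360"],
    "dw.customer_360": ["report.churn_dashboard"],
}


def impacted_assets(start, edges):
    """Return every asset downstream of `start`, in breadth-first order."""
    seen = []
    queue = deque(edges.get(start, []))
    while queue:
        asset = queue.popleft()
        if asset in seen:
            continue  # already visited via another path
        seen.append(asset)
        queue.extend(edges.get(asset, []))
    return seen


# Which assets would a new policy on the CRM customer table touch?
for name in impacted_assets("crm.customers", LINEAGE):
    print(name)
```

A policy manager would only see the resulting list of impacted assets, not the staging steps and lookups that make a full lineage diagram hard to read.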
Adding a business layer over the technical view
Any traceability view will have most of its components coming in from the data management stack. Systems, profiling rules, tables, and columns will be taken in from their relevant systems or from a technical metadata layer. The true power of traceability (and of data governance in general) lies in the information business users can add on top of it.
As an example, envision a program manager in charge of a set of Customer 360 projects who wants to govern data assets from an agile, project point of view. By adding projects, and their relations to data domains, to the view, this user can see which data elements (technical) relate to which projects (business).
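The program manager example can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the TechnicalAsset and Project classes, the asset names, and the project_view function are hypothetical and not taken from any particular product. The point is that the technical assets come from the metadata layer, while the project and its link to data domains are added by a business user.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalAsset:
    name: str    # e.g. a column harvested from the technical metadata layer
    domain: str  # data domain the asset belongs to


@dataclass
class Project:
    name: str
    domains: set = field(default_factory=set)  # domains linked by a business user


# Hypothetical technical layer, as harvested from source systems.
TECHNICAL_ASSETS = [
    TechnicalAsset("crm.customers.email", "Customer"),
    TechnicalAsset("crm.customers.segment", "Customer"),
    TechnicalAsset("erp.orders.total", "Sales"),
    TechnicalAsset("hr.employees.salary", "HR"),
]


def project_view(project, assets):
    """List the technical assets whose domain is linked to the project."""
    return sorted(a.name for a in assets if a.domain in project.domains)


# Business layer: the program manager links a project to two data domains.
customer_360 = Project("Customer 360 onboarding", {"Customer", "Sales"})
for name in project_view(customer_360, TECHNICAL_ASSETS):
    print(name)
```

The HR column never shows up in this view, because no one has linked the HR domain to the project. That filtering by business relation is what separates a traceability view from a full lineage diagram.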
Summing it up: lineage vs. traceability
Good technical lineage is a necessity for any enterprise data management program. It does not, however, fulfill the needs of business users to trace and link their data assets through their non-technical world. The right solution will cherry-pick technical assets and allow different lines of business to add and link business terms, processes, policies, and any other data concept modelled by the organization. Enabling customizable views that combine business and technical information is critical to understanding data and using it effectively. It is also the next step toward establishing data as a trusted asset in the organization.