Data traceability vs data lineage: Understanding the differences

Lineage vs. Traceability: Understanding the Differences

When it comes to gaining insight into data, where it comes from, and how it is used, data lineage is often put forward as a crucial feature. However, there are some problems with the way lineage is often depicted. In this post, I aim to clarify the differences between lineage and traceability.

Lineage shows facts: how data moves, or will move, and how it is transformed between systems, tables, and data domains. Data lineage diagrams often produce wall-to-wall flows that non-technical users find unusable, because they show ‘as built’ transformations, staging tables, lookups, and so on. That is great for technical purposes, but not for users looking to answer questions like “Where does my data come from? What policies were used? What standards are applied?”

This is where traceability comes in.

Different views for different users

First of all, a traceability view is made for a certain role within the organization. Policy managers will want to see the impact of their security policy on the different data domains, ideally before they enforce the policy. Analysts will want a high-level overview of where the data comes from, which systems it passed through, and which rules were applied. An auditor might want to trace a data issue to the impacted systems and business processes.
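One way to picture a role-specific view is as a filtered version of the ‘as built’ lineage graph, where nodes a role does not care about are hidden and the flow is reconnected across them. The sketch below is only an illustration under stated assumptions: the asset names, the node kinds, and the role-to-kind mapping are all hypothetical, and a real governance tool may derive its views quite differently.

```python
from collections import defaultdict

# Hypothetical "as built" lineage: (source, target) edges between technical assets.
LINEAGE_EDGES = [
    ("crm.customers", "staging.customers_raw"),
    ("staging.customers_raw", "lookup.country_codes_join"),
    ("lookup.country_codes_join", "warehouse.dim_customer"),
    ("warehouse.dim_customer", "report.customer_360"),
]

# Hypothetical classification of every node in the lineage graph.
NODE_KIND = {
    "crm.customers": "system",
    "staging.customers_raw": "staging",
    "lookup.country_codes_join": "lookup",
    "warehouse.dim_customer": "table",
    "report.customer_360": "report",
}

# Assumed mapping from role to the kinds of node that role wants to see.
ROLE_VISIBLE_KINDS = {
    "analyst": {"system", "report"},
    "engineer": {"system", "staging", "lookup", "table", "report"},
}


def traceability_view(edges, node_kind, visible_kinds):
    """Return sorted edges between visible nodes, skipping over hidden ones."""
    downstream = defaultdict(list)
    for src, dst in edges:
        downstream[src].append(dst)

    def next_visible(node, seen):
        # Walk downstream from `node` until the nearest visible nodes are found.
        found = set()
        for child in downstream[node]:
            if child in seen:
                continue
            seen.add(child)
            if node_kind[child] in visible_kinds:
                found.add(child)
            else:
                found |= next_visible(child, seen)
        return found

    view = set()
    for node, kind in node_kind.items():
        if kind in visible_kinds:
            for target in next_visible(node, set()):
                view.add((node, target))
    return sorted(view)


if __name__ == "__main__":
    for role, kinds in ROLE_VISIBLE_KINDS.items():
        print(role, traceability_view(LINEAGE_EDGES, NODE_KIND, kinds))
```

In this toy data the engineer sees every hop, while the analyst sees a single link from the source system to the report, with the staging, lookup, and warehouse steps collapsed.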

What-if? analysis

Traceability views don’t have to be generated from the technical layer. Because they can include assets that are only planned, they can be used to study the impact of introducing a new data asset or governance asset (such as a policy) on the rest of the business.
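A what-if analysis of this kind can be thought of as a reachability question: if a proposed policy were attached to certain assets, what else would it touch? The following is a minimal sketch using a breadth-first traversal. The relation graph, the asset names, and the `impacted_assets` helper are hypothetical and exist only to illustrate the idea.

```python
from collections import deque

# Hypothetical governance graph: each asset maps to the assets that depend on it.
RELATIONS = {
    "domain:customer": ["table:dim_customer", "table:customer_contacts"],
    "table:dim_customer": ["report:customer_360"],
    "table:customer_contacts": ["process:marketing_campaigns"],
    "report:customer_360": [],
    "process:marketing_campaigns": [],
}


def impacted_assets(relations, proposed_links):
    """List every asset reachable from the assets a proposed policy would govern.

    The policy itself is not part of `relations`; it is only being considered.
    """
    queue = deque(proposed_links)
    seen = set(proposed_links)
    impacted = []
    while queue:
        asset = queue.popleft()
        impacted.append(asset)
        for dependent in relations.get(asset, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return impacted


if __name__ == "__main__":
    # What if a new security policy were applied to the customer domain?
    for asset in impacted_assets(RELATIONS, ["domain:customer"]):
        print(asset)
```

Because the policy is passed in as a proposal rather than stored in the graph, the same traversal can be rerun for different candidate policies before any of them is enforced.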

Adding a business layer over the technical view

Any traceability view will have most of its components coming in from the data management stack. Systems, profiling rules, tables, and columns will be taken in from their relevant systems or from a technical metadata layer. The true power of traceability (and of data governance in general) lies in the information business users can add on top of it.

As an example, envision a program manager in charge of a set of Customer 360 projects who wants to govern data assets from an agile, project point of view. By adding projects and their relations to data domains to the view, this user can see which data elements (technical) relate to which projects (business).
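As a rough sketch of that example, the business layer can be modelled as a small mapping from projects to data domains that is joined onto technical metadata. All project names, domain names, and column names below are invented for illustration, and `project_view` is a hypothetical helper rather than a feature of any particular product.

```python
# Technical metadata (hypothetical), as it might arrive from a metadata layer.
DOMAIN_COLUMNS = {
    "Customer": ["crm.customers.email", "crm.customers.country"],
    "Orders": ["erp.orders.order_id", "erp.orders.customer_id"],
    "Finance": ["erp.invoices.amount"],
}

# Business layer (hypothetical), added on top by the program manager.
PROJECT_DOMAINS = {
    "Customer 360: unified profile": ["Customer", "Orders"],
    "Customer 360: churn scoring": ["Customer"],
}


def project_view(project_domains, domain_columns):
    """Map each project to the technical data elements of its related domains."""
    return {
        project: [
            column
            for domain in domains
            for column in domain_columns.get(domain, [])
        ]
        for project, domains in project_domains.items()
    }


if __name__ == "__main__":
    for project, columns in project_view(PROJECT_DOMAINS, DOMAIN_COLUMNS).items():
        print(project, "->", columns)
```

Domains that no project references, such as the Finance domain in this toy data, simply do not appear in the project view.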

Summing it up: lineage vs. traceability

Good technical lineage is a necessity for any enterprise data management program. It does not, however, fulfill the needs of business users to trace and link their data assets through their non-technical world. The right solution will cherry-pick technical assets and allow different lines of business to add and link business terms, processes, policies, and any other data concept modelled by the organization. Enabling customizable views that combine business and technical information is critical to understanding data and using it effectively, and it is the next step toward establishing data as a trusted asset in the organization.
