In 1988, Sotheby’s sold a collection of 175 pottery cookie jars for $907,995. These cookie jars included images of pigs, mice, goats, sheep, Humpty Dumpty and a large panda. Estimated to sell for $75 each, the jars ended up selling for far more than their estimated worth. So why did this collection of kitschy cookie jars sell for so much money? Because they belonged to the legendary Pop artist Andy Warhol.
Provenance, the history of ownership and identity of an object, can directly affect an artwork’s value and authenticity. As seen in the cookie jar example, provenance can add tremendous monetary value to an artwork. Objects owned by a celebrity, a historical figure, a prestigious collector, a reputable gallery, or a top museum will garner more value at auction than works without these same credentials. Furthermore, authenticity, context and accuracy are crucial in the art world. Given the prevalence of art forgeries and the unlawful seizure of art by the Nazis during World War II, it is vital that a work of art has a legitimate and accurate provenance. For example, the Monuments Men relied on provenance in the 1940s to track down the rightful owners of stolen art. Today, gaps in provenance have led to numerous high-profile legal disputes over the rightful ownership of many valuable artworks.
All that being said, this blog post is not about art. Rather, I am using art as a lens to understand data lineage. Like provenance, which provides context and increases the value of an artwork, data lineage provides the necessary understanding that enables Data Citizens to create valuable business insights using their data. For Data Citizens to trust their data, they must know where their data comes from, where it’s been, how it’s being used, and who is using it. Data lineage provides a graph that documents and traces the interdependencies of the data in a data catalog. The lineage graph provides a roadmap of data consistency, accuracy and completeness, which enables business users to better understand and trust their data.
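To make the idea of a lineage graph concrete, here is a minimal Python sketch. It is not how Collibra stores or queries lineage; the asset names and the `trace_upstream` helper are invented for illustration. It only shows that once dependencies are recorded as a graph, answering "where does this data come from?" becomes a simple traversal.

```python
from collections import deque

# Hypothetical lineage edges: each key is a data asset and each value lists
# the assets it is built from. All names are invented for illustration.
UPSTREAM = {
    "sales_report": ["sales_summary"],
    "sales_summary": ["orders", "customers"],
    "orders": ["crm_export"],
    "customers": ["crm_export"],
    "crm_export": [],
}


def trace_upstream(asset, upstream=UPSTREAM):
    """Return every asset that feeds `asset`, nearest sources first."""
    seen = []
    queue = deque(upstream.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited through another path
        seen.append(current)
        queue.extend(upstream.get(current, []))
    return seen


print(trace_upstream("sales_report"))
```

Reversing the edges would answer the opposite question: which reports are affected if a source changes.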
Introducing Collibra Lineage
Data lineage makes data meaningful. It turns data into a valuable asset that drives innovation. In fact, end-to-end lineage is a necessary and crucial foundation for all data-driven initiatives. In July 2019 we acquired SQLdep, a leading SaaS provider of automated technical lineage. We are excited to announce Collibra Lineage, our native, automated lineage capability that is the integration of SQLdep into Collibra Catalog.
Collibra Lineage automatically maps relationships between data points to show how data moves from system to system and how data sets are built, aggregated, sourced and used — providing complete, end-to-end lineage visualization. Before automated technical lineage, IT spent countless hours manually mapping the relationships between data. This time-consuming task prevented IT from focusing on strategic initiatives. A single developer might have to dedicate a full year’s work simply to document existing data flows in a typical data warehouse with hundreds of thousands of columns — all while those data flows are evolving in real time.
Luckily, Collibra Lineage combats this problem by automatically extracting lineage information from source systems and creating data flow visualizations. Integrated with Collibra Catalog, Collibra Lineage enables business users to immediately home in on the data they care about and have full confidence in using that data to drive business decisions.
Automate lineage mapping efforts
Collibra Lineage solves the problem of manually mapping your data flows. With Collibra Lineage you save valuable time by automatically extracting technical lineage from various source systems, including SQL dialects, ETL tools and BI solutions, to create an interactive data lineage map and keep it up to date.
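As a rough illustration of what "extracting technical lineage" from SQL means, the Python sketch below pulls table-level lineage out of a single simple `INSERT ... SELECT` statement. This is a deliberately naive, regex-based toy and not the technique Collibra Lineage uses; real SQL needs a proper dialect-aware parser. The `table_lineage` function, the schemas and the table names are all assumptions made for the example.

```python
import re


def table_lineage(sql):
    """Return (target, sorted sources) for a simple INSERT ... SELECT statement.

    Toy approach: it ignores subqueries, CTEs, comments and quoted names.
    """
    target_match = re.search(r"\binsert\s+into\s+([\w.]+)", sql, re.IGNORECASE)
    if target_match is None:
        raise ValueError("expected an INSERT INTO ... SELECT statement")
    # Every table named after FROM or JOIN is treated as a source.
    sources = re.findall(r"\b(?:from|join)\s+([\w.]+)", sql, re.IGNORECASE)
    return target_match.group(1), sorted(set(sources))


# Hypothetical statement; schema and table names are invented.
SQL = """
INSERT INTO mart.sales_summary (customer_id, total)
SELECT o.customer_id, SUM(o.amount)
FROM staging.orders o
JOIN staging.customers c ON c.id = o.customer_id
GROUP BY o.customer_id
"""

target, sources = table_lineage(SQL)
print(target, "<-", sources)
```

Even this small case hints at why doing the same work by hand across a large warehouse is so time-consuming.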
With Collibra Lineage, we generate two lineage views — business-friendly summary lineage views and detailed technical lineage views. Those interested in digging into the technical lineage can click into the technical lineage tab in Collibra Catalog to see a full technical lineage diagram.
From there, they can drill down into the lowest level of granularity and view column-level lineage and transformation logic.
We also enhance the intelligence you can derive from your data by mapping indirect lineage. With Collibra Lineage, you can view indirect relationships that influence the movement of data but do not directly participate in the data movement itself, such as conditional statements and joins. In the screenshot below, you can see that the INSIGHT_ID column participates in a join condition rather than in the direct data movement. The relationship is therefore categorized as indirect lineage.
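The distinction between direct and indirect lineage can be sketched in a few lines of Python. This is an illustrative data model only, not Collibra's internal representation: the `ColumnEdge` class, the `sources_of` helper and every table and column name other than INSIGHT_ID are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnEdge:
    source: str  # upstream column
    target: str  # downstream column
    kind: str    # "direct": value flows into target; "indirect": only shapes the result


# Hypothetical edges for one statement that joins two tables on INSIGHT_ID
# and copies SCORE into a report table.
EDGES = [
    ColumnEdge("INSIGHTS.SCORE", "REPORT.SCORE", "direct"),
    ColumnEdge("INSIGHTS.INSIGHT_ID", "REPORT.SCORE", "indirect"),
    ColumnEdge("DETAILS.INSIGHT_ID", "REPORT.SCORE", "indirect"),
]


def sources_of(target, kind, edges=EDGES):
    """List the upstream columns linked to `target` by edges of the given kind."""
    return [e.source for e in edges if e.target == target and e.kind == kind]


print("direct:  ", sources_of("REPORT.SCORE", "direct"))
print("indirect:", sources_of("REPORT.SCORE", "indirect"))
```

Keeping the two kinds separate matters for impact analysis: a change to a join column can alter which rows reach a report even though its values never appear there.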
Furthermore, you can easily identify and drill down into the relevant table and column-level SQL code, both incoming and outgoing, within the technical lineage diagram. For example, you can right-click on a column and choose “SQL code (IN)” to see the code written to combine data into that column.
And with our filtering functionality, you can filter technical lineage diagrams to show exactly what you need by choosing the attributes required for your purpose. You simply click on “settings” in the preview pane on the right side of the diagram and filter accordingly. Below, the user has chosen the “group by schemas” option.
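Conceptually, a "group by schemas" view collapses individual columns and tables under the schema they belong to. The Python sketch below shows that idea under simple assumptions: names follow a `schema.table.column` pattern, and the `group_by_schema` helper and sample names are invented. It does not reflect how the Collibra diagram implements the filter.

```python
from collections import defaultdict


def group_by_schema(columns):
    """Group fully qualified 'schema.table.column' names into {schema: [tables]}."""
    groups = defaultdict(set)
    for name in columns:
        schema, table, _column = name.split(".")
        groups[schema].add(table)
    return {schema: sorted(tables) for schema, tables in sorted(groups.items())}


# Hypothetical columns appearing in a lineage diagram.
COLUMNS = [
    "staging.orders.amount",
    "staging.orders.customer_id",
    "staging.customers.id",
    "mart.sales_summary.total",
]

print(group_by_schema(COLUMNS))
```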
Once you have landed on the diagram you are looking for, you can easily export it in different file formats, such as PDF, PNG, and CSV, for seamless reporting and sharing.
Technical lineage gives IT crucial visibility into data pipelines. With Collibra Lineage, IT can quickly and seamlessly see these relationships, while also keeping the lineage diagrams up to date.
Better understand your data with a summarized lineage view
Collibra Lineage helps you understand the full context of your data by showing the flow of data as it moves from source to destination. Without lineage, business users cannot be sure that the data in their analysis comes from trusted sources and is accurate. This means they could be basing important business decisions on inaccurate or incomplete data.
Fortunately, Collibra Lineage provides lineage visualizations that illustrate the source of the data, how data sets are built and aggregated, the quality of data sets, and the transformations along the journey. In Collibra, a business user, Cliff, can click on the Diagram tab and instantly see a business-friendly visualization. This interactive diagram shows summary lineage that traces data flows from data source to report. And with this enriched context, Cliff can be sure he is using the right data sets for the right business purposes and have confidence that he is using accurate, complete, and trustworthy data to drive business decisions.
Lay the foundation for Data Intelligence with Collibra Lineage
Lineage is crucial for extracting value from your data and an important step in the journey to Data Intelligence. With an increasing amount of data entering our environment, it is important that we can trust our data. Collibra Lineage documents the data lifecycle to help business users and IT better understand processes and their dependencies. In other words, Collibra Lineage provides a roadmap to data consistency, completeness, timeliness and conformity at every point in the data journey.
While the lifecycle of data may not have the same blockbuster thrill as the Monuments Men using provenance to track down Nazi-looted art or the mega sale of Andy Warhol’s cookie jar collection, Data Citizens rely on data lineage to generate valuable business insights. Collibra Lineage helps business users trust and understand their data, thus making data meaningful.