Interactive Lineage Exploration: Discovering the Right Data
The biggest opportunity for many organizations today is the democratization of data. In the past, data has been hard to find, hard to understand, and hard to get. However, the potential which our data has to create value for the organization can only be realized if we get the entire company to make fact-based decisions. And this can only happen when the information about data – and access to data – is broadly consumable.
Finding and understanding data is a complex problem. It requires many different types of information about the data in order to determine whether it is the right data for a particular use. And many times, the type of information that will differentiate suitable from unsuitable data cannot be determined in advance. The best data governance programs ensure that this information is constantly being enriched and enhanced. But the challenge is that this metadata itself becomes a highly varied structure that can be complex and confusing for people who are not intimate with the data. The average person might know that they want a certain set of customers, and that they want the same ones that were used in another analysis. They will not know table and field names, or even report attributes, KPI calculations, etc. But if they can start with the customer table and navigate the usage relationships until they reach some output they are familiar with, they will have found what they are looking for. As we democratize the use of our data, the population of users who fit into that “not intimate” category increases. More and more, we need to simplify how to find and use data. Interactive exploration is a critical capability that is at the heart of finding the right data.
When users search for data, the trouble is always differentiating between the results. Much of the data organizations have consists of copies of other data, especially in an analytical sandbox where people are constantly creating different views and combinations of similar data. Search is a difficult tool to use to distinguish between these copies because it requires you to know the exact forms of the relationships before you ask the question. Some people are able to glean this information from tables and detail forms. But the majority of people are visual learners, and they need a mechanism that lets them find their information visually.
This is where data lineage can help. Most people think of lineage as “where the data came from.” But a more accurate description is “what relationships does the data have?” For most users without a complete knowledge of the data, the easiest means of understanding the data is to see it in the context of its relationships with other things. These “things” are not only its source systems and tables, but also the related data sets, reports, formulas, KPIs, issues, models, and more. Depending on the use of the data, any one or all of these things may help distinguish data that will create verifiable results from data that is untrustworthy.
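The idea of starting from a known asset and following its relationships to a familiar output can be sketched as a small graph traversal. This is only an illustration under stated assumptions: the asset names, the relationship labels, and the find_path helper are all hypothetical, and a real lineage tool would read these relationships from its metadata store rather than from a hard-coded dictionary.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to (relationship, related asset) pairs.
# The names and relationship labels are invented for illustration only.
LINEAGE = {
    "table:customer": [
        ("feeds", "dataset:active_customers"),
        ("feeds", "dataset:customer_sandbox_copy"),
    ],
    "dataset:active_customers": [
        ("used_by", "kpi:churn_rate"),
        ("used_by", "report:quarterly_retention"),
    ],
    "dataset:customer_sandbox_copy": [
        ("has_issue", "issue:stale_since_last_load"),
    ],
    "kpi:churn_rate": [
        ("shown_in", "report:quarterly_retention"),
    ],
}


def find_path(graph, start, target):
    """Breadth-first search: return the shortest chain of relationships
    from start to target as a list of assets and labeled hops, or None."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [f"--{relation}-->", neighbor]))
    return None


# A user who knows only the customer table and a report they trust
# can see which of the similar data sets actually connects the two.
path = find_path(LINEAGE, "table:customer", "report:quarterly_retention")
print(" ".join(path))
```

In this toy graph the path runs through the active-customers data set rather than the sandbox copy, which is the kind of distinction between near-duplicate data sets that a keyword search alone would not surface.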
Other techniques, such as intelligent catalogs, faceted search, and machine learning recommendations, can narrow down the scope in which data users have to search. But one of the main reasons for giving users access to all this data is to allow them to innovate, and these techniques cannot predict what will matter to each particular user. Visual exploration of the lineage and relationships is the easiest way for data consumers to get from “things that might be right” to “the right data for the job.”
Most of us explore things visually. And having a simple means of exploring the data and its relationships is a critical step to enabling the broad spectrum of users to build their own analyses. Without this capability, what they have are lists and searches. Using visual exploration allows a natural path for users to get insight into data, and leads data citizens quickly to actionable analysis.