Data Lineage Through the Eyes of a DBA
In data governance meetings and discussions about requirements today and in the past, there is a lot of talk about metadata scanning, importing technical metadata like table names and columns, and finding duplicates of columns and of data. But in these discussions, we often fail to realize the necessity of data lineage. Technical metadata alone is not enough to help a DBA understand and model the data in a way that allows efficient data management. Lineage of data from source to target systems along with transformations, as well as to business metadata like business term definitions and rules, is critical for DBAs and technical users.
It is important to have a catalog of all your data assets, and to know what is available and where data exists (which tables, databases, and systems). In Collibra, we can import this catalog of metadata from various sources like databases.
But DBAs also need to understand:
- What is the meaning and use of the tables and columns
- Who is the business owner
- What application owns it
- How it is linked and transformed from source to target
A DBA can be asked to create a new database with some fields and columns, or to transform existing tables and columns and databases. DBAs usually work by using huge spreadsheets with a catalog of all databases and systems. But rarely do they know the usage of each table, database, or column and how they would be interlinked together. So it is very difficult to identify if a database, table, or column of that content already exists. As a result, many times DBAs simply create copies of the same data, at the expense of consuming more resources, and creating data quality issues.
The fact that the DBA does not know what the table or column represents from a business perspective makes it impossible to make any data modeling decisions, which can greatly improve the performance as well as the quality of the data system.
So it’s critical that technical people see a semantic layer on top of their data for the following reasons:
- Proper data modeling
- Reducing redundancy for better disk and memory capacity
- Improving performance by better controlling redundancy and improving data model
- Controlling access which also leads to better performance
- Easier administration
A DBA does not like to deal with names that don’t mean anything like table ABC and column XYZ because this does not help a DBA manage the database systems efficiently. Also, when a user asks what are the columns and tables that contain the data he or she requires, a DBA should be in a position to give a correct answer. This can only happen, however, if the DBA knows the meaning of data.
This is why Collibra helps not only business people, but technical departments as well.
Here is an example of a DBA importing into Collibra the metadata information about tables and columns. This can be done automatically via Collibra Connect.
Figure 1: Example of importing technical metadata into Collibra – example of tables and columns
Now either the DBA or a data steward can be notified with a task to link these columns to appropriate business terms that provide meaning like in the following example:
Figure 2: Example of linking a table column for example MOBPH to a business term Customer Mobile Phone and to a Master Asset (for MDM)
This way a column becomes the authoritative source, or another type of data source, for a business term and the data lineage is starting to form and create a meaning around the data. In addition, Collibra automatically offers suggestions as what other assets are relevant to this data, so that a DBA can make a decision on what is the relationship faster and more efficiently.
Figure 3: Example of suggestions Collibra offers for lineage of an asset to other assets
Most important, Collibra offers a visual representation of data lineage from the data asset (column or table or database or system) to the business term, as well as to other assets like issues, systems, even further policies and business rules.
Figure 4: Lineage of table column to business term, to master data, even linking to issues, systems of use and MDM systems
So a DBA can easily identify which column represents a business asset that is required for a report, for an application, or for another business user request. In addition to this, Collibra offers the capability for a DBA to identify redundancies, like in this search for duplicate columns in all databases and systems:
Figure 5: Example of filtering for duplicate columns with all their attributes, tables, schemas, databases they belong to
By being able to check the connection of a column to other assets, it makes it easier for the DBA to identify the value of the column/table and to make the right decision about its existence or modification as needed.
In summary, data lineage of the business terms to the technical metadata helps DBAs and technical administrators perform their job in a better and more efficient way. It helps them to understand where the data is and what it means for better data quality management, reduction of redundancy, data modeling efficiencies, and reduction in the time and cost of searching for data when needed by the business. Collibra offers a unique, user-friendly and rich-in-features platform to implement this lineage in a complete way, allowing, via workflows and notification, the easy on-going maintenance and change management of the data assets.