What is a data catalog?


Data catalogs are becoming a must-have in the data landscape. But buyer beware! Not all data catalogs are created equal. If you’re in the market for a data catalog, be sure that you’re asking the right questions, and focusing on the must-have features to make your investment worthwhile. To make it easy for you, here’s a list of things your data catalog needs.

1. Automatic metadata creation

The best data catalogs don’t just scan existing metadata – they actually create their own metadata, using profiling and sampling capabilities embedded in the product. Why is this important? It’s simple: by creating metadata, the data catalog makes it easy for all data users (we call them data citizens) to find and understand the data. Data discovery gets easier because the catalog provides statistics on the rows and columns of each data set, and gives those rows and columns meaning by linking them with business-friendly terminology.

For example, a marketing user may find a new data source in the data catalog that includes information she needs for a campaign: customer name, title, address, etc. But when the marketer previews the data set, she sees that 50% of the rows have no values for the fields she needs. Clearly that is not a good data set to use for her campaign, even though it looked promising at the start. This kind of insight comes from the metadata created by the catalog.
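
To make the marketer's example concrete, here is a minimal sketch of the kind of column statistics a profiler might compute. It is an illustration only, not how Collibra Catalog or any other product does it; the function name `profile_rows`, the field names, and the sample records are all made up for this example.

```python
def is_missing(value):
    # Treat None and blank strings as "no value" (an assumption of this sketch).
    return value is None or str(value).strip() == ""


def profile_rows(rows, columns):
    """Return simple per-column statistics for a list of dict records."""
    total = len(rows)
    profile = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if not is_missing(v)]
        missing = total - len(present)
        profile[col] = {
            "row_count": total,
            "missing_count": missing,
            "missing_ratio": missing / total if total else 0.0,
            "distinct_count": len(set(present)),
        }
    return profile


# Hypothetical sample of a customer data set.
sample = [
    {"customer_name": "Ada Moreno", "title": "CTO", "address": "1 Main St"},
    {"customer_name": "Ben Okoye", "title": "", "address": None},
    {"customer_name": "Cy Tan", "title": "VP Sales", "address": ""},
    {"customer_name": "Di Ruiz", "title": None, "address": "9 Elm Rd"},
]

stats = profile_rows(sample, ["customer_name", "title", "address"])
for column, column_stats in stats.items():
    print(column, column_stats)
```

In this sample, half of the `title` and `address` values are missing, which is exactly the kind of signal that would tell the marketer to look for a better source before she builds her campaign.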

Using data profiling, Collibra Catalog automatically creates metadata that is linked to the business glossary in our data governance platform. Here’s what it looks like:

5 Things Your Data Catalog Needs (But Doesn’t Have)

2. Business-friendly language

If you want to empower your data citizens to easily find and understand the data, then you need to present the data in language they recognize and understand. A good data catalog will automatically link new data sources with the company’s existing business glossary to give meaning to the data using the vocabulary already established by the company. And by linking the data source to the business glossary, the data citizen knows exactly what the data means, and can easily judge whether it’s the right data for the project at hand.
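
One simple way to picture this linking is a lookup from technical column names to glossary terms. The sketch below is a rough illustration under stated assumptions: the `GLOSSARY` contents, the synonym lists, and the helper names `normalize` and `link_columns` are hypothetical, and a real catalog would likely use richer matching than exact synonyms.

```python
import re

# Hypothetical business glossary: term -> known synonyms for that term.
GLOSSARY = {
    "Customer Name": {"customer name", "cust nm", "client name"},
    "Job Title": {"title", "job title"},
    "Mailing Address": {"address", "addr", "mailing address"},
}


def normalize(name):
    """Turn names like CUST_NM or jobTitle into lowercase, space-separated words."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)  # split camelCase
    return re.sub(r"[^a-z0-9]+", " ", spaced.lower()).strip()


def link_columns(columns, glossary=GLOSSARY):
    """Map each column to a glossary term, or None when nothing matches."""
    lookup = {syn: term for term, syns in glossary.items() for syn in syns}
    return {col: lookup.get(normalize(col)) for col in columns}


links = link_columns(["CUST_NM", "jobTitle", "addr", "loyalty_tier"])
for column, term in links.items():
    print(f"{column} -> {term}")
```

Columns with no match (here, `loyalty_tier`) come back as `None`, which is a natural place to ask a data steward to add or confirm a term.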

Through its inherent integration with the Collibra Data Governance platform, Collibra Catalog provides this link to the business glossary automatically.

3. Freeform tagging

Tagging is a common feature in data catalogs. But not all tagging is created equal. Make sure that the tagging feature fosters collaboration by allowing users to tag data sets in a way that makes sense for their part of the organization. If we think back to our earlier marketing example, the marketing team could use tagging to flag the data sets that work best for a specific campaign they are running. These tags allow them to easily find the right data set – without disrupting the link between the data set and the business glossary. Collibra Catalog supports this kind of freeform tagging.
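
As a rough sketch of why tags need not disturb glossary links, the example below keeps team-scoped tags in their own index, separate from the data set's glossary mapping. Everything here is hypothetical: the `TagIndex` class, the team and tag names, and the `glossary_links` dictionary are invented for illustration and do not describe any product's data model.

```python
from collections import defaultdict


class TagIndex:
    """Freeform, team-scoped tags stored apart from glossary links."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, dataset, tag, team):
        # Tags are case-insensitive so "Spring-Campaign" and "spring-campaign" match.
        self._by_tag[(team, tag.lower())].add(dataset)

    def find(self, tag, team):
        return sorted(self._by_tag.get((team, tag.lower()), set()))


# Glossary links live in a separate structure and are never modified by tagging.
glossary_links = {"crm_contacts": "Customer", "web_leads": "Prospect"}

tags = TagIndex()
tags.tag("crm_contacts", "Spring-Campaign", team="marketing")
tags.tag("web_leads", "spring-campaign", team="marketing")

found = tags.find("spring-campaign", team="marketing")
print(found)
```

Because the tag index and the glossary mapping are independent, the marketing team can tag freely while every data set keeps its governed business meaning.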

4. Sync & notify

Another feature to put on your “must-have” list is sync and notify. Look for data catalogs with the ability to resync the data sources on a regular schedule, and alert the people using those data sets that a more up-to-date version of the data set is now available. This feature helps to ensure that people who are using an offline copy of the data are always aware of updates and can refresh their own copy accordingly.
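
The idea can be sketched in a few lines: on each scheduled resync, compare a fingerprint of the source with the one seen last time, and alert subscribers only when it changes. This is a simplified illustration, not a description of any vendor's implementation; `SyncRegistry`, `fingerprint`, the email address, and the data set name are assumptions made for the example, and the scheduler that would call `sync` is left out.

```python
import hashlib


def fingerprint(rows):
    """Hash the records so two syncs can be compared cheaply."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(sorted(row.items())).encode("utf-8"))
    return digest.hexdigest()


class SyncRegistry:
    def __init__(self, notify):
        self._notify = notify        # callback: notify(user, dataset)
        self._fingerprints = {}      # dataset -> last seen fingerprint
        self._subscribers = {}       # dataset -> set of users

    def subscribe(self, dataset, user):
        self._subscribers.setdefault(dataset, set()).add(user)

    def sync(self, dataset, rows):
        """Record the latest state; notify subscribers if the data changed."""
        new = fingerprint(rows)
        old = self._fingerprints.get(dataset)
        self._fingerprints[dataset] = new
        if old is None or old == new:
            return False             # first load or nothing changed
        for user in sorted(self._subscribers.get(dataset, ())):
            self._notify(user, dataset)
        return True


sent = []
registry = SyncRegistry(notify=lambda user, dataset: sent.append((user, dataset)))
registry.subscribe("crm_contacts", "marketer@example.com")

registry.sync("crm_contacts", [{"id": 1, "name": "Ada"}])  # initial load
changed = registry.sync(
    "crm_contacts", [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Ben"}]
)
print(changed, sent)
```

Anyone working from an offline copy of `crm_contacts` would receive the alert and know it is time to refresh.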

5. Integration with your data governance platform

This should be the #1 feature on your data catalog wishlist. A governed data catalog gives data citizens confidence in the data because they can understand its meaning through links with the established business glossary, data lineage, and more. And they can trust that the data is right, because it has been certified following the appropriate processes and controls established in the data governance platform. In Collibra Catalog, users can easily tell which data sets are certified – and which are not. It is worth noting, though, that this certification only has meaning when the certification process is implemented on the data governance platform.


As you evaluate data catalogs, be sure to ask each vendor the following questions:

  1. How do you incorporate data profiling and sampling to create metadata?
  2. How do you link catalog data sources to business terms already established by the organization?
  3. How do you approach collaboration – and what capabilities do you provide to business users to enable them to work together better across the organization?
  4. What is your process for syncing and alerting users to changes in data sets?
  5. How do you incorporate data governance principles and processes from your data governance platform into the data catalog?

Data catalogs are critical to helping your data citizens find, understand, and trust their data. But it’s essential that your data catalog covers these key features. Collibra Catalog checks all the boxes – and more. Ready to learn more?