The enterprise data catalog: lessons learned from cutting the cord

Over the past several years, many of us have “cut the cord,” moving away from cable to consume digital content from providers such as Netflix, Hulu and Disney. With the increasing variety and volume of on-demand content, viewers can now choose what to watch, when to watch it, and which device to use. While this ability to self-serve content is great, it also poses some difficulties. For example, it can be difficult to know which provider has your show, whether you have a subscription to that provider, whether the show is included in the subscription or must be paid for separately, whether you have to log in through the provider or can log in directly with the content creator, and so on.

Why organizations need an enterprise data catalog 

Many organizations face a similar problem as they connect to more data sources to extract the insights and intelligence they need to drive their business. Just as the average viewer needs to find shows quickly and easily, business users need to quickly find the data they need, have confidence that the data can be trusted, and be given the right access based on their role. When the data resides across multiple cloud platforms, hundreds of applications, and a multitude of databases, individually searching each of these repositories simply cannot scale.

APIs and greater extensibility are streamlining connectivity and integration across a wider range of data sources. This creates a larger digital ecosystem of data, but many enterprises are still playing catch-up in implementing a modern data intelligence infrastructure that includes an enterprise data catalog.

With an enterprise data catalog, organizations can provide a unified view into all their expanding data sources, whether on-premises or in the cloud. This catalog must support the entire organization, not just a single data source or department. It must also support federated policies and a variety of operating models, organizational structures, and diverse data sets.

Technical catalogs vs. enterprise data catalogs 

Many technical catalogs are in use throughout an organization, but they are often meant for technical users. Instead of requiring users to jump from data source to data source or run ad hoc searches for data, a centralized data catalog provides a single source of truth that lets users easily search and filter the entire data ecosystem, including the various cloud platform catalogs, technical catalogs, and other catalogs.

Also, most enterprises today rely on a multi-cloud infrastructure, but each cloud platform has its own technical catalog that covers only the data within its own cloud. Wouldn’t it be nice to go to one place to find your data, whether it’s in AWS, GCP, or Snowflake? The same goes for your databases, BI tools, ERP/CRM, and all your other applications.
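To make the "one place to find your data" idea concrete, here is a minimal Python sketch of a single search entry point over several per-platform catalogs. The asset names, descriptions, and the `search_enterprise_catalog` function are all hypothetical and chosen for illustration; a real catalog would populate its index through each platform's own connectors rather than a hard-coded dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    source: str        # platform that owns the asset, e.g. "AWS"
    description: str


# Hypothetical per-platform technical catalogs (illustrative data only).
TECHNICAL_CATALOGS = {
    "AWS": [Asset("orders_raw", "AWS", "Raw customer orders")],
    "GCP": [Asset("web_sessions", "GCP", "Clickstream sessions")],
    "Snowflake": [Asset("orders_curated", "Snowflake", "Cleaned customer orders")],
}


def search_enterprise_catalog(term: str) -> list:
    """Search every registered technical catalog from one entry point."""
    needle = term.lower()
    hits = []
    for assets in TECHNICAL_CATALOGS.values():
        for asset in assets:
            # Case-insensitive match on the asset name or its description.
            if needle in asset.name.lower() or needle in asset.description.lower():
                hits.append(asset)
    # Sort by platform, then name, so results are stable across runs.
    return sorted(hits, key=lambda a: (a.source, a.name))


results = search_enterprise_catalog("orders")
for asset in results:
    print(f"{asset.source}: {asset.name}")
```

The user issues one query and gets matching assets from every platform, instead of repeating the search in each cloud's own catalog.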

Without the right type of catalog, users spend most of their time searching for data instead of focusing on analyzing the data and extracting the insights they need to drive business decisions. According to Forrester, 76% of analysts’ time is spent simply discovering and preparing data.

Once a user finds the data they think they need, they still have to ask: Where did this data come from? Is it relevant? Can I trust it? Without the visibility provided by an enterprise data catalog, they’re unable to trace the lineage of the data, gauge its quality, or understand its business context.

Key features of an enterprise data catalog 

ML-powered automation 

The ability to connect to a wide range of data sources is great, but it also comes with a tidal wave of metadata and the need to create a massive inventory of business-critical assets. A manual, human-driven approach to organizing data simply cannot scale to today’s data volumes.

A data catalog infused with ML-powered automation is critical for sorting and organizing data assets at scale. With this automation, organizations don’t have to pick and choose what metadata to catalog because of limited resources. They can use ML to suggest appropriate data classifications, speeding up the process of populating the catalog with data that carries full business context.
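The suggestion step can be illustrated with a small Python sketch. Note that this is not an ML model: it uses simple name-pattern rules as a stand-in for whatever trained classifier a real catalog would use, and the rule list, labels, and function names are hypothetical. The point is only the workflow, in which the system proposes a classification and a person confirms it.

```python
import re
from typing import Optional

# Hypothetical rules mapping column-name patterns to a classification label.
# A production catalog would learn these from labelled metadata instead.
RULES = [
    (re.compile(r"e?mail", re.IGNORECASE), "Email Address"),
    (re.compile(r"phone|mobile", re.IGNORECASE), "Phone Number"),
    (re.compile(r"ssn|social_security", re.IGNORECASE), "National ID"),
]


def suggest_classification(column_name: str) -> Optional[str]:
    """Return the first matching label, or None if no rule applies."""
    for pattern, label in RULES:
        if pattern.search(column_name):
            return label
    return None


def suggest_classifications(columns):
    """Build a column -> suggested label map for a steward to review."""
    return {col: suggest_classification(col) for col in columns}


suggestions = suggest_classifications(["customer_email", "mobile_no", "order_total"])
for column, label in suggestions.items():
    print(f"{column}: {label or 'no suggestion'}")
```

Columns with no suggestion are left for manual review, so automation reduces the workload without silently mislabeling unfamiliar data.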

Data lineage 

It’s hard enough to get your arms around today’s data volumes, but it becomes even harder when the data isn’t static. Data typically goes through many changes along its journey from the originating data source to its destination in the hands of the business. If users don’t know where the data came from and how it has changed, they’re probably not going to be confident using it for their analysis and reporting. This is where “farm to table” data lineage is a must-have: it provides complete visibility into the origination and transformation of the data, as well as the confidence needed to stake business decisions on it. The Forrester Data Intelligence Report revealed that 71% of business analysts say that data lineage makes it easier to see where data has come from and how it has been changed.
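One common way to model lineage is as a directed graph in which each asset points to the assets it was derived from. The Python sketch below assumes that representation. The asset names and the `trace_lineage` helper are hypothetical, and real lineage tools also record the transformation applied at each step, which is omitted here.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets it was derived from.
UPSTREAM = {
    "revenue_dashboard": ["sales_summary"],
    "sales_summary": ["orders_curated", "fx_rates"],
    "orders_curated": ["orders_raw"],
}


def trace_lineage(asset: str) -> list:
    """Return every upstream ancestor of `asset`, nearest first (breadth-first)."""
    seen = []
    queue = deque(UPSTREAM.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(UPSTREAM.get(current, []))
    return seen


lineage = trace_lineage("revenue_dashboard")
print(" <- ".join(["revenue_dashboard"] + lineage))
```

Walking the graph from a report back to its raw sources lets a user answer "where did this number come from?" without asking the team that built the pipeline.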

Governance and privacy

Connecting the entire business to trusted data across the organization is foundational to an enterprise data catalog, but with this newfound visibility comes the need to apply proper governance and privacy controls. Building a comprehensive business glossary, assigning stewardship, identifying sensitive data, and enforcing governance and privacy policies are key to ensuring that only the right users have access to the right data.
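As a rough illustration of how stewardship, sensitivity tags, and role-based policies fit together, consider the following Python sketch. The roles, tags, policy table, and `can_access` function are assumptions made for the example and do not reflect any particular product's policy model.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    name: str
    steward: str                              # person accountable for the asset
    tags: set = field(default_factory=set)    # sensitivity tags, e.g. {"pii"}


# Hypothetical policy: which roles may read assets carrying a given tag.
POLICY = {
    "pii": {"privacy_officer", "data_steward"},
    "financial": {"finance_analyst", "data_steward"},
}


def can_access(role: str, asset: DataAsset) -> bool:
    """Allow access only if the role is permitted for every tag on the asset.

    Untagged assets are open to all roles; tags missing from POLICY are
    denied by default.
    """
    return all(role in POLICY.get(tag, set()) for tag in asset.tags)


customers = DataAsset("customers", steward="alice", tags={"pii"})
products = DataAsset("product_list", steward="bob")

print(can_access("finance_analyst", customers))
print(can_access("privacy_officer", customers))
print(can_access("finance_analyst", products))
```

Denying unknown tags by default is a deliberate choice in this sketch: a newly introduced sensitivity label blocks access until someone writes a policy for it.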

Data shopping experience 

So far, we’ve highlighted many of the key elements of a comprehensive enterprise data catalog. To bring all this together, a business-friendly data shopping experience is needed so that consumers can discover and request access to data across their organization from a single interface. Through this UI, users can search all data across the enterprise, quickly find the data sets they need, trust that the data is up to date, and request access to it.

***

If all the digital content providers could come together, I imagine they could learn some lessons from this enterprise data catalog approach. In the meantime, there’s no reason why your organization should follow in the footsteps of these providers. Instead, you should implement a holistic data intelligence strategy built around an enterprise data catalog that connects all your data and users, so that they can extract the insights they need to move the business forward.
