Enabling enterprise-scale data quality: Collibra + BigQuery


Businesses are adopting a cloud-first approach at unprecedented speed. They are keen to leverage the scalability and agility of platforms like BigQuery for real-time analytics that power their business decisions. Gartner Research recognizes this trend, noting that cloud-native platforms enable you to respond to rapid digital change in 2022 and beyond.

Data quality while moving to the cloud

Although the cloud provides unified and compliant data access, that access must be complemented by high data quality to deliver trusted analytics.

Quality of data is its fitness for use, based on the dimensions that matter to the organization. The dimensions often in focus are completeness, accuracy, and consistency. Data accessibility, timeliness, and relevance also play a significant role in delivering trusted analytics.
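As a rough illustration of how two of these dimensions can be turned into numbers, here is a minimal Python sketch. The `completeness` and `consistency` helpers, the `orders` sample, and the allowed country codes are all hypothetical and chosen only for illustration; they are not part of any Collibra or BigQuery API.

```python
from typing import Any


def completeness(records: list[dict[str, Any]], field: str) -> float:
    """Share of records where `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def consistency(records: list[dict[str, Any]], field: str, allowed: set[str]) -> float:
    """Share of non-empty values of `field` that belong to the agreed value set."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 1.0
    return sum(1 for v in values if v in allowed) / len(values)


# Hypothetical sample data: two emails are missing, one country code is off-standard.
orders = [
    {"id": 1, "country": "US", "email": "a@example.com"},
    {"id": 2, "country": "usa", "email": ""},
    {"id": 3, "country": "DE", "email": None},
    {"id": 4, "country": "FR", "email": "d@example.com"},
]

email_completeness = completeness(orders, "email")
country_consistency = consistency(orders, "country", {"US", "DE", "FR"})
print(email_completeness, country_consistency)
```

Which dimensions to score, and what counts as an acceptable value, depends on what matters to your organization.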

Migration plans typically aim to ensure that data quality is not affected during the move. But differing definitions and measurement metrics across silos can still introduce inconsistencies in the data. In place of the traditional lift-and-shift approach, the Collibra three-step approach leverages data catalog, data governance, and data quality:

  1. Data source registration to get a complete understanding of the enterprise data
  2. Governed ingestion and transformation with ML-powered, autogenerated, adaptive data quality rules that flag issues at the source
  3. Governed catalog for the data lake, with lineage, usage policies, and continuous data quality, for timely and relevant data access

The conventional approach to cloud data quality  

A recent Harvard Business Review study reported that 47% of recently created data records have at least one critical error, which does not bode well for driving trusted analytics. The key is to continuously monitor incoming data in the data lake and fix these errors quickly. For most organizations that does not happen, and up to 25% of revenue is lost due to bad data.

The three major issues organizations face with the conventional approach to data quality are:

  • Manual rule writing and management: Manual rule writing is reactive and cannot keep pace with the speed and growing volume of data in the cloud. Writing thousands of rules and managing them constantly is practically impossible.
  • Limited connectivity and scalability: Most conventional solutions cannot run data quality rules on files and streaming data. Typical coverage of only 30% cannot guarantee trusted data. The scale of data in the cloud further limits the use of conventional solutions.
  • Ad hoc and manual data quality management: IT and technical users usually work on data quality issues. Because they are rarely the data producers or consumers, they have difficulty assessing the impact of the issues. The ad hoc approach to data quality rarely produces the desired results in the expected timeframe.
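To make the first issue concrete, the sketch below contrasts a hand-written rule with a rule that derives its threshold from history. It is an illustrative assumption only: the function names, the row-count data, and the z-score technique are hypothetical stand-ins, not a description of how any particular product generates rules.

```python
from statistics import mean, stdev


def static_rule(row_count: int, minimum: int = 500) -> bool:
    """Hand-written rule: pass if the load has at least `minimum` rows."""
    return row_count >= minimum


def adaptive_rule(history: list[float], row_count: float, z_threshold: float = 3.0) -> bool:
    """Pass if today's count is within `z_threshold` standard deviations of history."""
    if len(history) < 2:
        return True  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return row_count == mu
    return abs(row_count - mu) / sigma <= z_threshold


# Hypothetical daily row counts for one table over the past week.
daily_row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]

# A drop to 600 rows still satisfies the fixed minimum of 500,
# but it is far outside the range seen in recent history.
print(static_rule(600), adaptive_rule(daily_row_counts, 600))
```

The fixed threshold has to be revisited by hand whenever volumes change, which is what makes thousands of such rules hard to manage.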

Remember that with a data lake, as you acquire new data, you also acquire new errors that impact your work. The conventional approach thus runs a huge risk of regulatory non-compliance. It can also waste thousands of hours and millions of dollars.

Collibra enables enterprise-scale, continuous data quality in the cloud   

With its robust predictive data quality solution, Collibra addresses the issues at the root, enabling enterprise-scale, continuous data quality for BigQuery:

  • Native support for BigQuery helps you observe data and auto-discover data quality rules
  • ML-powered adaptive rules reduce error-prone manual rule writing by alerting the owner if quality drops
  • Unified, scalable solution across diverse data sources, with easy reconciliation between your source data storage and target data lake
  • Customizable options of quality dimensions
  • High-quality, cost-efficient scheduled and ad hoc data pipelines  
  • Self-service data quality empowers everyone in your organization to contribute to data quality, eliminating IT dependency and improving response time
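Reconciliation between a source store and a target data lake can be pictured as a key-by-key comparison. The following is a minimal sketch under stated assumptions: the `fingerprint` and `reconcile` helpers and the sample rows are hypothetical, rows are assumed to fit in memory, and this is not Collibra's implementation.

```python
import hashlib


def fingerprint(row: dict) -> str:
    """Stable hash of a row, independent of column order."""
    canonical = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source: list[dict], target: list[dict], key: str) -> dict[str, list]:
    """Compare source and target rows by `key` and report the differences."""
    src = {r[key]: fingerprint(r) for r in source}
    tgt = {r[key]: fingerprint(r) for r in target}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Hypothetical rows: id 2 changed in flight, id 3 was dropped, id 4 appeared.
source_rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 30},
]
target_rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 25},
    {"id": 4, "amount": 40},
]

report = reconcile(source_rows, target_rows, key="id")
print(report)
```

At warehouse scale the same comparison would normally be pushed down to the database as aggregate queries rather than run row by row in application code.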

Collibra supports data observability by proactively maintaining a healthy data system. By adopting Collibra data governance, data lineage, and data quality, you can monitor critical data sources and data elements. With a uniform interpretation of data through a common business glossary, you can ensure shared understanding and consistent usage. Intuitive workflows help you handle data quality issues efficiently and keep end-to-end data pipelines clean. Ultimately, enterprise-scale continuous data quality with Collibra and BigQuery delivers high-quality data pipelines for trusted analytics in the cloud, enabling more informed business decisions.
