Enabling enterprise-scale data quality: Collibra + BigQuery

Businesses are adopting a cloud-first approach at unprecedented speed. They are keen to leverage the scalability and agility of platforms like BigQuery for the real-time analytics that power their business decisions. Gartner Research recognizes this trend of cloud-native platforms enabling you to respond to rapid digital change in 2022 and beyond.

Data quality while moving to the cloud

Although the cloud provides unified and compliant data access, that access must be paired with high data quality to deliver trusted analytics.

Quality of data is its fitness for use, based on the dimensions that matter to the organization. The dimensions often in focus are completeness, accuracy, and consistency. Data accessibility, timeliness, and relevance also play a significant role in delivering trusted analytics.
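
To make two of these dimensions concrete, here is a minimal Python sketch that scores completeness and consistency on a handful of records. The sample records, field names, and the set of allowed country codes are hypothetical and chosen only for illustration; this is not how Collibra or BigQuery computes these measures.

```python
# Hypothetical sample records; None marks a missing value.
records = [
    {"id": 1, "email": "a@example.com", "country": "US"},
    {"id": 2, "email": None, "country": "US"},
    {"id": 3, "email": "c@example.com", "country": "usa"},
    {"id": 4, "email": "d@example.com", "country": "BE"},
]


def completeness(rows, field):
    """Share of rows where `field` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) is not None)
    return filled / len(rows)


def consistency(rows, field, allowed):
    """Share of non-missing values of `field` that belong to `allowed`."""
    values = [row[field] for row in rows if row.get(field) is not None]
    if not values:
        return 0.0
    return sum(1 for value in values if value in allowed) / len(values)


# One of four emails is missing; one of four country codes is off-standard.
email_completeness = completeness(records, "email")
country_consistency = consistency(records, "country", {"US", "BE"})
print(f"email completeness: {email_completeness:.2f}")
print(f"country consistency: {country_consistency:.2f}")
```

Which dimensions to score, and what counts as an acceptable value, depends on what matters to the organization, which is why the allowed set is passed in rather than fixed.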

Migration plans typically aim to ensure that data quality is not affected during the move. But differing definitions and measurement metrics across silos can introduce inconsistencies in data. In place of the traditional lift-and-shift approach, the Collibra three-step approach leverages data catalog, data governance, and data quality:

  1. Data source registration to get a complete understanding of the enterprise data
  2. Governed ingestion and transformation with ML-powered, autogenerated, adaptive data quality rules that alert on issues at the source
  3. Governed catalog for the data lake, with lineage, usage policies, and continuous data quality, for timely and relevant data access

You can find more details on this cloud migration approach in the whitepaper from Collibra and Google Cloud, Three key steps for successfully moving to the cloud.

The conventional approach to cloud data quality

A recent Harvard Business Review study reported that 47% of recently created data records have at least one critical error, which does not bode well for driving trusted analytics. The key is to continuously monitor the data coming into a data lake and fix these errors quickly. But most organizations do not do this, and up to 25% of revenue is lost due to bad data.

The three major issues organizations face with the conventional approach to data quality are:

  • Manual rule writing and management: Manual rule writing is reactive and too slow for the speed and growing volume of data in the cloud. Writing thousands of rules and managing them constantly is practically impossible.
  • Limited connectivity and scalability: Most conventional solutions cannot run data quality rules on files and streaming data. The typical coverage of only 30% cannot guarantee trusted data. The scale of data in the cloud further limits the use of conventional solutions.
  • Ad hoc and manual data quality management: IT and technical users usually work on data quality issues. Since they are not typically data producers or consumers, they have difficulty assessing the impact of the issues. The ad hoc approach to data quality rarely produces the desired results in the expected timeframe.

It is important to remember that with a data lake, as you acquire new data, you also acquire new errors that impact your work. The conventional approach thus runs a huge risk of regulatory non-compliance. It can also waste thousands of hours and millions of dollars.

Collibra enables enterprise-scale, continuous data quality in the cloud

With its robust predictive data quality solution, Collibra addresses the issues at the root, enabling enterprise-scale, continuous data quality for BigQuery:

  • Native support for BigQuery helps you observe data and auto-discover data quality rules
  • ML-powered adaptive rules reduce error-prone manual rule writing and alert the owner if quality drops
  • Unified, scalable solution across diverse data sources, with easy reconciliation between your source data storage and target data lake
  • Customizable options of quality dimensions
  • High-quality, cost-efficient scheduled and ad hoc data pipelines
  • Self-service data quality empowers everyone in your organization to contribute to data quality, eliminating IT dependency and improving response time
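
As a rough illustration of the idea behind an adaptive rule, the Python sketch below derives an acceptable range for a quality metric from its own history instead of from a hand-written threshold. It uses a simple mean plus or minus three standard deviations as a stand-in; this is an assumption made for illustration, not Collibra's actual ML method, and the function names and sample scores are hypothetical.

```python
from statistics import mean, stdev


def adaptive_bounds(history, k=3.0):
    """Derive lower and upper bounds from past values: mean +/- k standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two past observations")
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread


def check_metric(history, current, k=3.0):
    """Return True if `current` falls inside the bounds learned from `history`."""
    low, high = adaptive_bounds(history, k)
    return low <= current <= high


# Hypothetical daily completeness scores for one column.
past_scores = [0.98, 0.97, 0.99, 0.98, 0.97, 0.98]

for today in (0.975, 0.80):
    status = "ok" if check_metric(past_scores, today) else "ALERT: notify data owner"
    print(f"completeness {today:.3f}: {status}")
```

Because the bounds are recomputed from recent history, the check follows gradual changes in the data without anyone editing a rule, while a sudden drop such as 0.80 still falls outside the range and triggers the alert.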

Collibra supports data observability by proactively maintaining a healthy data system. By adopting Collibra data governance, data lineage, and data quality, you can monitor critical data sources and data elements. With a uniform interpretation of data through a common business glossary, you can ensure shared understanding and consistent usage. Intuitive workflows help you handle data quality issues efficiently and keep end-to-end data pipelines clean. Ultimately, enterprise-scale continuous data quality with Collibra and BigQuery delivers high-quality data pipelines for trusted analytics in the cloud, enabling more informed business decisions.
