Get a unified view of trusted data with Databricks and Collibra


Databricks is an industry-leading data analytics platform that delivers a true lakehouse data architecture. Enterprises use Databricks to build and deploy data engineering workflows, LLM and machine learning models, and analytics dashboards.

Collibra’s latest metadata integration with Databricks Unity Catalog provides deeper visibility across Databricks and the rest of your data ecosystem, enabling your team to better discover, understand, and protect data across more sources. Acting as an enterprise catalog, or “catalog of catalogs,” Collibra provides a complete picture of data assets across the organization, whether on-prem, in a public cloud or in a private cloud.

Improve enterprise decision-making

It’s 2023, and every enterprise needs to get more out of its data: to monetize it and to use it to make better decisions. There’s no other option if the goal is to remain competitive.

But you can’t make better decisions if you can’t easily discover and trust your data.

Our newest metadata integration with Databricks Unity Catalog addresses that need: it delivers a holistic view of your data so your teams can make more informed decisions. The integration ingests metadata for asset types including databases, schemas, tables and columns, so your organization can catalog assets from Databricks and govern them. All your data citizens can use Collibra’s intuitive data shopping experience to find, understand, and trust the data they need, regardless of where it resides. The same discovery process lets users select and prioritize high-value datasets for migration to the Databricks Lakehouse to take advantage of Databricks capabilities.
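To make the asset hierarchy concrete, here is a minimal Python sketch of how database, schema, table and column metadata might be modeled and flattened into catalog entries. It is an illustration only, not Collibra's or Databricks' API: the class names, the `flatten_assets` helper and the sample `sales` metadata are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Column:
    name: str
    data_type: str


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)


@dataclass
class Schema:
    name: str
    tables: List[Table] = field(default_factory=list)


@dataclass
class Database:
    name: str
    schemas: List[Schema] = field(default_factory=list)


def flatten_assets(database: Database) -> List[Tuple[str, str]]:
    """Return (asset_type, fully_qualified_name) pairs, parents before children."""
    assets = [("database", database.name)]
    for schema in database.schemas:
        schema_fqn = f"{database.name}.{schema.name}"
        assets.append(("schema", schema_fqn))
        for table in schema.tables:
            table_fqn = f"{schema_fqn}.{table.name}"
            assets.append(("table", table_fqn))
            for column in table.columns:
                assets.append(("column", f"{table_fqn}.{column.name}"))
    return assets


# Hypothetical sample metadata, for illustration only.
sales = Database(
    "sales",
    [Schema("retail", [Table("orders", [Column("order_id", "bigint"),
                                        Column("amount", "decimal(10,2)")])])],
)

for asset_type, name in flatten_assets(sales):
    print(asset_type, name)
```

Listing parents before children means each entry can be registered after the asset that contains it, which is one simple way to keep a catalog's hierarchy intact.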

Collibra + Databricks: Better together

Databricks and Collibra continue to collaborate to deliver expanding integrations between the Databricks Lakehouse Platform and the Collibra Data Intelligence Cloud. In addition to the metadata integration, Collibra now offers Data Quality Pushdown (beta) capabilities for Databricks that deliver high-quality data to users by securely processing data directly in Databricks for faster results and reduced costs.

With Collibra Data Quality, you can prevent and track data issues using a solution that automatically discovers rules and detects anomalies, so you can find and fix bad data before it impacts your business. You can also scan billions of records in seconds without a dependency on Spark, so you can process, access and use your data faster.

This solution allows you to lower costs by securely leveraging native DQ processing for data within Databricks, avoiding fees for data egress.
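The general idea behind pushdown can be sketched in a few lines of Python: the quality check runs as a query inside the data platform, and only the result leaves it. This is a simplified illustration, not Collibra's implementation. The standard-library `sqlite3` module stands in for the data platform, and the `orders` table, the `amount` column and the `null_rate_pushdown` helper are hypothetical.

```python
import sqlite3

# sqlite3 stands in for the data platform; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 19.99), (2, None), (3, 5.00), (4, None)],
)


def null_rate_pushdown(conn, table, column):
    """Compute the share of NULL values inside the engine.

    Only a single aggregate row is returned to the caller; the underlying
    records are never transferred. Identifiers are interpolated for brevity
    and should be validated in real code.
    """
    query = (
        f"SELECT COUNT(*), "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) "
        f"FROM {table}"
    )
    total, nulls = conn.execute(query).fetchone()
    return nulls / total if total else 0.0


rate = null_rate_pushdown(conn, "orders", "amount")
print(f"null rate for orders.amount: {rate:.0%}")
```

Because the aggregation happens where the data lives, the amount of data that crosses the network stays small no matter how large the table is.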

Our partnership with Databricks will continue to improve accessibility, data provenance, quality, and security by providing access controls, cross-system data lineage, sensitive data classification, and data quality automation, and by enhancing enterprise-wide collaboration around datasets, workloads, notebooks, dashboards and models.

The more complex your data ecosystem becomes, the more you will need a unified data intelligence platform that can connect to any data source, allowing your data citizens to discover, understand, trust, and access data at scale. 

With our latest 2023.05 release, which includes our Databricks metadata integration, you and your team can more confidently navigate today’s complex data landscapes.

Learn more about our partnership with Databricks.

Want to learn more about the partnership?

Check it out

Related resources


Leveling up: Recent developments between Databricks and Collibra


Announcing Collibra Data Quality Pushdown for Databricks (in Beta)


Collibra and Databricks: Taking the partnership to the next level with Databricks SQL


