With the initial fanfare and “first-attempt” mistakes made with GenAI in our collective rearview mirror, forward-thinking enterprises are now getting serious about use cases to transform their business, seeking control over both models and data. This shift reflects a fundamental truth: AI is only as powerful as the data that feeds it.
The challenge? Users must be able to discover and understand high-quality, trusted data. Without this, even the most advanced AI and generative AI initiatives risk becoming expensive, misaligned, and ultimately ineffective. Poor-quality data does not just degrade model performance; it leads to unreliable outputs that can misinform decisions, create compliance issues, and damage customer trust in AI-generated results.
Many organizations also struggle with fragmented data ownership. When different teams apply their own definitions, naming conventions, and metadata standards to the same data, it leads to duplication, inconsistencies, and confusion. This lack of alignment makes it difficult for users to find the right information, slows down innovation, and increases the complexity of governance. To fully realize the promise of AI, organizations need a unified approach to data discovery, stewardship, and metadata management.
That’s why we’re thrilled to announce our expanded partnership with AWS. As the first step in our collaboration, we’ve built a joint solution that demonstrates the integration between the Collibra Platform and the next generation of Amazon SageMaker, designed to align and synchronize business and technical metadata across organizational workflows. This joint solution was developed by both companies to help customers explore what’s possible. The solution code is available on Collibra’s Marketplace for easy access and implementation.
The Collibra + AWS advantage
The expanded partnership between Collibra and AWS represents a fundamental shift in how organizations approach data governance. Together, we offer a unified approach that transforms data from a compliance burden into a strategic asset.
Here’s how.
Aligning and synchronizing business and technical data
The integration between the Collibra Platform and Amazon SageMaker Catalog is designed to align and synchronize metadata across the organization. The result is consistent definitions, streamlined metadata management, and unified governance for all users, whether they work within Collibra or across AWS environments.
Synchronization between the Collibra Platform and Amazon SageMaker Catalog is achieved through APIs, enabling a seamless bidirectional exchange of both business and technical metadata. Business metadata, such as definitions, classifications, and governance attributes, is consistently shared across both platforms, ensuring a unified understanding of data context. This integration ensures both business and technical users have access to accurate, complete, and trusted information at every stage of the data lifecycle.
Common use cases include:
- Stakeholders from across the organization can understand the data, use cases, and SageMaker models in Collibra, which improves the discoverability and visibility of the organization’s uses of AI.
- Data scientists, AI specialists and data engineers can stay within Amazon SageMaker to implement their analytics, machine learning, and generative AI use cases. They can discover, request, and access data with additional business context.
- Data governance and business teams continue to discover data and manage metadata, workflows, ownership structures, and approval processes directly in Collibra. This allows them to maintain control using familiar tools and practices, with traceability and auditability.
The integration provides significant value for technical users by offering immediate access to business-approved metadata directly within their Amazon SageMaker Projects. This simplifies their workflows, reduces confusion over data definitions, prevents redundant efforts and accelerates project development.
For business users and data stewards, the integration delivers greater transparency and improved control over technical data assets as well as AI use cases. It allows them to manage, monitor and govern asset usage comprehensively within Collibra or Amazon SageMaker Catalog, ensuring compliance, simplifying audits, and maintaining clear alignment between business governance and technical implementations.
Collibra and Amazon SageMaker Catalog sync
For years, many organizations have leveraged Collibra to provide business semantics, including definitions, owners, data quality rules and governance-approved policies, for their core data assets.
Generative AI introduces new technical personas into the data landscape. Data engineers, analysts, developers, and generative AI specialists working in Amazon SageMaker all need access to the same trusted context that business teams depend on. With that shared context, teams avoid duplicating assets, misinterpreting key metrics, and losing valuable time to confusion and misalignment.
The integration solution between Amazon SageMaker Catalog and Collibra enables automated synchronization of metadata, ensuring consistency across both platforms. This process begins with the creation of metadata in either system, such as business glossary terms, asset descriptions, technical attributes, and classification details. The metadata is replicated using the native APIs of Amazon SageMaker and Collibra. Synchronization runs automatically, and users can configure how often it occurs based on their requirements.
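As a rough illustration of the flow described above, the following Python sketch reconciles two in-memory catalogs. It is not the solution code from Collibra’s Marketplace. The `MetadataRecord` type, the `sync_bidirectional` function, the shared `key`, and the "most recently updated copy wins" rule are all assumptions made for illustration. The real integration exchanges metadata through the Collibra and Amazon SageMaker APIs, not Python dictionaries, and may resolve conflicts differently.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MetadataRecord:
    """One synchronized item, e.g. a glossary term or a column description."""
    key: str              # shared identifier used by both catalogs (assumed)
    description: str
    updated_at: datetime  # last modification time in the system that wrote it


def sync_bidirectional(collibra: dict, sagemaker: dict) -> None:
    """Reconcile two catalogs in place.

    Assumption: when both sides hold a record, the newer copy wins.
    """
    # The key union is a new set, so the dicts can be mutated while looping.
    for key in collibra.keys() | sagemaker.keys():
        left, right = collibra.get(key), sagemaker.get(key)
        if left is None:
            collibra[key] = right    # created in SageMaker Catalog only
        elif right is None:
            sagemaker[key] = left    # created in Collibra only
        elif left.updated_at > right.updated_at:
            sagemaker[key] = left    # Collibra copy is newer
        elif right.updated_at > left.updated_at:
            collibra[key] = right    # SageMaker Catalog copy is newer


# Hypothetical sample data: one term edited in Collibra, one table
# registered only in SageMaker Catalog.
older = datetime(2025, 1, 1, tzinfo=timezone.utc)
newer = datetime(2025, 1, 2, tzinfo=timezone.utc)
collibra_catalog = {
    "term:churn": MetadataRecord("term:churn", "Customers lost per quarter", newer),
}
sagemaker_catalog = {
    "term:churn": MetadataRecord("term:churn", "churn", older),
    "table:orders": MetadataRecord("table:orders", "Raw order events", older),
}
sync_bidirectional(collibra_catalog, sagemaker_catalog)
```

After the call, both dictionaries hold the same two records. A scheduler could invoke a function like this at whatever interval the team configures.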
As part of this integration, glossary terms and their descriptions are synchronized between the two platforms. Additional business metadata, such as data categories assigned to columns, is also replicated. Glossary terms and additional business metadata are linked to relevant data assets, including tables and columns, which helps establish clear business context. Descriptions for tables and columns are kept consistent between platforms to ensure alignment across teams.
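One way to picture the links described above is as a small data model in which glossary terms attach to tables and columns. The sketch below is illustrative only: the class names, the field names, and the `business_context` helper are hypothetical and do not reflect the actual schema of either Collibra or Amazon SageMaker Catalog.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GlossaryTerm:
    name: str
    description: str


@dataclass
class Column:
    name: str
    description: str = ""
    data_category: Optional[str] = None  # e.g. "PII"; the value is illustrative
    terms: List[GlossaryTerm] = field(default_factory=list)


@dataclass
class Table:
    name: str
    description: str = ""
    columns: Dict[str, Column] = field(default_factory=dict)
    terms: List[GlossaryTerm] = field(default_factory=list)


def business_context(table: Table, column_name: str) -> dict:
    """Collect the business metadata a user would see for one column."""
    column = table.columns[column_name]
    return {
        "table": table.description,
        "column": column.description,
        "category": column.data_category,
        # Table-level terms first, then terms linked to the column itself.
        "terms": [term.name for term in table.terms + column.terms],
    }
```

If both platforms render the same structure, a data steward and a data scientist looking up the same column see the same description, category, and linked terms.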
This automated synchronization ensures that metadata remains accurate, current, and accessible to both technical and business users, regardless of the platform they use.
A straightforward bidirectional sync between Collibra and Amazon SageMaker Catalog keeps everyone aligned and provides:
- One source of truth: Business metadata created in Collibra appears directly in Amazon SageMaker Catalog, so technical users see the exact definitions governance has approved, eliminating manual copying and exporting of CSV files.
- Smooth collaboration: When engineers register new datasets or models in Amazon SageMaker Catalog, the metadata they provide, including both business and technical information, is automatically synchronized back into Collibra. Data stewards can then enhance this metadata before publishing the assets for use by other teams across the organization.
- Transparency for generative AI: Unified metadata enables prompt engineers and model builders to instantly trace data provenance and compliance attributes, simplifying responsible AI evaluations.
- No duplication: Shared identifiers prevent multiple teams from cataloging the same asset under different names, ensuring that KPI dashboards and generative AI pipelines remain aligned and consistent.
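The "no duplication" point can be sketched as a registry keyed by a shared identifier. This is a minimal illustration under stated assumptions: `AssetRegistry`, `CatalogAsset`, and the `glue://...` identifier format are hypothetical, and the source does not specify what the real shared identifiers look like.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CatalogAsset:
    shared_id: str     # identifier both platforms agree on (format assumed)
    display_name: str
    aliases: Set[str] = field(default_factory=set)


class AssetRegistry:
    """Keeps exactly one entry per shared identifier."""

    def __init__(self) -> None:
        self._by_id: Dict[str, CatalogAsset] = {}

    def register(self, shared_id: str, display_name: str) -> CatalogAsset:
        existing = self._by_id.get(shared_id)
        if existing is not None:
            # Same asset under another name: record the alias and
            # return the existing entry without creating a duplicate.
            if display_name != existing.display_name:
                existing.aliases.add(display_name)
            return existing
        asset = CatalogAsset(shared_id, display_name)
        self._by_id[shared_id] = asset
        return asset

    def __len__(self) -> int:
        return len(self._by_id)


registry = AssetRegistry()
first = registry.register("glue://sales/orders", "Orders")
second = registry.register("glue://sales/orders", "orders_tbl")
```

Here `first` and `second` refer to the same entry, and the second name is kept only as an alias, so the catalog still contains a single asset.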
Subscription and approval flow
Once object metadata is synchronized from Amazon SageMaker Catalog into Collibra, customers can use their existing approval workflows directly within the Collibra Platform. Organizations that already rely on Collibra as the central hub for managing and approving dataset access can now seamlessly extend those approvals to assets stored in AWS. This is made possible through Amazon SageMaker’s access grant mechanism, which supports resources such as AWS Glue tables, Amazon Redshift tables, and other relevant AWS data sources.
This integration simplifies the entire access request process for both technical and business users. Data consumers can conveniently request access to AWS-based resources either from within Collibra or directly through Amazon SageMaker. Regardless of which platform they use, business metadata remains synchronized and consistent, ensuring users have access to the same trusted context across tools.
Data stewards and dataset owners can review, approve, or deny access requests without leaving the Collibra environment, using familiar governance workflows. By centralizing approvals and maintaining metadata alignment across systems, organizations improve transparency, strengthen compliance, and foster more effective collaboration across data governance and analytics teams.
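The request, review, and grant steps described in this section can be modeled as a small state machine. The sketch below is an assumption-laden illustration, not the integration’s actual workflow: `AccessRequest`, `review`, and `grants_to_apply` are hypothetical names, and in the real solution approvals run in Collibra workflows while grants are applied through Amazon SageMaker’s access grant mechanism.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class RequestStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class AccessRequest:
    requester: str
    asset_id: str
    origin: str  # where the request was raised: "collibra" or "sagemaker"
    status: RequestStatus = RequestStatus.PENDING
    reviewer: Optional[str] = None


def review(request: AccessRequest, reviewer: str, approve: bool) -> AccessRequest:
    """Record a steward's decision; each request can be reviewed only once."""
    if request.status is not RequestStatus.PENDING:
        raise ValueError("request has already been reviewed")
    request.status = RequestStatus.APPROVED if approve else RequestStatus.DENIED
    request.reviewer = reviewer
    return request


def grants_to_apply(requests: List[AccessRequest]) -> List[Tuple[str, str]]:
    """Return the (requester, asset) pairs that should receive access."""
    return [
        (r.requester, r.asset_id)
        for r in requests
        if r.status is RequestStatus.APPROVED
    ]
```

Because the decision is stored on the request itself, the same record serves as the audit trail regardless of which platform the request originated from.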
The path forward
The future belongs to organizations that treat data governance as foundational rather than aspirational. With Collibra and AWS, enterprises gain the confidence to pursue transformative AI initiatives knowing their data foundation is unshakeable.
The work begins now. Not with another pilot or proof of concept, but with a commitment to unified data governance that scales with your ambitions. Because in the race to AI advantage, victory doesn’t go to the swift; it goes to those who build on solid ground.
Learn more about the Collibra and AWS partnership.