In 1941, renowned Argentine author Jorge Luis Borges published a short story entitled “The Library of Babel.” The story tells the tale of a universe consisting of an unimaginable stretch of hexagonal rooms, each of which holds the bare necessities for human survival and four walls of bookshelves. Though the order and content of the books are arbitrary and seemingly entirely meaningless, the inhabitants believe that the books contain every possible ordering of just 25 basic characters (22 letters, the period, the comma, and the space). Imagine how many books that would be! Some of the books are pure gibberish, while others are highly relevant and useful. The latter may describe predictions of the future and biographies of any person, including slightly different or erroneous versions as well as translations in all languages.
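For a sense of scale, the count can be worked out directly. The sketch below assumes the book format Borges gives in the story (410 pages, 40 lines per page, about 80 characters per line); the variable names are illustrative only.

```python
import math

# Assumption: book format as described in Borges' story.
PAGES, LINES_PER_PAGE, CHARS_PER_LINE = 410, 40, 80
ALPHABET_SIZE = 25  # 22 letters, the period, the comma, and the space

chars_per_book = PAGES * LINES_PER_PAGE * CHARS_PER_LINE

# The number of distinct books is ALPHABET_SIZE ** chars_per_book.
# That integer is too large to print usefully, so count its decimal digits instead.
log10_books = chars_per_book * math.log10(ALPHABET_SIZE)
digit_count = math.floor(log10_books) + 1

print(f"Characters per book: {chars_per_book:,}")
print(f"Distinct books: a number with {digit_count:,} digits")
```

Under these assumptions the number of distinct books is not merely large: it takes roughly 1.8 million digits just to write it down.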
Surely a reader entering the library would find the sheer volume of books unmanageable. It’s a pure glut of information, with no way to distinguish the meaningful books from the useless ones. But as the story progresses, the librarians begin to take matters into their own hands. In a desperate attempt to make sense of the mass of information available, they adopt extreme behaviors. Some become “Purifiers,” librarians who arbitrarily purge books they deem nonsense. They define the criteria for what is good – and what is not – with little to no input from others. Others, in contrast, believe that somewhere, hidden in the vast realm of chaos, there is a book that catalogs all the library’s contents, and that a “man of the book” has found – and read – this index and translated it into something useful for people entering the library. Clearly, this index would be helpful to people trying to find or understand the library’s contents. In both cases, the goal is to gain control over the library and the vast number of books it contains so that the readers can find what they need. But the approaches are, indeed, very different.
Now, you’re probably wondering what “The Library of Babel” has to do with data. Well, the parallels are actually quite striking. Think about your data lake. In theory, it contains nearly every piece of data in your organization. Some of the data is meaningful, understood, and trusted. Other data is gibberish because it lacks meaning and trust. Both types of data live together in the data lake, and distinguishing the good from the bad is no simple task.
Moreover, organizations must also look outward, as there are many more hexagonal rooms to scour. IDC, a market-research firm, predicts that the “digital universe” (the data created and copied every year) will reach 180 zettabytes (180 followed by 21 zeros) in 2025. Pumping it all through a broadband internet connection would take over 450 million years, according to The Economist. In fact, I believe that the real era of big data is still to come.
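The transfer-time figure is easy to sanity-check. The sketch below assumes a 100 Mbit/s connection (the quoted passage does not state a speed, so that rate is an assumption) and decimal zettabytes of 10^21 bytes each.

```python
# Rough check of the transfer-time claim.
ZETTABYTE = 10**21            # bytes, decimal definition
data_bytes = 180 * ZETTABYTE  # IDC's projected digital universe for 2025

# Assumption: a 100 Mbit/s broadband link; the article does not name a speed.
link_bits_per_second = 100 * 10**6

seconds = data_bytes * 8 / link_bits_per_second
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
years = seconds / SECONDS_PER_YEAR

print(f"{years / 1e6:.0f} million years")
```

At that assumed speed the result lands a little above 450 million years, in line with the quoted figure.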
The nature of data has changed as well. It is no longer just blocks of structured information such as databases, data warehouses, and well-defined master customer records with age, sex, and home address. It is more about finding and rapidly understanding real-time streams of data: social media updates, mass transit movements, and the hundreds of sensors in jet engines and public places.
Now think about the people in your organization who manage and use the data. Surely there are “Purifiers” – the people who purge data at will in an effort to control the data chaos that exists within the lake. They are the data authority – the ones who decide what data is right – and what data is not. They define their own standards for quality without engaging with others across the business. And they refuse to compromise when data fails to comply. Like the “Purifiers” in Borges’ story, they purge data deemed unworthy. To me, they are not the best people to manage your data. Why? Because even though their intentions are pure, they lack collaboration. And that means that others do not have a say in which data stays and which data goes. And it’s possible that the data the Purifiers expel is data that is critical to a certain area of the business.
There are others who manage data who are collaborative in nature – the people who embrace data citizenship. They believe it’s possible to get a grip on the data by working together across the organization to understand the data’s meaning and use. These people believe there is a way to control the data before it enters the data lake. They are advocates for defining rules and operating models about which data enters the data lake. And they work hard so that all users can find the data, understand what it means, and trust that it is right. They are searching for the mythical “man of the book” so that they, too, can uncover an index of all the data hidden in the depths of the data lake.
In the world of data, we know that no such book – nor “man of the book” – exists. However, many organizations are using a data catalog to help them gain control over the glut of information stuffed into their data lakes. A data catalog helps organizations index the data and link it to agreed-upon definitions about quality, trustworthiness, and use. It helps users to determine which data is fit to use – and which they should discard because it’s incomplete or irrelevant to the analysis at hand. It provides the collaboration that is lacking when a “Purifier” takes control. And it helps all data users to find, understand, and trust their data.
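To make the idea concrete, here is a minimal sketch of what such an index might look like. It is a toy in-memory model, not Collibra's product or API: the `CatalogEntry` and `DataCatalog` classes, their fields, and the sample data sets are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One indexed data set plus the context users need to judge it (hypothetical schema)."""
    name: str
    definition: str          # agreed-upon business meaning
    owner: str               # who to ask about this data
    certified: bool = False  # has the organization agreed it is trustworthy?
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """A toy in-memory index; a real catalog would persist and govern entries."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def find(self, tag: str, certified_only: bool = True) -> list[CatalogEntry]:
        """Return entries carrying `tag`, optionally limited to certified ones."""
        return [
            e for e in self._entries.values()
            if tag in e.tags and (e.certified or not certified_only)
        ]


catalog = DataCatalog()
catalog.register(CatalogEntry("crm_customers", "Active customer master records",
                              owner="sales-ops", certified=True, tags={"customer"}))
catalog.register(CatalogEntry("web_clicks_raw", "Unvalidated clickstream dump",
                              owner="unknown", tags={"customer", "web"}))

fit_for_use = catalog.find("customer")
print([e.name for e in fit_for_use])  # prints ['crm_customers']
```

Note that nothing is purged here: the uncertified data set stays in the index and is still returned when `certified_only=False`, so users who need it can find it and judge it for themselves.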
How do you manage your data lake? With a “Purifier” or a “Man of the Book?”
Pieter leads the company’s Research & Education group, including Collibra University, an online learning platform for data governance and data science education.