Data Lakes in the Cloud?
As a metaphor, it seems to defy the laws of physics. But today, more and more organizations are taking a look at how moving their data lake to the cloud can help them discover new efficiencies, lower costs, and drive more value from their data assets. Hadoop, the platform that arguably made the data lake possible, has been providing organizations with much-needed scalability and processing power as their data sources continue to multiply. But while Hadoop-based data lakes are great at allowing organizations to scale up, they have been far less effective at allowing organizations to scale down as data needs fluctuate. And as data becomes increasingly cloud-based, so do the services that organizations call on to parse and analyze that data and turn it into real business intelligence.
While these limitations aren’t necessarily deal-breakers for many organizations, others are looking for ways to apply the benefits of the cloud to their data lakes. Some are moving their data lake entirely to the cloud; others are looking at ways to manage data across a hybrid environment where their data lakes are both cloud-based and on-prem.
The goal of most organizations, of course, is to make good data more accessible to the people who need it to do their jobs from any device and any location. The best of today’s cloud platforms rely on object storage, which provides economies of scale, high availability across distributed networks and regions, and, perhaps most importantly, the metadata and other unique identifiers your data users need to discover and make sense of all that data.
And that’s a great thing, because it opens the door to a better way to govern your data, and that’s the first step in driving real value from your data lake. With a way to apply governance and implement a governed data catalog across your data lake ecosystem, your data users are empowered to find the data they need from any system (remote desktop, mobile phone, or IoT device), understand the data they find, and trust that they have the best data for business-critical projects. They will also be better able to collaborate with each other to improve existing data assets or create new assets that will drive real business intelligence.
If you’re interested in how you can govern data more intelligently wherever it resides, download our eBook, Driving Value From Your Data Lake, to get started. You might not turn physics on its head, but you’ll discover better ways to help the geniuses at your organization give it a shot.
Maria Spanicciati is Content Manager and Editor of the Collibra blog.