You Have a Data Lake, Now What?
When you started your data lake, you likely had some expectations. Now that your data is no longer in silos, your users should be able to find exactly what they need and trust that it’s accurate. Simply moving your data into a data lake, however, is not enough. Without a clear purpose or enough detail to give your data context, you’re left with a data swamp. In other words, your data is still hard to find and understand, which makes your users less likely to use it at all. Not much different from when it was stuck in silos, is it?
To be a true success, your data lake strategy must put the needs of the people using it first. This is where a data governance and catalog solution is key, because it allows you to add enough detail to your data to bring it to life and make it more valuable to your users. The right solution will give you the ability to align your data with business goals, quickly ingest and certify assets, and establish standards for repeatable data models.
The first step is to find a tool that is supported by real human expertise in data governance. This is important because, as thrilling as the idea of robots taking over the world may be, we’re not quite there yet, and human knowledge is still invaluable. Such a tool is also a great way to reduce your reliance on tribal knowledge and augment your data with crowdsourced intelligence to enhance your entire data ecosystem.
Once you’ve found your solution, it’s time to dig in. No data mess is too big to overcome when you have the right teams and tools involved. This step requires sifting through your data to determine exactly what stays in the lake and what gets tossed. While you’re sifting through your lake, you can begin to add metadata and trace your data’s lineage to get a complete understanding of the data you have.
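To make the sifting step a little more concrete, here is a minimal sketch in Python. It is not how any particular governance or catalog product works. The `Asset` fields, the rule that an asset needs both an owner and a description to stay, and the sample data set names are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A hypothetical catalog entry: basic metadata plus upstream sources."""
    name: str
    owner: str = ""
    description: str = ""
    upstream: list[str] = field(default_factory=list)


def triage(assets):
    """Split assets into those that stay and those that need review.

    Assumed rule: an asset stays only if it has an owner and a description.
    """
    keep = [a for a in assets if a.owner and a.description]
    review = [a for a in assets if not (a.owner and a.description)]
    return keep, review


def trace_lineage(name, catalog):
    """Return every upstream ancestor of an asset, nearest first."""
    ancestors = []
    queue = list(catalog[name].upstream)
    while queue:
        parent = queue.pop(0)
        if parent in ancestors:
            continue  # already visited; also guards against cycles
        ancestors.append(parent)
        if parent in catalog:
            queue.extend(catalog[parent].upstream)
    return ancestors


# Made-up sample data, for illustration only.
catalog = {
    a.name: a
    for a in [
        Asset("raw_orders", "sales-eng", "Order events as received"),
        Asset("clean_orders", "data-eng", "Deduplicated orders", ["raw_orders"]),
        Asset("revenue_report", "finance", "Monthly revenue", ["clean_orders"]),
        Asset("tmp_export"),  # no owner, no description
    ]
}

keep, review = triage(list(catalog.values()))
print("Stays in the lake:", [a.name for a in keep])
print("Needs review:", [a.name for a in review])
print("Lineage of revenue_report:", trace_lineage("revenue_report", catalog))
```

In practice, the rule for what stays and what gets tossed would come from your own teams. The point is simply that once every asset carries an owner, a description, and its upstream sources, both the sifting and the lineage questions become easy to answer.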
After you have your data in order, you can work on improving your users’ experiences with a structured catalog to make it easier to engage with your data. This will give you data sets that are profiled and aligned to specific challenges so anyone can find and understand the data.
The final step to data lake nirvana is governance. Applying the right governance will allow you to focus your user experiences, share data from third-party sources, and provide a trusted channel to share your data insights without compromising protection and security.
Now that you have a data lake, you’re off to a good start. All that’s left to do is find the right data governance and catalog solution, one that puts people first and gives your users exactly what they need to get more from your data.
Maria Spanicciati is Content Manager and Editor of the Collibra blog.