Establishing data as a strategic asset is not easy and depends on a lot of collaboration within an organization. The advantage is that once a system of record is in place for data, your organization can implement many valuable data governance use cases. In this post, I’m highlighting the top three value-adding data governance use cases. Each one depends on a data governance and stewardship function being in place. For example, if you don’t have an approved taxonomy of what data exists in the business, then you can’t adequately tag data as it flows into your lake.
Data Lake Management: Prevent a Data Swamp
A data lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed.
Often, organizations interpret the above definition as a reason to dump any data in the lake and let the consumer worry about the rest. This is exactly how data swamps are born. To avoid a swamp, a data lake needs to be governed, starting from the ingestion of data.
Some key questions this governance function will need to answer are:
- Should this data be centrally stored? Maybe it is better to keep it within the line of business (LOB), or maybe this is not even our data?
- Does the data comply with our policies for usage, privacy, and more?
- Is the data aligned to our business glossary so people will actually find and understand what’s in the lake?
- Is the data aligned to a data dictionary?
- Is the data aligned to data quality requirements?
Once you have worked through the above questions and approved the data, it can go through testing and then into the lake, where your business users can find and shop for it.
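The questions above can be read as a gate in front of the lake. The following is a minimal sketch of that idea, not a description of any particular product: the `IngestionReview` class, its field names, and the two helper functions are all hypothetical and exist only to illustrate the checklist.

```python
from dataclasses import dataclass


@dataclass
class IngestionReview:
    """Hypothetical record of the governance answers for one candidate data set."""
    dataset: str
    belongs_in_central_lake: bool     # not better kept in the LOB, and actually ours
    complies_with_policies: bool      # usage, privacy, and so on
    aligned_to_glossary: bool
    aligned_to_dictionary: bool
    meets_quality_requirements: bool


def open_questions(review: IngestionReview) -> list[str]:
    """Return the governance questions that have not been approved yet."""
    answers = {
        "central storage": review.belongs_in_central_lake,
        "policy compliance": review.complies_with_policies,
        "business glossary": review.aligned_to_glossary,
        "data dictionary": review.aligned_to_dictionary,
        "data quality": review.meets_quality_requirements,
    }
    return [question for question, approved in answers.items() if not approved]


def ready_for_testing(review: IngestionReview) -> bool:
    """Data moves on to testing only when every question is approved."""
    return not open_questions(review)
```

Returning the list of open questions, rather than a bare yes or no, tells the data steward exactly what still needs attention before the data set can move on.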
Data Distribution: Search and Shop for Data
Imagine the following scenario: a new product is going live and requires a very tailored marketing campaign – as tailored as possible to the customers your organization identifies as most likely to buy this type of product. This marketing campaign will rely on data. And it will include data sourced from several places like Adobe’s Analytics and one or more internal CRM systems (let’s keep it simple).
Without data governance, the search for data begins as soon as the project starts up. It’s not uncommon to spend a month navigating the different LOBs, systems, and meeting rooms to find and understand what data should – and could – be used.
But let’s assume your organization has established data as a strategic asset and has put in place a system of record for data.
In this case, because digital marketing uses Adobe and CRM data so often, a data sharing agreement is already in place. This is basically an agreement that data sourced from Adobe by the content marketing department can be consumed by digital marketing. The agreement details everything from the types of feeds that are available to refresh frequency, ownership of data, and probably 50 other attributes. Think of it as a contract and SLA around expected data.
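To make the "contract and SLA" idea concrete, here is a small sketch of how such an agreement could be represented and checked. Everything in it is an assumption made for illustration: the `DataSharingAgreement` class, its handful of fields (a real agreement would carry many more attributes), and the `refresh_sla_met` helper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DataSharingAgreement:
    """Hypothetical, heavily simplified data sharing agreement."""
    provider: str               # department that sources the data
    consumer: str               # department allowed to consume it
    source: str                 # system the data comes from
    feeds: tuple[str, ...]      # types of feeds covered by the agreement
    refresh_every: timedelta    # agreed refresh frequency
    owner: str                  # who owns the data


def refresh_sla_met(agreement: DataSharingAgreement,
                    last_refresh: datetime,
                    now: datetime) -> bool:
    """True if the feed was refreshed within the agreed frequency."""
    return now - last_refresh <= agreement.refresh_every
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the check easy to test and to replay against historical refresh logs.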
The first thing one of the data citizens in digital marketing would do is check the data catalog: an enterprise-wide view of the data sets and feeds available for distribution, as well as a capability to construct new data sets to consume. The crucial role governance plays here is that we can’t just provide a dump of all (duplicate and redundant) data sets out there. Our data citizens expect data that is fit for purpose, and they can be sure of this by checking the context of the feed (which is tagged with approved business terms), verifying the quality, sampling the data, and seeing what others have done with this data.
When the citizens find the right feed or data set, they can then add the assets to their shopping cart and request access. While requesting access, they specify the format the data should be delivered in, along with details such as the related data sharing agreement, how long the data is needed, and more.
Depending on the level of integration, distribution of the data can mean either a task sent to the data custodian to provide access, or a fully automated permission change and connection to the data.
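The two distribution paths can be sketched as a single routing decision. This is an illustrative sketch only: `AccessRequest`, its fields, and `distribute` are hypothetical names, and the returned strings stand in for whatever task system or permission service an organization actually uses.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical contents of a shopping-cart checkout in the data catalog."""
    requester: str
    assets: list[str]          # feeds or data sets in the cart
    delivery_format: str       # desired format for delivery
    sharing_agreement: str     # reference to the related agreement
    days_needed: int           # how long access is required


def distribute(request: AccessRequest, automated: bool) -> str:
    """Route the request according to the level of integration."""
    assets = ", ".join(request.assets)
    if automated:
        # Fully integrated: permissions change without human involvement.
        return (f"granted {request.requester} access to {assets} "
                f"for {request.days_needed} days")
    # Not integrated: hand the work to the data custodian as a task.
    return f"task for data custodian: give {request.requester} access to {assets}"
```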
Governance, in this case, works as an accelerator (an hour instead of a month to find the right data) and an insurance policy (fit-for-purpose data was used, under a clear mandate from the business).
Report Certification: Reduce your Report Stack by 80%
If you’ve created a great asset that will be used across the enterprise, then why not prove that it is fit for purpose? Why risk stepping into a meeting where someone has a similar report with different numbers? Or hearing that a different unit recreated the same report because they weren’t sure what data was used?
Just like an auditor would stamp a report as correct, the data authority can do the same to show the asset can be trusted from the bottom up. This is not just a good way to show value, but also a great way to weed out reports (or other assets) that are redundant, out of date, or just wrong.
The basic steps of certification come down to a few questions:
- Do we have an owner for this asset?
- Can the owner or a delegate help identify the critical data elements?
- Can the data be traced to its source?
- Do we have standards in place, such as data quality rules?
- Can we prove that the standards have been applied and can they be measured at the different hops as the data flows through the organization?
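As a rough illustration, the certification questions above can be turned into a checklist that reports what is still missing. The `Asset` class, its fields, and the gap messages are hypothetical and chosen only for this sketch; in particular, `hop_results` assumes that each hop in the data flow reports a simple pass or fail for the applied standards.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Asset:
    """Hypothetical description of a report (or other asset) up for certification."""
    name: str
    owner: Optional[str] = None
    critical_data_elements: list[str] = field(default_factory=list)
    traced_to_source: bool = False
    quality_rules: list[str] = field(default_factory=list)
    # hop name -> whether the standards were measured and met at that hop
    hop_results: dict[str, bool] = field(default_factory=dict)


def certification_gaps(asset: Asset) -> list[str]:
    """Return the certification questions this asset cannot yet answer with yes."""
    gaps = []
    if not asset.owner:
        gaps.append("no owner")
    if not asset.critical_data_elements:
        gaps.append("critical data elements not identified")
    if not asset.traced_to_source:
        gaps.append("lineage not traced to source")
    if not asset.quality_rules:
        gaps.append("no standards in place")
    if not asset.hop_results or not all(asset.hop_results.values()):
        gaps.append("standards not proven at every hop")
    return gaps


def is_certified(asset: Asset) -> bool:
    """An asset earns the stamp only when no gaps remain."""
    return not certification_gaps(asset)
```

Running the same check across the whole report stack gives a simple list of assets that cannot be certified, which is a natural starting point for weeding out redundant or outdated reports.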
As you can see, expanding your governance program to include data governance use cases such as cleaning the data swamp, searching and shopping for data, and reducing your report stack is a great way not only to realize greater ROI but also to show real value to the business.
Koen is responsible for delivering real-world data governance solutions to the problems raised by different regulations such as BCBS239, Solvency II, and GDPR. Before joining Collibra, he worked as principal consultant for Wolters Kluwer Financial Services and Financial Architects on Basel, Compliance, and Solvency projects worldwide.