So You Think You Know What Data Governance Is?
Data governance is a relatively new and very hot topic today, so it comes as no surprise that there are a zillion different definitions of it. Most of these definitions are self-serving: they come from people with solutions who are trying to stretch the category they have been in for years so that it encompasses data governance. In the end, they foster much more confusion than clarity. One thing is certain: data governance isn’t going away. In fact, data governance is assuming a more important role as organizations strive to use data to improve their businesses. And as data continues to take center stage as a strategic business asset, the need for organizations to understand how to govern and manage their data grows in importance.
First off, it is important to distinguish between governance and management. This distinction matters not just for technical or functional reasons, but because in almost every organization around the world, the people who do one are not the people who do the other. Governance is fundamentally about making decisions about the data. This includes deciding what the data means, where it should be used, how accurate it needs to be, and what rules it needs to follow. This has other implications as well. For example, if you want the data to be used in the right way, you need to be able to find the data that fits your purpose, instead of just taking what is at hand. If you want to be able to determine how accurate the data needs to be, you also need to have a way to mitigate inaccuracies. So these decision processes imply a whole series of other activities that are performed by the business people who know and use the data. That last phrase is the key: the business people who know and use the data.
Once this is clearly in mind, then the specific capabilities that make up data governance come into focus. Some of them are very well understood:
- Business Glossary: a record of the meaning of the data, so all the variations of it can be distinguished and erroneous comparisons eliminated
- Data Quality: the condition of the data and its level of trustworthiness and adherence to policies
- Roles & Responsibilities: the organizational structure that determines who takes responsibility for the care and maintenance of the data
These three elements are critical, but they are simply not enough. Business users want not only to know what the data means, but to truly understand it. They want not only to see that its quality is being monitored, but to be empowered to fix the problems they find. And they want to understand not only who is responsible for data, but also which systems, business processes, organizations, third-party sources, applications, business units, and so on have relationships with particular sets of data. This requires three things:
- A way to find the data
- A way to identify and resolve data problems
- A way to create associations across all aspects of the data
I Can’t Find the $%!* Data!
The inability to find the data is one of the top challenges I hear from customers and prospects. Solving the “I can’t find the $%!* data” problem takes more than a BI tool and more than an IT solution. It takes a solution that is accessible and that works the way business users think – by revealing the meaning of the data set as well as how that data set relates to other data sets across the organization. That’s where a data catalog comes in. For data governance to work, it needs to include a governed data catalog that fills the needs of the business.
Governed data catalogs give business users the ability to not only locate the data, but also to understand what it means. And a good data catalog takes it one step further by grouping useful collections of data together into multi-faceted sets based on what other users have used in the past. This growing body of knowledge based on what others used, searched for, and assembled ultimately becomes more useful to the broader organization.
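To make the idea concrete, here is a minimal sketch in Python of a catalog that records what each data set means and learns which data sets people tend to use together. Everything in it (the `GovernedCatalog` class, its method names, and the sample data sets) is hypothetical and invented for illustration; it is not how any particular catalog product works.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    meaning: str  # the business definition, in plain language
    tags: set = field(default_factory=set)


class GovernedCatalog:
    """Hypothetical toy catalog: search by meaning, suggest by past usage."""

    def __init__(self):
        self.entries = {}
        self.co_usage = Counter()  # (data set A, data set B) -> times used together

    def register(self, entry):
        self.entries[entry.name] = entry

    def search(self, term):
        """Return names of data sets whose name, meaning, or tags mention the term."""
        term = term.lower()
        return sorted(
            e.name
            for e in self.entries.values()
            if term in e.name.lower()
            or term in e.meaning.lower()
            or any(term in t.lower() for t in e.tags)
        )

    def record_usage(self, names):
        """Record that one user assembled these data sets together."""
        unique = sorted(set(names))
        for i, a in enumerate(unique):
            for b in unique[i + 1:]:
                self.co_usage[(a, b)] += 1

    def often_used_with(self, name):
        """Data sets most often combined with `name`, most frequent first."""
        related = Counter()
        for (a, b), count in self.co_usage.items():
            if a == name:
                related[b] += count
            elif b == name:
                related[a] += count
        return [other for other, _ in related.most_common()]


catalog = GovernedCatalog()
catalog.register(CatalogEntry("customer_master", "One record per active customer", {"customer"}))
catalog.register(CatalogEntry("orders", "Confirmed customer orders by day", {"sales"}))
catalog.register(CatalogEntry("returns", "Items sent back after delivery", {"sales"}))

# Two analysts assemble their own collections; the catalog remembers the combinations.
catalog.record_usage(["orders", "customer_master"])
catalog.record_usage(["orders", "customer_master", "returns"])

print(catalog.search("customer"))
print(catalog.often_used_with("orders"))
```

The point of the sketch is the second method pair: every time someone assembles a collection, the catalog gets a little better at pointing the next person to data that belongs together.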
Identifying and Resolving Data Problems
Many organizations today are finding that their data is in a state of chaos. Nobody knows where the data is, who has access to it, what it means, or how it’s used and protected. And controlling that chaos is a primary driver for data governance. Because at its core, data governance is about process. And these processes help to determine who can access the data, what they can use it for, how it’s stored and managed, and more.
But too much process can be as much of a problem as a lack of process. It’s about weighing the control needed to ensure data is trustworthy and protected against the flexibility needed to ensure people can use data to drive innovation. It’s about balancing a top-down and a bottom-up approach. This hybrid approach provides control where it’s needed (think regulatory compliance and corporate policies) alongside the flexibility needed to account for local variations, definitions, and capabilities in the use of the data.
Further, for everyone in the organization to trust the data, they need to actively participate in caring for the data. If they know how to fix something, then they are more willing to trust the data. Which takes us back to process. Having a process for data citizens to identify data issues and resolve them is critical to ensuring everyone has confidence in their data.
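What might such a process look like? Below is a minimal sketch in Python, assuming a simple three-step flow (reported, assigned, resolved). The states, the `DataIssue` class, and its fields are assumptions made up for this example; a real program would likely have more steps, such as review or escalation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    REPORTED = "reported"
    ASSIGNED = "assigned"
    RESOLVED = "resolved"


# Allowed moves between states (an assumed, deliberately small workflow).
ALLOWED = {
    Status.REPORTED: {Status.ASSIGNED},
    Status.ASSIGNED: {Status.RESOLVED},
    Status.RESOLVED: set(),
}


@dataclass
class DataIssue:
    data_set: str
    description: str
    reported_by: str
    status: Status = Status.REPORTED
    owner: Optional[str] = None
    history: list = field(default_factory=list)

    def _move(self, new_status, note):
        # Reject any transition the workflow does not allow.
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"cannot go from {self.status.value} to {new_status.value}"
            )
        self.status = new_status
        self.history.append(note)

    def assign(self, steward):
        """Hand the issue to the person responsible for the data set."""
        self._move(Status.ASSIGNED, f"assigned to {steward}")
        self.owner = steward

    def resolve(self, fix):
        """Close the issue and keep a note of what was done."""
        self._move(Status.RESOLVED, f"resolved: {fix}")


issue = DataIssue("orders", "Negative quantities in March", reported_by="analyst")
issue.assign("orders data steward")
issue.resolve("corrected sign error in upstream load")
print(issue.status.value, issue.history)
```

The history list matters as much as the status: anyone who later wonders whether the data can be trusted can see who raised the problem, who owned it, and how it was fixed.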
Creating Associations Across the Data
This aspect of data governance may sound snooze-worthy, but trust me, it’s not. Creating associations across the data is about metadata. But it’s more than your traditional definition of metadata. It’s about linking technical terms to not only the technical implementation of the data, but to everything that produces, consumes or influences the data. People often use lineage as a shorthand for these relationships, but to gain understanding and trust in data, you need far more than a from-to diagram. You also need to capture similar meanings, along with links to business processes, departments, applications, business units, products, and geographies. Without them, the increasing scope and scale of data just breeds confusion. By creating these associations and relationships we lead the users of the data on a path to the information that they need to use data effectively. And after all, it is in its use that value is realized.
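One way to picture this is as a set of typed links rather than a single from-to chain. The Python sketch below is a rough illustration under that assumption; the `AssociationGraph` class, the relation names, and the asset names are all hypothetical.

```python
from typing import NamedTuple


class Association(NamedTuple):
    source: str
    relation: str
    target: str


class AssociationGraph:
    """Hypothetical store of typed links between data and its business context."""

    def __init__(self):
        self.associations = []

    def relate(self, source, relation, target):
        self.associations.append(Association(source, relation, target))

    def touching(self, asset):
        """Every association the asset takes part in, in either direction."""
        return [a for a in self.associations if asset in (a.source, a.target)]

    def neighbors(self, asset, relation=None):
        """Names linked to the asset, optionally limited to one relation type."""
        found = set()
        for a in self.touching(asset):
            if relation is not None and a.relation != relation:
                continue
            found.add(a.target if a.source == asset else a.source)
        return sorted(found)


graph = AssociationGraph()
# Lineage is only one kind of link...
graph.relate("crm.customers", "feeds", "warehouse.dim_customer")
# ...the rest tie the same table to meaning, people, and use.
graph.relate("Customer", "implemented by", "warehouse.dim_customer")
graph.relate("warehouse.dim_customer", "owned by", "Customer Operations")
graph.relate("warehouse.dim_customer", "used by", "Quarterly churn report")

print(graph.neighbors("warehouse.dim_customer", relation="feeds"))
print(graph.neighbors("warehouse.dim_customer"))
```

Asking only for the "feeds" relation gives you the classic lineage answer. Dropping the filter returns the glossary term, the owning department, and the report that depends on the table as well, which is the fuller picture a business user needs.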
So if you need to define what data governance is, be sure you tell the full story. There’s no doubt that data quality and policies, data’s meaning in a glossary, and roles and responsibilities around data are important. But equally important, if not more so, are the ability to identify and resolve data issues, the ability to find the data, and the ability to create associations across the data. Deliver these capabilities to your organization, and you will set your data governance program apart from the rest.
Dan is a software industry expert with broad and deep experience in the data and software markets. He began his career as a developer and product manager for BI and reporting software, then moved into integration and middleware. He spent several years at Gartner as an industry analyst in the software space, covering data management, middleware, application architecture, and SAP. He has spent time at various software companies, and is currently Product Evangelist and Influencer Relations at Collibra.