4 Data Governance Best Practices to Kickstart your Data Governance Program
Last week, I attended the Gartner Data & Analytics Summit in Dallas, TX. More than 2,500 professionals, including leaders from analytics, business, information management, master data management, and senior IT, as well as architects, gathered to share and learn best practices for analytics, artificial intelligence, data governance, data quality, and more. As the saying goes, everything is bigger in Texas, and for data governance, it was the massive landscape of different industries and geographies coming together to create something significant and sustainable for effectively managing data. For many, the focus was to prepare for challenges around data and analytics, including:
- Establishing effective information governance for better quality, privacy, and security
- Maximizing the impact of business intelligence and MDM programs
- Preparing for trends such as AI, Hadoop, Internet of Things (IoT) and blockchain
- Building and executing an effective, holistic data and analytics strategy
Organizations of all sizes and types were present at the conference to learn and share about their data governance programs. It’s amazing to see how these organizations have transformed the process of enterprise data governance. And of course, it’s always a great feeling to hear many industry leaders reference Collibra as an example when describing their data governance journey.
But as I talked with the delegates, many had one question in common:
What are the data governance best practices you should start with when kicking off a governance program?
Essentially, there are 4 data governance best practices when launching a data governance program:
1. Focus on the operating model
The operating model is the basis for any data governance program. It includes activities such as defining enterprise roles and responsibilities across the different lines of business. The idea is to establish an enterprise governance structure. Depending on the type of organization, the structure could be centralized (a central authority manages everything), decentralized (a group of separate authorities operates it), or federated (independent or multiple groups control it with little or no shared ownership).
For example, I recently worked on a data governance project with a major insurance provider in New York City. We started our initial engagement by interviewing leaders from each line of business, such as finance, insurance, sales, and marketing. In the end, we identified key representatives, one for the business track and the other for the technical track. These personas are sometimes referred to as business stewards and technical stewards. They support two parallel perspectives: the business (owners of the data) and information technology (owners of the infrastructure supporting the data). Stewards form groups that roll up to the heads of the business lines, and the business lines roll up to the leaders of business and IT.
As a data governance best practice, our client shared the idea of creating an enterprise data governance structure and formed a corporate data governance council reporting up to the Chief Data Officer.
Note: It is important to define the realm of ownership across your organization. Determining authority will help socialize your data governance program and establish an intelligent structure for tackling data programs as one unified force.
Members of the business and IT form different groups and align to a reporting structure often referred to as the data governance council or the data stewardship committee. It’s this council or committee where the majority of everyday data decisions are discussed and disseminated across the organization. The data governance council ensures formalized ownership, and determines the right tools and technology to support stewards so they can perform their job efficiently.
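The roll-up described above can be sketched as a simple reporting chain. This is only an illustrative model, not how any particular platform represents it: the `Unit` class, the `escalation_path` helper, and the sample names are hypothetical, and the sketch assumes that the heads of business lines report into the council, which in turn reports to the Chief Data Officer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    """A person or group in the governance structure (hypothetical model)."""
    name: str
    reports_to: Optional["Unit"] = None


def escalation_path(unit: Unit) -> List[str]:
    """Walk the reporting chain from a unit up to the top of the structure."""
    path = []
    current: Optional[Unit] = unit
    while current is not None:
        path.append(current.name)
        current = current.reports_to
    return path


# Sample structure; all names are made up for illustration.
cdo = Unit("Chief Data Officer")
council = Unit("Data Governance Council", reports_to=cdo)
finance_head = Unit("Head of Finance", reports_to=council)
business_steward = Unit("Finance business steward", reports_to=finance_head)
technical_steward = Unit("Finance technical steward", reports_to=finance_head)

print(" -> ".join(escalation_path(business_steward)))
```

Writing the chain down this way makes it easy to answer the practical question behind the operating model: when a data decision cannot be settled by a steward, who sees it next?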
Here is a sample diagram showing an enterprise data governance organization:
2. Identify data domains
After establishing the data governance structure, the next step is to determine the data domains for each line of business. The most common examples include customer, vendor, and product data domains. Depending on the type of industry, we come across different kinds of domains. But everything boils down to identifying domains and capturing information about a business and its consumers.
Taking the customer, vendor, and product examples above, each data domain contains the following artifacts:
- Data owners
- Business glossaries
- Data dictionaries
- Business processes
- Data catalogs
- Reports catalogs
- Data quality scorecards
- Systems and applications
- Policies and standards
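One way to keep track of these artifacts is a small inventory per domain. The sketch below is a hypothetical illustration: the `DataDomain` class, its field names, and the sample entries are assumptions made for this example, not a Collibra data model.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DataDomain:
    """Inventory of governance artifacts for one data domain (hypothetical)."""
    name: str
    data_owners: List[str] = field(default_factory=list)
    business_glossaries: List[str] = field(default_factory=list)
    data_dictionaries: List[str] = field(default_factory=list)
    business_processes: List[str] = field(default_factory=list)
    data_catalogs: List[str] = field(default_factory=list)
    report_catalogs: List[str] = field(default_factory=list)
    data_quality_scorecards: List[str] = field(default_factory=list)
    systems_and_applications: List[str] = field(default_factory=list)
    policies_and_standards: List[str] = field(default_factory=list)

    def missing_artifacts(self) -> List[str]:
        """Return the artifact types that have no entries yet."""
        return [
            f.name
            for f in fields(self)
            if f.name != "name" and not getattr(self, f.name)
        ]


# Sample domain; the entries are made up for illustration.
customer = DataDomain(
    name="Customer",
    data_owners=["Head of Sales"],
    business_glossaries=["Customer glossary"],
)
print(customer.missing_artifacts())
```

A gap list like this gives the governance council a concrete starting point for each domain instead of a blank page.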
Typically, the identification of a data domain starts with a business need or problem.
For example, one of our clients, a major financial services institution, approached us with the following operational requirements:
- Improve customer experience
- Gain control over validating customer needs
- Manage customer usage
- Increase upsell on storage billing cycles
Note: Data governance is about people, processes, and technology. It can be enabled by identifying a data governance structure, assigning roles and responsibilities, and managing key information assets through a technology platform like Collibra.
These requirements were tied to the business problem our client was facing: they needed visibility into, and a shared understanding of, their customers. Data was spread across multiple systems and applications with no defined ownership.
We helped create ownership by identifying key stakeholders, business processes, and datasets related to the customer domain and established control around its lifecycle. The idea is to have a clear understanding of where data comes from, who owns it, and when changes are made, who should be involved (all of which can be clearly defined and managed within the Collibra Enterprise Data Governance Center platform).
Here is a sample “Customer Life Time Value” diagram showing end to end data lineage across the customer’s data domain:
3. Identify critical data elements within the data domains
After defining the data domains, we are standing at the pinnacle. From here, we can see data domains touching tens, hundreds, or even thousands of systems and applications containing key reports, critical data elements, business processes, and more. Obviously, we don’t want to boil the ocean by focusing on all the data artifacts at once. Instead, we should identify only what’s critical to the business.
For example, we worked with a federal government agency in Washington, D.C., whose data governance initiative aimed to attain commonality across the enterprise by creating a centralized platform to manage and control changes and to provide visibility into critical data assets. The platform was meant to serve as a vibrant ecosystem that fosters collaboration, supports lifecycle management, and retains audit logs for past vs. future analysis.
Another example is a technology company that needed to validate customer reports and their source systems. They started by identifying 10 key reports and documenting information about the systems of origin. Later, the initiative was scaled up into what they called the “report certification” process, under which every report shows its certification status and related source system information. Simply put, a report is not certified if its owners can’t show its traceability all the way down to the system of origin.
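The certification rule in that last example can be expressed as a traceability check over lineage. The sketch below is a minimal illustration under stated assumptions: lineage is modeled as a plain dictionary mapping each asset to its upstream sources, and the function name `is_certifiable`, the asset names, and the set of known systems of origin are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def is_certifiable(
    asset: str,
    upstream: Dict[str, List[str]],
    systems_of_origin: Set[str],
    path: FrozenSet[str] = frozenset(),
) -> bool:
    """True only if every upstream branch ends at a known system of origin."""
    if asset in systems_of_origin:
        return True
    if asset in path:
        # Circular lineage never reaches a system of origin.
        return False
    sources = upstream.get(asset, [])
    if not sources:
        # Lineage stops at an undocumented asset.
        return False
    return all(
        is_certifiable(source, upstream, systems_of_origin, path | {asset})
        for source in sources
    )


# Sample lineage; all names are made up for illustration.
upstream = {
    "Churn report": ["customer_mart"],
    "customer_mart": ["CRM", "Billing"],
    "Revenue report": ["finance_extract"],
    "finance_extract": [],  # source not documented
}
systems_of_origin = {"CRM", "Billing"}

for report in ("Churn report", "Revenue report"):
    print(report, is_certifiable(report, upstream, systems_of_origin))
```

Here the churn report traces back to documented systems on every branch, while the revenue report stops at an extract with no recorded source, so it would stay uncertified until its owners document where that extract comes from.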
4. Define control measurements
Above, we covered data governance structures, data domains, and critical data elements. The next step is to set and maintain controls that sustain the data governance program. After delivering data governance solutions across multiple industries, including banking, healthcare, insurance, government, retail, and manufacturing, we have learned that data governance is not a one-time project. It is an ongoing program that fuels data-driven decision making and creates opportunities for the business. It prepares an organization to meet business standards. Control measurements include the following key activities:
- Define automated workflow processes and thresholds for approval, escalation, review, voting, issue management and more
- Apply workflow processes to the governance structure, data domains, and critical data elements
- Develop reporting on the progress of steps 1 through 4
- Capture feedback through automated workflow processes
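To make the idea of workflow thresholds concrete, here is a minimal sketch of a voting step with an approval threshold and a time-based escalation. The function name `review_outcome`, its parameters, and the default values are assumptions chosen for this example; real workflow engines, including Collibra's, define these rules in their own way.

```python
from typing import List


def review_outcome(
    votes: List[bool],
    expected_voters: int,
    approval_threshold: float = 0.5,
    days_open: int = 0,
    escalate_after_days: int = 5,
) -> str:
    """Decide the state of a proposed change from steward votes.

    Returns "pending", "escalated", "approved", or "rejected".
    """
    if expected_voters <= 0:
        raise ValueError("expected_voters must be positive")
    if len(votes) < expected_voters:
        # Not everyone has voted: wait, or escalate once the item is overdue.
        return "escalated" if days_open > escalate_after_days else "pending"
    approval_share = sum(votes) / len(votes)
    return "approved" if approval_share >= approval_threshold else "rejected"


# Three stewards review a new business term with a 60% approval threshold.
print(review_outcome([True, True, False], expected_voters=3, approval_threshold=0.6))
# One vote received after seven days: the item goes to escalation.
print(review_outcome([True], expected_voters=3, days_open=7))
```

Keeping the threshold and the escalation window as explicit parameters lets the council tune them per domain or per critical data element without changing the process itself.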
For example, one of our technology customers in California started with data governance in early 2010. They began by defining ownership, roles, and responsibilities, writing business data definitions, and applying workflow processes to include data stewards in the change management process.
In the end, they established a robust data governance organization supporting an ongoing program and used Collibra as the system of record for managing all business data definitions and for executing control processes such as business data definition onboarding, approval, data steward collaboration, and feedback capture.
Here is a sample data steward dashboard showing reporting metrics (glossary, most viewed items, my tasks and issues, stewardship, reference data, quick actions to onboard metadata, and assets per domain):
There is more to kicking off an enterprise data governance program than the four best practices described above, and the approach differs by industry. Still, these steps hold for establishing effective information governance, which is the foundation for better quality, privacy, and security, and for many business intelligence and MDM programs. Data governance will help us prepare for growing trends such as AI, Hadoop, IoT, and blockchain. It is just the start for many things in data and analytics, and there is plenty more to discuss.
Does it sound like a fit for your organization? I would love to learn more about your plans for kicking off a data governance program.
Kash is a Customer Advisory Manager at Collibra. He is involved in research, development, and delivery of enterprise data governance solutions. He is also an instructor for Collibra University, a worldwide community of 1000+ users. Prior to joining Collibra, he was a researcher at the University of Arkansas for Medical Sciences, a leading cancer research institute. Kash also has a master’s degree in Information Quality from the University of Arkansas at Little Rock in collaboration with MIT. He has published his research findings at the ICIQ, SE Regional IDeA, and ITNG conferences.