
Welcome to the Collibra Blog, where CDOs, data stewards, and data citizens go to learn about true data governance.

Business Glossary Value in BI, Analytics, and Big Data Programs

I wrote an article for TDAN.com a year ago about the Business Glossary Prime Directive. Simply stated, the Business Glossary Prime Directive is “to eliminate semantic confusion across the enterprise.” Eliminating semantic confusion has many implications. For data governance, it means that each business term has a unique name and an identified single definition, value set, set of business rules, authoritative source, and accountable party. For our analytics program, it means that we have a single definition, value set, set of business and quality rules, and authoritative source for every business dimension and fact. Thus, involving data governance in governing our analytics dimensions and facts is critical to the success of analytics projects.
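One way to make the Prime Directive concrete is to treat each glossary entry as a record carrying the attributes named above, and to flag any term name that has been submitted with more than one definition. The following is a minimal Python sketch under that assumption; `BusinessTerm` and `find_semantic_conflicts` are hypothetical names, not part of any Collibra API, and the sample terms are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessTerm:
    """Hypothetical glossary entry holding the attributes the Prime Directive asks for."""
    name: str
    definition: str
    value_set: tuple[str, ...] = ()
    business_rules: tuple[str, ...] = ()
    authoritative_source: str = ""
    accountable_party: str = ""


def find_semantic_conflicts(terms):
    """Return term names that appear with more than one distinct definition."""
    definitions = defaultdict(set)
    for term in terms:
        # Normalize the name so "Customer" and "customer " count as the same term.
        definitions[term.name.strip().lower()].add(term.definition.strip())
    return sorted(name for name, defs in definitions.items() if len(defs) > 1)


# Invented example: two teams define "Customer" differently.
submitted = [
    BusinessTerm("Customer", "A party that has purchased at least one product.",
                 authoritative_source="CRM", accountable_party="Sales Ops"),
    BusinessTerm("Customer", "A party with an active account, whether or not it has purchased.",
                 authoritative_source="Billing", accountable_party="Finance"),
    BusinessTerm("Order Date", "The date the order was placed.",
                 authoritative_source="OMS", accountable_party="Fulfillment"),
]

print(find_semantic_conflicts(submitted))  # ['customer']
```

Neither "Customer" definition is wrong; the check only surfaces the conflict so the stewards can agree on one definition (or two distinctly named terms).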

Over my 20+ years as a consultant implementing analytics projects, I have found that a significant root cause of struggling BI and analytics projects is the lack of agreement on the definitions of the dimensions and facts. Oh, sure, we all think we agree on what a customer is or how to compute customer-lifetime-value, but often we don’t. That is why our analytics reports show conflicting numbers for what is seemingly the same metric. Yet none of our differing definitions is wrong, per se; they simply come from different views, and that leads to semantic confusion. While the problem can be simply stated, the solutions are often very complex, and that is why we struggle to succeed with an analytics project without data governance.

Your business glossary should focus on enabling all analytics processes and people to easily find, understand, and trust the data they should be using, and not the data they shouldn’t be using. Effective governance allows the right people to use the right data for the right business purpose, at the right time, with the right technology. We could use data incorrectly if the definition, value set, business rules, authoritative source, and usage limitations aren’t clear. We could make erroneous decisions that increase the risks of doing business. Without a good understanding of the data and its usage, we could create very sexy, technically correct, but very inaccurate reporting.

I find that the most effective and expedient approach is to begin with the data, both dimensions and facts, that we have or want to have on a set of analytics reports. Let’s call this data our critical data elements (CDEs). It’s easier for our business and analytics teams to talk about the CDEs that each report needs. You can use this approach with data analysts, business managers, and data scientists. The type of data source does not matter to the approach: the data can come from a data lake, a data mart, an application, or even a spreadsheet.

We use the CDEs to determine the scope of effort, that is, the scope of the iteration in which we implement governance. At this point, the CDEs can be discussed as the data on the report: the organization and filtering of the report, the columns, the computation of each column, and the summarizations. I suggest we label the data governance processes as “governance as you need it.” You need to govern the CDEs that will be included in a report or set of reports. This is a very practical approach that seems to resonate with business teams, and it lets you control the scope of the governance project. Try to keep the scope to 50 to 75 CDEs. This should allow you to complete an implementation in two to four months.
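Scoping by CDEs can be illustrated as taking the union of the CDEs used by the candidate reports, counting a shared CDE such as “Customer” only once because it is governed only once, and deferring reports that would push the iteration past the budget. The Python sketch below assumes reports are simply walked in priority order; the report names, CDE names, and the `plan_iteration` helper are hypothetical, and the post does not prescribe how reports should be prioritized.

```python
# Hypothetical report definitions: report name -> the CDEs (dimensions and facts) it uses.
REPORTS = {
    "Monthly Sales": {"Customer", "Region", "Order Date", "Net Revenue"},
    "Customer Retention": {"Customer", "Region", "Churn Flag", "Customer Lifetime Value"},
    "Regional Margin": {"Region", "Net Revenue", "Cost of Goods Sold", "Gross Margin %"},
}


def plan_iteration(report_cdes, max_cdes=75):
    """Walk the reports in order and keep each one whose CDEs still fit within the budget.

    Shared CDEs are counted once because they are governed once.
    Reports that do not fit are left for a later iteration.
    """
    selected, scope = [], set()
    for report, cdes in report_cdes.items():
        candidate = scope | cdes
        if len(candidate) <= max_cdes:
            selected.append(report)
            scope = candidate
    return selected, scope


# A tiny budget of 6 keeps the demo readable; the text suggests 50 to 75 in practice.
selected, scope = plan_iteration(REPORTS, max_cdes=6)
print(selected)  # ['Monthly Sales', 'Customer Retention']
print(sorted(scope))
```

With the default budget of 75, all three reports fit, since together they use only eight distinct CDEs.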

You want relatively short implementation timeframes to:

  • Produce business value quickly
  • Show progress in the data governance program
  • Show progress while developing your governance processes and educating the business and technical staff on the processes and technology
  • Establish well-understood and trusted reports
  • Reduce the conflicting reporting and political issues
  • Make steady progress toward eliminating semantic confusion across the enterprise

Again, the data governance objective is to leverage the business glossary to help data and reporting consumers find, understand, and trust the data under governance.

Okay, now you might say, “Well, great, Lowell, I have some CDEs, but now what? How do we get approved data assets under governance and certified BI/analytics reporting?” Here’s your answer.

The CDEs provide a shared scope for the analytics and data governance teams, which work in parallel to complete an implementation. I’m going to focus on the activities of the data governance team, but both teams must work together.

Once we have a list of CDEs, then the data governance team can execute a top-down governance effort similar to the following:

  1. Engage with the business stewardship resources to define and document the CDEs as business assets (define business assets).
      1. Each CDE should be defined as a business term in the business glossary. This is for both the analytics dimensions and facts, as well as the calculation or model components even if they are not persisted in a database.
      2. Abbreviations, business rules, quality rules, and quality thresholds should be documented.
      3. Roles such as data owner, accountable person, and business steward should be defined.
      4. Any CDE that has security or privacy constraints should be tagged in the business glossary.
      5. Standards and associated policies should be defined as well.
  2. Engage with the technical stewardship resources of the CDE source databases/applications to define and document the physical data assets and IT assets (define the data assets).
      1. All CDEs that are persisted in a database column will be documented as data assets.
      2. All physical characteristics, data values, rules, and domains should be documented.
      3. Technical stewards, application owners, etc. should be defined.
  3. Engage the business and technical stewards to map the relationship of the data assets and columns on the reports to the business assets (map business and data assets).
      1. An analysis needs to be done on the business assets and data assets to ensure that all assets are mapped and that all columns on the reports are defined as assets in the business glossary.
      2. We may find that some data assets have no business asset defined for them.
      3. We may have report columns that are just calculations or components in a model and thus assets are mapped to the computation of the report column (such as percentages or averages).
      4. Technical stewards and accountable individuals should be defined.
  4. Document the data quality metrics for the data assets (determine data quality fit for purpose).
      1. Data quality metrics should be computed with the business rules established at the business asset level.
      2. Where one business asset is mapped to multiple data assets, data quality must be computed at each physical source. This will help the stewards determine the best authoritative source for reporting.
      3. Fitness for purpose should be discussed with the analytics consumers to define the quality expectations that must be met before they can trust the data for their intended use.
      4. Data stewards and owners should establish processes to meet the fit for purpose quality.
  5. Engage with the subject matter experts or technical data stewards to define and document the data lineage and traceability of the data assets (support consumers’ understanding and trust).
      1. Import data integration metadata to help define the lineage and traceability.
  6. Define analytical reports in a report catalog (define critical reporting in a catalog).
      1. This is where the analytics development team and the data governance team have to coordinate.
      2. The report catalog is a responsibility of the report developers, not the data governance team. I often assign this responsibility to the business teams or BI/analytics teams.
      3. Self-service reporting can leverage the report catalog and enhance the catalog as well.
  7. Document all report elements or the columns in each report and map those to data assets or business assets (define report elements, lineage, and traceability). This completes the mapping of the business asset to data asset to report assets.
      1. Request the report developer to define the report element, rules, and any computations.
      2. Request that the report developer have the business stewards, technical stewards, and the party responsible for the report approve the lineage from report element, to report, to data asset, to business asset.
      3. Ensure that all mappings, assets, lineage, and traceability are documented in the business glossary.
      4. Request the stewards and analytics team to certify each report. Given that we know the full traceability of the data and all assets, we can consider the reports to be certified.
      5. Full testing and acceptance of the BI/analytics reports must be done as well. Any change to the documented assets must be reflected in the business glossary as well as in the analytics reporting application.
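Steps 3 and 7 boil down to keeping three kinds of mappings consistent: data assets to business assets, report columns to assets, and the resulting lineage chain. The Python sketch below is a hypothetical in-memory stand-in for what a business glossary would hold as asset relations; it is not Collibra’s API, and `Glossary`, `trace`, `unmapped_data_assets`, `ready_to_certify`, and all asset names are illustrative assumptions. It checks only traceability; the steward approvals and testing described above are still required before a report is certified.

```python
from dataclasses import dataclass, field


@dataclass
class Glossary:
    """Hypothetical in-memory stand-in for the mappings kept in a business glossary."""
    business_assets: set[str] = field(default_factory=set)
    # Physical data asset (e.g. "crm.customer.cust_id") -> business asset it implements.
    data_to_business: dict[str, str] = field(default_factory=dict)
    # (report, column) -> the data asset or business asset the column is mapped to.
    report_to_asset: dict[tuple[str, str], str] = field(default_factory=dict)

    def trace(self, report, column):
        """Return the lineage report element -> data asset -> business asset, or None if broken."""
        asset = self.report_to_asset.get((report, column))
        if asset is None:
            return None  # the report column is not mapped yet
        path = [f"{report}.{column}", asset]
        if asset in self.data_to_business:
            business = self.data_to_business[asset]
            if business not in self.business_assets:
                return None  # data asset points at an undefined business asset
            path.append(business)
        elif asset not in self.business_assets:
            return None  # mapped to something the glossary does not define
        return path

    def unmapped_data_assets(self, data_assets):
        """Data assets still missing a business asset (step 3)."""
        return sorted(a for a in data_assets if a not in self.data_to_business)

    def ready_to_certify(self, report, columns):
        """True when every report column traces back to a business asset (step 7)."""
        return all(self.trace(report, c) is not None for c in columns)


g = Glossary(
    business_assets={"Customer", "Net Revenue", "Gross Margin %"},
    data_to_business={"crm.customer.cust_id": "Customer",
                      "dw.sales.net_rev_amt": "Net Revenue"},
    report_to_asset={("Monthly Sales", "Customer"): "crm.customer.cust_id",
                     ("Monthly Sales", "Net Revenue"): "dw.sales.net_rev_amt",
                     # A calculated column with no physical source maps to the business asset.
                     ("Monthly Sales", "Margin %"): "Gross Margin %"},
)

print(g.trace("Monthly Sales", "Net Revenue"))
# ['Monthly Sales.Net Revenue', 'dw.sales.net_rev_amt', 'Net Revenue']
print(g.ready_to_certify("Monthly Sales", ["Customer", "Net Revenue", "Margin %"]))  # True
print(g.ready_to_certify("Monthly Sales", ["Customer", "Region"]))  # False
print(g.unmapped_data_assets(["crm.customer.cust_id", "crm.customer.region_cd"]))
# ['crm.customer.region_cd']
```

Mapping a calculated column such as “Margin %” directly to a business asset mirrors the case in step 3 where report columns are computations that are never persisted in a database.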

Wow, that was easy to put on paper; however, it is not so easy to organize, communicate, educate, and have the resources complete data governance activities in a timely manner. Yet, if you can get the governance efforts completed in coordination with the BI/analytics project, then you should provide significant value and likely be considered much more successful. Just don’t forget that this maturity of alignment is required before you can achieve the Prime Directive. And, remember: stay calm and allow your business glossary to prosper.

Lowell is responsible for directing thought leadership and data governance advisory services for the Collibra Customer Success team. He has been a practitioner and executive in the data management industry for three decades. Lowell is a co-author of two books, a columnist, and a frequent conference speaker, as well as a contributor to the DAMA-I Data Management Body of Knowledge (DMBoK).