For business and technical users alike, reference data impacts daily operations. In order to optimize data use and availability, organizations need to know what reference data is, what it is not (i.e. master data), why it is important, and how to efficiently manage it with technology.
In his book Managing Reference Data in Enterprise Databases, Malcolm Chisholm, a world-renowned data management thought leader, defines reference data as “any data used solely to categorize other data found in a database, or solely for relating data in a database to information beyond the boundaries of the enterprise.”
Reference data carries meaning. It establishes permissible values, facilitates consistency, and maps internal data against external data and/or standards. Although it accounts for a small share of total data volume, reference data makes up 25% to 50% of tables in databases and affects reporting accuracy and data governance.
Examples of reference data
Many reference data assets are maintained by standards bodies like ISO or by industry consortia. Some examples are:
- Country codes
- Measurement units
- Financial hierarchies
- Products and pricing
- Exchange codes
- USPS postal codes
Reference data characterizes data and relates data to information in both internal and external databases. It can be as simple as specifying that all customer phone numbers must be ten digits in a customer relationship management (CRM) tool. These defined sets rarely change, and data users consistently use them in lookup tables, drop-down lists or pre-filled forms.
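As a minimal sketch of how such a defined set might be applied, the Python below checks a record against the ten-digit phone rule and a small lookup table of permitted values. The `CUSTOMER_SEGMENTS` set, the field names and the `validate_record` function are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical reference code set: the permitted values for a "segment" field.
CUSTOMER_SEGMENTS = {"RETAIL", "WHOLESALE", "GOVERNMENT"}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in a record (empty if it is valid)."""
    problems = []

    # Keep only digits so "(555) 123-4567" and "5551234567" are treated alike.
    digits = re.sub(r"\D", "", record.get("phone", ""))
    if len(digits) != 10:
        problems.append("phone must contain exactly ten digits")

    # The segment must be one of the values in the reference set.
    if record.get("segment") not in CUSTOMER_SEGMENTS:
        problems.append(f"segment {record.get('segment')!r} is not a permitted value")

    return problems
```

A record such as `{"phone": "(555) 123-4567", "segment": "RETAIL"}` returns an empty list, while a record with a short phone number and an unknown segment returns one message per problem.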
However, not all code sets are so cut and dried. Take country codes: they seem simple, but even the International Organization for Standardization (ISO) defines them in more than one way. ISO 3166-1 alone specifies a two-letter (alpha-2), a three-letter (alpha-3) and a three-digit numeric code for each country.
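To make the point concrete, here is a small sketch of the same country expressed in the three ISO 3166-1 forms (alpha-2, alpha-3 and numeric). Only three countries are listed, and the `Country` type and `find_country` helper are hypothetical names used for illustration.

```python
from typing import NamedTuple, Optional


class Country(NamedTuple):
    name: str
    alpha2: str   # two-letter code
    alpha3: str   # three-letter code
    numeric: str  # three-digit code, kept as a string to preserve leading zeros


# A tiny illustrative subset; a real table covers every country and territory.
COUNTRIES = [
    Country("United States of America", "US", "USA", "840"),
    Country("Germany", "DE", "DEU", "276"),
    Country("Belgium", "BE", "BEL", "056"),
]


def find_country(code: str) -> Optional[Country]:
    """Look up a country by any of its three codes, ignoring case."""
    wanted = code.strip().upper()
    for country in COUNTRIES:
        if wanted in (country.alpha2, country.alpha3, country.numeric):
            return country
    return None
```

Storing the numeric code as text is a deliberate choice in this sketch: treating `056` as an integer would silently turn it into `56` and break lookups against systems that keep the leading zero.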
Reference data can also change over time, so organizations need to continuously refresh and manage data to maintain quality. For instance, country codes change an average of 3-5 times per year, and currency codes change an average of 5 to 10 times per year. Organizations use, customize and extend numerous existing industry ontologies to meet changing needs over time; as a result, they need to maintain consistency with the original standards to prevent drift from the external semantics. Any inconsistencies can impair decision making and diagnoses, and incur liability. To avoid these inconsistencies and minimize the consequences of poor reference data management, organizations need to make use of robust governance practices and policies.
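One simple way to notice this kind of change is to compare a locally held copy of a code set with the latest published version. The sketch below assumes both are available as plain code-to-name dictionaries; the `diff_code_sets` function and the sample values are illustrative only.

```python
def diff_code_sets(local: dict[str, str], standard: dict[str, str]) -> dict[str, list[str]]:
    """Compare a local copy of a code set with the latest published standard."""
    return {
        # Codes the standard has that the local copy lacks.
        "added": sorted(standard.keys() - local.keys()),
        # Codes the local copy still carries but the standard no longer lists.
        "removed": sorted(local.keys() - standard.keys()),
        # Codes present in both whose description differs.
        "changed": sorted(
            code
            for code in local.keys() & standard.keys()
            if local[code] != standard[code]
        ),
    }


# Illustrative sample data, not a complete or authoritative list.
local_copy = {"SZ": "Swaziland", "TR": "Turkey", "AN": "Netherlands Antilles"}
published = {"SZ": "Eswatini", "TR": "Türkiye", "SS": "South Sudan"}
report = diff_code_sets(local_copy, published)
```

With this sample data, `report` lists `SS` as added, `AN` as removed, and `SZ` and `TR` as changed, which a data steward could then review before the local copy is updated.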
Reference data in the market
Reference data is relevant to organizations across the globe and across industries. However, enterprises in heavily regulated industries are especially dependent on accurate reference data.
Reference data in healthcare
Reference data is commonplace in the healthcare industry. Healthcare companies produce and use immense volumes of data every day, impacting decisions around clinical efficacy, product recalls and patient trial recruitment.
Below are two examples of healthcare companies that initially struggled with reference data but, with effective data governance, now use it as an asset:
- Cigna, a multinational health services organization, manages sensitive data and balances a number of regulations. The company struggled with inaccurate customer master data, with no context around the data, what metrics they summarized, and how the data was used. Cigna used reference data to get stakeholders speaking in the right language. Business and technical lineage allowed them to get the context they needed around data. Workflow management, stewardship and the centralized platform enabled cross-functional stakeholders to collaborate with each other. As a result, the company established a culture of data transparency, where individuals can speak in a common language and collaborate around data efficiently.
- Fresenius Medical Care used to struggle with reporting because it generates incredible volumes of data from more than 347,000 dialysis patients at 4,000 clinics, 45 production sites, and 120,000 employees across the globe. The company’s data stewards leveraged reference data to assign characteristics, relations, and groupings and pushed the governed medical information into their data warehouse. The data warehouse performs change data capture and distributes reference data to tables for analytical end user consumption. As a result, business analysts at Fresenius Medical Care can make trusted business decisions because they can examine reports and dashboards that are built on governed and high-quality data.
Reference data in financial services
Likewise, financial services firms use reference data on a daily basis. They rely on several standard identifier schemes, including:
- International Securities Identification Numbers (ISIN)
- Committee on Uniform Securities Identification Procedures (CUSIP)
- Stock Exchange Daily Official List (SEDOL)
- Legal Entity Identifier (LEI)
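Identifiers like these usually carry internal structure that can be checked before a value enters a system. As a sketch, the function below tests an ISIN's format and check digit, assuming the commonly documented scheme: two letters, nine alphanumeric characters and one check digit, with letters converted to numbers and the Luhn algorithm applied. The name `is_valid_isin` is hypothetical, and passing the check only shows that a value is well formed, not that the security exists.

```python
import re

# Two-letter prefix, nine alphanumeric characters, one numeric check digit.
ISIN_PATTERN = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")


def is_valid_isin(isin: str) -> bool:
    """Return True if the value has ISIN format and a consistent check digit."""
    code = isin.strip().upper()
    if not ISIN_PATTERN.fullmatch(code):
        return False

    # Letters become two-digit numbers (A=10 ... Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in code)

    # Luhn check: from the right, double every second digit and sum the results.
    total = 0
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0
```

For example, `is_valid_isin("US0378331005")` returns `True`, while changing the last digit to `6` makes the check fail.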
Reference data serves as the foundation for a number of aggregate risk reporting and market data-related applications. Firms need reference data to facilitate efficient regulatory conversations. To make matters more complex, regulations evolve and new regulations emerge regularly. New mandates add requirements about counterparty identification, trading venues and financial instruments over trade lifecycles, requiring further investment in reference taxonomies to support the interoperability required for reporting. The consequences of poorly managed reference data can be significant: opportunity costs due to unpredictable application behaviors, failed transactions, capital losses, trade processing fees and, of course, fines.
What is reference data vs. master data?
A common misconception is that reference data and master data are identical, but they are two different types of data.
Reference data is the data used to define and classify other data. Master data is the data about business entities, such as customers and products. Master data provides the context needed for business transactions.
While both reference data and master data provide context for business activities, their usage and implementation help distinguish them. First, domain and subject matter experts curate, centrally administer and publish reference data to downstream systems. Reference data often drives control logic. It categorizes data into groups before data consumers analyze them, sometimes to unify external and internal data, and other times to classify it into buckets for analysis. Put succinctly, reference data are sets of values or classification schemas that are referred to by systems, applications, data stores, processes, and reports, as well as by transactional and master records.
On the other hand, master data describes the people, places and things involved in an organization’s business. Organizations use master data to apply quality rules, manage their transaction structure data and enterprise structure data to create a single golden record.
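The distinction can be sketched in a few lines of Python: the reference data is a pair of small code sets, and the master data is a customer record whose fields point at those codes. All names and values here (`COUNTRY_NAMES`, `CUSTOMER_SEGMENTS`, `Customer`, `describe`) are hypothetical and serve only to illustrate the relationship.

```python
from dataclasses import dataclass

# Reference data: small, shared sets of codes that classify other data.
COUNTRY_NAMES = {"US": "United States of America", "DE": "Germany"}
CUSTOMER_SEGMENTS = {"RET": "Retail", "WHS": "Wholesale"}


@dataclass
class Customer:
    """Master data: one record describing a business entity."""
    customer_id: str
    name: str
    country_code: str  # refers to a key in COUNTRY_NAMES
    segment_code: str  # refers to a key in CUSTOMER_SEGMENTS


def describe(customer: Customer) -> str:
    """Resolve the record's codes against the reference sets for display."""
    country = COUNTRY_NAMES[customer.country_code]
    segment = CUSTOMER_SEGMENTS[customer.segment_code]
    return f"{customer.name} ({segment}, {country})"
```

Here `describe(Customer("C-001", "Acme GmbH", "DE", "WHS"))` yields `"Acme GmbH (Wholesale, Germany)"`. The customer record can change often, while the two code sets change rarely and are shared by every record that refers to them.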
Why is reference data important?
Reference data affects every part of the organization because it helps provide context to data. It affects data quality and, in turn, data usability. Efficient reference data management is necessary for organizations aiming to achieve Data Intelligence.
Reference data use cases
Organizations use reference data to address a number of use cases. For example:
- Agreed-upon metrics and hierarchies – Shared understanding across the organization helps build common metrics and hierarchies that can be easily leveraged for efficient operations
- Clear data controls – Managing reference data access control helps establish ownership and accountability. It goes a long way in improving the data governance that is essential for trust in data
- Trust in data quality – Consistent reference data usage can help build a single trusted view of data across the organization
- Faster delivery of insights from data – Streamlining and automating reference data management seamlessly provisions it to all stakeholders. With access to quality reference data, business users can quickly derive insights from data, powering their business decisions
Consequences of poor reference data management
Misalignment of data and manual management of reference data pose many challenges and real consequences, such as:
- Insufficient governance – Organizations often have dozens or even hundreds of applications that hold data used by different people and different teams. Fragmented data and applications cause misalignment across the organization and make it difficult to formalize information, standards and processes. Because most organizations handle data governance activities manually, the result is slow and error-prone change management and fragmented, inconsistent reference data across the enterprise.
- Inaccurate reporting and analytics – Inconsistent code values result in inaccurate and untrustworthy reporting and analytics. For example, business analysts examine data and make recommendations for critical decisions using reports based on regions, business units or territories, all of which represent reference data. If each source uses different code value sets, manual intervention becomes necessary to ensure the accuracy of data aggregation and business analytics.
- Inefficient operations – In order to get the most out of data, data stewards need to monitor and refresh reference data consistently. However, manual reference data management is slow, prone to errors and not scalable. As an organization grows, this management becomes heavier and more complex, magnifying the operational and financial repercussions.
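The reporting problem described in the list above, where each source uses its own code values, can be sketched as follows. The crosswalk maps each source system's local region code to a shared code before totals are computed, and anything unmapped is set aside for a steward rather than silently dropped. The system names, codes and the `revenue_by_region` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical crosswalk: (source system, local code) -> shared region code.
REGION_CROSSWALK = {
    ("crm", "N. America"): "NA",
    ("erp", "NORTHAM"): "NA",
    ("crm", "Europe"): "EU",
    ("erp", "EMEA-EU"): "EU",
}


def revenue_by_region(rows):
    """Aggregate (source, local_code, amount) rows under shared region codes.

    Returns the totals per shared code and the list of unmapped
    (source, local_code) pairs that need review.
    """
    totals = defaultdict(float)
    unmapped = []
    for source, local_code, amount in rows:
        shared = REGION_CROSSWALK.get((source, local_code))
        if shared is None:
            unmapped.append((source, local_code))
            continue
        totals[shared] += amount
    return dict(totals), unmapped


rows = [
    ("crm", "N. America", 100.0),
    ("erp", "NORTHAM", 50.0),
    ("erp", "EMEA-EU", 25.0),
    ("erp", "APAC", 10.0),
]
totals, unmapped = revenue_by_region(rows)
```

In this example the two North America rows are combined under `NA`, and the `APAC` row is reported as unmapped because the crosswalk has no entry for it.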
How do organizations manage reference data?
A reference data management tool is a mechanism that defines business processes around reference data and helps data stewards populate and manage it over time. Such a tool:
- Automates workflows to create new codes and code sets
- Delivers codes and code sets to data users
- Maps data
- Compares data from various parts of the organization
Required capabilities for managing reference data
In order to effectively manage reference data, organizations need a suite of capabilities. An efficient reference data management solution must manage complex relationships across the enterprise. Organizations must invest in a data governance solution with native reference data management and additional lineage, stewardship and workflow capabilities to resolve inconsistencies in the data:
- Data governance – Data governance and reference data management go hand in hand. Data governance tools with native reference data management allow a complete audit trail and full visibility into processes, ownership and stewardship roles, and a shared understanding of reference data
- Data lineage – A capability for mapping reference data from different sources to shared code sets and linking it to relevant terms for business and technical context
- Workflows – Clearly defined and automated processes to facilitate collaboration and resolution of data inconsistencies
- Stewardship – A system for managing tasks, roles and responsibilities to facilitate management as the data ecosystem evolves
- Policy management – A tool for creating, reviewing and updating data policies to ensure adoption and maintain compliance
Managing reference data with Collibra
Many organizations use Collibra to manage their reference data. By leveraging Collibra’s products and capabilities like Data Governance, Collaborative Workflows, Data Stewardship, and more, our customers manage their reference data in the context of other initiatives and achieve Data Intelligence, all from one platform.