Data quality in healthcare: challenges and opportunities


Back in February 2021, when vaccine supply was still limited, a young British man with no underlying health conditions was surprised to receive an offer of the COVID-19 vaccine as part of the high-risk group with excessive BMI. When his record was examined, his height turned out to be listed as 6.2 cm instead of 6 feet 2 inches, which produced a wildly wrong BMI. It is a classic example of how a data anomaly, in this case an out-of-range value, skews outcomes.
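To see how far one mis-keyed unit pushes the result, here is a minimal Python sketch of the BMI arithmetic. The 80 kg weight and the plausible height range of 50 to 250 cm are hypothetical values chosen for illustration, not figures from the case. The check simply rejects an implausible height before BMI is computed.

```python
# Hypothetical plausibility bounds for an adult's height, in centimetres.
MIN_HEIGHT_CM = 50.0
MAX_HEIGHT_CM = 250.0


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kg divided by height in metres squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def checked_bmi(weight_kg: float, height_cm: float) -> float:
    """Refuse to compute BMI from an implausible height."""
    if not MIN_HEIGHT_CM <= height_cm <= MAX_HEIGHT_CM:
        raise ValueError(f"height {height_cm} cm is outside the plausible range")
    return bmi(weight_kg, height_cm)


weight_kg = 80.0                  # hypothetical weight
correct = bmi(weight_kg, 188.0)   # 6 ft 2 in is about 188 cm
mis_keyed = bmi(weight_kg, 6.2)   # the value recorded in error
print(f"{correct:.1f} vs {mis_keyed:,.0f}")

try:
    checked_bmi(weight_kg, 6.2)
except ValueError as err:
    print("rejected:", err)
```

With the hypothetical weight, the correct height gives a BMI of roughly 22.6, while 6.2 cm gives a value above 20,000. A range check at the point of entry catches the error before it reaches any downstream decision.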

There are other examples. In England, a single technical glitch let nearly 16,000 COVID-19 cases go unreported. The Washington Health Department had to roll back data posted on its dashboard after errors were discovered. And Spain’s child COVID-19 mortality rate was wrongly reported as 54% instead of 7%, causing unnecessary panic.

These reported stories make the case for data quality in healthcare about as plainly as it can be made. As we all struggle with the pandemic, getting the data “right” on daily cases and hospital beds is essential for planning the response.

Give your clinical and business leaders the power to take control of your data and improve its quality with Collibra’s platform. See how Froedtert & the Medical College of Wisconsin achieved success in their customer story and learn how we can help you too.

Why data quality in healthcare is critical

Brian Bradbury of the Center for Observational Research makes a strong case for using the power of data and analytics to advance the understanding of diseases, therapeutic interventions, and clinical outcomes. The benefits include the following:

  • Improved patient care response
  • Consolidated patient summary
  • Preventive alerts
  • Efficient patient service
  • Optimized supply chain management
  • Advanced risk and disease management
  • Research and innovative solutions 

You’ll find data quality critical to any industry, and even more so in healthcare, where lives are at stake. “Bad data has immediate consequences,” says Thomas C. Redman, the Data Doc. A survey conducted by the Ponemon Institute revealed that mismatched patient data is the third leading cause of preventable death in the United States and is also responsible for 35% of denied insurance claims.

Challenges with poor quality data in healthcare

Healthcare data comes from diverse sources and in every type. There are profiles of patients, care providers, and pharmaceutical companies, and lists of diseases, diagnostic tests, and treatment options that grow longer every day. There is also visual data in the form of scans, images, and graphs. Databases fill up with admission, diagnostic, treatment, and discharge records. And on top of all these data types, the data you use must comply with evolving regulatory requirements.

Hospitals must ask: what if this data is incomplete, inaccurate, or inconsistent? What if it is out of date? How can you know whether it is valid? How confidently can you make decisions based on it? Can your patients trust you to make the best decisions for them?

In an assessment study published by HBR, the mean data quality (DQ) score in the healthcare industry was just 55%, strikingly low for an industry that provides such essential services. This is because data quality in healthcare faces some unique challenges:

  • Temporary fixes: Data can go unchecked until the point of use, as with the patient’s height above. At that stage it often gets a quick fix so that it can be used immediately, but the issue is then never corrected at the source and can go on to distort data-driven analysis.
  • No accountability: While automation has transformed healthcare, it is still a human-centric industry. Data entry, management, interpretation, and sharing are often manual, and they become error-prone in emergencies. Unless you establish clear accountability, these slips keep building up. The most common reason data quality issues go unreported and uncorrected is that most people throughout the enterprise are data quality bystanders.
  • Data drift: Data drift stems mainly from concept drift, environmental drift, or upstream data changes. Increased life expectancy affecting predictions of surgery demand, and gender no longer being recorded as binary, are examples of concept drift. Environmental drift is common in healthcare because of seasonal disease outbreaks. Revisions to ICD (International Classification of Diseases) codes are a classic upstream data change, affecting how historical and streaming data are processed. Predictive data quality solutions help health experts adjust to new taxonomies, detect data drift, and make the right healthcare interventions.
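A DQ score like the 55% cited above is typically a simple proportion. The sketch below assumes one common approach: the score is the percentage of sampled records in which every critical field passes its check. It is not a description of the HBR study’s exact method, and the field names and checks are hypothetical.

```python
from typing import Callable, Mapping, Sequence

Record = Mapping[str, object]
Check = Callable[[object], bool]


def dq_score(records: Sequence[Record], checks: Mapping[str, Check]) -> float:
    """Percentage of records in which every critical field passes its check."""
    if not records:
        raise ValueError("need at least one record to score")
    perfect = sum(
        1 for rec in records
        if all(check(rec.get(field)) for field, check in checks.items())
    )
    return 100.0 * perfect / len(records)


# Hypothetical critical-field checks for a patient record.
checks = {
    "patient_id": lambda v: isinstance(v, str) and v != "",
    "height_cm": lambda v: isinstance(v, (int, float)) and 50 <= v <= 250,
    "next_of_kin_phone": lambda v: bool(v),
}

sample = [
    {"patient_id": "P1", "height_cm": 188, "next_of_kin_phone": "0123"},
    {"patient_id": "P2", "height_cm": 6.2, "next_of_kin_phone": "0456"},
    {"patient_id": "", "height_cm": 170, "next_of_kin_phone": None},
    {"patient_id": "P4", "height_cm": 160, "next_of_kin_phone": "0789"},
]
print(dq_score(sample, checks))
```

Two of the four sample records are error-free, so the score is 50.0. Counting a record as bad when any critical field fails is deliberately strict: one wrong field, such as a height, can derail a decision about the whole patient.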

Characteristics of data quality in healthcare

In healthcare, trust is key. And for trusted analytics, you need trusted data. How can you overcome the challenges of data quality to deliver trusted data across your organization? How can you keep data compliant while maintaining and improving its quality over time?

A source-agnostic predictive DQ solution lets you onboard all your data. With adaptive rules, it can identify hidden relationships and quality issues in data from any source to deliver continuous data quality.

With constant data drift detection and a robust quality assessment framework, you can confidently enable effective patient care and better risk management. Self-service access for all stakeholders ensures there are no bystanders: everyone contributes to data quality. Use machine learning-based continuous DQ to deliver the best possible response where it matters.

Methods to improve data quality in healthcare

There are a number of methods that you can use to improve data quality in healthcare, including integrated data analytics, using tools to quantify and qualify data, and having accurate and on-time data in a correct format.

1. Integrated data analytics

An integrated data analytics system helps automate the data workflow, improving data governance and cutting down on errors. In healthcare, this process usually has three phases. In the capture phase, data is delivered to the electronic health record (EHR). In the structure phase, the captured data is formatted and stored properly. In the transfer phase, the data is extracted from storage and loaded into a back-end database.
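The three phases can be sketched as a small pipeline. This is a minimal illustration, not a description of any particular EHR integration: the phase functions, the comma-separated input rows, and the admissions table are hypothetical, and Python’s built-in sqlite3 stands in for the back-end database.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Admission:
    patient_id: str
    admitted_on: date
    ward: str


def capture(raw_rows):
    """Capture: accept raw rows as entered (here, comma-separated strings)."""
    return [row.split(",") for row in raw_rows]


def structure(captured):
    """Structure: format fields into typed records; set malformed rows aside."""
    good, rejected = [], []
    for fields in captured:
        try:
            # Wrong field counts and non-ISO dates both raise ValueError.
            pid, admitted, ward = (f.strip() for f in fields)
            good.append(Admission(pid, date.fromisoformat(admitted), ward.upper()))
        except ValueError:
            rejected.append(fields)
    return good, rejected


def transfer(records, conn):
    """Transfer: load structured records into a back-end table; return row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS admissions "
        "(patient_id TEXT, admitted_on TEXT, ward TEXT)"
    )
    conn.executemany(
        "INSERT INTO admissions VALUES (?, ?, ?)",
        [(r.patient_id, r.admitted_on.isoformat(), r.ward) for r in records],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM admissions").fetchone()[0]


conn = sqlite3.connect(":memory:")
raw = ["P1, 2021-02-01, icu", "P2, 01/02/2021, ward 3", "P3,2021-02-03"]
records, rejected = structure(capture(raw))
loaded = transfer(records, conn)
print(loaded, rejected)
```

Rejected rows are returned rather than silently dropped, so they can go back to the source for correction instead of receiving a quick fix downstream.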

2. Using tools to quantify and qualify data

Deciding on the right tools and metrics, such as how complete and how valid each field is, is the first step in properly understanding and evaluating your data sets. In essence, choosing the tools you will use to evaluate your data means choosing the overall structure of your approach to it. Without a clear and comprehensive approach, you won’t get the most out of your data.
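As an example of quantifying data, the sketch below computes two common per-column metrics: completeness (share of values present) and validity (share of present values that pass a rule). The column names and validation rules are hypothetical, including the sex code set, which assumes a non-binary “X” and an unknown “U” are allowed.

```python
from typing import Callable, Mapping, Sequence


def column_profile(rows: Sequence[Mapping[str, object]],
                   validators: Mapping[str, Callable[[object], bool]]):
    """Per-column completeness (non-missing share) and validity (share of
    non-missing values that pass the column's validator)."""
    profile = {}
    for column, is_valid in validators.items():
        values = [row.get(column) for row in rows]
        present = [v for v in values if v not in (None, "")]
        completeness = len(present) / len(values) if values else 0.0
        validity = (sum(1 for v in present if is_valid(v)) / len(present)
                    if present else 0.0)
        profile[column] = {"completeness": completeness, "validity": validity}
    return profile


# Hypothetical rules for two columns.
validators = {
    "height_cm": lambda v: 50 <= v <= 250,
    "sex": lambda v: v in {"M", "F", "X", "U"},
}
rows = [
    {"height_cm": 188, "sex": "M"},
    {"height_cm": 6.2, "sex": "F"},
    {"height_cm": None, "sex": "X"},
    {"height_cm": 165, "sex": ""},
]
profile = column_profile(rows, validators)
print(profile)
```

Keeping completeness and validity separate matters: a column can be fully populated and still wrong, as the 6.2 cm height shows.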

3. Having accurate and on-time data in a correct format

Once again, this comes down to accuracy, timeliness, and consistency. Accurate data matters, but to improve your healthcare approach, data also needs to be delivered on time and in the proper format so that it is easy to use. In other words, this step comes down to processing the data effectively.
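Timeliness and format can both be checked mechanically. This minimal sketch normalises timestamps from several known source formats to one representation and flags records that arrive too long after the clinical event. The format list and the 24-hour target are hypothetical, and “%d/%m/%Y” assumes a day-first (UK-style) feed.

```python
from datetime import datetime, timedelta

# Hypothetical: formats the feeding systems are known to send.
# "%d/%m/%Y" assumes day-first dates; confirm this with each source.
KNOWN_FORMATS = ("%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M", "%Y%m%d%H%M")
MAX_DELAY = timedelta(hours=24)   # hypothetical timeliness target


def normalise_timestamp(text: str) -> datetime:
    """Parse a timestamp in any known format; raise if none matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp format: {text!r}")


def is_timely(event_time: str, received_time: str) -> bool:
    """True if the record arrived within MAX_DELAY after the clinical event."""
    delay = normalise_timestamp(received_time) - normalise_timestamp(event_time)
    return timedelta(0) <= delay <= MAX_DELAY


print(normalise_timestamp("01/02/2021 17:30").isoformat())
print(is_timely("2021-02-01 09:00", "01/02/2021 17:30"))
print(is_timely("202102010900", "2021-02-03 10:00"))
```

Unrecognised formats raise instead of being guessed, so an ambiguous feed is resolved with its source rather than patched at the point of use.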

Additional methods to improve data quality in healthcare include:

  • Ensure a data field is available to record each critical data value, and that the data creator enters the right data in the right place (e.g. capture the General Practitioner or next-of-kin telephone number so that they can be contacted in the event of an emergency or adverse incident)
  • Ensure data creators adhere to data policy while also appreciating data users’ needs
  • Ensure comprehensive categorization of data (e.g. mastering provider, payer, patient, and plan data) to meet patients’ needs
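The first two points can be enforced at the moment of entry, with feedback the data creator can act on. The sketch below is a minimal illustration: the field names, the “required” flags, and the regular expressions (including the loose phone pattern) are hypothetical rules, not a standard.

```python
import re

# Hypothetical entry rules: field -> (required?, pattern the value must match).
PHONE = re.compile(r"^\+?[0-9][0-9 ]{6,18}$")
ENTRY_RULES = {
    "patient_id": (True, re.compile(r"^[A-Z0-9]{6,12}$")),
    "gp_phone": (True, PHONE),
    "next_of_kin_phone": (True, PHONE),
    "next_of_kin_name": (False, re.compile(r"^[^0-9]+$")),  # no digits in names
}


def entry_errors(record: dict) -> list[str]:
    """Return readable problems so the data creator can fix them at entry."""
    errors = []
    for field, (required, pattern) in ENTRY_RULES.items():
        value = (record.get(field) or "").strip()
        if not value:
            if required:
                errors.append(f"{field}: required but missing")
            continue
        if not pattern.match(value):
            errors.append(f"{field}: {value!r} is not in the expected format")
    return errors


# A phone number typed into the name field, and no next-of-kin phone.
record = {
    "patient_id": "AB12345",
    "gp_phone": "020 7946 0000",
    "next_of_kin_name": "07700 900123",
}
for problem in entry_errors(record):
    print(problem)
```

The name rule catches data entered in the wrong place, which a simple “is this field filled?” check would miss.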

Driving better and compliant healthcare 

Better quality data can drive better quality healthcare and also reduce costs. Continuous data quality can unlock the value of your data and assure its compliant use:

1. No more guesswork  

Correctly linking patient data across organizations is a critical element of value-based care, patient safety, and care coordination. Duplicate or mismatched records can lead to denied claims, unnecessary diagnostic tests, privacy risks, and reporting errors. Collibra Data Quality connects all organizations with trusted, timely, and meaningful patient data while reducing the time, expense, and effort required by 70%. It provides the predictable data quality you need to deliver high-quality patient care. You can use predictive DQ to detect anomalies and build an early warning system for patient vitals.
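To illustrate how duplicate patient records can be surfaced, here is a minimal deterministic-matching sketch. It is not Collibra’s matching logic: the match key (normalised name, date of birth, postcode) and the record fields are hypothetical, and production patient matching usually adds probabilistic scoring on top of keys like these.

```python
import unicodedata
from collections import defaultdict


def normalise_name(name: str) -> str:
    """Case-fold, strip accents, and drop punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed.casefold() if ch.isalpha())


def match_key(rec: dict) -> tuple:
    """Hypothetical match key: normalised full name, date of birth, postcode."""
    return (
        normalise_name(rec["given_name"]) + normalise_name(rec["family_name"]),
        rec["dob"],
        rec["postcode"].replace(" ", "").upper(),
    )


def likely_duplicates(records: list[dict]) -> list[list[str]]:
    """Group record ids that share a match key; groups of 2+ need review."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec["record_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


records = [
    {"record_id": "A-1", "given_name": "José", "family_name": "O'Neil",
     "dob": "1980-05-17", "postcode": "SW1A 1AA"},
    {"record_id": "B-7", "given_name": "Jose", "family_name": "ONeil",
     "dob": "1980-05-17", "postcode": "sw1a1aa"},
    {"record_id": "C-3", "given_name": "Jose", "family_name": "O'Neil",
     "dob": "1981-05-17", "postcode": "SW1A 1AA"},
]
print(likely_duplicates(records))
```

Records A-1 and B-7 differ only in accents, punctuation, and postcode formatting, so they share a key; C-3 has a different date of birth and stays separate. Flagged groups go to review rather than being merged automatically.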

In one case, a group of patients suffered avoidable near-fatal incidents. Within ten minutes, Collibra Data Quality was able to find the anomalies and produce a heatmap and a time-series trend showing when the issue was introduced.
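An early warning check on vitals can start from something as simple as a rolling z-score. The sketch below is a generic technique, not the method Collibra uses; the window of five readings, the threshold of 3, and the heart-rate series are hypothetical and are not clinical guidance.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_alerts(values, window=5, threshold=3.0):
    """Yield (index, value, z) for readings far from the trailing window's mean.

    window and threshold are hypothetical tuning choices.
    """
    history = deque(maxlen=window)
    for i, value in enumerate(values):
        if len(history) == window:
            recent = list(history)
            sd = stdev(recent)
            if sd > 0:
                z = (value - mean(recent)) / sd
                if abs(z) >= threshold:
                    yield i, value, round(z, 1)
        history.append(value)


heart_rate = [72, 75, 71, 74, 73, 76, 74, 128, 75, 73]
alerts = list(rolling_zscore_alerts(heart_rate))
print(alerts)
```

Only the reading of 128 is flagged. Note that the spike then sits in the trailing window and widens it for the next few readings; a production detector might exclude flagged values from the baseline.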

2. No quality loss over time  

The predictive DQ solution creates baselines to discover data drift and generates autonomous rules to monitor it. The adaptive rules constantly learn from new data and predict issues with formatting, outliers, patterns, and relationships. Automatic data drift detection keeps patient records accurate, treatment details relevant, and information consistent across systems, delivering trusted data.
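One widely used way to compare current data against a baseline is the Population Stability Index (PSI). The sketch below is a generic PSI calculation, not Collibra’s drift algorithm. The age bins and samples are hypothetical, and the usual rule of thumb (below 0.1 stable, above 0.25 significant drift) is a convention rather than a standard.

```python
import math
from bisect import bisect_right


def bin_shares(values, edges):
    """Share of values in each bin; edges are sorted interior boundaries."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    return [c / total for c in counts]


def psi(baseline, current, edges, eps=1e-4):
    """Population Stability Index of `current` relative to `baseline`.

    eps replaces empty-bin shares so the logarithm stays finite.
    """
    b = [max(s, eps) for s in bin_shares(baseline, edges)]
    c = [max(s, eps) for s in bin_shares(current, edges)]
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


age_edges = [18, 40, 65, 80]  # hypothetical age bands at admission
baseline = [25, 34, 45, 52, 61, 67, 70, 72, 78, 85]
current = [30, 58, 66, 71, 74, 79, 81, 84, 88, 91]
print(round(psi(baseline, current, age_edges), 3))
```

Here the current sample has shifted toward older patients and the PSI comes out well above 0.25, while comparing the baseline with itself gives 0. Rerunning the check on each new batch against a stored baseline is what turns this into drift monitoring.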

3. No quality drop in data migration

Healthcare is constantly being transformed by new technology. At times you need to move data to other systems, and you worry about missing records, missing values, and broken relationships across tables or systems. Continuous DQ checks that every cell of every record matches between copies, in addition to standard row-count, column, and conformity checks. The types of upstream and downstream systems do not matter, because continuous DQ is compatible with any source and target.
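A migration reconciliation of this kind can be sketched in a few lines. This minimal example compares two in-memory copies of a table keyed on a hypothetical `id` column. It reports row counts, records missing on either side, and every cell that differs. Real migrations would stream rows from each system rather than hold them in lists.

```python
def reconcile(source: list[dict], target: list[dict], key: str) -> dict:
    """Compare two copies of a table keyed on `key`, cell by cell."""
    columns = sorted({c for row in source + target for c in row})
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    report = {
        "row_count": (len(source), len(target)),
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "cell_mismatches": [],
    }
    # Only rows present in both copies can be compared cell by cell.
    for k in sorted(src.keys() & tgt.keys()):
        for c in columns:
            if src[k].get(c) != tgt[k].get(c):
                report["cell_mismatches"].append((k, c, src[k].get(c), tgt[k].get(c)))
    return report


source = [
    {"id": "P1", "dob": "1980-05-17", "ward": "ICU"},
    {"id": "P2", "dob": "1975-01-02", "ward": "A3"},
    {"id": "P3", "dob": "1990-12-30", "ward": "B1"},
]
target = [
    {"id": "P1", "dob": "1980-05-17", "ward": "ICU"},
    {"id": "P2", "dob": "1975-02-01", "ward": "A3"},  # day and month swapped
]
report = reconcile(source, target, key="id")
print(report)
```

A row-count check alone would catch the missing P3 but not the swapped date of birth on P2; the cell-level comparison catches both.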

4. Scalable compliance 

Working in a highly regulated industry like healthcare means you need transparent reports and audits. Healthcare compliance covers numerous federal and state healthcare laws, including HIPAA (the Health Insurance Portability and Accountability Act of 1996). Complying with data privacy regulations also requires managing patients’ requests to view and update their personal data.

When you work with manual rules or fragmented tools, you need to coordinate with IT for every small change. Adaptive DQ streamlines these processes, empowering business users and compliance professionals to produce transparent reports faster. Easily scalable DQ unifies all information in one simple dashboard, with a scorecard that scores and ranks each data set.
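A scorecard that ranks data sets can be as simple as a weighted average of dimension scores. In this sketch the dimensions, weights, data set names, and scores are all hypothetical, and it is not how Collibra computes its scorecard.

```python
# Hypothetical dimension weights; they sum to 1.
WEIGHTS = {"completeness": 0.4, "validity": 0.3, "timeliness": 0.3}


def overall_score(dimensions: dict[str, float]) -> float:
    """Weighted average of per-dimension scores, each on a 0-100 scale."""
    return sum(WEIGHTS[d] * dimensions[d] for d in WEIGHTS)


def scorecard(datasets: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank data sets from best to worst overall score."""
    scored = [(name, round(overall_score(dims), 1)) for name, dims in datasets.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


datasets = {
    "admissions": {"completeness": 98, "validity": 95, "timeliness": 90},
    "lab_results": {"completeness": 99, "validity": 97, "timeliness": 99},
    "discharges": {"completeness": 85, "validity": 92, "timeliness": 70},
}
for name, score in scorecard(datasets):
    print(f"{name:<12} {score:5.1f}")
```

Keeping the weights in one place makes the ranking auditable: a compliance reviewer can see exactly why one data set outranks another.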

Consider complementing continuous data quality in healthcare with a robust data governance foundation. Take it one step further with a complete data intelligence solution to help you leverage metadata, work with shared definitions, and optimize quality processes. This combined solution ensures that data quality in healthcare is addressed at the source, delivering better and compliant healthcare for all.

Want to learn more about Collibra Data Quality?

Request a demo
