The use of Artificial Intelligence (AI) is gaining momentum in healthcare. In a survey conducted by Sage Growth Partners, 90% of providers indicated that they have an AI strategy: 41% were at the planning stage, while 25% were in the early implementation phase.
While the use of AI can offer several benefits across the care continuum, such as AI-assisted robotic surgery, precision medicine, genomics and clinical research, there are also several notable barriers to using AI.
Since AI relies heavily on data, the integrity and quality of the underlying data upon which AI models are trained are critical to ensure accuracy and mitigate the risk of model bias and opacity. To drive successful AI initiatives at scale, healthcare institutions will require a comprehensive AI governance strategy as a foundational pillar.
According to McKinsey (a global management consulting firm), strengthening data governance, data access, data quality, data security and interoperability is critical. Training data can quickly become a problem without proper controls in place.
Given the sheer volume of data that’s required to train AI models, a poorly thought-out process could lead to unintended consequences if data and models lack transparency. Moreover, in healthcare where patient safety and care are a top priority, the use of trustworthy data with a rigorous AI governance and data management strategy is imperative.
Some of the key challenges that healthcare institutions will need to address when embarking on AI initiatives include:
- Poor data quality – a key hurdle for healthcare organizations is that nearly 30% of healthcare costs are directly attributed to the use of poor-quality data (a combination of duplicate, incomplete, inconsistent, and erroneous data). If the underlying data upon which models are trained is of poor quality, the accuracy and quality of the resulting AI models will suffer.
- Inability to discover data – it is estimated that data scientists spend, on average, nearly 70% of their time looking for and prepping data to train models. With petabyte-scale data residing across hundreds of disparate data sources, many struggle to find the data they need in a timely manner. Moreover, even if all the data resides in a data lake, a lake that is not properly governed or cataloged can quickly turn into a data swamp, making it very difficult to find, explore, access and understand data.
- Lack of understanding of data – researchers and data scientists also require an in-depth understanding of data when building data pipelines. For instance: what are the data attributes? Is the data certified and fully governed for use? Does the data contain Protected Health Information (PHI) or Personally Identifiable Information (PII)? Is the data from a trustworthy source? These questions are very important when selecting data sets to train AI models.
- Lack of transparency and auditability – the ability to easily trace data sets and models is imperative to move toward explainable AI. Data scientists should leverage end-to-end data lineage to trace models and data sets from source to destination. They should also have detailed information with comprehensive audit trails of any changes data sets may have undergone. This will become increasingly important for building trust in models as well as for supporting regulatory compliance mandates.
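The data quality and PII screening challenges above can be sketched in code. The following is a minimal illustration (not Collibra's implementation) of profiling a batch of records for duplicates, incomplete values, and fields whose names suggest PHI/PII; the field names and records are hypothetical.

```python
from collections import Counter

# Hypothetical field names that often indicate PHI/PII; real systems use
# far richer classification than simple name matching.
PII_HINTS = {"name", "ssn", "dob", "address", "email", "phone", "mrn"}

def profile_records(records, key_field):
    """Return a simple data quality report for a list of dict records."""
    keys = [r.get(key_field) for r in records]
    dup_keys = [k for k, n in Counter(keys).items() if n > 1]
    # Count records with any missing (None or empty) values
    incomplete = sum(
        1 for r in records if any(v in (None, "") for v in r.values())
    )
    # Flag fields whose names suggest PHI/PII and may need protection
    fields = set().union(*(r.keys() for r in records)) if records else set()
    pii_fields = sorted(f for f in fields if f.lower() in PII_HINTS)
    return {
        "total": len(records),
        "duplicate_keys": sorted(dup_keys),
        "incomplete": incomplete,
        "pii_fields": pii_fields,
    }

records = [
    {"mrn": "A1", "name": "Jane Doe", "dob": "1980-01-01", "diagnosis": "I10"},
    {"mrn": "A1", "name": "Jane Doe", "dob": "1980-01-01", "diagnosis": "I10"},
    {"mrn": "B2", "name": "John Roe", "dob": None, "diagnosis": "E11"},
]
report = profile_records(records, key_field="mrn")
print(report)  # flags the duplicate MRN, one incomplete record, PII fields
```

Even a basic screen like this surfaces the duplicate, incomplete, and sensitive data the bullets above describe; governed catalogs apply such checks continuously and at scale.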
How Collibra Data Intelligence Cloud Can Help
At Collibra, we work with leading healthcare and life sciences organizations globally to help them innovate with trustworthy data.
Collibra Data Intelligence Cloud is an intelligent data platform that leverages the power of machine learning and provides a holistic and integrated approach to cataloging, governing, protecting, managing and collaborating with data at scale across on-premises, hybrid and multi-cloud environments.
With Collibra Data Intelligence Cloud, researchers and data science teams can easily collaborate with data and enable explainable AI resulting in greater transparency and auditability across data sets and models. For instance:
- The intelligent data catalog can serve as a trusted repository of metadata (data about data) that includes AI model techniques, data inputs, data features, expected outputs, and more.
- Researchers and data science teams can easily discover the data they need, ensure it’s certified for use and obtain rich business and technical context. Moreover, they can rate, rank and prioritize data sets for use, allowing them to build robust data pipelines at an accelerated pace.
- An active metadata graph coupled with end-to-end data lineage enables data scientists to gain greater clarity into the models and the underlying data. Users can trace lineage of data sets and models as well as understand data dependencies. They can obtain detailed information on the transformation data sets may have undergone.
- Machine-learning (ML)-enabled data quality and observability can help detect data drift, outliers and patterns to ensure accuracy and performance of AI and analytics models over time. Moreover, Collibra Data Quality and Observability proactively detects anomalies in data, such as missing records and values as well as broken relationships across tables or systems, enabling rapid resolution.
- Advanced capabilities in data privacy and protection help ensure sensitive data is easily identified and fully protected with role-based access control.
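The end-to-end lineage tracing described above can be pictured as a graph walk: each data set or model records its direct upstream sources, and a trace follows those links back to the origins. This toy sketch uses hypothetical asset names and is not Collibra's lineage implementation.

```python
# Hypothetical lineage graph: each asset maps to its direct upstream sources.
LINEAGE = {
    "readmission_model": ["training_set_v3"],
    "training_set_v3": ["ehr_extract", "claims_extract"],
    "ehr_extract": ["ehr_source_db"],
    "claims_extract": ["claims_source_db"],
}

def trace_upstream(asset, graph=LINEAGE):
    """Return every upstream asset (transitively) for a data set or model."""
    seen = []
    stack = list(graph.get(asset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(graph.get(node, []))
    return seen

upstream = trace_upstream("readmission_model")
print(upstream)  # the model traces back to both source databases
```

In practice the same walk, run over cataloged metadata, is what lets a data scientist answer "which sources fed this model?" for audit and compliance purposes.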
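To make the data drift detection mentioned above concrete, here is a minimal sketch of one common technique, the Population Stability Index (PSI), which compares the distribution of a feature in training data against live data. This illustrates the general idea only; it is not Collibra's data quality implementation, and the threshold of 0.25 is a common rule of thumb, not a universal standard.

```python
import math

def psi(baseline, current, bins=10):
    """Population Stability Index between two numeric samples.
    ~0 means no drift; values above ~0.25 typically indicate major drift."""
    lo, hi = min(baseline), max(baseline)

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            # Bucket each value; clamp so out-of-range values hit edge bins
            i = min(bins - 1, max(0, int((x - lo) / (hi - lo) * bins)))
            counts[i] += 1
        # Small floor avoids log(0) for empty bins
        return [max(c / len(sample), 1e-6) for c in counts]

    b, c = frac(baseline), frac(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

baseline = [i / 100 for i in range(1000)]      # uniform values on [0, 10)
stable = [i / 100 for i in range(1000)]        # same distribution: no drift
shifted = [5 + i / 200 for i in range(1000)]   # mass shifted upward: drift

print(psi(baseline, stable))   # near 0
print(psi(baseline, shifted))  # well above 0.25
```

Observability tooling runs comparisons like this continuously against incoming data, so a model trained on last year's patient population is flagged when the live population no longer looks like the training set.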
To learn more, I would like to invite you to watch the following webinars:
- The Journey to Data Intelligence in Life Sciences & Pharma – featuring Genentech/Roche Group
- Data Intelligence in a Modern Healthcare Organization – featuring Mayo Clinic
- Building the Case for Data Governance – featuring AstraZeneca
- AI: Truth or Consequences – featuring Ohio State University