Katrina’s Blog™

News and Commentary on the science and technology of drugs and medical devices, including discovery, development, manufacturing, and regulation.

The Upside and Downside of ML, AI, and Big Data

April 19, 2022 | Business, Science
Photo by Possessed Photography on Unsplash – the hands of a robot playing a keyboard, using an ML program called an expert system to mimic a human performer


Artificial intelligence, machine learning, and big data are big buzzwords in the tech and medical device fields lately. However, practitioners have yet to deliver on the promise of these technologies, partly because they are still in development, and it can be challenging to break through the hype to understand the current state of the art and its applications. So in this month’s article, I’m sharing my perspective on the history, risks, and promise of these software tools. I’m interested in your thoughts after you finish it.


For years, popular culture has anticipated machine learning (ML), artificial intelligence (AI), and data-driven models. Maybe you, like me, have been asking yourself: why are these topics so hot right now? Why are so many startup companies being formed to use ML, AI, and big data to solve problems? Will any of this activity lead to anything? Most importantly, can we use these tools to find new medicines and address long-standing critical health problems like cancer or diabetes? This article will look at the history of AI, the four types of challenges in AI predictions, why continuing validity assurance is essential, and what regulators are thinking about AI in healthcare.

The history of artificial intelligence is short. AI has been on the cultural stage since the late 19th and early 20th centuries, when science fiction authors predicted robots and intelligent machines would make our lives easier (or a lot harder, depending on the plot). Alan Turing outlined the practical concepts behind AI in his 1950 paper, Computing Machinery and Intelligence. The 1956 Dartmouth Summer Research Project on Artificial Intelligence included a model of human problem-solving skills, which sparked more research. As computers became faster and more accessible, additional algorithms were developed to transcribe, translate, and interpret spoken language, mimic decision-making processes, and play rule-based games like chess or Go. Neural networks, inspired by biology, gained broad traction in the research toolkit in the 1990s. [I recall when feed-forward networks and Bayesian inference were all the rage!] Innovations in hardware and software, including increased processing power, better graphics, greater network capacity, and inexpensive storage, are essential to the latest explosion of AI research and commercialization. These advances offer tantalizing clues to the possibilities, but the practical outcomes of research are still few.

There are four practical pitfalls at work in AI:

First, more data doesn’t mean more information. However large your dataset, it may be of low quality, even if it was expensive to acquire. It may have hidden biases, which will surface in the model’s results. Data is also shaped by how it is collected, organized, manipulated, and transformed, so data scientists must evaluate each of these factors for impact. It’s possible to wring a model out of a poor dataset, but its predictive utility may be insufficient.
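
To make that concrete, here is a minimal sketch in Python using simulated data (every name and number is hypothetical): a hidden sampling bias, in this case a collection site that happened to enroll mostly sick patients, can outweigh the real biology in the fitted model.

```python
# Minimal sketch with simulated data: a hidden collection bias can dominate a model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# True biology: a single biomarker weakly predicts disease.
biomarker = rng.normal(size=n)
disease = (biomarker + rng.normal(scale=2.0, size=n)) > 0

# Hidden bias: one collection site happened to enroll mostly sick patients.
site = np.where(disease, rng.random(n) < 0.8, rng.random(n) < 0.2).astype(float)

X = np.column_stack([biomarker, site])
model = LogisticRegression().fit(X, disease)

# The coefficient on the site flag can exceed the coefficient on the biomarker,
# so the "prediction" largely reflects where the sample came from, not biology.
print("coefficients [biomarker, site]:", model.coef_[0])
```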

Second, correlation is not causation. This principle of science is confirmed in studies at the bench and in the clinic, in every field. Scientific journals are filled with observational and natural-history studies that point to potential connections, yet these connections often prove weak when examined using modern biology and chemistry tools. Similarly, analysis of big datasets can suggest patterns that look like rules but are not reproducible in different cohorts.
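
A quick illustration of why this happens: screen enough variables and a few will correlate with any outcome purely by chance, and those chance correlations evaporate in the next cohort. The sketch below uses only synthetic noise, so any “biomarker” it finds is an artifact by construction.

```python
# Minimal sketch: chance correlations in a high-dimensional screen do not replicate.
import numpy as np

n_patients, n_features = 200, 5000

def screen_cohort(seed):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_patients, n_features))   # pure-noise "biomarkers"
    y = rng.normal(size=n_patients)                 # pure-noise "outcome"
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    # Pearson correlation of every feature with the outcome
    return Xc.T @ yc / (np.sqrt((Xc**2).sum(axis=0)) * np.sqrt((yc**2).sum()))

r_discovery = screen_cohort(seed=1)
top = int(np.argmax(np.abs(r_discovery)))           # the "most promising" feature
r_replication = screen_cohort(seed=2)               # an independent cohort

print(f"feature {top}: r = {r_discovery[top]:+.2f} in discovery, "
      f"r = {r_replication[top]:+.2f} in replication")
# Expect roughly |r| ~ 0.3 in discovery collapsing toward 0 in replication.
```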

Third, models depend as much on their logical frameworks as on their source data. For example, multiple weather models exist, each established by a different team using similar data but alternate methods of estimating the contributions of numerous components. We often see these various models in play when forecasters predict a big storm. Repeatedly comparing and adjusting the models against observations has identified new factors and refined the primary logical frameworks. Still, there remains plenty of controversy about the impact of different elements.
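
The same effect is easy to reproduce outside of meteorology. In the sketch below (synthetic data, hypothetical scenario), a linear model and a tree ensemble are trained on identical observations; ask them about conditions outside the well-sampled range and their “forecasts” diverge, because their underlying frameworks differ.

```python
# Minimal sketch: same data, two modeling frameworks, two different forecasts.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=200).reshape(-1, 1)           # observed conditions
y = 2.0 * x.ravel() + rng.normal(scale=3.0, size=200)     # noisy measurements

linear = LinearRegression().fit(x, y)                     # framework 1: global trend
forest = RandomForestRegressor(random_state=0).fit(x, y)  # framework 2: local averaging

x_new = np.array([[15.0]])                                # a condition never observed
print("linear framework forecast:", linear.predict(x_new))  # extrapolates the trend (~30)
print("forest framework forecast:", forest.predict(x_new))  # plateaus near the data edge (~20)
```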

Fourth, even with a lot of good data, the postulated problem may be too difficult to solve right now. Weather scientists have access to over a century of highly accurate daily or hourly measurements and a wealth of time-sequence data from geologic history. Nevertheless, they acknowledge the gaps in their knowledge and understanding of the current factors driving weather patterns and the significant potential for contribution from as yet unknown factors. Scale that challenge to the variation inherent in biology and our mere 70-80 years of health science data, and it should be no surprise that predictive medical models are still weak. Fundamentally, a well-developed basic understanding of a disease mechanism remains essential to clinical performance and treatment, regardless of how much data is available.

To address these pitfalls, all AI models need “ground-truthing” to assure their performance. A primary principle of these technologies is dividing the formative data into training and validation sets. Data scientists use the training set to establish the model and the validation set to confirm it is working as expected. Ground-truthing goes further: it involves collecting data from a completely independent source (ideally of a different type or origin) to critically assess the precision and accuracy of the model. In weather prediction, ground-truthing can be simple, since accurate weather data is collected and publicly distributed every day on a global scale. In medical diagnosis, the collection of orthogonal data may be challenging, especially when the models don’t disclose their source data.

Yet failure to ground-truth a model may have severe consequences. For example, the Epic Sepsis Model (ESM), a proprietary method used at hundreds of US hospitals, was shown in a recent JAMA study to be a poor predictor of the onset of sepsis. The researchers found that the ESM identified only 7% of the study patients with clinically determined sepsis who did not receive timely antibiotics, and it failed to predict 67% of the patients who ultimately received a sepsis diagnosis. The model also generated alerts for many more patients who did not have sepsis. Karandeep Singh, the study’s lead author, notes his concern that, while proprietary predictive models are widely used in healthcare, there are few published studies on their effectiveness, and the vendors disclose little about how they perform. Comparing the alerts to medical billing codes for sepsis treatment rather than to actual clinical care often improves the apparent performance of the models, but this may not be an appropriate choice. Clear standards for validation, confirmation, and comparison of AI systems will be essential in applications like medical diagnosis, where failures have critical consequences.
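
For readers who want to see what that evaluation looks like in practice, here is a minimal sketch in Python using entirely simulated data. Nothing in it reflects the actual ESM or the JAMA study; the point is simply that internal validation on a held-out split is not the same as ground-truthing an alert model against an independent, clinically labeled cohort, where sensitivity and false-alert burden are measured directly.

```python
# Minimal sketch (simulated data): train/validation split plus external ground-truthing.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, precision_score

def make_cohort(n, shift=0.0, seed=0):
    """Simulated vitals/labs; `shift` mimics a hospital with a different case mix."""
    rng = np.random.default_rng(seed)
    X = rng.normal(loc=shift, size=(n, 10))
    logits = X[:, 0] + 0.5 * X[:, 1] - 2.0            # rare positive class
    y = rng.random(n) < 1.0 / (1.0 + np.exp(-logits))
    return X, y.astype(int)

# Formative data: split into training and (internal) validation sets.
X, y = make_cohort(5000, seed=10)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Ground truth: an independent cohort from a different source with clinically confirmed labels.
X_ext, y_ext = make_cohort(5000, shift=0.5, seed=20)

for name, (Xs, ys) in {"internal validation": (X_val, y_val),
                       "external ground truth": (X_ext, y_ext)}.items():
    alerts = model.predict(Xs)
    print(f"{name}: sensitivity={recall_score(ys, alerts):.2f}, "
          f"precision={precision_score(ys, alerts, zero_division=0):.2f}")
```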

How can regulators approach this problem for medical diagnostics? AI in healthcare is classified as Software as a Medical Device (SaMD) by the FDA and by the other global regulatory agencies in the International Medical Device Regulators Forum (IMDRF). While comprehensive FDA guidance is not yet available, the agency has shared some initial principles on its SaMD website. The IMDRF partners have established key definitions, a framework for risk categorization, quality management system (QMS) requirements, and a clinical approach to evaluation (this last is available as an FDA guidance). The FDA has approved, authorized, or cleared 343 AI/ML-enabled medical devices since 1997 (as of this writing – see the complete list here). Many of the current devices are systems that guide diagnostic imaging. The guiding principles for good machine learning practice published in October 2021 emphasize the best practices from software engineering, security, critical thinking, and science that address the risks described above. One essential element remains the role of the “human in the loop.” Where the performance of the AI model depends on the clinician’s participation, evaluating the performance of the “human-AI” team together should be a development feature of these devices. Once on the market, the models should be continually monitored for performance and re-trained, where necessary, to improve safety and performance.
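
That last point, continuous post-market monitoring, is straightforward to sketch. The class below is a hypothetical illustration only (the window size and sensitivity floor are invented for the example, not regulatory thresholds): it tracks the rolling sensitivity of a deployed alert model against clinically confirmed outcomes and flags when performance drifts low enough to warrant investigation or re-training.

```python
# Minimal sketch (hypothetical thresholds): post-market performance monitoring.
from collections import deque

class PerformanceMonitor:
    """Tracks rolling sensitivity over (alerted, clinically confirmed) case pairs."""
    def __init__(self, window=500, sensitivity_floor=0.70):
        self.cases = deque(maxlen=window)             # most recent confirmed cases
        self.sensitivity_floor = sensitivity_floor

    def record(self, alerted: bool, truly_positive: bool) -> None:
        self.cases.append((alerted, truly_positive))

    def rolling_sensitivity(self) -> float:
        alerted_on_positives = [a for a, p in self.cases if p]
        if not alerted_on_positives:
            return float("nan")                       # no confirmed positives yet
        return sum(alerted_on_positives) / len(alerted_on_positives)

    def needs_review(self) -> bool:
        s = self.rolling_sensitivity()
        return s == s and s < self.sensitivity_floor  # `s == s` screens out NaN

monitor = PerformanceMonitor()
monitor.record(alerted=False, truly_positive=True)    # a missed case comes in
if monitor.needs_review():
    print("Rolling sensitivity below floor: trigger an investigation / re-training review.")
```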

I’m as excited as anyone about the potential for AI, ML, and big data to create constructive change. However, the field is still young, and there remain several challenges in model construction and implementation. It’s my assessment that AI models still have a long way to go before they can offer valuable predictions on new diagnostics, medicines, and preventatives. Additionally, until we understand, and can model, how a spark of intuition in a prepared mind leads to a breakthrough insight, we won’t be able to unleash the full power of AI as imagined in our fiction.


Reach out to me if you want to know more or discuss your medical product development or business challenges.

katrina@krogersconsulting.com

https://www.linkedin.com/company/katrina-rogers-consulting-llc

https://calendly.com/katrinarogers

Text Copyright © 2022 Katrina Rogers
