AI-Enabled Healthcare - How Quality Engineering Needs to Evolve

AI has a transformative role to play in healthcare, from improving diagnosis and care planning, to addressing conditions that have been historically difficult to treat. The integration of AI into healthcare is already well underway. Notable and increasing adoption of AI technologies within the industry is evident in:

Medical imaging – diagnosis and detection
Drug discovery and development – identification of drug candidates
Bioinformatics – virus and disease behavior modeling to understand the spread of illnesses
Genomics – personalized medicines
Electronic health records – data management and predictive analytics
Virtual health assistants – triage chatbots and virtual nurses
Patient management – hospital logistics and bed management
Telemedicine – remote patient monitoring
Robotic surgery – precision assistance in surgery
Predictive analytics for patient management – identifying high-risk patients
Mental health support – support chatbots and sentiment analysis
Healthcare robotics – assistance and rehabilitation

Diagnosing the challenges of healthcare AI assurance

AI is a powerful and innovative force to improve healthcare practices and outcomes, but the book on testing AI is not yet written, let alone for safety-critical applications.

We can draw some of the key approaches to testing AI-infused applications from traditional testing and AI data collection. However, some challenges are unique to AI-infused healthcare applications and require innovation in our Quality Engineering practices.

Biased outcomes

A significant concern with AI systems is their susceptibility to bias and to producing unfair outcomes.

These are valid concerns, but there are times, particularly in the medical world, when a dataset that appears biased is in fact representative of the real world, and this needs to be factored in. Consider the myriad conditions that affect one demographic, gender or ethnicity more than others; treatment plans that, for financial, legal or regulatory reasons, are only available in certain geographic regions; and conditions that occur, or are remediated, as a side effect of other treatments and routines.

We need Quality Engineering approaches that can identify an unfair bias in an AI, but without penalizing software in cases where there’s a legitimate imbalance in data.
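One way to sketch this distinction in code is to report prevalence per group but only flag differences in how well the model serves each group. The example below is a minimal illustration, not a complete fairness audit: the record layout, the function names and the 0.1 tolerance are all assumptions, and the recall gap it checks (often called an equal opportunity check) is only one of several possible fairness criteria.

```python
from collections import defaultdict

def group_rates(records):
    """Return per-group prevalence and recall (true positive rate).

    Each record is a (group, actual, predicted) tuple with 0/1 outcomes.
    """
    stats = defaultdict(lambda: {"n": 0, "pos": 0, "tp": 0})
    for group, actual, predicted in records:
        s = stats[group]
        s["n"] += 1
        s["pos"] += actual
        s["tp"] += 1 if actual == 1 and predicted == 1 else 0
    return {
        g: {
            "prevalence": s["pos"] / s["n"],
            "recall": s["tp"] / s["pos"] if s["pos"] else None,
        }
        for g, s in stats.items()
    }

def has_unfair_recall_gap(rates, tolerance=0.1):
    """Flag only a recall gap between groups larger than the tolerance.

    Prevalence differences are deliberately not flagged: they may simply
    reflect how the condition is distributed in the real world.
    """
    recalls = [r["recall"] for r in rates.values() if r["recall"] is not None]
    return len(recalls) > 1 and (max(recalls) - min(recalls)) > tolerance

# Hypothetical data: group B has much lower prevalence than group A, but the
# model finds every true case in both groups, so nothing is flagged.
balanced = [("A", 1, 1)] * 2 + [("A", 0, 0)] * 2 + [("B", 1, 1)] + [("B", 0, 0)] * 9
print(has_unfair_recall_gap(group_rates(balanced)))  # False

# Here the model misses half of group B's true cases, so the gap is flagged.
skewed = [("A", 1, 1)] * 2 + [("A", 0, 0)] * 2 + [("B", 1, 1), ("B", 1, 0)] + [("B", 0, 0)] * 8
print(has_unfair_recall_gap(group_rates(skewed)))  # True
```

In practice the tolerance, and the choice of metric itself, would need to be agreed with clinical SMEs for the condition in question.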

Explainability

A key challenge in AI-infused systems across industries is explainability. How do we know how an AI has made its decision? For many AI architectures, including some of the deep learning and generative models that we tailor to the most challenging problems, there’s an inherent difficulty in understanding how large inputs, processed by models with potentially millions of parameters, drive a given output.

Whilst we can show statistically that, in many problem domains, AI decision power can outstrip our own, even in highly specialized tasks, if we can’t explain the decisioning process in something that may have critical impact to a patient’s wellbeing, there can be legal implications and ethical blocks to using that output.

Performance

Optimal performance and ultra reliability are critical considerations in healthcare AI contexts to ensure accurate diagnostics, timely treatment decisions and patient safety.

Security

From computer viruses to human viruses: a malicious or terrorist attack on an AI in charge of a patient’s health and treatment could have catastrophic, life-changing or even fatal consequences.

Healthcare AIs must therefore be dependable in safeguarding sensitive patient data, maintaining confidentiality, and preventing unauthorized access. The attack surface of an AI-powered system may be very different to that of a traditional software offering, but it must still be thoroughly tested to ensure the integrity of the solution.

Balance

One of the key problems in developing fair, reliable, high-quality AI-infused healthcare applications is balancing the AI-centric considerations with those applicable to any software, without jeopardizing patient safety, our ability to deliver software or our ability to continuously improve outcomes.

Data errors

These errors are present in the foundational data that we build our AI from. Whether caused by null, missing or corrupted rows; violated logical constraints in our data; poor sampling; or poor labelling, these are the errors encountered at the start of our AI development journey. To identify them, an in-depth understanding of our data, problem domain, and pipelines is required.
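Some of these data errors can be caught with simple row-level checks before any training starts. The sketch below is illustrative only: the field names (`age`, `label`), the plausible age range and the allowed label set are hypothetical, and real checks would come from domain experts who know the data.

```python
def find_data_errors(rows, required, allowed_labels):
    """Return (row_index, problem) tuples for basic integrity checks."""
    problems = []
    for i, row in enumerate(rows):
        # Null or missing values in required fields.
        for field in required:
            if row.get(field) is None:
                problems.append((i, f"missing {field}"))
        # A logical constraint: ages must be plausible (assumed range).
        age = row.get("age")
        if age is not None and not (0 <= age <= 120):
            problems.append((i, "age out of range"))
        # A labelling check: only known labels are accepted.
        label = row.get("label")
        if label is not None and label not in allowed_labels:
            problems.append((i, "unknown label"))
    return problems

rows = [
    {"age": 34, "label": "benign"},
    {"age": None, "label": "malignant"},   # missing value
    {"age": 430, "label": "benign"},       # violates the age constraint
    {"age": 51, "label": "bening"},        # mistyped label
]
for problem in find_data_errors(rows, ["age", "label"], {"benign", "malignant"}):
    print(problem)
```

Checks like these do not address poor sampling, which needs a comparison against the population the model is meant to serve.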

Design errors

Even with great data we can still see poorly performing AI. Whether that’s because of under- or over-fitting a model, low stability or poor ability to adapt to new data, algorithmic or learned bias, or the use of sub-optimal features, these errors will prevent an AI initiative’s success. To detect and avoid these errors we need to understand how our Data Scientists and Engineers are choosing algorithms, parameters and training approaches; identify test strategies and approaches that push the boundaries of our model and expose weak behavior; and contrast our model’s performance with real-world users and existing systems.
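As a rough illustration of one such check, under- and over-fitting can often be spotted by comparing a model’s score on its training data with its score on data it has not seen. The function below is a heuristic sketch: the name `diagnose_fit` and both thresholds are assumptions that would need tuning for any real model and metric.

```python
def diagnose_fit(train_score, validation_score, min_score=0.8, max_gap=0.05):
    """Heuristic fit diagnosis from training and validation scores (0 to 1)."""
    # Poor on both sets: the model has not captured the underlying pattern.
    if train_score < min_score and validation_score < min_score:
        return "possible under-fitting"
    # Much better on training data than unseen data: likely memorization.
    if train_score - validation_score > max_gap:
        return "possible over-fitting"
    return "no obvious fit problem"

print(diagnose_fit(0.99, 0.80))  # possible over-fitting
print(diagnose_fit(0.62, 0.60))  # possible under-fitting
print(diagnose_fit(0.91, 0.89))  # no obvious fit problem
```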

Usage and feedback errors

As with any component of a system, a strong-performing AI model that is not effectively integrated into our processes or software will cause issues with our day-to-day operation. These issues may be a model introducing performance or security bottlenecks; or an over-reliance on the model and a susceptibility to automation bias. We need to prepare careful end-to-end tests of AI systems and acceptance checks of the processes around them to ensure that when we are using intelligent components, we are using them in the optimal ways for our healthcare services and not actually weakening our overall healthcare provisioning.

These kinds of challenges need to be met not only in organizations’ and healthcare providers’ internal assurance practices, but also in the static and dynamic checks required by regulatory, compliance and formal verification and validation (V&V) frameworks. Whilst big questions remain on how to Quality Engineer AI in line with these frameworks as they are today, it seems likely that they will be updated with more, and more specific, AI considerations as the uptake of these technologies in healthcare grows.

AI Quality Engineering enablers

Whilst there are specific challenges to Quality Engineering AI, and nuances to be accounted for, it’s not all bad news. Professionals around the world are working hard to understand these challenges, identify where they are introduced and how to mitigate them, and make the path to safe, responsible AI as smooth as possible.

To learn more about testing AI-infused applications, see Diego Lo Giudice’s video on minimizing risk and bias at Forrester (registration required).

Quality engineered inclusive training data collection

Data is the foundational component of an AI. To ensure reliability, fairness and robustness, the training data used for healthcare AI models needs to be validated as diverse and representative of the population it serves. Data that represents the population well helps mitigate the risk of poor model fit, low stability and biased outcomes.

AI training data impact assessments should be deployed to evaluate how variations in training data affect overall performance. Data should be subjected to low-level tests to confirm its integrity, and models should be stressed with samples and datasets curated to push the boundaries of the model, so that its real-world performance is maximized.
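One simple form of training data impact assessment is to retrain with each data source left out in turn and see how the score on a fixed hold-out set moves. The sketch below assumes a hypothetical `site` field identifying the source, and uses a deliberately tiny threshold "model" so that it stays self-contained; any real training and scoring functions could be passed in instead.

```python
def data_impact(train_rows, holdout_rows, group_key, train_fn, score_fn):
    """Change in hold-out score when each group is removed from training."""
    baseline = score_fn(train_fn(train_rows), holdout_rows)
    impact = {}
    for group in sorted({row[group_key] for row in train_rows}):
        subset = [row for row in train_rows if row[group_key] != group]
        impact[group] = score_fn(train_fn(subset), holdout_rows) - baseline
    return baseline, impact

def train_threshold(rows):
    """Toy model: threshold halfway between the class means of feature x.

    Assumes the rows contain at least one example of each class.
    """
    pos = [r["x"] for r in rows if r["y"] == 1]
    neg = [r["x"] for r in rows if r["y"] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, rows):
    """Share of rows where 'x above threshold' matches the label."""
    return sum((r["x"] > threshold) == bool(r["y"]) for r in rows) / len(rows)

train = [
    {"site": "A", "x": 1, "y": 0}, {"site": "A", "x": 3, "y": 1},
    {"site": "B", "x": 2, "y": 0}, {"site": "B", "x": 10, "y": 1},
]
holdout = [{"x": 3.5, "y": 1}, {"x": 1, "y": 0}, {"x": 8, "y": 1}, {"x": 1.5, "y": 0}]

baseline, impact = data_impact(train, holdout, "site", train_threshold, accuracy)
print(baseline)  # 0.75
print(impact)    # {'A': 0.0, 'B': 0.25}
```

Here removing site B raises the hold-out score, which would prompt a closer look at how site B’s data was collected or labelled rather than an automatic decision to drop it.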

Comprehensive stakeholder alignment and interdisciplinary collaboration

Healthcare professionals from across the landscape should be included at all stages of the development and testing process for healthcare AI, from data capture to production monitoring, in collaboration with AI & ML experts, software integration specialists and operational teams to ensure a suitable solution for all technical needs.
SME input into acceptance testing and KPI design of an AI system’s operation is invaluable, with a feedback loop for continuous improvement based on real-world use to ensure that we’ve used, and continue to use, the right data to train our models, and that we’re applying models in the right ways in operational processes to improve the healthcare landscape.

Patient representation and involvement

Engage with diverse patient advocacy and representation groups, to gather feedback and insights on the requirements and uses of AI in healthcare from the patients’ perspective.
Patient inputs should be reviewed with implementers of AI to ensure their voices are heard to build AI in a responsible, ethical way, honoring their needs for fairness, security, transparency and reliability.

End-to-end validation of AI-infused systems and AI-informed decisions in healthcare processes

Artificial intelligence components can provide decision support in scenarios from diagnosis to hospital management and logistics preparation. Whilst a strong focus needs to be placed on how the AI makes its decisions, for basic accuracy and reliability, we also need to be aware that the AI is one part of a software value stream and the full stream still needs assurance.

As we design the functional and ethical considerations of our systems, we also need to include consideration and testing on the non-functional, operational and higher-level concerns to ensure that we are not only “building the AI system the right way” but “building the right AI system”. For example:
- A system that makes patient care decisions much more accurate, but no longer provides them in a timely manner, would not be acceptable as part of a suitable care plan, so we need to validate the full performance of the processes.
- Replacing costly legacy systems with a new, powerful AI system will not be acceptable if the new system’s installation and operation costs outweigh those of its predecessor, so we need to perform careful cost-benefit and total cost of ownership analyses on AI solutions, as we would for any other solution.
- People have a tendency to suffer from automation bias, and a valid concern with the power of current and future generations of AI technology is that we may “trust too much”, overlooking that these solutions are software and can contain errors. We need to account for this with user training in adoption and rollout processes, to minimize the risk of implicitly trusting a bad machine-led decision without confirming or checking further.

A comprehensive strategy for new AI initiatives, covering end-to-end success factors across technical, process, performance, economic and human considerations, must be introduced to ensure that we don’t introduce AI components that offer a nuanced improvement but an overall deterioration in healthcare software provision.
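The first example above, accuracy gained at the expense of timeliness, can be turned into an acceptance check that only passes when both criteria are met together. This is a minimal sketch: the function name, the case format and the thresholds are assumptions, and a real check would measure the whole process rather than a single function call.

```python
import time

def acceptance_check(predict, cases, min_accuracy, max_seconds):
    """Accept only if the component is both accurate enough and fast enough."""
    correct = 0
    slowest = 0.0
    for features, expected in cases:
        start = time.perf_counter()
        result = predict(features)
        slowest = max(slowest, time.perf_counter() - start)
        correct += result == expected
    accuracy = correct / len(cases)
    return {
        "accuracy": accuracy,
        "slowest_seconds": slowest,
        "accepted": accuracy >= min_accuracy and slowest <= max_seconds,
    }

# Hypothetical stand-in for a model call: flags readings above 5.
cases = [(6, True), (2, False), (9, True)]
report = acceptance_check(lambda reading: reading > 5, cases,
                          min_accuracy=0.9, max_seconds=1.0)
print(report["accepted"])  # True
```

An accurate but slow component fails the same check, which keeps timeliness from being traded away unnoticed.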

Continuous monitoring and updating of approaches

Recognize the dynamic nature of healthcare and evolving IT, Security, and AI paradigms. Implement systems for continuous monitoring and updating of AI models to ensure relevance, security, and accuracy in line with industry best practices, healthcare professional needs and patient inputs.
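One common ingredient of such monitoring is a drift check that compares the data a model sees in production with the data it was trained on. The sketch below uses the population stability index (PSI) as one possible measure. The bin edges, the sample values and the 0.2 alert level are assumptions; 0.2 is a frequently quoted rule of thumb, not a standard.

```python
import math

def population_stability_index(expected, actual, edges, eps=1e-6):
    """PSI between two samples, using bins defined by ascending edges."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges the value has reached or passed.
            counts[sum(v >= e for e in edges)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical patient ages at training time and in production.
training_ages = [30, 40, 50, 60, 70] * 20
production_ages = [60, 70, 80, 70, 75] * 20

psi = population_stability_index(training_ages, production_ages, edges=[45, 65])
if psi > 0.2:
    print("input drift detected: review the model before relying on it")
```

A drift alert does not by itself mean the model is wrong, only that its inputs no longer look like the data it was validated on.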

Understanding and acknowledging bias, bias mitigation and sensitivity analysis

Nuanced analysis of bias and mitigation techniques

In implementing healthcare AI there will be clinically- and problem-relevant biases as well as undesirable ones. As datasets are curated for AI problems and models are implemented, we need to work closely with SMEs to design and perform rigorous performance analysis, assess model variability and reliability, and review security impacts across demographics. This ensures that we build and operate in line with the nuances and context-dependencies of bias, mitigating risk and promoting fairness.

Sensitivity analysis

We need to understand how changes in variables affect model outputs. This means conducting sensitivity analyses to evaluate the model’s sensitivity to variables that can alter performance, reliability and security. As we identify weaknesses or extremes of sensitivity, we need to develop tests and mitigations to ensure safety and stability in our model across inputs and data.
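A basic form of this is one-at-a-time sensitivity analysis: nudge a single input, hold the rest fixed, and record how far the output moves. The sketch below assumes the model can be called as a function of a dictionary of numeric inputs; the toy risk score and its variable names are purely hypothetical.

```python
def sensitivity(model, baseline, deltas):
    """Output change when each input is nudged by its delta, one at a time."""
    base_output = model(baseline)
    changes = {}
    for name, delta in deltas.items():
        # Copy the baseline and perturb only the named input.
        perturbed = dict(baseline, **{name: baseline[name] + delta})
        changes[name] = model(perturbed) - base_output
    return changes

def toy_risk_score(patient):
    """Hypothetical stand-in for a real model."""
    return 2 * patient["age"] + 10 * patient["smoker"]

changes = sensitivity(toy_risk_score, {"age": 50, "smoker": 0}, {"age": 1, "smoker": 1})
print(changes)  # {'age': 2, 'smoker': 10}
```

One-at-a-time analysis misses interactions between variables, so extremes found this way are a starting point for further tests rather than a complete picture.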

Transparency and explainability – explainable AI (XAI)

Implement transparent and explainable AI models to understand how decisions are made wherever possible to assure fairness and reliability, and establish new methods to improve explainability. XAI is essential for clinicians to trust and interpret the results and apply the outputs into their daily work.

Implementation teams need to:

Provide clear explanations for performance metrics, aiding healthcare professionals in understanding the reliability of the model’s predictions.
Offer insights into the factors contributing to the model’s reliability, enhancing transparency.
Ensure that security measures are transparently communicated to build trust in the system’s data protection capabilities.
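For models that are additive by design, the factors behind an individual prediction can be reported directly. The sketch below breaks a linear score into per-feature contributions and ranks them by size. The weights and feature names are hypothetical, and this approach does not carry over to deep learning models, which need dedicated XAI techniques.

```python
def explain_linear_score(weights, bias, features):
    """Return the score and each feature's contribution, largest first."""
    contributions = {name: weights[name] * features[name] for name in weights}
    score = bias + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked

score, ranked = explain_linear_score(
    weights={"age": 2, "bmi": 3, "smoker": 40},
    bias=5,
    features={"age": 60, "bmi": 30, "smoker": 1},
)
print(score)   # 255
print(ranked)  # [('age', 120), ('bmi', 90), ('smoker', 40)]
```

Presenting the ranked contributions alongside the score gives a clinician something concrete to challenge when a prediction looks wrong.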

Ethical guidelines, governance and regulatory compliance

Ethics

Develop clear ethical guidelines for the use of AI in healthcare systems, addressing bias, privacy, and patient consent issues.
Utilize resources from government, legislative, professional, and standards bodies worldwide to inform the development of these guidelines and frameworks.

Governance

Implement governance structures to assure the ethical use of AI technologies in alignment with established ethical guidelines.

Regulatory compliance

Stay informed about and adhere to relevant regulations and standards in healthcare, including those pertaining to the ethical and unbiased use of AI.
Ensure AI models meet performance standards outlined in healthcare regulations, comply with reliability and accuracy requirements, and adhere to security-related regulations to safeguard patient data.

Educational initiatives and training for healthcare professionals

Provide education and training for healthcare professionals on how to interpret AI outputs, understand the limitations of models, and recognize potential biases. This enhances collaboration between AI systems and healthcare practitioners.

Provide education on interpreting and understanding performance metrics, enabling healthcare professionals to make informed decisions on the use of AI and to avoid automation & uptake bias in their work. We should further train healthcare professionals to assess and contribute to the reliability of AI models in clinical settings. Implement programs to raise awareness among healthcare professionals about the importance of security in AI applications.

Conclusion

In navigating the transformative potential of AI in healthcare, quality practitioners need to take on not just the technical considerations and differences between these technologies and traditional software, but also the business and disciplinary considerations of intelligent systems that support safety-critical, potentially life-altering decisions. As AI continues to revolutionize medical practices, addressing nuanced considerations in dynamic bias, performance, usability, cultural sensitivity, and localization may become as important as raw technical performance.

Further, developing new compliance and V&V frameworks and approaches will require a combination of technological advancement and a commitment to ethical practices across industry and regulatory bodies.

As AI encroaches more and more into healthcare systems, safe and successful uptake will require the expertise of emerging high-skill roles dedicated to ensuring the effectiveness, fairness, and ethical use of AI in the evolving landscape of healthcare.
