Leveraging AI to Digitize Vaccine Passports and Optimize Allergy Monitoring

Improving Pediatric Patient Care through Unstructured Medical Text Analysis in a Hospital in the Liège Region, Belgium

02/11/2023

Natural Language Processing

Healthcare

Context

The CHC Health Group’s strategic plan, named “Pulse,” serves as a tool for institutional steering and treats digital transformation as a major structural axis. Within this plan, CHC has launched a “Health Data Warehouse” project, also called E.D.S (Entrepôt de Données de Santé), which aims to gather the large volume of available data into a single, usable database. The effort is motivated by the need to fully explore the group’s data, which includes administrative, billing, and clinical information. The ultimate goal is to put CHC at the forefront of hospital data use, both in operations and in research, in support of institutional steering and quality measurement.

CHC uses both structured data, such as billing and administrative information, and unstructured data from its electronic health record system, the “DPI” (“Dossier Patient Informatisé”). Unstructured data, i.e., letters, reports, and other text written by medical staff, pose a particular challenge because of their non-standardized format. They make up between 70 and 80% of hospital data. Given this large proportion, it is crucial to explore this type of data and leverage it with current AI technologies.

The project focused on extracting data on vaccinations and allergies from pediatric hospitalization and consultation reports. The goal was to develop a methodology to create AI models specialized in extracting this data. This methodology could then be applied to other types of medical data found in consultation reports, discharge letters, etc.

This project was carried out as part of “Tremplin IA,” an initiative by the Walloon Region aimed at promoting the adoption of AI in Belgium. Through Tremplin IA, Wallonia emphasizes its commitment to leveraging AI for progress and innovation across various sectors.

The project was part of a larger initiative, launched by SPF Public Health, aimed at extracting entities related to vaccinations, allergies, and oncology across the MOVE health network, which includes the CHC Health Group and two other hospitals. The next steps aim to expand the application of these AI algorithms gradually. They will first be implemented in all hospitals affiliated with INAH. The scope will then be extended to all healthcare facilities in Wallonia and Belgium. The ultimate vision is to see this innovation spread across Europe, establishing a standard for data extraction and analysis that raises the quality and efficiency of healthcare across the continent.

The main objective of this project was to highlight the potential of natural language processing (NLP) techniques to utilize historical medical data. More specifically, the project aimed to assess the effectiveness of Named Entity Recognition (NER) and Relation Extraction (RE) models to extract relevant patient information from unstructured data.

Solution

Effixis developed a pipeline that extracts and organizes information on patients’ allergies and vaccines from unstructured texts. The extracted allergy data and the names of vaccines administered at CHC are then mapped to predefined terms from SNOMED, an international terminology standard that facilitates interoperability among hospitals.
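To make the mapping step more concrete, here is a minimal Python sketch of approximate name matching against a terminology list. It is an illustration under stated assumptions, not the pipeline’s actual implementation: the `TERMINOLOGY` table, the `normalize` and `match_term` helpers, the 0.8 similarity cutoff, and the concept IDs are all hypothetical, and the IDs are placeholders rather than real SNOMED codes.

```python
import difflib
import unicodedata

# Hypothetical lookup table: normalized term -> placeholder concept ID.
# These IDs are made up for illustration; they are NOT real SNOMED codes.
TERMINOLOGY = {
    "peanut": "CONCEPT-0001",
    "cow's milk protein": "CONCEPT-0002",
    "amoxicillin": "CONCEPT-0003",
    "house dust mite": "CONCEPT-0004",
}


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def match_term(mention: str, cutoff: float = 0.8):
    """Return (term, concept_id) for the closest term, or None if none is close enough."""
    key = normalize(mention)
    if key in TERMINOLOGY:  # exact match after normalization
        return key, TERMINOLOGY[key]
    # Fall back to fuzzy matching on character similarity (assumed technique).
    candidates = difflib.get_close_matches(key, list(TERMINOLOGY), n=1, cutoff=cutoff)
    if not candidates:
        return None
    return candidates[0], TERMINOLOGY[candidates[0]]


if __name__ == "__main__":
    for mention in ["Amoxicilline", "  PEANUT ", "paracetamol"]:
        print(repr(mention), "->", match_term(mention))
```

Returning `None` for mentions below the cutoff keeps uncertain matches out of the coded data, so they can be reviewed by hand instead of being silently assigned to the wrong concept.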

For allergies, the pipeline identifies the “Causal agent” of an allergic reaction, whether it be a medical or non-medical product. It also extracts other relevant details such as the manifestation of the allergy, tests performed, their results, familial indicators, and more.

In the vaccine model, the focus is on the “Injected Product”. However, the pipeline also extracts additional data such as injection dates, targeted diseases, sequence numbers, and so on.
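The entities and relations described above can be pictured as a small data model. The Python sketch below is only an illustration: the class names, the label strings such as `InjectedProduct` and `InjectionDate`, the `administered_on` relation, and the example sentence are assumptions made for this example, not the schema actually used in the project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    label: str  # hypothetical label, e.g. "CausalAgent" or "InjectedProduct"
    text: str   # mention exactly as written in the report
    start: int  # character offset where the mention starts
    end: int    # character offset just past the mention


@dataclass
class Relation:
    label: str    # hypothetical relation name, e.g. "administered_on"
    head: Entity  # main entity (causal agent or injected product)
    tail: Entity  # secondary entity that adds context


@dataclass
class AnnotatedExtract:
    text: str
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def add_entity(self, label: str, mention: str) -> Entity:
        """Record the first occurrence of `mention`; raises ValueError if absent."""
        start = self.text.index(mention)
        entity = Entity(label, mention, start, start + len(mention))
        self.entities.append(entity)
        return entity

    def link(self, label: str, head: Entity, tail: Entity) -> Relation:
        """Attach a secondary entity to a main entity."""
        relation = Relation(label, head, tail)
        self.relations.append(relation)
        return relation


if __name__ == "__main__":
    # Invented example sentence, not taken from any patient record.
    extract = AnnotatedExtract("Second dose of MMR vaccine given on 12/03/2021.")
    product = extract.add_entity("InjectedProduct", "MMR vaccine")
    date = extract.add_entity("InjectionDate", "12/03/2021")
    extract.link("administered_on", product, date)
    print(extract.relations[0])
```

Keeping character offsets alongside each mention makes it possible to point back to the exact place in the source report where a piece of information was found.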

This pipeline has been tested on various patient subsets, including randomly selected patients and predefined subsets with information on vaccinated patients and those treated in allergology. The quality of the methodology has been analyzed with the help of CHC physicians, ensuring its efficacy and accuracy.

The results generated by this pipeline are ready for practical medical applications. Physicians can quickly make informed decisions regarding allergy management by identifying potential triggers. Moreover, the extracted vaccine data can be useful in creating a digital vaccination passport, simplifying immunization checks. This system improves both data organization and patient care by enabling quick and well-informed medical actions.

Approach

From the outset of this project, we collaborated closely with CHC to keep our work aligned with their needs and expertise. We met with CHC’s medical director every two weeks. In addition, a dedicated team of CHC medical specialists was engaged throughout the project, providing valuable insights and feedback.

Our approach involved several key steps:

Data Extraction: We extracted data from medical documents and separated them into paragraphs. Using a set of rules, we identified extracts related to allergies and vaccinations.

Data Model Definition: We defined the entities to be extracted from text through a deliberative, collaborative process involving medical experts. We identified main entities that capture essential information and secondary entities that provide additional context. For allergies, we defined the “Causal Agent” as the primary entity, while for vaccines, it was the “Product” that was injected. At this stage, we also mapped out the relations between entities to capture more context.

Annotations and Supervised Machine Learning: We annotated the data to prepare for the training of supervised Machine Learning models. The ML phase consisted of two main steps: Named Entity Recognition (NER) and Relation Extraction (RE). Our models are based on the Transformer [1,2], a state-of-the-art neural network architecture for handling natural language data.

Performance Evaluation: We used precision, recall, and F1 score as key metrics to evaluate the performance of our NER and RE models.

SNOMED Coding of Extracted Entities: We linked the entities extracted from medical texts to standardized medical concepts using name matching techniques.

Medical Validation and Interpretation: During this essential step, we closely analyzed the model’s predictions on unlabeled data. We reviewed consultation records of randomly chosen patients to assess the model’s effectiveness. Additionally, for specific accuracy checks, we examined data from patients known to have been treated for allergies, as well as those vaccinated at CHC. Our findings were presented using dashboard visualizations, offering an intuitive overview of the model’s performance. This validation helped confirm the practical reliability and utility of our system in real-world hospital scenarios.
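For readers less familiar with the metrics named in the evaluation step, the following Python sketch shows how precision, recall, and F1 can be computed for NER output. It assumes a strict criterion in which a predicted entity counts as correct only if its label and both character offsets match an annotated entity exactly; the source does not say which matching criterion was used, and the function name, labels, and example spans are hypothetical.

```python
from typing import Set, Tuple

# An entity is identified by (label, start offset, end offset).
Span = Tuple[str, int, int]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Entity-level scores under exact matching (an assumed criterion)."""
    true_positives = len(gold & predicted)
    # Precision: share of predicted entities that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of annotated entities that were found.
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Invented annotations for one extract, for illustration only.
    gold = {
        ("CausalAgent", 0, 6),
        ("Manifestation", 15, 24),
        ("Test", 30, 38),
        ("TestResult", 40, 48),
    }
    predicted = {
        ("CausalAgent", 0, 6),       # correct
        ("Manifestation", 15, 24),   # correct
        ("Manifestation", 50, 55),   # spurious entity
        ("Test", 30, 36),            # wrong boundary, counted as an error
    }
    print(precision_recall_f1(gold, predicted))
```

Under exact matching, a boundary error is penalized twice: the truncated prediction lowers precision and the missed annotation lowers recall. A more lenient, overlap-based criterion would score the same predictions higher.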

Technologies

Azure

Python

Natural Language Processing

Named Entity Recognition

Challenges

In our work, we observed a trade-off between the overall performance of the NER models and the depth of context gained by adding secondary entities. These extra entities enriched the context but lowered the models’ overall scores. We kept them because they matter for a deeper understanding of the data, and we aim to improve their recognition in future steps.

Another challenge we faced involved our annotations, which originated from two distinct sources, Effixis and CHC. Multiple annotators were responsible for annotating the entities and their relationships, leading to some inconsistencies due to the subjective nature of annotations. To increase our models’ performance, we’re contemplating more structured annotation protocols and harmonization rounds. These strategies could result in more consistent, high-quality annotations, thereby improving the performance of our NER and RE models.

One notable challenge lies in the interpretation of medical data. Even with structured data, doctors often return to the original source for validation, ensuring absolute accuracy. While our models organize and present information efficiently, the critical nature of healthcare dictates this double-check. Thus, regardless of technological advancements, the raw data remains a vital reference point, highlighting the ever-present balance between automation and human verification.

References

[1] Vaswani, A., et al. (2017). Attention Is All You Need. 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.
[2] Devlin, J., et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
