The CHC Health Group's strategic plan, named “Pulse,” is a genuine tool for institutional steering and treats digital transformation as one of its main structural axes. Within this plan, the group launched a major project: the creation of a “Health Data Warehouse,” also called E.D.S (Entrepôt de Données de Santé), which gathers a large amount of available data into a single, usable database. The motivation is to make full use of the group’s data, which includes administrative, billing, and clinical information. The ultimate goal is to put CHC at the forefront of hospital data use, both in operations and in research, in support of institutional steering and quality measurement.
CHC uses both structured data, such as billing and administrative information, and unstructured data from its electronic health record (EHR) system, known as the “DPI” (“Dossier Patient Informatisé”). Unstructured data, i.e., letters, reports, and other text written by medical staff, pose a unique challenge because of their non-standardized format. They make up between 70 and 80% of hospital data. Given this large proportion, it is crucial to explore this type of data and leverage it using current AI technologies.
The project focused on extracting data on vaccinations and allergies from pediatric hospitalization and consultation reports. The goal was to develop a methodology to create AI models specialized in extracting this data. This methodology could then be applied to other types of medical data found in consultation reports, discharge letters, etc.
This project was carried out as part of “Tremplin IA,” an initiative by the Walloon Region aimed at promoting the adoption of AI in Belgium. Through Tremplin IA, Wallonia emphasizes its commitment to leveraging AI for progress and innovation across various sectors.
The project was part of a larger initiative, launched by the SPF Public Health, that aims to extract entities related to vaccinations, allergies, and oncology across the MOVE health network, which includes the CHC Health Group and two other hospitals. The next steps are to gradually expand the application of these AI algorithms. Initially, they will be implemented in all hospitals affiliated with INAH. The scope will then be extended to all healthcare facilities in Wallonia and Belgium. The ultimate vision is to see this innovation spread across Europe, establishing a standard for data extraction and analysis that raises the quality and efficiency of healthcare across the continent.
The main objective of this project was to highlight the potential of natural language processing (NLP) techniques to utilize historical medical data. More specifically, the project aimed to assess the effectiveness of Named Entity Recognition (NER) and Relation Extraction (RE) models to extract relevant patient information from unstructured data.
Effixis has developed a pipeline that extracts and organizes information on patients’ allergies and vaccines from unstructured texts. The extracted allergies and the names of vaccines administered at CHC are then mapped to predefined terms from SNOMED, an international standard that facilitates interoperability among hospitals.
For allergies, the pipeline identifies the “Causal agent” of an allergic reaction, whether it be a medical or non-medical product. It also extracts other relevant details such as the manifestation of the allergy, tests performed, their results, familial indicators, and more.
In the vaccine model, the focus is on the “Injected Product”. However, the pipeline also extracts additional data such as injection dates, targeted diseases, sequence numbers, and so on.
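To make the allergy output described above more concrete, here is a minimal sketch of how extracted entities and relations could be grouped around a causal agent and mapped to a terminology code. It is an illustration under stated assumptions, not the Effixis implementation: the class names, the entity labels, and the `TERMINOLOGY` table are hypothetical, and the codes are placeholders rather than real SNOMED identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical lookup table. The codes are placeholders, NOT real SNOMED identifiers.
TERMINOLOGY: Dict[str, str] = {
    "amoxicillin": "PLACEHOLDER-0001",
    "peanut": "PLACEHOLDER-0002",
}


@dataclass
class Entity:
    label: str  # assumed label names, e.g. "CAUSAL_AGENT", "MANIFESTATION", "TEST"
    text: str   # surface form found in the report


@dataclass
class Relation:
    head: Entity  # central entity (here: the causal agent)
    tail: Entity  # detail attached to it (manifestation, test, result, ...)


@dataclass
class AllergyRecord:
    causal_agent: str
    code: Optional[str]  # None when the agent is not in the lookup table
    details: Dict[str, List[str]] = field(default_factory=dict)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before the terminology lookup."""
    return " ".join(text.lower().split())


def build_records(relations: List[Relation]) -> List[AllergyRecord]:
    """Group relation tails under their causal agent and attach a code."""
    records: Dict[str, AllergyRecord] = {}
    for rel in relations:
        if rel.head.label != "CAUSAL_AGENT":
            continue  # this sketch only handles allergy relations
        key = normalize(rel.head.text)
        record = records.setdefault(key, AllergyRecord(key, TERMINOLOGY.get(key)))
        record.details.setdefault(rel.tail.label, []).append(rel.tail.text)
    return list(records.values())


# Example with made-up model output for one report.
agent = Entity("CAUSAL_AGENT", "Amoxicillin")
relations = [
    Relation(agent, Entity("MANIFESTATION", "rash")),
    Relation(agent, Entity("TEST", "skin prick test")),
]
records = build_records(relations)
```

The same grouping would apply to the vaccine model, with the “Injected Product” as the central entity and dates, targeted diseases, and sequence numbers as the attached details.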
This pipeline has been tested on various patient subsets, including randomly selected patients and predefined subsets with information on vaccinated patients and those treated in allergology. The quality of the methodology was reviewed with the help of CHC physicians to check its efficacy and accuracy.
The results generated by this pipeline are ready for practical medical applications. Physicians can quickly make informed decisions regarding allergy management by identifying potential triggers. Moreover, the extracted vaccine data can be useful in creating a digital vaccination passport, simplifying immunization checks. This system improves both data organization and patient care by enabling quick and well-informed medical actions.
From the outset of this project, we worked in close collaboration with CHC to keep our work aligned with their needs and expertise. We met with the medical director of CHC every two weeks. In addition, a dedicated team of medical specialists from CHC was engaged throughout the project, providing valuable insights and feedback.
Our approach involved several key steps:
In our work, we observed a trade-off between the overall performance of the NER models and the depth of context achieved by incorporating additional entities. While these extra entities enriched the context, they also impacted the model’s overall performance. Despite a performance dip, their relevance for a deeper data understanding justifies their inclusion. We aim to enhance their recognition in future steps.
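The trade-off can be illustrated with per-label scores. The sketch below assumes an entity-level, exact-match evaluation with precision, recall, and F1; the source does not state which metric was used, and the label names and spans are made up for illustration.

```python
from collections import Counter


def per_label_scores(gold, predicted):
    """Exact-match scores per label.

    gold, predicted: sets of (doc_id, start, end, label) tuples.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in predicted:
        if span in gold:
            tp[span[3]] += 1  # predicted span matches a gold span exactly
        else:
            fp[span[3]] += 1  # predicted span has no gold counterpart
    for span in gold - predicted:
        fn[span[3]] += 1      # gold span the model missed

    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        p = tp[label] / n_pred if n_pred else 0.0
        r = tp[label] / n_gold if n_gold else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = {"precision": p, "recall": r, "f1": f1}
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-label F1 values."""
    return sum(s["f1"] for s in scores.values()) / len(scores) if scores else 0.0


# Made-up example: the core entity is found, the extra "TEST" entity is not.
gold = {
    ("d1", 0, 11, "CAUSAL_AGENT"),
    ("d1", 20, 24, "MANIFESTATION"),
    ("d1", 30, 45, "TEST"),
}
predicted = {
    ("d1", 0, 11, "CAUSAL_AGENT"),
    ("d1", 20, 24, "MANIFESTATION"),
    ("d1", 50, 60, "TEST"),
}
scores = per_label_scores(gold, predicted)
overall = macro_f1(scores)
core_only = macro_f1({"CAUSAL_AGENT": scores["CAUSAL_AGENT"]})
```

In this toy case the core entity scores perfectly on its own, while the averaged score drops once the harder contextual label is included, which is the kind of dip described above.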
Another challenge we faced involved our annotations, which originated from two distinct sources, Effixis and CHC. Multiple annotators were responsible for annotating the entities and their relationships, leading to some inconsistencies due to the subjective nature of annotations. To increase our models’ performance, we’re contemplating more structured annotation protocols and harmonization rounds. These strategies could result in more consistent, high-quality annotations, thereby improving the performance of our NER and RE models.
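One way to track whether harmonization rounds reduce such inconsistencies is to measure agreement between two annotators on the same text. The sketch below computes Cohen's kappa over token-level labels. This is an assumption for illustration only: the source does not say that agreement was measured or how, and the label names are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same sequence of tokens."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty token sequence")
    n = len(labels_a)
    # Observed agreement: share of tokens given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labeled independently at their own rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label everywhere
    return (observed - expected) / (1 - expected)


# Made-up example: the second annotator did not mark the manifestation.
annotator_1 = ["AGENT", "O", "O", "MANIF", "O", "O"]
annotator_2 = ["AGENT", "O", "O", "O", "O", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Computing this before and after a harmonization round would give a simple indication of whether the annotation protocol is becoming more consistent.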
One notable challenge lies in the interpretation of medical data. Even with structured data, doctors often return to the original source for validation, ensuring absolute accuracy. While our models organize and present information efficiently, the critical nature of healthcare dictates this double-check. Thus, regardless of technological advancements, the raw data remains a vital reference point, highlighting the ever-present balance between automation and human verification.
Stay tuned!