Extracting structured information from discharge letters

ADEs are a major challenge in pharmacovigilance, and a first step is to correctly identify their occurrences.
06/06/2022
Natural Language Processing
Healthcare

Context

This work was part of a Swiss-wide pharmacovigilance project involving four major Swiss hospitals and renowned pharmaceutical sciences research labs. The project focused on detecting Adverse Drug Events (ADEs) linked to antithrombotic drugs, and the research team called on Effixis’ expertise in AI, and more specifically in NLP.

ADEs are a major challenge in pharmacovigilance, and the first step is to correctly identify their occurrences. For antithrombotic drugs, for instance, ADEs take the form of hemorrhages, thromboses, and similar events. Usually, those events are identified through the diagnosis codes assigned to a hospital stay. However, one hypothesis of the project was that those codes are subject to a high error rate in the context of ADEs, due to the numerous interactions between drugs, risk factors, and hemorrhagic events. For this reason, the discharge summaries were an important dataset to exploit: they should contain all the mentions of hemorrhage, thrombosis, and similar events, as well as the context in which they appeared.

The goal was to use the discharge summaries, provided as unstructured text files, to extract mentions of ADEs and other relevant elements such as risk factors (alcohol consumption, smoking habits, etc.). Doing so required out-of-the-box thinking. On the one hand, existing NLP models are not tailored to discharge summaries, which have a particular structure and use a particular language. On the other hand, training a dedicated NLP model would require building a complete training dataset from scratch. Neither option was satisfactory on its own.

Solution

Effixis built an annotation tool relying on existing NLP models that sped up the annotation process by a factor of 100. The tool pinpoints all sentences in the discharge letters that might be informative about a hemorrhagic event, so annotators can focus on evaluating these specific sentences rather than reading the whole set of documents. In addition, the approach allowed us to enrich the dataset by automatically extracting many risk factors (e.g., smoking) and antecedents (e.g., cancer) from the letters.
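To illustrate the pre-filtering idea, the sketch below flags sentences that may mention a hemorrhagic event, so that only those sentences are shown to an annotator. It is a minimal stand-in and not the tool itself: the project relied on existing NLP models, whereas this example assumes a small hand-written keyword pattern and a naive sentence splitter. All names (`HEMORRHAGE_PATTERN`, `Candidate`, `split_sentences`, `find_candidates`) and the sample letter are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: a tiny illustrative trigger list. A real system would use a
# curated clinical lexicon or an existing NLP model instead.
HEMORRHAGE_PATTERN = re.compile(
    r"\b(hemorrhag\w*|haemorrhag\w*|bleed\w*|hematoma\w*|epistaxis|melena)\b",
    re.IGNORECASE,
)


@dataclass
class Candidate:
    """A sentence that an annotator should look at."""
    doc_id: str
    sentence_index: int
    text: str


def split_sentences(text: str) -> List[str]:
    """Naive splitter: cut after '.', '!' or '?' followed by whitespace.

    Clinical abbreviations will be split incorrectly; this is only a sketch.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def find_candidates(doc_id: str, text: str) -> List[Candidate]:
    """Return the sentences that match at least one trigger term."""
    return [
        Candidate(doc_id, i, sentence)
        for i, sentence in enumerate(split_sentences(text))
        if HEMORRHAGE_PATTERN.search(sentence)
    ]


if __name__ == "__main__":
    letter = (
        "Patient admitted for chest pain. Treated with apixaban. "
        "Developed gastrointestinal bleeding on day 3. Discharged home."
    )
    for candidate in find_candidates("letter-001", letter):
        print(candidate.sentence_index, candidate.text)
```

The filter deliberately favors recall: a sentence such as "No bleeding observed" would also be flagged, and deciding whether it describes an actual event is left to the annotator.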

In turn, these annotated documents provided Effixis’ team with a rich dataset that could be exploited to classify discharge summaries according to whether they contain mentions of hemorrhagic events.

Work continues to fine-tune the models and the methodology, and to publish the results in a medical journal.

Approach

Because the field requires such specific expertise, our team worked in close collaboration with the pharmacovigilance experts to understand the project requirements as well as the required data cleaning and engineering steps. In the process, we developed a graphical user interface that the CHUV team used to annotate discharge letters and estimate the number of hemorrhagic events in the cohort. Due to the nature of the data, we worked on the client’s infrastructure using virtual machines.

The project is expected to continue with a second research project that aims to apply this technology to real-time clinical monitoring at the hospital level.

Technologies

Python, R
NLP
Streamlit

Challenges

Discharge summaries don’t follow a typical text structure: the information they contain is extremely rich and the formulations are very dense. They are hard to understand for anyone outside the medical field. For this reason, we developed models that have a semantic understanding of each individual sentence in the discharge letter. As manual review is a mandatory step in such a project, we also designed our models so that they do not make final decisions but rather speed up and guide the manual work.
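The "guide, don't decide" principle can be sketched as a review queue in which the model only orders sentences and every label comes from a human. This is an illustrative sketch under stated assumptions, not the project's implementation: the relevance score in [0, 1], the 0.3 threshold, and the names `Suggestion`, `review_queue`, and `document_has_event` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Suggestion:
    sentence: str
    score: float                        # assumed model relevance score in [0, 1]
    human_label: Optional[bool] = None  # set by the annotator, never by the model


def review_queue(suggestions: List[Suggestion], threshold: float = 0.3) -> List[Suggestion]:
    """Sentences worth showing to the annotator, most likely first."""
    shortlisted = [s for s in suggestions if s.score >= threshold]
    return sorted(shortlisted, key=lambda s: s.score, reverse=True)


def document_has_event(queue: List[Suggestion]) -> Optional[bool]:
    """Document-level label derived from human decisions only.

    Returns True once an annotator confirms a sentence, None while any
    queued sentence is still unreviewed, and False otherwise.
    """
    if any(s.human_label is True for s in queue):
        return True
    if any(s.human_label is None for s in queue):
        return None
    return False


if __name__ == "__main__":
    queue = review_queue([
        Suggestion("Developed gastrointestinal bleeding on day 3.", 0.9),
        Suggestion("Discharged home.", 0.1),
        Suggestion("No hematoma at the puncture site.", 0.5),
    ])
    queue[0].human_label = True   # the annotator confirms the event
    queue[1].human_label = False  # the annotator rejects a negated mention
    print(document_has_event(queue))
```

The model's score affects only what is shown and in which order. The threshold is therefore a workload setting that trades annotator time against the risk of hiding a relevant sentence.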

