Extracting structured information from discharge letters

ADEs are a major challenge in pharmacovigilance, and a first step is to correctly identify their occurrences.

06/06/2022

Natural Language Processing

Healthcare

Context

As part of a Switzerland-wide pharmacovigilance project involving four major Swiss hospitals and renowned pharmaceutical sciences research labs, and focused on detecting Adverse Drug Events (ADEs) linked to antithrombotic drugs, the research team called on Effixis’ expertise in AI, and more specifically in natural language processing (NLP).

ADEs are a major challenge in pharmacovigilance, and the first step is to correctly identify their occurrences. For instance, with antithrombotic drugs, ADEs take the form of hemorrhages, thromboses, and similar events. These events are usually identified through the diagnosis codes assigned to a hospital stay. However, one of the project’s hypotheses was that these codes have a high error rate for ADEs, because of the many interactions between drugs, risk factors, and hemorrhagic events. Discharge summaries were therefore a key dataset to exploit: they should contain every mention of hemorrhage, thrombosis, and similar events, along with the context in which they occurred.

The goal was to use the discharge summaries, provided as unstructured text files, to extract mentions of ADEs and other relevant elements such as risk factors (alcohol consumption, smoking habits, etc.). This required out-of-the-box thinking. On the one hand, existing NLP models are not tailored to discharge summaries, which have a particular structure and use specialized language. On the other hand, training a dedicated NLP model would require building a complete annotated training dataset from scratch. Neither option was satisfactory.

Solution

Effixis built an annotation tool, relying on existing NLP models, that sped up the annotation process by a factor of 100. The tool pinpoints every sentence in the discharge letters that might be informative about a hemorrhagic event, so annotators can focus on evaluating those sentences rather than reading every document in full. The approach also allowed us to enrich the dataset by automatically extracting many risk factors (e.g., smoking) and elements of medical history (e.g., cancer) from the letters.
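The sentence triage described above can be illustrated with a minimal sketch. It is not the project’s method: the case study does not say which NLP models were used, so a hypothetical keyword lexicon (`ADE_TERMS`) stands in for them here, and the names `Candidate`, `split_sentences`, and `flag_sentences` are likewise illustrative. The point is the workflow: split each letter into sentences, keep only those that look relevant, and hand that shortlist to an annotator.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lexicon of word stems: illustrative only, not the project's vocabulary.
# Stems such as "hemorrhag" also catch inflected forms like "hemorrhagic".
ADE_TERMS = ("hemorrhag", "haemorrhag", "bleed", "hematoma", "thrombos", "embol")


@dataclass
class Candidate:
    """A sentence flagged for human review; the annotator makes the final call."""
    doc_id: str
    index: int  # position of the sentence within the letter
    sentence: str
    matched: list[str] = field(default_factory=list)


def split_sentences(text: str) -> list[str]:
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace.
    Clinical abbreviations (e.g. "Dr.") would need a proper tokenizer."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def flag_sentences(doc_id: str, text: str, terms=ADE_TERMS) -> list[Candidate]:
    """Return only the sentences that contain at least one term of interest."""
    candidates = []
    for i, sentence in enumerate(split_sentences(text)):
        lowered = sentence.lower()
        matched = [t for t in terms if t in lowered]
        if matched:
            candidates.append(Candidate(doc_id, i, sentence, matched))
    return candidates
```

For example, on a four-sentence letter where only the third sentence mentions gastrointestinal bleeding, `flag_sentences` returns a single candidate for that sentence, and the annotator reads one sentence instead of four. A real system would replace the substring match with a model that scores each sentence’s meaning, but the triage structure stays the same.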

In turn, these annotated documents provided Effixis’ team with a rich dataset that could be exploited to classify discharge summaries containing mentions of hemorrhagic events.

Work continues to fine-tune the models and the methodology, with a view to publishing the results in a medical journal.

Approach

Working in a field that requires such specific expertise, our team collaborated closely with the pharmacovigilance experts to understand the project requirements as well as the necessary data cleaning and engineering steps. In the process, we developed a graphical user interface that the CHUV (Lausanne University Hospital) team used to annotate discharge letters and estimate the number of hemorrhagic events in the cohort. Given the sensitive nature of the data, we worked on the client’s infrastructure using virtual machines.

The project is expected to continue with a second research project aimed at applying this technology to real-time clinical monitoring at the hospital level.

Technologies

Python, R

NLP

Streamlit

Challenges

Discharge summaries don’t follow a typical text structure: the information they contain is extremely rich and the wording very dense, which makes them hard to understand for anyone outside the medical field. For this reason, we developed models that capture the meaning of each individual sentence in a discharge letter. Because manual review is a mandatory step in such a project, we also designed our models so that they do not make final decisions but instead speed up and guide the manual work.
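The principle that models guide rather than decide can be expressed as a small human-in-the-loop sketch. Everything here is an assumption for illustration: `SentenceReview`, `review_queue`, `document_label`, the `model_score` field, and the 0.5 threshold are hypothetical and not taken from the project. The model score only orders and filters the annotators’ work; the letter-level label is derived exclusively from human decisions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SentenceReview:
    """One model suggestion awaiting (or carrying) an annotator's decision."""
    sentence: str
    model_score: float                  # hypothetical model output in [0, 1]
    human_label: Optional[bool] = None  # set only by the annotator


def review_queue(reviews: list[SentenceReview],
                 threshold: float = 0.5) -> list[SentenceReview]:
    """Unreviewed sentences at or above the threshold, most likely first.
    The score orders and filters the work; it never becomes a label."""
    pending = [r for r in reviews
               if r.human_label is None and r.model_score >= threshold]
    return sorted(pending, key=lambda r: r.model_score, reverse=True)


def document_label(reviews: list[SentenceReview],
                   threshold: float = 0.5) -> Optional[bool]:
    """Letter-level label derived from human decisions only:
    True if an annotator confirmed any sentence, None while queued sentences
    remain unreviewed, False once every queued sentence has been rejected."""
    if any(r.human_label for r in reviews):
        return True
    if review_queue(reviews, threshold):
        return None
    return False
```

One design consequence is worth stating plainly: sentences scoring below the threshold are never shown to an annotator, so the threshold trades recall against annotation time. A lower threshold keeps more borderline sentences in front of a human at the cost of a longer queue.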
