Extracting structured information from discharge letters



As part of a Swiss-wide pharmacovigilance project involving four major Swiss hospitals and renowned pharmaceutical sciences research labs, and focused on detecting Adverse Drug Events (ADEs) linked to antithrombotic drugs, the research team called on Effixis’ expertise in AI, and more specifically in Natural Language Processing (NLP).

ADEs are a major challenge in pharmacovigilance, and the first step is to correctly identify their occurrences. For antithrombotic drugs, for instance, ADEs take the form of hemorrhages, thromboses, and similar events. These events are usually identified through the diagnosis codes assigned to a hospital stay. However, one hypothesis of the project was that these codes have a high error rate for ADEs, because of the numerous interactions between drugs, risk factors, and hemorrhagic events. For this reason, discharge summaries were an important data source to exploit: they should contain every mention of hemorrhage, thrombosis, and similar events, together with the context in which they occurred.

The goal was to use the discharge summaries, provided as unstructured text files, to extract mentions of ADEs and other relevant elements such as risk factors (alcohol consumption, smoking habits, etc.). This required out-of-the-box thinking. On the one hand, existing NLP models are not tailored to discharge summaries, which have a particular structure and use a particular language. On the other hand, training a dedicated NLP model from scratch would also require building a complete annotated training dataset from scratch. Neither option was satisfactory.
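To make the extraction target concrete, here is a minimal rule-based sketch in Python. It is not the approach used in the project: the regular expressions, the negation cues, and the function name `extract_risk_factors` are hypothetical, written for English text, and far too crude for real clinical letters. It only illustrates the kind of structured output we were after: for each risk factor, whether the letter reports it as present, absent, or not at all.

```python
import re

# Hypothetical English patterns for two risk factors; real discharge letters
# would need much richer, clinically validated vocabularies.
RISK_FACTOR_PATTERNS = {
    "smoking": re.compile(
        r"\b(?:smok(?:e|es|er|ers|ing)|tobacco|cigarettes?)\b", re.IGNORECASE
    ),
    "alcohol": re.compile(r"\b(?:alcohol(?:ic|ism)?|ethanol)\b", re.IGNORECASE),
}

# Naive negation check: a cue word followed only by words, spaces or hyphens
# (at most 20 characters) right up to the mention, e.g. "No history of ...".
NEGATION_BEFORE_MENTION = re.compile(
    r"\b(?:no|not|non|denies|denied|never|without)\b[\w\s-]{0,20}$", re.IGNORECASE
)


def extract_risk_factors(text: str) -> dict[str, str]:
    """Map each risk factor to 'present', 'absent' or 'not mentioned'."""
    results = {name: "not mentioned" for name in RISK_FACTOR_PATTERNS}
    # Crude sentence split so that a negation cannot leak across sentences.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for name, pattern in RISK_FACTOR_PATTERNS.items():
            for match in pattern.finditer(sentence):
                prefix = sentence[: match.start()]
                if not NEGATION_BEFORE_MENTION.search(prefix):
                    results[name] = "present"  # a positive mention wins
                elif results[name] == "not mentioned":
                    results[name] = "absent"
    return results


letter = "Known hypertension. Active smoker, 20 pack-years. Denies alcohol consumption."
print(extract_risk_factors(letter))
# → {'smoking': 'present', 'alcohol': 'absent'}
```

Hand-written rules like these break down quickly on dense, abbreviation-heavy clinical wording, which is precisely why neither off-the-shelf tools nor simple rules were sufficient on their own.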


Effixis built an annotation tool, based on existing NLP models, that sped up the annotation process by a factor of 100. The tool pinpoints every sentence in the discharge letters that might be informative about a hemorrhagic event, so annotators can focus on evaluating these specific sentences rather than reading every document in full. In addition, the approach allowed us to enrich the dataset by automatically extracting many risk factors (e.g., smoking) and elements of the medical history (e.g., cancer) from the letters.
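The core idea of the pre-selection step can be sketched in a few lines of Python. The tool itself relied on existing NLP models; in this simplified sketch a small, hypothetical English lexicon (`HEMORRHAGE_TERMS`) stands in for them, so it only catches literal word matches, not paraphrases. The function names are illustrative. What the sketch does show is the workflow: split each letter into sentences and hand annotators only the sentences worth reading.

```python
import re

# Hypothetical seed lexicon standing in for the pretrained NLP models:
# it only matches literal words, not paraphrases or abbreviations.
HEMORRHAGE_TERMS = {
    "hemorrhage", "haemorrhage", "bleeding", "hematoma", "haematoma",
    "melena", "hematemesis", "epistaxis", "hematuria",
}


def split_sentences(text: str) -> list[str]:
    """Crude splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def flag_candidate_sentences(
    text: str, terms: set[str] = HEMORRHAGE_TERMS
) -> list[tuple[int, str]]:
    """Return (position, sentence) pairs mentioning at least one seed term."""
    flagged = []
    for position, sentence in enumerate(split_sentences(text)):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & terms:
            flagged.append((position, sentence))
    return flagged


letter = (
    "Admitted for pneumonia. Treated with apixaban. "
    "On day 3, melena was noted. Discharged home."
)
for position, sentence in flag_candidate_sentences(letter):
    print(position, sentence)  # → 2 On day 3, melena was noted.
```

At this stage recall matters more than precision: a sentence missed here is never shown to an annotator, whereas a false alarm (such as a negated "no bleeding") only costs a few seconds of reading.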

In turn, these annotated documents gave Effixis’ team a rich dataset that could be used to classify discharge summaries according to whether they mention hemorrhagic events.
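One simple way to turn sentence-level annotations into document-level training labels is sketched below. The record type `SentenceAnnotation` and the rule "a letter is positive if at least one reviewed sentence describes a hemorrhagic event" are assumptions for illustration, not a description of the project's exact labelling scheme. Letters with no confirmed sentence are treated as negative, which is only reasonable if the pre-selection step rarely misses relevant sentences.

```python
from dataclasses import dataclass


@dataclass
class SentenceAnnotation:
    """Hypothetical record produced by the annotation tool for one sentence."""
    doc_id: str
    sentence: str
    is_hemorrhagic_event: bool  # the annotator's decision


def document_labels(
    annotations: list[SentenceAnnotation], doc_ids: list[str]
) -> dict[str, bool]:
    """Label a letter positive if any of its annotated sentences was confirmed."""
    # Letters without any confirmed sentence default to negative.
    labels = {doc_id: False for doc_id in doc_ids}
    for annotation in annotations:
        if annotation.is_hemorrhagic_event:
            labels[annotation.doc_id] = True
    return labels


annotations = [
    SentenceAnnotation("letter-1", "On day 3, melena was noted.", True),
    SentenceAnnotation("letter-1", "No bleeding at the puncture site.", False),
    SentenceAnnotation("letter-2", "History of epistaxis in childhood.", False),
]
print(document_labels(annotations, ["letter-1", "letter-2", "letter-3"]))
# → {'letter-1': True, 'letter-2': False, 'letter-3': False}
```

The resulting (letter, label) pairs can then feed a standard text classifier.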

Work is ongoing to fine-tune the models and the methodology, with the aim of publishing the results in a medical journal.


Working in a field that requires such specific expertise, our team collaborated closely with the pharmacovigilance experts to understand the project requirements as well as the necessary data cleaning and engineering steps. In the process, we developed a graphical user interface that the CHUV team used to annotate discharge letters and estimate the number of hemorrhagic events in the cohort. Because of the sensitive nature of the patient data, we worked on the client’s infrastructure using virtual machines.

The project is expected to continue with a second research project aimed at applying this technology to real-time clinical monitoring at the hospital level.


Technologies used: Python, R


Discharge summaries don’t follow a typical text structure: the information they contain is extremely rich and the wording very dense, which makes them hard to understand for anyone outside the medical field. For this reason, we developed models that capture the meaning of each individual sentence in the discharge letter. As manual review is a mandatory step in such a project, we also designed our models so that they do not make final decisions but instead speed up and guide the manual work.
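The principle that models suggest while annotators decide can be expressed directly in the data model. In the hypothetical sketch below, each candidate sentence carries a model score that is only used to filter and order the review queue; the label that counts is set exclusively by a human reviewer. The class and function names, the threshold value, and the scores are illustrative assumptions, not the project's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    sentence: str
    model_score: float                     # model's estimate, used only for triage
    reviewer_label: Optional[bool] = None  # set by a human, never by the model


def review_queue(candidates: list[Candidate], threshold: float) -> list[Candidate]:
    """Unreviewed candidates at or above the threshold, most likely first."""
    pending = [
        c for c in candidates
        if c.reviewer_label is None and c.model_score >= threshold
    ]
    return sorted(pending, key=lambda c: c.model_score, reverse=True)


def record_decision(candidate: Candidate, is_event: bool) -> None:
    """Store the annotator's decision; this is the only way a label is set."""
    candidate.reviewer_label = is_event


candidates = [
    Candidate("Hemoglobin dropped after the start of heparin.", 0.62),
    Candidate("Massive upper GI bleeding on day 2.", 0.97),
    Candidate("Patient walks with a cane.", 0.03),
]
queue = review_queue(candidates, threshold=0.5)
record_decision(queue[0], True)  # annotator confirms the top suggestion
print([c.sentence for c in review_queue(candidates, threshold=0.5)])
# → ['Hemoglobin dropped after the start of heparin.']
```

Keeping the model score and the reviewer's label in separate fields also makes it easy to measure afterwards how often the model's suggestions were confirmed.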
