Revolutionizing legal document research

AI-powered semantic search and novelty detection for efficient document analysis. A legal technology startup wanted to develop an AI-powered tool that analyzes large quantities of legal text for document search and review.
23/03/2023
Natural Language Processing
Legal

Context

A large portion of a lawyer’s job involves searching for and reading through dense legal documents. This is an expensive and time-consuming task, made more difficult by the rapid pace of legal change in the technology sector. Navigating this corpus efficiently requires a powerful search engine tailored to the needs of lawyers. Existing search engines rely on keyword-based techniques, which match terms too narrowly and fail to capture the broader meaning and intent of the user’s query. Recent advances in Natural Language Processing (NLP) have made semantic search possible: a type of search engine that interprets the meaning of a user’s query and retrieves the documents most relevant to it. The same underlying principles can also be used to identify novel pieces of information within a corpus of documents.

Solution

The objective of the project was to research and develop a proof-of-concept application that demonstrates an AI-powered semantic search engine for legal documents. The tool would provide a user-friendly interface for searching legal documents with a natural language query and would display the top-ranked documents with their most relevant sections pre-highlighted. Additionally, the tool would provide novelty detection by identifying the most distinctive passages in the corpus. In both cases, a custom-trained text classification model would allow the user to restrict their search to chunks of text falling within one of several distinct categories (e.g. general facts, judicial rulings).
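The two functions described above can be illustrated with a minimal sketch. It assumes each chunk of text has already been converted to an embedding vector and assigned a category. The function names `search` and `novelty_scores`, the dictionary layout, and the toy three-dimensional vectors are all hypothetical. Scoring novelty as one minus the similarity to the nearest other chunk is one plausible choice, not necessarily the method used in the project.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, chunks, category=None, top_k=3):
    """Rank chunks by similarity to the query, optionally within one category."""
    pool = [c for c in chunks if category is None or c["category"] == category]
    ranked = sorted(pool, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:top_k]

def novelty_scores(chunks):
    """Assumed definition: novelty = 1 - similarity to the nearest other chunk."""
    scores = {}
    for i, chunk in enumerate(chunks):
        nearest = max(
            (cosine(chunk["vec"], other["vec"])
             for j, other in enumerate(chunks) if j != i),
            default=0.0,
        )
        scores[chunk["id"]] = 1.0 - nearest
    return scores

# Toy vectors standing in for real sentence embeddings (illustrative only).
chunks = [
    {"id": "a", "category": "facts", "vec": [1.0, 0.0, 0.0]},
    {"id": "b", "category": "facts", "vec": [0.9, 0.1, 0.0]},
    {"id": "c", "category": "ruling", "vec": [0.0, 1.0, 0.0]},
    {"id": "d", "category": "ruling", "vec": [0.0, 0.0, 1.0]},
]
```

With these toy vectors, `search([1.0, 0.0, 0.0], chunks, top_k=2)` returns chunks `a` and `b`, and `novelty_scores(chunks)` gives chunk `d` the highest score because no other chunk points in a similar direction.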

Approach

To provide an example corpus, a large volume of legal documents was obtained by scraping a web database of European Union court rulings. The solution relied on state-of-the-art NLP models to power the semantic search engine. For the initial proof-of-concept, pre-trained sentence transformer models were chosen for their general text encoding capabilities. In later phases of the project, custom models were fine-tuned using a dataset tailored for the legal domain. The semantic search and novelty detection algorithms were based on measuring the proximity of a user’s search query to the chunks of text in the document corpus. The text classification model was trained in a self-supervised fashion by exploiting the HTML structure of the scraped documents. The web application was developed using Streamlit and deployed as a Docker application on the Azure Cloud, allowing the client to run the service from anywhere.
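As an illustration of how HTML structure can supply training labels without manual annotation, the sketch below labels each paragraph with the most recent section heading seen before it. The case study does not describe the actual markup of the scraped rulings, so the `h2` and `p` tags, the heading names, and the `HEADING_TO_LABEL` mapping are all assumptions made for the example.

```python
from html.parser import HTMLParser

# Hypothetical mapping from section headings to categories; the real
# headings in the scraped rulings are not given in the case study.
HEADING_TO_LABEL = {"facts": "general facts", "ruling": "judicial rulings"}

class SectionLabeller(HTMLParser):
    """Collect (paragraph text, label) pairs, labelling each <p> by the last <h2>."""

    def __init__(self):
        super().__init__()
        self.examples = []
        self._tag = None      # tag currently being captured ("h2", "p" or None)
        self._label = None    # label derived from the most recent heading
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "p"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # ignore inline tags such as </b> inside a paragraph
        text = " ".join("".join(self._buffer).split())
        if tag == "h2":
            # Unknown headings reset the label, so their paragraphs are skipped.
            self._label = HEADING_TO_LABEL.get(text.lower())
        elif text and self._label:
            self.examples.append((text, self._label))
        self._tag = None

def label_chunks(html):
    """Return labelled paragraphs extracted from one HTML document."""
    parser = SectionLabeller()
    parser.feed(html)
    parser.close()
    return parser.examples

sample = (
    "<h2>Facts</h2><p>The applicant lodged a complaint.</p>"
    "<h2>Costs</h2><p>Skipped.</p>"
    "<h2>Ruling</h2><p>The Court <b>dismisses</b> the action.</p>"
)
```

Calling `label_chunks(sample)` yields two labelled paragraphs, one per known heading, and skips the paragraph under the unmapped "Costs" heading. Pairs like these could then serve as training data for a text classifier.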

Technologies

Python
NLP
Docker
Azure Cloud
Streamlit

Similar case studies

02/11/2023
Healthcare

Leveraging AI to Digitize Vaccine Passports and Optimize Allergy Monitoring

Improving Pediatric Patient Care through Unstructured Medical Text Analysis.
06/09/2023
Manufacturing

Internal Intelligent Assistant for Industrial Machinery

Efficient and Transparent Knowledge Retrieval with LLM-Powered Smart Chatbot.
30/08/2023
FMCG

Mastering Generative AI - Workshop for CXOs in the Packaging Sector

Advancing Awareness, Sparking Innovation: Charting Comprehensive AI Path.
02/05/2023
Public Sector

Cyberdefence & Language Models – Enhanced monitoring

Innovative AI powered dashboard to foster cybersecurity news monitoring.
04/02/2023
Human Resources

Generative AI for HR Innovation

AI basics, HR applications, and ethical considerations in a Generative AI workshop.
10/01/2021
Public Sector

A new approach to the Swiss Solvency Test (SST)

A case study on delivering a groundbreaking open-source solution for FINMA.
