Structured feature extraction from real estate listings

A company specialized in real estate appraisals needed to automate the extraction of structured features (e.g., number of rooms, floors, property type) from a large volume of unstructured real estate listings.
10/06/2022
Natural Language Processing
Real Estate

Context

Real estate listings are a rich source of data for appraisals and price forecasts. However, extracting structured features from these listings is tedious and error-prone, and often requires manual effort. Named Entity Recognition (NER) models can automate this process by identifying and extracting entities such as property type, floor, and number of rooms directly from the listing text. Additionally, text classification models can automatically tag listings under one or more general categories, such as holiday home or ski residence.

Solution

Our solution was to use state-of-the-art Natural Language Processing (NLP) techniques to annotate a custom dataset of real estate listings and train NER and text classification models for structured feature extraction. We targeted more than 10 entities for the NER models and 2 categories for the text classification models. A web app was deployed to host the models and act as the interface for the end user. The application supported testing the models on new real estate listings and could automatically generate structured tables of real estate features directly from unstructured listings. Additionally, a clustering and recommendation engine was built to identify and suggest similar listings.
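To illustrate the step from extracted entities to a structured table row, here is a minimal sketch. It is not the project's code: the regular expressions stand in for a trained NER model, and the entity labels (ROOMS, FLOOR, PROPERTY_TYPE), the Entity class, and both function names are hypothetical choices made for this example.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entity:
    label: str  # hypothetical entity label, e.g. "ROOMS"
    text: str   # surface text of the extracted span
    start: int  # character offset where the span starts
    end: int    # character offset where the span ends


# Toy patterns standing in for a trained NER model (assumption for illustration).
PATTERNS = {
    "ROOMS": re.compile(r"(\d+(?:\.\d)?)[- ]room", re.IGNORECASE),
    "FLOOR": re.compile(r"(\d+)(?:st|nd|rd|th) floor", re.IGNORECASE),
    "PROPERTY_TYPE": re.compile(r"\b(apartment|house|chalet|studio)\b", re.IGNORECASE),
}


def extract_entities(listing: str) -> List[Entity]:
    """Return labelled spans found in the listing, ordered by position."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(listing):
            entities.append(Entity(label, match.group(1), match.start(1), match.end(1)))
    return sorted(entities, key=lambda e: e.start)


def to_row(entities: List[Entity], columns: List[str]) -> Dict[str, Optional[str]]:
    """Collapse entity spans into one table row, keeping the first value per column."""
    row: Dict[str, Optional[str]] = {column: None for column in columns}
    for entity in entities:
        if entity.label in row and row[entity.label] is None:
            row[entity.label] = entity.text.lower()
    return row


listing = "Bright 3.5-room apartment on the 2nd floor, close to the lake."
row = to_row(extract_entities(listing), ["PROPERTY_TYPE", "ROOMS", "FLOOR"])
print(row)
```

Columns with no matching entity stay empty (None), so every listing yields a row with the same schema regardless of how much the text mentions.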

Approach

We started by collecting a dataset of real estate listings in Switzerland and manually labelling thousands of examples across several languages. To accelerate the labelling, we used active learning: models were trained during the annotation loop to suggest annotations on unlabelled data. Once enough data had been collected, separate NER and classification models were trained for each language. In parallel, a web app was developed to host the finished models. It supported loading and running any of the models on new real estate listings, which could be copy-pasted directly into the application.
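The case study does not say how examples were chosen during the annotation loop. One common strategy in active learning is uncertainty sampling, where the annotator is shown the examples the current model is least sure about. The sketch below assumes that strategy; the function names, listing identifiers, and probabilities are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(
    predictions: List[Tuple[str, Sequence[float]]], k: int
) -> List[str]:
    """Return the ids of the k unlabelled examples with the most uncertain predictions."""
    ranked = sorted(predictions, key=lambda item: entropy(item[1]), reverse=True)
    return [listing_id for listing_id, _ in ranked[:k]]


# Hypothetical model outputs for three unlabelled listings:
# (listing id, predicted probability per category).
predictions = [
    ("listing-1", [0.98, 0.02]),
    ("listing-2", [0.55, 0.45]),
    ("listing-3", [0.80, 0.20]),
]

to_label = select_for_annotation(predictions, k=2)
print(to_label)
```

In a full loop, the selected listings would be annotated (with the model's suggestion pre-filled), added to the training set, and the model retrained before the next round.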

Technologies

Python
Natural Language Processing
Docker
Azure Cloud

Challenges

  • Multilingual data: real estate listings in Switzerland may be posted in any of several languages (e.g. English, French, Italian, or German), requiring custom-trained models specialized for each language.
  • Limited training data: little public data exists for training NER or classification models on real estate listings, so data scraping and labelling had to be handled in-house.
  • User-friendly interface: the models must be accessible through a user-friendly UI with low latency.
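With one model per language, each listing has to be routed to the right model. The case study does not describe how this was done (the user may simply have picked the model in the web app), so the following is only a sketch under assumptions: a crude stopword-count language guess and stub callables in place of the trained models. All names and word lists here are hypothetical.

```python
from typing import Callable, Dict

# Tiny stopword lists, for illustration only; a real system would more likely
# use a dedicated language-identification library or an explicit user choice.
STOPWORDS = {
    "en": {"the", "and", "with", "in", "of"},
    "fr": {"le", "la", "et", "avec", "dans", "de"},
    "de": {"der", "die", "und", "mit", "im", "das"},
    "it": {"il", "la", "e", "con", "nel", "di"},
}


def detect_language(text: str, default: str = "en") -> str:
    """Guess the language by counting stopword hits; fall back to the default."""
    tokens = text.lower().split()
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def make_stub_model(lang: str) -> Callable[[str], dict]:
    """Placeholder for a trained per-language model (assumption for illustration)."""
    def model(listing: str) -> dict:
        return {"language": lang, "entities": []}
    return model


# Registry mapping each language code to its own model.
MODELS: Dict[str, Callable[[str], dict]] = {
    lang: make_stub_model(lang) for lang in STOPWORDS
}


def extract(listing: str) -> dict:
    """Route the listing to the model trained for its detected language."""
    return MODELS[detect_language(listing)](listing)


result = extract("Helle Wohnung mit Balkon und Blick auf die Berge")
print(result["language"])
```

Keeping the models behind a registry keyed by language code means a new language can be added by training one more model and registering it, without touching the routing logic.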
