Structured feature extraction from real estate listings

A company specializing in real estate appraisals needed to automate the extraction of structured features (e.g., number of rooms, floors, property type) from a large volume of unstructured real estate listings.
10/06/2022
Natural Language Processing
Real Estate

Context

Real estate listings are a rich source of data for appraisals and price forecasts. However, extracting structured features from these listings can be a tedious and error-prone process, often requiring manual effort. Named Entity Recognition (NER) models can automate this process by identifying and extracting entities such as property types, floors, number of rooms, and more directly from real estate listings. Additionally, text classification models can automate the process of tagging listings under one or more general categories, such as holiday home or ski residence.
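To make the idea of entity extraction concrete, here is a minimal sketch of the input and output shape of such a step. It is not the trained NER model described in this case study: simple regular expressions stand in for the model, and the label names (ROOMS, FLOOR, PROPERTY_TYPE), the patterns, and the example listing are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str   # entity type, e.g. "ROOMS"
    text: str    # the matched surface text
    start: int   # character offset where the match begins
    end: int     # character offset just past the match


# Hypothetical patterns standing in for a trained NER model.
PATTERNS = {
    "ROOMS": re.compile(r"\b\d+(?:\.\d+)?(?=[- ]room\b)"),
    "FLOOR": re.compile(r"\b\d+(?:st|nd|rd|th)(?= floor\b)"),
    "PROPERTY_TYPE": re.compile(r"\b(?:apartment|house|studio|chalet)\b", re.IGNORECASE),
}


def extract_entities(text):
    """Return all pattern matches as Entity objects, ordered by position."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append(Entity(label, match.group(), match.start(), match.end()))
    return sorted(entities, key=lambda e: e.start)


# Illustrative listing text (invented for this sketch).
listing = "Bright 3.5-room apartment on the 2nd floor, close to the lake."
for entity in extract_entities(listing):
    print(entity.label, repr(entity.text), entity.start, entity.end)
```

A trained model would replace `extract_entities` while keeping the same kind of output: labelled spans with character offsets that downstream code can turn into structured fields.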

Solution

Our solution was to use Natural Language Processing (NLP) techniques to annotate a custom dataset of real estate listings and train NER and text classification models for structured feature extraction. The NER models targeted more than 10 entity types, and the text classification models covered 2 categories. A web app was deployed to host the models and serve as the interface for end users. The application let users test the models on new real estate listings and automatically generated structured tables of real estate features from the unstructured text. Additionally, a clustering and recommendation engine was built to identify and suggest similar listings.
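The step from extracted entities to a structured table can be sketched as follows. This is an illustration under stated assumptions, not the deployed application: the column schema, the listing identifiers, and the rule of keeping only the first mention of each entity type are all hypothetical choices.

```python
import csv
import io

# Hypothetical table schema: one column per targeted entity type.
COLUMNS = ["listing_id", "PROPERTY_TYPE", "ROOMS", "FLOOR"]


def entities_to_row(listing_id, entities):
    """Collapse (label, text) pairs into one table row, keeping the first mention."""
    row = {column: "" for column in COLUMNS}
    row["listing_id"] = listing_id
    for label, text in entities:
        if label in COLUMNS[1:] and not row[label]:
            row[label] = text
    return row


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented model output for two listings, as (label, text) pairs.
predictions = {
    "A1": [("ROOMS", "3.5"), ("PROPERTY_TYPE", "apartment"), ("FLOOR", "2nd")],
    "B2": [("PROPERTY_TYPE", "chalet"), ("ROOMS", "5")],
}
table = rows_to_csv([entities_to_row(lid, ents) for lid, ents in predictions.items()])
print(table)
```

Entity types that the model does not find in a listing are left as empty cells, so every row has the same columns regardless of how much each listing mentions.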

Approach

We started by collecting a dataset of real estate listings in Switzerland and manually labelling thousands of examples across several languages. To accelerate the labelling, we used active learning, a process in which models are trained during the annotation loop and used to suggest annotations on unlabelled data. Once enough data had been collected, separate NER and classification models were trained for each language. In parallel, a web app interface was developed to host the finished models. It supported loading and running any of the models on new real estate listings, which could be copied and pasted directly into the application.
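One round of such an annotation loop might look like the sketch below. The case study does not say how examples were prioritized, so the least-confidence selection shown here is an assumption, as are the function names, the label names, and the probability values, which stand in for the output of a partially trained classifier.

```python
def suggest_annotations(probabilities, labels):
    """For each example, propose the most probable label and its probability."""
    suggestions = []
    for probs in probabilities:
        best = max(range(len(probs)), key=lambda i: probs[i])
        suggestions.append((labels[best], probs[best]))
    return suggestions


def select_for_review(probabilities, batch_size):
    """Pick the examples the model is least sure about (least-confidence sampling)."""
    confidence = [max(probs) for probs in probabilities]
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return ranked[:batch_size]


# Invented class probabilities for four unlabelled listings.
labels = ["holiday_home", "not_holiday_home"]
probabilities = [[0.9, 0.1], [0.45, 0.55], [0.6, 0.4], [0.01, 0.99]]

suggestions = suggest_annotations(probabilities, labels)
for index in select_for_review(probabilities, batch_size=2):
    label, confidence = suggestions[index]
    print(f"listing {index}: suggested {label} (confidence {confidence:.2f})")
```

The annotator confirms or corrects the suggested labels, the corrected examples are added to the training set, and the model is retrained before the next round.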

Technologies

Python
Natural Language Processing
Docker
Azure Cloud

Challenges

  • Multilingual data: real estate listings in Switzerland may be posted in one of several languages (e.g., English, French, Italian, or German), requiring custom-trained models specialized for each language.
  • Limited training data: little public data exists for training NER or classification models on real estate listings, so data scraping and labelling had to be handled in-house.
  • User-friendly interface: the models must be accessible through a user-friendly UI with low latency.
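The first and third challenges suggest a design in which each language's model is loaded once and requests are routed by language. The sketch below shows that routing only. The `ModelRegistry` class, the language codes, and the dummy models are hypothetical, and how the language of a listing is determined is left outside the sketch.

```python
from typing import Callable, Dict, List, Tuple

Entities = List[Tuple[str, str]]      # (label, text) pairs
Model = Callable[[str], Entities]     # any callable that maps text to entities


class ModelRegistry:
    """Holds one already-loaded model per language and routes requests to it."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def register(self, language: str, model: Model) -> None:
        self._models[language] = model

    def extract(self, text: str, language: str) -> Entities:
        try:
            model = self._models[language]
        except KeyError:
            raise ValueError(f"no model registered for language {language!r}") from None
        return model(text)


# Dummy stand-ins for trained per-language models.
def german_model(text: str) -> Entities:
    return [("PROPERTY_TYPE", "Wohnung")] if "Wohnung" in text else []


def french_model(text: str) -> Entities:
    return [("PROPERTY_TYPE", "appartement")] if "appartement" in text else []


registry = ModelRegistry()
registry.register("de", german_model)
registry.register("fr", french_model)
print(registry.extract("Helle Wohnung im Zentrum", language="de"))
```

Registering models at start-up rather than per request keeps model loading out of the request path, which is one common way to keep interface latency low.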
