Structured feature extraction from real estate listings

A company specializing in real estate appraisals needed to automate the extraction of structured features (e.g., number of rooms, floor, property type) from a large volume of unstructured real estate listings.

10/06/2022

Natural Language Processing

Real Estate

Context

Real estate listings are a rich source of data for appraisals and price forecasts. However, extracting structured features from these listings by hand is tedious and error-prone. Named Entity Recognition (NER) models can automate this step by identifying entities such as property type, floor, and number of rooms directly in the listing text. Text classification models can likewise tag each listing with one or more general categories, such as holiday home or ski residence.
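As a rough illustration of how the two model outputs combine into one structured row, the sketch below assumes the NER model returns character-offset spans and the classifier returns an independent score per category. The `Span` type, the label names (`ROOMS`, `PROPERTY_TYPE`, `FLOOR`), the category names, the scores, and the 0.5 threshold are all hypothetical; the case study does not describe its output format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Span:
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity's last character
    label: str  # entity type, e.g. "ROOMS" (hypothetical label set)


def to_record(text: str, spans: List[Span],
              category_scores: Dict[str, float],
              threshold: float = 0.5) -> Dict[str, List[str]]:
    """Turn NER spans and per-category scores into one structured row."""
    record: Dict[str, List[str]] = {}
    for span in spans:
        # A listing may mention the same entity type more than once,
        # so every extracted value is kept in a list.
        record.setdefault(span.label, []).append(text[span.start:span.end])
    # Multi-label tagging: each category is thresholded on its own,
    # so a listing can receive zero, one, or several tags.
    record["CATEGORIES"] = sorted(
        cat for cat, score in category_scores.items() if score >= threshold
    )
    return record


listing = "Charming 3.5-room apartment on the 2nd floor, close to the ski lifts."
spans = [
    Span(9, 17, "ROOMS"),           # "3.5-room"
    Span(18, 27, "PROPERTY_TYPE"),  # "apartment"
    Span(35, 44, "FLOOR"),          # "2nd floor"
]
scores = {"ski_residence": 0.91, "holiday_home": 0.34}  # made-up model outputs
example = to_record(listing, spans, scores)
# {'ROOMS': ['3.5-room'], 'PROPERTY_TYPE': ['apartment'],
#  'FLOOR': ['2nd floor'], 'CATEGORIES': ['ski_residence']}
```

Thresholding each category independently, rather than picking a single best class, is what allows a listing to be both a holiday home and a ski residence.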

Solution

We used state-of-the-art Natural Language Processing (NLP) techniques to annotate a custom dataset of real estate listings and to train NER and text classification models for structured feature extraction. The NER models targeted more than 10 entity types and the text classification models 2 categories. A web app was deployed to host the models and serve as the interface for end users: it let them test the models on new listings and automatically generated structured tables of real estate features from unstructured listing text. In addition, a clustering and recommendation engine was built to identify and suggest similar listings.
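The case study does not say how the recommendation engine measured similarity. One simple content-based option, shown here purely as an assumption, is to represent each listing as a bag of words and rank candidates by cosine similarity to a query listing. The function names and sample listings are hypothetical, and the clustering component is not shown; a production system might instead compare embeddings or the extracted structured features.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    # \w is Unicode-aware for str patterns, so accented French, German,
    # and Italian words stay intact as single tokens.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query: str, listings: List[str], k: int = 3) -> List[int]:
    """Indices of the k listings most similar to the query, best first."""
    q = bag_of_words(query)
    scores = [cosine(q, bag_of_words(doc)) for doc in listings]
    # Sort by descending score; sorted() is stable, so ties keep listing order.
    ranked = sorted(range(len(listings)), key=lambda i: -scores[i])
    return ranked[:k]


listings = [
    "Chalet with 5 rooms near the ski slopes",
    "Modern 2-room apartment in the city centre",
    "Ski chalet, 4 rooms, close to the slopes",
]
print(recommend("chalet near ski slopes", listings, k=2))  # [0, 2]
```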

Approach

We started by collecting a dataset of real estate listings in Switzerland and manually labeling thousands of examples across several languages. To accelerate labeling, we used active learning: models were trained during the annotation loop and used to suggest annotations on unlabeled data. Once enough data had been collected, separate NER and classification models were trained for each language. In parallel, a web app was developed to host the finished models. It could load and run any of the models on new real estate listings, which users could paste directly into the application.
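A minimal sketch of such an annotation loop is shown below, assuming a single-label classifier for simplicity. The case study only says that models suggested annotations during labeling; the choice of least-confidence uncertainty sampling to pick which texts to show next is an assumption, and `train`, `review`, and `toy_model` are hypothetical stand-ins for real training code, the human annotator, and a trained model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a "model" maps a text to class probabilities.
Model = Callable[[str], Sequence[float]]
Example = Tuple[str, int]


def argmax(probs: Sequence[float]) -> int:
    return max(range(len(probs)), key=lambda i: probs[i])


def least_confident(pool: List[str], model: Model, batch_size: int) -> List[str]:
    # Uncertainty sampling: a low top-class probability means the model is
    # unsure, so those texts go to the annotator first (sorted() is stable).
    return sorted(pool, key=lambda text: max(model(text)))[:batch_size]


def active_learning_loop(
    pool: List[str],
    train: Callable[[List[Example]], Model],
    review: Callable[[List[Example]], List[Example]],
    rounds: int,
    batch_size: int,
) -> Tuple[Model, List[Example]]:
    labeled: List[Example] = []
    remaining = list(pool)
    model = train(labeled)
    for _ in range(rounds):
        if not remaining:
            break
        batch = least_confident(remaining, model, batch_size)
        # The current model pre-annotates the batch; the human only corrects it.
        suggestions = [(text, argmax(model(text))) for text in batch]
        labeled.extend(review(suggestions))
        remaining = [text for text in remaining if text not in batch]
        # Retrain on everything labeled so far before the next round.
        model = train(labeled)
    return model, labeled


def toy_model(text: str) -> List[float]:
    # Stand-in for a trained model: confident on chalets, unsure otherwise.
    return [0.9, 0.1] if "chalet" in text else [0.55, 0.45]


pool = ["chalet a", "flat b", "chalet c", "flat d"]
# The stub trainer ignores its data and the stub annotator accepts every suggestion.
model, labeled = active_learning_loop(
    pool, train=lambda data: toy_model, review=lambda s: s, rounds=1, batch_size=2
)
# labeled == [("flat b", 0), ("flat d", 0)]
```

The speed-up comes from the `review` step: correcting a mostly right suggestion is faster than annotating from scratch, and it gets faster as the retrained model improves.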

Technologies

Python

Natural Language Processing

Docker

Azure Cloud

Challenges

Multilingual data: real estate listings in Switzerland may be posted in one of several languages (e.g., English, French, Italian, or German), requiring a custom-trained model for each language.
Limited training data: little public data exists for training NER or classification models on real estate listings, so data scraping and labeling had to be handled in-house.
User-friendly interface: the models had to be accessible through a user-friendly, low-latency UI.
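With one model per language, each incoming listing has to be routed to the right model. The sketch below assumes a deliberately crude language guess based on counting common stopwords; a real deployment would more likely use a dedicated language-identification model. The stopword lists, `guess_language`, `extract`, and the placeholder `models` dictionary are all hypothetical.

```python
import re
from typing import Callable, Dict, Tuple

# Tiny, illustrative stopword lists; not a serious language identifier.
STOPWORDS = {
    "en": {"the", "and", "with", "of", "in", "is", "for", "to"},
    "fr": {"le", "la", "les", "et", "avec", "de", "des", "une", "dans"},
    "de": {"der", "die", "das", "und", "mit", "ein", "eine", "im", "zu"},
    "it": {"il", "lo", "gli", "e", "con", "di", "una", "nel", "per"},
}


def guess_language(text: str, default: str = "en") -> str:
    tokens = re.findall(r"\w+", text.lower())
    # Count how many tokens are stopwords of each language.
    counts = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else default


def extract(text: str, models: Dict[str, Callable[[str], str]]) -> Tuple[str, str]:
    """Route a listing to the model trained for its (guessed) language."""
    lang = guess_language(text)
    if lang not in models:
        raise ValueError(f"no model loaded for language {lang!r}")
    return lang, models[lang](text)


# Placeholders standing in for the per-language NER pipelines.
models = {
    "en": lambda text: "en-model",
    "fr": lambda text: "fr-model",
    "de": lambda text: "de-model",
    "it": lambda text: "it-model",
}
print(extract("Wohnung mit Balkon und Seesicht im Zentrum", models))  # ('de', 'de-model')
```

Raising an error for a language with no loaded model, rather than silently falling back to another language's model, avoids producing confident but meaningless extractions.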

