Real estate listings are a rich source of data for appraisals and price forecasts. However, extracting structured features from these listings can be a tedious and error-prone process, often requiring manual effort. Named Entity Recognition (NER) models can automate this process by identifying and extracting entities such as property types, floors, number of rooms, and more directly from real estate listings. Additionally, text classification models can automate the process of tagging listings under one or more general categories, such as holiday home or ski residence.
Our solution was to use state-of-the-art Natural Language Processing (NLP) techniques to annotate a custom dataset of real estate listings and train NER and text classification models on the task of structured feature extraction. We targeted over 10 entities for the NER models and 2 categories for the text classification models. A web app was deployed to host the models and act as an interface for the end user. The application supported the testing of the models on new real estate listings and could automatically generate structured tables of real estate features directly from unstructured listings. Additionally, a clustering and recommendation engine was built to identify and suggest similar listings.
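To make the step from NER output to a structured table more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the code used in the project: the entity labels (`PROPERTY_TYPE`, `ROOMS`, `FLOOR`), the function name `entities_to_row`, and the input format (a list of `(label, text)` pairs) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical entity labels, chosen for illustration only.
# The project targeted more than ten entities; they are not listed in this post.
COLUMNS = ["PROPERTY_TYPE", "ROOMS", "FLOOR"]


def entities_to_row(entities, columns=COLUMNS):
    """Collapse (label, text) pairs predicted for one listing into one table row.

    Labels outside `columns` are ignored, repeated values are kept once,
    and columns with no matching entity are set to None.
    """
    grouped = defaultdict(list)
    for label, text in entities:
        if label in columns and text not in grouped[label]:
            grouped[label].append(text)
    return {
        col: "; ".join(grouped[col]) if grouped[col] else None
        for col in columns
    }


# Example input: what an NER model might return for a single listing.
predicted = [
    ("PROPERTY_TYPE", "apartment"),
    ("ROOMS", "3.5"),
    ("ROOMS", "3.5"),   # the same value mentioned twice in the listing
    ("VIEW", "lake"),   # a label that is not one of the table columns
]
row = entities_to_row(predicted)
```

Applying the function to every listing yields one dictionary per listing, which can then be written out as the rows of a table.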
We started by collecting a dataset of real estate listings in Switzerland and manually labelling thousands of examples across several languages. To accelerate the labelling, we used a process called active learning, in which models were trained during the annotation loop to suggest annotations on unlabelled data. Once enough data were collected, separate NER and classification models were trained for each language. In parallel, a web app was developed to host the finished models. The web app supported loading and running any of the models on new real estate listings, which could be copy-pasted directly into the application.
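One common way to decide which unlabelled examples to annotate next in an active learning loop is uncertainty sampling: the examples on which the current model is least confident are sent to the annotator first. The sketch below illustrates that idea in Python. It is an assumption for illustration, since this post does not state which selection strategy was used, and the names `entropy` and `select_for_annotation` as well as the probability values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, k=2):
    """Return the ids of the k examples the model is least certain about.

    `predictions` maps a listing id to the class probabilities that the
    current model assigns to that listing.
    """
    ranked = sorted(
        predictions,
        key=lambda listing_id: entropy(predictions[listing_id]),
        reverse=True,
    )
    return ranked[:k]


# Made-up probabilities for three unlabelled listings and two categories.
predictions = {
    "listing_a": [0.99, 0.01],  # model is confident
    "listing_b": [0.50, 0.50],  # model is maximally uncertain
    "listing_c": [0.70, 0.30],
}
to_label = select_for_annotation(predictions, k=2)
```

After the selected examples are labelled, the model is retrained and the selection is repeated on the remaining unlabelled data.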
Stay tuned!