Working Paper
Precision Without Labels: Detecting Cross-Applicants in Mortgage Data Using Unsupervised Learning
Abstract: We develop a clustering-based algorithm to detect loan applicants who submit multiple applications (“cross-applicants”) in a loan-level dataset that lacks personal identifiers. A key innovation of our approach is an evaluation method that does not require labeled training data, which allows us to optimize the tuning parameters of our machine learning algorithm. Applying this methodology to Home Mortgage Disclosure Act (HMDA) data, we create a unique dataset that consolidates mortgage applications to the individual applicant level across the United States. Our preferred specification identifies cross-applicants with 92.3% precision.
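For readers unfamiliar with the metric, the sketch below shows one common way to define precision for a matching task of this kind: the share of application pairs grouped together by the algorithm that truly belong to the same applicant. It is an illustration only and not the paper's method. The function names and the toy data are hypothetical, and the `truth` labels are exactly what the abstract says is unavailable in practice; the paper's label-free evaluation is not reproduced here.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of index pairs (i, j), i < j, that share a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_precision(predicted, truth):
    """Share of predicted same-applicant pairs that are truly the same applicant.

    Returns None when the prediction groups no two applications together,
    because precision is undefined in that case.
    """
    predicted_pairs = same_cluster_pairs(predicted)
    if not predicted_pairs:
        return None
    true_pairs = same_cluster_pairs(truth)
    return len(predicted_pairs & true_pairs) / len(predicted_pairs)


# Hypothetical toy data: six applications, cluster ids assigned by some
# algorithm, and the (normally unobserved) true applicant behind each one.
predicted = [0, 0, 0, 1, 1, 2]
truth = ["a", "a", "b", "c", "c", "d"]

precision = pairwise_precision(predicted, truth)
print(precision)
```

In this toy case the algorithm links four pairs of applications, two of which truly share an applicant, so the pairwise precision is 0.5.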
https://doi.org/10.21799/frbp.wp.2025.25
Access Documents
File(s): https://www.philadelphiafed.org/-/media/FRBP/Assets/working-papers/2025/wp25-25.pdf (file format: application/pdf)
Bibliographic Information
Provider: Federal Reserve Bank of Philadelphia
Part of Series: Working Papers
Publication Date: 2025-09-02
Number: 25-25