Working Paper
Finding Needles in Haystacks: Multiple-Imputation Record Linkage Using Machine Learning
Abstract: This paper considers the problem of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across establishments is highly skewed. To address these difficulties, this paper develops a probabilistic record linkage methodology that combines machine learning (ML) with multiple imputation (MI). This ML-MI methodology is applied to link survey respondents in the Health and Retirement Study to their workplaces in the Census Business Register. The linked data reveal new evidence that non-sampling errors in household survey data are correlated with respondents’ workplace characteristics.
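Illustration (not taken from the paper): the abstract names the combination of machine learning and multiple imputation but does not describe the procedure. The following is a minimal, generic sketch of how such a combination can work, under stated assumptions. Some classifier is assumed to have already scored candidate establishments for each respondent; links are then drawn repeatedly in proportion to those scores, and the per-draw estimates are pooled with Rubin's combining rules. All names, scores, and employment figures here are hypothetical, and the paper's actual implementation may differ.

```python
import random
import statistics

# Hypothetical inputs: for each respondent, candidate establishments with
# match scores that some classifier is assumed to have produced.
candidates = {
    "r1": [("e1", 0.7), ("e2", 0.3)],
    "r2": [("e2", 0.1), ("e3", 0.9)],
    "r3": [("e1", 0.5), ("e3", 0.5)],
}
# Hypothetical establishment characteristic (employment count).
employment = {"e1": 12, "e2": 250, "e3": 40}


def draw_links(candidates, m, rng):
    """Draw m imputed link sets; each respondent's link is sampled with
    probability proportional to its candidates' match scores."""
    draws = []
    for _ in range(m):
        links = {}
        for respondent, scored in candidates.items():
            ids = [est_id for est_id, _ in scored]
            weights = [score for _, score in scored]
            links[respondent] = rng.choices(ids, weights=weights, k=1)[0]
        draws.append(links)
    return draws


def rubin_combine(estimates, variances):
    """Pool m point estimates and their within-imputation variances
    with Rubin's rules: T = W + (1 + 1/m) * B."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates) if m > 1 else 0.0
    total = within + (1 + 1 / m) * between
    return q_bar, total


rng = random.Random(0)
imputations = draw_links(candidates, m=20, rng=rng)

estimates, variances = [], []
for links in imputations:
    sizes = [employment[est] for est in links.values()]
    estimates.append(statistics.fmean(sizes))
    # Variance of the sample mean within this imputed data set.
    variances.append(statistics.variance(sizes) / len(sizes))

q_bar, total_var = rubin_combine(estimates, variances)
print(f"mean employer size: {q_bar:.1f} (s.e. {total_var ** 0.5:.1f})")
```

Drawing several links per respondent, rather than keeping only the highest-scoring candidate, lets uncertainty about the match carry through to the final standard error via the between-imputation variance term.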
Keywords: administrative data; machine learning; multiple imputation; probabilistic record linkage; survey data
JEL Classification: C13; C18; C81
https://doi.org/10.29412/res.wp.2022.11
Access Documents
File(s):
File format is text/html
https://www.bostonfed.org/publications/research-department-working-paper/2022/finding-needles-in-haystacks-multiple-imputation-record-linkage-using-machine-learning
Description: Summary
File(s):
File format is application/pdf
https://www.bostonfed.org/-/media/Documents/Workingpapers/PDF/2022/wp2211.pdf
Description: Full text
Authors
Bibliographic Information
Provider: Federal Reserve Bank of Boston
Part of Series: Working Papers
Publication Date: 2021-10-01
Number: 22-11