Missing Data due to Record Linkage of Register and Survey Information. An Empirical Comparison of Selected Missing Data Techniques

Gerhard Krug


Linking register data to survey data is becoming increasingly important in empirical social science. For data protection reasons, respondents have to be asked for permission to link their data, so the resulting sample can be selective. Missing data techniques can be used to correct for the resulting record linkage bias. In this paper I use a survey in which participants were asked for permission to combine their survey answers with administrative data (record linkage), and I compare the performance of different missing data techniques on this survey. For respondents who refused permission, I set their survey answers to missing, creating pseudo-missing data that follow an empirically relevant but unknown mechanism (rather than a statistically simulated missing data process). OLS regression is performed using casewise deletion, multiple imputation, and two versions of Heckman’s sample selection model, respectively, to correct for the pseudo-missing data. The results are compared to a regression based on the complete data set, which yields the “true” regression parameters. In a first example analysis characterized by weak selectivity of the missing data, all missing data techniques performed quite well. In a second example analysis with strong selectivity, only multiple imputation was able to correct for the record linkage bias, and only when missing values were confined to one or more independent variables. In the case of strong selectivity and missing values in the dependent variable, none of the missing data techniques eliminated the bias.
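The benchmark logic described in the abstract can be illustrated with a minimal sketch. It is not the paper's analysis: the paper uses real survey data with an unknown consent mechanism, whereas the sketch below assumes synthetic data and a hypothetical logistic consent rule that depends on the outcome. All names (`ols`, `consent`, `beta_full`, `beta_cc`) are illustrative. It covers only the comparison of casewise deletion against the complete-data regression, not multiple imputation or the Heckman models.

```python
import numpy as np


def ols(x, y):
    """Return OLS coefficients (intercept first, then slope) via least squares."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 20_000

# Synthetic "survey": one covariate and an outcome (assumed values).
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Hypothetical consent mechanism: respondents with a high outcome are
# less likely to permit record linkage (selection on the dependent variable).
p_consent = 1.0 / (1.0 + np.exp(y - 1.0))
consent = rng.random(n) < p_consent

# Benchmark: regression on the complete data set ("true" parameters).
beta_full = ols(x, y)

# Casewise deletion: treat refusers' answers as missing and drop them.
beta_cc = ols(x[consent], y[consent])

# Deviation of the casewise-deletion estimate from the benchmark.
bias = beta_cc - beta_full

print("consent rate:", consent.mean())
print("complete data:", beta_full)
print("casewise deletion:", beta_cc)
print("deviation:", bias)
```

Under this assumed mechanism the casewise-deletion slope deviates from the benchmark because consent depends on the outcome itself. A correction method would be judged by how far it shrinks that deviation.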

Full Text:

PDF (German)

DOI: https://doi.org/10.12758/mda.2010.002



Copyright (c) 2016 Gerhard Krug

This work is licensed under a Creative Commons Attribution 4.0 International License.