Combining Information from Multiple Data Sources to Improve Sampling Efficiency

Paul Burton, Sunghee Lee, Trivellore Raghunathan, Brady T. West


Many surveys target population subgroups that may not be readily identified in sampling frames. In the case study that motivated this work, the target population was households with children aged 3 to 10 in two areas surrounding Cleveland, Ohio, and Dallas, Texas. A standard approach is to sample households from these two areas and then screen for the presence of age-eligible children. Based on the estimated number of age-eligible households in these two areas, this approach would have required completing screening interviews with 5.4 to 5.7 households to find one eligible household. We developed a model-assisted sample design strategy that improves screening efficiency by attaching a measure of eligibility propensity to each household in the population. To do so, we used a modeling and imputation strategy that combined information from several data sources: (1) the population of addresses for these two areas, with demographic covariates from a commercial vendor; (2) external population data for these two areas (from the American Community Survey and Census Planning Data); and (3) screening data from a large nationally representative survey. We first tested this sampling strategy in a pilot study and then implemented it in the main study. With this strategy, 4.2 to 4.3 completed screeners were required to identify one eligible household. The proposed approach therefore improved sampling efficiency by about 25% relative to the standard approach.
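The efficiency figures above can be checked with a short calculation. The following is a minimal sketch in Python, not the authors' computation: the abstract does not state how the 25% figure was derived, so the sketch assumes it refers to the relative reduction in completed screeners per eligible household, and it assumes the lower and upper ends of the two reported ranges correspond to each other. The function name `screening_reduction` is hypothetical.

```python
def screening_reduction(baseline: float, model_assisted: float) -> float:
    """Relative reduction in completed screeners needed per eligible household."""
    return (baseline - model_assisted) / baseline


# Screeners per eligible household as reported in the abstract.
# Pairing 5.4 with 4.2 and 5.7 with 4.3 is an assumption of this sketch.
low = screening_reduction(5.4, 4.2)
high = screening_reduction(5.7, 4.3)

print(f"Reduction in screeners per eligible household: {low:.1%} to {high:.1%}")
```

Under these assumptions the reduction falls between roughly 22% and 25%, which is consistent with the rounded figure reported above.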


Keywords: address-based sampling, imputation, rare populations, commercial data, census data, address frame


Copyright (c) 2024 Paul Burton, Sunghee Lee, Trivellore Raghunathan, Brady T. West

This work is licensed under a Creative Commons Attribution 4.0 International License.