“Total Variability Measures for Selected Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in OnTheMap”, Andrew Green (Cornell University), Kevin McKinney (U.S. Census Bureau), Lars Vilhuber (Cornell University), John Abowd (Cornell University)
We report results from the first comprehensive total quality evaluation of three major indicators in the U.S. Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): beginning-of-quarter employment, full-quarter employment, and average monthly earnings of full-quarter employees. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics workplace reports as displayed in OnTheMap (OTM). The evaluation is conducted using the multiple threads generated by the edit and imputation models used in the LEHD Infrastructure File System. These threads conform to the Rubin (1987) multiple imputation model. Each implicate is the output of formal probability models that address coverage, edit and imputation errors. Design-based sampling variability and finite population corrections are also included in the evaluation. We derive special formulas for the Rubin total variability and its components that are consistent with the disclosure avoidance system used for QWI and LODES/OTM workplace reports. These formulas allow us to publish the complete set of detailed total quality measures for QWI and LODES. The analysis reveals that the three publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs have acceptable quality. Tabulations involving one or two jobs, which are generally suppressed in the QWI, have substantial total variability but their publication in LODES allows the formation of larger custom aggregations, which will in general have the accuracy estimated for tabulations in the QWI of similar magnitude.
“Usage and outcomes of the Synthetic Data Server,” Lars Vilhuber (NCRN, Cornell University) and John Abowd (NCRN, Cornell University)
The Synthetic Data Server (SDS) at Cornell University was set up to provide early access to new synthetic data products by the U.S. Census Bureau. These datasets are made available to interested researchers in a controlled environment, prior to a more generalized release. Over the past 5 years, 4 synthetic datasets were made available on the server, and over 100 users have accessed the server over that time period. This paper reports on interim outcomes of the activity: results of validation requests from a user perspective, functioning of the feedback loop due to validation and user input, and the role of the SDS as a access gateway to and educational tool for other mechanisms of accessing detailed person, household, establishment, and firm statistics.
Lars Vilhuber speaks about “Disclosure Limitation and Confidentiality Protection in Linked Data” at the Center for Interuniversity Research and Analysis of Organizations‘s conference on “Facilitate the access to Quebec data: How and to what ends?” The conference is jointly organized with the Quebec inter-University Centre for Social Statistics (QICSS). The presentation relies on joint work with John M. Abowd and Ian M. Schmutte.