“Usage and outcomes of the Synthetic Data Server,” Lars Vilhuber (NCRN, Cornell University) and John Abowd (NCRN, Cornell University)
The Synthetic Data Server (SDS) at Cornell University was set up to provide early access to new synthetic data products from the U.S. Census Bureau. These datasets are made available to interested researchers in a controlled environment, prior to a more general release. Over the past five years, four synthetic datasets were made available on the server, and more than 100 users have accessed it. This paper reports on interim outcomes of the activity: results of validation requests from a user perspective, the functioning of the feedback loop driven by validation and user input, and the role of the SDS as an access gateway to, and educational tool for, other mechanisms of accessing detailed person, household, establishment, and firm statistics.
“Improving Access and Data Security to Confidential Labor Market Data,” Warren Brown (Cornell University), Stephanie Jacobs (Cornell University), David Schiller (German Institute for Employment Research), Jörg Heining (German Institute for Employment Research)
Abstract: The Cornell Institute for Social and Economic Research (CISER) at Cornell University and the Institute for Employment Research (IAB) of the German Federal Employment Agency are collaborating to expand use of IAB’s confidential Sample of Integrated Labour Market Biographies (SIAB). DDI 2.5 is used to enable researchers to discover the files through variable-level searching in a repository of metadata on U.S. and German labor-market data files. The repository is the Comprehensive Extensible Data Documentation and Access Repository (CED2AR), being developed by researchers at Cornell University with funding from the U.S. National Science Foundation. CED2AR gives researchers access to machine-readable codebooks describing variable characteristics, enabling them to develop detailed proposals for access to these data, which are then submitted to IAB. Researchers with approved projects can access and analyze the data using the Cornell Restricted Access Data Center (CRADC), a remote-access virtual data enclave based on the Remote Desktop Protocol. In the initial testing phase, several researchers located in Europe and North America are successfully accessing and analyzing the Scientific Use Files of the SIAB. The project is well on its way to realizing the goal of wider access for researchers while improving the secure management of confidential data.
The presentation can be found at http://hdl.handle.net/1813/44707
Lars Vilhuber speaks about “Disclosure Limitation and Confidentiality Protection in Linked Data” at the conference “Facilitate the access to Quebec data: How and to what ends?”, organized by the Center for Interuniversity Research and Analysis of Organizations jointly with the Quebec Inter-University Centre for Social Statistics (QICSS). The presentation draws on joint work with John M. Abowd and Ian M. Schmutte.