John M. Abowd and Ian M. Schmutte: “The Advantages And Disadvantages Of Statistical Disclosure Limitation For Program Evaluation”
Abstract: This paper formalizes the manner in which statistical disclosure limitation (SDL) hinders empirical research in economics. We also highlight a hitherto unappreciated advantage of SDL, formal privacy models, and synthetic data systems: they can serve as a defense against model overfitting and false-discovery bias. More specifically, a synthetic data validation system can, and we argue should, be used in conjunction with systems in which researchers register their research design ahead of analysis. The key insight is that privacy-protected data can be used for model development while minimizing the risk of model overfitting. To demonstrate these points, we develop a model in which the statistical agency collects data from a population but publishes a version in which the data have been intentionally distorted by some SDL process. We say the SDL process is ignorable if inferences based on the published data are indistinguishable from inferences based on the unprotected data. SDL is rarely ignorable. If the researcher has knowledge of the SDL model, she can conduct an SDL-aware analysis that explicitly corrects for the effects of SDL. If, as is often the case, the SDL model is unknown, we describe circumstances under which SDL can still be learned.
Benjamin Perry, Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, & William C. Block: “Crowdsourcing Codebook Development and Enhancements in CED²AR”
Abstract: Recent years have shown the power of user-sourced information, as evidenced by the success of Wikipedia and its many emulators. This sort of unstructured discussion is currently not feasible as a part of the otherwise successful metadata repositories. Creating and augmenting metadata is a labor-intensive endeavor. Harnessing collective knowledge from actual data users can supplement officially generated metadata. As part of our Comprehensive Extensible Data Documentation and Access Repository (CED²AR) infrastructure, we demonstrate a prototype of crowdsourced DDI on actual codebooks. While the system itself is more general, the demonstrated implementation relies on a set of linked deployments of the basic software on web servers. The backend transparently handles changes, and the frontend has the ability to separate official edits (by designated curators of the data and the metadata) from crowdsourced content. The implementation allows a data curator, such as a statistical agency, to collect and incorporate improvements suggested by knowledgeable users in a structured way.
Lars Vilhuber chairs a session at JSM that includes multiple papers with NCRN contributions:
Robustness of Employer List Linking to Methodological Variation — Mark J. Kutzbach, U.S. Census Bureau ; Graton Gathright, U.S. Census Bureau ; Andrew Green, U.S. Census Bureau/Cornell University ; Kristin McCue, U.S. Census Bureau ; Holly Monti, U.S. Census Bureau ; Ann Rodgers, University of Michigan ; Lars Vilhuber, Cornell University ; Nada Wasi, University of Michigan ; Christopher Wignall, Amazon.com
Two Perspectives on Commuting and Workplace: A Microdata Comparison of Home-to-Work Flows Across Linked Survey and Administrative Files — Andrew Green, Cornell University/U.S. Census Bureau ; Mark J. Kutzbach, U.S. Census Bureau ; Lars Vilhuber, Cornell University
Developing Job Linkages for the Health and Retirement Study — Kristin McCue, U.S. Census Bureau ; John M. Abowd, U.S. Census Bureau/Cornell University ; Margaret Levenstein, University of Michigan ; Matthew Shapiro, University of Michigan ; Ann Rodgers, University of Michigan ; Nada Wasi, University of Michigan ; Dhiren Patki, University of Michigan
Lars Vilhuber speaks about “Disclosure Limitation and Confidentiality Protection in Linked Data” at the conference “Facilitate the access to Quebec data: How and to what ends?”, hosted by the Center for Interuniversity Research and Analysis of Organizations. The conference is jointly organized with the Quebec Inter-University Centre for Social Statistics (QICSS). The presentation relies on joint work with John M. Abowd and Ian M. Schmutte.