26 January 2019 by Lars
Research Data Centers (or Centres) are not “data jails” – they are data safes!
This is a slightly de-twitter-fied version of this Tweet thread.
This 👉article (“What went wrong at Statscan?” by @taviagrant @globeandmail ) just does not get #privacy and comes to a patently wrong conclusion. It ends by claiming that secure research data centers @CRDCN are “data jails” – that is just plain stupid.
Full disclosure: Starting April 1, I will be on the board of @CRDCN, which provides guidance on how to manage the Canadian network of secure research data facilities. I am also a participant in the 🇺🇸 discussion about data publication. Bias? Then stop reading now.
The article cites Wendy Watkins @Carleton_U who states that people wishing to access confidential data in the @CRDCN “data jail” have to “pass a series of bureaucratic hurdles”. That is just stupid. Do you really want to let anybody inspect your tax, health, school records?
I am not saying that access to data has the right price – that is an open debate – but verifying who we open the books for is NOT a banal “bureaucratic hurdle”. If you think so, please tweet your complete T1, T4 tax and full health records when requesting access.
Who has an honorable interest (for instance, but not exclusively, 👩🔬research) and who has not (🕵️♂️want to know what your neighbor earned? Your ex-spouse now earns?)? Figuring that out is hard! THAT’s what the “bureaucratic hurdles” are supposed to check.
Could they be faster🚤? Cheaper 💱? Easier for the person requesting access? Of course! Please write it all on a post-it 🗒️, just say “I’m honest” and I’ll hand you my credit card 💳, PIN number, check card… because you’re honest, right?
Should there be MORE access to data that is currently not accessible even in the @CRDCN? Of course (looking at you, @RAMquebec ⚕️ health data).
Should there be more historical data to allow for longitudinal analysis (over a long period of time for the same people)? Sure! Should it have data on race, immigration status, religion as @taviagrant argues elsewhere?
While such data on race and other demographics can inform good policy, it can also inform policies that we might today consider to be bad. The US collected the data in the 19th century to better count slaves. It then used the data to enforce the restrictive immigration laws of 1882 and 1892 barring Chinese and Asian immigration, and in 1924 to limit all immigration using quotas based on ethnic origin (see https://www.migrationpolicy.org/research/timeline-1790).
Have you followed the recent discussion about the citizenship question in the US? While it might be great for sociologists and researchers in the future, current citizens might have a different opinion. See reporting by @hansilowang .
See https://www.npr.org/2019/01/22/687510944/trump-administration-to-ask-supreme-court-to-decide-citizenship-questions-fate and https://www.npr.org/2019/01/16/685777445/administration-must-remove-census-citizenship-question-judge-rules The grass is not necessarily greener on the other side of the fence … er… border(wall).
The discussions about the right levels of data publication, data access and privacy are hard, necessary and not trivial (and not unique to 🇨🇦@StatCan_eng @StatCan_fra – see casd.eu 🇫🇷, @uscensusbureau 🇺🇸), and they should be had.
In fact, the discussions are happening: at the Allied Social Sciences Meetings in January 2019 (also https://arxiv.org/abs/1808.06303) and soon at the Open Government Partnership Summit and the Canadian Economics Association meetings. Inform yourself, for instance with @john_abowd’s http://blogs.cornell.edu/abowd/special-materials/tweetorial-formal-privacy-for-social-scientists/ . Participate!
But the problems cannot be resolved by complaining about “data jails” and essentially saying “throw open the doors”.
You’re not letting the good data out from jails, you are letting the bad guys 🕵️♂️ into the safe house.