Challenges for the Modern Economist

Lars Vilhuber



  • I don't run my own surveys
  • I occasionally create official statistics (bias!)
  • I did my Ph.D. in the 1990s

Big data

or rather…

Alternate data sources

What is "big" data?

Big data is:

  • 8 GB?
  • 7.5 TB?
  • 309.3 million people, measured once?
  • 150 million people, measured 80 times?
  • 16,721,787,543 tables?
  • 100 countries, 5-15 times?
  • 19.3 billion records / 50 TB?
  • 112 weekly data points?
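
The items above measure "big" along different dimensions (bytes, units, repeated measurements), so they are hard to compare directly. A rough back-of-envelope sketch, assuming one row per unit per measurement and decimal terabytes (1 TB = 10^12 bytes); neither assumption comes from the slides, and the function name is illustrative only:

```python
# Back-of-envelope sizes for some of the examples listed above.
# Assumptions (not from the slides): one row per unit per measurement,
# and 1 TB = 10**12 bytes.

def observations(units: int, times: int) -> int:
    """Number of rows when `units` units are each measured `times` times."""
    return units * times

# 309.3 million people, measured once
census_rows = observations(309_300_000, 1)

# 150 million people, measured 80 times
panel_rows = observations(150_000_000, 80)

# 100 countries, measured 5 to 15 times
country_rows_low = observations(100, 5)
country_rows_high = observations(100, 15)

# 19.3 billion records in 50 TB: average bytes per record
bytes_per_record = 50 * 10**12 / 19_300_000_000

print(f"people measured once:   {census_rows:,} rows")
print(f"people measured 80x:    {panel_rows:,} rows")
print(f"country panel:          {country_rows_low:,} to {country_rows_high:,} rows")
print(f"19.3bn records / 50 TB: about {bytes_per_record:,.0f} bytes per record")
```

Under these assumptions, the row counts differ by many orders of magnitude between the country panel and the person-level panel, which is the point of the list: "big" depends on which dimension is counted.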

Representative big data is:

  • I can't run it on my laptop
  • 2 years' worth of stock trades?
  • 10 questions / population / 1 country?
  • 3 variables for 98% of one country's workforce?
  • 30+ variables for same?
  • 1% samples of 100 countries' censuses?
  • 10% of tweets?
  • 1 variable for 10% of Twitter users?

What is "big" data?

This brings up the question: How do we collect data?

How do we collect data?

| | Surveys | Administrative | Organic Data |
|---|---|---|---|
| Aim | Informational | Administer programs | … something else (Twitter?) |
| Who | Trained professionals designing, fielding, analyzing surveys | Trained professionals running a bureaucracy, collecting necessary data | Trained professionals optimizing revenue |
| Core | Well established science, defining population, frame | Definition of population, frame critical, but ex-post | Population and frame often unclear |
| Stats | Primary purpose is to create statistics | Statistics about populations is secondary purpose | Public statistics at best incidental, possibly self-serving |

New challenges

  • treating admin/organic as a noisy data source, different from surveys
  • designing administrative data collection with statistics in mind
  • handling large data flows in commonly accepted ways
  • novel confidentiality issues

Data collection in surveys

“Respondent load should always be considered when planning a statistical collection and there should be policies and practices in place to manage relationships with respondents. The aim should always be to keep reporting load to the minimum and to maintain the high quality of collections.”

Australian National Statistical Service

Data collection in administrative data


  • Clients cannot get service without filling out a form
  • Coercion


Data collection discrepancies


  • where did you work (precise lat/long) in the past 10 years?
  • who did you work for in the past 10 years?


  • IRS Form W-4, line 8
  • CRA-ARC T4, box 54

Data collection discrepancies

Eliminating discrepancies: