Two economics professors at Stanford, Jonathan Levin and Liran Einav, have dived into the intersection of economics and big data.
Along with a nice overview of the space, they note the challenges economists have faced in incorporating statistical and machine learning techniques:
The statistical and machine learning techniques that underlie these [big data] applications, such as Lasso and Ridge regressions and classification models, are now common in statistics and computer science, although rarely used in empirical microeconomics.
Modern datasets also have much less structure, or more complex structure, than the traditional cross-sectional, time-series or panel data models that we teach in our econometrics classes.
With this information, it is possible to create an almost unlimited set of individual-level behavioral characteristics. While this is very powerful, it is also challenging. In econometrics textbooks, data arrives in “rectangular” form, with N observations and K variables, and with K typically a lot smaller than N.
When data simply record a sequence of events, with no further structure, there are a huge number of ways to move from that recording into a standard “rectangular” format. Figuring out how to organize unstructured data and reduce its dimensionality, and assessing whether the way we impose structure matters, is not something that most empirical economists have been taught or have a lot of experience with, but it is becoming a very common challenge in empirical research….
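
To make that last point concrete, here is a minimal sketch (mine, not from the paper) of taking one raw event log and turning it into a "rectangular" N-by-K table in two different ways. The event log, the user IDs, and the helper names `count_features` and `timing_features` are all hypothetical, made up purely for illustration; real pipelines would have many more possible choices than these two.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (user_id, timestamp_in_hours, event_type).
# The values are invented for illustration only.
events = [
    ("u1", 0.0, "search"),
    ("u1", 0.5, "view"),
    ("u1", 2.0, "purchase"),
    ("u2", 1.0, "search"),
    ("u2", 1.2, "search"),
    ("u3", 3.0, "view"),
]


def count_features(events):
    """Structure A: one row per user, one column per event type (counts)."""
    event_types = sorted({etype for _, _, etype in events})
    counts = defaultdict(Counter)
    for user, _, etype in events:
        counts[user][etype] += 1
    # A Counter returns 0 for event types a user never produced.
    rows = {user: [counts[user][t] for t in event_types] for user in sorted(counts)}
    return event_types, rows


def timing_features(events):
    """Structure B: one row per user, with event count and active time span."""
    times = defaultdict(list)
    for user, ts, _ in events:
        times[user].append(ts)
    columns = ["n_events", "span_hours"]
    rows = {
        user: [len(stamps), max(stamps) - min(stamps)]
        for user, stamps in sorted(times.items())
    }
    return columns, rows


count_cols, count_rows = count_features(events)
timing_cols, timing_rows = timing_features(events)

print(count_cols, count_rows)
print(timing_cols, timing_rows)
```

Both tables have N = 3 rows (one per user), but they have different columns and keep different information: the first throws away timing entirely, the second throws away what the events were. Which one is "right" depends on the question, which is exactly the kind of judgment call the authors say economists are not usually trained to make.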