As a Stanford undergraduate, I decided to major in economics because clever uses of data were always in the air.
One of my professors hand-classified 10,000 records from a 19th-century industrial fair, uncovering a remarkable story about how patent laws influenced the course of European tech innovation. A classic paper in the field constructed a 10,000-year time-series of lighting efficiency (lumens per watt) as an alternate estimate of GDP growth.
After reading enough papers, I realized something: the key to success is in cleverly selecting, finding, or creating a data source that answers a particular question.
Afterwards, econometric tools (regression, etc.) are used to squeeze statistical significance out of a relatively small, standardized data set. Reinhart and Rogoff performed their now-infamous analysis in an Excel spreadsheet.
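The core of that workflow fits in a few lines. Here is a minimal sketch of ordinary least squares on a small data set, using plain NumPy; the data and variable names are invented purely for illustration, not drawn from any of the papers above.

```python
import numpy as np

# Synthetic example: a small, standardized data set of 50 observations.
rng = np.random.default_rng(0)
x = rng.normal(size=50)                              # hypothetical explanatory variable
y = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=50)   # noisy linear relationship

# OLS: stack a constant column onto x and solve X @ beta ~= y by least squares.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta   # estimates should land near 2.0 and 0.5
```

In practice an economist would reach for a package that also reports standard errors and p-values, but the estimation step itself is exactly this small.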
When I graduated, the questions had changed, but the fundamental tools of analysis remained constant.
Half of my classmates, including me, were headed to consulting or investment banking. These are “spreadsheet monkey” positions analyzing client financial and operational data.
In terms of relationship-building, this is great. Joining high strategy or high finance, you walk through the halls of power and learn to feel comfortable there.
But in terms of technical skill-set, not so great. You begin to specialize in spreadsheets, a tool which hasn't significantly improved since 1995.
For someone like me, who wants to solve the most interesting problems out there, dealing with gigabytes and terabytes of data, realizing this was bitter medicine.
Computational data analysis has changed a lot in the last twenty years, but my career track — economics, consulting, finance — hadn't.
I realized that if I wanted to become a data scientist, I would have to make a leap and teach myself programming.