Data Science Quick Overview

There are a lot of misconceptions and jargon when it comes to data science. As Hal Varian (chief economist at Google) put it simply in 2008: “The ability to take data — to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it — that’s going to be a hugely important skill in the next decades.”

Another great way to picture it is through the graphic below. Data science is a combination of statistics, computer science, and domain expertise. There are many roles within this set of fields, but the main idea is that we need people who can gather data, analyze it, visualize it, and explain why it matters.

Machine Learning vs Data Science

Machine Learning

Machine learning is a set of statistical techniques used to predict outcomes from data. These techniques resemble regressions but can be much more complicated and more computationally intensive to run. They have become widespread because they generally need a lot of data, which is now available.

When people use machine learning, they are generally looking to predict something. This might mean forecasting future stock prices or estimating the probability that someone will commit a crime if they are released on bail.
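To make the idea of prediction concrete, here is a minimal sketch of the simplest case mentioned above, a regression. It fits a straight line to a handful of past observations and uses that line to predict a new value. The data (hours studied and exam scores) and the function names `fit_line` and `predict` are made up for illustration; real machine learning models follow the same fit-then-predict pattern with far more data and more flexible models.

```python
# Minimal sketch: fit a straight line to past observations, then predict a new value.
# The data below is invented purely for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


# "Training" data: what we have already observed.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 58, 61, 68, 71]

slope, intercept = fit_line(hours_studied, exam_scores)

# Prediction: an input we have not seen before.
predicted = predict(slope, intercept, 6)
print(f"Predicted score for 6 hours of study: {predicted:.1f}")
```

The fitted line is only as good as the data behind it, which is one reason these techniques depend on having a lot of data.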

Our club uses these techniques but does not dive too deeply into the theory. For our purposes, we are interested in the application. For further reading on the topic, there is a fantastic technical paper from the Belfer Center.

Data Science

Data science, by contrast, has a much broader scope. Many data scientists use machine learning techniques every day, but they don't necessarily need to do so. Data scientists also need to think about how they are going to get their data and how to communicate their findings to drive real change.