July 04, 2016

Data Science In a Nutshell

Data science is not rocket science. Although a data scientist needs to have many skills in different areas, you do not need to be an omniscient unicorn to take advantage of some parts of data science.

Definition of Data Science

“Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics, similar to Knowledge Discovery in Databases.” – Wikipedia

“The science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.” – Peter Naur

“By ‘Data Science’ we mean almost everything that has something to do with data: Collecting, analyzing, modeling…” – The Journal of Data Science

1. Use experiments and A/B tests when possible.

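You make many decisions each day, and a wrong one can cost you money. Where you can, test the options against each other: an A/B test shows each variant to a random share of your users and compares the outcomes, so the decision rests on measured results rather than on guesswork.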
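To make this concrete, here is a minimal sketch of one common way to read an A/B test: a two-proportion z-test on conversion counts. The article does not prescribe a particular test, so treat the choice of test, the function name `two_proportion_z_test`, and the visitor counts as assumptions made for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 120 of 2400 visitors converted on A, 156 of 2400 on B.
z, p_value = two_proportion_z_test(120, 2400, 156, 2400)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be pure chance. The approximation works best with reasonably large samples, and the sample size should be fixed before the experiment starts rather than stopping as soon as the numbers look good.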

2. Calculate the median, not just the mean.

A common mistake is to report only the arithmetic mean. If your data contain outliers (and they almost always do), the mean is pulled toward them and can misrepresent the typical value. In most such situations, the median is a much better indicator.
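A quick sketch shows how far a single outlier can drag the mean. The order values below are made up for illustration; only Python's standard `statistics` module is used.

```python
import statistics

# Hypothetical order values in dollars; the last one is an outlier.
order_values = [20, 22, 25, 27, 30, 31, 35, 2000]

mean_value = statistics.mean(order_values)
median_value = statistics.median(order_values)

print(f"mean:   {mean_value}")    # 273.75, far above every typical order
print(f"median: {median_value}")  # 28.5, close to what most customers spend
```

Seven of the eight orders are between 20 and 35 dollars, yet the mean suggests a typical order of almost 274. Reporting both numbers side by side is often the most honest summary.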

3. Gather as much data as you can from your own business.

You have more data than you think. Gmail, Google Analytics, and CRM software all have functions to export data to a spreadsheet. You may also have customer surveys or data from your fitness app, smart bracelet, and so on. Gather data from different sources, because data storage is easy and cheap. Even if the data seem unnecessary now, you may later find a use for the history you have stored.

4. Do not think that correlation is causation.

If you look at a chart or a scatterplot, you may notice a relationship between two things. This relationship can help you understand your business and predict the future, but do not assume that one causes the other. There may be a hidden factor that causes both.
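The hidden-factor case can be simulated in a few lines. In this sketch, which uses invented numbers, temperature drives both ice cream and sunscreen sales; neither affects the other, yet they correlate strongly until temperature is accounted for. The helper names `pearson` and `remove_linear_effect` are hypothetical and written here only for the example.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def remove_linear_effect(ys, xs):
    """Return what is left of ys after subtracting a least-squares line in xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

random.seed(0)  # fixed seed so the simulation is repeatable

# Hypothetical hidden factor: daily temperature in degrees Celsius.
temperature = [random.uniform(10, 35) for _ in range(1000)]
# Both sales figures depend on temperature, never on each other.
ice_cream = [2.0 * t + random.gauss(0, 5) for t in temperature]
sunscreen = [1.5 * t + random.gauss(0, 5) for t in temperature]

raw = pearson(ice_cream, sunscreen)
adjusted = pearson(remove_linear_effect(ice_cream, temperature),
                   remove_linear_effect(sunscreen, temperature))
print(f"raw correlation: {raw:.2f}")
print(f"after accounting for temperature: {adjusted:.2f}")
```

The raw correlation is high, but it largely disappears once the shared driver is removed. In real data the hidden factor is usually not known in advance, which is why a controlled experiment is the more reliable way to establish cause.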

5. Visualize your data.

Humans evolved to process visual information far more readily than tables of numbers. If you want yourself and others to understand the story behind the data, visualize it.
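Even a crude picture can reveal what a summary number hides. The sketch below draws a text histogram with nothing but the standard library; the function name `text_histogram` and the page-load times are invented for illustration, and the values are assumed to be whole numbers. A charting library would normally do this job better.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Return one line per bin, with one '#' per value that falls in it."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start:>4}-{start + bin_width - 1:<4}"
        lines.append(f"{label} {'#' * counts[start]}")
    return lines

# Hypothetical page-load times in milliseconds.
load_times_ms = [120, 130, 135, 140, 150, 155, 160, 165, 170, 480, 490, 510]
for line in text_histogram(load_times_ms, 100):
    print(line)
```

The bars make it obvious that there are two separate groups of page loads, a fast majority and a slow minority, which a single average would blur together.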

6. Do not listen to your intuition.

Your intuition is just the output of the neural network in your brain, which was trained on a very limited amount of data and was shaped mainly for survival, not for getting rich.