Big data and small data
Data is the new gold. Data is the new oil. Over the past few years big data has become very popular, and now, in 2016, small data is the “new big data”. Both are important in data science, yet opinions on the topic differ widely. In this article we compare big data and small data and examine both sides of the coin.
What is big data and small data?
Big data definition:
Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them. Accuracy in big data may lead to more confident decision-making, and better decisions can result in greater operational efficiency, cost reduction and reduced risk.
Small data definition:
Small data is a dataset that contains very specific attributes. Small data is used to determine current states and conditions, or it may be generated by analyzing larger data sets (big data).
Big data vs. small data
The main difference between big data and small data is the size of the dataset we analyze. That much is obvious, but there are many other aspects in which the two differ. The comparison table below covers five categories: the source of the data, the three Vs (volume, variety of data types, and velocity), and value.
Comparison table. Source: https://datafloq.com/read/small-data-vs-big-data-back-to-the-basic/706
The trap of small data
The most common problem with big data is the size of the database. There is too much data, it is not organised, and cleaning it takes a lot of time. Moreover, we need different tools to analyze and visualize it. Small data therefore seems easier and faster to analyze. But there is a trap. When we have only a small amount of data (compared to big data), we can’t rule out coincidence: a difference we observe may be due to chance alone. So analyzing small data can sometimes be more difficult than analyzing big data. To check whether a difference is statistically significant, we can use an A/B test calculator such as https://www.answerminer.com/calculators/ab-test/ . It shows whether there is a real difference between two cases.
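To make the role of coincidence concrete, here is a minimal sketch of one common way to evaluate an A/B test, a two-proportion z-test. It is not necessarily the method the calculator linked above uses; the function name and the example numbers are made up for illustration.

```python
import math

def ab_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a two-proportion z-test (hypothetical helper)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the assumption that A and B do not differ
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

# Same conversion rates (10% vs. 15%), very different sample sizes (made-up numbers)
p_small = ab_test_p_value(10, 100, 15, 100)
p_large = ab_test_p_value(1000, 10000, 1500, 10000)
print(p_small > 0.05, p_large < 0.05)
```

With 100 visitors per variant, the gap between 10% and 15% could easily be chance, while with 10,000 visitors per variant the same gap is very unlikely to be a coincidence.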
Enlarge small data
With big data we have enough data to see even the smallest correlations in our dataset. With small data we don’t have this opportunity, but there is a way to learn more from it: we can make bigger data from small data. There are several types of columns we can derive from existing ones to broaden our dataset:
- Name → Gender
- City → Population
- Date → Weekend Yes/No
- Birthdate → Zodiac sign
- Address → Real estate market
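Two of the derivations above (Date → Weekend and City → Population) can be sketched in a few lines. This is only an illustration: the `enrich` function, the column names, and the lookup table with its city names and population figures are invented placeholders; real figures would have to come from an external source.

```python
from datetime import date

# Placeholder lookup table: city names and numbers are made up for illustration
CITY_POPULATION = {"Alphaville": 120_000, "Betatown": 45_000}

def enrich(row):
    """Return a copy of the row with two derived columns added."""
    enriched = dict(row)
    # Date -> Weekend: weekday() returns 5 for Saturday and 6 for Sunday
    enriched["weekend"] = row["date"].weekday() >= 5
    # City -> Population: None when the city is not in the lookup table
    enriched["population"] = CITY_POPULATION.get(row["city"])
    return enriched

rows = [
    {"city": "Alphaville", "date": date(2016, 3, 5)},  # a Saturday
    {"city": "Gammaburg", "date": date(2016, 3, 7)},   # a Monday, unknown city
]
enriched_rows = [enrich(r) for r in rows]
print(enriched_rows)
```

Each derived column gives the analysis one more attribute to group or correlate by, without collecting any new records.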
Small data from big data
Just as we can enlarge our small data, we can filter and aggregate our big data. As we wrote at the beginning of the article, big data is a huge amount of information. First of all, the data has to be stored and managed. Nowadays storage is nearly free, so most companies save all of their data. They may not know yet what to use it for, but they expect to need it later. Once they have the dataset, they need tools to discover the relationships in it and to find the relevant groups and types of data. By creating smaller and smaller datasets this way, they end up with small data.
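The filter-and-aggregate step can be sketched as follows. The record layout (`country`, `product`, `amount`) and the `summarize` function are assumptions made for this example; a real big data set would usually be processed with dedicated tools rather than a plain Python loop.

```python
from collections import defaultdict

def summarize(records, country):
    """Filter records to one country, then total the amounts per product."""
    totals = defaultdict(float)
    for rec in records:
        if rec["country"] == country:                # filter
            totals[rec["product"]] += rec["amount"]  # aggregate
    return dict(totals)

# A tiny stand-in for a large raw dataset (made-up values)
records = [
    {"country": "HU", "product": "book", "amount": 10.0},
    {"country": "HU", "product": "book", "amount": 5.0},
    {"country": "HU", "product": "pen", "amount": 2.0},
    {"country": "AT", "product": "book", "amount": 7.0},
]
small_data = summarize(records, "HU")
print(small_data)
```

The result is a small table with one row per product, which is far easier to inspect than the raw records.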
All in all, both big data and small data have many advantages and disadvantages. We need to recognize our needs and possibilities, and decide what kind of information we need and what we would like to discover from our data. After that we can choose between big data and small data, or use both. Big data and small data are equally important.