AnswerMiner
March 16, 2020 · 7 min read

Coronavirus (COVID-19) exploratory data analysis with AnswerMiner

This blog post was updated on March 22.

You may have already read Bence Buday’s other article on our blog about analyzing a Facebook page with AnswerMiner. This time he did some research and spent some time analyzing and visualizing coronavirus data (Jan 24 - Mar 16), which will be updated.

We all know what COVID-19 is. All media platforms are full of this virus and its economic effects. It is hard to write about it without emotion or comment. I am sitting at home in voluntary quarantine, have lost almost ⅓ of my savings, and have no idea what the future holds for us.

Anyway, this post won’t tell you how deadly the coronavirus is or how much toilet paper you should buy, so please keep reading only if you are interested in data visualization!

What have you seen so far on the web?

Based on my research on the topic, the most used charts are the following:

  • column charts
  • histograms
  • line diagrams

Besides these, most of the information comes as plain text and numbers. I do not like this, since no data is attached that could explain the WHY and HOW.

Let’s see an example:

Media news: “Today, 345 people died in Italy.” This is horrible, and there are no words for such a loss. Is there a war in Italy? Actually yes, against an invisible enemy which does exactly the same damn thing humans do: it wants to live.

Let me try to add some extra information to this sentence to make it more understandable. “Yesterday 345 people died from the 26407 cases in Italy, where the healthcare system is totally overloaded. Most of them had serious health issues and were over 65.” I hope you can feel the difference between the two headlines. The second is not better news, but it is much more understandable.

Any nice visualisation?

Some institutes and research companies have developed global, map-based visualisations that show how the virus has spread around the Earth over time. They also show the number of cases and the affected countries.

These maps are pretty nice. With them you can see how quickly a virus can travel around the world. I think a map like this is the most appropriate way to visualise the coronavirus.

My own dataset

For my own visualisation I had to build my own dataset from different sources across the internet.

PLEASE KEEP IN MIND: THESE ARE THE REGISTERED CASES!

There must be many, many cases that are not registered, since the hosts have no idea that the virus is in their body.

I gathered the following:

  • all cases, [1]
  • number of recovered cases, [1]
  • number of deaths, [1]
  • number of affected countries, [2]
  • iShare MSCI WORLD UCITS ETF (DIST) value ($), [3]
  • Argus US Jet Fuel price ($/gallon). [4]

All of the other columns are calculated from the ones mentioned above. I will present them later. If you are wondering what the last two columns are for, please bear with me. One of them gives a good overview of the stock market these days, the other shows how the oil (jet fuel) price per gallon in USD behaves during the coronavirus outbreak.

Let’s visualize!

What I wanted to see first is how all cases and deaths look together. These numbers show the minimum death rate of the virus (based on the registered cases). The chart below shows the number of all cases colored by the number of total deaths on each recorded day.

All cases colored by amount of death (UPDATED at 03.22.)

More cases mean more deaths, that is clear. However, the coloring alone won’t tell us how the rate changes over time, so for the next visualization I used “Death Rate - 2” (death cases / all cases).

Death rate - 2 in time (UPDATED at 03.22.)

(March-16) 4% - This rate is growing slowly, since full recovery takes more time than dying, which lets us assume that “Death Rate - 1” (death cases / all closed cases) should show a decreasing trend. Sad, but true.
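As a minimal sketch of how the two rates differ: the snippet below assumes that "closed cases" means deaths plus recoveries (the post does not spell this out) and uses made-up totals, not the real dataset. The function names are hypothetical.

```python
def death_rate_2(deaths, all_cases):
    """Deaths as a share of all registered cases (open and closed)."""
    return deaths / all_cases if all_cases else 0.0


def death_rate_1(deaths, recovered):
    """Deaths as a share of closed cases.

    Assumption: closed cases = deaths + recovered.
    """
    closed = deaths + recovered
    return deaths / closed if closed else 0.0


# Hypothetical cumulative totals for a single day (not from the dataset).
all_cases, recovered, deaths = 1000, 300, 40

rate_2 = death_rate_2(deaths, all_cases)
rate_1 = death_rate_1(deaths, recovered)

print(f"Death Rate - 2: {rate_2:.1%}")
print(f"Death Rate - 1: {rate_1:.1%}")
```

Because closed cases can never exceed all cases, Death Rate - 1 is always at least as high as Death Rate - 2, and the two coincide only once every case is closed.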

Updated at March-22

4.3% - 5 days ago, I expected that Death Rate - 2 would increase slowly. It seems that I was right. At this point I must mention the Italian cases: there are no free beds, equipment or nurses in the hospitals, so doctors have to make decisions about lives day by day. This fact (besides the very high average age of the patients) supports my expectation for Death Rate - 2, but it will hurt Death Rate - 1.

Death rate - 1 in time (UPDATED at 03.22.)

(March-16) 7% - My assumption looks valid, since in the meantime this rate has been going down and now sits around 7%. The two numbers must meet somewhere. I do not know where (5%? 6%?) or when, but time will tell.

Updated at March-22

12.3% - Almost a week ago it seemed to me that, at the end of the pandemic, the total death rate could settle around 5-6%. However, the situation in Italy is pushing the numbers up. Well, how about 8%? (Keep in mind, we are talking about the registered cases.)

Affected number of countries in time (UPDATED at 03.22.)

(March-16) At the moment it is hard to say whether the number of affected countries affects the rates above or not. A couple of weeks are needed to tell. (This post will be updated.)

Updated at March-22

179 of 195 countries are infected.

It hardly needs saying, but this virus has hit the stock market and the oil industry as well.

Beyond the line and column charts

Column and line diagrams are very useful and easy for everyone to understand, but there are other ways to represent data.

How about a bubble chart? The numbers of deaths and recoveries will not decrease in the future; only active cases can reach 0 at the end of this crisis. In a bubble chart this results in a growing line of bubbles. Each bubble is a date. The further right a bubble is, the more recoveries there are. The higher a bubble is, the more deaths there are. In other words, the best case is when the bubble is close to the x axis and far from the y axis.

Recovered and death cases together (UPDATED at 03.22.)

In the upcoming weeks I hope that the movement to the right will be much bigger than the movement upwards. (An update of this post is coming later.)

The next chart is a totally different one. Here I could encode information in both coloring and sizing. The names of the weekdays are sized by the sum of the new cases registered on that weekday. In other words, most of all cases were registered on Sundays, with Thursday in second place. The coloring tells which day is the deadliest: the redder the color, the deadlier the weekday.
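The weekday sizing can be sketched as a simple aggregation. This is only an illustration, not the AnswerMiner workflow: the function name is hypothetical, the input is assumed to be a cumulative series for consecutive days, and the numbers are made up.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def new_cases_by_weekday(start, cumulative):
    """Sum daily new cases per weekday name from a cumulative series.

    `start` is the date of the first entry and the entries are assumed to be
    consecutive days. The first day counts its whole total as new cases.
    """
    totals = defaultdict(int)
    previous = 0
    for offset, total in enumerate(cumulative):
        day = start + timedelta(days=offset)
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        totals[WEEKDAYS[day.weekday()]] += total - previous
        previous = total
    return dict(totals)


# Hypothetical cumulative case counts for eight consecutive days,
# starting on Monday, March 9, 2020.
cumulative = [10, 15, 25, 40, 60, 90, 130, 200]
by_weekday = new_cases_by_weekday(date(2020, 3, 9), cumulative)

for name in WEEKDAYS:
    print(f"{name}: {by_weekday.get(name, 0)}")
```

Running the same function on the cumulative death series would give the per-weekday totals that drive the coloring.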

Sum of new cases by week days (UPDATED at 03.22.)

(March-16) The heat map is also a pretty spectacular chart. Here you can see how the iShare MSCI WORLD UCITS ETF (DIST) value ($) went down day by day, grouped by the number of affected countries. Will the dive stop when every country on Earth has the virus? Good question!

Updated at March-22

According to my source [2], there are still a few countries (16) without any registered COVID-19 patients.

iShare MSCI WORLD UCITS ETF (DIST) value ($) in time separated by number of affected countries (UPDATED at 03.22.)

(March-16) More bubbles? Yes! Here you can see each day’s new cases as a percentage of all cases. Please note that 40% of all cases were registered in the last 6 days of the period (Jan 24 - Mar 16).
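A rough sketch of how such a percentage can be derived from a cumulative series follows. The function name and the numbers are hypothetical and only illustrate the calculation.

```python
def daily_share_of_total(cumulative):
    """Return each day's new cases as a fraction of the final total."""
    total = cumulative[-1]
    shares = []
    previous = 0
    for value in cumulative:
        shares.append((value - previous) / total)
        previous = value
    return shares


# Hypothetical cumulative case counts for five consecutive days.
cumulative = [10, 20, 40, 60, 100]
shares = daily_share_of_total(cumulative)

# Share of all cases registered in the last two days of the series.
last_two_days = sum(shares[-2:])

print([f"{s:.0%}" for s in shares])
print(f"Last two days: {last_two_days:.0%}")
```

Summing the last few entries of `shares` gives statements of the form "X% of all cases were registered in the last N days".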

Updated at March-22

The last 7 days (03.16 - 03.22) account for 49% of all cases. This means that the spreading speed of the virus is +50% / week. Is it possible that there will be 471,000 total cases on 03.29? Will there be 21120 deaths? You will get the answer from the next update of this post.
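The projection can be checked with a few lines of arithmetic. The sketch below is not the original calculation: the March 22 totals are back-calculated from the projected figures (471,000 / 1.5 and 21120 / 1.5), which is an assumption. It also shows a second reading of the same 49% share: if 49% of today's total was registered during the last week, the total a week earlier was 51% of today's, which corresponds to a weekly factor of about 1.96 rather than 1.5.

```python
def growth_factor_from_share(share_last_week):
    """Weekly growth factor of the cumulative total.

    If a share s of today's total was registered during the last week,
    the total one week ago was (1 - s) of today's total, so the factor
    is 1 / (1 - s).
    """
    if not 0 <= share_last_week < 1:
        raise ValueError("share must be in [0, 1)")
    return 1.0 / (1.0 - share_last_week)


POST_FACTOR = 1.5  # the "+50% / week" reading used in the post

# Assumption: March 22 totals back-calculated from the post's projections.
cases_mar_22 = 471_000 / POST_FACTOR   # 314,000
deaths_mar_22 = 21_120 / POST_FACTOR   # 14,080

share_factor = growth_factor_from_share(0.49)  # about 1.96

print(f"Factor {POST_FACTOR:.2f}: {cases_mar_22 * POST_FACTOR:,.0f} cases on 03.29")
print(f"Factor {share_factor:.2f}: {cases_mar_22 * share_factor:,.0f} cases on 03.29")
```

Both readings assume that the growth of the previous week simply repeats, which is a strong simplification for registered cases.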

Daily new cases percentage of all cases (UPDATED at 03.22.)

Summary

(March-16) What I could see from my charts is that at some point in the future all countries may be affected, and the death rate should settle between 4% and 7% of registered cases. The next weeks will show how the spread continues. Hopefully the virus is now in its peak period, but this is not 100% sure, since many countries have only just reported their first infected patient.

Updated at March-22 - the death rate may end up between 4% and 7%, but it really depends on the load on the healthcare system. The peak period theory seems correct, but the absolute peak is still far away.

Please follow the WHO instructions as well as the laws of your country!

Stay healthy!

Used sources:
