COVID19 data analysis using Python

by Ling H


This project analyzes COVID19 data to investigate which factors are correlated with the numbers of COVID19 cases and deaths across countries. First, the program reads daily updated data from a website, aggregates it to the location (country) level, calculates growth rates, and scales total and new cases and deaths by each country's population. It then lists the locations with the highest growth rates and the highest population-scaled cases and deaths. Second, country-level data on potential factors, such as GDP and median age, is appended to investigate the correlations between those factors and population-scaled case numbers. The significant factors are listed and then used as inputs to a multiple linear regression model.

In addition, the program automatically generates graphs of total and new cases and deaths for the world and for any country the user selects. It also outputs a chart of the correlation coefficients and p-values for all potential factors. Finally, after fitting the linear regression model, it plots actual vs. predicted population-scaled total cases.
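The aggregation and scaling steps can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the project's own code: the input column names (`date`, `location`, `new_cases`, `new_deaths`, `population`) are hypothetical, the growth rate is assumed to be the day-over-day relative change in total cases, and the small data set is made up for demonstration.

```python
import pandas as pd

PER_MILLION = 1_000_000


def summarize_by_location(daily: pd.DataFrame) -> pd.DataFrame:
    """Collapse daily rows into one row per location (its latest date).

    Assumed (hypothetical) input columns: date, location, new_cases,
    new_deaths, population. Population is assumed constant within a location.
    """
    df = daily.sort_values(["location", "date"]).copy()

    # Running totals within each location.
    df["total_cases"] = df.groupby("location")["new_cases"].cumsum()
    df["total_deaths"] = df.groupby("location")["new_deaths"].cumsum()

    # Assumed growth-rate definition: day-over-day relative change in total cases.
    prev_total = df.groupby("location")["total_cases"].shift(1)
    df["growth_rate"] = df["total_cases"] / prev_total - 1

    # Keep the most recent row for each location.
    latest = df.groupby("location").tail(1).set_index("location")

    # Scale counts by population (cases or deaths per million residents).
    for col in ["total_cases", "total_deaths", "new_cases", "new_deaths"]:
        latest[f"{col}_per_million"] = latest[col] * PER_MILLION / latest["population"]
    return latest


# Made-up example: two locations observed on two days.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-02"] * 2),
    "location": ["A", "A", "B", "B"],
    "new_cases": [100, 50, 10, 30],
    "new_deaths": [5, 5, 0, 1],
    "population": [1_000_000, 1_000_000, 100_000, 100_000],
})
summary = summarize_by_location(daily)

# Top locations by growth rate and by population-scaled total cases.
print(summary.nlargest(1, "growth_rate").index.tolist())              # ['B']
print(summary.nlargest(1, "total_cases_per_million").index.tolist())  # ['B']
```

Scaling by population is what makes a small country like the made-up "B" rank above a larger one with more raw cases; `nlargest(n, column)` then produces the top-n lists.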
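The factor screening and regression steps could look like the sketch below. It assumes a country-level table with hypothetical columns `median_age`, `gdp_per_capita` and `cases_per_million`. It uses SciPy's `pearsonr` for each correlation coefficient and p-value, treats p < 0.05 as significant (an assumed threshold), and fits ordinary least squares with NumPy's `lstsq`; the project itself may use a different regression library. The toy data are synthetic and built to be exactly linear, so the fit recovers the coefficients used to generate them.

```python
import numpy as np
import pandas as pd
from scipy import stats


def screen_factors(df, target, factors, alpha=0.05):
    """Pearson r and two-sided p-value of each factor against the target."""
    rows = []
    for factor in factors:
        sub = df[[factor, target]].dropna()  # drop countries missing either value
        r, p = stats.pearsonr(sub[factor], sub[target])
        rows.append({"factor": factor, "r": r, "p_value": p})
    table = pd.DataFrame(rows).set_index("factor")
    table["significant"] = table["p_value"] < alpha  # assumed threshold
    return table


def fit_ols(df, target, factors):
    """Multiple linear regression (OLS with an intercept) via least squares."""
    sub = df[factors + [target]].dropna()
    X = np.column_stack([np.ones(len(sub)), sub[factors].to_numpy(dtype=float)])
    y = sub[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fit = pd.DataFrame({"actual": y, "predicted": X @ coef}, index=sub.index)
    return pd.Series(coef, index=["intercept"] + factors), fit


# Synthetic country-level data, exactly linear in both factors.
i = np.arange(12)
countries = pd.DataFrame(
    {
        "median_age": 20 + 2 * i,
        "gdp_per_capita": 5000 * ((5 * i) % 12 + 1),
    },
    index=[f"country_{k}" for k in i],
)
countries["cases_per_million"] = (
    500 + 100 * countries["median_age"] + 0.05 * countries["gdp_per_capita"]
)

table = screen_factors(countries, "cases_per_million", ["median_age", "gdp_per_capita"])
significant = table[table["significant"]].index.tolist()
coef, fit = fit_ols(countries, "cases_per_million", significant)
print(table)
print(coef)
```

The `fit` frame holds the actual and predicted values that an actual vs. predicted plot would draw (for example as a matplotlib scatter plot). With this exactly linear toy data the two columns coincide; real country data will not.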