Gathering and analyzing health data helps health organizations and officials understand the impact of disease outbreaks and pandemics, and plan ahead so that a recurrence can be detected and possibly prevented. Week two of the Sidehustle Bootcamp had us working on health insight analysis, and Team Six opted to work with datasets on malaria and COVID-19 (USA). Below is a breakdown of the steps taken in analyzing both datasets.
Data Collection
Both datasets were sourced from Kaggle. Finding the right datasets was anything but easy, as most of the ones we came across did not meet the requirements, but we persevered.
Cleaning steps on COVID-19 Data
The following steps were taken to clean and prepare the COVID-19 dataset:
- We deleted all "United States" rows in the State column, as they were not required: the United States is a country and not a state.
- We deleted all "All" rows in the Sex column, keeping only the rows for the male and female sexes.
- We deleted all "All Ages" rows in the Age group column, so that we only analyzed data for the individual age groups.
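We did this cleaning by hand, but the same three filters can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows are made up, and the "COVID-19 Deaths" column name is assumed, while State, Sex and Age group are the columns named above.

```python
# Hypothetical sample rows; the real dataset has many more rows and columns.
rows = [
    {"State": "United States", "Sex": "All", "Age group": "All Ages", "COVID-19 Deaths": 30},
    {"State": "United States", "Sex": "Male", "Age group": "85 years and over", "COVID-19 Deaths": 9},
    {"State": "New York", "Sex": "All", "Age group": "All Ages", "COVID-19 Deaths": 12},
    {"State": "New York", "Sex": "Male", "Age group": "All Ages", "COVID-19 Deaths": 7},
    {"State": "New York", "Sex": "Male", "Age group": "85 years and over", "COVID-19 Deaths": 4},
    {"State": "New York", "Sex": "Female", "Age group": "85 years and over", "COVID-19 Deaths": 3},
]


def drop_aggregate_rows(rows):
    """Keep only rows that describe one state, one sex and one age group."""
    return [
        row
        for row in rows
        if row["State"] != "United States"   # country-level total
        and row["Sex"] != "All"              # both sexes combined
        and row["Age group"] != "All Ages"   # all age groups combined
    ]


cleaned = drop_aggregate_rows(rows)
for row in cleaned:
    print(row)
```

Only the last two sample rows survive the filter. Removing the combined rows also means that summing the deaths column later does not count the same deaths more than once.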
COVID-19 Insights and Observations
The 85 years and above age group recorded the highest number of COVID-19 deaths (47,683), while the 15-24 years age group recorded the lowest (89).
More male than female deaths were recorded: 53.85% of deaths were male against 46.15% female.
New York City accounted for 13.86% of all COVID-19 deaths, followed by New Jersey with 13,988 deaths. Across all 53 entries in the State column, the number of COVID-19 deaths ranged from 0 to 20,559.
New York recorded the highest number of pneumonia, influenza and COVID-19 deaths (23,501), followed by California with 16,795.
New York recorded the highest number of pneumonia and COVID-19 deaths (7,905), followed by New Jersey with 6,775.
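The totals and percentages above came from Power BI, but the arithmetic behind them is simple: sum the deaths per group, then divide each group total by the grand total. Here is a minimal sketch of that calculation. The sample rows, the helper names `deaths_by` and `percentage_shares`, and the "COVID-19 Deaths" column name are all assumptions made for illustration, not our actual data.

```python
from collections import defaultdict

# Hypothetical cleaned rows (no combined "All" or "United States" rows).
sample = [
    {"State": "State A", "Sex": "Male", "COVID-19 Deaths": 7},
    {"State": "State A", "Sex": "Female", "COVID-19 Deaths": 5},
    {"State": "State B", "Sex": "Male", "COVID-19 Deaths": 6},
    {"State": "State B", "Sex": "Female", "COVID-19 Deaths": 2},
]


def deaths_by(rows, key, value="COVID-19 Deaths"):
    """Sum the deaths column for each distinct value of `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def percentage_shares(totals):
    """Convert group totals into percentages of the grand total."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {group: 0.0 for group in totals}
    return {group: round(100 * count / grand_total, 2) for group, count in totals.items()}


by_sex = deaths_by(sample, "Sex")
shares = percentage_shares(by_sex)
by_state = deaths_by(sample, "State")
top_state = max(by_state.items(), key=lambda item: item[1])

print(shares)     # {'Male': 65.0, 'Female': 35.0}
print(top_state)  # ('State A', 12)
```

The same two helpers cover the age group, sex and state breakdowns, since only the grouping column changes.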
Malaria Dataset Cleaning
All blank columns were deleted, along with all data that was not required.
We replaced all null entries with 0.
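We did these two steps in a spreadsheet, but they can also be sketched in Python. The column names, the sample rows and the helper names `drop_blank_columns` and `fill_nulls_with_zero` below are hypothetical and only there to show the idea.

```python
# Hypothetical sample rows; "Notes" stands in for a column that is blank everywhere.
rows = [
    {"Entity": "Country A", "Year": 1990, "Deaths": 120, "Notes": ""},
    {"Entity": "Country A", "Year": 1991, "Deaths": None, "Notes": ""},
    {"Entity": "Country B", "Year": 1990, "Deaths": 45, "Notes": None},
]

BLANK = (None, "")


def drop_blank_columns(rows):
    """Remove every column whose value is blank in all rows."""
    columns = list(rows[0].keys()) if rows else []
    keep = [col for col in columns if any(row[col] not in BLANK for row in rows)]
    return [{col: row[col] for col in keep} for row in rows]


def fill_nulls_with_zero(rows):
    """Replace each remaining blank entry with 0."""
    return [
        {col: (0 if value in BLANK else value) for col, value in row.items()}
        for row in rows
    ]


# Drop blank columns first; otherwise they would be filled with zeros and kept.
cleaned = fill_nulls_with_zero(drop_blank_columns(rows))
for row in cleaned:
    print(row)
```

The order of the two calls matters: a fully blank column that has already been filled with zeros no longer looks blank, so it would survive the first step.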
After cleaning the data in Excel, we loaded it into Power BI for analysis and visualization and came up with the following insights:
The year 1992 recorded the highest number of deaths from malaria.
Africa is the continent with the most malaria-related deaths in the world.
Children under the age of 5 had a higher mortality rate than children between the ages of 5 and 14.
Adults aged 50-69 had a higher mortality rate than those aged 70 and above.
Of the top 10 countries by age-standardized malaria prevalence, Burkina Faso recorded the highest prevalence while Liberia had the lowest.
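For readers who want to reproduce insights like the peak year and the top 10 ranking outside Power BI, here is a rough sketch of both calculations. All records, prevalence values and the helper names `peak_year` and `top_n` are made up for illustration and are not taken from our dataset.

```python
from collections import defaultdict

# Hypothetical records: (entity, year, deaths).
records = [
    ("Country A", 1990, 120),
    ("Country B", 1990, 45),
    ("Country A", 1991, 150),
    ("Country B", 1991, 30),
    ("Country C", 1991, 10),
]

# Hypothetical age-standardized prevalence values per country.
prevalence = {"Country A": 31.5, "Country B": 12.0, "Country C": 27.3}


def peak_year(records):
    """Return the (year, total deaths) pair with the highest total."""
    totals = defaultdict(int)
    for _entity, year, deaths in records:
        totals[year] += deaths
    return max(totals.items(), key=lambda item: item[1])


def top_n(values, n):
    """Return the n highest (name, value) pairs, largest first."""
    return sorted(values.items(), key=lambda item: item[1], reverse=True)[:n]


print(peak_year(records))    # (1991, 190)
print(top_n(prevalence, 2))  # [('Country A', 31.5), ('Country C', 27.3)]
```

In a ranking like this, the first entry of the result is the country with the highest prevalence and the last entry is the lowest within the top group.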