Bootcamp Week 2: Health Data Analysis

Gathering and analyzing health data helps health organizations and officials understand the impact of disease outbreaks and pandemics, and plan for the future so that a recurrence can be detected early and possibly prevented. In week two of the Sidehustle Bootcamp we worked on health insight analysis, and Team Six chose to work with datasets on malaria and COVID-19 (USA). Below is a breakdown of the steps taken in analyzing both datasets.

Data Collection

Both datasets were obtained from Kaggle. Finding the right datasets was anything but easy, as most of the ones we came across did not meet the requirements, but we persevered.

Cleaning steps on COVID-19 Data

The following steps were taken to clean and prepare the COVID-19 datasets:

  1. We deleted all "United States" entries in the State column, as they were not required: the United States is a country, not a state.

  2. We deleted all "All" entries in the Sex column, so that only the male and female records were analyzed.

  3. We deleted all "All Ages" entries in the Age group column, so that only the individual age groups were analyzed.
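For readers who would rather script this filtering, here is a minimal Python sketch of the three steps above. The column names `State`, `Sex`, and `Age group` follow the post; the `COVID-19 Deaths` column name, the sample rows, and the helper `drop_aggregate_rows` are assumptions made up for illustration, not the team's actual workflow or data.

```python
# Roll-up labels the post removes from each column (hypothetical mapping name).
AGGREGATE_LABELS = {
    "State": "United States",
    "Sex": "All",
    "Age group": "All Ages",
}


def drop_aggregate_rows(rows):
    """Keep only rows that carry no roll-up label in any of the three columns."""
    return [
        row for row in rows
        if all(row.get(col) != label for col, label in AGGREGATE_LABELS.items())
    ]


# Illustrative rows only; the counts are invented, not taken from the dataset.
sample_rows = [
    {"State": "United States", "Sex": "All", "Age group": "All Ages", "COVID-19 Deaths": 100},
    {"State": "New York", "Sex": "All", "Age group": "85 years and over", "COVID-19 Deaths": 30},
    {"State": "New York", "Sex": "Male", "Age group": "All Ages", "COVID-19 Deaths": 40},
    {"State": "New York", "Sex": "Female", "Age group": "15-24 years", "COVID-19 Deaths": 1},
]

cleaned = drop_aggregate_rows(sample_rows)
print(cleaned)
```

Removing the roll-up rows matters because they repeat counts that already appear in the detailed rows, so leaving them in would double count deaths in any later sum.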

Insights and Observations

  1. The 85 years and above age group recorded the highest number of COVID-19 deaths (47,683), while the 15-24 years age group recorded the lowest (89).

  2. More male than female deaths were recorded: 53.85% of deaths were male and 46.15% were female.

  3. New York City accounted for 13.86% of all COVID-19 deaths, followed by New Jersey with 13,988 deaths. Across all 53 entries in the State column, the number of COVID-19 deaths ranged from 0 to 20,559.

  4. New York recorded the highest number of Pneumonia, Influenza & COVID-19 deaths with 23,501, followed by California with 16,795.

  5. New York recorded the highest number of Pneumonia & COVID-19 deaths with 7,905, followed closely by New Jersey with 6,775.
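Percentage figures like the male/female split or a state's share of all deaths come from summing a value per group and dividing by the grand total. The sketch below shows that arithmetic in Python; the function `share_by_group`, the `COVID-19 Deaths` column name, and the toy numbers are assumptions for illustration and do not reproduce the dataset.

```python
from collections import defaultdict


def share_by_group(rows, group_col, value_col):
    """Return each group's percentage of the overall total, rounded to 2 places."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Avoid dividing by zero when every count is 0.
        return {group: 0.0 for group in totals}
    return {group: round(100 * value / grand_total, 2) for group, value in totals.items()}


# Toy values, not the real counts.
toy_rows = [
    {"Sex": "Male", "COVID-19 Deaths": 30},
    {"Sex": "Male", "COVID-19 Deaths": 30},
    {"Sex": "Female", "COVID-19 Deaths": 40},
]

shares = share_by_group(toy_rows, "Sex", "COVID-19 Deaths")
print(shares)
```

Passing `"State"` instead of `"Sex"` as the grouping column would give each state's share in the same way.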

Malaria Dataset Cleaning

All blank columns were deleted, along with all data that was not required.

We replaced all null entries with 0.
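The two cleaning steps above can also be sketched in Python. This is only an illustration under assumptions: the helper `clean_table` and the column names `Entity`, `Year`, `Deaths`, and `Notes` are hypothetical, and a cell counts as blank here if it is `None` or an empty string.

```python
def clean_table(rows):
    """Drop columns that are blank in every row, then fill remaining blanks with 0."""
    def is_blank(value):
        return value is None or str(value).strip() == ""

    columns = list(rows[0].keys()) if rows else []
    # A column survives if at least one row has a non-blank value in it.
    kept = [col for col in columns if not all(is_blank(row.get(col)) for row in rows)]
    return [
        {col: (0 if is_blank(row.get(col)) else row[col]) for col in kept}
        for row in rows
    ]


# Toy rows with hypothetical column names.
toy_rows = [
    {"Entity": "A", "Year": 1990, "Deaths": None, "Notes": ""},
    {"Entity": "B", "Year": 1991, "Deaths": 12, "Notes": None},
]

cleaned_malaria = clean_table(toy_rows)
print(cleaned_malaria)
```

Filling nulls with 0 keeps sums and charts from breaking, at the cost of treating "not reported" the same as "no deaths", which is worth keeping in mind when reading totals.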

After cleaning the data in Excel, we uploaded it to Power BI for analysis and visualization and came up with the following insights.

The year 1992 recorded the highest number of deaths from malaria.

Africa is the continent with the most malaria-related deaths in the world.

Children under the age of 5 had a higher mortality rate than children between ages 5 and 14.

Adults aged 50 to 69 had a higher mortality rate than those aged 70 and above.

Of the top 10 countries by age-standardized malaria prevalence, Burkina Faso recorded the highest prevalence while Liberia had the lowest.
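Findings such as the peak year or a top 10 ranking boil down to summing a measure per group and sorting the totals. A minimal Python sketch of that idea follows; the function `top_groups`, the column names `Year` and `Deaths`, and the toy values are assumptions for illustration, since the analysis itself was done in Power BI.

```python
from collections import Counter


def top_groups(rows, group_col, value_col, n=1):
    """Sum value_col per group and return the n largest (group, total) pairs."""
    totals = Counter()
    for row in rows:
        totals[row[group_col]] += row[value_col]
    return totals.most_common(n)


# Toy values, not the real malaria figures.
toy_rows = [
    {"Year": 2000, "Deaths": 5},
    {"Year": 2001, "Deaths": 9},
    {"Year": 2000, "Deaths": 3},
    {"Year": 2002, "Deaths": 1},
]

peak_year = top_groups(toy_rows, "Year", "Deaths", n=1)
top_two = top_groups(toy_rows, "Year", "Deaths", n=2)
print(peak_year, top_two)
```

Grouping by a country column with `n=10` would produce a top 10 list in the same way.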