I will continue to build on my earlier posts about my COVID19 Dashboard. In this series, I am writing a few different blog articles describing not only how I built it, but some things I learned in the process.
The current list of planned topics include:
PART 4 - Reporting based on population
In looking at the data, I quickly realized simply looking at
total numbers or even daily changes was only giving me a limited view of what
was happening. Were the large number of cases in New York
simply because they have more people than most other states, or were they
growing at a different rate. Are New
York citizens at a higher risk of acquiring the virus than California
citizens. It became evident that I
needed to look at the data based on the population.
Step 1 – Make sure I had the population data
I needed to make sure I had the population data at the
levels I needed it. The data I began
with had the population detail for each US County, but I did not have Global
population numbers. I did some quick
research to find the latest country population numbers at https://www.worldometers.info/world-population/population-by-country/. Since this data does not change very often, I
did not link Power BI to the source data.
Instead, I simply used Copy and Paste to create a new table in my data
model.
I did run into a few issues when I brought in this information. Since I only had Country Name and Population,
I was going to build the relationship to my country table on country name. However, some of the countries were spelled
differently or they were showing as a province of another country. I used the transformation tools of Power Query,
to clean the data up as much as possible.
Once the data was clean, I brought may population table into my data
model and related it to my country table.
Step 2 – Create the need measures
This was actually a pretty straight forward step. I identified that I needed four different
measures related to population. I was interested
in looking at both total cases and total deaths related to the population, and
I wanted to view it two different ways.
- As a % of the population
- Per 100k people
The four measures I created were simply division
- Total Cases /
Population Total
Case % of Population = DIVIDE([Total Cases],sum('US State
Province'[Population]),0)
- Total Deaths /
Population Total
Death % of Population = DIVIDE([Total Deaths],sum('US State
Province'[Population]),0)
- (Total Cases /
Population) * 100,000 Total Cases per 100k = DIVIDE([Total Cases],sum('US State
Province'[Population]),0)*100000
- (Total Deaths
/ Population) * 100,000 Total Deaths per 100k = DIVIDE([Total Deaths],sum('US State
Province'[Population]),0)*100000
Step 3 – Create my desired visualizations
A simple matrix with conditional formatting on the measures
A scatter chart looking at the relationship between cases and deaths
With these measure and visualizations, I was able to quickly confirm that most states have very similar numbers. However, there are a few states (New York, New Jersey and Louisiana) that appear to be outliers.
Step 4 – Compare total cases with total cases per 100k
In looking at what I had discovered so far, I decided it
would be good to compare the total case numbers with total cases per 100k population. To do this I decided to rank the states based
on total number of cases and total number of cases per 100k. This required using the RANKX function.
- Rank total cases = Rankx(ALL('US State Province'[State]),[Total
Cases])
- Rank total cases per 100k = Rankx(ALL('US State
Province'[State]),[Total Cases per 100k])
Since I am using RANK in the table, I need to make sure it
ranks each state against all of the other states, so instead of simply entering
the table, I included ALL(‘table’[rank column])
This quickly showed me that not only is New York #1 in total
cases, the are #1 in total cases per 100k.
However, California is #6 in total cases, but #32 in cases per 100k
This certainly puts a different perspective on the numbers.
In my next article, I will look at some of the mapping
abilities.
Comments
Post a Comment