As I mentioned in my earlier post on my COVID 19 Dashboard, I
am going to write a few different blog articles describing not only how I built
it, but some things I learned in the process.
The current list of planned topics include:
- ·
Working with growing CSV files
- ·
Using index and merge when transforming data
- ·
Using time functions to calculate the daily
change
- ·
Reporting based on the population
- ·
Working with maps, setting data types correctly
- ·
Possibly more to come depending on how long this
all goes….
Let’s get started at the beginning. To build any Power BI report, you must have
data, so my adventure began on the web.
I searched various sites for data, and there is plenty of it out there. However, I wanted to find something that
would continue to be updated and had not only international data, but also USA State
data. While looking at www.covidtracker.com, I found all their
data was available in GitHub. I went here
to see what was available. I found they
had time series data available for both global and US cases and deaths. This looked very promising as all the data was
in CSV files and it was updating automatically.
Finally, I could use the RAW version of the data with the corresponding
URL with a Web data connection in Power BI.
I am sure many of you have brought in CSV data before, so I
won’t go through all the steps. After
the data was available in Power Query, I was able to transform the data by
using the first row as headers, but I quickly noticed instead of having a separate
row for each day, the file had a separate column for each day. For example, the data had over 80 columns
when I made my connection. This did not
concern me, because I am familiar with “Unpivot Columns”. This will allow me to move use date column to
a row. If you are not familiar with this
feature, Microsoft has some great support here.
I went ahead and used “Unpivot Columns”, did some other
basic transformations, and applied my changes, so I could start on the
report. I thought everything was good,
until the next day, when I tried to refresh the data. All my connections where good, but no matter
what I did the next day did not appear on any of my reports. Remember, I mentioned the data has a separate
column for each day, so every night the source data gets a new column for the
latest day of statistics. I decided to
look at my query in Power Query Advanced Editor.
That is when I noticed the number of Columns were hard coded
in my query. As in the screenshot,
Columns=84. After doing some research, I
discovered that this parameter was not necessary. If it is removed from the query, Power Query
will build the columns based on the number of columns in the first row of the
file. Since my file has the header in
the first row, I was comfortable in knowing all my columns would be
identified. I simply removed “Columns=84,”
and my data now refreshes correctly.
I do not work with a lot of CSV files, and they do not
typically grow on a regular basis.
However, if you ever must deal with a growing CSV file, make sure to
check for the number of columns being hard coded in your query.
In my next article, I will discuss some of the other transformations
I did with this data.
Comments
Post a Comment