Covid 19 Dashboard - Article 1 - Starting the process - DATA (The growing CSV file)


As I mentioned in my earlier post on my COVID 19 Dashboard, I am going to write a few different blog articles describing not only how I built it, but some things I learned in the process.

The current list of planned topics include:
  • ·         Working with growing CSV files
  • ·         Using index and merge when transforming data
  • ·         Using time functions to calculate the daily change
  • ·         Reporting based on the population
  • ·         Working with maps, setting data types correctly
  • ·         Possibly more to come depending on how long this all goes….

Let’s get started at the beginning.  To build any Power BI report, you must have data, so my adventure began on the web.  I searched various sites for data, and there is plenty of it out there.  However, I wanted to find something that would continue to be updated and had not only international data, but also USA State data.  While looking at www.covidtracker.com, I found all their data was available in GitHub.  I went here to see what was available.  I found they had time series data available for both global and US cases and deaths.  This looked very promising as all the data was in CSV files and it was updating automatically.  Finally, I could use the RAW version of the data with the corresponding URL with a Web data connection in Power BI.

I am sure many of you have brought in CSV data before, so I won’t go through all the steps.  After the data was available in Power Query, I was able to transform the data by using the first row as headers, but I quickly noticed instead of having a separate row for each day, the file had a separate column for each day.  For example, the data had over 80 columns when I made my connection.  This did not concern me, because I am familiar with “Unpivot Columns”.  This will allow me to move use date column to a row.  If you are not familiar with this feature, Microsoft has some great support here.




I went ahead and used “Unpivot Columns”, did some other basic transformations, and applied my changes, so I could start on the report.  I thought everything was good, until the next day, when I tried to refresh the data.  All my connections where good, but no matter what I did the next day did not appear on any of my reports.  Remember, I mentioned the data has a separate column for each day, so every night the source data gets a new column for the latest day of statistics.  I decided to look at my query in Power Query Advanced Editor.


That is when I noticed the number of Columns were hard coded in my query.  As in the screenshot, Columns=84.  After doing some research, I discovered that this parameter was not necessary.  If it is removed from the query, Power Query will build the columns based on the number of columns in the first row of the file.  Since my file has the header in the first row, I was comfortable in knowing all my columns would be identified.  I simply removed “Columns=84,” and my data now refreshes correctly.

I do not work with a lot of CSV files, and they do not typically grow on a regular basis.  However, if you ever must deal with a growing CSV file, make sure to check for the number of columns being hard coded in your query.

In my next article, I will discuss some of the other transformations I did with this data.

Comments

Popular posts from this blog

Creating a date table with a fiscal year in your Power BI data model

Showing the last 12 months based on a date slicer

Using Excel, OneDrive and Power Automate for Dynamics 365 Finance & Supply Chains Journals