Wednesday, April 15, 2015

Creating a Simple Report with Power Query and Power View

In this tutorial, I will show you how to pull data into Excel using Power Query and then put together a simple report with Power View. I have found that you can get rather elaborate with these tools, but today I am focusing on something simple to start with. Since the month of April is Alcoholism Awareness Month, I want to put together a little report showing how many people we have lost in the US to alcohol related ailments. I will be using a database that contains mortality data provided by the World Health Organization. If you would like to download a copy of the database, I have made it available here. So, let's get right into it.

Inside Excel, in a blank workbook, click the Power Query tab in the ribbon. Power Query can read data from multiple sources including the web, Azure, IMDB2, and many others. I am going to use a SQL Server database. Over to the left side of the ribbon, select 'From Database' and select 'From SQL Server Database' in the drop down menu.


























This will open a window asking for a database connection. All that needs to be provided is the server name. I could also provide the database name and a SQL statement. Normally, I would just add the server name and the database name. Power Query is mainly used for exploring and transforming data. However, the data I will be using is a lot like a random data dump and I have run into some viewing limitations in Power Query that makes filtering a little problematic in this example. So, I will provide a SQL query that is going to return a small portion of what I am looking for.








Remember: Normally, all I would add is the server and database names.








When I click OK, it loads the results into a query editor. Here, I can filter values, move columns around, and perform some data transformations.














I don't have to do much to this data, since I used a query that did most of the work, but I do need to filter some things out. First, I will filter out all the other countries. When I click the drop down arrow in the Country column, it opens a menu for sorting and filtering.







Notice towards the bottom of the menu the warning about an incomplete list. Only a small sample is shown. When I click the option 'Load More', then I can see the full list of countries. Keep in mind that this window has a row limit of 1000.












For this example, I will deselect the 'Select All' option and choose United States of America at the bottom of the list. When I click OK, the data is refreshed.




















Next, I will do the same with the year column. I only want to look at years 1999 through 2010, which is the latest year that I have. Then, I will filter the Cause column. The query I used contains only causes of death that involve alcohol. If I had only pointed to the database and selected the tables I needed, then this list of columns would not be complete. The database I am working with has thousands of causes of death, and would leave out most of the columns I want. As you can see in he screenshot below, WHO has a tendency to get incredibly detailed on their causes of death.

























For this example, I only want Alcohol Abuse, Alcoholic cirrhosis of the liver, and Alcoholic Induced Chronic Pancreatitis. After choosing those items, I click the 'Close and Load' option in the upper left corner and the data is loaded into Excel.




To recap what I have so far, I am looking at the years 1999 through 2010 for the US, deaths across different age groups and all together, also divided by male and female, and this is covering three different types of alcohol related causes of death.

In the ribbon, select the Insert tab and click 'Power View'. The result is a small, default table with a few fields. In my case, it's summing everything, even the values of the Year column.



The Design tab in the ribbon should be opened. If it isn't, click on table and that should do it. The first thing I want to do is make sure the year isn't being summed. To the far right, there is the Power View Fields pane. Towards the bottom right, I click the drop down arrow next to the Year fields and select 'Do not summarize'.



I can visualize this data in a number of ways at this point. In the query I used, I returned the numbers of deaths for multiple age groups. In this Power View sheet, I will only look at the total number of deaths. So, in the Power View Fields pane to the right, at the top, I will uncheck all the age groups and make sure I am looking at DeathsAllAges. I could also look at deaths by gender, but for now, I will uncheck the gender field as well.



Now, I just have to choose a style of chart. You may find that the chart you choose is normally influenced by the kind of data you are working with. When working in Power View, it's a good idea to try out a few different types of charts and tweak them to your liking. For this example, I will choose a clustered bar chart. The chart is displayed and I can see the data, but not exactly the way I want it. I do have to move a few things around in the Power View Fields pane. See the below screen shot for the final arrangement.




I end up with a simple bar chart showing the numbers of the cause of death for each year from 1999 to 2010. I can scroll through to see how the numbers increase or decrease.




What does this data tell us? Well, it appears that alcoholic cirrhosis of the liver is the main cause of death when looking at deaths related to alcohol. Keep in mind, this isn't looking at traffic accidents, but alcoholism and alcohol addiction. As I scroll through the chart, I can see how the rate of cirrhosis of the liver maintains a constant increase as pancreatitis has decreased, and general alcohol abuse slightly increases. What would be even more interesting is if I could get data for 2011 through 2014 and see how alcoholism was affected by the recession. Also, keep in mind that there may be discrepancies in this database as I have found WHO doesn't seem to label things perfectly. There are still a little over 400,000 rows missing from the database because of causes of death that don't have label codes yet under the new ICD10 standards.

0 comments:

Post a Comment