Wednesday, December 20, 2017

Run a Query in Excel and Remove Column Headers

For the past few weeks, I have been manually pulling data for our HR department to upload to a state-run reporting site for nursing certifications. Basically, I run a query, copy the data from SSMS, and then paste it into an Excel template for them. Then, the HR rep uploads the data. Yesterday, I figured it was probably a better idea to embed a query in the Excel file and let them refresh it when needed.

The following is a step-by-step walkthrough of how I did it. For this example, I am using a query to return generic data from AdventureWorks, since the data I was originally using contains clinical licensing information and doesn't need to be shown here.

Note: you will also need to make sure the users of the file have permission to query the database.

First, there is the Excel file. The one I was working with was a template for a data upload, so the first 7 rows were all used for column headers and instructions on using the template. Imagine, if you will, the most colorful, but otherwise boring, column headers you have ever seen. Actually, they look like this…

The first thing we need here is a data connection. In the toolbar, click on the Data tab and select ‘From Other Sources’. In the dropdown menu that opens, select ‘From SQL Server’.

Enter the server name, select your log on preferences, and click ‘Next’.

Select the database you would like to pull from. Below that, select the table. I selected only one of the tables I needed, since I used a query later on in the setup process anyway. Click ‘Next’.

On the last screen, there is the option to assign a friendly name and a description to the connection file. Just click ‘Finish’. This brings up the data import options. Here, we have to select the cell in which the returned results will begin. As you can see in the below screenshot, I selected cell B7.

Next, click on the Properties button. In the Connection Properties window, select the Definition tab. This is where we can define exactly what we are looking for. Here, I have switched the ‘Command type’ to SQL. In the ‘Command text’ field, enter the query you would like to run. As mentioned before, in this example, I am just using a query to pull in generic employee data from the AdventureWorks data warehouse.

The query I am using is as follows (note that, because the WHERE clause filters on a column from the joined table, the LEFT JOIN effectively behaves like an INNER JOIN here)…

SELECT e.EmployeeKey
FROM DimEmployee e
LEFT JOIN FactResellerSales r
    ON e.EmployeeKey = r.EmployeeKey
WHERE r.OrderDate BETWEEN '2008-04-01' AND '2008-05-01'

Click OK and you will receive a message about the original connection being changed. Just click Yes. Then, click OK on the Import Data window.

The spreadsheet will be populated with the results of the query. However, as seen in the screenshot below, it includes the column headers.

Because of how this template is used, I do not want the column headers. This brings us to the reason I am posting this blog. I went insane trying to find a way to get rid of them.

First, make sure you have selected a cell in the table. It doesn't matter what you select, just click on something. This will make the ‘Design’ tab available in the tool bar.

There, take note of the check box next to ‘Header Row’. All you have to do is clear that box. That will clear the header row.

Now, select those empty cells, right-click, and select Delete. Then, choose the option to shift the cells up. Now you can refresh the data.

From this point on, any time the file is opened and the data is refreshed, the column headers will not be displayed.

Monday, September 19, 2016

SSRS: Change Text Color for Better Readability

When using a tool that is meant to visualize data, it helps to make the visualization pleasing to look at. Typically, this kind of discussion would spiral out of control into an abyss of SSRS techniques and theories about visualization, but I just want to share a quick tip when working with tables containing a lot of numbers.

I have been working on a dashboard displaying the number of helpdesk tickets that have been opened and processed. Part of this dashboard contains a table showing the number of tickets that have been open for different numbers of days; basically, divided into buckets, if you will. Take a look at the screenshot below. Just glance at it and see what you can learn in just a second or two.

It’s not terrible, but you’re staring at a whole lot of black. Don’t get me wrong, I have no problem with that (my entire wardrobe is black), but since there are so many 0’s in the table, the user would have to take some time to make sure they are really reading the table to see the numbers they need. They could easily overlook something.

What I prefer to do, since there are so many 0’s throughout the table, is change the color to give all the 0’s a faded effect. Take a look at the next screenshot.

That is much easier to look at. The fields that have numbers can easily be recognized at a glance. To do this, on the design tab, select a text box and then view its properties in the properties pane of SSDT. Under the font section, click the dropdown next to the ‘Color’ property. At the bottom of the color palette menu is a link that reads ‘Expression…’. Click that and it opens the expression editor. The expression that we use here is as follows…

=iif (SUM(Fields!Day0_2.Value) > 0, "Black", "LightGrey" )

This is a simple ‘If, Then, Else’ expression. All it is stating is that if the value of the field in question is greater than 0, then the font will be black. If it is not greater than 0, then it will be light grey. Notice that I am actually using the sum of the values in that field. This is because I have row groups in the table that I can expand and collapse. If I were to remove the SUM part of the expression, then the colors would not turn out quite right.
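The same ‘If, Then, Else’ logic can be sketched outside SSRS. Here is a minimal Python equivalent of the color decision (the function name and sample values are mine, not part of the report):

```python
def cell_color(bucket_total):
    # Mirrors the SSRS expression:
    #   =iif(SUM(Fields!Day0_2.Value) > 0, "Black", "LightGrey")
    # Totals greater than zero stay black; zeros fade to light grey.
    return "Black" if bucket_total > 0 else "LightGrey"

print(cell_color(12))  # Black
print(cell_color(0))   # LightGrey
```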

Repeat this for each text box in the table. 

Wednesday, January 20, 2016

Using REST in SSIS

Today, I'm going to show you an easy way to use REST API calls to bring data into SSIS. Since I am doing this the easy way, I'm using Task Factory. That's right, Task Factory has a new REST Source that enables users to retrieve data from websites using the REST API. It's a rather powerful source component that can be used in a number of ways. This blog post is not really covering all its capabilities, however. I want to focus on general usage and how to split values into separate columns when retrieving data in JSON format.

For this example, I am going to use the Citrix GoTo Webinar API to retrieve data about past webinars. Since the source can pull from any site that exposes a REST API, I make it a point to keep the GoTo Webinar API reference close at hand.

First of all, when using the REST Source, you have to create a connection manager. You will find that there are two different types. There is a generic REST connection and a REST OAuth2 connection.

REST Connection

OAuth2 Connection

In this example, I am using the OAuth2 connection. You will notice in the screenshot below that it requires an API key and an access token. Both of these are provided by the website that you are connecting to. Without going too far into the details, you have to register on the developer version of the website to get them.

Now, inside the editor, you need an API Endpoint URL. In the case of GoTo Webinar, Citrix provides the base URL, and then the different parts of the URL for the individual operations. What I really want to focus on here are the output columns. As you can see in the shot below, each column that needs to be returned will have to be manually added by using the 'Add New Output Column' button at the bottom of the window.

The column name can be whatever you like, of course, but you must also specify the Token Path. This information is provided on the Citrix website referenced above for the API. You can see that all I am doing here is returning information about past webinars. What I want to focus on here is the StartTime column.

On the left, when I click on Preview/Test API I have the ability to preview the data. Notice in the results grid the contents of the StartTime column. This column contains both start and end times of each webinar. Not something I want.

I need to get just the start time. Back to the API Endpoint tab. I have requested the results be returned in JSON. This will determine how I pull back just the start time from the times token path.

On the StartTime column, I need to change the Token Path. In order to return only the start time value, use the following token path.


It should look something like this.

Now when I preview the results, I get just the start time in my StartTime column.

If needed, I could also add a column for the end time and use the same JSON expression to retrieve the endTime value.
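To sanity-check a token path, it can help to parse a sample response by hand. This Python sketch assumes the past-webinar response carries a `times` array of objects holding `startTime` and `endTime`; the payload below is invented for illustration:

```python
import json

# Hypothetical sample of one past-webinar record; a real response would come
# from the GoTo Webinar API, but this payload is made up for illustration.
payload = '''
{
  "subject": "Intro to Task Factory",
  "times": [
    {"startTime": "2016-01-05T15:00:00Z", "endTime": "2016-01-05T16:00:00Z"}
  ]
}
'''

webinar = json.loads(payload)
start_time = webinar["times"][0]["startTime"]  # just the start time
end_time = webinar["times"][0]["endTime"]      # same idea for the end time
print(start_time)  # 2016-01-05T15:00:00Z
```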

As you can see, it is a very simple way to access data through REST API. Check out the free trial of Task Factory to try it out for yourselves.

Thursday, June 11, 2015

Looping Through a Result Set with the ForEach Loop

In today's blog post, I am going to show you another use for the ForEach Loop container in SSIS. Most of the time, this container is used for looping through a group of files, however, there are many different uses for it.

Let's say you have a need to take some rows from a source and aggregate some values. Normally, something like that is done in the data flow, but it can also be done in the control flow.

First, create 5 variables. Pay close attention to the data types.

In the control flow of an SSIS package, add an Execute SQL Task and open the editor. Set the 'Result Set' property to 'Full result set'. Create an OLE DB connection to the AdventureWorks2012 database. Add the following T-SQL statement.

SELECT Name, SafetyStockLevel, ListPrice
FROM Production.Product
WHERE ListPrice < 20
  AND ListPrice > 0

This query just pulls a small set of data from the Product table.

Click the 'Result Set' option to the left. Set the Result Name to 0 and for the Variable Name, select objProductList. This variable will hold the results returned by the query. Click OK to close the editor.

Next, add a ForEach Loop container and connect the Execute SQL task to it. Open the editor and click Collection. In the enumerator drop down menu, select Foreach ADO Enumerator. In the ADO Object Source Variable drop down menu, select objProductList, and for the Enumeration Mode, select Rows in the first table.

Open Variable Mappings. In the drop down menu for the first variable, select intStock and set its Index to 1 (SafetyStockLevel). For the next one, select intListPrice with an Index of 2 (ListPrice). Index 0 would map to the Name column, which these numeric variables can't hold. Click OK to close the editor.

Add an expression task inside the ForEach Loop. In the expression editor, use the following expression...

@[User::intTotalValue] = @[User::intStock] * @[User::intListPrice] + @[User::intTotalValue]

This expression will multiply the stock level by the list price of each item and then add the result to the value of the intTotalValue variable. Click OK to close the editor.
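Outside SSIS, the running total the loop builds up looks like this in Python (the stock and price rows are made-up sample data):

```python
# Each tuple stands in for one row of the result set: (SafetyStockLevel, ListPrice).
rows = [(500, 9.99), (750, 4.99), (600, 12.50)]

total_value = 0.0
for stock, list_price in rows:
    # Same arithmetic as the SSIS expression:
    #   @[User::intTotalValue] = @[User::intStock] * @[User::intListPrice] + @[User::intTotalValue]
    total_value = stock * list_price + total_value

print(total_value)
```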

To test this out, right-click the ForEach Loop and select 'Edit Breakpoints'. Select the last option: Break at the beginning of every iteration of the loop.

Now, execute the package. The package will stop and a watch window should open at the bottom of the screen. If it doesn't, click Debug > Windows > Watch 1. In the watch window, add the names of the variables you want to monitor. Keep in mind that the variable names are case sensitive here.

Each time the execution is continued, the variable values will change and the value of intTotalValue will increase. The interesting part about this is that you won't see the final value of intTotalValue. I actually drove myself insane trying to find out what was wrong with the expression, but there is nothing wrong at all. If there is a need to check the final value of the variable, a script task can be used to display the value.

In the Script task editor, select intTotalValue as the ReadOnlyVariable. Then, click Edit Script.

In the script editor, add the following line to the Main method.

MessageBox.Show(Dts.Variables["User::intTotalValue"].Value.ToString());
Close the script editor and then click OK to close the script task editor. Remove the breakpoint from the ForEach loop and execute the package. You should see a message box that displays the final value of intTotalValue.

And there you have it. Again, this is something that is normally done in the data flow, but in case you ever encounter a need to do it at the control flow level, you know that it can be done.

Thursday, June 4, 2015

Scheduling Doc xPress with Windows Task Scheduler

A few weeks ago, I posted a tutorial on scheduling a Doc xPress snapshot with a SQL agent job. Today, I will show how this is done with the Windows task scheduler.

For the most part, the steps are similar. The first thing to do is to create that XML request file. From inside a Doc xPress solution, click the button that reads 'Create Snapshot Request File'.

This will prompt you for a name and a location in which to save the request file. The snapshot request file contains all the information about your solution. I have renamed mine and saved it to a location on my C: drive.

Now, I can create and schedule a task. Open Windows Task Scheduler and select the option to create a new task. On the general tab, give the task a name. Add a desired description and choose the appropriate security options. I have set mine to run whether or not I am logged in, with the highest privileges and configured for my current OS.

Next, click on the 'Triggers' tab. Here, the task can be put on a schedule. The frequency of this task really depends on the environment and how often it changes. I would set it up to run weekly, just to start.

Then, open the 'Actions' tab and click New. Make sure that the selected option in the Action drop down menu is 'Start a program'. In the 'Program/script' field, enter the file path of the Doc xPress executable, surrounded in double quotes, followed by the full path and file name of the snapshot request file, also surrounded by double quotes. The command should look like this...

"C:\Program Files (x86)\Pragmatic Works\Pragmatic Workbench\DocxPress.exe" "C:\Snapshots\LocalEnvironment.xml"

So the Action window should look like this...

Click OK to close the action window. Click OK again to close out the Task Editor. That's all there is to it. As you can see, it is very similar to using the SQL Agent.

Wednesday, May 27, 2015

Checksum vs Hashbytes

Have you ever had to write an incredibly long UPDATE statement based on the values of multiple columns? Depending on the number of columns you're working with, it can be maddening. Instead of wasting so much of your time typing out all those column names, you can use either a Checksum or a Hashbytes value. There are pros and cons to each, which I will discuss and demonstrate here.


The Checksum function in SQL Server will return a numeric value based on the values of fields in a row. You can use the checksum value as a means of comparison when detecting changes in a row. Let's say, for example, you have 20 different columns that could change values. If a value changes, then the Checksum value for that row will change as well. Generating the Checksum is simple. In the example below, I am returning rows from the AdventureWorks2012 database. The Production.Product table to be exact. I am only returning a few columns. The query is as follows.

SELECT ProductID,
       CHECKSUM(ProductID, ProductNumber, Name, Color, ListPrice, ProductLine) AS ChecksumValue
FROM Production.Product

Notice I am using the Checksum function with all the columns that I would like to include in the checksum enclosed in parentheses. I don't have to use all the columns if I don't want to. Here are the results of the query.

With the checksum, you can compare the value of just one field when looking for changes. If any of the values of this row were to change, then so too would the Checksum value.

Checksum is nice in that it can be used with any data type and doesn't care about NULLs. There is a downside, however. With small data sets, the checksum will more than likely be unique for every row, but as the data sets grow, there is a chance of collisions, where two different rows produce the same checksum value.
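The trade-off is easy to demonstrate with any short checksum. As a rough analogy only (this uses Python's CRC32, not SQL Server's CHECKSUM algorithm, and the sample row is made up):

```python
import zlib

def row_checksum(*columns):
    # Concatenate the column values (treating None like a NULL) and take a
    # 32-bit checksum, loosely analogous to CHECKSUM over a column list.
    joined = "|".join("" if c is None else str(c) for c in columns)
    return zlib.crc32(joined.encode("utf-8"))

before = row_checksum(680, "FR-R92B-58", "HL Road Frame", "Black", 1431.50, "R")
after = row_checksum(680, "FR-R92B-58", "HL Road Frame", "Red", 1431.50, "R")
print(before != after)  # changing one column changes the checksum
```

Because the value is only 32 bits, two different rows can in principle produce the same checksum, which is where the collision risk comes from.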


The Hashbytes function returns a hash value of the specified column values. When using the Hashbytes function, you can specify a hash type. You can choose MD2, MD4, MD5, SHA, SHA1, SHA2_256, or SHA2_512. If I take the above example code and use Hashbytes instead of Checksum, the query would be written as follows.

SELECT ProductID,
       HASHBYTES('MD5', Name + Color + ProductLine) AS HashValue
FROM Production.Product

You should notice I am only using a few of the columns. I will explain that, but let's look at the results first.

The hash value is much longer and more complex than the Checksum value. There is a much smaller chance of having duplicate values with the hash. However, there is an issue. You can see in the image above that we have some NULL values. With Hashbytes, if there is a NULL in one or more columns, then the hash will be NULL as well. To avoid this, we would need to replace the NULLs.

SELECT ProductID,
       HASHBYTES('MD5', Name + ISNULL(Color, 'Unknown') + ISNULL(ProductLine, 'Unknown')) AS HashValue
FROM Production.Product

Also, notice that the columns being included in the hash are not separated by commas, but concatenated. Here are the new results.

As I mentioned before, I am only using certain columns, not all of them. The reason for this is that Hashbytes will only work with varchar, nvarchar, or varbinary data types. If I wanted to use the other columns in this query as part of the hash, I would have to cast them.
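The same pattern translates to Python's hashlib, with the ISNULL replacement made explicit (the function name and sample values are mine, for illustration):

```python
import hashlib

def row_hash(name, color, product_line):
    # Replace NULLs (None) before concatenating, just like
    # ISNULL(Color, 'Unknown') in the T-SQL version; otherwise a single
    # missing value would wipe out the whole hash input.
    parts = [name,
             color if color is not None else "Unknown",
             product_line if product_line is not None else "Unknown"]
    return hashlib.md5("".join(parts).encode("utf-8")).hexdigest()

# A row with a NULL Color still produces a usable hash value.
print(row_hash("HL Road Frame", None, "R"))
```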

In the end, Hashbytes requires a little more code, but it does have an advantage over Checksum.

Friday, May 22, 2015

Site Updates

I have started up a YouTube channel to which I will be posting video tutorials to go along with some of my blog posts. You can view the channel here. I also added a new page in which the videos will be embedded. So enjoy the material here, on YouTube, wherever. Hopefully, I will be able to fill it with useful content on a regular basis. Feel free to leave me some suggestions in the comments section.