Real Time ETL with SSIS and WMI

Often when we design ETL packages we assume that there is a hard start point, or a specific part of the day when the ETL Process can begin. However, in a distributed environment, remote machines may exchange data with a host at random intervals and it may be this exchange that should dictate the start of an ETL process. For example we may want to start our ETL when files are copied to an FTP directory. This posts looks at how to monitor a directory location using WMI (Window Management Instrumentation) and start ETL for this type of scenario.

Overview

Remote systems are often disconnected from their host. That is, the remote site retains its own data independently of a continuous connection to the host. In order to sync data between the two sites, the remote systems extract master data and transfer files back to the host. These transfers occur when the remote site needs to operate independently of the host.

Data transferred back to the host may be triggered by some event at the remote site (for example closing the daily register at the remote). The implication of this is that data arrives at the host at different and unknown times. If we wish to provide a low latency ETL at the host, we need to load this data as soon as it becomes available on the host (that is, as soon as the files are created).

A Brute Force Approach

One method of managing this is to cycle our ETL between our expected load times. For example, we can schedule a job to run at 5 minute intervals while we expect the files to be deposited. This will achieve low latency but is restricted to a latency determined by the execution interval. For example, if we run the job every five minutes, we can expect a latency of at least 5+ minutes.

One consideration of using this approach lies in the execution history for the job. Because each job executes in its own right, the history for the job shows each execution (that is, one for every 5 minutes as below).

Reviewing the log for the job may be troublesome where we expect or desire to see a single job for each day.

A WMI Approach

Another approach is to execute the package and have it listen for changes in the target directory (for want of a better word), then the package commences the lifting component of the ETL after those changes have occurred (that is the files are created). In this situation, the package is running but effectively dormant until a file is created. Furthermore, we can get a much finer polling interval than would be achieved with the Brute Force approach. An example of this type of approach can be seen in the control flow below.

The key consideration for this package is that it executes and continuously and polls the source directory until the WMI query is satisfied (and files have been created). Once this occurs, the package can then iterate over the created files and conduct the expected ETL (which in this case is just moves the file and deletes it).

The WMI Task

The msdn reference for the WMI Task can be found here. For our purposes, we are really only concerned with the ActionAtEvent and WqlQuery components. A zero time out dictates that the task will not time out however we could use this to control how long we listen for and choose an appropriate time out action and event (log the event or log and fire the package).

The Connection Manger simply specifies the server and namespace;

With the query defined as;

SELECT * FROM __InstanceCreationEvent WITHIN 10

WHERE TargetInstance ISA “CIM_DirectoryContainsFile”

and TargetInstance.GroupComponent= “Win32_Directory.Name=\”E:\\\\Blog\\\\Process Folder\””

 

Also note that the escape characters (ie the ‘\’ preceding a special character) are required in the directory name. The WMI reference can be found here.

Query polling is defined by the WITHIN clause and indicates the number of seconds to wait before the query is run again (10 seconds in this case).

 

Advertisements

Visualizing Data with Microsoft Power View (Review)

This book is a must ready for anyone who is interested in using Power View or anyone who wants to make their analysis and reports interactive with Power View (well that was straight to the point, now lets get into some detail).

A current trend in the business intelligence landscape is a shift in focus from an IT user to the business user. That is, the person asking questions needs to interact with the data rather than specify report requirements to a technical user. Power View is Microsoft’s offering in this area and empowers the end user through an interactive environment for visualising data (whether it is creation, alteration or manipulation). In-fact, many organisations can take advantage of this product without cost. “Visualizing Data with Microsoft Power View” is a step by step guide which not only shows the end user how to use the Power View product but also get the most benefit out of their visualizations.

Written for the Business User

“Visualizing Data with Microsoft Power View” is well targeted at the business user and written in a context that does not alienate a non-technical user. The structure of the book builds on prior learning and reinforces sound business scenarios with practical examples. Every scenario has a “Learn by Doing” exercise which gives a practical, easy to follow example of how to apply the discussed content. The examples are applicable, easy to follow and relevant.

Included Data and Additional Media

In addition to the book content, there is a range of downloadable content (or installable if the book is purchased as a hardcopy) which includes videos (over 4 hours) and the data that is used in the examples. The videos reinforce the lessons, highlighting applicable areas of the screen and the commands used. The appendices go into the steps required to install the data in your own environment.

Book Structure

Discarding the appendixes, the book is broken down into two parts (the chapter outlines for the book are below). Part I deals with Power View and how to use it in an existing environment. If you were only interested in creating, viewing and using visualizations, Part I would be enough (say for example, managers and sales people). Part II extends the offering by diving into model creation (the data that Power View uses). This is more applicable to a savvy business user, power user (and so on). Notwithstanding the likely audience for Part II, it is very easy to follow with plenty of practical examples and exercises.

Chapter Outlines

The structure of the book is outlined as;

Part I – Power View
• Chapter 1 – Getting Started
• Chapter 2 – The Table Visualization
• Chapter 3 – Additional Visualizations
• Chapter 4 – Charting
• Chapter 5 – Bringing Your Data to Life
• Chapter 6 – Sharing Reports

Part II – Creating a BI Semantic Model (BISM)
• Chapter 7 – BISM: Getting Started
• Chapter 8 – Basic BI Semantic Model Design
• Chapter 9 – BI Semantic Model: Additional Model Features
• Chapter 10 – Using DAX Expressions
• Chapter 11 – Deploying Tabular BI Semantic Models

Part III – Appendixes
• Appendix A – Installing and Configuring the Sample Data
• Appendix B – Creating a Virtual Learning Environment

You can get “Visualizing Data with Microsoft Power View” at http://mhprofessional.com/product.php?isbn=0071780823