I have previously discussed some of the considerations, implications and issues associated with the automation of data warehousing. That article addressed some common points of resistance I have witnessed from IT/BI professionals who refuse to concede that their expertise could be replaced by an automation tool. Of course, there are always extremes of opinion, application and circumstance, so there is no clear-cut rule that can be applied in all circumstances (and for all clients). For me, the question revolves around where the value-adding activities are, and manually developing ETL does not rank too high on my priority list. The general benefits I see in warehouse automation revolve around three areas:
1. Management and a consistent approach since we expect the tool to transform our business based model to a physical model (and manage everything in between).
2. The people (expertise) factor – An ability to remove and reorganise the typical layers of development. Since the traditional development stages are moved to an automated process, the reliance on specific roles of the business intelligence life-cycle is also removed. We may have one person (perhaps not an IT specialist) responsible for development and management.
3. Return on investment – The ability to deliver data faster to those that need it. Delivering a prototype data warehouse within a week offers tangible benefits and reduces the capacity constraint associated with development.
Now to the point of this article – BIReady.
BIReady is a data warehouse automation tool (developed in the Netherlands) that has recently been purchased by Attunity (NASDAQ: ATTU). Attunity needs no introduction and so I expect there are plans on the horizon for the product through additional investment and support. Prior to that, one of the most noticeable observations about the company was their inclusion of Bill Inmon on their advisory board – great kudos there!
The application is managed through a (compiled) Access application. To be clear, this is just the front end that the user interacts with. There are actually two parts to the application: the BIReady Manager (the developer's front end) and the BIReady Engine (a Java application) that is called by the front end and does all the real work.
Application data is stored in a relational database (you can choose from various vendors: SQL Server, Oracle, Teradata, DB2, MySQL); it is important to note that the front end does not store data in the Access engine. At first, I found the front end a bit ugly and dated – sorry, Access. For someone who works in IT, the mere mention of Access tends to send shivers down your spine. Form design (in Access) is generally unattractive; however, the operation of the application (menus arranged around distinct operational activities) mimics how one would expect to build a data warehouse. This is an important consideration because it permits fast-track development for novices and makes the BIReady development cycle easy to follow.
Perhaps the most robust feature of BIReady is the approach it adopts in the development cycle. It follows a CIF (Corporate Information Factory) approach, building a true data warehouse in Third Normal Form (or, alternatively, Data Vault) and then spinning off data marts from that warehouse for end-user consumption. The data warehouse is designed around the data model (which can be reverse engineered or built from scratch), with the user identifying which attributes should be historically retained. Perhaps the best way to analyse its operation is therefore to look at its core operating modules: the data model, the data warehouse, the data mart and the ETL (Processing Instruction) Generator.
The data model defines entities and the relationships between them (screenshot shown below). Here, the entities and their attributes can be seen, along with the historical treatment of those attributes in the model (e.g. type 1 for not recording changes to an attribute, or type 2 for recording changes in values). An interesting (and noteworthy) addition here is the block feature for each attribute. The block specifies how the attributes are materialised in physical tables (with each block number defining a table). A single entity can therefore be broken up into multiple tables depending on how quickly we expect its data to change.
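To illustrate the block idea, here is a minimal sketch in Python. This is not BIReady's actual metadata format – the entity structure, attribute names and history codes below are invented for illustration – but it shows how attributes sharing a block number end up in the same physical table:

```python
from collections import defaultdict

# Hypothetical model metadata: each attribute carries a history type
# (1 = overwrite, 2 = track changes) and a block number; every distinct
# block becomes its own physical table.
product_entity = {
    "name": "Product",
    "attributes": [
        {"name": "ProductName",  "history": 2, "block": 1},  # rarely changes
        {"name": "CategoryId",   "history": 2, "block": 1},
        {"name": "UnitPrice",    "history": 2, "block": 2},  # changes often
        {"name": "UnitsInStock", "history": 1, "block": 2},
    ],
}

def split_into_tables(entity):
    """Group attributes by block number: one physical table per block."""
    tables = defaultdict(list)
    for attr in entity["attributes"]:
        tables[attr["block"]].append(attr["name"])
    return {f'{entity["name"]}_B{blk}': cols for blk, cols in sorted(tables.items())}

print(split_into_tables(product_entity))
# {'Product_B1': ['ProductName', 'CategoryId'], 'Product_B2': ['UnitPrice', 'UnitsInStock']}
```

Separating volatile attributes (like price) from stable ones keeps the fast-changing history table narrow, which is the practical payoff of the block feature.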
Should you wish to jump-start development, the data model can be reverse engineered from source systems. Here, entities (you can choose which) and relationships in the source system can be reproduced in the BIReady model with only a few mouse clicks (regardless of the stage of development). There is (of course) a bone of contention as to whether this is good practice, since it places a heavy reliance on the design of the source (and the model may not be sustainable if the source system changes) … However, those discussions are for another day :).
The data warehouse is the physical implementation of the data model and can use either the Third Normal Form or Data Vault architecture. If you peek under the hood of the product, it is interesting to note that the names of entities and attributes in the model are not materialised in the data warehouse.
For example, the Categories entity above is stored in two tables (given the two blocks described above) with system-generated names. We can (should we desire) trace this structure from the model to the physical implementation; however, I think it shows the intent of BIReady to remove the user from that layer. The user is expected to access the data through a data mart.
A data mart modeller uses the data warehouse to build star schema(s). The application supports the creation of multiple data marts from the same data model, so it is entirely possible (and arguably desirable) to siphon off functional schemas and deliver departmental reporting. The data mart editor (below) works against the data model (discussed above) to show the physical implementation of the data mart in a hierarchical manner (working from facts to dimensions). There is some flexibility in the designer: it allows you to combine entities (say, for example, including Category data in the Products table) and to select the attributes for that design. Naturally, the user can choose which attributes (and measures) they wish to show in each dimension and fact.
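The "combine entities" step is essentially denormalisation. A hedged sketch of the idea (the rows and column names below are invented, loosely Northwind-flavoured, and not tied to BIReady's internals) might look like this:

```python
# Hypothetical warehouse extracts: Products and Categories as separate entities.
products = [
    {"product_id": 1, "product_name": "Chai",          "category_id": 10},
    {"product_id": 2, "product_name": "Aniseed Syrup", "category_id": 20},
]
categories = {10: "Beverages", 20: "Condiments"}

def build_product_dimension(products, categories):
    """Denormalise: fold the category name directly into each product row,
    producing one flat dimension table for the star schema."""
    return [
        {
            "product_id": p["product_id"],
            "product_name": p["product_name"],
            "category_name": categories[p["category_id"]],
        }
        for p in products
    ]

for row in build_product_dimension(products, categories):
    print(row)
```

The result is a single Product dimension carrying its Category attributes, which is exactly the shape a reporting tool expects from a star schema.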
Since the data mart is built using the data warehouse as its base, the designer can choose to present type 1 (or type 2) dimension versions regardless of the structure chosen in the warehouse design. This of course relies on type 2 attributes being recorded in the warehouse (why would we choose anything different?). The key point is that if the data warehouse records type 2 changes, the designer has the ability to show a 'current' view of the data or an historic view with minimal effort on their behalf.
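The mechanics behind that choice are simple once the warehouse holds type 2 history. As a sketch (the validity-interval columns and the open-ended high date are my own illustrative convention, not BIReady's documented scheme), a 'current' view is just a filter over the history:

```python
# Hypothetical type-2 history rows: each version of a record carries a
# validity interval; the current version has an open-ended valid_to.
history = [
    {"key": 1, "name": "Chai", "price": 18.0,
     "valid_from": "2023-01-01", "valid_to": "2023-06-30"},
    {"key": 1, "name": "Chai", "price": 19.5,
     "valid_from": "2023-07-01", "valid_to": "9999-12-31"},
]

def current_view(rows, open_end="9999-12-31"):
    """Type-1 style 'current' view: keep only the open-ended version of each key.
    The full history list itself serves as the type-2 (historic) view."""
    return [r for r in rows if r["valid_to"] == open_end]

print(current_view(history))  # only the 19.5 row survives
```

Both presentations come from the same stored rows, which is why recording type 2 changes in the warehouse costs little and preserves the choice for every downstream mart.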
The ETL (Processing Instruction) Generator manages the extraction of data from source systems, the population of the data warehouse and the subsequent population of data marts (additionally, a data mart can be virtual, meaning it is built over views and no additional data processing is required). The generator can create multiple jobs (depending on how you configure it), which can be run through the application's interface or as a command-line batch file (calling on the BIReady engine to execute them).
Of the product, I expect most of the work has gone into this part of the engine, and one of the benefits it brings is the ability to determine and apply parallel processing to its commands. That is, the engine determines what dependencies exist between objects and can then build parallel activities (where such an opportunity exists).
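To make the dependency idea concrete, here is an illustrative sketch (not BIReady internals – the job names and dependency map are invented): given which step depends on which, steps are grouped into waves, and everything within a wave can run in parallel because nothing in it depends on anything else in the same wave.

```python
# Hypothetical job dependency map: each job lists the jobs it must wait for.
deps = {
    "stage_orders":   [],
    "stage_products": [],
    "dw_products":    ["stage_products"],
    "dw_orders":      ["stage_orders", "dw_products"],
    "mart_sales":     ["dw_orders"],
}

def parallel_waves(deps):
    """Group jobs into topological 'waves'; jobs in one wave can run in parallel."""
    waves, done = [], set()
    remaining = dict(deps)
    while remaining:
        ready = sorted(n for n, d in remaining.items() if set(d) <= done)
        if not ready:
            raise ValueError("cyclic dependency")
        waves.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return waves

print(parallel_waves(deps))
# [['stage_orders', 'stage_products'], ['dw_products'], ['dw_orders'], ['mart_sales']]
```

The two staging extracts land in the same wave and can run concurrently, while the mart load waits for the warehouse load – which is the kind of scheduling the engine works out for you.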
The ability to manipulate processing instructions is where I have found the most power in the product (and, sadly, the most fun). Essentially, the processing instructions automate all of the manual work that an ETL developer would do. The product builds commands to extract source tables into a staging area (you can define the mapping) and then uses that area to populate the data warehouse. However, it also allows you to create and execute your own scripts as part of that process. These run against each entity at defined build stages (pre-staging, post-staging, pre-data-warehouse load, post-data-warehouse load and so on), so you can alter the underlying data as you see fit – for example, splitting names, standardising salutations, deconstructing pivots and the like. Perhaps not an option for a novice developer, but an option nevertheless.
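As a flavour of what such a stage script might do, here is a minimal name-splitting transform. The function and the salutation list are my own illustration of the kind of cleanup you might run post-staging, not anything shipped with the product:

```python
# Hypothetical post-staging transform: split a combined full-name column
# into salutation, first name and last name.
SALUTATIONS = {"Mr", "Mrs", "Ms", "Dr"}

def split_name(full_name):
    """Break 'Dr. Andrew Fuller' into its parts; salutation is optional."""
    parts = full_name.replace(".", "").split()
    salutation = parts[0] if parts and parts[0] in SALUTATIONS else ""
    rest = parts[1:] if salutation else parts
    first = rest[0] if rest else ""
    last = " ".join(rest[1:])
    return {"salutation": salutation, "first_name": first, "last_name": last}

print(split_name("Dr. Andrew Fuller"))
# {'salutation': 'Dr', 'first_name': 'Andrew', 'last_name': 'Fuller'}
```

In practice such a script would be registered against an entity's post-staging stage so the cleaned columns flow into the warehouse load automatically.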
A Quick Note on Data Warehouse Automation
Having seen BIReady implemented in a production environment, I think it is important to identify it as a data warehouse automation tool. This may sound obvious, but there are implications to that function which need to be identified (or at least brought to attention).
BIReady manages everything related to its data warehouse: it builds and maintains all its tables, ETL and logging, and you do have the opportunity to influence that process. It even manages its own versioning.
It is not an ETL tool as one might associate with traditional ETL development. Sounds obvious, right? However, the implication is that the data movement tasks generated by the product relate specifically to the generation and manipulation of data within the data warehouse, and not to the individual components of Extraction, Transformation or Loading. For example, you cannot create a job to simply move data from one table to another (other than where it relates to processing required by the model).
Firstly, I’ve found BIReady to be extremely productive. A project could deliver a functional data warehouse and data marts in under a week – no problem. That’s a real bonus in my eyes.
BIReady is reliant on relational sources (it supports 7 RDBMS platforms), and source data must exist in one of those platforms in order to be consumed. This may be an issue for some; however, having seen many BI implementations where the sole purpose is to transform transactional system data into an informational perspective (aka a data mart), I don't really see a problem there. In fact, I can easily imagine how the approach of building a data warehouse first and then the subsequent data marts could save a lot of pain on any BI project (of course, you have to have the foresight to retain history).
There is no doubt in my mind that there is a niche market for BIReady. Is it one size fits all? Certainly not – but then, no product is, is it?