Biml XXI – A Framework for Dimension Processing


This post looks at building an SSIS package for dimension processing – we assume that you want fine control over the processing of your SSAS cube and so use different processing activities for dimensions, facts and database (more on facts and databases later).    The reason why you would want to use a different processing option of each object relates to the processing window for you database.  If you have a large amount of fact data, it may not be possible to process the entire database within the allocated time and so we move to an approach where you process the dimensions first as an update and then the measure groups (facts) that have changed data.  Dimensions have to be processed using an update so the existing fact data is not cleared.

At the end of the post, we’ll have 2 snippets.  The first puts all dimensions into a single processing task.  Through that, you can specify the degree of parallelisation (it is part of the SSAS processing control) and secondly, our package will put individual processing tasks into a sequence container (with a parallel constraint mode).  The idea of the post though, is just to specify the SSAS server and the database and then have biml generate the package for you.

Database Dimensions

In a previous post, I looked at extracting dimension data from a database.  Its important to remember that (in a well designed system), dimensions are shared across cubes (the dimension is a major object for the database) and therefore the start of our code is identifying dimensions in the database.  In order to do that, I have a function  that returns a dictionary of dimension unique names and display names).  The code requires a reference to the AMO class (in my system this was found at )

C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies\Microsoft.AnalysisServices.DLL .


using System.Collections.Generic;
using Microsoft.AnalysisServices; 

 

static Dictionary<string, string> DimensionKeys(string Server, string Database)
{
     Dictionary<string, string> _dimensions = new Dictionary<string, string>();
     Server _ssas_server = new Server();
     _ssas_server.Connect(Server); 

     Database _db = _ssas_server.Databases[Database];
     foreach (Dimension _dim in _db.Dimensions)
         _dimensions.Add(_dim.ID, _dim.Name); 

     _ssas_server.Disconnect();
     _ssas_server.Dispose(); 

     return _dimensions;
} 

The Analysis Services Processing Task

The SSIS control flow includes a SSAS Processing task.  In biml we can implement the with a simple example that processes the Account dimension of Adventure Works (as in the following).

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <AnalysisServicesConnection Name="olap" ConnectionString="Data Source=.\SQL2008R2;PROVIDER=MSOLAP;Impersonation Level=Impersonate;" Server=".\SQL2008R2" />
  </Connections>
  <Packages>
    <Package Name="ProcessSass" ConstraintMode="Linear">
         <Tasks>
            <AnalysisServicesProcessing Name="Process_Account" ConnectionName="olap">
                  <ProcessingConfigurations>
                    <DimensionProcessingConfiguration DatabaseId="Adventure Works DW 2008R2" ProcessingOption="ProcessUpdate"  DimensionId="Dim Account" />
                  </ProcessingConfigurations>
            </AnalysisServicesProcessing>
        </Tasks>
    </Package>
  </Packages>
</Biml> 

The important parts of the processing task is the processing option and the dimension id (not its name).  Of course, the dimension id may be different from its name (as the following screen image shows … remember that this is Adventure Works).

image

Naturally, if we want to process other dimensions as part of this task, we would include more DimensionProcessingConfiguration tags within the ProcessingConfiguration node.

Populating Configuration Tags.

The first snippet (I’ve included the full snippet for ease of use) populates the configuration tags (as shown below)

image

I think the key takeaways for the code are the inclusion of the AnalysisServices assembly (we have not looked at this before) and the use of server, database and processing types as ‘control’ strings at the start of the code (meaning they will only need to be set once in the code).  The iteration over each dimension to add a processing configuration (dimension processing command) should not be unfamiliar if you have been following this series.


<#@ assembly name="C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies\Microsoft.AnalysisServices.dll" #> 

<#@ import namespace="System.Collections.Generic" #>
<#@ import namespace="Microsoft.AnalysisServices" #> 

<#@ template language="C#" #> 

<#
    String _ssas_server = @".\SQL2008R2";
    String _ssas_database = "Adventure Works DW 2008R2";
    String _process_type = "ProcessUpdate";
#> 

<#+ 
Dictionary<string, string> DimensionKeys(string Server, string Database)
{
     Dictionary<string, string> _dimensions = new Dictionary<string, string>();
     Server _ssas_server = new Server();
     _ssas_server.Connect(Server); 

     Database _db = _ssas_server.Databases[Database];
     foreach (Dimension _dim in _db.Dimensions)
         _dimensions.Add(_dim.ID, _dim.Name); 

     _ssas_server.Disconnect();
     _ssas_server.Dispose(); 

     return _dimensions;
} 

#>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <AnalysisServicesConnection Name="olap" ConnectionString="Data Source=<#= _ssas_server #>;PROVIDER=MSOLAP;Impersonation Level=Impersonate;" Server="<#= _ssas_server #>" />
  </Connections>
  <Packages>
    <Package Name="ProcessSass" ConstraintMode="Linear">
         <Tasks>
            <AnalysisServicesProcessing Name="Process_Dimensions" ConnectionName="olap">
                  <ProcessingConfigurations>
                    <# 
                    Dictionary<string, string> _dimensions = DimensionKeys("Data Source=" + _ssas_server , _ssas_database);
                    foreach (string _dim in _dimensions.Keys){ #>
                        <DimensionProcessingConfiguration DatabaseId="<#= _ssas_database #>" ProcessingOption="<#= _process_type #>"  DimensionId="<#= _dim #>" />
                    <# } #>
                  </ProcessingConfigurations>
            </AnalysisServicesProcessing>
        </Tasks>
    </Package>
  </Packages>
</Biml>

A Single Processing Task For Each Dimension

Alternatively, our strategy could be to create a processing task for each dimension and then wrap that in a container.  Note that the container has a parallel constraint which allows some parallel processing.  Our package will look like the following.

image


<#@ assembly name="C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies\Microsoft.AnalysisServices.dll" #> 

<#@ import namespace="System.Collections.Generic" #>
<#@ import namespace="Microsoft.AnalysisServices" #> 

<#@ template language="C#" #> 

<#
    String _ssas_server = @".\SQL2008R2";
    String _ssas_database = "Adventure Works DW 2008R2";
    String _process_type = "ProcessUpdate";
#> 

<#+ 
Dictionary<string, string> DimensionKeys(string Server, string Database)
{
     Dictionary<string, string> _dimensions = new Dictionary<string, string>();
     Server _ssas_server = new Server();
     _ssas_server.Connect(Server); 

     Database _db = _ssas_server.Databases[Database];
     foreach (Dimension _dim in _db.Dimensions)
         _dimensions.Add(_dim.ID, _dim.Name); 

     _ssas_server.Disconnect();
     _ssas_server.Dispose(); 

     return _dimensions;
} 

#>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <AnalysisServicesConnection Name="olap" ConnectionString="Data Source=<#= _ssas_server #>;PROVIDER=MSOLAP;Impersonation Level=Impersonate;" Server="<#= _ssas_server #>" />
  </Connections>
  <Packages>
    <Package Name="ProcessSass" ConstraintMode="Linear">
         <Tasks>
            <Container Name="Process Dimensions" ConstraintMode="Parallel">
            <Tasks>
            <# 
                Dictionary<string, string> _dimensions = DimensionKeys("Data Source=" + _ssas_server , _ssas_database);
                foreach (string _dim in _dimensions.Keys){ #>            
                <AnalysisServicesProcessing Name="Process_<#= _dim #>" ConnectionName="olap">
                  <ProcessingConfigurations>
                        <DimensionProcessingConfiguration DatabaseId="<#= _ssas_database #>" ProcessingOption="<#= _process_type #>"  DimensionId="<#= _dim #>" />
                  </ProcessingConfigurations>
            </AnalysisServicesProcessing>
            <# } #>
            </Tasks>
            </Container>
        </Tasks>
    </Package>
  </Packages>
</Biml>

Conclusion

This post includes two snippets for building dimension processing packages.  The approach (using ProcessUpdate) is required as the first stage of processing when the database is too big to be processed within the allowed window.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s