Welcome to blogs.conchango.com Sign in | Join | Help

Welcome to blogs.conchango.com

Christian Wade's Blog

  • Databind ChartFX to OLAP Datasource

    I’ve just gone through an informal and limited evaluation process for a charting tool for a BI dashboard.  It included some of the RS extension products, ProClarity and ChartFX.  Of course I would have been keen to use ProClarity, but in this case the cost was a little high and, to be honest, it would have been a sledgehammer to crack a nut.


    ChartFX is the charting engine used by ProClarity.  It looks and feels exactly the same as ProClarity charts.  In theory, everything you can do with ProClarity charts, you can do with ChartFX – you just have to code it yourself (which is a lot easier than it sounds).


    Anyway, when evaluating ChartFX, I initially looked at the OLAP extension to it.  Seemed to work great at first and I was very impressed with it.  However, when I got to doing anything a little out of the ordinary like multiple Y axes or specifying X axes labeling, the code simply didn’t work!  For example, the code at the bottom of the following snippet has no effect.


    AdoMultiDimensionalData ADOMD1 = new AdoMultiDimensionalData();

    string connstr = "Provider=msolap; Data Source=localhost;Initial Catalog=Adventure Works;";

    string ds =

    @"SELECT

        [Date].[Calendar].[Calendar Year].Members ON 0,

        {

            [Measures].[Internet Sales Amount],

            [Measures].[Internet Freight Cost]

        } ON 1

    FROM

        [Adventure Works];";

    ADOMD1.Connect(connstr, ds);

    Olap1.DataSource = ADOMD1;


    // This code hides the Analysis Bar

    Olap1.AnalysisBar = AnalysisBar.Menu;

    ToolBar tb = (ToolBar)Olap1.AnalysisObject;

    tb.Visible = false;

    Chart1.Gallery = Gallery.Bar;


    // Code that doesn't work!

    AxisY addlAxisY = new AxisY();

    addlAxisY.Visible = true;

    addlAxisY.Position = AxisPosition.Far;

    Chart1.AxesY.Add(addlAxisY);

    Chart1.Series[0].AxisY = addlAxisY;


    The reason for the code not working is that the OLAP extension uses databinding, which overrides code such as this.  Here’s what the chart looks like.


    Chart with no extra Y axis

    The way I got round this problem was to not use the OLAP extension at all.  Instead, I am using the ChartFX API to iterate an ADOMD.NET CellSet and set the chart data.  Here is the code; it works fine for MDX with 2 axes.  You could of course take it a little further to display crossjoined sets, etc.  Here is the code.


    using (AdomdConnection conn = new AdomdConnection("Data Source=localhost;Initial Catalog=Adventure Works;"))

    {

        conn.Open();

        //Create a command, using this connection

        AdomdCommand cmd = conn.CreateCommand();

        cmd.CommandText =

        @"SELECT

            [Date].[Calendar].[Calendar Year].Members ON 0,

            {

                [Measures].[Internet Sales Amount],

                [Measures].[Internet Freight Cost]

            } ON 1

        FROM

            [Adventure Works];";

        //Execute the query, returning a cellset

        CellSet cs = cmd.ExecuteCellSet();

        //Get back rows representing each series

        TupleCollection tuplesOnRows = cs.Axes[1].Set.Tuples;

        //Set the number of series

        Chart1.Data.Series = tuplesOnRows.Count;

        //Get back columns representing each x axis category

        TupleCollection tuplesOnColumns = cs.Axes[0].Set.Tuples;

        Chart1.Data.Points = tuplesOnColumns.Count;

        //Set the data values and series names

        for (int row = 0; row < tuplesOnRows.Count; row++)

        {

            Chart1.Series[row].Text = tuplesOnRows[row].Members[0].Caption;

            for (int col = 0; col < tuplesOnColumns.Count; col++)

            {

                Chart1.Data[row, col] = Convert.ToDouble(cs.Cells[col, row].Value);

            }

        }

        //Set the x axis category names

        for (int i = 0; i < tuplesOnColumns.Count; i++)

        {

            Chart1.AxisX.Labels[ i ] = tuplesOnColumns[ i ].Members[0].Caption;

        }

        conn.Close();

    } // using connection


    //Set chart type

    Chart1.Gallery = Gallery.Bar;

    //Create secondary y axes series and its appearance properties

    AxisY addlAxisY = new AxisY();

    addlAxisY.Visible = true;

    addlAxisY.Position = AxisPosition.Far;

    Chart1.AxesY.Add(addlAxisY);

    Chart1.Series[1].AxisY = addlAxisY;



    Here’s what the chart looks like.


    Chart with extra Y axis

    We now have an AJAX enabled UI with blistering performance.  Changing the slicing of the chart has it refresh instantly.  No more RS spinnies!





  • SQL Server Standard - Recursive Hierarchies to XML

    I wrote an article for the September 2006 issue of SQL Server Standard magazine:

     SQL Server Standard

    The version that was finally published was a very cut down version with minimal code snippets.  This was for obvious reasons; they had to fit the content onto small columns in the magazine.  For the readers who would prefer the more verbose version, here it is!  I guess this is further reference material for the published article; I’m sure my friends at SQL Server Standard won’t mind.

    Suppose we have a sizeable recursive hierarchy in our SQL Server 2005 relational database.  We want to export it to XML.  This could be for a variety of reasons.  We may wish to use it as a pre-cached input to a UI control (e.g. a tree control); we may wish to export it to another system using a predefined format.  The possibilities are endless.


    Here is an example input dataset using the Northwind Employees table.

    Northwind Employees

    And this is how we want it represented in XML based on the ReportsTo self-referencing relationship.


    <Employee EmployeeID="2" LastName="Fuller" FirstName="Andrew">

                <Employee EmployeeID="1" LastName="Davolio" FirstName="Nancy" />

                <Employee EmployeeID="3" LastName="Leverling" FirstName="Janet" />

                <Employee EmployeeID="4" LastName="Peacock" FirstName="Margaret" />

                <Employee EmployeeID="5" LastName="Buchanan" FirstName="Steven">

                            <Employee EmployeeID="6" LastName="Suyama" FirstName="Michael" />

                            <Employee EmployeeID="7" LastName="King" FirstName="Robert" />

                            <Employee EmployeeID="9" LastName="Dodsworth" FirstName="Anne" />

                </Employee>

                <Employee EmployeeID="8" LastName="Callahan" FirstName="Laura" />

    </Employee>


    On the face of it, this seems like a simple nut to crack.  However, there are various design options available to us.  This article explores some of the options.  Performance is deemed the overriding factor when evaluating the optimal approach.


    It is assumed that it is beneficial to convert the data to XML in the database; i.e. in-process with SQL Server.  However, the overall evaluation of the different approaches would be the same even if the transformation to the desired format took place in the middle tier.


    All the code in this post can be downloaded from here.  This is a link to a zip file containing a solution and a SQL Server project.  To use it, it is necessary to change the database connection string in the project properties.  The TSQL scripts that generate the XML output can be found in the Option1.sql and Option2.sql solution files.


    Note: A Northwind database restored to SQL Server 2005 is also required to run the code.  It is necessary to build and deploy the above project.  A backup of Northwind from SQL Server 2005 can be downloaded from here.



    What were the options we had in SQL Server 2000 for exporting relational data to XML?  We had the FOR XML clause of the SELECT statement.  Incidentally, SQL Server 2005 has a new mode called FOR XML PATH.  This mode is powerful because it allows us to use XPath-type syntax to define our XML structure.  FOR XML PATH will meet about 95% of the use cases for which we traditionally had to resort to FOR XML EXPLICIT.  FOR XML PATH is a lot less cumbersome and easy to use.  However, both these options require us to know the depth of the lowest level node when writing the query.  For recursive hierarchies, we do not actually know the lowest depth beforehand because it is variable.  This means it is not feasible to use FOR XML and directly arrive at our desired XML output.  The best we can do with FOR XML is to run the output through an XSLT transform.



    Option 1: FOR XML AUTO and XSLT

    The SQL Server 2005 functionality for XML provided through XQuery and XML indexing can be considerably enhanced by using SQL CLR to apply XSLT transformations.  Here we are using a SQL CLR function to convert the output of a FOR XML AUTO query against our Northwind dataset in order to arrive at the desired output.


    DECLARE @input xml, @stylesheet xml

    SELECT            @input =

                            (SELECT EmployeeID, ReportsTo, LastName, FirstName

                            FROM Employees AS Employee

                            FOR XML AUTO, ROOT('Employees')),

                            @stylesheet =

                            CAST(

    '<?xml version="1.0" encoding="UTF-8"?>

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

                <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

                <xsl:template match="/Employees" >

                            <xsl:apply-templates select="Employee[not(@ReportsTo)]" />

                </xsl:template>

                <xsl:template match="Employee[not(@ReportsTo)]">

                            <xsl:element name="Employee">

                                        <xsl:attribute name="EmployeeID">

                                                    <xsl:value-of select="@EmployeeID" />

                                        </xsl:attribute>

                                        <xsl:attribute name="LastName">

                                                    <xsl:value-of select="@LastName" />

                                        </xsl:attribute>

                                        <xsl:attribute name="FirstName">

                                                    <xsl:value-of select="@FirstName" />

                                        </xsl:attribute>

                                        <xsl:call-template  name="FindChildNodes">

                                                    <xsl:with-param name="ReportsTo" select="@EmployeeID" />

                                        </xsl:call-template>

                            </xsl:element>

                </xsl:template>

                <xsl:template name="FindChildNodes">

                            <xsl:param name="ReportsTo" />

                            <xsl:for-each select="//Employees/Employee[@ReportsTo=$ReportsTo]">

                                        <xsl:variable name="EmployeeID" select="@EmployeeID" />

                                        <xsl:element name="Employee">

                                                    <xsl:attribute name="EmployeeID">

                                                                <xsl:value-of select="@EmployeeID" />

                                                    </xsl:attribute>

                                                    <xsl:attribute name="LastName">

                                                                <xsl:value-of select="@LastName" />

                                                    </xsl:attribute>

                                                    <xsl:attribute name="FirstName">

                                                                <xsl:value-of select="@FirstName" />

                                                    </xsl:attribute>

                                                    <!-- only call template recursively if EmployeeID is a ReportsTo of another node -->

                                                    <xsl:if test="count(//Employees/Employee[@ReportsTo = $EmployeeID]) > 0">

                                                                <xsl:call-template  name="FindChildNodes">

                                                                            <xsl:with-param name="ReportsTo" select="@EmployeeID" />

                                                                </xsl:call-template>

                                                    </xsl:if>

                                        </xsl:element>

                            </xsl:for-each>

                </xsl:template>

    </xsl:stylesheet>

    ' AS xml)

    SELECT dbo.ApplyTransform(@input, @stylesheet)


    Further discussion regarding the implementation of the XSLT stylesheet is outside the scope of this article.


    Here is an implementation of the ApplyTransform function that receives the stylesheet.  It basically just exposes the System.Xml.Xsl.CompiledTransform class’ Transform method to TSQL.


    using System;

    using System.Data;

    using System.Data.SqlClient;

    using System.Data.SqlTypes;

    using System.Xml;

    using System.Text;

    using System.IO;

    using Microsoft.SqlServer.Server;

     [SqlFunction(DataAccess = DataAccessKind.None, SystemDataAccess = SystemDataAccessKind.None, IsDeterministic = false, IsPrecise = false)]

    [return: SqlFacet(IsFixedLength = false, MaxSize = -1)]

    public static SqlXml ApplyTransform(SqlXml data, SqlXml styleSheet)

    {

        // Create a stream for the xml output

        MemoryStream ms = new MemoryStream();

        XmlWriter xw = XmlWriter.Create(ms);

        // Load the XML and transform it

        XslCompiledTransform ctx = new XslCompiledTransform(false);

        ctx.Load(styleSheet.CreateReader());

        ctx.Transform(data.CreateReader(), xw);

        // return the output

        return new SqlXml(ms);

    }


    Assuming some prerequisite knowledge of XSL stylesheets, this appears to be a good option to achieve our goal.  However, let us evaluate others before deciding the optimum choice.



    We could of course simply access the relational data in a SQL CLR stored procedure and write it to XML.  However, if we don’t access the data in the same order it will appear in the XML, we will have to load the whole structure into the DOM.  Loading the whole structure into the DOM means loading it into memory; if we are talking about a million rows, we may well not have enough memory available!  The ‘right order’ is as follows.  This allows us to write it to XML sequentially, which is vastly more performant and efficient.

    • Node1
      • Node2
        • Node3
        • Node4
      • Node5

    How can we return the data in this order from a single TSQL query?  Probably the best option is to use a recursive Common Table Expression (CTE).  CTEs are a new feature in SQL Server 2005.  They can be conceptualized similarly to views and derived tables in that they encapsulate an inner query.  However, they differentiate themselves from views and derived tables in the way they deal with recursion.  The syntax for a recursive CTE is as follows.


    WITH RecursiveCTE(<column_list>)

    AS

    (

                -- Anchor Member:

                SELECT ...

                FROM <some_table(s)>

                ...

               

                UNION ALL

               

                -- Recursive Member

                SELECT ...

                FROM <some_table(s)>

                JOIN RecursiveCTE

                ...

    )

    -- Outer Query

    SELECT ...

    FROM RecursiveCTE

    ...


    The WITH clause is the definition of the CTE and it precedes the outer query, which refers back to the CTE.  Within the WITH clause, the anchor member is a SELECT statement that acts as the seed for recursion.  It is merged using the UNION ALL operator to the recursive member, which is a SELECT statement that refers back to the CTE; hence it is recursive.


    Note: it was possible to return the data in the ‘right order’ from a single TSQL query in SQL Server 2000 using a recursive table-valued user-defined function.  I tried this approach and the performance is far worse than the recursive CTE.  For large datasets, the execution time was multiplied by a factor of at least 5!  For example code of how this would be done using a recursive table-valued user-defined function, see this old blog post of mine.

    Note: this whitepaper describes a way of avoiding using a table-valued UDF.  It does however use a scalar UDF, so it is still limited to 32 nested recursions.  If you know your hierarchies will never exceed 32 nested recursions, the scalar UDF method is probably a good way to go.  However, given the nature of recursive hierarchies, this may be an assumption you cannot make.

    Note: recursive CTEs are by default limited to 100 nested recursions.  However, this only acts as a safeguard against infinite recursion.  This limit can be changed using the MAXRECURSION query hint to anything up to 32,767 nested recursions.  Recursive table-valued user-defined functions, on the other hand, are limited to 32 nested recursions.


    Note: FOR XML cannot be used by the anchor or recursive member SELECT statements in a recursive CTE.  This is why we need a CLR routine to present the data as XML.



    Option 2: Recursive CTE and sequential writing of XML


    The usp_OrderedEmployeeHierarchy stored procedure below returns the data in the order we require.


    CREATE PROCEDURE usp_OrderedEmployeeHierarchy (@Seed int)

    AS

                ;WITH EmployeeCTE(EmployeeID, ReportsTo, LastName, FirstName, Depth, SortCol)

                AS

                (

                  SELECT EmployeeID, ReportsTo, LastName, FirstName, 0, CAST(EmployeeID AS varbinary(max))

                  FROM Employees

                  WHERE EmployeeID = @Seed

                  UNION ALL

                  SELECT E.EmployeeID, E.ReportsTo, E.LastName, E.FirstName, M.Depth+1,

                            CAST(SortCol + CAST(E.EmployeeID AS binary(4)) AS varbinary(max))

                  FROM Employees AS E

                            JOIN EmployeeCTE AS M

                              ON E.ReportsTo = M.EmployeeID

                )

                SELECT

                  EmployeeID, ReportsTo, LastName, FirstName, Depth

                --,SortCol

                FROM EmployeeCTE

                ORDER BY SortCol

    GO


    When passing a @Seed parameter of 2, the following dataset is returned.


    Ordered Northwind Employees

    The way the CTE handles ordering the data is through the SortCol column.  For the seed of recursion, this column is the EmployeeID cast to varbinary(max).  For each subsequent recursive member, it is the existing SortCol value concatenated with the binary representation of the EmployeeID.  If we include it in the output, the following dataset is returned.


    Ordered Northwind Employees with SortCol

    As shown, the SortCol column provides a useful structure on which we can sort the dataset.  The data is thereby retrieved in the required order.


    The Depth column is initialized at zero for the seed of the recursion and then incremented by one for each recursion.


    This is the SQL CLR stored procedure that writes the XML sequentially.


    using System;

    using System.Data;

    using System.Data.SqlClient;

    using System.Data.SqlTypes;

    using System.Xml;

    using System.Text;

    using System.IO;

    using Microsoft.SqlServer.Server;

    [Microsoft.SqlServer.Server.SqlProcedure(Name = "usp_GetEmployeeHierarchyXml")]

    public static void GetEmployeeHierarchyXml(SqlInt32 seed)

    {

        // Create a stream for the xml output

        MemoryStream ms = new MemoryStream();

        XmlWriter xw = XmlWriter.Create(ms);

        // Depth counters to identify when to close elements

        int previousDepth = -1;

        int newDepth = 0;

        using (SqlConnection conn = new SqlConnection("context connection=true"))

        {

            conn.Open();

            using (SqlCommand cmd = conn.CreateCommand())

            {

                cmd.CommandType = CommandType.StoredProcedure;

                cmd.CommandText = "usp_OrderedEmployeeHierarchy";

                cmd.Parameters.Add(new SqlParameter("@Seed", seed));

                SqlDataReader reader = cmd.ExecuteReader();

                while (reader.Read())

                {

                    // Get the depth of the current node

                    newDepth = Convert.ToInt32(reader["Depth"]);

                    // Do we need close an open element(s)?

                    CloseOpenElements(xw, previousDepth, newDepth);

                    xw.WriteStartElement("Employee");

                    xw.WriteAttributeString("EmployeeID", Convert.ToString(reader["EmployeeID"]));

                    xw.WriteAttributeString("LastName", Convert.ToString(reader["LastName"]));

                    xw.WriteAttributeString("FirstName", Convert.ToString(reader["FirstName"]));

                    previousDepth = newDepth;

                }

                // Now we are back at depth 0; do we need close an open element(s)?

                newDepth = 0;

                CloseOpenElements(xw, previousDepth, newDepth);

                xw.Close();

                // Return the results to the client

                SqlDataRecord record;

                record = new SqlDataRecord(new SqlMetaData[] { new SqlMetaData("EmployeeHierarchy", SqlDbType.Xml) });

                // Set the record field.

                record.SetSqlXml(0, new SqlXml(ms));

                // Return the record to the client.

                SqlContext.Pipe.Send(record);

            }

        }

    }

    private static void CloseOpenElements(XmlWriter xw, int previousDepth, int newDepth)

    {

        if (newDepth <= previousDepth)

            for (int i = 0; i <= previousDepth - newDepth; i++)

                xw.WriteEndElement();

    }


    The GetEmployeeHierarchyXml stored procedure executes the usp_OrderedEmployeeHierarchy stored procedure for data access.  A System.Xml.XmlWriter object is used to sequentially write the XML.  The Depth column is used to keep track of when to close open XML elements.  A Microsoft.SqlServer.Server.SqlDataRecord object is used to return a one-row, one-column, XML-datatype result set to the client.  This is done using the static Send method, which is a member of the Microsoft.SqlServer.Server.SqlContext class’ Pipe property.  The SqlContext class is a class dedicated solely to in-process data access.  It contains other interesting properties and methods, which are outside the scope of this article.


    Executing usp_GetEmployeeHierarchyXml returns the XML in the desired format.



    Some readers may be asking the question as to why we do not simply write a SQL CLR stored procedure that retrieves a set of child nodes at a time for every single node in the hierarchy.  This does mean a lot more TSQL queries being executed, but is conceptually the simplest approach.  However, this is not feasible because multiple active resultsets (MARS) is not supported in SQL CLR.  We would therefore have to instantiate many connection objects.  This also presents a problem because we can only have one open ‘context’ connection at a time.  I did attempt this approach processing the resultset in a client-side application and the performance was far worse than Option 1 or 2.  This approach has therefore been ruled out.


    So, here is the question we’ve all been waiting for: which is the most efficient choice?  I tried both approaches on a product hierarchy, which happened to originate from Analysis Services, containing 8,700 members.  Here are the findings.  Note that Option 2 assumes a cached execution plan for the usp_OrderedEmployeeHierarchy stored procedure.


    Option 1: 38 seconds

    Option 2: 15 seconds


    This is a performance gain of over 60% !!


    We can therefore conclude that CTEs provide a well architected method for accessing recursive relational data.  A prime example of this is when exporting such data to XML.


  • Currency Conversion in Analysis Services 2005

    I recently investigated currency conversion in SSAS for a customer.  I found that there is not much on the web in the way of examples.  Here is an attempt to correct that.

    First thing to do: figure out what type of currency conversion you need.  There are 3 types.

    1. MANY-TO-MANY.  This is where the facts are stored in multiple currencies – i.e. amounts for the same measure are in various currencies in the same fact table.  Also, the users might want to report the total amount in different currencies.
    2. MANY-TO-ONE.  The facts are stored in multiple currencies, but this time it is only necessary to ever report the total amount in a single currency – e.g. a corporate currency.
    3. ONE-TO-MANY.   The facts are always stored in one currency only.  However, the users might want to report the total amount in different currencies.

    This post focuses on the different options available for One-to-Many.  With One-to-Many, there are 2 options: measure expressions or the currency conversion wizard.

    If you create a DSV on top of AdventureWorksDW and include DimTime, FactInternetSales, FactCurrencyRate and DimCurrency, it should look something like this.

    DSV1

    The relationship between FactInternetSales and DimCurrency is for the case where your facts are stored in multiple currencies (Many-to-Many and Many-to-One).  To look at One-to-Many, we will delete this relationship.  Also, we will delete the ShipDate and DueDate relationships between FactInternetSales and DimTime, but leave the OrderDate relationship (this post is about currency conversion, not role-playing dimensions!).

    We now have the following DSV diagram.

    DSV2

    ONE-TO-MANY USING MEASURE EXPRESSIONS

    I created a cube on top of this DSV.  Here is a backup of it http://blogs.conchango.com/christianwade/CurrencyConversionBackup.zip  You can download, restore and open it using the VisualStudio using the File > Open > Analysis Services Database menu option.

    The only measures in it are Sales Amount (from FactInternetSales) and End Of Day Rate (from FactCurrencyRate).  Sales Amount has a FormatString of “Currency”.  End Of Day Rate has an AggregateFunction property value of “LastNonEmpty”.  LastNonEmpty is a semi-additive measure.  We want it to sum for all dimensions except Date.  For the Date dimension, it will take the last non-empty child.  For example, the exchange rate we would want to see for a week member in a report would not be the sum of the exchange rate on Monday + exchange rate on Tuesday + exchange rate on Wednesday, …  Instead, we just want the most recent exchange rate (maybe exchange rate on Saturday).

    The Date dimension is very simple.  The Date attribute, which is the key attribute, has a KeyColumns property of TimeKey (which is the surrogate key of the DimTime table) and a NameColumn of FullDateAlternateKey (what the users see at Date level).

    I used the Dimension Wizard to create the Date dimension so that it would be flagged with Type = Time, etc.  This is one of the few cases where having these properties set correctly actually affects cube behaviour.  For example, semi-additive measures and some MDX functions like YTD won't work without Type = Time.

    Date Dimension

    Here are the mappings in the Dimension Usage tab.  They are pretty straightforward.  There is a many-to-many relationship between Currency and Fact Internet Sales.

    Dimension Usage

    Here is a screenshot of the Currency dimension.  The main points about the Currency dimension are as follows.

    • The KeyColumns property of the Currency attribute is set to CurrencyKey (which is the surrogate key in for the DimCurrency table in AdventureWorksDW).
    • The NameColumn property of the Currency attribute is set to CurrencyName (which is what we want the users to see).
    • The Type property of the dimension is set to Currency.  The only purpose of this (unlike Type=Time that does affect cube behaviour) is to inform client tools in case they want to display currency dimensions differently to regular dimensions.
    • The Type property of the Currency attribute is set to CurrencyName.  Again, this is just to inform client tools.
    • The IsAggregatable property of the Currency attribute is set to False.  This removes the All level for the attribute.  We would after all not want to sum the values of different currencies.  500 pounds sterling plus 100,000 cruzeiros equals 100,500 of what currency?  Monopoly money?
    • Set the DefaultMember property of the Currency attribute to whatever the Sales Amount values are stored in.  In the case of AdventureWorksDW, it is US Dollars.
    • The attribute relationship between Currency and Currency Alternate Key has its Cardinality property set to One.  This is because, for a particular Currency, there can be only one Currency Alternate Key – i.e. they have a one-to-one relationship.  This improves efficiency in aggregation because the numbers will always be the same (e.g. £500 for “Great Britain Pounds” will always result in £500 for “GBP”).  Analysis Services will therefore not bother figuring out the aggregate values for Currency Alternate Key.  It will simply re-use those of Currency.
    • Set the AttributeHierarchyEnabled property of the Currency Alternate Key attribute to False.  If it is not set, Analysis Services will not allow deployment of the project because of the attribute relationship with Currency and that Currency has IsAggregatable = False.

    Currency Dimension

    Here is the measure expressions bit.  The Sales Amount measure has the following measure expression: “[Sales Amount] / [End Of Day Rate]”.  The facts are stored in US Dollars and the “pivot currency” is US Dollars.  The pivot currency is the currency the exchange rate values convert from.  The measure expression is a division rather than a multiplication because this is the way the exchange rates are held.  For example, the exchange rate for GBP is roughly 1.5 in the FactCurrencyRate table.  Therefore, to convert $15 from the pivot currency to GBP, 15 / 1.5 gives us 10.  Doing a multiplication would result in £22.50 (obviously wrong).  Note: measure expressions are done at leaf level.

    Measure Expression

    It is worth going back into the Dimension Usage tab and setting the DirectSlice property of the many-to-many relationship.  Setting it to “([Currency].[Currency].&[100])” means that, when querying Sales Amounts by the base/pivot currency (US Dollars), Analysis Services will just return the value as it appears in the fact table – i.e. without applying the measure expression.  After all, there is no need to convert US Dollars into US Dollars!  If we did not set DirectSlice, we would have to ensure that the FactCurrencyRate table has an exchange rate of 1 for converting US Dollars into US Dollars (and for every single day for which we have data).  Otherwise, Analysis Services would be looking for an End Of Day Rate to divide by, find null and return null.  It should also perform better when querying by USD with DirectSlice populated correctly because it doesn't have to bother with the conversion calculation at all.  So what we want is just a tuple with the default member for every enabled attribute in the outer many-to-many dimension.  Incidentally, this is what the Root(<dimension_expression>) function would return, but we can't use any MDX functions in the DirectSlice property (hence the Functions pane is disabled), so we have to explicitly list each default member in a tuple.

    DirectSlice

    Lastly, we will insert some MDX in the Calculations tab that sets the Locale ID for currencies.  This will avoid displaying 500 pounds sterling as “$500” (goodness gracious!).  For a complete list of the Locale IDs, see http://www.microsoft.com/globaldev/reference/lcid-all.mspx
    Language([Currency].[Currency].[United Kingdom Pound]) = 2057;
    Language([Currency].[Currency].[Brazilian Real]) = 1046;
    Language([Currency].[Currency].[US Dollar]) = 1033;
     
    These assignments are for illustration purposes only.  The main problem with this approach is maintainability.  If we bring in new currencies in the future, we need to modify the MDX script.  The ideal would be to store them in the DimCurrency table, expose them using the ValueColumn property of the Currency attribute, and use that for the assignment.  Some may prefer to go the member property route, but I think this is a good use case for the ValueColumn property.
    Language(([Currency].[Currency].Members, [Measures].[Sales Amount])) =
        [Dim Currency].[Currency].CurrentMember.MemberValue;
     

    Locale ID MDX

    Let’s build and deploy!  Now let’s browse!

    Browse at aggregated level

    On the face of it, these numbers seem OK.  29,358,677 is roughly 1.5 times 19,685,311.  But let’s double check.  Copying and pasting into Excel reveals that this is not the case.  There is actually a discrepancy of £784,195!  Why is that?  Well, this is actually the behaviour that we want …

    Check numbers in Excel at aggregated level

    If you remember, measure expressions are done at leaf level.  This calculation is, as George Spofford would say, “non-commutative”.  Addition/subtraction combined with multiplication/division will result in different numbers depending on whether the multiplication/division is done at leaf level or the aggregated level.  Analysis Services has done the currency conversion at leaf level – i.e. at the day that each transaction actually took place.  This is of course much more accurate.

    So let’s test this as well.  Drilling into Date reveals the following.

    Browse to leaf level

    Copying and pasting the Date-level data into Excel and applying the multiplication reveals that the numbers are indeed calculated correctly at leaf level.

    Check leaf-level numbers in Excel

    For one-to-many currency conversion, I prefer using measure expressions rather than the currency conversion wizard.  As shown by this post – http://blogs.conchango.com/christianwade/archive/2006/07/25/4256.aspx  – measure expressions are the most performant way of doing this kind of calculation.  The wizard generates MDX that does the calculation at runtime, so it is not quite as performant for large volumes.  Furthermore, virtually all the work we have just gone through to enable measure-expression currency conversion would also have to be done prior to running the wizard (with the exception of the measure expression itself).  Anyway, for completeness, let’s take a look at how the currency conversion wizard does it.

    ONE-TO-MANY USING CURRENCY CONVERSION WIZARD

    Picking up where we left off, just delete the measure expression and run the wizard.  As I said, pretty much all the other stuff we had to do to enable measure expressions has to be done prior to running the wizard anyway.

    From the Cube Structure tab, select Cube > Add Business Intelligence.

    Run wizard

    Select “Define Currency Conversion” and click Next.

    Wizard page 1

    Fill in the next page in the wizard as shown here.  Specify which measure group contains the exchange rates.  The pivot currency, as discussed above, is US Dollars.  “1.5 US Dollars per 1 United Kingdom Pound” sounds right, so let’s go with that.

    Wizard page 2

    Select Sales Amount as the measure(s) we want to convert.

    Wizard page 3

    Specify One-to-many currency conversion.

    Wizard page 4

    Now it wants us to specify the “reporting currencies”.  It will actually create another currency dimension called “Reporting Currency”, which will be based on this selection.  I would imagine that most cubes that only need one-to-many currency conversion will have the base Currency dimension purely for the purpose of viewing the values in foreign currencies!  Adding another currency dimension can be a little overkill in my opinion.  I selected United Kingdom Pound and Brazilian Real for my reporting currencies.

    Wizard page 5

    The last page shows us what changes will be made to the cube/DSV.  Click Finish.

    Let’s take a look at what it actually did.  Firstly, we have a new entity in our DSV called Reporting Currency.  It is basically the same as the FactCurrencyRate, but filtered on the pivot currency and the other currencies we selected as reporting currencies.  Interestingly, it is not related to any other entity in the DSV.

    Reporting Currency entity in DSV

    The wizard also created a dimension called Reporting Currency that is based on the Reporting Currency entity in the DSV.  It is very similar to our Currency dimension.  The Currency attribute has a DefaultMember property of US Dolllars, its IsAggregatable property is set to False, etc., etc.  In fact the only difference of any significance is that it has not set the Cardinality property of the attribute relationship to One.

    Reporting Currency dimension

    And here is the MDX script it generated.  Note: there is a bug in this script.  If you are using a Date dimension that is called something different to the base dimension name (e.g. Order Date vs. Date), you will have to replace “Scope( Leaves([Date])” with “Scope( Leaves([Order Date])”.

          // <Currency conversion>

               

                // Currency conversion wizard generated script.      

                // Currency conversion generated on: 24 August 2006 15:36:38

                // by user: Conchango2 

                // Currency conversion type: OneToMany   

                // Selected members to be converted: Sales Amount    

                // Please be aware that any changes that you decide to make to it may be overridden the next time you run the Currency Conversion wizard again. 

         

                // All currency conversion formulas are calculated for the non pivot currency and at leaf of the time dimension

                Scope ( { Measures.[Sales Amount]} );

                      Scope( Leaves([Date]) ,

                            Except([Reporting Currency].[Currency].[Currency].Members, [Reporting Currency].[Currency].[Currency].[US Dollar]));

               

                    // This section overrides the Pivot Currency values with the Converted value for each selected measures/account members/account type members needing to be converted with Measure rate End Of Day Rate

                    // LinkMember is used to reference the currency from the source currency dimension in the rate cube. 

                    Scope( { Measures.[Sales Amount]} );

                           This = [Reporting Currency].[Currency].[US Dollar] / (Measures.[End Of Day Rate], LinkMember([Reporting Currency].[Currency].CurrentMember, [Currency].[Currency])) ;

                    End Scope;   

               

                      End Scope; // Leaves of time and non pivot currency  

                End Scope; // Measures

               

            // End of the currency conversion wizard generated script

          // </Currency conversion>

    Having done some (thorough) scoping, the line of code that is of real interest is

    This = [Reporting Currency].[Currency].[US Dollar] / (Measures.[End Of Day Rate], LinkMember([Reporting Currency].[Currency].CurrentMember, [Currency].[Currency])) ;

    This line is doing the division in a similar way to the measure expression.  However, it is using the LinkMember function to do a runtime link between Currency and Reporting Currency based on the member name.  This is effectively just replacing the Currency dimension with the Reporting Currency dimension for reporting purposes.

    SUMMARY

    As I said, if all you have is a one-to-many currency conversion requirement, I think the measure expressions approach is more elegant.  It stores the calculated data on disk rather than calculating at runtime, so it should perform a little better for large volumes (despite the fact that cell assignments in MDX script do perform very well).  Also, there is only one currency dimension (instead of Currency and Reporting Currency).  For many cubes that just require one-to-many currency conversion, this is a simpler model.  Plus a lot of the work you have to do to use the measure expressions approach has to be done anyway in order to run the wizard.  This is in contrast to the Time Analysis wizard, which I think adds great value.  I think the currency conversion wizard adds more value for many-to-many currency conversion.  In this case, you would invariably want the 2 currency dimensions.

  • Analysis Services Stored Procedure Project

    Jon Axon, Chris Webb, Mosha Pasumansky, Greg Galloway, Darren Gosbell  (and I although not as much as these guys!) have released some very interesting and useful MDX sprocs.  Check out http://www.codeplex.com/Wiki/View.aspx?ProjectName=ASStoredProcedures

    Enjoy!

     

  • Measure expressions: how performant are they?

     

    Oh, and before the anoraks out there mail me saying “did you know that ‘performant’ is not recognized as an adjective by the major dictionaries?”, yes I do know – but I like the word, so I’m using it!

     

    We are modeling the business of the Multi-Dimensional Parcel Service (MDPS).

    MDPS Truck

    They ship parcels domestically, internationally (and between dimensions).  We are only concerned with the operations of their trucks.  The trucks either

    ·        Collect parcels from depots to deliver to customers

    ·        Collect parcels from customers to deliver to depots

     

    The customer pays for the delivery/collection, so revenue can be associated with a customer.  The job of collecting the parcels and delivering them has costs associated with it – e.g. petrol.  Costs are therefore associated with a job.

     

    DSV

     

    It is common for a truck to be part loaded in order to serve multiple customers.  In this case there would be multiple delivery/collection addresses for the same job.  It is therefore conceivable that we could have the following data in our Revenue table.

     

    Revenue Table

    The parcels for JobID of 1 were delivered to two customers (those with CustomerIDs of 1 and 2).  However, Customer 1 only took up a small amount of space in the truck.  The job took place in December and he was shipping Halloween decorations so there wasn’t much demand.  Customer 2, on the other hand, was shipping Christmas trees, so he used the vast majority of the truck’s space.

    How then should we assign costs to a customer?  You have probably already noticed that Customer is a many-to-many dimension with regard to Cost.  A customer can have many jobs performed, and a job can be done for more than one customer at a time.

     

    In our example, the total cost of Job 1 (JobID of 1) is £500.  With the default behaviour of many-to-many dimensions (i.e. no measure expression or MDX to split the costs), both customers 1 and 2 would be assigned a cost of £500.  This seems a little unfair – especially for Customer 1 because he only used a small part of the truck space.  It is a business decision as to how to split costs in this type of scenario.  In this hypothetical case, it makes sense to do a proportional split based on revenue.

     

    This is what the CostMultiplier field is for.  For a given JobID, the sum of CostMultiplier always equals 1.  So, for the first row, 500 / 5,000 is 0.1.  For the second row, 4,500 / 5,000 is 0.9.  For the third row, 3,000 / 3,000 is 1.

     

    Using MDX or a measure expression, we can multiply the cost for a customer by the CostMultiplier.  The following screenshot is therefore how an OLAP client tool would present the costs in our example.

     

    Costs by Customer

     

    Note: doing the proportional split based on revenue results in the data being aggregated using a regular sum.  The cost for Customer 1 + Customer 2 + Customer 3 is £800.  £800 is also the actual total cost of Job 1 and Job 2.  Summing the individual customers using the many-to-many dimension without the proportional split would have resulted in £1,300 (£500 + £500 + £300).

     

    Note: the proportional split has to be done at leaf level.  This is because the calculation is, as George Spofford would say, “non-commutative”.  It involves addition/subtraction and multiplication/division, so we will get different numbers at the aggregated level depending on whether or not the calculation is done at leaf level.

     

    So, let’s do it!  I created a relational database containing the above 4 tables and populated it with (quite a lot of) data.  I then created a cube on top of it to allow testing of how performant the different design options are.

     

    You can download a backup of the cube that uses calculated members (Option 1 and 2) from here.  For a backup of the cube that uses the measure expression (Option 3), download from here.

     

    Here is the metadata of the cube.

     

    Cube Metadata

     

    Here are the numbers of rows for each table

    Customer –        53,186 rows

    Revenue –    4,370,206 rows

    Job –                979,921 rows

    Cost –              334,342 rows

     

    To populate the CostMultipler field, we can run the following query.

     

    UPDATE Revenue

          SET CostMultiplier =

          CASE

                WHEN dt.Amount = 0 or dt.Amount IS NULL THEN 0

                ELSE r.Amount / dt.Amount

          END

          FROM Revenue r

          INNER JOIN

          (

                SELECT JobID, SUM(Amount) Amount

                FROM Revenue r

                GROUP BY JobID

          ) dt ON r.JobID = dt.JobID

     

    Now, when we run the following query, we should get back 1 for every single JobID.

     

    SELECT

          JobID,

          SUM(CostMultiplier)

    FROM

          Revenue

    GROUP BY

          JobID

     

    I set the attribute relationship’s Cardinality and RelationshipType properties to improve performance (see this post).  I also of course designed aggregations on the 2 measure group partitions.

     

    Various options are evaluated for performance in getting the costs by customer including the proportional split.  It is assumed that we do not have a query cache when testing performance.  The AS query cache can be cleared by running an XMLA command as pointed out by Darren Gosbell here.

     

    OPTION 1 – TRADITIONAL STYLE MDX

     

    I created a calculated measure called “Cost – Traditional MDX”.  It takes the traditional approach of defining the calculation in the member declaration.  See the MDX below.  Obviously only the calculated measure would be exposed in the perspectives; not the base measure.

     

    Note: this will only work if the only exposed hierarchy from the [Customer] dimension is [Category – Customer].

     

    CREATE MEMBER CURRENTCUBE.[Measures].[Cost - Traditional MDX]

    AS

        IIf

        (

            [Customer].[Category - Customer].CurrentMember.Level IS

            [Customer].[Category - Customer].[(All)],

            // If at the all level of customer, just return the hidden base measure

            [Measures].[Cost],

            // Otherwise split the cost at leaf level proportionally to revenue

            Sum

            (

                Descendants

                (

                    [Customer].[Category - Customer].CurrentMember, , LEAVES

                ) *

                Descendants

                (

                    [Job].[Reference].CurrentMember, , LEAVES

                ),

                ([Measures].[Cost] * [Measures].[Cost Multiplier])

            )

        ),

    FORMAT_STRING = "Currency",

    NON_EMPTY_BEHAVIOR = { [Cost] },

    VISIBLE = 1;

     

    Guess how long the following query takes (on my dev VPC)?  About 1 minute 15 seconds.  There are only about 100 rows returned.  The reason it takes so long is that it has to go down to leaf level at runtime for all the Job/Customer combinations.

     

    SELECT

          [Measures].[Cost - Traditional MDX] ON 0,

          {

                [Customer].[Category - Customer].[All],

                [Customer].[Category - Customer].[All].Children

          } ON 1

    FROM

          [Measure Expressions]

     

    OPTION 2 – MDX SCRIPT

     

    We can improve on Option 1 using MDX script assignments.

     

    CREATE MEMBER CURRENTCUBE.[Measures].[Cost - MDX Script]

    AS

        [Measures].[Cost],

    FORMAT_STRING = "Currency",

    NON_EMPTY_BEHAVIOR = { [Cost] },

    VISIBLE = 1;

    SCOPE ([Measures].[Cost - MDX Script]);

       

        // Assign to all cells in the [Customer].[Category - Customer]

        // hierarchy except the [All] member

        ( [Customer].[Category - Customer].Members(1) : Null ) =

        Sum

        (

            Descendants

            (

                [Customer].[Category - Customer].CurrentMember, , LEAVES

            ) *

            Descendants

            (

                [Job].[Reference].CurrentMember, , LEAVES

            ),

            ([Measures].[Cost] * [Measures].[Cost Multiplier])

        );

       

    END SCOPE;

     

    This brought the query time down from over 1 minute 15 seconds to about 50 seconds.  It does of course return the same data.

     

    UPDATE: LEAF-LEVEL MDX SCRIPT ASSIGNMENT 

     

    Some readers may be asking why we didn't execute something like the following MDX script assigment.

     

    SCOPE (Leaves(Customer), Leaves(Job));

        [Measures].[Cost] = [Measures].[Cost] * ValidMeasure([Measures].[Cost Multiplier]);

    END SCOPE;

     

    This would be taking a similar approach to what is described by Mosha here.  The Leaves function returns a set of tuples representing the lowest level of granularity across all attributes in the dimension that is passed in as a parameter.  ValidMeasure is used purely as a performance optimisation (see Mosha's post above).  As [Measures].[Cost] is a base measure rather than a calculated measure, it should automatically aggregate up the hierarchy (i.e. to customer category level, which is what our query is interested in) right?  Wrong!  Take a look at Page 388 to 390 of the MDX Solutions book.  It states that the outer dimension in a many-to-many relationship (in this case Customer) will not automatically aggregate up the hierarchy.  This is similar behaviour to MDX script assigments involving calculated measures.  This approach is normally very performant as it doesn't need to use the Descendants function; Analysis Services deals with the aggregation automatically.  Unfortunately, we have to rule it out in this case.

     

    OPTION 3 – MEASURE EXPRESSIONS 

     

    I created another cube and set up a measure expression for this purpose.  It is simply “[Cost]*[Cost Multiplier]”.

     

    Note: when using measure expressions, it is often a good idea to set the DirectSlice property for the dimension relationship (in the Dimension Usage tab).  For the case of MDPS, setting the direct slice to Root([Customer]) (as shown by the following screenshot) means that the All member for Job/Customer attributes shows the actual total of costs – not just those which have a corresponding [Cost Multiplier].  If we do not set DirectSlice, Jobs with no revenue would not contribute to the total cost for all Jobs.

     

    The Root(<dimension_expression>) function returns a tuple with the All/Default member of every attribute/hierarchy in the dimension.

     

     

    Direct Slice

     

    To re-iterate, it makes sense to set the DirectSlice as above for the case of MDPS because we want to report the total cost for all jobs (e.g. [Job].[Reference].[All]) irrespective of whether revenue has been received for them.

     

    Using measure expressions brought the query time down to less than 20 seconds.

     

    SUMMARY

     

    Considering that querying Cost by Customer without any MDX or measure expressions (i.e. without the proportional split) takes about 12 seconds (due to sheer volume of data), I’d say measure expressions are indeed pretty performant!

     

     

     

  • Fact modelling and M2M dimensions in AS2005

    It is generally considered data warehousing best practice to split your facts into different tables if the granularity/dimensionality is different.  This is a very common question in data warehouse design.  Consider the following data model.

     

    Data Model

     

    The company traditionally sells its products through sales people visiting customer sites.  They now have a website which is another channel for sales.  The goods are delivered to the customer through Delivery Jobs.  A delivery job can deliver to multiple customers in one go.

     

    It might be tempting to have a single measure group for sales (rather than 2 as in the above model).  We might then create a UNION query for it in the DSV.  But what would we use for SalesPersionID in the OnlineSales SELECT statement?  We would have had to introduce a “Not Applicable” member in SalesPerson (possibly with SalesPersionID of -2) for this purpose.

     

    Although it might make sense in a transactional system to have a single Sales table, it is generally considered best practice to split out the facts.  I think even Kimball and Inmon might agree with each other on this one!

     

    Having split out the facts, we can of course still have a Total Sales measure that sums OnsiteSales and OnlineSales.  This works fine in the cube.  Assuming we have IgnoreUnrelatedDimensions set to false on both measure groups, if we were to query Total Sales by Sales Person, it would only ever return sales numbers from Onsite Sales.  In most cases, this works fine and it is what the user would expect.  We also then do not need to explain what on earth this “Not Applicable” member is all about.

     

    But here is the question we’ve been building up to: what if we want to query DeliveryJob count by Customer or by Product?   If we had taken the single sales fact table approach, this would have been a “walk in the park” (I was going to use the English rather than American equivalent of this analogy – the one whose acronym is POP – but I decided not to).  This would have allowed simply creating a many-to-many dimension between

              Customer => AllSales => DeliveryJob,

              SalesPerson => AllSales => DeliveryJob,

              Product => AllSales => DeliveryJob

    The Sales measure group would be used as the many-to-many mapping measure group.

     

    Having split out the facts, we cannot do this because there are effectively 2 many-to-many mapping measure groups.  Analysis Services 2005, despite is fantastically enhanced modelling capabilities, does not allow this.

     

    Here is a method, which in my opinion gets round this problem well.  Introduce the AllSales measure group that does UNION OnsiteSales and OnlineSales.

     

    Note: querying the OnsiteSales and OnlineSales measures are unaffected.  They still query the respective measure groups and have nothing to do with AllSales.

     

     

    Data Model

     

     

    The only purpose of the AllSales measure group is to act as a many-to-many mapping table against the DeliveryJob fact from Customer and Product.  We don’t need a “Not Applicable” member in SalesPerson because it is not related to AllSales.

     

    Here are the many-to-many relationships to query DeliveryJob Count:

     

    Dimension relationships

     

    AS2005 forces us to have at least one measure per measure group.  I therefore created an AllSales count, but it is not likely to ever be used by the end user.  Given this, I don’t believe it is even necessary to design aggregations against the AllSales measure group.  Designing aggregations for DeliveryJob alone should do it.

     

    I was recently faced with this modelling challenge although for completely different data.  I solved it using this method.  I think this could be a very common nut to crack when modelling complex relationships between entities with AS2005.  Many readers, when faced with this challenge, might have come up with the same solution.  If that is the case, hopefully it will take you less time to arrive at the solution as a result of this post!

     

     

     

     

     

  • AS 2005 Optimization Experiences

    I’m approaching the end of an enjoyable data warehousing project and we have collected some size/volume statistics around our Analysis Services 2005 cube.  We have:

     

    * 48 Dimensions.  This includes role-playing dimensions.

    * 751 Attributes and user hierarchies (!).  So if a dimension has 5 attributes and 2 user hierarchies, this counts as 7.

    * 21 Measure groups

    * 68 Measures (including calculated ones)

     

    This is all in one mother cube.

     

    It currently only has 6 months worth of data and the fact table with the most data has 2,195,236 rows.

    The dimension with the most members has 32,789 leaf-level members.

     

    The fact tables are likely to grow considerably over the next few years.  It may require partitioning in the future, but we don’t feel this will be an issue for at least a couple of years.

     

    The cube generally performs very well for user queries.  They have been using it in anger for a while now without any performance problems of note – although the beefier production server is quite robust.  Obviously, doing a regular cross join on thousands of members isn’t a good idea (as with any AS implementation).

     

    However, one thing that did make a noticeable difference (especially on dev/test where the servers are less beefy) is setting attribute relationship properties (Cardinality and RelationshipType).  I have of course configured the attribute relationships to define aggregation paths for natural hierarchies.  I had also heard that setting the Cardinality property and RelationshipType properties on the attribute relationships improves performance.  We are now able to push more obscure buttons that were previously out of bounds.

     

    When going through (all!) the attribute relationships, it became evident that about a quarter of them did not require one-to-many, which is the default Cardinality property.  An example of this might be where you have a surrogate key as an attribute (which might be required as an attribute for reference dimensions) and a business key as another attribute in the same dimension.  There would obviously be a one-to-one relationship between the two.

     

    We are doing full cube processing, so I set all the RelationshipType properties to Rigid.  You only need this set to Flexible if you are doing incremental processing for changing dimension members.  The documentation out there (e.g. http://msdn2.microsoft.com/en-us/library/ms166553.aspx ) is unclear whether this improves query time; it makes sense that it would improve processing time.

     

    As I said above, these 2 changes to the attribute relationship properties did seem to improve query performance.  I did both the changes in one go so I’m not sure which is the most effective.

     

    The whole cube now takes between 15 and 20 minutes to do a full process.  There was a time when it was taking many hours.  We traced it down to a query that Analysis Services was running against the warehouse relational database.  It was doing a join between 2 views (about quarter of a million rows).  We have a views metadata layer on top of the warehouse to allow reusability of business logic across reporting/analytical tools.  The view on the RHS was joining back to the one on the LHS for a derived column.  The execution plan showed a Cartesian type operation that was exploding the number of rows exponentially.  All we had to do was change the ETL calculate the derived column and, hey presto, cube processing was reduced to 15 minutes.

     

     

  • MDX Script: Current/Relative Period

    There are differing approaches out there on how best to get a current period member in MDX.  It is often a requirement to show, for example, [Current Month] and [Current Month - 1] members side by side and then perform comparison calculations between the two.  This allows OLAP reports to dynamically refresh the numbers when a new period arrives rather than depending on the user changing the slicing/members displayed.

    Here is a backup of an AS 2005 database that contains all the code in this post.  Just restore it and open using the File > Open > Analysis Services Database option.

    Current period is not to be confused with “relative period”.  Relative period tells us the period relative to the one in current context.  For example, if we have the following Sales data by year,

                  Sales
    2004       100
    2005       200
    2006       300

    We could cross join year with the relative year members to give us the following.

                                                          Sales
    2004             Relative Year               100
    2004             Relative Year - 1 
    2005             Relative Year                200
    2005             Relative Year - 1           100
    2006             Relative Year                300
    2006             Relative Year - 1           200
    2007             Relative Year 
    2007             Relative Year - 1           300

    The terminology of “Relative” vs. “Current” will obviously vary between implementations, but I think this is the clearest way of differentiating the two types of functionality for the purposes of this post.

    Current Period, on the other hand, gives us the following functionality.

                                 Current Year             Current Year - 1
    All Products            600                          500
    Bikes                     100                          150
    Cars                       200                          150
    Caravans                300                           200

    This simply saves the user having to change the year member when a new year arrives.  This might not seem like much to ask if it only has to be done once a year, but the same principle applies to Quarter, Month, Week, etc.  Having to change the Week member every Monday morning would be a little annoying.

    I chose to take a similar approach to the time intelligence wizard in AS2005.  I am using MDX script against calculated members (so I don’t need to worry about all the possible permutations for aggregation up the hierarchy as this doesn’t apply to calculated members).  I personally think MDX script is very cool.  Granted it is more difficult to get your head round than the traditional approach of defining calculations in the member declaration, but there are often real performance benefits as illustrated by this post.

    I have a Named Calculation column called “Date Calculations” in the date dimension table of my DSV and its value is hard coded to “Current Period”.  This gives us an attribute hierarchy in which to place our calculated members for time analysis.

    It is a convention to use the Current Period member (which is the DefaultMember) in the MDX script assignments CrossJoined with other members that contribute to the assignment.  Although this is not always necessary, it ensures that we have the required dimensionality and it can avoid infinite recursion.

    The current period calculations are quite simple MDX script assignments.  The Members( … ) function comes in useful for external functions (e.g. MDX sprocs) that return a string identifier for a given member.  I opted for the VBA approach as it is easy and works fine for my requirements.

    For completeness, I have also included a “Relative Year - 1” calculation.  This is a simple approach that is similar to the AS 2000 implementation described by this article.

    For a more sophisticated approach to relative period (and potentially more powerful depending on your requirements), see Chris Webb’s blog entry.  The advantage of doing it the way that Chris describes is that we have an attribute hierarchy for each level of relative period (e.g. year, quarter, month).  We can then group these into a user hierarchy.  This implementation is outside the scope of this post.  For my particular requirements, I chose to place all time calculations under the single [Time Calcs] attribute hierarchy.

     

    Here is the code:

    CALCULATE;

    CREATE MEMBER CURRENTCUBE.[Time].[Time Calcs].[Current Month]

    AS Null ;

    CREATE MEMBER CURRENTCUBE.[Time].[Time Calcs].[Current Month - 1]

    AS Null ;

    CREATE MEMBER CURRENTCUBE.[Time].[Time Calcs].[Current Year]

    AS Null ;

    CREATE MEMBER CURRENTCUBE.[Time].[Time Calcs].[Current Year - 1]

    AS Null ;

    CREATE MEMBER CURRENTCUBE.[Time].[Time Calcs].[Relative Year - 1]

    AS Null ;

    -- Current Month

      (

        [Time].[Time Calcs].[Current Month],

        {

            [Measures].[Consequence Count],

            [Measures].[Incident Count]

        }

      ) =

        Sum

        (

            { [Time].[Time Calcs].DefaultMember } *

            Members

            (

                "[Time].[Month].[" +

                VBA!Format(VBA!Now(), "MMM-yy") + "]"

            )

        );

    -- Current Month - 1

      (

        [Time].[Time Calcs].[Current Month - 1],

        {

            [Measures].[Consequence Count],

            [Measures].[Incident Count]

        }

      ) =

        Sum

        (

            { [Time].[Time Calcs].DefaultMember } *

            Members

            (

                "[Time].[Month].[" +

                VBA!Format(VBA!DateAdd("m", -1, VBA!Now()), "MMM-yy") + "]"

            )

        );

    -- Current Year

      (

        [Time].[Time Calcs].[Current Year],

        {

            [Measures].[Consequence Count],

            [Measures].[Incident Count]

        }

      ) =

        Sum

        (

            { [Time].[Time Calcs].DefaultMember } *

            Members

            (

                "[Time].[Year].[" +

                VBA!CStr(VBA!Year(VBA!Now())) + "]"

            )

        );

    -- Current Year - 1

      (

        [Time].[Time Calcs].[Current Year - 1],

        {

            [Measures].[Consequence Count],

            [Measures].[Incident Count]

        }

      ) =

        Sum

        (

            { [Time].[Time Calcs].DefaultMember } *

            Members

            (

                "[Time].[Year].[" +

                VBA!CStr(VBA!CInt(VBA!Year(VBA!Now())) - 1) + "]"

            )

        );

    -- Relative Year - 1

      (

        [Time].[Time Calcs].[Relative Year - 1],

        {

            [Measures].[Consequence Count],

            [Measures].[Incident Count]

        }

      ) =

        Sum

        (

            { [Time].[Time Calcs].DefaultMember } *

            ParallelPeriod

            (

                [Time].[Calendar].[Year], 1, [Time].[Calendar].CurrentMember

            )

        );


    We can now run the following 2 queries and they should return the same result; assuming it is still 2006 when you reading this post!

    SELECT

     {

      ([Time].[Year].&[2006]),

      ([Time].[Year].&[2005])

     } ON 0,

     {

      [Measures].Members

     } ON 1

    FROM

     [Current-Relative Date];

    SELECT

     {

      ([Time].[Time Calcs].[Current Year]),

      ([Time].[Time Calcs].[Current Year - 1])

     } ON 0,

     {

      [Measures].Members

     } ON 1

    FROM

     [Current-Relative Date];

    Here is a query that illustrates relative period.

    SELECT

     {

      ([Time].[Time Calcs].&[Current Period]),

      ([Time].[Time Calcs].[Relative Year - 1])

     } ON 0,

     {

      [Time].[Calendar].[Year].Members

      //[Time].[Calendar].[Quarter].Members (another option)

     } ON 1

    FROM

     [Current-Relative Date];

     

     

  • DeveloperDeveloperDeveloper

    Quick post to say how impressed I am with the DeveloperDeveloperDeveloper event.  The quality of the .NET presenters was very high.  There was lots of cool content around .NET V2 and V3.

     

    I presented my SQL Server Development Tips n’ Tricks session.  It went really well and I’m glad to have done one of these developer events in the UK.  My session recording can be downloaded from here.

     

  • Data Architect in CA!

    We have a major project kicking off north of LA. It should last at least a year, if not longer. We have filled most of the key roles. However, we still need a very good data architect with considerable data warehousing experience. If you fit the bill and you are up for it, contact me via my blog. Someone from the US would be preferable, but for the right candidate, we can make things happen!

     

  • Nice use of DML with output: update a collection in 1 db round trip

     

    I’ve seen a few Conchango developers in the past use sp_xml_preparedocument and OPENXML to commit a serialized collection object to the database.  We can do this in SQL Server 2000 and it avoids a round trip to the database for each collection member.  This can mean a huge performance improvement.  Imagine having to call an update stored proc 500 times for 500 members rather than just a single db call !  Applications that store collection(s) in memory prior to committal - e.g. 'save button' functionality - can benefit considerably from this approach.

     

    The only problem (in 2000) was that we could not get back the newly generated identity column values.  @@Identity or SCOPE_IDENTITY obviously don’t help because they relate to only one row.  This meant we’d often have to re-populate the collection after the update, which meant another hit on the database.

     

    However, in SQL Server 2005, we have DML with OUTPUT.  This allows us to resolve this – i.e. we can repopulate the xml with the new identity column values ready to be de-serialized.

     

    Here is an example of the update sproc for a Customer.Orders collection.

     

    /*

    <procname>usp_CustomerOrders_Update</procname>

    <summary>

    Update orders for a customer.orders collection

    </summary>

    <history>

    <entry date="2005-11-24" name="Christian Wade" action="Created" />

    </history>

    */

    CREATE PROCEDURE [dbo].[usp_CustomerOrders_Update]

    (

          @CustomerID     int,

          @Orders         xml   OUT

    )

    AS

          -- sample xml format:

          --<Orders>

          --  <Order OrderID="110" CustomerID="2" OrderValue="250.8711" OrderDate="2005-11-01T00:00:00" />

          --  <Order OrderID="111" CustomerID="2" OrderValue="350.8711" OrderDate="2005-11-01T00:00:00" />

          --  <Order OrderID="112" CustomerID="2" OrderValue="450.8711" OrderDate="2005-11-01T00:00:00" />

         --</Orders>

         

          BEGIN TRY

               

                -- prepare temp tables

                CREATE TABLE #InputOrder

                (

                      OrderID                         int,

                      CustomerID                      int,

                      OrderValue                      smallmoney,

                      OrderDate                       smalldatetime

                )

               

                CREATE TABLE #OutputOrder

                (

                      OrderID                         int,

                      CustomerID                      int,

                      OrderValue                      smallmoney,

                      OrderDate                       smalldatetime

                )

               

                -- populate input temp table

                INSERT INTO #InputOrder

                (

                      OrderID,

                      CustomerID,

                      OrderValue,

                      OrderDate

                )

                SELECT

                      OrdersTbl.rows.value('@OrderID', 'int'),

                      OrdersTbl.rows.value('@CustomerID', 'int'),

                      OrdersTbl.rows.value('@OrderValue', 'smallmoney'),

                      OrdersTbl.rows.value('@OrderDate', 'datetime')

                FROM

                      @Orders.nodes('/Orders/Order') OrdersTbl(rows)

               

                -- now we've prepared the data, update it in the database

                BEGIN TRAN

               

                      -- first we want to delete existing orders for

                      -- the same customer that are not in our input dataset

                      DELETE FROM dbo.[Order]

                      WHERE CustomerID = @CustomerID

                      AND OrderID NOT IN

                      (

                            SELECT

                                  OrderID

                            FROM

                                  #InputOrder

                      )

               

                      -- now update orders with the same id

                      UPDATE dbo.[Order]

                      SET

                      CustomerID = i.CustomerID,

                      OrderValue = i.OrderValue,

                            OrderDate = i.OrderDate

                      OUTPUT

                            Inserted.OrderID,

                            Inserted.CustomerID,

                            Inserted.OrderValue,

                            Inserted.OrderDate

                      INTO  #OutputOrder

                      FROM  dbo.Order c

                      INNER JOIN

                            #InputOrder i ON c.OrderID = i.OrderID

               

                      -- now insert values that are not already in the order table

                      INSERT dbo.[Order]

                      (

                            CustomerID,

                            OrderValue,

                            OrderDate

                      )

                      OUTPUT

                            Inserted.OrderID,

                            Inserted.CustomerID,

                            Inserted.OrderValue,

                            Inserted.OrderDate

                      INTO  #OutputOrder

                      SELECT

                            i.CustomerID,

                            i.OrderValue,

                            i.OrderDate

                      FROM

                            #InputOrder i

                      WHERE

                            i.OrderID NOT IN

                            (

                                  SELECT

                                        c2.OrderID

                                  FROM

                                        dbo.Order c2

                            )

               

                COMMIT TRAN

               

                SELECT

                      @Orders =

                      '<Orders>' +

                      (

                            SELECT

                                 o.OrderID,

                                 o.CustomerID,

                                 o.OrderValue,

                                 o.OrderDate

                            FROM

                                 #OutputOrder o

                            FOR XML AUTO

                      ) +

                      '</Orders>'

                FROM

                      #OutputOrder

               

          END TRY

         

    BEGIN CATCH

          EXECUTE usp_LogAndRethrowError

    END CATCH

     

     

    Here is the code that serialises the collection ready to be passed in as a parameter – and then deserializes it after calling the sproc.

    public static string Serialize(object obj) {
        StringWriter writer = new StringWriter();
        XmlSerializer xs = new XmlSerializer(obj.GetType());
        xs.Serialize(writer, obj);
        return writer.ToString();
    }
    public static object Deserialize(string xml, object obj) {
        XmlSerializer xs = new XmlSerializer(obj.GetType());
        StringReader reader = new StringReader(xml);
        return xs.Deserialize(reader);
    }

     

    Note: I had to use XmlAttributeAttribute on the properties of the Orders class to get it to serialize as the sproc expects it.

     

     

     

     

     

     
  • MDX Sprocs and Scripting: An Interesting Example

    A colleague of mine once asked me to do a “Number of Days since Last Incident” calculated measure (sounds like fun!).  Basically, he had a measure called Incidents and wanted a running count of days since the last incident.

     

    E.g.

    Date

            Incidents

            Days Since Last Incident

    1 Jan

            1

            0

    2 Jan

            0

            1

    3 Jan

            0

            2

    4 Jan

            2

            0

    5 Jan

            0

            1

    6 Jan

            0

            2

    7 Jan

            0

            3

     

    Easy peasy, right?  It obviously needs to maintain dimensionality of Incidents, so a good candidate for a calculated measure.

     

    Let’s evaluate the options for a simplified version that only works at the day level.  For this purpose, I created a simple cube: one measure group and 2 dimensions (Time and Territory).  Here is a screenshot of the cube DSV.

     

    I added a server time dimension to dimension IncidentDate for simplicity.  It has a standard Calendar hierarchy.

     

      

    Couldn’t be simpler, right?  The data and the solution with all the code can be downloaded here.

     

    Standard recursive MDX

     

    The traditional AS2000 approach would be to create a regular recursive MDX calculated member.  Here is the MDX.

     

    WITH MEMBER [Measures].[Days Since Last Incident]

    AS

        IIf

        (

                  //check at day level

            Not [Time].[Calendar].CurrentMember.Level IS [Time].[Calendar].[Date],

                  "NA",

            Case

                //check we are not at the beginning of the time dim

                When IsEmpty

                     (

                         [Time].[Calendar].CurrentMember.PrevMember

                     )

                Then 0

                //check if the current time member has incidents measure that is empty or 0

                When IsEmpty

                    (

                                      (

                                             [Time].[Calendar].CurrentMember,

                                             [Measures].[Incident Count]

                                      )

                    )

                Or  (

                        [Time].[Calendar].CurrentMember,

                        [Measures].[Incident Count]

                    ) = 0

                //if so, we need to calculate how many days since the last incident

                Then Sum

                    (

                        [Time].[Calendar].CurrentMember.PrevMember,

                        [Measures].[Days Since Last Incident]

                    ) + 1

                         //if not, there is at least one incident for the current day, so 0

                Else 0

            End

        )

    SELECT

           {

                  [Measures].[Incident Count],

                  [Measures].[Days Since Last Incident]

           } ON 0

        ,Descendants

        (

            [Time].[Calendar].[Year].[Calendar 2005],

            [Time].[Calendar].[Date]

        ) ON 1

    FROM

           [Recursive Mdx Sproc];

     

    In most cases, this would probably work fine.  However, in my test cube, the Incident Counts are sparse.  This means that the number of recursive calls for each day is quite large and this has an adverse effect on query performance.  This query takes 19 seconds on my laptop for all the days in 2005.

     

     

    MDX Scripting

     

    I was of course keen to try this using MDX scripting.  Here is the script.

     

    CALCULATE;

    CREATE MEMBER CURRENTCUBE.[Measures].[Days Since Last Incident]

        AS "NA";

    SCOPE ([Measures].[Days Since Last Incident]);

    ( [Time].[Date].Members ( 1 ) : Null ) =

    ( Case

        //check if the current time member has incidents measure that is empty or 0

        When IsEmpty

            (

                      (

                             [Time].[Calendar].CurrentMember,

                             [Measures].[Incident Count]

                      )

            )

        Or  (

                [Time].[Calendar].CurrentMember,

                [Measures].[Incident Count]

            ) = 0

        //if so, calculate how many days since the last incident using a recursive

        //call that increments by 1 for each call

        Then Sum

            (

                [Time].[Calendar].CurrentMember.PrevMember,

                [Measures].[Days Since Last Incident]

            ) + 1

        //if not, there is at least one incident for the current day, so 0

        Else 0

      End);

    //when at the beginning of time, evolve

    ( [Time].[Date].Members ( 0 ) ) = Null;

    END SCOPE;

     

    I was flabbergasted to see that the MDX scripting approach takes 1 second for the same query!!!  There are obviously some amazing optimizations available through MDX Scripting.

     

    MDX Sproc V1

     

    A voice in my head told me to try it using an MDX sproc.  Here is the .NET code.  It is by no means production quality; it is purely illustrating a concept.  I certainly was not expecting it to improve on 1 second!

     

    // recursive approach

    public static MDXValue DaysSinceLastIncident(

        Set days,

        Tuple incidentCountMeasure,

        int currentDayIndex)

    {

        try

        {

            // are we at the beginning of time?

            if (currentDayIndex == 0)

                return null;

            return DaysSinceLastIncidentRecursive(

                days,

                incidentCountMeasure,

                currentDayIndex,

                0);

        }

        catch (Exception ex)

        {

            return ex.ToString();

        }

    }

    private static int DaysSinceLastIncidentRecursive(

        Set days,

        Tuple incidentCountMeasure,

        int currentDayIndex,

        int dayCount)

    {

        // have we counted back to the beginning of time?

        if (currentDayIndex == 0)

            return 0;

        //check if the current time member has incidents measure that is empty or 0

        Expression exp = new Expression("[" + days.Tuples[currentDayIndex].Members[0].Caption + "]");

        MDXValue incidentCount = exp.Calculate(incidentCountMeasure);

        if (incidentCount == null || incidentCount.ToInt32() == 0)

        {

            //if so, calculate how many days since the last incident using a recursive

            //call that increments by 1 for each call

            return DaysSinceLastIncidentRecursive(

                days,

                incidentCountMeasure,

                currentDayIndex - 1,

                dayCount + 1);

        }

        else

        {

            //if not, there is at least one incident for the current day, so return

            return dayCount;

        }

    }

     

    In terms of what this code is doing, it is actually very similar to the standard recursive MDX approach above.  Probably the most significant difference is that, since we haven’t got an equivalent function to PrevMember in the Microsoft.AnalysisServices.AdomdServer namespace, we are passing in currentDayIndex and subtracting 1 instead.  Here is the MDX that uses it.

     

    WITH

    MEMBER [Measures].[Days Since Last Incident]

    AS

        IIf

        (

                  //check at day level

            Not [Time].[Calendar].CurrentMember.Level IS [Time].[Calendar].[Date],

                  "NA",

            //call the sproc and let it do the work

            ClassLibrary1.DaysSinceLastIncident

            (

                         [Time].[Date].[All].Children,

                         [Measures].[Incident Count],

                         Rank

                         (

                               [Time].[Date].CurrentMember,

                               [Time].[Date].[All].Children

                         ) - 1  // taking away 1 because 1-based

            )

        )

    SELECT

           {

                  [Measures].[Incident Count],

                  [Measures].[Days Since Last Incident]

           } ON 0

        ,Descendants

        (

            [Time].[Calendar].[Year].[Calendar 2005],

            [Time].[Calendar].[Date]

        ) ON 1

    FROM

           [Recursive Mdx Sprocs];

     

     

     

    As you can see, in order to pass in currentDayIndex, we use the Rank function.  The query takes 52 seconds on my machine.  Why does it take so much longer than the standard recursive MDX approach?  I guess the biggest reason is that we are taking hit for jumping to and fro from managed code to COM interop.

     

    However, this code does illustrate a concept. The recursive call has to be made n times for every single day member.  If we look at the required output dataset, should this really be necessary?

     

    Date

            Incidents

            Days Since Last Incident

    1 Jan

            1

            0

    2 Jan

            0

            1

    3 Jan

            0

            2

    4 Jan

            2

            0

    5 Jan

            0

            1

    6 Jan

            0

            2

    7 Jan

            0

            3

     

    Why can’t we just iterate the whole dataset once only taking a running count as we go along (rather than count back from every single day member)?  This would obviously vastly improve the query performance.  This is what I tried to do for MDX Sproc V2.

     

    MDX Sproc V2

     

    It is not possible for an MDX sproc to populate all the return data values for a set in one foul swoop.  Instead, our stored procedure DaysSinceLastIncident2 has to be called once for every single data value that is returned.  This makes it a little difficult to take our ‘one-iteration’ approach.  We are forced to do the iteration for the first call of the sproc, store the resulting data in memory, and use it for lookup purposes each time the sproc is subsequently called from MDX.  The following code is the only way I managed to do it.  Note: there are problems associated with it and these are discussed below.

     

    // keep cached lookup data in memory

    private static Hashtable daysSinceLastIncidentLookup;

    public static MDXValue DaysSinceLastIncident2(

        Set days,

        Tuple incidentCountMeasure,

        string currentDayCaption)

    {

        try

        {

            //initialize daysSinceLastIncidentLookup if necessary

            if (daysSinceLastIncidentLookup == null)

                InitializeDaysSinceLastIncidentLookup(days, incidentCountMeasure);

            return (Int32)daysSinceLastIncidentLookup[currentDayCaption];

        }

        catch (Exception ex)

        {

            return ex.ToString();

        }

    }

    private static void InitializeDaysSinceLastIncidentLookup(

        Set days,

        Tuple incidentCountMeasure)

    {

        daysSinceLastIncidentLookup = new Hashtable();

        Expression exp = new Expression();

        int daysSinceLastIncident = 0;

        bool foundFirstIncident = false;

        foreach (Tuple tpl in days.Tuples)

        {

            exp.ExpressionText = "[" + tpl.Members[0].Caption + "]";

            MDXValue incidentCountValue = exp.Calculate(incidentCountMeasure).ToInt32();

            if (incidentCountValue == null || incidentCountValue.ToInt32() == 0)

            {

                //i.e. no incident found, so count the day (assuming we've already had at least one incident)

                if (foundFirstIncident)

                    daysSinceLastIncident += 1;

            }

            else

            {

                foundFirstIncident = true;

                daysSinceLastIncident = 0;

            }

            daysSinceLastIncidentLookup.Add(

                tpl.Members[0].Caption,

                daysSinceLastIncident);

        }

    }

     

    And here is the MDX that calls it:

     

    WITH

    MEMBER [Measures].[Days Since Last Incident]

    AS

        ClassLibrary1.DaysSinceLastIncident2

        (

                  [Time].[Date].[All].Children,

                  [Measures].[Incident Count],

                  [Time].[Date].CurrentMember.Name

        )

    SELECT

           {

                  [Measures].[Incident Count],

                  [Measures].[Days Since Last Incident]

           } ON 0

        ,Descendants

        (

            [Time].[Calendar].[Year].[Calendar 2005],

            [Time].[Calendar].[Date]

        ) ON 1

    FROM

           [Recursive Mdx Sprocs];

     

    It takes 4 seconds.  This time, rather than pass in currentDayIndex, we pass in [Time].[Date].CurrentMember.Name, which is used for the lookup having prepared the data in memory.

     

    Unfortunately, it is necessary to store a lookup value for every day in the whole time dimension rather than just those days we need data for.  This is because we don’t know, at the point when the sproc is executed for the first member, what other members will follow.

     

    This code would not work in production.  It is solely to illustrate the concept.  Here are the reasons why it would not work.

    • There is no way of guaranteeing the lookup cache is re-initialized each time the outer MDX query is run.  Therefore, the second time the query is run, it would use the already existing cache.  If the dimensionality of the second query were different – e.g. sliced by a territory – we would get back incorrect results.  If we could at least determine the dimensionality of the current query from within an MDX sproc, we could re-initialize the cache for different dimensionality.  This would be a nice addition to the Context class.
    • Even if we could re-initialize the lookup cache, it could cause erroneous behaviour in a multi-user environment.  This is because it could be shared by multiple concurrent queries.

    So, in summary, MDX sprocs are cool.  They are a vast improvement on COM functions for MDX.  The fact that we can receive multidimensional objects such as Tuples, Sets, Members, etc. is fantastic.  As per my previous post, there is lots of potential in MDX sprocs.

     

    However, here is an enhancement illustrated by this article, which would (in my humble opinion) greatly increase the power of MDX sprocs.

    User-defined aggregate functions in SQL CLR have a method called Init() that is run for initialization.  Moreover, it is possible to instantiate the class decorated with the SqlUserDefinedAggregate attribute once per aggregate function call (not for each member that is then aggregated).  This means that class level, non-static variables can be reliably accessed by each method invocation of an aggregated member.

    Something conceptually similar for MDX sprocs would be fantastic.  This would allow initialization for the MDX query as a whole and then reliable access to a reserved memory space from each method invocation of a calculated data cell.

     

     

     

  • MDX Sprocs: Usage

    I am naturally interested in MDX stored procs so I tried them out.  I have often wondered how they compare to SQL CLR.  I hereby commit myself to (trying to) do a few blog posts in this space!  Here are some findings regarding when to use them.

    It is well known that the strengths of SQL CLR for real-life scenarios are as follows.

    1. Computationally intensive operations
    2. Iteration through lists
    3. Complicated / conditional execution logic
    4. Leveraging the .NET Framework for string handling, mathematical functions, regular expressions, etc.

    What about MDX sprocs?  Here is a summary of when I think they would be particularly useful.

     

    1. Complex calculations that require iteration.  Similarly to TSQL vs. SQL CLR, some custom calculations – e.g. financial – can be pretty much unworkable without iterating a dataset and .NET is good at this.  Not to mention the plethora of mathematical functions provided out-of-the-box by the .NET Framework.
    2. Real-time integration with data outside the cube.
      1. Dynamic sets (sets for which tuples/members are populated at runtime).  This could, for example, be used for a custom security implementation.
      2. We could also use data outside the cube to manipulate the cube data itself (not just which members are displayed).  For example, if we have a measure called Market Value and one of its inputs is current stock price(s), even the 5 minute latency (as would probably be the case with proactive caching) would be unacceptable - the numbers could be out by £ millions.  Also, if Market Value were the only measure in the cube with a real-time requirement, pro-active caching could be a sledgehammer to crack a nut.  Implementing this with an MDX sproc would be a very simple solution.

    Chris Webb has blogged about MDX sprocs recently.  Here are some other good ideas:

     

    1. Overcoming the limitations of drillthrough by accessing the relational data in the sproc.  Sounds promising – watch this space.
    2. Leveraging the .NET Framework for string handling functions etc.  For example, this could be used to filter dimension members.

    I’ve also seen examples of using MDX sprocs to administer AS through AMO, but I haven’t really seen a compelling reason to not just do it directly through AMO.

     

    Admittedly MDX sprocs are a bit of a niche area, but I think they have huge potential.

    Any other ideas for usefulness would be welcome.  I’m sure there are other clever ways of leveraging the biggest ever class library (the .NET Framework BCL).  I guess we just need to be a little creative!

     

  • Hello

    Hello!  I can’t believe it has been almost a year that I haven’t blogged.  I’ve been maxed out on projects and found it difficult to dedicate the time.

     

    My current project, which I have been at for over 6 months, is a data warehouse implementation.  I’ve enjoyed getting into the meat of AS 2005 and cementing my data warehouse design knowledge – with the help of Kristian Wedberg, Dan Perrin and Howard Pull.  We used Panorama for the front end.

     

    Proclarity came in to demo to us at one of our community days.  By co-incidence, it was 2 days before the big announcement.  I must say I was impressed with their product (before the acquisition).  I’d only ever really used Panorama in the past or created my own AS client, so it was an education.  Proclarity seemed quite bug free and with a consistent UI.  They also assured me that the pasting of direct MDX will work! (something I’ve never managed to get working properly with good old Panorama)

     

    I’ll be doing my SQL Server Development Tips and Tricks presentation at Developer Developer Developer Day in Microsoft Reading early June.  Should be fun and I’m looking forward to getting back in the swing of it.

     

    I spent a couple of days reading MDX Solutions 2nd Edition this week.  This is a great book – even better than the first edition!  Congrats to Chris Webb for this.

  • Professional Association for SQL Server

    I got back from PASS Europe on Saturday.  It was a GREAT convention!  It was the first ever PASS Europe and I’m proud to have been a part of it.  I hope this convention will grow to command the reputation and attendance it deserves.

     

    All the usual suspects were there in fine form - e.g. Kimberly Tripp, Gert Drapers, etc.  It was great to meet people face-to-face with whom I’d only had a virtual relationship. Chris Webb’s presentation entitled New Cube Design Features in Analysis Services 2005 and Darren Green / Allan Mitchell’s presentation entitled SQL Server 2005 Integration Services: Thinking Outside of the DTS Box were my favourites.

     

    I presented on SQL Server 2005 development; specifically SQL-CLR and TSQL.  It went very well and I am pleased because it was the first time I presented at PASS.  It was great preparation for the PASS 2005 Community Summit in Texas this September where I will also be presenting.  There were about 300 attendees in total at PASS Europe.  They normally expect about 3,000 for PASS USA!  Not to mention that it will take place just prior to the release of SQL Server 2005.  The anticipation will be immense and I wouldn't be surprised if Bill Baker announces the RTM date at the keynote.  We think there may be considerably more than 3,000 this time around; plus a lot of press.

     

    I’m particularly looking forward to Aaron Johal's session entitled Unlocking the understanding of locks in SQL Server 2005 and .NET and Bob Beauchemin’s pre-conference seminar entitled New Features in SQL Server 2005/.NET 2.0 for Developers.

     

     

     

Powered by Community Server (Personal Edition), by Telligent Systems