Archive for 'using'

I’ve had a lot of requests for more MDX content on my blog, so here’s something I’ve been meaning to write up for a long time: a worked example of how to use scoped assignments to implement two different types year-to-date calculation on two different hierarchies in the same dimension. Knowledge of how to use scoped assignments is the sign of a true MDX master (you can watch a video of a session I gave on the basics of scoped assignments at SQLBits here if you’re unfamiliar with them) but that’s because they can be very difficult to write and there’s surprisingly little information out there on the internet about them. They are incredibly powerful, though, and often they provide the most elegant and best-performing way to solve a problem.

Let’s start by looking at the Date dimension in the Adventure Works DW sample database, and more specifically the attributes, user hierarchies and attribute relationships:

image

image

Notice how we have two user hierarchies for Calendar Years (which start on January 1st) and Fiscal Years (which start on July 1st), called Calendar and Fiscal. Now, let’s say that you have a requirement to to show year-to-date values for a measure for both the Calendar and Fiscal hierarchies. It would be very easy to implement this as two separate calculated measures but what if you needed to show both types of year-to-date in the same calculated measure, showing Fiscal year-to-dates when the Fiscal hierarchy was used in a query and Calendar year-to-dates when the Calendar hierarchy was used in a query?

This is possible using scoped assignments. The first thing to point out, though, is that this is only going to be possible if you change the structure of the dimension. Why? Well, take a look at the Date levels of both hierarchies: they are both built using the Date attribute. If you were running a query with your YTD calculation on columns and only the Date attribute hierarchy on rows, would you expect to see Calendar or Fiscal YTD values? Certainly you couldn’t see both in the same cell, and this is the problem: if you expect to see Calendar YTD values at the bottom of your Calendar hierarchy, and Fiscal YTD values at the bottom of your Fiscal hierarchy, you need two separate Date attributes to do this. If you overwrite the values in the same cells twice using a scoped assignment, then you will only see the result of the second assignment.

Therefore, what we need to do is to create two new attributes, Calendar Date and Fiscal Date, to serve as the lowest levels of the Calendar and Fiscal hierarchies instead of the Date attribute. Here’s what the new attribute relationships look like:

image

From the end-user’s point of view nothing seems to have changed (these new attributes can have their AttributeHierarchyVisible property set to False) but this now means we have two attributes, two different slices of the cube, whose values we can overwrite separately instead of just one.

Now for the MDX. A good rule to follow when writing scoped assignments is to always use attribute hierarchies and never use user hierarchies; this is because there are rules you have to obey about the shape of the subcube of data you are overwriting with your scoped assignment. When defining a scope using only attribute hierarchies, you can only use the following types of set:

  • Every single member on the attribute hierarchy
  • Just one member on the attribute hierarchy
  • Multiple members on the attribute hierarchy not including the All Member

I wrote up a detailed set of rules for defining scopes in MDX Solutions, if you’re interested; if you don’t follow these rules you’ll get the infamous “An arbitrary shape of the sets is not allowed in the current context” error (I know a joke about that, incidentally).

For this calculation, we need to make two scoped assignments on a single calculated measure called [YTD Sales]: one to show the Fiscal YTD calculation for the Fiscal Date, Fiscal Month Name, Fiscal Quarter, Fiscal Semester and Fiscal Year attributes; and one to show the Calendar YTD calculation for the Calendar Date, Calendar Month Name, Calendar Quarter, Calendar Semester and Calendar Year attributes. When scoping on ranges of attributes like this, there’s another easy rule to follow: scope on the set of every member, including the All Member, from the attribute hierarchy at the lowest granularity attribute, and the set of every member, not including the All Member, from the highest granularity attribute. These two sets also need to be in the same, rather than separate, SCOPE statements for reasons I explain here.

This results in the following MDX:

CREATE MEMBER CURRENTCUBE.MEASURES.[YTD Sales] AS NULL;

 

SCOPE(MEASURES.[YTD Sales]);

    --Calendar YTD

    SCOPE([Date].[Calendar Date].MEMBERS, [Date].[Calendar Year].[Calendar Year].MEMBERS);

        THIS = AGGREGATE(

                PERIODSTODATE([Date].[Calendar].[Calendar Year], [Date].[Calendar].CURRENTMEMBER), 

                [Measures].[Internet Sales Amount]);

    END SCOPE;

    --Fiscal YTD

    SCOPE([Date].[Fiscal Date].MEMBERS, [Date].[Fiscal Year].[Fiscal Year].MEMBERS);

        THIS = AGGREGATE(

                PERIODSTODATE([Date].[Fiscal].[Fiscal Year], [Date].[Fiscal].CURRENTMEMBER),

                [Measures].[Internet Sales Amount]);

    END SCOPE;

END SCOPE;

Now you can see the YTD Sales calculated measure returns Calendar YTD values for the Calendar hierarchy:

image

…and it returns Fiscal YTD values for the Fiscal hierarchy:

image

There’s one last trick I want to show. It can be very difficult to know that your scoped assignment is covering the subcube you want it to cover, so while debugging scoped assignments I find it helps to assign values to the BACK_COLOR cell property so I can see exactly where I’m scoping. Here’s the MDX above with extra assignments to set the cell background colour to orange for the Calendar YTD calculation and blue for the Fiscal YTD calculations:

CREATE MEMBER CURRENTCUBE.MEASURES.[YTD Sales] AS NULL;

 

SCOPE(MEASURES.[YTD Sales]);

    --Calendar YTD

    SCOPE([Date].[Calendar Date].MEMBERS, [Date].[Calendar Year].[Calendar Year].MEMBERS);

        THIS = AGGREGATE(

                PERIODSTODATE([Date].[Calendar].[Calendar Year], [Date].[Calendar].CURRENTMEMBER), 

                [Measures].[Internet Sales Amount]);

        BACK_COLOR(THIS) = RGB(255,128,64);

    END SCOPE;

    --Fiscal YTD

    SCOPE([Date].[Fiscal Date].MEMBERS, [Date].[Fiscal Year].[Fiscal Year].MEMBERS);

        THIS = AGGREGATE(

                PERIODSTODATE([Date].[Fiscal].[Fiscal Year], [Date].[Fiscal].CURRENTMEMBER),

                [Measures].[Internet Sales Amount]);

        BACK_COLOR(THIS) = RGB(64,128,255);

    END SCOPE;

END SCOPE;

This now shows up in an Excel PivotTable as you can see below:

image

image

It can also help you understand what’s going on in the scenarios where the scopes overlap, for example if you put the Calendar and Fiscal hierarchies on rows and columns in the same PivotTable: the Fiscal hierarchy takes precedence, because it’s defined second.

image


Chris Webb’s BI Blog

One of the first questions I get asked after showing someone PowerPivot for the first time is “Can I add new data to a PowerPivot table that already has data in it?”. Out of the box, of course, the answer is no: when you process a table in PowerPivot you have to reload all the data from your data source, you can’t just append new data (unless you’re using copy/paste to load data, which isn’t a good idea). However, there are a lot of self-service BI scenarios where the ability to do this would be extremely useful: for example, you might want to scrape stock quotes from a web page every day and then, in an Excel workbook, accumulate that data in a table so you can analyse historical stock prices with PowerPivot. I ran into a scenario very much like this last week and I thought that Data Explorer should be able to help here. It can, but it’s not obvious how to do it – hence this blog post!

Here’s a super-simple example of how to accumulate data in a table then. Let’s start with a csv file that contains the following data:

Product,Sales
Apples,1
Oranges,2

It’s straightforward to import this data into Excel using Data Explorer and the ‘From csv’ data source:

image

 

Here’s the code that Data Explorer generates:

let

    Source = Csv.Document(File.Contents("C:\InputData.csv")),

    FirstRowAsHeader = Table.PromoteHeaders(Source),

    ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,

                              {{"Product", type text}, {"Sales", type number}})

in

    ChangedType

 

Now, let’s imagine that you want to keep the data from this file in Excel and every time you click Refresh in Data Explorer you add the data from the file onto the end of the existing data you’ve already captured. The first thing you’ll probably want to do in this scenario is add a new column to the data that gives the date and time that the data was loaded, and you can do that quite easily in Data Explorer using the DateTimeZone.UtcNow() function as follows:

Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow())

Data Explorer has functionality to append the data from one query onto the end of another query, but the problem you have to solve now is that when you click Refresh you want the new data to be appended onto the end of the data that has already been collected. It’s a recursive scenario not unlike the one I grappled with here. The solution to this problem is to first of all load the data into the PowerPivot (ie what we should be calling the Excel Data Model now) by clicking on the Load To Data Model link in the Data Explorer query pane:

image

Then, on a new sheet, create an Excel query table that returns all the data from the PowerPivot table that you’ve just loaded data into. Kasper shows how to do this here; there’s no need for any special DAX, you just need to connect to the PowerPivot table in the Existing Connections dialog:

image

image

At this point you should have two tables on two sheets that contain the same data. The next step is to modify the original Data Explorer query so that it contains a new step that appends data from the table you’ve just created (ie the table getting the data from PowerPivot) onto the data from the csv file. This can be done with three new steps, first to get the data from the new Excel table:

Excel.CurrentWorkbook(){[Name="ExistingData"]}[Content]

Then to make sure the Load Date is treated as a DateTimeZone type:

Table.TransformColumnTypes(GetExistingData,{{"Load Date", type datetimezone}})

Then finally to combine the two tables:

Table.Combine({ChangedType1,InsertedCustom})

Now, whenever you Refresh your Data Explorer query, you will see the data from the csv file appended to the data that has already been loaded:

image

image

Here’s the complete code:

let

    Source = Csv.Document(File.Contents("C:\InputData.csv")),

    FirstRowAsHeader = Table.PromoteHeaders(Source),

    ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,

                  {{"Product", type text}, {"Sales", type number}}),

    InsertedCustom = Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow()),

    Custom1 = Excel.CurrentWorkbook(){[Name="Table_Input_Data"]}[Content],

    ChangedType1 = Table.TransformColumnTypes(Custom1,{{"Load Date", type datetimezone}}),

    Custom2 = Table.Combine({ChangedType1,InsertedCustom})

in

    Custom2

Now as I said, this is just a super-simple example and in the real world you’d need extra functionality to do things like delete rows you’ve already loaded and so on; but that’s all doable I think. It’s also worth mentioning that I encountered some strange errors and behaviour when implementing this, partly due to Data Explorer still being in preview I guess, so if you want to recreate this query you’ll need to follow my instructions exactly.

You can download the sample workbook here, and the csv file here.


Chris Webb’s BI Blog

Creating And Using Functions in Data Explorer

I’ve been reading through the Data Explorer Formula Language and Library specifications (which can be downloaded here) over the last few weeks, learning about what’s possible with it when you go beyond the UI. This will translate into a lot of Data Explorer blog posts in the coming weeks…! Here’s the first of those posts.

Today I’m going to look at how to create and use functions. Page 86 onwards in the Formula Language specification has a lot of detail about how to define a function in the M language that Data Explorer uses, but it doesn’t include any practical examples of how functions can be created and used in Data Explorer and Excel unfortunately. I’ve done some experimentation, though, and this post aims to fill that gap.

First of all, take a look at this Excel worksheet with two tables in it, called InputNumbers1 and InputNumbers2:

image

Let’s say that you want to create two Data Explorer queries that each use one of these tables as a source, and in both cases you want to add a third column that displays the product of the two existing columns. So, for example, for the first table you want a new column with the values 2, 12, 30 and 56 in.

Now this is a very simple calculation and certainly one that Data Explorer can handle easily, but let’s say that you want to perform this operation many times in many different queries and for obvious reasons you don’t want to have to duplicate the same logic in each query. You want to create a function!

To do this, you need to use the Write Query option from the From Other Sources button on the Data Explorer tab in the ribbon:

image

Then change the name of the new query to MultiplicationFunction by double-clicking on the query name in large letters at the top of the Edit Query dialog:

image

Then enter the following code as the only step:

= (x,y) => x * y

This defines a query with two parameters, x and y, and returns the product of x and y. Click Done to save the query. The query will show the following result in the worksheet:

image

You can ignore this new table, and you can disable the data load onto the worksheet, but it seems like you always need to have the table there and you certainly can’t delete it.

To use this new function create a new query from the table InputNumbers1 (shown above) using the From Table button. Then in the Edit Query dialog create a new step by selecting Insert Column/Custom:

image

And then use the new function to calculate each row in the new column, passing the two existing columns as parameters:

image

This results in the following step:

= Table.AddColumn(Source, "Custom", each MultiplicationFunction([FirstNumber],[SecondNumber]))

image

And so you have your new column showing the product of the [FirstNumber] and [SecondNumber] columns. Then do the same thing for the InputNumbers2 table and you’ll have two Data Explorer queries now with custom columns in:

image

image

Finally, to make sure that the function is indeed being used, go and alter the definition of the function to be

= (x,y) => x / y

Refresh both of the other queries, and you’ll see the contents of the custom column have changed for both of them:

image

image

I can imagine that the ability to break functionality out into functions will reduce the complexity, and improvement the maintainability, of many ETL solutions that get built with Data Explorer in the future.

You can download my sample Excel workbook with this demo in here.


Chris Webb’s BI Blog

 Page 1 of 5  1  2  3  4  5 »