Thursday, August 30, 2012

Understanding Analysis Services Deployment Modes - Analysis Services and PowerPivot Team Blog - Site Home - MSDN Blogs

Analysis Services has three install modes.  Multidimensional is for "legacy" Analysis Services cubes.  Tabular is for a standalone tabular server hosting persisted tabular models, or temporary models created by Visual Studio.  SharePoint mode is for integration with PowerPivot functionality in SharePoint.

There are some key features that may force you to install both the Tabular and SharePoint (PowerPivot) modes to meet different functional requirements.

Understanding Analysis Services Deployment Modes - Analysis Services and PowerPivot Team Blog - Site Home - MSDN Blogs: "Obviously, this way of connecting to a PowerPivot database is very different from connecting to a Tabular database, but the differences between Tabular and SharePoint modes go even deeper than that because the databases themselves are very different in nature. Tabular databases are permanent databases. PowerPivot databases, on the other hand, are temporary. They can be deleted from the SSAS server and reloaded from their workbooks at any time."


Wednesday, August 08, 2012

Spider Schema

The Spider Schema is an interesting approach to the distinct count performance issue in Analysis Services, and to modelling data in general.  It aims to fix some of the flaws of star schema design by adding an intermediate table between the dimension tables and the fact table, which houses the unique key combinations for the dimensions in the fact.

It sounds a bit like the Data Vault architecture promoted by Dan Linstedt, though targeted at reporting on the data rather than vaulting it.

More details here.


Tuesday, August 07, 2012

Excel 2013's Data Model Feature

Excel is turning into a scaled-down version of Microsoft Access.
The new Data Model feature lets you build out tables and relationships in your spreadsheet.  No more VLOOKUPs!

http://office.microsoft.com/en-us/excel-help/create-a-data-model-in-excel-HA102923361.aspx

I have highlighted a new option in the create PivotTable dialog which is to “Add this data to the Data Model”. So what is this Data Model I speak of?


“A Data Model is a new approach for integrating data from multiple tables, effectively building a relational data source inside the Excel workbook. Within Excel, Data Models are used transparently, providing data used in PivotTables, PivotCharts, and Power View reports“. Read more here…

http://blogs.office.com/b/microsoft-excel/archive/2012/08/23/introduction-to-the-data-model-and-relationships.aspx
Data Models utilize the features of PowerPivot, now embedded in Excel.  They are managed by the PowerPivot add-in, and are more structured than ordinary Excel tables.

It would be nice to see if there is some way of 'upsizing' or linking these data tables to SQL Server, as we had with Access.  Being able to give users the ability to store large datasets on the server without admin intervention could change the way they work with Excel.




Wednesday, August 01, 2012

Oracle ORAOLEDB Data Type Hokeyness

My current project pits me against an Oracle Data Source within an Analysis Services multidimensional cube.  We've had a few battles so far, and I think I'm getting closer to winning.

The first quirk is just connecting.   Using the OraOLEDB.Oracle.1 provider, we specify / as the user name with a blank password, and ensure the connection is trusted (OS-authenticated).  Under Advanced - Extended Properties, we specify FetchSize=10000;DefaultRowPrefetch=10000; (perhaps larger).  This is supposed to improve the performance of processing the cube.
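As a sketch, a full connection string with these settings might look like the following; the data source alias (MYORCL) is illustrative, not from my environment:

```
Provider=OraOLEDB.Oracle.1;Data Source=MYORCL;User ID=/;Password=;
Extended Properties="FetchSize=10000;DefaultRowPrefetch=10000"
```

The / user name with a blank password tells Oracle to use operating-system authentication, so no credentials are stored in the connection string.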

The second quirk is creating your DSV.  When using the Oracle connection, it lists all tables in all schemas.    The Oracle environment here is organized into separate schemas instead of separate databases, while Microsoft favours the database segmentation approach.  In a development environment, we have access to everything.   It's not _that_ slow with 18,000 tables and three times as many views, but being able to preselect a schema would be a nice thing to have.

This laundry list of schemas and tables has a larger impact on the audience using Power Pivot, which relegates Oracle to an "Other Data Source" while my client uses it as their primary data source.  Searching through 18,000 tables, with the views placed at the bottom in unsorted order, is not the friendliest interface for building models.  Exposing the data using OData links is probably the preferred approach anyway... but there has to be a better way to deal with using Oracle as a data source for Power Pivot.
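In the meantime, a quick way to see what a single schema actually exposes, before hunting through the wizard's full list, is to query the Oracle data dictionary directly.  A sketch (the schema name SALES_DW is hypothetical):

```sql
-- List one schema's tables and views instead of scrolling all 18,000 objects.
-- 'SALES_DW' is a made-up schema name; substitute your own.
SELECT owner, object_name, object_type
FROM   all_objects
WHERE  owner = 'SALES_DW'
  AND  object_type IN ('TABLE', 'VIEW')
ORDER  BY object_type, object_name;
```

This at least gives you a sorted, filtered list to work from when picking tables for the DSV or a Power Pivot model.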

One major quirk/show stopper using Analysis Services against Oracle is with data types.  Sometimes the data types don't get reflected properly.  An Oracle NUMBER declared without precision is effectively a dynamic type, which SQL Server doesn't like.  Casting it to NUMERIC(9,0) should theoretically give you Int, or Int32 in the Analysis Services world.

It doesn't always.  In an Analysis Services data source view, sometimes it doesn't pick up your changes.  Casting a number to NUMERIC(9,0) in the Oracle view and refreshing doesn't appear to do anything.  My data type is still Int64 or Decimal or whatever Oracle decides is in my future.
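For reference, a cast-everything view of this sort might look like the sketch below; the table and column names are hypothetical:

```sql
-- Hypothetical Oracle view that casts every NUMBER column explicitly so the
-- DSV can derive stable .NET types: NUMERIC(9,0) for Int32 keys, a fixed
-- precision/scale for decimal measures.
CREATE OR REPLACE VIEW sales_dw.v_fact_sales AS
SELECT CAST(sales_key    AS NUMERIC(9,0)) AS sales_key,     -- Int32
       CAST(customer_key AS NUMERIC(9,0)) AS customer_key,  -- Int32
       CAST(ROUND(sales_amount, 2) AS NUMERIC(18,2)) AS sales_amount  -- Decimal
FROM   sales_dw.fact_sales;
```

Rounding inside the cast keeps stray precision from leaking through even if the underlying column is an unconstrained NUMBER.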

The workaround is to use named queries everywhere.  Any change to the underlying data model requires changing something in the named query text (case, a space, anything) to force it to re-read the types; Refresh View by itself doesn't seem to work.
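The named query wrapping the view can stay trivial; the point is that editing any character of its text makes the DSV re-derive the column types.  A sketch, with the same hypothetical names as above:

```sql
-- Named query body in the DSV.  After a schema change, edit anything textual
-- here (even whitespace) and save to force the data types to be re-derived.
SELECT f.sales_key,
       f.customer_key,
       f.sales_amount
FROM   sales_dw.v_fact_sales f
```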

Analysis Services appears to then pick up data types properly.
NUMERIC(5,0) becomes Byte (SmallInt)
NUMERIC(6,0) through NUMERIC(9,0) becomes Int32 (Int)
NUMERIC(10,0) and larger becomes Int64 (BigInt)

Anything with decimal places becomes a decimal.  Cast your decimals and round them to ensure precision doesn't blow up the cube.

Seems like an acceptable workaround.  The other way is to manually change the DSV xml.  I will be avoiding that one at all costs.

When you build out your multidimensional cube, all surrogate keys and numeric identity keys should be of data type Int32.  You can use smaller sizes for smaller dimension tables, but usually the size/performance gain vs. consistency tradeoff is negligible.  The exception to the Int rule is when you're expecting lots of rows in your dimensions or facts... more than 2 billion rows?  Int64 (BigInt) is what you want.

I would like to hear from others building cubes and PowerPivot models against Oracle data sources.   What tricks or quirks do you see when dealing with non-Microsoft sources?