Wednesday, August 01, 2012

Oracle ORAOLEDB Data Type Hokeyness

My current project pits me against an Oracle Data Source within an Analysis Services multidimensional cube.  We've had a few battles so far, and I think I'm getting closer to winning.

The first quirk is just connecting.   Using the ORAOLEDB.1 provider, we specify / as the user name with a blank password, and ensure connection is trusted.  Under Advanced - Extended Properties, we specify a FetchSize=10000;defaultRowPrefech=10000; (perhaps larger).  This is supposed to improve performance of processing the cube.

The second quirk is creating your DSV.  When using the Oracle connection, it lists all tables in all schemas.    The Oracle environment here is defined by separate schemas instead of separate databases.  Microsoft likes the database segmentation approach.  In a development environment, we have access to everything.   It's not _that_ slow with 18,000 tables and 3x the amount of views, but preselecting a schema would be a nice thing to have.

This laundry list of schemas and tables has a larger impact on the audience using Power Pivot, which regulates Oracle to an "Other Data Source" while my client uses it as their primary data source.  Searching through 18,000 tables, and also placing the views at the bottom in unsorted order is not the most friendly interface for building models.  Exposing the data using OData links is probably the preferred approach anyway... but there has to be a better way to deal with using Oracle as a data source for Power Pivot.

One major quirk/show stopper using Analysis Services against Oracle is with data types.  Sometimes the data types don't get reflected properly.  A Numeric() data type in Oracle is a dynamic type, which SQL doesn't like.  Casting this to NUMERIC(9,0) should theoretically provide you with Int, or Int32 in Analysis Services world.

It doesn't always.  In an Analysis Services data source view, sometimes it doesn't pick up your changes.  Casting a number to NUMERIC(9,0) in the Oracle View and refreshing doesn't appear to do anything.  My data type is still Int64 or Decimal or whatever the Oracle decides is in my future.

The workaround is to use named queries everywhere.  Any changes to the underlying data model requires changing something "case or space or anything" in the named query.  Refresh View doesn't seem to work.

Analysis Services appears to then pick up data types properly.
NUMERIC(5,0) becomes Byte (SmallInt)
NUMERIC(6,0 - 9,0) Becomes Int32 (Int)
NUMERIC(10,0 - xxx,0) becomes Int64 (Bigint)

Anything with decimal places becomes a decimal.  Cast your decimals and round them to ensure precision doesn't blow up the cube.

Seems like an acceptable workaround.  The other way is to manually change the DSV xml.  I will be avoiding that one at all costs.

When you build out your multidimensional cube, all surrogate keys and numeric identity keys should be of data type Int32.  You can use smaller sizes for smaller dimension tables, but usually the tradeoff for size/performance vs. consistency is negligible.  The exception to the Int rule is when you're expecting lots of rows in your dimensions or facts... 2 billion rows?  Int64 or BigInt is what you want.

Would like to hear from others building cubes and PowerPivot models against Oracle data sources.   What tricks or quirks do you see when dealing with non-Microsoft sources?

No comments: