Thursday, November 29, 2012

Performance tips for SSAS | James Serra's Blog

Performance tips for SSAS | James Serra's Blog: "Over the past few years I have developed a list of ways to improve performance in SSAS and in two of the tools that use it: ProClarity and PerformancePoint.  Below are those performance tips along with other important items to be aware of:"

The tip that really helped us was moving parameters from the WHERE clause into a subquery, which can improve performance drastically.
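For an MDX query that roughly translates to filtering through a subselect instead of the WHERE clause. A before/after sketch against the Adventure Works sample cube (my own illustration, not taken from the linked post):

-- Parameter member in the WHERE clause...
SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
       [Product].[Category].MEMBERS ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar Year].&[2008] );

-- ...versus restricting the cube space up front with a subselect.
SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
       [Product].[Category].MEMBERS ON ROWS
FROM ( SELECT [Date].[Calendar Year].&[2008] ON COLUMNS
       FROM [Adventure Works] );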


Ranting about Cumulative Updates

Don't get me wrong.  Fix it fast and fix it often is a good motto to have when addressing software bugs.  However, trying to decode what was changed between versions isn't so easy, especially with products like Microsoft SharePoint.

Sharepoint has had umpteen patches since Service Pack 1.  Each one is outlined in separate KB articles by install file.  This works out to about 100 tabs open in the browser to figure out what is in each cumulative update.

Is there not a database that has this stuff?

On another note, SQL 2012 SP1 has been released, but it doesn't include the fixes contained in CU3 & CU4, so download both SQL 2012 SP1 and SQL 2012 SP1 CU1.

Monday, November 12, 2012

Big Data Vendor Landscape

A very comprehensive post on the current (well, June anyway) picture of Big Data Vendors along with plenty of contact information.

http://www.rosebt.com/1/post/2012/06/big-data-vendor-landscape.html

Vertica is really eating everyone else's pie.  They partner with many of the traditional front-end BI app developers, including Cognos, Microstrategy and Tableau.

http://www.vertica.com/partners/business-intelligence/


Monday, October 29, 2012

LINKMEMBER MDX Syntax

Some helpful tricks with LINKMEMBER.  LinkMember is used when the cube has multiple hierarchies that can be mapped to one another, e.g. Order Date and Ship Date, even though no relationship exists at the fact level.  Or, in my case, Security Code and Security Underlier Code.

The key issue I had was that the IgnoreUnrelatedDimensions property was set to false in the cube.  This means that you need to explicitly specify the All member in the LinkMember function, and exclude the All member from the scope of the calculation statement.
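A rough sketch of the basic LinkMember pattern, using the Adventure Works role-playing date dimensions rather than my actual Security Code hierarchies:

WITH MEMBER [Measures].[Sales by Linked Date] AS
    ( LinkMember( [Ship Date].[Calendar].CurrentMember, [Date].[Calendar] ),
      [Measures].[Internet Sales Amount] )
SELECT [Measures].[Sales by Linked Date] ON COLUMNS,
       [Ship Date].[Calendar].[Calendar Year].MEMBERS ON ROWS
FROM [Adventure Works];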

Details here.

http://rklh.blogspot.ca/2012/01/scope-statement-on-calculated-measure.html

Thursday, October 04, 2012

Interesting MDX Feature

Apparently there's a keyword test in MDX that returns an empty value, even if the cube doesn't contain a measure called test...

select test on 0
from Adventureworks

Easter egg?

Wednesday, October 03, 2012

Humor in code

So I've been getting this error fairly regularly and thought that Mario's arch enemy had done something to my PC.  Lately it has been shutting off without warning.

The Source was "Bowser"



The browser driver has received too many illegal datagrams from the remote computer ---- to name WORKGROUP on transport NetBT_Tcpip_{-----}. The data is the datagram. No more events will be generated until the reset frequency has expired.
http://blogs.msdn.com/b/larryosterman/archive/2011/05/02/reason-number-9-999-999-why-you-don-t-ever-use-humorous-elements-in-a-shipping-product.aspx

Apparently this means my computer is beyond repair, or at least the network card is sending out line noise. 

Another one that happens every minute since I installed Windows 8:

A corrected hardware error has occurred.
Reported by component: Processor Core
Error Source: Corrected Machine Check
Error Type: No Error
Processor ID: 1
The details view of this entry contains further information.

If it's corrected, how come it keeps showing up every minute?
If it's not an error, why is it called a corrected hardware error?

I always thought the event viewer should have ads for software that fixes event viewer errors or offers up places to buy hardware. That could be a great idea for extra MS revenue...

Reading Larry's blog, I finally figured out how I fixed that insurance office network 17 years ago.
http://blogs.msdn.com/b/larryosterman/archive/2005/01/11/350800.aspx

Flipping the network cards from Auto Detect to Full Duplex solved the issue, since 1 card was blowing up the entire network.

 

Tuesday, September 25, 2012

Grouping in SQL RS

One issue we recently had was that grouping wasn't filtering properly on the report.  If you are having issues grouping and filtering items in Reporting Services, make sure the detail row is not deleted.  If you have to, set the Row Visibility to hidden, but don't delete the detail.

This solved our filtering issue.

Wednesday, September 12, 2012

Database documentation tool - Elsasoft SqlSpec

Documentation is probably something that many people dread.  This tool automates the process of generating documentation across many platforms, including Analysis Services, Teradata and Oracle.

Database documentation tool - Elsasoft SqlSpec:


Thursday, August 30, 2012

Understanding Analysis Services Deployment Modes - Analysis Services and PowerPivot Team Blog - Site Home - MSDN Blogs

Analysis Services has 3 instance install modes.  Multidimensional is for the "legacy" Analysis Services cubes.  Tabular is for a standalone tabular server hosting persisted tabular models, or temporary models created by Visual Studio.  Sharepoint mode is for integration with PowerPivot functionality in Sharepoint.

There are some key feature differences that may force you to install both a Tabular and a SharePoint (PowerPivot) mode instance to cover different functional requirements.

Understanding Analysis Services Deployment Modes - Analysis Services and PowerPivot Team Blog - Site Home - MSDN Blogs: "Obviously, this way of connecting to a PowerPivot database is very different from connecting to a Tabular database, but the differences between Tabular and SharePoint modes go even deeper than that because the databases themselves are very different in nature. Tabular databases are permanent databases. PowerPivot databases, on the other hand, are temporary. They can be deleted from the SSAS server and reloaded from their workbooks at any time."


Wednesday, August 08, 2012

Spider Schema

The Spider Schema is an interesting approach to the distinct count performance issue with Analysis Services, and to modelling data in general.  It is an attempt to fix some of the flaws of star schema design: it adds an intermediate table between the dimension tables and the fact table, which houses the unique key combinations for the dimensions in the fact.
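As a purely hypothetical illustration of the idea (the table and column names below are invented, not taken from the Spider Schema material), the intermediate table might look something like this:

-- Each row holds one distinct combination of dimension keys seen in the fact data.
CREATE TABLE dbo.SalesKeyGroup (
    KeyGroupID  int IDENTITY(1,1) PRIMARY KEY,
    DateKey     int NOT NULL,
    ProductKey  int NOT NULL,
    CustomerKey int NOT NULL
);

-- The fact table then references the key group instead of each dimension directly.
CREATE TABLE dbo.FactSales (
    KeyGroupID  int   NOT NULL REFERENCES dbo.SalesKeyGroup (KeyGroupID),
    SalesAmount money NOT NULL
);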

It sounds a bit like the Data Vault architecture promoted by Dan Linstedt, though targeted at reporting on the data rather than vaulting it.

More details here.

About Me | Spider Schema:

Tuesday, August 07, 2012

Excel 2013's Data Model Feature

Excel is turning into a scaled-down version of Microsoft Access.
The new Data Model feature lets you build out tables and relationships in your spreadsheet.  No more vlookups!

http://office.microsoft.com/en-us/excel-help/create-a-data-model-in-excel-HA102923361.aspx

I have highlighted a new option in the create PivotTable dialog which is to “Add this data to the Data Model”. So what is this Data Model I speak of?


“A Data Model is a new approach for integrating data from multiple tables, effectively building a relational data source inside the Excel workbook. Within Excel, Data Models are used transparently, providing data used in PivotTables, PivotCharts, and Power View reports“. Read more here…

http://blogs.office.com/b/microsoft-excel/archive/2012/08/23/introduction-to-the-data-model-and-relationships.aspx
Data models utilize the features of PowerPivot, now embedded into Excel.  They are managed by the PowerPivot add-in, and are more structured than Excel tables.

It would be nice to see if there is some way of 'upsizing' or linking these data tables to SQL Server, as we had with Access.  Being able to give users the ability to store large datasets on the server without admin intervention could change the way they work with Excel.




Wednesday, August 01, 2012

Oracle ORAOLEDB Data Type Hokeyness

My current project pits me against an Oracle Data Source within an Analysis Services multidimensional cube.  We've had a few battles so far, and I think I'm getting closer to winning.

The first quirk is just connecting.  Using the ORAOLEDB.1 provider, we specify / as the user name with a blank password, and ensure the connection is trusted.  Under Advanced - Extended Properties, we specify FetchSize=10000;defaultRowPrefetch=10000; (perhaps larger).  This is supposed to improve performance when processing the cube.

The second quirk is creating your DSV.  When using the Oracle connection, it lists all tables in all schemas.  The Oracle environment here is segmented by separate schemas instead of separate databases, whereas Microsoft likes the database segmentation approach.  In a development environment, we have access to everything.  It's not _that_ slow with 18,000 tables and three times that many views, but being able to preselect a schema would be a nice thing to have.

This laundry list of schemas and tables has a larger impact on the audience using Power Pivot, which relegates Oracle to an "Other Data Source" while my client uses it as their primary data source.  Searching through 18,000 tables, with the views dumped at the bottom in unsorted order, is not the friendliest interface for building models.  Exposing the data using OData feeds is probably the preferred approach anyway... but there has to be a better way to deal with using Oracle as a data source for Power Pivot.

One major quirk/show stopper using Analysis Services against Oracle is data types.  Sometimes the data types don't get reflected properly.  An unconstrained NUMBER (NUMERIC) in Oracle has no fixed precision or scale, which SQL doesn't like.  Casting it to NUMERIC(9,0) should theoretically give you Int, or Int32 in Analysis Services terms.

It doesn't always.  In an Analysis Services data source view, sometimes your changes don't get picked up.  Casting a number to NUMERIC(9,0) in the Oracle view and refreshing doesn't appear to do anything; the data type is still Int64 or Decimal or whatever Oracle decides is in my future.

The workaround is to use named queries everywhere.  Any change to the underlying data model requires changing something (case, a space, anything) in the named query text, because Refresh View doesn't seem to work.
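For example, a named query along these lines (the schema, table and column names here are made up) pins the types down explicitly:

SELECT CAST(ORDER_ID AS NUMERIC(9,0))    AS ORDER_ID,
       CAST(NET_AMOUNT AS NUMERIC(18,2)) AS NET_AMOUNT,
       SECURITY_CODE
FROM   SOMESCHEMA.FACT_ORDERS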

Analysis Services appears to then pick up data types properly.
NUMERIC(5,0) becomes Byte (SmallInt)
NUMERIC(6,0 - 9,0) Becomes Int32 (Int)
NUMERIC(10,0 - xxx,0) becomes Int64 (Bigint)

Anything with decimal places becomes a decimal.  Cast your decimals and round them to ensure precision doesn't blow up the cube.

Seems like an acceptable workaround.  The other way is to manually change the DSV xml.  I will be avoiding that one at all costs.

When you build out your multidimensional cube, all surrogate keys and numeric identity keys should be of data type Int32.  You can use smaller sizes for smaller dimension tables, but usually the tradeoff for size/performance vs. consistency is negligible.  The exception to the Int rule is when you're expecting lots of rows in your dimensions or facts... 2 billion rows?  Int64 or BigInt is what you want.

Would like to hear from others building cubes and PowerPivot models against Oracle data sources.   What tricks or quirks do you see when dealing with non-Microsoft sources?


Wednesday, July 25, 2012

The beginning of the end of NoSQL — Too much information

Earlier this year, Gartner killed the term Business Intelligence and its associated Business Intelligence Competency Center (BICC) acronym, and "introduced" the term Business Analytics and the concept of Business Analytic Teams (BATs).

Some NoSQL-categorized vendors now prefer to be called anything but NoSQL, since this connotation lumps together various technologies into a single buzzword acronym.

CouchDB is often categorized as a “NoSQL” database, a term that became increasingly popular in late 2009, and early 2010. While this term is a rather generic characterization of a database, or data store, it does clearly define a break from traditional SQL-based databases. A CouchDB database lacks a schema, or rigid pre-defined data structures such as tables. Data stored in CouchDB is a JSON document(s). The structure of the data, or document(s), can change dynamically to accommodate evolving needs.

So if the term NoSQL is dead, what is the replacement?  NewSQL?  The lack of creativity is amazing.

A database is only a database if it is an organized collection of data.  Are NoSQL databases really organized or are they freeform?  If it's not a database, what is this unstructured set of information called?

Another term that could be headed for the Furby pile is "Big Data" which is apparently a trademark of a corporation.  Massive Data sounds better to me anyway...



The beginning of the end of NoSQL — Too much information

Speaking of Furbys...
http://www.engadget.com/2012/07/06/furby-hands-on-video/

Query Languages and technology Mashups

SQL (Structured Query Language, "See-Kwell") has been around for a while.  IBM engineers developed SEQUEL (Structured English Query Language) in the early 70s to support building and querying relational models.  It was adopted as an ANSI standard in the mid-80s.  I've been using it since the days of Informix and Sybase, prior to SQL Server 6.5.  Various flavours and implementations have been created since then, the key ones being SQL Server's T-SQL, Oracle's PL/SQL, IBM's SQL PL, PostgreSQL's PL/pgSQL, and the various offshoots of those languages.

MDX (Multidimensional Expressions, or Mosha's Data Expressions) is used to query Analysis Services multidimensional and tabular models.  Instead of a single axis of rows being returned, a query can return up to 128 axes of data.  Usually it's just rows, columns, and sometimes pages, since thinking in 128 dimensions hurts most people's brains.
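A minimal illustration against the Adventure Works sample cube: axis 0 is COLUMNS and axis 1 is ROWS, and a third axis could be requested with ON PAGES, although most client tools (SSMS included) will only render two.

SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
       [Date].[Calendar Year].[Calendar Year].MEMBERS ON ROWS
FROM [Adventure Works];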

DMX (Data Mining Extensions) is a SQL-like query language used for creating and pulling data from data mining models.  The original team lead for Data Mining at MS is now CTO at a company called Predixions, implementing mining models using PowerPivot.

DAX (Data Analysis Expressions) supersedes MDX for Analysis Services / PowerPivot tabular models.  MDX is still used in the background when Excel queries PowerPivot models, but DAX is the way you write expressions and calculations.

NoSQL (Not Only SQL) isn't really a language, it is a technology categorization.  NoSQL databases like those used by Facebook and Google aren't necessarily relational in nature.  Usually they are key-value stores: think thin columns and deep rows.  Microsoft has Windows Azure Blob Storage as its NoSQL offering, with others on the way.  HQL (Hive Query Language) is one way of querying a NoSQL store running on Hadoop, not to be confused with Hibernate Query Language, used to query the NHibernate ORM framework.  Most NoSQL databases fail the ACID test, and aren't necessarily good for transactional systems.  They're great when you need to distribute and analyze massive amounts of data.

NewSQL is either a Sourceforge LDBC driver implementation created around 2003 that offers a simplified SQL syntax, or yet another buzzword coined last year to describe something that isn't SQL and isn't NoSQL.  It's easier to use, more scalable, and performs better.  It provides ACID support and all the features of new and old database technologies.  Or so the vendors of this technology space suggest...

Some products in the NewSQL space include Akiban Server and (perhaps) Microsoft SQL Azure

More info here:
http://www.readwriteweb.com/cloud/2011/04/the-newsql-movement.php

Augmenting legacy SQL databases and technologies with their new "Big Data" columnstore or key-value counterparts seems to be the Next Big Thing.  Eventually one of the Big Dogs will fill this space, probably Microsoft, Oracle, IBM or SAP.  For now the niche players will nibble at each other's dogfood, until somebody decides to buy out the dogfood supplier.

What we really need is a database to track and compare the companies, names, technologies, features and query languages used for each of these products.  Maybe this database could just be a query language against Wikipedia.  As more and more players enter the database market, research companies and consultants who understand the "Big Picture" become more important, as do content aggregators and search tools like Wikipedia, Linked-In and Google.  

Wednesday, July 18, 2012

Using Sequence Identity Column in SQL 2012

So the conclusion is that identity is 10 times slower than sequence with default cache size. You will not gain significant performance by defining very big cache size.


It sounds like sequence is the way to go when performance and concurrency are issues.  Using a sequence means either rewriting your application inserts to use the new syntax, e.g. INSERT INTO x (id) VALUES (NEXT VALUE FOR seqname), or binding NEXT VALUE FOR as the column default.  An insert trigger could also replicate the functionality of the identity column.
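A minimal sketch with made-up object names; the default constraint avoids touching every INSERT statement:

CREATE SEQUENCE dbo.seq_OrderID AS int
    START WITH 1 INCREMENT BY 1 CACHE 50;

CREATE TABLE dbo.Orders (
    OrderID   int NOT NULL
        CONSTRAINT DF_Orders_OrderID DEFAULT (NEXT VALUE FOR dbo.seq_OrderID),
    OrderDesc varchar(100) NOT NULL
);

-- Explicit use of the sequence...
INSERT INTO dbo.Orders (OrderID, OrderDesc)
VALUES (NEXT VALUE FOR dbo.seq_OrderID, 'explicit value');

-- ...or let the default supply it.
INSERT INTO dbo.Orders (OrderDesc)
VALUES ('value from default');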

SQL Server 2012 Sequence Internal » John Huang's Blog
 
Further details on the internals of identity vs. sequence are here.
http://dba.stackexchange.com/questions/1635/why-are-denali-sequences-supposed-to-perform-better-than-identity-columns
 
The primary reasons I wouldn't use sequence - in some cases anyway.
 
1. No DISTINCT keyword allowed in queries that use NEXT VALUE FOR.
2. Can't use in views, user-defined functions or computed columns.
3. Adds complexity vs. an identity column.
4. Replicating identity-style behaviour means adding a default constraint (NEXT VALUE FOR) or an insert trigger to every table.
5. Columns based on a sequence can be inserted or updated freely, with nothing like SET IDENTITY_INSERT guarding them.

Monday, July 16, 2012

Tuning Analysis Services

Tuning Analysis Services for better aggregation - parallel partitioning.

Set AggregationMemoryLimitMax and AggregationMemoryLimitMin for performance with multiple-cpu environments.

http://geekswithblogs.net/ManicArchitect/archive/2010/11/02/142558.aspx

Wednesday, July 11, 2012

Data Quality in an Enterprise Warehouse

In my experience, enterprise deployments, especially of O/Ses and database systems, usually run at least 1-2 versions behind the current product releases.  Enterprise IT doesn't like adopting the latest and greatest technologies until they have been proven, and rightfully so.  Some of the latest releases could stop your business in a second with a critical support issue.

Not adopting the latest technologies can also stop your business, though less like a train wreck and more like   an old mall department store.  If maintenance isn't kept up, prices don't appear competitive and new products don't keep filling the shelves, the store is probably doomed to stagnate to a slow death.  If software patches and releases aren't kept up, support lifecycles expire and similar platforms become harder to integrate.  Upgrading through 3 versions of an O/S while maintaining compatibility with your in-house proprietary apps becomes not just painful, but nearly impossible.

It's usually the same with data.  Fixing problems immediately without doing root cause analysis can be just a band-aid solution.  Not fixing problems at all could be even worse, and fixing historical data could cost more than it's worth.

Historians say that it is harder to predict the past than it is to predict the future.  The internet is making predicting the past a bit easier, at least the past beyond the mid 90s.

Here's an article from 2005 that's still relevant, regarding data cleanliness in the warehouse.

Despite the great efforts from Inmon, Kimball, and the Others, the world of Data Warehousing is still facing great challenges. Even in 2005, after 14 years of Inmon explaining the concept, more than 50% of today’s data warehouse projects are anticipated to fail [Robert Jaques]. In fact, Ted Friedman, a Principal Analyst in Gartner wrote in 2005, “Many enterprises fail to recognize that they have an issue with data quality. They focus only on identifying, extracting, and loading data to the warehouse, but do not take the time to assess the quality.”


Today’s data warehouses suffer from poor quality of the data. Whether or not the poor quality of data existed a decade ago is a questionable hypothesis. In the 90s, the new breed of software products and the ease of implementing data-moving techniques have opened several avenues that resulted in data duplication. As a result, any data inconsistencies in source systems have been remedied by scrubbing and cleansing them on “local copies” of the data sets rather than taking efforts to correct them at the sources.
If Inmon or Kimball had foreseen the wave of software product proliferation in the 90s that implemented duplicated functionality in an organization, they might have stressed on architecting for better quality.

The premise behind many data warehousing strategies is to bring "all the things" into a central area for reporting, and perhaps analysis.  More so just reporting, or "giving the users the numbers" instead of the Semantic Meaning behind them.  Just like a real warehouse, the historical data starts to collect dust.  Without a proper inventory control system, it becomes hard to find things.  The older data elements appear a bit yellowed and sun-bleached, and perhaps have warped or had some design flaws when they were originally input that were only caught in later versions.  The newest items may look shiny due to improved validation processes during input time, but could have critical defects due to ETL "Chinese Whispers" gone haywire.

The way companies deal with interpreting and fixing bad data is probably the number one reason why most BI projects fail.

 As in a warehouse, it can be hard or just doesn't make ROI sense to go back and fix those old elements.  It is much easier to write them off as being known bad data.  This can cause grief and confusion for those trying to perform historical comparisons and trend analysis on the data.

In a dimensional model, a measure is also called a fact.  What is a fact if it is not quite true?  It doesn't become an opinion, it becomes an error.  Loading errors into a fact table is not a good idea. So we have a conundrum.



The missing component here is being able to publish scores that tell us the data is flawed, and commentary describing why we are not going to fix it, or how data quality is being addressed, or at least how to interpret the data, and ways to ignore data based on a threshold of quality.

As we move toward the "big data" experience, identifying trust-levels within data sources becomes more important.  Facts become opinions instead of errors.  Opinion tables sound much nicer to me than Error tables, and for dealing with data that doesn't have to be down to the penny, like internet sentiment analysis, perhaps opinions may work.

I propose an approach to augmenting current star/snowflake dimension models: opinionate your data models.  A rough sketch of such an opinion table follows the list below.

1. Add an opinion table to your fact table.  Ensure it uses the same grain as the fact table.
2. Include status columns for each column in your fact that you wish to track errors.
3. Include an aggregated status column, and a checksum that can be matched against the source record for auditing purposes.
4. Include a type 2 with history slowly-changing dimension for managing when each fact table load occurred and what the most current load is.
5. Track all errors, questionable data, or data out of the statistical norm using this table.
6. Ensure you have dimensions for ratings, tags and link to a wiki for people to maintain commentary.
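A hypothetical sketch of what such a table might look like (names invented to match the list above, not a reference design):

CREATE TABLE dbo.FactSalesOpinion (
    DateKey           int NOT NULL,            -- same grain as the fact table
    ProductKey        int NOT NULL,
    CustomerKey       int NOT NULL,
    SalesAmountStatus tinyint NOT NULL,         -- per-column quality status
    QuantityStatus    tinyint NOT NULL,
    OverallStatus     tinyint NOT NULL,         -- aggregated status
    SourceChecksum    varbinary(20) NULL,       -- matched against the source record for auditing
    LoadBatchKey      int NOT NULL,             -- points at a type 2 load-history dimension
    QualityRatingKey  int NULL,                 -- ratings / tags dimensions
    CommentaryWikiUrl nvarchar(400) NULL        -- link to wiki commentary
);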

This table load could be automated using standard ETL tools, or manually augmented using data input forms.  If you're using Excel 2007/2010 and Analysis Services cubes, you could use the writeback feature to assign scores to existing data and provide 1/0 values for true/false flagging.  Some tools, like Oracle and SQL, automatically create statistics and histograms on tables which you could leverage to find out what data are outliers.  Data Quality Services tools can be trained to understand your data and identify suggestions or even fix your data.



Gamify the process by giving out weekly prizes for those business users that find the most issues with the data, and those technical users that propose the best solutions for fixing the issues.

Sometimes automating data processes can uncover unsettling facts, especially with people & processes that resist automation.  Improving data quality gives more power to the users of that data, and less power to the controllers of the data.  This isn't always a good thing.... depending on your opinion.


Wednesday, June 20, 2012

SSIS SCD Wizard Performance Issue « Data Warehousing and Business Intelligence

In Microsoft BI, when we do a slowly changing dimension type 2, we instinctively do SCD Wizard. The problem with SCD Wizard is the performance. For each source row coming to the SCD box, SSIS does a lookup on the target dimension table to find out if the row already exists or not. That is a crazy idea. And SSIS also update the target dimension table on row by row basis. That is also a crazy idea.



If your dimension is 1000 rows and the source data is 1000 rows, the SCD takes a few minutes. But if your dimension is 500k and the source data is 500k, it’s 5-6 hours performance. The whole idea is fundamentally flawed. On the welcome screen of the SSIS box, there should be a warning: Warning, do not use SSIS SCD Wizard if your dimension is above 1000 rows.


So, if your dimension is 1m rows, what should you do?
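One common alternative is to do the type 2 logic set-based instead of row by row. A rough sketch with invented table and column names (NULL handling and auditing omitted for brevity):

-- 1. Expire the current rows whose tracked attributes have changed.
UPDATE d
SET    d.EffectiveTo = GETDATE(), d.IsCurrent = 0
FROM   dbo.DimCustomer AS d
JOIN   staging.Customer AS s ON s.CustomerCode = d.CustomerCode
WHERE  d.IsCurrent = 1
  AND  (s.City <> d.City OR s.Segment <> d.Segment);

-- 2. Insert a new current row for every changed or brand-new customer.
INSERT INTO dbo.DimCustomer (CustomerCode, City, Segment, EffectiveFrom, EffectiveTo, IsCurrent)
SELECT s.CustomerCode, s.City, s.Segment, GETDATE(), NULL, 1
FROM   staging.Customer AS s
LEFT JOIN dbo.DimCustomer AS d
       ON d.CustomerCode = s.CustomerCode AND d.IsCurrent = 1
WHERE  d.CustomerCode IS NULL;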

SSIS SCD Wizard Performance Issue « Data Warehousing and Business Intelligence

Tuesday, June 19, 2012

Filtering by logged in user

How to get the current user name in MDX.

WITH MEMBER Measures.UserID AS
    'UserName()'
SELECT
    [Measures].[UserID] ON COLUMNS
FROM [YourCube]   // placeholder - the original snippet left the cube name blank
 
Filtering by logged in user


Wednesday, June 06, 2012

Add Report Server Content Types to a Library (Reporting Services in SharePoint Integrated Mode)

To configure Reporting Services against an existing Sharepoint 2010 farm, you need to go to the root site and enable the proper features.  Then configure libraries in subsites to expose the proper content types (Report Builder Model, Report, Data Source).

It can get a bit tricky.  Here are some links.

http://blog.cloudshare.com/2012/05/22/step-by-step-guide-to-installing-sharepoint-with-sql-2012-powerpivot-powerview-and-reporting-services/

Add Report Server Content Types to a Library (Reporting Services in SharePoint Integrated Mode)

SQL Error 909 when restoring ContosoRetailDW.bak using MS SQL Server Mgt. Studio

If you’re trying to restore the Contoso database samples on a BI or Standard edition of SQL 2012, you’ll get an error about Enterprise features being used.  Here is a script that can be run on a trial Enterprise or Developer edition of SQL 2012 to remove that dependency.

ALTER INDEX ALL ON DimChannel REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON DimEntity REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON DimProduct REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON DimStore REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON FactExchangeRate REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON FactInventory REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON FactITMachine REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON FactOnlineSales REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON FactSales REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON FactSalesQuota REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON FactStrategyPlan REBUILD WITH (DATA_COMPRESSION = None);

SQL Error 909 when restoring ContosoRetailDW.bak using MS SQL Server Mgt. Studio

Thursday, May 31, 2012

Share cache across SSIS packages

Shared caching is a new feature of SSIS 2012 that could improve performance when using the same large lookup table across multiple packages.  It could be replicated by saving to a raw file in SSIS 2005/8, however this should be faster.


Prior to SSIS 2012 it was not possible to re-use or share the same cache across packages. What that meant is if you created a cache you will be able to consume it only in the very package where the cache was instantiated. Therefore, a developer could take only two routes from here – either make as much heavy ETL processing as possible inside the same package where the cache resides, or populate yet another cache object in every other package where it is needed. The latter was especially harmful leading to unnecessary heavy extra database calls and an extended development time. This limitation has been overcome in the SSIS 2012 release.

Monday, May 14, 2012

SQL Live Monitor

 

A .NET application that provides realtime performance data on the target SQL Server instance.
No installation required, data displayed in realtime, and it can also be logged to CSV for offline analysis. Will also capture SQL data for processing using PAL.

SQL Live Monitor

Thursday, May 10, 2012

Layerscape - Home Page

Layerscape is a fascinating Microsoft Research tool that integrates with Worldwide Telescope to provide deep spatial visualizations.

 

Layerscape - Home Page

Thursday, May 03, 2012

Download: Microsoft® SSIS Balanced Data Distributor - Microsoft Download Center - Download Details

Useful for distributing workloads and speeding up SSIS performance.

Microsoft® SSIS Balanced Data Distributor (BDD) is a new SSIS transform. This transform takes a single input and distributes the incoming rows to one or more outputs uniformly via multithreading. The transform takes one pipeline buffer worth of rows at a time and moves it to the next output in a round robin fashion. It’s balanced and synchronous so if one of the downstream transforms or destinations is slower than the others, the rest of the pipeline will stall so this transform works best if all of the outputs have identical transforms and destinations. The intention of BDD is to improve performance via multi-threading. Several characteristics of the scenarios BDD applies to: 1) the destinations would be uniform, or at least be of the same type. 2) the input is faster than the output, for example, reading from flat file to OleDB.
NOTE:  An updated version of Microsoft SSIS Balanced Data Distributor (BDD) is available. This includes a fix for the problem causing BIDS to crash when removing the BDD transform. For more details, and download information, see
KB 2616527.

Download: Microsoft® SSIS Balanced Data Distributor - Microsoft Download Center - Download Details

Monday, April 30, 2012

SQL Server Security - Site Home - MSDN Blogs

Instead of using CHECKSUM, timestamp, or other methods, hashing data may be a good alternative.

A common scenario in data warehousing applications is knowing what source system records to update, what data needs to be loaded and which data rows can be skipped as nothing has changed since they were last loaded. Another possible scenario is the need to facilitate searching data that is encrypted using cell level encryption or storing application passwords inside the database.

Data Hashing can be used to solve this problem in SQL Server.
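A small sketch of the idea with invented table and column names: hash the business columns as you stage them, then compare hashes against what's already in the warehouse to decide insert, update or skip.

SELECT CustomerCode,
       HASHBYTES('SHA2_256', CONCAT(CustomerName, '|', City, '|', Segment)) AS RowHash
FROM   staging.Customer;
-- Rows whose hash differs from the stored hash get updated, rows with no match get
-- inserted, and identical hashes are skipped.  Note HASHBYTES input is capped at
-- 8,000 bytes on SQL 2012, so very wide rows need to be split or trimmed first.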

SQL Server Security - Site Home - MSDN Blogs

Thursday, April 26, 2012

Twitter Kills The Majesty of Pleasant Conversation


Words are powerful things.  As of late, I have been working on text mining social data feeds, and investigating how Hadoop, R, Azure and SQL Server 2012 fit into the big picture.  I wrote a SQL Integration Services package using a script task that pulls Twitter data from a syndication feed, parses out the words, hash tags, screen names and links, and stores the results in a SQL 2012 (or optionally SQL Azure) database.  Performing some text mining of Twitter status keywords against a selection of “best” and “worst” words in the English language brings back some interesting and slightly depressing results.

I started by entering in the database the twitter screen names to follow.  I targeted a few Canadian bike companies (don’t ask why).  Each time the package is run, it adds the last 20 tweets, and parses screen names of users that are mentioned in each tweet.  And so on.  This recursion builds a very quick six-degrees-of-separation trail and some fairly random discussions. 

Running this process for about 4 days, sometimes 2-3 times per day, produced 5599 tweets.  Originally I was looking at using R and Hadoop to analyze the results, which is a bit like bringing a ballistic missile to a knife fight.  To slice this data with SQL takes only a couple of seconds or less.  Perhaps reading the entire Twitter firehose or analyzing historic tweet data might change the architecture in the future.  For now, things are working pretty well. 

Of a selection of 5599 individual tweets, 9 contain the “best words” and 2135 have the “worst words” as rated by Vocabula Review.  That’s 38% of the sample that have an aura of foolishness or odium, and 0.1% that have an aura of fun and majesty.  The sampling is fairly small, with the top word “valley” only coming up 3 times. 

Another dataset, seeded with a more technology-centric list of Twitter users like Robert Scoble and some Microsoft folks I follow, brought back similar results.  Running this process over the course of a month saved 59,583 tweets containing 796,171 words, links, screen names, emoticons and hash tags.

Of the 796k words, 24,171 came up in the “worst words” filter.  That’s about 3%.  A measly 282 came up in the “best words” filter.  That’s about 0.04%.

The following Top 5 Words came up.

valley 74
azure 19
simplicity 11
bliss 10
recherche 8

  • Valley makes sense, with Silicon Valley, Napa Valley, and those other west coast valleys being discussed.
  • Azure makes sense, since a steady stream of Windows Azure propaganda continually bubbles from Microsoft.
  • Simplicity comes up a few times when people talk about Apple or iPad.
  • Bliss comes up because of Rob Bliss, a Washington Times columnist, and some comments about cranberry bliss bars.
  • Recherche, well, let’s chalk that up to the fact that some of the best words in the English language are French.  Mon dieu.

With only 140 characters to leverage, you would think that people would use words like “animadversion” or “cachinnation” to provide deep and meaningful expression.  Instead, you get the logorrhea that is the Twitter dataset.

Check out www.vocabula.com to improve your tweets and amaze your followers with fun and majesty.

The BI Software Tool Belt

Excellent list of must-have software for SQL Server and everything else.

My must-have software is:

  • Microsoft OneNote, Live Writer
  • SSMS Tools
  • BIDS Helper
  • WinDirStat

The BI Software Tool Belt

Wednesday, April 25, 2012

SQL Server FineBuild

A wizard for installing SQL?  Yes, it exists…

FineBuild provides 1-click install and best-practice configuration of SQL Server 2012, SQL Server 2008 R2, SQL Server 2008, and SQL Server 2005.

SQL Server FineBuild

Tuesday, April 24, 2012

CQRS & Separation of Church and State

CQRS is a design pattern acronym which stands for Command Query Responsibility Segregation.  I would call this the R/RW pattern, or the separation of Church and State.  In a reporting application design, it would silo the read queries from the write commands.

In its simplicity, it is designing two services for each application interface, one that reads and one that writes. 

That is it. That is the entirety of the CQRS pattern.

CQRS, Task Based UIs, Event Sourcing agh! | Greg Young

Since the bulk of applications have highly concurrent reads and minimal or bulk writes, thinking about this pattern during design is very important.  What if you were able to architect a system where you could spin up unlimited services that provide querying, and a few services that provide writes/updates, queuing them up for later?  Would this improve performance and scalability? What if you could split the data for these systems into unlimited read-only models and a few read/write models?

What if the data in the read-only models was effortlessly in-synch with the read/write models, with minimal latency and lazy consistency?

This is one of the tenets behind Azure and cloud computing in general.

You can do this in a SQL Server environment, albeit not so effortlessly and with some gotchas, using log shipping with standby/undo or database snapshots for a read-only picture of the database.  There is some latency, and keeping the databases in sync adds a lot of overhead.
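The snapshot piece at least is a one-liner; the database and file names here are made up. Pointing report users at the snapshot gives them a static, read-only picture as of the moment it was created.

CREATE DATABASE SalesDW_Reporting_Snapshot
ON ( NAME = SalesDW_Data,   -- logical name of the source database's data file
     FILENAME = 'D:\Snapshots\SalesDW_Reporting.ss' )
AS SNAPSHOT OF SalesDW;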

SQL 2012 appears to solve some of the limitations of SQL 2008 R2 log shipping with its AlwaysOn capabilities for rolling out read-only, distributed clusters. It still doesn’t seem as effortless as it should be.

Replication also offers a solution, though it can be brittle and has its limitations and maintenance overhead. 

SANs also offer a high-cost solution for fast bulk loads, with some downtime.

You cannot use SAN snapshots of an active SQL Server database file(s) unless you detach them first. This is very intrusive.

I actually do this for a situation where I need 800GB of data from one server to another every night.

  • Detach the database on ServerA
  • SAN snapshot serverA drives F:,G:,H:,I:
  • reattach database on ServerA
  • detach database on ServerB
  • detach drive L,M,N,O on ServerB
  • mount new SAN snapshots L,M,N,O on ServerB
  • attach database files from L,M,N,O on ServerB

It requires an outage on both, but is very fast to virtually move the data to the other server.

http://social.msdn.microsoft.com/Forums/eu/sqldatabaseengine/thread/8f01f55a-66ab-4f96-a7bf-dca10bea64b8

Change data capture is another solution for maintaining multiple concurrent copies of a data stream.  SQL 2012 now has an Oracle CDC feature which improves the integration and reporting story.

The above solutions are very hammer-based approaches to the solution and don’t necessarily relate to CQRS.  Message queuing, eventing, change data capture and pub/sub systems all seem to fall within the CQRS design pattern, though they are really just an extension to the thought process. 

Chad Meyers has an excellent posting on Laws, Rules, Principles, Patterns, and Practices.

Another set of excellent posts on the concept of CQRS.

http://thinkbeforecoding.com/

There doesn’t seem to be a silver bullet to the solution just yet, though I’m sure someone will get it right sooner or later. The CQRS pattern is merely that, a pattern. The devil is in the details.

Monday, April 23, 2012

Ctrl+E not working for executing statements | Microsoft Connect

Upgrading from Denali CTP1?  Your keyboard shortcuts (CTRL-E, CTRL-R) are probably not working.  Use Options – Keyboard – Reset to fix.

Ctrl+E not working for executing statements | Microsoft Connect

Reloading Agent Settings appears in SQL Agent Logs every 90 seconds | Microsoft Connect

One of the things you may want to do as soon as you install SQL 2012 is disable the auto refresh feature of SQL Agent, which will fill up your agent logs with superfluous messages.

We investigated this issue and it's indeed painful behavior from the user perspectives, we will address this issue in nearest Service Pack.
For now I can suggest you the following workaround:
Go to registry on your SQL Server machine and navigate to corresponding SQL Instance registry key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\ MSSQL11.YOUR_INSTANCE_NAME\SQLServerAgent
Change registry settings called “AutoRegistryRefresh” from 1 to 0.
The side impact of this: if you change any default settings behavior for SQL Agent, you may need to restart SQL Agent to get it.

Reloading Agent Settings appears in SQL Agent Logs every 90 seconds | Microsoft Connect

Exporting via HTML instead of MHTML - CSS SQL Server Engineers - Site Home - MSDN Blogs

How to configure additional output formats (HTML) for Reporting Services.

There was a question on Twitter about how to display a report in HTML instead of MHTML due to some browser issues.  Based on the fact that it was MHTML, I’m assuming we are talking about exporting a report as the default report view is HTML.  First off, if we look at our export options for a report, we see the following:

Exporting via HTML instead of MHTML - CSS SQL Server Engineers - Site Home - MSDN Blogs

Thursday, April 19, 2012

Jonathan Kehayias | Installing and Configuring SQL Server 2012 Distributed Replay

The distributed replay feature of SQL 2012 allows for load testing a sampled workload against a different SQL server, and spinning up multiple workloads to simulate real activity.

Here’s the install instructions.

Jonathan Kehayias | Installing and Configuring SQL Server 2012 Distributed Replay

Monday, April 16, 2012

SQL Server 2012 Command Line Utilities — DatabaseJournal.com

The MAK talks about command line utilities in SQL 2012.

Microsoft provided many Command line utilities for SQL Server 2012.  In this article I am going to explain the functionality and use of these command line utilities based on the feature it is dealing with.

SQL Server 2012 Command Line Utilities — DatabaseJournal.com

Also, the details on PowerShell cmdlets for SQL

Friday, April 13, 2012

Paging Function Performance in SQL Server 2012 | Ms SQL Girl

Beware the new fetch feature of SQL 2012.  RBAR in disguise?

Although SQL Server 2012 new Paging function is intuitive for the purpose of paging a data set, it comes at a cost for large volume of data. The Top Batching and Direct ID Batching perform significantly better under similar circumstances, i.e. simple Source Data with continuous ID.

Paging Function Performance in SQL Server 2012 | Ms SQL Girl

There is also another bug around identity columns skipping increments after a SQL restart that might affect how paging functions in SQL 2012.  Be sure to test any applications doing server-side paging!
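A rough comparison of the two styles, with a made-up table and an assumed indexed ID column:

-- OFFSET/FETCH paging: simple, but SQL Server still reads its way past all the
-- skipped rows, so deep pages get progressively slower.
SELECT ID, Name
FROM   dbo.BigTable
ORDER BY ID
OFFSET 100000 ROWS FETCH NEXT 50 ROWS ONLY;

-- "Direct ID" (keyset) paging: seek straight past the last key of the previous page.
-- Much cheaper on deep pages, but it assumes you can carry the last ID forward.
SELECT TOP (50) ID, Name
FROM   dbo.BigTable
WHERE  ID > 100000   -- last ID returned by the previous page
ORDER BY ID;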

Thursday, April 12, 2012

TechNet Log Parser Studio

 

Log Parser Studio is a utility that allows you to search through and create reports from your IIS, Event, EXADB and others types of logs. It builds on top of Log Parser 2.2 and has a full user interface for easy creation and management of related SQL queries.

TechNet Log Parser Studio

Wednesday, April 11, 2012

Data Warehousing and Business Intelligence

 

Data Warehousing and Business Intelligence

Vincent Rainardi presents an excellent resource for a Business Intelligence project, with plenty of tips and tricks to succeed when building a data warehouse or reporting solution.

Monday, April 09, 2012

Aaron Bertrand : Two bugs you should be aware of

Considering SQL 2012 for a production environment?  There are a few critical issues with the current build that may make you wonder when SP1 is coming out….

This one is critical for those that think identity should be an auto-incrementing sequential key.

Just to add to your list. There is one more critical bug wherein if you define identity column on a table and insert some values. Thereafter restart SQL server and insert further more values and do a restart again and then when u insert in that table there are large gaps in the identity values.

Aaron Bertrand : Two bugs you should be aware of

This is Sparta...n SQL!: Scripting out statistics

An interesting technique for building a VSDB (Very Small Database) with a very large query plan profile.  If you have a huge database and don’t want to do performance testing of procedures in production, mirroring statistics could be a way to simulate this activity using tiny or even empty tables.

I recently read an excellent post by Kevin Kline regarding scripting out histograms from one database and applying them to another.  What would have made it better is if there was a way to automate it.  And so I pulled out powershell and got to work. 

This is Sparta...n SQL!: Scripting out statistics

There should really be a tool that clones the schema of a database and the statistics of a database while clearing out the data or subsets of the data.

Thursday, March 29, 2012

"SYLK: File format is not valid" error message when you open file

Before deciding to use ID as a column name, consider this scenario when dealing with CSV files…

"SYLK: File format is not valid" error message when you open file

"SYLK: File format is not valid" error message when you open file

SAP Data Services | A small notes. . .

Some information on SAP Data Services here, and a refresher on Databases and Data Warehousing.

Also, an interesting post on ETL vs ELT.  Would it be faster to use ETL to transform your data before loading, or would it be faster to just copy your data directly to the warehouse and manage it inside?

Perhaps this is the role the new FileTable type in SQL 2012 will play for you.  Why not FTP your CSV files directly into SQL as blobs, and manage the transformations within?

Maybe not the best example, but it does centralize things a bit more…

SAP Data Services | A small notes. . .

Wednesday, March 28, 2012

New programming features in SQL 2012

SQL Server 2012 is released this weekend.  In SQL 2012 you can now reshape the results returned from a stored procedure at execution time using the WITH RESULT SETS clause.

EXEC uspGetEmployeeManagers 16
WITH RESULT SETS
(
    ( [Reporting Level] int NOT NULL,
      [ID of Employee] int NOT NULL,
      [Employee First Name] nvarchar(50) NOT NULL,
      [Employee Last Name] nvarchar(50) NOT NULL,
      [Employee ID of Manager] nvarchar(max) NOT NULL,
      [Manager First Name] nvarchar(50) NOT NULL,
      [Manager Last Name] nvarchar(50) NOT NULL )
);

EXECUTE (Transact-SQL)

Custom paging using an ORDER BY clause

-- Skip 0 rows and return only the first 10 rows from the sorted result set.
SELECT DepartmentID, Name, GroupName
FROM HumanResources.Department
ORDER BY DepartmentID
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;

Database sequence numbers

THROW to raise or re-raise errors without RAISERROR's custom message plumbing

C#/VB style Format to format dates, times, numbers

CONCAT to join two or more strings

OVER clause for windowed summarization / rolling totals

PARSE to convert strings to dates, currencies, etc

TRY_CONVERT to avoid errors when converting data types

TRY_PARSE to test whether a string converts to a given data type, returning NULL instead of an error

DATEFROMPARTS to take a year, month and day and create a date

CHOOSE to return an index item from a list of selections. (Arrays in SQL?)

IIF for fast inline conditional checks.  A quick sampler of several of these functions follows.
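A throwaway sampler with arbitrary literal values:

SELECT CONCAT('SQL', ' ', '2012')              AS ConcatDemo,
       FORMAT(GETDATE(), 'yyyy-MM-dd')         AS FormatDemo,
       TRY_CONVERT(int, 'not a number')        AS TryConvertDemo,   -- NULL instead of an error
       TRY_PARSE('2012-03-28' AS date)         AS TryParseDemo,
       DATEFROMPARTS(2012, 3, 28)              AS DateFromPartsDemo,
       CHOOSE(2, 'one', 'two', 'three')        AS ChooseDemo,       -- returns 'two'
       IIF(1 = 1, 'yes', 'no')                 AS IifDemo;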

Monday, March 26, 2012

Bob's blog - SQL 2008 R2 MDS Easter Egg - wBob

Some helpful tools coming out of SQL MDS, including regex and XSLT straight from T-SQL.

I've been having a play with SQL 2008 R2 and its new Master Data Services (MDS).  I was pleased to see it adds some additional functions when you create a new MDS database.  These include mdq.Split and mdq.RegexReplace which implement handy Regular Expression functions.  Most of these are documented at: http://msdn.microsoft.com/en-us/library/ee633712(SQL.105).aspx

I also discovered a scalar function called mdq.XmlTransform which allows you to use XSLT:

Bob's blog - SQL 2008 R2 MDS Easter Egg - wBob

Wednesday, March 21, 2012

Microsoft SQL Server 2012 Virtual Labs Available To You Online, plus many more SQL Server 2012 resources - Microsoft SMS&P Partner Community Blog - By Eric Ligman - Site Home - MSDN Blogs

Microsoft SQL Server 2012 Virtual Labs Available To You Online, plus many more SQL Server 2012 resources - Microsoft SMS&P Partner Community Blog - By Eric Ligman - Site Home - MSDN Blogs: "We’ve just come off the exciting SQL Server 201 Virtual Launch Event and the buzz about SQL Server 2012 continues to grow! Following on to my Windows Small Business Server 2011 TechNet Virtual Labs Available To You Online, plus more Small Business Server 2011 resources and SharePoint Server 2010 MSDN Virtual Labs Available To You Online, plus more SharePoint 2010 resources posts, I thought I would keep the momentum going and share out some SQL Server 2012 Virtual Labs that we have available to you online, as well as some additional SQL Server 2012 resources that you can use:"


SSMS puts extra lines in the result set in grid view for sp_helptext | Microsoft Connect

SQL 2012 bug and a couple alternatives that might work instead.

SSMS puts extra lines in the result set in grid view for sp_helptext | Microsoft Connect: "SSMS puts extra lines in the result set in grid view for sp_helptext"

http://www.sqlmusings.com/2009/05/17/how-to-get-definition-for-stored-procedures-udfs-and-triggers-using-t-sql/
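One simple option along those lines (the procedure name here is made up) returns the definition as a single value rather than line by line:

SELECT OBJECT_DEFINITION(OBJECT_ID(N'dbo.uspMyProcedure')) AS ProcedureText;

-- or via the catalog view
SELECT definition
FROM   sys.sql_modules
WHERE  object_id = OBJECT_ID(N'dbo.uspMyProcedure');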


50 Free Social Media Tools You Can't Live Without

Tracking your social media campaign?  Here are some tools to look at.

A couple years ago, Jay Baer wrote a great blog post called ‘The 39 social media tools I’ll use today’ which was an all-in-one toolkit for social media marketers (and still is).

A lot has changed in the two years since that post was published so here is a ’2012 remix’ featuring 50 (mostly free) tools you can use on a daily basis.

Whether you are just starting out in the social media arena or have been at it for a few years, this will hopefully be a handy resource. So, let’s serve ‘em up

50 Free Social Media Tools You Can't Live Without

Tuesday, March 20, 2012

CanvasMap Prototype = HTML5 Canvas + Mapping

Looking for a Windows 8 / HTML5 map control?

CanvasMap is a JavaScript map visualization control (similar to Bing Maps or Google Maps) that is built entirely on top of using the HTML5 Canvas element to render the map. Currently, neither Bing Maps or Google Maps use the Canvas element to render maps, so I thought I’d do this to see how the performance might compare. I know there are a few alternatives already, but I thought I’d have some fun trying to build my own.

CanvasMap Prototype = HTML5 Canvas + Mapping

Monday, March 19, 2012

VDS Technologies - Map Data and GIS Tools

Excellent repository of shape files for maps.

VDS Technologies - Map Data and GIS Tools

The 2012 Microsoft Product Roadmap -- Redmond Channel Partner

One fork in the road may be a single platform for Windows Phone & Windows 8 apps, or the ability to run WP7.5 apps on Win8.

Which would be nice.  Kind of like running iPhone apps on the iPad.

The star of last year's product roadmap was Office 365. Microsoft this year looks likely to pivot from the public cloud to the private cloud, while also shipping staples like SQL Server and releasing major betas -- and possibly final versions -- of blockbusters including Windows 8 tablet, client and server, and Office 15.

The 2012 Microsoft Product Roadmap -- Redmond Channel Partner

Geocoding in Excel

Need a fast way of getting lat/long values for a batch of addresses in Excel?

A lot of folks have to geocode data that they get from various folks in their organization, and it often makes its way in to Excel.  Of course, you can batch geocode using Bing Maps Services, but in some scenarios it is a lot more convenient or reasonable to use Excel.

Westin's Technical Log

Sunday, March 18, 2012

Another Data Mining Blog

Another Data Mining Blog:

Excellent examples of using R with SQL Server, from a milestone winner in the $3 million Heritage Provider Network Health Prize.


Monday, March 12, 2012

Y2K 0229–Azure’s down?

Don’t forget to ensure that your test cases contain time-based events, like leap years and Daylight Savings.   Not that it would have helped much, since it was a cert expiry that caused the issue.  The moral of the story?  Ensure critical apps have a disaster recovery plan that doesn’t include a single source of failure (cloud provider).

When the clocks struck midnight, things quickly got janky, and a cloud-system domino effect took charge. A large number of Western Hemisphere sites and the U.K. government's G-Cloud CloudStore were among the many stopped cold by the outage. Microsoft has been retracing its steps in finding out what exactly happened and hasn't said very much yet, although it did report in an Azure team blog that the problem has "mostly" been fixed.

IT in Canada - Canada's Only Integrated Social Media News Network

Saturday, March 10, 2012

SQL Azure Upgrade

SQL Azure Upgrade Advisor?  Here you go...

Get your SQL Server database ready for SQL Azure! - SQL Server Team Blog - Site Home - TechNet Blogs: "One of our lab project teams was pretty busy while the rest of us were taking a break between Christmas and New Year’s here in Redmond. On January 3rd, their new lab went live: Microsoft Codename "SQL Azure Compatibility Assessment". This lab is an experimental cloud service targeted at database developers and admins who are considering migrating existing SQL Server databases into SQL Azure databases and want to know how easy or hard this process is going to be. SQL Azure, as you may already know, is a highly available and scalable cloud database service delivered from Microsoft’s datacenters. This lab helps in getting your SQL Server database cloud-ready by pointing out schema objects which are not supported in SQL Azure and need to be changed prior to the migration process. So if you are thinking about the cloud again coming out of a strong holiday season where some of your on-premises databases were getting tough to manage due to increased load, this lab may be worth checking out."


Based on the recent training course I took using SQL Azure, the backend is a SQL 2005 "Express" style instance.  Lack of some key "newer" features that use the CLR (datatypes like spatial and xml) may quickly exclude some of your more complex databases from being migrated to the cloud.

Million Song Dataset | scaling MIR research

Looking for some Big Data to test out a project?  This one looks like fun.

Million Song Dataset | scaling MIR research: "The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks.

Its purposes are:

  • To encourage research on algorithms that scale to commercial sizes
  • To provide a reference dataset for evaluating research
  • As a shortcut alternative to creating a large dataset with APIs (e.g. The Echo Nest's)
  • To help new researchers get started in the MIR field

The core of the dataset is the feature analysis and metadata for one million songs, provided by The Echo Nest. The dataset does not include any audio, only the derived features. Note, however, that sample audio can be fetched from services like 7digital, using code we provide.

The Million Song Dataset is also a cluster of complementary datasets contributed by the community:

  • SecondHandSongs dataset -> cover songs
  • musiXmatch dataset -> lyrics
  • Last.fm dataset -> song-level tags and similarity
  • Taste Profile subset -> user data

The Million Song Dataset started as a collaborative project between The Echo Nest and LabROSA. It was supported in part by the NSF."



Friday, March 09, 2012

NoSQL Data Modeling Techniques « Highly Scalable Blog

NoSQL, or DIY SQL, is the most common technique for blasting out distributed data stores.

This article is a great overview of the types of NoSQL designs out there, and ways to model the data.

NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. This aspect of NoSQL is well-studied both in practice and theory because specific non-functional properties are often the main justification for NoSQL usage and fundamental results on distributed systems like CAP theorem are well applicable to the NoSQL systems.  At the same time, NoSQL data modeling is not so well studied and lacks of systematic theory like in relational databases. In this article I provide a short comparison of NoSQL system families from the data modeling point of view and digest several common modeling techniques.

NoSQL Data Modeling Techniques « Highly Scalable Blog

Tuesday, March 06, 2012

Top analytic blogs and websites, with trending information - AnalyticBridge

I recently signed up for mailings from AnalyticBridge, the social network for analytic professionals.  It seems like daily emails are the norm, so I’ll have to set up a rule to move them someplace.  In the meantime, they just released a list of the top analytic blogs and websites.

Worth checking out.

Top analytic blogs and websites, with trending information - AnalyticBridge

Tuesday, February 28, 2012

Microsoft partnering with monitor manufacturers?

I wonder if MS will fix this resolution limitation in the release…

Laurent Bugnion experienced Windows Update hell when he got hit in the middle of a presentation.  The moral of the story?  Run Windows Update before the presentation, or turn off updates altogether.

His presentation looks really interesting: taking a Silverlight app to WP7, and WP7 to Win8/WinRT.

What was the issue actually?

After reboot I tried again, and the same result happened: No Metro apps could be started. From the room, I got the suggestion that it was a resolution problem. And indeed: You cannot run Metro applications with less than 768 pixels vertical resolution. And because of the projector, I was running at 1280x720. So I thought OK, let’s try it, increased the resolution to 1280x768 and indeed I could start the Metro apps… but then the projector stopped showing the image!

Laurent Bugnion (GalaSoft)

Saturday, February 25, 2012

Windows 7 comes to the iPad with 1 gigabit internet speed

About 23 seconds to download a CD's worth of content, and roughly 3 minutes for a DVD (taking the ~30 MB/s the article quotes against ~700 MB and ~4.7 GB)?  Wow.

The news today is the new service, called OnLive Desktop Plus. It’s not free — it costs $5 a month — but it adds Adobe Reader, Internet Explorer and a 1-gigabit-a-second Internet connection.

That’s not a typo. And “1-gigabit Internet” means the fastest connection you’ve ever used in your life — on your iPad. It means speeds 500 or 1,000 times as fast as what you probably get at home. It means downloading a 20-megabyte file before your finger lifts from the glass.

You get the same speed in both directions. You can upload a 30-megabyte file in one second.

OnLive Desktop Plus Puts Windows 7 on the iPad in Blazing Speed - State of the Art - NYTimes.com

Wednesday, February 22, 2012

Download: Reporting Services LogViewer - Microsoft Download Center - Download Details

A log analyzer for Reporting Services; Log Parser Lizard is also worth a try.

In this tool we demonstrate the best way to analyze the Report Server Service Trace Log and Report Server Execution Log that is useful if you are debugging an application or investigating an issue or event.

Download: Reporting Services LogViewer - Microsoft Download Center - Download Details

Thursday, February 09, 2012

History of Sharepoint

Interesting article about the history of Sharepoint here.  Back in the day, I remember us building proof-of-concept digital dashboards in Outlook.  I remember Site Server having a really cool Silverlight-style 3D spiderweb visualization for exploring a site map, and using it to visualize one of the largest web sites in Canada while doing analytics for Bell Canada.  I remember using a product called Corechange Coreport to build out portals that included both web and desktop apps.

I remember cursing Sharepoint 2001's functionality when we were building out Project Server portals, and fighting with InfoPath xml and painfully slow document creation APIs to build out a workflow system for the largest insurance company in Canada.  Scaling out Sharepoint 2001 was an interesting affair...

Wednesday, February 08, 2012

Sample OData feed

Looking for sample data for PowerPivot?  Although Ocean Drilling isn't high on my list of environmentally friendly exercises, it could be an interesting source of data.

http://data.oceandrilling.org/

Thursday, February 02, 2012

Email Behavior Analyzer

Cool MS Outlook addin – find out how email conversations translate to emotions and behaviours.

Email Behavior Analyzer

Wednesday, January 25, 2012

SCD Type 2 Merge in Oracle

Oracle Slowly Changing Dimension using PL/SQL Merge.

If you require type 2 dimension updates in Oracle, this could help.

http://databobjr.blogspot.ca/2011/04/load-slowly-changing-dimension-type-2.html
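
For reference, here is a rough two-step sketch of the pattern (the linked post does it with a PL/SQL MERGE routine; the dim_customer, stg_customer and dim_customer_seq names below are hypothetical): close out the current row when a tracked attribute changes, then insert the new version.

-- Step 1: expire the current row where a tracked attribute has changed.
MERGE INTO dim_customer d
USING stg_customer s
   ON (d.customer_id = s.customer_id AND d.current_flag = 'Y')
WHEN MATCHED THEN
  UPDATE SET d.end_date     = TRUNC(SYSDATE) - 1,
             d.current_flag = 'N'
  WHERE d.city <> s.city;  -- extend the test to every tracked attribute

-- Step 2: insert a new current version for changed and brand-new customers.
INSERT INTO dim_customer
  (customer_key, customer_id, city, start_date, end_date, current_flag)
SELECT dim_customer_seq.NEXTVAL, s.customer_id, s.city,
       TRUNC(SYSDATE), DATE '9999-12-31', 'Y'
FROM stg_customer s
WHERE NOT EXISTS (SELECT 1
                  FROM dim_customer d
                  WHERE d.customer_id = s.customer_id
                    AND d.current_flag = 'Y');

A single MERGE can't both expire a matched row and insert its replacement, which is why the insert is a separate statement in this sketch.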

Rob Farley : The SSIS tuning tip that everyone misses

Great tips for optimizing SSIS source queries: by default the optimizer plans for the cost of the whole result set, which can delay the first rows reaching the data flow buffers.

Luckily, T-SQL gives us a brilliant query hint to help avoid this.

OPTION (FAST 10000)

This hint means that it will choose a query which will optimise for the first 10,000 rows – the default SSIS buffer size. And the effect can be quite significant.

Rob Farley : The SSIS tuning tip that everyone misses
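
For what it's worth, a hedged sketch of how the hint might sit on an OLE DB Source query (the table and column names here are made up):

-- Hypothetical source query for an SSIS OLE DB Source.
-- FAST 10000 asks the optimizer to return the first 10,000 rows quickly,
-- which lines up with the default buffer row count mentioned above.
SELECT f.SalesOrderID,
       f.OrderDate,
       f.CustomerKey,
       f.SalesAmount
FROM dbo.FactSales AS f
WHERE f.OrderDate >= '20120101'
OPTION (FAST 10000);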

Monday, January 23, 2012

Debugging SSIS Script Component « Code Reference

If you’re having issues with SSIS 2008 R2 logging in script components, it’s most likely because the syntax differs from what you’d use in a script task.

See this link on how to use ComponentMetadata.FireProgress instead of Dts.FireProgress.

Debugging SSIS Script Component « Code Reference

Friday, January 20, 2012

reportsync - sync SSRS report between two report servers - Google Project Hosting

reportsync - sync SSRS report between two report servers - Google Project Hosting: "reportsync
sync SSRS report between two report servers

- Sync reports between two SSRS servers.
- Download rdls from SSRS to local PC.
- Upload files to a SSRS server.
- Attach datasources automatically on upload (data source name given in the report must exist on server)."

Thursday, January 19, 2012

there is an error in xml document (5, 2) rptproj - Google Search

Microsoft Visual Studio error dialog:

The report definition failed to load: There is an error in XML document (5, 2). Verify the report definition conforms to the correct schema.

If you get this and are using Subversion for your Reporting Services development, you may have a corrupt data source file.  The XML in question has nothing to do with the rptproj file itself; it is actually an item within the project, in this case the data source.

XML and Visual Studio are my two least favourite things when dealing with bugs.

there is an error in xml document (5, 2) rptproj - Google Search

Wednesday, January 18, 2012

Business Intelligence Edition in SQL 2012 - Prologika (Teo Lachev's Weblog) - Prologika Forums

Teo outlines some of the pros and cons of the new BI edition of SQL 2012.

About $10k per server vs. over $50k for Enterprise Edition is the biggest plus.  It provides all the Analysis Services features previously found only in Enterprise, and gives SMEs a fully functional PowerPivot, Analysis Services & Reporting Services server.

From what I understand, the key driver for Microsoft making this change was the need to install two separate SQL instances to use both PowerPivot and Analysis Services (multidimensional or tabular) data stores.  For clients wanting two separate environments for those, the cost ends up being about $20k anyway, not including the SharePoint and data warehouse SQL instances.

One limitation is the 2-socket configuration limit.  Another is the per-user licensing scheme, which could leave companies with 50-200 users in a grey zone.  Enterprise or BI?

The licensing model for private clouds seems a bit more appealing from an ROI perspective.  Invest in SQL Enterprise and deploy an unlimited # of VMs.  The disadvantage is the need to license the entire server farm for SQL Enterprise + Software Assurance.

Business Intelligence Edition in SQL 2012 - Prologika (Teo Lachev's Weblog) - Prologika Forums

Microsoft also seems to be shutting out AMD with the new licensing scheme, and targeting Intel’s Q2 release of their new 2-socket chipset.

http://sqlserverperformance.wordpress.com/2011/11/10/sql-server-2012-licensing-and-hardware-considerations/

Now that MS has moved from per-CPU-socket licensing to a per-core (plus per-user) model, will this change the R&D direction of CPU manufacturers?  Will they now focus their efforts more on GPU processing?  What if a PC had only 1 core and 1 socket, but multiple GPUs and something else besides a CPU to handle the more precise calculations?

Thursday, January 12, 2012

Microsoft to enable Linux on its Windows Azure cloud in 2012 | ZDNet

Actually, it should be any O/S once persistent VMs are available… an interesting twist that risks cannibalizing their existing Windows Azure platform.

To date, Microsoft has been balking at customer requests to add persistent VMs to Azure, hoping to get customers to develop Azure apps from scratch instead. But the lack of the ability to host apps like SharePoint and other third-party business applications with persistence was a deal breaker for a number of business users who were unwilling to consider Azure until Microsoft added this support, one of my contacts said.

Microsoft to enable Linux on its Windows Azure cloud in 2012 | ZDNet