Thursday, September 18, 2008

The Data Miner: September 2008

 

Here's a recap of what's new for Data Mining in SQL Server 2008:

  • The Microsoft_Time_Series algorithm has been enhanced to include ARIMA in addition to the existing ARTxp method, and a blending algorithm is now used to deliver more accurate and stable predictions, both short and long term, from a hybrid model. In addition, a new prediction mode allows you to add new data to time series models. (See below for a neat app that will let you explore these features.)

  • Built-in support for holdout has been added. You can easily partition your data into training and test sets that are stored in the mining structure and are available to query after processing.

  • You can now build mining models on filtered subsets of a mining structure's data (e.g. just male customers), which means that you no longer have to create multiple mining structures and re-read the source data for such variations over a dataset.

  • Drillthrough functionality has been extended to make all mining structure columns available, not just columns included in the model. This allows you to build more compact models without sacrificing the ability to produce actionable output reports like targeted mailing lists.

  • The much-requested cross-validation feature has been added, allowing users to quickly validate their modeling approach by automatically building temporary models and evaluating accuracy measures across K folds. The feature is available through a new cross-validation tab under Accuracy Charts in Business Intelligence Development Studio, in addition to being accessible programmatically via a stored procedure call.
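If holdout and K-fold cross-validation are new to you, here is a rough sketch of the ideas in plain Python. This is not how SQL Server implements either feature, and it does not call any Analysis Services API. The names `holdout_split`, `k_fold_indices`, `cross_validate`, `majority_label`, and `accuracy` are made up for illustration, and the "model" is just a majority-class baseline standing in for a real mining model.

```python
import random
from collections import Counter


def holdout_split(rows, holdout_fraction=0.3, seed=42):
    """Shuffle rows and reserve a fraction of them as a test set.

    Returns (training_rows, test_rows). The seed makes the split repeatable.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * holdout_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) once per fold.

    Rows are dealt into k folds round-robin, so every row is used
    for testing exactly once across the k iterations.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(rows, k, train, score):
    """Build a temporary model per fold and return one score per fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(rows), k):
        model = train([rows[i] for i in train_idx])
        scores.append(score(model, [rows[i] for i in test_idx]))
    return scores


def majority_label(training_rows):
    """Toy 'model': always predict the most common label seen in training."""
    return Counter(label for _, label in training_rows).most_common(1)[0][0]


def accuracy(model, test_rows):
    """Fraction of test rows whose label matches the model's single prediction."""
    return sum(1 for _, label in test_rows if label == model) / len(test_rows)


if __name__ == "__main__":
    # Hypothetical data: (customer_id, label) pairs.
    data = [(i, "buyer" if i < 8 else "non-buyer") for i in range(10)]
    training, testing = holdout_split(data, holdout_fraction=0.3)
    print(len(training), "training rows,", len(testing), "held-out rows")
    print("per-fold accuracy:", cross_validate(data, 5, majority_label, accuracy))
```

A wide spread in the per-fold scores is a hint that the modeling approach is sensitive to which rows it happens to be trained on, which is the kind of thing the new cross-validation report is meant to surface.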

