This post is a follow-on from a previous post, ‘Complete Guide to SQL 2005 mirroring using the GUI’
SQL 2005 Mirroring - Automatic Failover « Brad Marsh’s Weblog
Duplicate Records
Duplicate records can occur in numerous ways, such as loading source files too many times, keying the same data more than once, or just bad database coding. Having a primary key on your table (and you should always have one) will aid in the removal of duplicate records, but even with a primary key it is never a fun task to be handed.
Removing Duplicate Records using SQL Server 2005 | Servers and Storage | TechRepublic.com
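For illustration, the usual SQL Server 2005 approach is to rank the rows with ROW_NUMBER() and delete everything after the first copy. This is a minimal sketch rather than the article's exact script; the table and the columns that define a "duplicate" (dbo.Customers, FirstName, LastName) are made up:

-- Rank each group of duplicates; keep row 1, delete the rest.
-- dbo.Customers / FirstName / LastName are hypothetical names.
WITH Dupes AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY FirstName, LastName
                              ORDER BY CustomerID) AS RowNum
    FROM dbo.Customers
)
DELETE FROM Dupes
WHERE RowNum > 1;

Because the CTE is defined over a single table, the DELETE is applied straight through to the underlying rows.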
Efficient string concatenation in SQL?
We have two tables in a one-to-many relationship. How can we fetch a field from the parent table, with a second field holding its children as a comma-separated list? In SQL Server 2000 we would have needed a user-defined function for this, but in SQL Server 2005 the FOR XML PATH feature makes it a lot easier, and the performance of the string concatenation is amazing.
-- For each customer, build a comma-separated list of their OrderIDs.
SELECT CustomerID
    -- The [text()] alias makes FOR XML emit bare values with no tags.
    ,(SELECT CAST(OrderID AS VARCHAR(MAX)) + ',' AS [text()]
      FROM dbo.Orders AS O
      WHERE O.CustomerID = C.CustomerID
      ORDER BY OrderID
      FOR XML PATH('')) AS Orders
FROM dbo.Customers AS C;
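One niggle with the query above: it leaves a trailing comma on each list. A common variant (my own sketch, using the same tables) moves the comma to the front and strips it with STUFF:

SELECT CustomerID
    -- Leading comma per order; STUFF removes the first character.
    ,STUFF((SELECT ',' + CAST(OrderID AS VARCHAR(MAX)) AS [text()]
            FROM dbo.Orders AS O
            WHERE O.CustomerID = C.CustomerID
            ORDER BY OrderID
            FOR XML PATH('')), 1, 1, '') AS Orders
FROM dbo.Customers AS C;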
A low-cost alternative to BizTalk? Sounds useful…
Generally we consider BizTalk to be over-engineered and awkward to use. It is well placed to solve your problem and will ultimately deliver a reliable solution; however, development is complicated (requiring specialist personnel), debugging is difficult, and it can't easily be used straight out of the box. Additionally, relative to SmartsIntegrator, BizTalk is very expensive!
A best-practices analysis tool for MDX queries is now available in MDX Studio.
One last thing to mention: best practices are exactly that, best practices. It means that most of the time you will be better off using them. However, they are not absolute rules to be followed blindly; sometimes there are exceptions. Analysis Services is a complex, feature-rich product, and the MDX engine is probably its most sophisticated piece. Sometimes features interact in such a way that a best practice can produce worse results. For example, when MDX Studio sees Filter(CrossJoin(…)), it will recommend using CrossJoin(Filter(…)) instead, since it is common sense to reduce the size of a set before crossjoining. But not only is this not always possible, there are also rare cases where it would hurt performance. So while you are encouraged to follow the best practices, use them as a guide, and measure before and after applying the rules to make sure things really do improve.
MDX Studio 0.4.0 is released with desktop and online versions simultaneously.
Desktop version can be downloaded from: http://cid-74f04d1ea28ece4e.skydrive.live.com/browse.aspx/MDXStudio/v0.4.0
Online version can be accessed at: http://mdx.mosha.com
As usual – please post your comments, suggestions, bug reports etc at MDX Studio forum.
Microsoft OLAP by Mosha Pasumansky : Analyze MDX with MDX Studio
I’m sure this could be applied to a package in order to build a custom solution for SSIS ETL against multiple tables, without manually creating ETL for each table.
Not being one to admit defeat (and being too lazy to edit 120 columns by hand), I pulled together some C# to work with the interfaces that I needed and then exposed it as PowerShell cmdlets. This both simplified the script and got around the issue of PowerShell not casting to the interfaces I needed.
The following script opens a package, grabs a reference to the Columns property of a given CSV connection, loops through the columns changing all the data types, and finally saves the package.
# Load the powerSSIS snap-in and open the package.
add-PSSnapin powerSSIS
$pkg = get-ssisPackage "C:\temp\Test Package.dtsx"

# Grab the Columns collection from the flat file connection manager.
$fileCon = $pkg.Connections["CSV file"]
$col = $fileCon.Properties["Columns"].GetValue($fileCon)

# Skip the first four columns; rename and retype the rest.
for ($i = 4; $i -lt $col.Count; $i++)
{
    $name = get-ssisflatfileColumnName $pkg "CSV File" $i
    if ($name.StartsWith("Column"))
    {
        # Rename the generic "ColumnN" headers...
        set-ssisflatfileColumnName $pkg "CSV File" $i "Mth$($i-2)"

        # ...and change the data type to numeric(18,5).
        $c = get-SSISFlatFileColumn $pkg "CSV File" $i
        $c.DataType = [Microsoft.SqlServer.Dts.Runtime.Wrapper.DataType]::DT_Numeric
        $c.DataPrecision = 18
        $c.DataScale = 5
    }
}

# Save the modified package back to disk.
set-SSISPackage $pkg "C:\temp\Test Package.dtsx"
Microsoft SQL Server Community Utilities
As well as the independent SQL communities, there is also a Microsoft-run one, the Microsoft SQL Server Community Worldwide (you can tell it's the Microsoft one by the catchy title). They do tend to assume worldwide means United States-wide, or even the Seattle area, so you can drop in to Redmond for SQL school. However, don't let that put you off; there is some really good stuff there, including a list of utility scripts as long as your arm for probing the dark corners of your installations (I've sketched one of my own along the same lines after the list):
- Get a list of databases that are not being backed up
- Logins permissions report
- Filegroup listing: objects (tables/indexes) in each filegroup/file
- SQL Server 2005 backup, integrity check and index optimization
- Find out which SQL Agent jobs are currently executing
- Get facts on whether statistics need to be updated
- Run SQL (T-SQL statements or stored procs) on multiple servers
- SQL Server data/log usage for each drive
- Find out whether your local SQL Server is configured to use the /3GB switch
- Identify orphaned SQL Server user accounts on your local SQL Server
- See what's happening behind the scenes of the TempDB database
- Find out which login has which server role permissions
- Create an audit table
- Drop all foreign keys and tables in a database
- Control database mirroring failover and failback through script
- Defrag indexes in one or all databases
- Auto-failover SQL jobs during a database mirroring session
- Back up all user and system databases, excluding mirror databases
- Determine the space used by tables in a database
- Quickly get database information
- Detect the highest CPU-bound queries
- Get SQL Server Agent job configuration information
- Rebuild all indexes in a user database
- Get the current date and time of a system
- Change the owner of DTS packages
- Get a snapshot of My Computer in SQL Server
- Get details about your database
- Get the facts about your backups
- SQLAgent.out size alert
- Unrestrict database growth for one database or for all
- To index or to defrag (smart indexing)
- Detect real-time deadlocks and blocks
- Rebuild indexes online
- Are your indexes useful?
- Check the status of a database mirroring session
- Check the integrity and consistency of databases
- Track daily data growth
- Attach and detach SQL Server databases
Andrew Fryer's Blog : Microsoft SQL Server Community Utilities
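To give a flavour of what these look like, a probe along the lines of the "Detect highest CPU bound queries" script can be built on the SQL 2005 dynamic management views. This is my own sketch, not the community script itself:

-- Top 10 cached statements by total CPU time (SQL Server 2005+).
SELECT TOP 10
    qs.total_worker_time AS total_cpu_microseconds,
    qs.execution_count,
    SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
        (CASE qs.statement_end_offset
             WHEN -1 THEN DATALENGTH(st.text)
             ELSE qs.statement_end_offset
         END - qs.statement_start_offset) / 2 + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;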
Enterprise & Standard Edition Feature Comparisons:
Business Intelligence features of SQL Server Enterprise Edition
(that are not in the Standard Edition):
Advanced Analytics
- Linked measures and dimensions
Scalability and Performance
- Partitioned cubes and distributed partitioned cubes
- Auto parallel partition processing
Integration Services
- Data mining query transformation
- Data mining model training destination adapter
- Fuzzy Grouping transformation
- Term Extraction transformation
- Dimension processing destination adapter
- Partition processing destination adapter
Reporting Services
- Infinite clickthrough in ad-hoc reports
- Scale-out operational report configuration**
Data Warehousing
- Data compression**
- Star join query optimizations**
- Change data capture (CDC)**
- Scalable data marts and reporting**
Data Mining
- Parallelism for model processing & prediction
- Advanced Data Mining algorithms with algorithm plug-in API
- Advanced configuration and tuning options for data mining algorithms
** denotes a feature of SQL Server 2008
Introduction to New Data Warehouse Scalability Features in SQL Server 2008
One workaround for the notoriously slow insert times from SSIS into Oracle.
SSIS 2005: how to avoid autocommit when loading into an OLE DB destination.
An example using the Oracle OLE DB driver on the destination side.
I heard from a friend that he was experiencing very slow loads into an Oracle database. He used SSIS 2005. Which is funny, because Oracle Warehouse Builder is available. Anyway.
The cause of the slowness when SSIS loads into a non-Microsoft OLE DB target turned out to be a commit after every insert. Since 'Autocommit' is ON by default in the OLE DB standard, the driver propagates this mode to the database. The only way to turn it off is to switch the OLE DB data source into transactional mode by issuing a BeginTransaction() call.
Borrowing liberally from Jack Corbett's scripts will get you a great custom error handler. No more creating and maintaining error tables for SSIS data flows!
Anyone who has used SSIS error handling has been frustrated that it returns an error code with no description. I decided that I needed a solution. Googling the problem led me to a blog post by Jamie Thomson which had a link to a Script Component that gets the error description. Excellent! While this was a good start, I also wanted to see the data that caused the error. I started by including the data and outputting it to a text file, but, being a database pro, I wasn't happy with that. I started looking for a solution that would allow me to store the data in a table and be usable in each data flow task, and, once again, the internet provided one: this article by jwelch on agilebi.com, which I highly recommend reading.
My solution uses the Input0_ProcessInputRow method and reflection, as shown in jwelch's article, to loop through the columns, building an XML string (name-value pairs) which I insert into a table. I can then query the table to see the errors and the data, and verify that I fixed the errors from earlier loads.
A useful way to build custom data flow scripts, with performance benefits in Oracle and access to other custom data providers and APIs.
I would like to call a stored procedure inside a Script Task.
Are there any examples out there?
Also useful when building a custom error script in SSIS.
Address Columns Generically In a Script Component
When writing script components in SSIS, it's often useful to be able to process the columns without knowing exactly what they are. For example, you may want to loop through all the columns to check for a conditional, like NULL values in the columns. Or you may want to take a set of columns and concatenate them into a single string for output, or output each one as an XML element. In any of these cases, you don't necessarily care about the individual column details.
One option is to use Reflection. This is fairly easy to set up, but it is not the fastest-performing option. To use Reflection, you import the System.Reflection namespace at the top of the Script class and then read the Row object's properties dynamically at run time.
BI Thoughts and Theories : Address Columns Generically In a Script Component
After a bit of frustration, I got this to work with a SqlClient rather than an OLE DB client by parsing the connection string and doing a string replace to remove the provider and other invalid information.
I'm using something similar to create a custom error handler that exports to an XML column. No more error tables for each dimension table.
The script is very similar to the one from the previous article, though it is a bit simpler. The XML values are prepared by iterating through the Input Collection metadata, and using the Reflection classes to read the Row object dynamically. The resulting value is put into the XML column on the output.
The OLE DB Destination simply maps the XML column to an XML column on a database table.
This is a good alternate approach to the previous one, particularly if you need to send the XML to a database instead of a flat file.
Gartner has released the 2008 Magic Quadrant for BI Platforms.
Looks like BI has come of age, according to Gartner. We have leveled off and are coasting rather than innovating.
Note that for the 2008 Magic Quadrant, given the maturity of the market, the Innovation criterion was not rated separately. Instead, it was factored into the Market Understanding and Offering (Product) Strategy criteria.
Magic Quadrant for Business Intelligence Platforms, 2008
SAS is the visionary leader, not much surprise here. Microsoft and Cognos are the overall leaders. Panorama, arcplan, and newcomer Board International are all niche players.
Truly, the last two years have been a period of consolidation. The next few years should see another round of innovation and new product offerings, if the cycles continue.
One interesting product that IBM has returned to the mothership is Cognos: technology that came out of IBM, now returning to the fold.
The one player that's missing from this puzzle? Google.
Companies such as Panorama are now promoting integration with Google technologies, though this article from 2006 seems to promote the idea of Google itself as a BI tool. When I think of BI in terms of the quadrant I think of reporting rather than searching, so Google really doesn't fit into this space just yet - unless you want to count Google Analytics.
It will be interesting to see what the next year brings, with the launch of SQL 2008, PerformancePoint improvements, Microsoft's MDM tools, the integration of Cognos and Applix with IBM software, and whatever happens with SAP and their BusinessObjects acquisition.
My first pick would be an easy-to-use tool that brings relational, unstructured and multidimensional cube data together under one umbrella with point & click or easily maintainable scripts for ETL consolidation and data scrubbing, and a place to put planning numbers and napkin-style forecasts. Preferably with some fancy charts, graphs, dashboards, and output to PDF, Powerpoint & Excel (and Visio?). Without any installation or security hassles, java virtual machines, memory leaks, bloatware, or incompatibilities with other software.
What about using the OCZ NIA as an interface and the NVidia Tesla GPU for calculations? Why not add a little e-Ink to help save some trees?
Too much to ask?
There are three new certifications out of Microsoft... the "Master" series.
IT professionals who hold Microsoft Certified Master technology-based certifications are recognized as holding the highest level of technical certification for Microsoft server products. Find out if you have what it takes to become a Microsoft Certified Master by reading the program overviews and prerequisites.
Microsoft Certified Master Product Overviews and Requirements
A few of the more common things that will probably break your SQL 2008 upgrade if you have not run the SQL 2005 Upgrade Advisor and corrected them (a couple of before-and-after examples follow the list):
Database compatibility level < 80
*= and =* (old-style outer) joins
SET ROWCOUNT with INSERT, UPDATE or DELETE
Old-style RAISERROR <number> 'message' syntax
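To make those concrete, here is a hypothetical before-and-after for a couple of them (all table and message names are made up):

-- Old-style outer join: fails at compatibility level 90 or higher.
SELECT c.Name, o.OrderID
FROM dbo.Customers c, dbo.Orders o
WHERE c.CustomerID *= o.CustomerID;

-- Rewrite with ANSI join syntax:
SELECT c.Name, o.OrderID
FROM dbo.Customers c
LEFT OUTER JOIN dbo.Orders o ON c.CustomerID = o.CustomerID;

-- SET ROWCOUNT with DML is deprecated; use TOP instead:
DELETE TOP (100) FROM dbo.StagingRows;

-- The old RAISERROR <number> 'message' format is deprecated; use:
RAISERROR('Load failed', 16, 1);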
More here...
Deprecated Database Engine Features in SQL Server 2008
Changes that may break a SQL 2000 to 2005 upgrade.
http://msdn.microsoft.com/en-us/library/bb510680(SQL.100).aspx
Time intelligence, a missing component of SQL, is available through scripts like these.
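The linked scripts aren't reproduced here, but most time-intelligence helpers start from a date dimension you populate yourself. A minimal hypothetical sketch (dbo.DimDate and its columns are made up):

-- Fill a simple date dimension one day at a time (SQL 2005 syntax).
DECLARE @d DATETIME;
SET @d = '20000101';
WHILE @d <= '20101231'
BEGIN
    INSERT INTO dbo.DimDate (DateKey, CalendarYear, CalendarQuarter, CalendarMonth, DayOfMonth)
    VALUES (@d, YEAR(@d), DATEPART(QUARTER, @d), MONTH(@d), DAY(@d));
    SET @d = DATEADD(DAY, 1, @d);
END;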
An excellent article about transaction logs in SQL Server. My best tip: always back up the transaction log and clear unused entries. If your log has never been backed up before, back it up twice to be able to shrink it. Never truncate the log; it is much safer to use the backup facility to maintain it.
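A sketch of that tip in T-SQL (database, file and path names are all hypothetical; check sys.database_files for your real log file name):

-- Back up the log; if it has never been backed up, back it up twice.
BACKUP LOG AdventureWorks TO DISK = N'E:\Backup\AdventureWorks_log1.trn';
BACKUP LOG AdventureWorks TO DISK = N'E:\Backup\AdventureWorks_log2.trn';

-- Now the inactive portion is cleared and the file can be shrunk.
USE AdventureWorks;
DBCC SHRINKFILE (AdventureWorks_Log, 512);  -- target size in MB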