Monday, April 30, 2012

SQL Server Security - Site Home - MSDN Blogs

Instead of using CHECKSUM, timestamp, or other methods, hashing data may be a good alternative.

A common scenario in data warehousing applications is determining which source system records to update, which data needs to be loaded, and which rows can be skipped because nothing has changed since they were last loaded. Another possible scenario is the need to facilitate searching data that is encrypted using cell-level encryption, or storing application passwords inside the database.

Data Hashing can be used to solve this problem in SQL Server.
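A minimal sketch of the idea, with hypothetical staging and target tables (`Customer_Staging`, `Customer`) and illustrative column names: compute a hash over the concatenated columns and compare it to the hash stored at the last load to decide whether a row actually changed.

```sql
-- Sketch only: table and column names are hypothetical.
-- SHA2_256 is available from SQL Server 2012; note that HASHBYTES input is
-- limited to 8000 bytes in SQL 2012, so very wide rows need chunking.
-- The '|' delimiter prevents ('ab','c') and ('a','bc') from hashing the same.
UPDATE  tgt
SET     tgt.FirstName = src.FirstName,
        tgt.LastName  = src.LastName,
        tgt.Email     = src.Email,
        tgt.RowHash   = HASHBYTES('SHA2_256',
            CONCAT(src.FirstName, '|', src.LastName, '|', src.Email))
FROM    dbo.Customer AS tgt
JOIN    dbo.Customer_Staging AS src
        ON src.CustomerID = tgt.CustomerID
WHERE   tgt.RowHash <> HASHBYTES('SHA2_256',
            CONCAT(src.FirstName, '|', src.LastName, '|', src.Email));
```

Rows whose hash matches are skipped entirely, which is the point: one indexed comparison instead of comparing every column.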

SQL Server Security - Site Home - MSDN Blogs

Friday, April 27, 2012

SQL 2012 Labs

 

Some additional training content for SQL 2012, from John Weston’s site

MSDN Virtual Lab: SQL Server 2012: Working with the SQL Server 2012 Availability Groups

MSDN Virtual Lab: SQL Server 2012: Creating SQL Server 2012 Tabular BI Semantic Model with PowerPivot

MSDN Virtual Lab Express: SQL Server 2012: Exploring SQL Server 2012 Reporting Services PowerView

MSDN Virtual Lab: SQL Server 2012: Developing a SQL Server 2012 Tabular BI Semantic Model with Visual Studio 2010

Don’t forget, SQL 2012 just RTM’ed; get more info here: http://blogs.technet.com/b/jweston/archive/2012/04/02/sql-2012-rtm-available-for-technet-customers-to-download-today.aspx

SQL 2012: More Free Virtual Labs on MSDN - John Weston Stories, Information, and random thoughts. - Site Home - TechNet Blogs

Single Text Can Wreck Windows Phone 7 Devices | Fox Business

Apparently, sending a certain long message to a WP7 device kills the messaging app and requires a factory reset.  No minor glitch here.  Could carriers detect this message and filter it out? 

This could get ugly.  Back up your WP7 phones!

Thursday, April 26, 2012

Twitter Kills The Majesty of Pleasant Conversation


Words are powerful things.  As of late, I have been working on text mining social data feeds, and investigating how Hadoop, R, Azure and SQL Server 2012 fit into the big picture.  I wrote a SQL Server Integration Services package using a script task that pulls Twitter data from a syndication feed, parses out the words, hash tags, screen names and links, and stores the results in a SQL 2012 (or optionally SQL Azure) database.  Performing some text mining of Twitter status keywords against a selection of “best” and “worst” words in the English language brings back some interesting and slightly depressing results.

I started by entering into the database the Twitter screen names to follow.  I targeted a few Canadian bike companies (don’t ask why).  Each time the package is run, it adds the last 20 tweets, and parses the screen names of users mentioned in each tweet.  And so on.  This recursion builds a very quick six-degrees-of-separation trail and some fairly random discussions. 

Running this process for about 4 days, sometimes 2-3 times per day, produced 5599 tweets.  Originally I was looking at using R and Hadoop to analyze the results, which is a bit like bringing a ballistic missile to a knife fight.  To slice this data with SQL takes only a couple of seconds or less.  Perhaps reading the entire Twitter firehose or analyzing historic tweet data might change the architecture in the future.  For now, things are working pretty well. 

Of a selection of 5599 individual tweets, 9 contain the “best words” and 2135 contain the “worst words” as rated by Vocabula Review.  That’s 38% of the sample with an aura of foolishness or odium, and about 0.2% with an aura of fun and majesty.  The sampling is fairly small, with the top word “valley” coming up only 3 times. 

Another dataset, seeded with a more technology-centric list of Twitter users like Robert Scoble and some Microsoft folks I follow, brought back similar results.  Running this process over the course of a month saved 59,583 tweets containing 796,171 words, links, screen names, emoticons and hash tags.

Of the 796k words, 24,171 came up in the “worst words” filter.  That’s about 3%.  A measly 282 came up in the “best words” filter.  That’s less than 0.04%.

The following Top 5 Words came up.

valley 74
azure 19
simplicity 11
bliss 10
recherche 8

  • Valley makes sense, with Silicon Valley, Napa Valley, and those other west coast valleys being discussed.
  • Azure makes sense, since a steady stream of Windows Azure propaganda continually bubbles from Microsoft.
  • Simplicity comes up a few times when people talk about Apple or iPad.
  • Bliss comes up because of Rob Bliss, a Washington Times columnist, and some comments about cranberry bliss bars.
  • Recherche, well, let’s chalk that up to the fact that some of the best words in the English language are French.  Mon dieu.
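Counts like the ones above can be pulled with a simple join and GROUP BY; the table and column names here are hypothetical stand-ins for whatever schema the SSIS package loads into:

```sql
-- Hypothetical schema: TweetWord holds one row per parsed word,
-- BestWords holds the "best words" list to filter against.
SELECT TOP (5)
       tw.Word,
       COUNT(*) AS Occurrences
FROM   dbo.TweetWord AS tw
JOIN   dbo.BestWords AS bw
       ON bw.Word = tw.Word
GROUP BY tw.Word
ORDER BY COUNT(*) DESC;
```

On a table of under a million rows this is the kind of query that finishes in seconds, which is why Hadoop stayed on the bench.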

With only 140 characters to leverage, you would think that people would use words like “animadversion” or “cachinnation” to provide deep and meaningful expression.  Instead, you get the logorrhea that is the Twitter dataset.

Check out www.vocabula.com to improve your tweets and amaze your followers with fun and majesty.

The BI Software Tool Belt

Excellent list of must-have software for SQL Server and everything else.

My must-have software is:

  • Microsoft OneNote, Live Writer
  • SSMS Tools
  • BIDS Helper
  • WinDirStat

The BI Software Tool Belt

Wednesday, April 25, 2012

SQL Server FineBuild

A wizard for installing SQL?  Yes, it exists…

FineBuild provides 1-click install and best-practice configuration of SQL Server 2012, SQL Server 2008 R2, SQL Server 2008, and SQL Server 2005.

SQL Server FineBuild

Tuesday, April 24, 2012

CQRS & Separation of Church and State

CQRS is a design pattern acronym that stands for Command Query Responsibility Segregation.  I would call this the R/RW pattern, or the separation of Church and State.  In a reporting application design, it would silo the read queries from the write commands. 

At its simplest, it means designing two services for each application interface: one that reads and one that writes. 

That is it. That is the entirety of the CQRS pattern.

CQRS, Task Based UIs, Event Sourcing agh! | Greg Young

Since the bulk of applications have highly concurrent reads and minimal or bulk writes, thinking about this pattern during design is very important.  What if you were able to architect a system where you could spin up unlimited services that provide querying, and a few services that provide writes/updates, queuing them up for later?  Would this improve performance and scalability? What if you could split the data for these systems into unlimited read-only models and a few read/write models?
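A trivial T-SQL illustration of the split, with hypothetical object names: every write goes through a command procedure against the primary database, while queries are served from a separate read-only copy that can be scaled out.

```sql
-- Command side: writes go to the primary, read/write database.
CREATE PROCEDURE dbo.PlaceOrder   -- hypothetical command
    @CustomerID int,
    @Amount     money
AS
    INSERT dbo.Orders (CustomerID, Amount, PlacedAt)
    VALUES (@CustomerID, @Amount, SYSUTCDATETIME());
GO

-- Query side: reads come from a read-only copy (snapshot, log-shipped
-- standby, or AlwaysOn readable secondary), referenced here by name.
CREATE PROCEDURE dbo.GetOrdersForCustomer   -- hypothetical query
    @CustomerID int
AS
    SELECT OrderID, Amount, PlacedAt
    FROM   OrdersReadOnly.dbo.Orders        -- the read-only model
    WHERE  CustomerID = @CustomerID;
GO
```

The interesting part is not the procedures themselves but that the query side can be pointed at as many read-only copies as you can afford to keep in sync.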

What if the data in the read-only models was effortlessly in sync with the read/write models, with minimal latency and lazy consistency?

This is one of the tenets behind Azure and cloud computing in general.

You can do this in a SQL Server environment, albeit not so effortlessly and with some gotchas, using log shipping, restores with undo files, and database snapshots for a read-only picture of the database.  There is some latency, and keeping the databases in sync adds lots of overhead. 
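A sketch of that read-only-copy approach, with hypothetical database and file names: restore log backups WITH STANDBY so the secondary stays readable between restores, or take a database snapshot for a point-in-time read-only picture.

```sql
-- On the secondary: restoring WITH STANDBY keeps the database readable
-- between log restores (readers get kicked out during each restore).
RESTORE LOG SalesDB
FROM DISK = N'\\backups\SalesDB_log.trn'      -- hypothetical backup path
WITH STANDBY = N'E:\SQL\SalesDB_undo.dat';    -- undo file makes it readable

-- Alternatively: a database snapshot gives a read-only picture as of
-- the moment it was created.
CREATE DATABASE SalesDB_Snapshot
ON (NAME = SalesDB_Data,                      -- logical file name of SalesDB
    FILENAME = N'E:\SQL\SalesDB_Snapshot.ss')
AS SNAPSHOT OF SalesDB;
```

The gotchas mentioned above are real: standby restores disconnect readers, and snapshots grow as the source database churns.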

SQL 2012 appears to solve some of the limitations of SQL 2008 R2 log shipping with its AlwaysOn capabilities for rolling out read-only, distributed clusters. It still doesn’t seem as effortless as it should be.

Replication also offers a solution, though it can be brittle and has its limitations and maintenance overhead. 

SANs also offer a high-cost solution for fast bulk loads, with some downtime.

You cannot take SAN snapshots of active SQL Server database files unless you detach the database first. This is very intrusive.

I actually do this for a situation where I need 800GB of data from one server to another every night.

  • Detach the database on ServerA
  • SAN snapshot ServerA drives F:, G:, H:, I:
  • Reattach the database on ServerA
  • Detach the database on ServerB
  • Detach drives L:, M:, N:, O: on ServerB
  • Mount the new SAN snapshots as L:, M:, N:, O: on ServerB
  • Attach the database files from L:, M:, N:, O: on ServerB

It requires an outage on both servers, but it is a very fast way to virtually move the data from one server to the other.
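The detach/attach steps on each server can be scripted roughly like this (database names and file paths are hypothetical, and a real multi-file database would list every mdf/ndf):

```sql
-- On ServerA: detach, let the SAN snapshot the drives, then reattach.
EXEC sp_detach_db @dbname = N'BigDB';
-- ... SAN snapshot of F:, G:, H:, I: happens here ...
CREATE DATABASE BigDB
ON (FILENAME = N'F:\Data\BigDB.mdf'),
   (FILENAME = N'G:\Data\BigDB_1.ndf')
FOR ATTACH;

-- On ServerB: detach, swap the old LUNs for the fresh snapshots, attach.
EXEC sp_detach_db @dbname = N'BigDB';
-- ... dismount L:-O:, mount the new SAN snapshots, then ...
CREATE DATABASE BigDB
ON (FILENAME = N'L:\Data\BigDB.mdf'),
   (FILENAME = N'M:\Data\BigDB_1.ndf')
FOR ATTACH;
```

Nothing is copied over the network; only the attach metadata changes, which is why 800GB can "move" in the time it takes the SAN to present the LUNs.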

http://social.msdn.microsoft.com/Forums/eu/sqldatabaseengine/thread/8f01f55a-66ab-4f96-a7bf-dca10bea64b8

Change data capture is another solution for maintaining multiple concurrent copies of a data stream.  SQL 2012 now has an Oracle CDC feature which improves the integration and reporting story.
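Enabling CDC is a two-step affair, at the database level and then per table; the table name below is hypothetical:

```sql
-- Enable CDC for the current database, then for one table.
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',   -- hypothetical source table
    @role_name     = NULL;        -- NULL = no gating database role

-- Read the captured changes via the generated function; the LSN range
-- bounds the window of changes returned.
DECLARE @from binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_Orders'),
        @to   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM   cdc.fn_cdc_get_all_changes_dbo_Orders(@from, @to, N'all');
```

Each consumer can track its own high-water-mark LSN, which is what makes CDC suitable for feeding multiple concurrent copies of the stream.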

The above solutions are very hammer-based approaches to the solution and don’t necessarily relate to CQRS.  Message queuing, eventing, change data capture and pub/sub systems all seem to fall within the CQRS design pattern, though they are really just an extension to the thought process. 

Chad Meyers has an excellent posting on Laws, Rules, Principles, Patterns, and Practices.

Another set of excellent posts on the concept of CQRS.

http://thinkbeforecoding.com/

There doesn’t seem to be a silver bullet to the solution just yet, though I’m sure someone will get it right sooner or later. The CQRS pattern is merely that, a pattern. The devil is in the details.

Monday, April 23, 2012

Ctrl+E not working for executing statements | Microsoft Connect

Upgrading from Denali CTP1?  Your keyboard shortcuts (CTRL-E, CTRL-R) are probably not working.  Use Options – Keyboard – Reset to fix.

Ctrl+E not working for executing statements | Microsoft Connect

Reloading Agent Settings appears in SQL Agent Logs every 90 seconds | Microsoft Connect

One of the things you may want to do as soon as you install SQL 2012 is disable the auto refresh feature of SQL Agent, which will fill up your agent logs with superfluous messages.

We investigated this issue and it's indeed painful behavior from the user's perspective; we will address this issue in the nearest Service Pack.
For now I can suggest the following workaround:
Go to the registry on your SQL Server machine and navigate to the corresponding SQL instance registry key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.YOUR_INSTANCE_NAME\SQLServerAgent
Change the registry setting called “AutoRegistryRefresh” from 1 to 0.
The side effect of this: if you change any default settings for SQL Agent, you may need to restart SQL Agent to pick them up.
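If you would rather not open regedit, the same change can be made from T-SQL with the undocumented (but long-standing) xp_instance_regwrite, which resolves the instance-specific part of the key path for you:

```sql
-- Undocumented extended procedure; writes to this instance's registry hive.
-- The MSSQLServer portion of the path is rewritten to the actual
-- MSSQL11.YOUR_INSTANCE_NAME key for the instance you run it on.
EXEC master..xp_instance_regwrite
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent',
    N'AutoRegistryRefresh',
    N'REG_DWORD',
    0;
```

Restart SQL Agent afterwards for the change to take effect.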

Reloading Agent Settings appears in SQL Agent Logs every 90 seconds | Microsoft Connect

Exporting via HTML instead of MHTML - CSS SQL Server Engineers - Site Home - MSDN Blogs

How to configure additional output formats (HTML) for Reporting Services.

There was a question on Twitter about how to display a report in HTML instead of MHTML due to some browser issues.  Based on the fact that it was MHTML, I’m assuming we are talking about exporting a report as the default report view is HTML.  First off, if we look at our export options for a report, we see the following:

Exporting via HTML instead of MHTML - CSS SQL Server Engineers - Site Home - MSDN Blogs

Thursday, April 19, 2012

Jonathan Kehayias | Installing and Configuring SQL Server 2012 Distributed Replay

The distributed replay feature of SQL 2012 allows for load testing a sampled workload against a different SQL server, and spinning up multiple workloads to simulate real activity.

Here’s the install instructions.

Jonathan Kehayias | Installing and Configuring SQL Server 2012 Distributed Replay

Monday, April 16, 2012

SQL Server 2012 Command Line Utilities — DatabaseJournal.com

The MAK talks about command line utilities in SQL 2012.

Microsoft provided many command line utilities for SQL Server 2012.  In this article I am going to explain the functionality and use of these command line utilities based on the features they deal with.

SQL Server 2012 Command Line Utilities — DatabaseJournal.com

Also, the details on PowerShell cmdlets for SQL

Friday, April 13, 2012

Paging Function Performance in SQL Server 2012 | Ms SQL Girl

Beware the new fetch feature of SQL 2012.  RBAR in disguise?

Although SQL Server 2012’s new paging function is intuitive for the purpose of paging a data set, it comes at a cost for large volumes of data. Top Batching and Direct ID Batching perform significantly better under similar circumstances, i.e. simple source data with a continuous ID.

Paging Function Performance in SQL Server 2012 | Ms SQL Girl

There is also another bug around identity columns skipping increments after a SQL restart that might affect how paging functions in SQL 2012.  Be sure to test any applications doing server-side paging!
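For comparison, here is what the two approaches from that post look like in practice; the table and column names are hypothetical:

```sql
DECLARE @PageNumber     int = 5,
        @PageSize       int = 50,
        @LastSeenOrderID int = 250;   -- remembered from the previous page

-- SQL 2012 OFFSET/FETCH: concise, but the engine still has to count
-- its way past the skipped rows, which hurts on deep pages.
SELECT OrderID, OrderDate
FROM   dbo.Orders                     -- hypothetical table
ORDER  BY OrderID
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;

-- "Direct ID" batching: seek straight to the key range instead of
-- skipping rows, at the cost of requiring a continuous-ish key.
SELECT TOP (@PageSize) OrderID, OrderDate
FROM   dbo.Orders
WHERE  OrderID > @LastSeenOrderID
ORDER  BY OrderID;
```

The second form is also the one that breaks if your identity column has large gaps, which ties into the restart bug mentioned above.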

Thursday, April 12, 2012

TechNet Log Parser Studio

 

Log Parser Studio is a utility that allows you to search through and create reports from your IIS, Event, EXADB and other types of logs. It builds on top of Log Parser 2.2 and has a full user interface for easy creation and management of related SQL queries.

TechNet Log Parser Studio

Wednesday, April 11, 2012

Data Warehousing and Business Intelligence

 

Data Warehousing and Business Intelligence

Vincent Rainardi presents an excellent resource for a Business Intelligence project, with plenty of tips and tricks to succeed when building a data warehouse or reporting solution.

Monday, April 09, 2012

Aaron Bertrand : Two bugs you should be aware of

Questioning whether to use SQL 2012 as a production environment?  There are a few critical issues with the current platform that may make you want to think about when SP1 is coming out…

This one is critical for those that think identity should be an auto-incrementing sequential key.

Just to add to your list: there is one more critical bug, wherein if you define an identity column on a table and insert some values, then restart SQL Server, insert some more values, restart again, and insert into the table once more, there are large gaps in the identity values.
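A quick illustration of the workarounds being discussed for SQL 2012; the table and sequence names are hypothetical, and the trace flag is the one commonly cited for this behavior:

```sql
-- After an unclean restart, IDENTITY values can jump (reportedly by the
-- cache size, e.g. 1000 for an int identity column).
CREATE TABLE dbo.Demo
(
    ID      int IDENTITY(1,1) PRIMARY KEY,
    Payload varchar(10)
);

-- Mitigations discussed for SQL 2012:
-- 1. Start the instance with trace flag -T272 to disable identity caching.
-- 2. Or generate keys from a SEQUENCE declared WITH NO CACHE instead.
CREATE SEQUENCE dbo.DemoSeq AS int
    START WITH 1 INCREMENT BY 1 NO CACHE;

INSERT dbo.Demo (Payload) VALUES ('row');   -- identity-generated key
SELECT NEXT VALUE FOR dbo.DemoSeq;          -- sequence-generated key
```

Both mitigations trade away the performance benefit of caching, so test before applying them broadly.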

Aaron Bertrand : Two bugs you should be aware of

This is Sparta...n SQL!: Scripting out statistics

An interesting technique for building a VSDB (Very Small Database) with a very large query plan profile.  If you have a huge database and don’t want to do performance testing of procedures in production, mirroring statistics could be a way to simulate this activity using tiny or even empty tables.

I recently read an excellent post by Kevin Kline regarding scripting out histograms from one database and applying them to another.  What would have made it better is if there was a way to automate it.  And so I pulled out PowerShell and got to work. 
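The moving parts, with hypothetical table and statistics names: inspect the histogram on the source, then apply the scripted statistics blob on the target.

```sql
-- Inspect the histogram you want to clone on the source database.
DBCC SHOW_STATISTICS (N'dbo.BigTable', N'IX_BigTable_Key');  -- hypothetical

-- SSMS's Generate Scripts wizard ("Script Statistics" set to
-- "Script statistics and histograms") emits statements like the one
-- below, where STATS_STREAM is an opaque binary blob of the histogram.
UPDATE STATISTICS dbo.BigTable (IX_BigTable_Key)
WITH STATS_STREAM = 0x0100...;   -- blob truncated; produced by the wizard
```

With the statistics in place, the optimizer on the tiny clone builds the same plans it would against the full-size production tables.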

This is Sparta...n SQL!: Scripting out statistics

There should really be a tool that clones the schema of a database and the statistics of a database while clearing out the data or subsets of the data.