Thursday, October 29, 2015

Microsoft goes All-In on Analytics

It has been a busy few months in the data analytics space, especially at Microsoft.  To say the least.  Next year will be even busier, when many of these solutions converge further.

Microsoft announced General Availability of PowerBI, and brings many of its cloud features to the desktop.

Jen Underwood reveals the splendor in PowerBI Desktop.  PowerBI Desktop can be published to a Pyramid Analytics server for on-prem solutions.  PowerBI isn't just in the cloud anymore.

Microsoft acquires Datazen, and some multidimensional dashboards come out of a DataZen server in SQL 2016.

SQL 2016 with R & JSON integration.  Hold up, Revolution Analytics is coming to SQL?  JSON in SQL?  No more FOR XML madness?

New JSON support in T-SQL, the SQL Server query language, includes OPENJSON to parse JSON text and return tabular data, JSON_VALUE and JSON_QUERY for querying JSON data, and ISJSON for validating JSON-formatted text.
The syntax for the R external functions seems suspiciously familiar to what Apache Hive provides..

PIP INSTALL JUPYTER

ML Blog Team talks about Jupyter Notebooks for Azure ML Studio and Nitin Mehotra gets us started with Zeppelin Notebooks and Apache Spark on Azure HDInsight.

Michael Erasmus gives us an interesting mashup of Python and SQL running in a Jupyter notebook.
http://michaelerasm.us/redshift-and-ipython-bffs/ 

What's nice about this is the inline SQL statement, effectively removing the need for quoting & appending multi-line SQL together.  What else is useful is the ability to capture the results of the SQL query as a data frame for analysis and visualization.

A similar method might be used with MDX queries and Analysis Services OLAP cubes.
http://bkanuka.com/articles/python-connect-to-olap/

Perhaps if you have some web services connected to Excel Services or PowerPivot, this method could get really useful for exposing your data.  I would imagine writing some DAX queries nicely formatted in Jupyter python and returning data frames to R would be of interest for a couple people... well maybe just me. ;)

Speaking of DAX...

Kasper de Jonge writes about SQL 2016 includes over 50 new DAX functions
http://blogs.msdn.com/b/analysisservices/archive/2015/09/02/what-s-new-in-microsoft-sql-server-analysis-services-tabular-models-in-sql-server-2016-ctp-2-3.aspx

Some of the cool ones are Natural Inner Join,

CONCATENATEX(Employees, [FirstName] & “ “ & [LastName], “,”)
Returns "Alan Brewer, Michael Blythe"

SUBSTITUTEWITHINDEX (Left Semi Join in PowerPivot?)

ADDMISSINGITEMS (What do you mean we have no sales for March in Tuvalu? We need to see a March column for Tuvalu on the report!)

Only 46 functions left to check out...


Tuesday, October 13, 2015

Unit Testing and Mocking T-SQL Objects with tSQLt

tSQLt is an open-source framework for unit testing SQL Server code.  You can get it here.  The Red Gate SQL Test application provides an installation, management framework, support, and testing harness for SQL Server Management Studio.  ApexSQL has also announced a competing offering, to be released in 2016.

The most useful functionality, in my opinion, is the ability to mock or fake objects.  See Greg Lucas' post "A mock too far?" for some useful scenarios.  There are many other tSQLt tutorials and examples on the DataCentricity web site for Test Driven Database Development (TD3) using mocks.

What do you mean, mock or fake objects?
When running a test or test suite with a SQL testing framework like tSQLt, you arrange your test, act on the item under test, and assert your results match expectations.  The framework handles the assertions, teardown and rollback of any items changed by the test.  What mocking allows for is the faking of dependent items not central to the purpose of the test.  It provides a shell or template object that is a clone of an existing object without the clutter of reams of data, foreign keys or constraints unrelated to the test.

tSQLt allows you to fake tables.   Behind the scenes, tables are renamed, constraints are removed and all columns are set to nullable.  This allows for you to assert / check only the columns and rows of data required for your test.  Instead of sifting through millions of rows looking for your changes, you can assert that your input row was added to the destination table with a simple count.  You can create an expected table, and use tSQLt's AssertEqualsTable to compare that the columns you are testing contain your expected results.  You can take your expected results from a table, fake the table, then ensure that the results are repeated after execution of the object under test.

It's kind of like Where's Waldo, with only a white page and Waldo sitting in the middle of it.  It becomes very obvious when things are and aren't working, since the noise has been removed from your test.

tSQLt allows you to spy on procedures.  Behind the scenes, procedures are emptied out, and only the shell or "contract" remains.  The parameters of the procedure remain, with an insert into a logging table.  This allows you to inspect what is being sent to procedures to ensure the item under test is passing along the expected parameters.  It also allows you to inject code into dependent procedures, perhaps to test functionality or changes that would exist in the future once that other team gets finished with their deliverable.

These types of actions follow the Test Double pattern.

How can I use this when debugging SQL code?
Fake out and spy on all dependent objects to get a better understanding of how the procedure under test is acting against them.

I have found Faking and Spying are also useful when debugging and understanding poorly documented procedures, since you can interactively view your data during a test, roll back the changes, and visualize what occurred in the destination table.  If you are working with a large, legacy procedure or set of procedures, this tool becomes invaluable for tracing data lineage and transformation logic.   A side effect is you'll probably end up with some automated unit tests on your legacy code.

Martin Fowler describes these mocks or fakes as a kind of Stunt Double, taking the bombardment of your tests while your true actor is sitting on the sidelines.



In the past, I used synonyms for similar purposes, however creating fakes for testing rather than replacing synonyms is much a much easier task with tSQLt.

If you are using SQL Server for more than just a big dumb ORM storage or CRUD layer, it's worth looking at a unit testing framework such as tSQLt and the ecosystem around it.




Sunday, September 13, 2015

Reporting Services 2016

After a couple years hiatus, the SSRS Team (via Riccardo Muti) has published some highlights of SQL Server Reporting Services 2016 CTP 2.3

http://blogs.msdn.com/b/sqlrsteamblog/archive/2015/09/02/what-s-new-in-reporting-services-in-sql-server-2016-ctp-2-3.aspx

Most of the highlights are for usability and appearance. No Metro tiles?
http://sqlkover.com/report-builder-changes-in-sql-server-2016/

Rather than SSRS features, my bigger interests within SQL 2016 Enterprise are Polybase and the SQL-R functionality.  Polybase will allow you to join your Data Lake with the Data Ocean that is your Enterprise Data Warehouse, and all things in between.

"PolyBase only works within PDW for now, but later it might be added to SQL Server (but there are no plans for that).  PolyBase relies on the Data Movement Service (DMS) in PDW, and DMS does not exist in SQL Server"
http://www.jamesserra.com/archive/2014/02/polybase-explained/

Apparently things change.  For those companies having file shares filled with structured, semi-structured, and unstructured data, replicating that data to a Hadoop HDFS cluster can allow you to immediately join it to SQL Server without ETL.

Or so I hear... :)

This might be a game-changer for some organizations.

Andrew Peterson spent some time with Polybase on his blog.
http://realizeddesign.blogspot.ca/

A few years ago I built a solution that utilized nothing but Analysis Services views to access data inside an Analysis Services multidimensional cube solution.  Some of those views actually queried Excel spreadsheets, Access Databases and CSV files instead of SQL Server.  At the time I thought it was kind of a mashup hack, though cool nonetheless due to lack of ETL effort and instant data availability.

Glad to see MS is catching up with this idea of schema-on-read, query-anywhere, anything.  You're still going to need to create external, structured views on each file format to a specific Hadoop cluster, which seems rather linked-server'y.  There are some other limitations to be aware of.  You'll also need access to a Hadoop cluster of course.  You do have one of those, right?

The Hybrid BI story is just beginning...
http://sqlmag.com/blog/what-coming-sql-server-2016-business-intelligence


Wednesday, September 09, 2015

SQL Treeo adds folders to your SSMS

Microsoft, why can't you have packages like Oracle or at least a folder structure in Object Explorer?

SQL Treeo brings sanity to monolithic databases and multi-tenant servers, capturing objects into folders.

http://www.sqltreeo.com/wp/

Saturday, August 08, 2015

SQL Server Collation Settings, Quelquechose de Vraiment Grav

Saw this post about a Windows 10 error message a couple weeks ago on Facebook and laughed my head off.

http://bgr.com/2015/07/30/windows-10-something-happened-error-message/

Microsoft has gone ahead and adopted the Sad Mac philosophy of error handling.
From a consumer's perspective, I guess this yodel is a tad better than Catastrophic Failure and reassures you that something might happen next.

From a developers perspective, you really should have included the bugcheck code, stack trace, or at least some other kind of error identifying the method call you are finally giving up on.

If not, one fix for fellow Canadians installing Windows 10 will be to ensure you meld with our neighbours in the south by selecting English (US) as the regional locale.  You selected French (Canada), didn't you?

http://www.technewstoday.com/25375-how-to-fix-something-happened-error-during-microsoft-windows-10-installatio/

Zut alors.  Didn't SQL have this issue a few years ago?  And Analysis Services?

Sorry MS, as a Windows 10 Insider I did not catch this bug.  Actually, I didn't really catch any bugs.  My Windows 10 install over the last year was a pleasant surprise, and I expect my main desktop to be upgraded soon.  Perhaps my only suggestion would be that if English (Canada) is selected, then error messages should be bilingual.  Seems like a waste of the second line of Something happened.

Instead of upgrading him, a family member who I provide tech support for asked me to just hide the "Upgrade to Windows 10" icon for him instead.  From a support perspective I though that was wise for now....  I used the scheduled tasks and registry edit methods.

http://www.tenforums.com/tutorials/6596-get-windows-10-icon-remove-taskbar-windows-7-8-1-a.html

A few times in the last year I've encountered various collation issues related to Windows Regional Settings and SQL Server installs, and the defaults MS uses for products like Sharepoint and Reporting Services.

To switch over collation of a single database, you can try this at your own risk.

1. ALTER the database to change the collation (after killing all processes that are using the database and dropping all user defined functions with varchar max that bind to the collation)

2. ALTER the columns (this script may need to be modified to include schema id's when dealing with multiple schemas.  This script was tested on SQL 2008 R2.


SELECT 'ALTER TABLE ' + o.NAME + ' ALTER COLUMN '
       + c.NAME + ' ' + t.NAME + '('
       + Cast(c.max_length AS NVARCHAR)
       + ') COLLATE ' + ' Latin1_General_CI_AS ' + CASE WHEN c.is_nullable = 1
       THEN
       'NULL' ELSE 'NOT NULL' END
       + ';'
--select o.name, c.name, t.name, c.collation_name, c.is_nullable, t.name, c.max_length
FROM   sys.objects o
       INNER JOIN sys.columns c
               ON c.object_id = o.object_id
       INNER JOIN sys.types t
               ON t.user_type_id = c.user_type_id
WHERE  c.collation_name IS NOT NULL
       AND o.type = 'u' 

Wednesday, July 29, 2015

Ninite - mass installer

https://ninite.com/ is a pretty awesome installer to build up a new PC.  Fresh install of Windows 10?  Check off Classic Start Menu and whatever optional software you want...

Ninite will

  • start working as soon as you run it
  • not bother you with any choices or options
  • install apps in their default location
  • say no to toolbars or extra junk
  • install 64-bit apps on 64-bit machines
  • install apps in your PC's language or one you choose
  • do all its work in the background
  • install the latest stable version of an app
  • skip up-to-date apps
  • skip any reboot requests from installers
  • use your proxy settings from Internet Explorer
  • download apps from each publisher's official site
  • verify digital signatures or hashes before running anything
  • work best if you turn off any web filters or firewalls
  • save you a lot of time!

Wednesday, February 11, 2015

Kerberos & Web Site

Sometimes all you need is one tool for a tough job.  If you're setting up Kerberos, that tool is the Kerberos Delegation Config & Report.

http://www.iis.net/downloads/community/2009/06/delegconfig-v2-beta-(delegation-kerberos-configuration-tool)

http://blogs.iis.net/brian-murphy-booth/archive/2007/03/09/the-biggest-mistake-serviceprincipalname-s.aspx