Thursday, October 29, 2015

Microsoft goes All-In on Analytics

It has been a busy few months in the data analytics space, especially at Microsoft.  To say the least.  Next year will be even busier, when many of these solutions converge further.

Microsoft announced General Availability of PowerBI, and brings many of its cloud features to the desktop.

Jen Underwood reveals the splendor in PowerBI Desktop.  PowerBI Desktop can be published to a Pyramid Analytics server for on-prem solutions.  PowerBI isn't just in the cloud anymore.

Microsoft acquires Datazen, and some multidimensional dashboards come out of a DataZen server in SQL 2016.

SQL 2016 with R & JSON integration.  Hold up, Revolution Analytics is coming to SQL?  JSON in SQL?  No more FOR XML madness?

New JSON support in T-SQL, the SQL Server query language, includes OPENJSON to parse JSON text and return tabular data, JSON_VALUE and JSON_QUERY for querying JSON data, and ISJSON for validating JSON-formatted text.
The syntax for the R external functions seems suspiciously familiar to what Apache Hive provides..

PIP INSTALL JUPYTER

ML Blog Team talks about Jupyter Notebooks for Azure ML Studio and Nitin Mehotra gets us started with Zeppelin Notebooks and Apache Spark on Azure HDInsight.

Michael Erasmus gives us an interesting mashup of Python and SQL running in a Jupyter notebook.
http://michaelerasm.us/redshift-and-ipython-bffs/ 

What's nice about this is the inline SQL statement, effectively removing the need for quoting & appending multi-line SQL together.  What else is useful is the ability to capture the results of the SQL query as a data frame for analysis and visualization.

A similar method might be used with MDX queries and Analysis Services OLAP cubes.
http://bkanuka.com/articles/python-connect-to-olap/

Perhaps if you have some web services connected to Excel Services or PowerPivot, this method could get really useful for exposing your data.  I would imagine writing some DAX queries nicely formatted in Jupyter python and returning data frames to R would be of interest for a couple people... well maybe just me. ;)

Speaking of DAX...

Kasper de Jonge writes about SQL 2016 includes over 50 new DAX functions
http://blogs.msdn.com/b/analysisservices/archive/2015/09/02/what-s-new-in-microsoft-sql-server-analysis-services-tabular-models-in-sql-server-2016-ctp-2-3.aspx

Some of the cool ones are Natural Inner Join,

CONCATENATEX(Employees, [FirstName] & “ “ & [LastName], “,”)
Returns "Alan Brewer, Michael Blythe"

SUBSTITUTEWITHINDEX (Left Semi Join in PowerPivot?)

ADDMISSINGITEMS (What do you mean we have no sales for March in Tuvalu? We need to see a March column for Tuvalu on the report!)

Only 46 functions left to check out...


Tuesday, October 13, 2015

Unit Testing and Mocking T-SQL Objects with tSQLt

tSQLt is an open-source framework for unit testing SQL Server code.  You can get it here.  The Red Gate SQL Test application provides an installation, management framework, support, and testing harness for SQL Server Management Studio.  ApexSQL has also announced a competing offering, to be released in 2016.

The most useful functionality, in my opinion, is the ability to mock or fake objects.  See Greg Lucas' post "A mock too far?" for some useful scenarios.  There are many other tSQLt tutorials and examples on the DataCentricity web site for Test Driven Database Development (TD3) using mocks.

What do you mean, mock or fake objects?
When running a test or test suite with a SQL testing framework like tSQLt, you arrange your test, act on the item under test, and assert your results match expectations.  The framework handles the assertions, teardown and rollback of any items changed by the test.  What mocking allows for is the faking of dependent items not central to the purpose of the test.  It provides a shell or template object that is a clone of an existing object without the clutter of reams of data, foreign keys or constraints unrelated to the test.

tSQLt allows you to fake tables.   Behind the scenes, tables are renamed, constraints are removed and all columns are set to nullable.  This allows for you to assert / check only the columns and rows of data required for your test.  Instead of sifting through millions of rows looking for your changes, you can assert that your input row was added to the destination table with a simple count.  You can create an expected table, and use tSQLt's AssertEqualsTable to compare that the columns you are testing contain your expected results.  You can take your expected results from a table, fake the table, then ensure that the results are repeated after execution of the object under test.

It's kind of like Where's Waldo, with only a white page and Waldo sitting in the middle of it.  It becomes very obvious when things are and aren't working, since the noise has been removed from your test.

tSQLt allows you to spy on procedures.  Behind the scenes, procedures are emptied out, and only the shell or "contract" remains.  The parameters of the procedure remain, with an insert into a logging table.  This allows you to inspect what is being sent to procedures to ensure the item under test is passing along the expected parameters.  It also allows you to inject code into dependent procedures, perhaps to test functionality or changes that would exist in the future once that other team gets finished with their deliverable.

These types of actions follow the Test Double pattern.

How can I use this when debugging SQL code?
Fake out and spy on all dependent objects to get a better understanding of how the procedure under test is acting against them.

I have found Faking and Spying are also useful when debugging and understanding poorly documented procedures, since you can interactively view your data during a test, roll back the changes, and visualize what occurred in the destination table.  If you are working with a large, legacy procedure or set of procedures, this tool becomes invaluable for tracing data lineage and transformation logic.   A side effect is you'll probably end up with some automated unit tests on your legacy code.

Martin Fowler describes these mocks or fakes as a kind of Stunt Double, taking the bombardment of your tests while your true actor is sitting on the sidelines.



In the past, I used synonyms for similar purposes, however creating fakes for testing rather than replacing synonyms is much a much easier task with tSQLt.

If you are using SQL Server for more than just a big dumb ORM storage or CRUD layer, it's worth looking at a unit testing framework such as tSQLt and the ecosystem around it.