Monday, November 03, 2014

DOS tricks & tips

Because sometimes simplicity is best.

type *.csv >> data.txt - consolidate multiple files to a single file
copy a.csv + b.csv c.csv - consolidate multiple files to a single file 

http://www.powerpivotpro.com/2012/09/combine-multiple-worksheetsworkbooks-into-a-single-powerpivot-table/

Date formatting

Friday, July 04, 2014

The Myth of the Full-stack DBA

Andy Shora recently published a great article about "The Myth of the Full-stack Developer."  From his posts and tweets, I peg Andy as a front-end developer specializing in CSS and Javascript frameworks.

Andy's comment about the 30-point spider diagram really resonated with me.

Where do your web development skills lie?

The term Full-Stack applies to knowing different layers of architecture, of which the database (SQL/NoSQL/SomeSQL) can sit as one of those layers.  However, if you were consider yourself a "Full-Stack" SQL Server DBA, you might know the details of some or more of the following components of SQL:
  1. SQL Server
    1. Transact-SQL
    2. Backup/Restore
    3. Replication
    4. Log Shipping
    5. Query Tuning and Indexing
    6. Policies and Resource Governor
    7. High availability
    8. In Memory Querying
  2. SQL Agent
    1. Jobs
    2. Alerts
  3. Reporting Services
    1. Sharepoint Integrated Mode
    2. Native Mode
  4. Integration Services
    1. Importing/Exporting/Transforming Various Data Sources
    2. Connection Types
    3. Toolbox Components
    4. Project /Package Deployment Methods
  5. Analysis Services
    1. MDX/DMX/DAX
    2. Tabular
    3. Multidimensional
    4. Sharepoint
  6. Powershell Scripting
  7. TFS Integration
And of course the various ways to setup, configure, debug, maintain, monitor, license, upgrade and secure above tools along with the associated hardware and software.  Perhaps your job description may even include mentoring developers on best practices on interfacing with the layer that is the "database".

By no means an exhaustive list and depending on your environment your role as a DBA could include Active Directory maintenance and Voicemail support.

If you're a Microsoft consultant or a partner of Microsoft, you're probably intimately familiar with the old Microsoft Hamburger and its associated toppings.  

(RIP standalone PerformancePoint Server)

If you have expertise in one or more of the grills at the bottom of the stack, (SAP, Oracle, etc) your job just got a bit more important.

For some of those components in the stack, hopefully you're using best-of-breed third party tools from companies like ApexIdera and Red-Gate to improve your experience.  The ecosystem of helper apps is vast and wide.  Many feature limitations of products are considered "partner opportunities" by software companies.  If you want to be a Full-Stack DBA, you'll want to know what tools keep your stack upright.

If you were to consider yourself an expert on the internals of SQL, you might know about all of the components of this diagram, and take great pleasure in using tools like Internals Viewer to see how SQL arranges its storage and understanding the details of how GAM, SGAM & PFS Pages work.



Or you may have had a "crash" course on above internals while troubleshooting why a DBCC CheckDB job failed and your customers can't access their data.  It's best to be proactive about these things, start studying the diagram!

The ScaryDBA's recent post about The Curse Of Relational Databases also comments about The Stack of Information Technology in general, and how deep, wide, and unfathomable it is.  The comments provide  insight into the mindset of the community, and one comment mentions the idea of "Technical Debt".  A topic perhaps worth exploring into detail about in a future post, since this one is getting a tad rambling and epic.

Yesterday I attended a Toronto Hadoop User Group Meetup, with a discussion from the co-author of a book on YARN.  The topic was a deep-dive into the internals and features of the resource scheduler and application manager that is YARN.  Fascinating stuff (I'm not kidding!) that shows how much effort goes into the foundation of a scalable distributed data solution.

Speaking of deep, wide, and unfathomable....

The Big Data Developer Toolkit  

The Apache Open-Source Developer Toolkit

Hopefully your DBA stack expertise includes some Linux knowledge.





Monday, June 23, 2014

Choosing a model - Tabular vs. Multidimensional

Understand the limitations of the Tabular model and when Multidimensional model implementation is more appropriate.

It isn't as simple as Distinct Count = Tabular, Parent-Child = Multidimensional...
http://msdn.microsoft.com/en-us/library/hh994774.aspx
http://msdn.microsoft.com/en-us/library/hh212940.aspx

If you're still using Office 2007 or SQL 2008 R2, it might be a bit easier to make a choice...

Tuesday, June 17, 2014

Turing SQL Server - Is your power model sucking performance?

As stated in this hilarious Knowledge Base article, changing your power options to “High Performance” (an option Microsoft conveniently hides under a “more” menu to prevent you from accidentally finding it) resolves this issue by using more power all the time. In return, you need to accept that global warming is entirely your fault. Amazingly, no performance analyser of SQL server suggests you change this option.

Sometimes the simplest things can drastically change performance.  There should be an interview-style wizard or report for those who are new to setting up a server that would interrogate WMI to find out which settings may be sub-optimal for your usage scenario.

http://www.masterofmalt.com/software-development/blog/why-microsoft-windows-server-2012-produces-slow-and-inconsistent-sql-query-speeds/

cURL Super Powers

cURL is an extremely powerful tool that you can use, available on many platforms including Linux or Windows (via tools like Cygwin). It's powered by libcurl. Wget is another like-minded utility to download files recursively.

Here's an example to parallel download all 150+ Microsoft eBooks available from this site.
curl http://blogs.msdn.com/b/mssmallbiz/archive/2013/06/28/almost-150-free-microsoft-ebooks-covering-windows-7-windows-8-office-2010-office-2013-office-365-office-web-apps-windows-server-2012-windows-phone-7-windows-phone-8-sql-server-2008-sql-server-2012-sharepoint-server-2010-s.aspx | grep -o -E "http://ligman.me/[0-9A-Za-z]+" | uniq | xargs -P 5 -n 1 wget --content-disposition

So what's it doing?
First, download (or curl) the url specified.
Second,search the second URL for a matching regular expression.
Third, uniq(ue) the resultset.
Fourth, xargs to deal with the large parameter list
Fifth, wget to download the resulting URLs in a parallel fashion.

Why use both tools when they appear to do similar things?
http://stackoverflow.com/questions/636339/what-is-better-curl-or-wget
If you're a Windows user, and can't install Cygwin, Powershell can do some of the things that wget can do.
http://superuser.com/questions/25538/what-is-the-windows-equivalent-of-wget
(new-object System.Net.WebClient).DownloadFile('http://www.xyz.net/file.txt','C:\tmp\file.tx‌​t')
Here's a ton of GNU manuals, including wget.

Wednesday, April 02, 2014

Analysis Services 2012 and 2014 Performance Guide

Microsoft has released an Analysis Services 2012 and 2014 Performance Guide, 111 pages of multidimensional performance tuning goodness.

Is there a future for the multidimensional cube at Microsoft?  Sounds like it may stick around for awhile longer.... watch for improvements in SQL vNext.

Performance is important, and not just for improving business processes. Licensing rules for SQL 2014  will probably cause server administrators to try and keep their server utilization at full capacity.  Do more with less.

A similar whitepaper exists for PowerPivot and the tabular model in SQL 2012.

Tuesday, April 01, 2014

SQL 2014 Released, so what's the big deal?

SQL 2014 is released, in 2014 no less!  You can download the bits from Microsoft or get your free Azure trial (with a cap to ensure you don't get charged anything) to try it out.  Perhaps SQL Spade will help you to install it.  Or just browse the new features on Virtual Labs.

Microsoft has officially killed support for SQL 2005 compatibility mode, so before upgrading from prior versions be sure you've upgraded the compatibility mode of your databases.

The database engine has various improvements, fixes and tweaks to query plan optimization and partitioning.  Hopefully the query engine is tuned for the better.

There are some additional options for impersonation and separation of duties within the database.

Reporting Services, Analysis Services, and Integration Services, Data Quality Services, Master Data Management Services are mostly left in the dust.  Sigh....

Some of the cloud integration stories have improved dramatically.  Backup to URL.  Migrate to VM.  Hybrid blob storage.  AlwaysOn to Azure.  Azure's ExpressRoute functionality to move the public cloud to a private direct connection is now available.  This should improve adoption for those not wanting to transfer data over the public internet.  And provide companies with a secure form of offsite backup, DRP and replication to Azure.

Perhaps the biggest story with SQL 2014 is its In-Memory (formerly Hekaton) engine.  Or what I like to call DBCC PINTABLE+++.  This, along with its ability to use solid state storage for the SQL Buffer Pool, should provide for much faster query times when it's used correctly.  Think columnstore indexes but for an entire table, and updateable, and not a separately stored object.  Columnstore indexes in SQL 2014 can now be created as updateable, clustered objects that contain the data itself, in addition to the indexes.  In Enterprise Edition, anyway...

Note for SQL Server 2005:
This (DBCC Pintable) functionality was introduced for performance in SQL Server version 6.5. DBCC PINTABLE has highly unwanted side-effects. These include the potential to damage the buffer pool. DBCC PINTABLE is not required and has been removed to prevent additional problems. The syntax for this command still works but does not affect the server.


Ahhh, memories of data corruption, and arguments with other technology experts that there is no such feature post-SQL 2000 for pinning tables in memory.  "But I typed DBCC Pintable in and it came back with no errors!"   Whenever I want to do something that bypasses the internal workings of a database, I like to visit Ask Tom.   Even if you're not using Oracle, his site is a very fun an informative read.

Now back to SQL and the fine print.

Some of the licensing challenges to be aware of.  One challenge, where can you get some budget for 128 GB of memory for your SQL Server Standard boxes, and 512GB of flash memory for your buffer pool?  How about 4 terabytes on Datacentre with Enterprise edition, and some of those larger tables pinned in memory?

Fun times ahead with in-memory databases.



Friday, March 28, 2014

OCD Programming

A great StackOverflow answer using the railroad analogy to explain Branch Processing and why processors love predictability.

http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array

I think the response that won the votes was because of the picture in the response, and the answer "Consider a railroad junction" which succinctly describes the problem you may have when dealing with unsorted data and path analysis.

I am consistently amazed at how much difference a good path makes in programming, especially with databases.  Ensure your tables all have primary keys.  Performance improves.  Add covering indexes for long running queries.  Performance improves.

These two things can sometimes fix many of the performance issues living within the database layer.

Clean up your code.  Ensure state is kept to a minimum, and conversations with slower systems such as databases and disk are batched up and cached.

And apparently sort anything coming down the pipe before trying to analyze it.

Predictability and patterns in your code, and being obsessive-compulsive when it comes to clean coding seems to mean a lot.  To some compilers, anyway.

Others will ignore all your hard work and try to determine the best way to solve your problem.  Seems to be the norm for the Oracle instance I'm currently using.

If you use a tool such as Windows Sysinternals process explorer, you can see what kind of effect these types of incorrect paths have on system performance.  You'll notice a lot of Access Denied, Key not Found or File not Found messages.  Systems could be so much faster if it just found the correct registry entry, filename, or internet address the first time, without going down multiple paths blindly.   Perhaps predictive analytics will soon solve this issue, so that each time you use your computer it will intelligently speed up your daily tasks.

Or perhaps I'm just being obsessive.

Tuesday, February 25, 2014

Modern IE

Great site from MS to download VMs for IE testing.

http://modern.ie/en-us

As a side effect, you can get VMs for Linux & Mac OS of all major MS operating systems!


Thursday, February 06, 2014

Windows 2012 Command Prompt

Gates Spends Entire First Day Back in Office Trying to Install Windows 8.1 : The New Yorker:

The New Yorker should have done their fact-checking.  From what I have read Bill Gates is known for his trucker mouth when dealing with staff.

I downloaded 4GB VM of Windows Server 2012 trial yesterday to brush up on my Microsoft O/S skills and missed the fact it was the "core" edition.  I was back in 1989 heaven when the DOS prompt came up on startup, after a nice UI for the login prompt.

Took me a few searches to figure out the UI wasn't broken, and some hacker-like Powershell to install the UI components.   Ended up failing since I guess I need to download the install ISO too.

I wonder what Bill Gates would think when his 31-year history of DOS operating systems did a full circle with Win 2012 Server.  Who would have thought in 2014 I would be staring at a command prompt typing "help".



'via Blog this'