The decision to start from scratch with new hardware can be uncomfortable, and understandably so. Fortunately, the process was made easier for us by the Wintel and SAN engineers, by virtue of two major projects they were coordinating.
The first was a move to a new data center to take advantage of more economical hosting arrangements. Rather than schedule a downtime window to move our current hosts, we opted to leave the existing gear as-is and build the proper gear in the new hosting facility. The second was an initiative to adopt a converged infrastructure model. This necessitated changing hardware and storage vendors to a new corporate standard, both to benefit financially from economies of scale and to draw on the pooled knowledge of the engineers.
The remaining, and still the largest, challenge was to properly size the hardware to address current and anticipated end-user needs for its entire service life. Fortunately, Fast Track has quantified the approach in a way that helps you first determine whether your application fits its design model and then configure a solution that will scale up smoothly for the life of the hardware.
What is Fast Track?
There is an enormous amount of literature and research available about the premise behind the Fast Track initiative. Your one-stop shop to get started would be James Serra’s blog. James, a Microsoft evangelist, has blogged extensively about Fast Track and has links to all versions of the methodology over the years, each iteration taking advantage of new SQL Server features. He also has links to each major vendor’s design solutions adhering to the Fast Track architecture guidelines.
The cornerstone of the initiative is a balanced hardware design, in which CPU, memory, and disk subsystem throughput are optimally matched to the data models and IO patterns of data warehousing.
OLTP access patterns gravitate toward normalized data models and hundreds or thousands of transactions per minute, with data access that generates random IO. Data warehousing, by contrast, lends itself to star-schema or comparable data models based on the philosophies of Ralph Kimball or William Inmon, where large fact tables are scanned to aggregate data.
The sequential scanning of fact tables into memory, whether for an ad-hoc query by a power user or for consumption by Analysis Services in processing an OLAP cube, depends not on the number of IOPS the disk subsystem can deliver but on its throughput.
While random IOs are of the 8k page variety used for discrete lookups, sequential IOs from scanning a table can grow to 64k extents (8 contiguous 8k pages). If you’re running SQL Server Enterprise edition, the read-ahead feature of the database engine, which anticipates your need to read to the end of a table or index, can increase that to up to 512k per IO.
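You can watch read-ahead at work with SET STATISTICS IO. Here is a minimal sketch, assuming a hypothetical dbo.FactSales table:

```sql
-- Observe read-ahead activity during a sequential fact-table scan.
-- dbo.FactSales is a hypothetical table, used purely for illustration.
SET STATISTICS IO ON;

SELECT COUNT_BIG(*) FROM dbo.FactSales;

-- The Messages tab reports logical reads, physical reads, and
-- read-ahead reads: the pages the engine prefetched ahead of the scan.
```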
Hitting throughput of that velocity requires not only modeling the data to fit the access pattern but also loading it with that pattern in mind. Fast Track goes into detail about the number of filegroups and data files per database, the use of various trace flags to ensure the largest extents are allocated for more contiguous access to data, the use of compression to reduce the overall disk and memory footprint of your content, and the use of partitioning to allow for easier incremental data loading and better opportunities for focused data retrieval. The latest Fast Track release, covering SQL Server 2016, also addresses the growing benefits and capabilities of columnstore indexes.
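As a rough illustration of a few of those recommendations, here is a minimal T-SQL sketch; all names, paths, sizes, and partition boundaries are hypothetical, and the actual reference guidance is far more detailed:

```sql
-- Hypothetical layout: a dedicated filegroup with multiple equally sized
-- data files, a date-based partition scheme, and page compression.
CREATE DATABASE DW
ON PRIMARY
    (NAME = dw_sys,   FILENAME = 'D:\Data\dw_sys.mdf'),
FILEGROUP FG_FACT
    (NAME = dw_fact1, FILENAME = 'E:\Data\dw_fact1.ndf', SIZE = 100GB),
    (NAME = dw_fact2, FILENAME = 'F:\Data\dw_fact2.ndf', SIZE = 100GB)
LOG ON
    (NAME = dw_log,   FILENAME = 'L:\Log\dw_log.ldf',    SIZE = 50GB);
GO
USE DW;
GO

-- Partition by month so loads and queries can target a narrow slice.
CREATE PARTITION FUNCTION pf_Monthly (date)
    AS RANGE RIGHT FOR VALUES ('2017-01-01', '2017-02-01', '2017-03-01');
CREATE PARTITION SCHEME ps_Monthly
    AS PARTITION pf_Monthly ALL TO (FG_FACT);
GO

-- Page compression shrinks the on-disk and in-memory footprint of scans.
CREATE TABLE dbo.FactSales
(
    SaleDate date  NOT NULL,
    StoreKey int   NOT NULL,
    Amount   money NOT NULL
)
ON ps_Monthly (SaleDate)
WITH (DATA_COMPRESSION = PAGE);
GO
```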
The best foundation for your research will be two whitepapers that discuss implementations of the initiative on hosts running SQL Server 2008 and SQL Server 2012. While dated, since they were written when rotational storage was still the mainstay, their explanations of the concepts and testing methodologies are well-defined.
- An Introduction to Fast Track Data Warehouse Architectures
- Fast Track Data Warehouse Reference Guide for SQL Server 2012
The best visual explanation of what the initiative strives to achieve with regard to hardware configuration comes from the 2008 Introduction…
In sum, the image above describes how a bottleneck can occur on a data warehousing host when one or more components are out of alignment, capacity-wise. In one situation, you may have enough cores on your host to process 6 GB of data per second, but HBA cards to your SAN that can only give you 1 GB per second. In another, your CPU and HBAs are in alignment, but your SAN can’t deliver the sequential throughput to max them out. Lastly, your HBAs, fiber interconnect, and SAN storage may all be in alignment, but you don’t have enough CPU throughput, in core count, to take advantage of it.
A properly balanced system starts with your core consumption rate. If you have a 16-core system and you’re striving to hit sequential reads of 200 MB/s per core, or 3,200 MB/s for the entire instance, you’ll add HBAs that together achieve that throughput, and connect those HBAs to a SAN that can sustain it.
The number of cores you need is a function of the data model in use, the volume of data in your query workload, the target runtime of those queries, and the number of concurrent users. Using the 200 MB/s per-core throughput target (in the case of a simple aggregate on a single fact table), the math may work out like this…
- A query on a fact table returns 200 million rows at 50 bytes per row = 10,000 MB = 10 GB.
- The query, per SLA target, should finish in 60 seconds.
- We anticipate 10 users running comparable queries concurrently.
- Our formula: ((amount of data scanned per query / CPU core consumption rate) × number of concurrent sessions) / target response time
- Our equation: ((10,000 MB / 200 MB/s) × 10 sessions) / 60 s = 8.33 cores, which we round up to the nearest available option based on the socket and core counts per processor supported by our vendor (see the sketch below).
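The same arithmetic as a minimal T-SQL sketch, with the workload numbers above treated as assumptions so other scenarios are easy to plug in:

```sql
-- Core-count estimate from the Fast Track sizing formula (assumed inputs).
DECLARE @ScanMBPerQuery  decimal(18,2) = 10000;  -- 200M rows x 50 bytes
DECLARE @MBPerSecPerCore decimal(18,2) = 200;    -- per-core consumption rate
DECLARE @ConcurrentUsers int           = 10;
DECLARE @TargetSeconds   decimal(18,2) = 60;     -- SLA target

SELECT
    (@ScanMBPerQuery / @MBPerSecPerCore) * @ConcurrentUsers / @TargetSeconds
        AS RawCores,          -- 8.33
    CEILING((@ScanMBPerQuery / @MBPerSecPerCore) * @ConcurrentUsers
        / @TargetSeconds)
        AS MinimumCores;      -- 9, still to be rounded up to an available
                              -- socket/core configuration from the vendor
```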
Based on calculations like these, we determine the minimum number of cores needed to meet our performance targets for the life of the hardware. These assumptions and projections are critical, as they affect both the hardware models we get to choose from and our SQL Server licensing costs. At $7,000 per core for Enterprise edition SQL Server, every unnecessary core adds real money, so we’d like to get our calculations in alignment with user expectations to avoid overspending.
Picking Your Make & Model
At this point, we have a general understanding of how Fast Track works. We’re looking for a balanced system where we can maximize the throughput of the entire system when doing sequential scans, like reading a fact table for a cube. Now comes the hard choice of picking the right make and model. Computer hardware manufacturers are just as numerous as automobile manufacturers, and their salespeople can be just as persuasive. Just as it pays to do your homework online before stepping into the showroom to shop for a car, the same applies to computer hardware. How do you go about searching for the right Fast Track hardware?
The big benefit of using a predefined configuration from a vendor is that the vendor lays out the entire installation and then uses Microsoft’s official testing regimen to determine how fast it operates and set an official benchmark. Each vendor’s documentation ends with a DW FT certification report that describes the overall configuration of the host and key performance metrics. Not all systems will have the same processors, RAM capacity, and disk subsystem configuration, but you can use the key metrics to gauge overall performance and compare similar systems side by side. A sample of one of these reports can be seen here, found in Microsoft’s Data Warehouse Reference Guide for SQL Server 2016.
Because the organization was moving to a converged infrastructure built on Cisco UCS hosts paired with Pure Storage’s FlashArray SAN solution, we were able to narrow down the list of systems we could choose from. Cisco and Pure Storage had a system configuration for use with a 20 TB data warehouse on SQL Server 2014, and it became the model for our purchase.