And They're Off! – Matt Zuberko, SQL Server Professional

The day has come. Yesterday I got word from management that they're ready to go live, and we're targeting the next week or two to cut over the Tableau dashboards and Crystal Reports templates. My remaining steps are topping off the source data, changing the connection strings for the live ETL packages, and waiting for user feedback.

This hardware upgrade took nearly two years from first proposal to completion: multiple rounds of debate, hardware procurement approval only six months ago, and finally a month of setup and testing. It was a high-risk project that required buy-in from multiple stakeholders across the IT organization, so you have to come prepared with the right data to explain why you need this particular upgrade. Four rack-mounted servers, SAN controllers and enclosures, and SQL Server licensing can easily add up to seven figures. It helps to have the following information and data points in place before you jump into the ring.

Know your workload. Capture PerfMon data for both idle and peak periods, as observed on the host of your SQL Server instance.
Know your current hardware. CPU and memory aren't enough: know your disks, your HBA cards and your network cards.
Know your pain points. Do you see disk pressure, memory pressure, CPU pressure or network pressure? Is it consistent throughout the day, or only at specific times of day?
Upgrade in place before you replace, if you can. If your server is still under warranty, see whether adding RAM or local solid-state storage for TempDB makes an impact. Before you upgrade your SAN, see whether adding HBA cards and using more ports on the switch improves your throughput.
Partner with your SAN team. The SAN enclosure is the single most expensive item on the shopping list, costing even more than the SQL Server licenses. Exchange performance counter data with one another: the host-level stats won't necessarily match what they see at the enclosure level, and a small configuration change on their side could make a significant difference.
Test your new hardware thoroughly. Run your disk tests with DiskSpd and CrystalDiskMark, run the TPC-H benchmark, then publish your results and review them with your hardware partners.
Test again with your actual workload. DiskSpd and TPC-H are controlled tests that measure the maximum possible performance, and your workload might not be written to take advantage of it. Note the difference and let your stakeholders know; it may be the first time they see how coding decisions affect the performance of the application.
Adapt your workload to maximize your investment. Re-stripe your data if your partitions aren't set up properly, and revisit older design decisions (Do I need these temp tables? Do I need to aggregate the way I did before? Can I switch to a columnstore index? …). You may discover that once the disk subsystem is no longer the bottleneck, your server has enough capacity to handle a heavier workload.
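For the "Know your workload" and "Know your pain points" steps, a small script can turn a PerfMon log exported to CSV (for example with relog) into per-window averages for idle and peak periods. This is a minimal sketch, assuming US-format timestamps in the first column and counter paths as the column headers. The sample data, the window boundaries and the two thresholds (about 20 ms per read, 80% CPU) are hypothetical rules of thumb, not figures from this project.

```python
import csv
import io
from datetime import datetime
from statistics import mean

# Hypothetical rule-of-thumb thresholds, keyed by the end of the counter path.
THRESHOLDS = {
    "Avg. Disk sec/Read": 0.020,  # seconds per read (about 20 ms)
    "% Processor Time": 80.0,     # percent
}


def summarize(csv_text, start_hour, end_hour):
    """Average every counter over samples whose hour is in [start_hour, end_hour)."""
    reader = csv.reader(io.StringIO(csv_text))
    counters = next(reader)[1:]  # column 0 holds the timestamps
    samples = {name: [] for name in counters}
    for row in reader:
        ts = datetime.strptime(row[0], "%m/%d/%Y %H:%M:%S.%f")
        if not start_hour <= ts.hour < end_hour:
            continue
        for name, value in zip(counters, row[1:]):
            if value.strip():  # PerfMon leaves a blank when a sample is missed
                samples[name].append(float(value))
    return {name: mean(vals) for name, vals in samples.items() if vals}


def flag_pressure(averages):
    """List the counters whose window average exceeds its threshold."""
    return [
        name
        for name, avg in averages.items()
        for suffix, limit in THRESHOLDS.items()
        if name.endswith(suffix) and avg > limit
    ]


# Hypothetical export: two overnight (idle) samples and two mid-morning (peak) samples.
SAMPLE = r'''"(PDH-CSV 4.0)","\\SQL01\PhysicalDisk(_Total)\Avg. Disk sec/Read","\\SQL01\Processor(_Total)\% Processor Time"
"03/15/2017 03:00:00.000","0.004","12.5"
"03/15/2017 03:05:00.000","0.006","10.5"
"03/15/2017 10:00:00.000","0.035","65.0"
"03/15/2017 10:05:00.000","0.045","75.0"
'''

idle = summarize(SAMPLE, 0, 6)
peak = summarize(SAMPLE, 9, 17)
print(flag_pressure(idle))  # []
print(flag_pressure(peak))  # only the disk-latency counter is flagged
```

Splitting by time window rather than averaging the whole day keeps a peak-hour problem from being diluted by quiet overnight samples.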
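When you exchange numbers with the SAN team, one useful comparison is host-observed read latency against the latency the enclosure reports for the same interval. Latency seen by the host includes host-side queuing, the HBA and the switch fabric as well as the array's own service time, so a large gap points at the path rather than the enclosure. This sketch assumes both sides have been converted to milliseconds and lined up by interval; the `LatencySample` type, the 5 ms gap threshold and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LatencySample:
    """One interval's read latency as reported by each side, both in milliseconds."""
    interval: str
    host_ms: float   # e.g. PerfMon Avg. Disk sec/Read multiplied by 1000
    array_ms: float  # hypothetical: whatever read latency your SAN tooling reports


def path_overhead(samples, gap_ms=5.0):
    """Return (interval, gap) pairs where host latency exceeds array latency by more than gap_ms.

    Time the host sees but the array does not is spent in host queues, the HBA or the fabric.
    """
    flagged = []
    for s in samples:
        gap = s.host_ms - s.array_ms
        if gap > gap_ms:
            flagged.append((s.interval, round(gap, 1)))
    return flagged


samples = [
    LatencySample("09:00", host_ms=8.0, array_ms=6.5),    # healthy on both sides
    LatencySample("10:00", host_ms=32.0, array_ms=7.0),   # array is fine, the path is not
    LatencySample("11:00", host_ms=28.5, array_ms=26.0),  # the array itself is slow
]
print(path_overhead(samples))  # [('10:00', 25.0)]
```

An interval like 11:00, where both numbers are high and close together, is a conversation about the enclosure; an interval like 10:00 is a conversation about HBAs, queue depth and switch ports.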
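To compare the DiskSpd ceiling with what your workload actually drives, it helps to express both in the same unit: throughput in MiB/s is IOPS multiplied by the I/O size in KiB, divided by 1024. The figures below are hypothetical and chosen only to show the arithmetic; substitute your own DiskSpd results and PerfMon numbers (for example PhysicalDisk Disk Read Bytes/sec) when you report to stakeholders.

```python
def throughput_mib_s(iops, io_size_kib):
    """Throughput in MiB/s from IOPS and I/O size: IOPS x KiB / 1024."""
    return iops * io_size_kib / 1024


def share_of_benchmark(workload_mib_s, benchmark_mib_s):
    """Percentage of the benchmarked ceiling that the real workload actually drives."""
    return 100.0 * workload_mib_s / benchmark_mib_s


# Hypothetical figures: a 64 KiB DiskSpd read test versus a production ETL window.
benchmark = throughput_mib_s(iops=32_000, io_size_kib=64)  # 2000.0 MiB/s
workload = throughput_mib_s(iops=9_600, io_size_kib=64)    # 600.0 MiB/s
print(f"{share_of_benchmark(workload, benchmark):.0f}% of tested capacity")  # 30% of tested capacity
```

A low percentage is not a hardware fault; it is the headroom the new hardware offers that the current code does not yet use, which is exactly the number worth putting in front of stakeholders.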