Monday, August 11, 2014

Microsoft Parallel Data Warehouse :Design SSIS for Performance and Scale

I’d like to start a new series about SSIS design pattern for performance and scale. I already introduced the general methodology of performance tuning in an earlier blog post SSIS Performance Tuning.
The reason why I decided to write about this topic was a customer POC that I did some month ago and I recognized in several trainings and presentations I gave that this field is not well known in the common field. So I decided to wide my audience. The POC was done together with theSQL CAT and I want to take the chance to say thank you to Alexei Khalyakoand Thomas Kejser for their support.
This series will be structured in the following 3 areas:
  • Overview and reference architecture
  • Baseline tests
  • Scale out strategies

Overview and reference architecture

The purpose of this POC was to show how fast can we go with SSIS and to identify design pattern that allow us to scale for performance. Important to mention here is that we were not trying to build the fastest SSIS package for data loading. This area was already touched in the ETL world record byThomas Kejser and Henk van der Valk. Our focus was on the general design and how good different design pattern are able to scale.
Also worth to mention is that we did our testing on a FastTrack system. Why FastTrack? Because we have chosen FastTrack as a potential customer Data Warehouse platform and wanted to understand, how good FastTrack supports scalable ETL processes.
In short words FastTrack is a reference architecture for Data Warehouses build on SQL Server. This architecture includes guidance for a balanced hardware architecture (in order to avoid bottle necks) and SQL Server settings like startup parameters, database layout and design. If you have more questions about FastTrack then have a look here:


imageSome words about the server itself:
  • 32 cores
  • 512 GB RAM
  • a Storage system with a sequential read throughput of 6.7 GB/sec and a sequential write throughput of 3.2 GB/sec (6 MSAs with altogether 126 HHD drives) – measured with SQLIO to simulate SQL Server workload
  • 5 HBAs
  • 2 SAN switches
  • SQL Server 2008 R2
  • a detailed specification can be found here: ProLiant DL580 G7 and P2000 G3 MSA
In summary we have a very powerful server with a balanced architecture which is not I/O bound.


imageBecause we wanted to do our testing in a more scientific way we have chosen a more generic data modelTPC-H. The good thing with the TPC-H data model is that we had also the availability of a data generator and we have now some baseline tests for a reference data model, so that we are able to compare results that we achieved on this platform with platforms from other customers.
We did all our tests with the LINEITEM fact table. We used this table as data source for our ETL processes and loaded the data to a destination table with the same structure as the LINEITEM table but with different designs strategies (with Clustered Index and without, with compression and without, …).


Before we start dig into the details of our test scenarios we need to talk about some preparation steps that I recommend.
imageWe need to make sure our hardware gives its maximum. To do so we need to increase the “Max Jumbo Buffer” Setting of our network card to 8192. This will increase our throughput by 15%.

imageWithin our second step we need to tune the SSIS Connection Managers and Dataflows.
In order to increase the throughput for our database connections we set the network package size to its maximum of 32767 bytes (by default this value is set to 4,096 bytes). Increasing the packet size will improve performance because fewer network read and write operations are required to transfer a large data set. This will give you another 15% more throughput.
Each dataflow has a property called “DefaultBufferSize” which specifies the size of each buffer in our execution trees. Since we have enough memory and a good I/O system that is able to feed our buffers fast enough we can increase this value to its maximum. Per default the value is set to 10485760 bytes (10 MB) and we increase this value to 100 MB (maximum).
For fast ETL wee need to make sure to be minimal logged. Within SSIS we have to apply different settings dependent of our target table design. For more details about minimal logging and the different settings please have a look at the Data Loading Performance Guide.
When we load into a Heap table it is enough to specify the “Fast load” option and the “Table lock” in the OleDB Destination task.
imageFor a clustered index we need to define the trace flag 610 on each destination connection. Please keep in mind the the user that is used for the connection authentication need to have sysadmin permissions in order to be able to specify the trace flag. If this is not possible, you should avoid loading directly into a clustered index and go for partition switching instead.
Additionally if you use the same connection in different OleDB destinations at the same time, SSIS will create one connection per destination task on its own. To make sure that the trace flag is used within all connections we need to create a connection manager per destination task.
As you can see in the screenshot above I added a SQL task before my dataflow that executes the following statement (please do not apply this trace flag as startup parameter for SQL server):
To make sure that you are really minimal logged you need to check the length of the log records in your destination table. You can use the following SQL statement after the loading process:
select operation, context, [log record length], AllocUnitName
from fn_dblog(null, null)
where allocunitname like 'table name'
If the operation is minimal logged the log record length should be around 92, if it is fully logged you will see values of around 264 and higher.


imageNow that we made sure that we gone through all of the preparation tasks we can now have a look on how to design our SSIS packages:
• One stream or multiple streams ?
• MAX DOP high or low ?
• One table or one partition ?
• What is better HEAP or Clustered Index table?

For your Reference

No comments:

Post a Comment