Sunday, May 24, 2015

Merge Join

The Merge Join can be an extremely useful transformation. With relatively little effort, it can join two datasets, even ones that come from different platforms, into a single output. The transformation sometimes gets a dodgy reputation because it needs sorted data, but with a little planning the data can often be pre-sorted to avoid this pitfall.
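To show why sorted input matters, here is a minimal sketch of the sort-merge technique in plain Python, outside of SSIS itself. It assumes a hypothetical row shape of `(join key, payload)` tuples and implements only an inner join. Because both inputs are already ordered on the key, two cursors can walk each input exactly once without building an index or hash table. This is an illustration of the general idea, not how the SSIS component is implemented internally. The names `merge_join` and `is_sorted` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical row shape: (join key, payload).
Row = Tuple[int, str]


def is_sorted(rows: List[Row]) -> bool:
    """Return True if rows are in ascending order by join key."""
    return all(rows[i][0] <= rows[i + 1][0] for i in range(len(rows) - 1))


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Inner join of two inputs that are already sorted on the join key.

    Each input is scanned once with its own cursor; no index is built.
    """
    if not (is_sorted(left) and is_sorted(right)):
        raise ValueError("both inputs must be pre-sorted on the join key")
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1  # left key is behind; advance the left cursor
        elif lk > rk:
            j += 1  # right key is behind; advance the right cursor
        else:
            # Find the run of equal keys on each side, then emit every pairing.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out
```

For example, joining `[(1, "order-a"), (2, "order-b"), (2, "order-c"), (4, "order-d")]` to `[(1, "Alice"), (2, "Bob"), (3, "Carol")]` yields `[(1, "order-a", "Alice"), (2, "order-b", "Bob"), (2, "order-c", "Bob")]`. Passing unsorted input raises an error, which is the sketch's stand-in for the "needs sorted data" requirement.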
In the scenarios, I didn’t get into any specifics on performance or on when you’d choose this method over a Lookup transformation. That question can often be answered by looking at two factors:
If you are on a version earlier than SQL Server 2008 and your comparison data is static, this method can provide a nice performance boost over a Lookup transformation.
If the number of rows is in the tens of thousands or past a million, you will likely want to use a raw file rather than a cache for the Lookup transformation. Building the cache, which provides the index that lets you avoid sorting the data, can take a lot of time. Test both approaches, because the performance of the Merge Join solution may surprise you.
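For contrast with a merge over sorted inputs, the sketch below models the cache-based lookup approach in plain Python: the whole reference set is loaded into an in-memory dictionary first (the up-front build cost mentioned above), after which unsorted input rows can be probed in any order. The names `build_cache` and `lookup_join` are hypothetical, and keeping only the first reference row per key is an assumption meant to roughly mirror a lookup that returns a single match. Unmatched rows are routed to a separate list, loosely like a no-match output.

```python
from typing import Dict, List, Tuple

# Hypothetical row shape: (join key, payload).
Row = Tuple[int, str]


def build_cache(reference: List[Row]) -> Dict[int, str]:
    """Load the entire reference set into memory before any row is matched.

    This is the step that can get expensive for large reference data.
    Assumption: only the first row seen for each key is kept.
    """
    cache: Dict[int, str] = {}
    for key, value in reference:
        cache.setdefault(key, value)
    return cache


def lookup_join(
    rows: List[Row], cache: Dict[int, str]
) -> Tuple[List[Tuple[int, str, str]], List[Row]]:
    """Probe each input row against the cache; input order does not matter."""
    matched: List[Tuple[int, str, str]] = []
    unmatched: List[Row] = []
    for key, value in rows:
        if key in cache:
            matched.append((key, value, cache[key]))
        else:
            unmatched.append((key, value))
    return matched, unmatched
```

With unsorted orders `[(4, "order-d"), (2, "order-b"), (1, "order-a")]` and a cache built from `[(2, "Bob"), (1, "Alice"), (3, "Carol")]`, the matched list is `[(2, "order-b", "Bob"), (1, "order-a", "Alice")]` and `(4, "order-d")` lands in the unmatched list. The design trade-off is visible in the code: no sort is required, but nothing can be matched until `build_cache` has read every reference row.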
