Friday, October 09, 2015

Fuzzy Lookup and Fuzzy Grouping

Fuzzy Lookup enables you to match input records with clean, standardized records in a reference table. The matching process is resilient to errors that are present in the input records. Fuzzy Lookup returns the closest match and indicates the quality of the match. For example, customer information (name and address) that is input during a new sales transaction may not match exactly with any record in the Customers reference table, which consists of all current customers, because of typographical or other errors in the input data. Fuzzy Lookup returns the best matching record from the Customers reference table even if no exact match exists, and provides measures to indicate the match quality.
Fuzzy Grouping enables you to identify groups of records in a table where each record in the group potentially corresponds to the same real-world entity. The grouping is resilient to commonly observed errors in real data, because records in each group may not be identical to each other but are very similar to each other. For example, Fuzzy Grouping is useful for grouping together all records in a Customers reference table that describe a single real customer.
Fuzzy Lookup and Fuzzy Grouping provide easy-to-use solutions to complex, commonly encountered data cleaning problems. While they bear some relation to existing approaches, such as soundex, rule-based systems, edit-distance-based systems, and full-text search, Fuzzy Lookup and Fuzzy Grouping have several advantages:
  • Fuzzy Lookup and Fuzzy Grouping use a custom, domain-independent distance function that takes into account the edit distance (for example, "hits" is distance 2 from "bit"), the number of tokens, token order, and relative frequencies. As a result, Fuzzy Lookup and Fuzzy Grouping achieve much finer discrimination than full-text searches because they capture a more detailed structure of the data.
  • Because they are purely token-driven, Fuzzy Lookup and Fuzzy Grouping do not have a language-dependent component like soundex.
  • Because they use more than just edit distance, Fuzzy Lookup and Fuzzy Grouping are not as easily misled by transpositions and can detect higher level patterns than an approach that uses only edit distance.
  • Fuzzy Lookup and Fuzzy Grouping are tightly integrated with SSIS, which makes them easily usable with little or no custom programming for ETL tasks with SQL Server 2005.

SSIS Interview Questions

No comments:

Post a Comment