Record Linkage: Similarity Measures and Algorithms
Nick Koudas, University of Toronto, Canada
Divesh Srivastava, AT&T Labs-Research, USA
Abstract
This tutorial provides a comprehensive and cohesive overview of the key research results on matching algorithms and methodologies for identifying approximate duplicate records, and of the tools available for this purpose. It covers techniques and methodologies introduced in several communities, including databases, information retrieval, statistics, and machine learning. It aims to identify the similarities and differences among these techniques, as well as their merits and limitations.