Record Linkage: Similarity Measures and Algorithms

Nick Koudas, University of Toronto, Canada

Divesh Srivastava, AT&T Labs-Research, USA

Abstract

This tutorial provides a comprehensive and cohesive overview of the key research results on matching algorithms and methodologies for identifying approximate duplicate records, and of the tools available for this purpose. It covers techniques and methodologies introduced in several communities, including databases, information retrieval, statistics, and machine learning. It aims to identify similarities and differences among these techniques, as well as their merits and limitations.

