Deduplication is the process of identifying redundant records in a data set that refer to the same real-world entity and subsequently merging them. Address data sets often contain slightly different records that represent the same name or address. Names of persons, streets, or cities may be written differently, abbreviated, or misspelled. For example, consider the following two addresses:
- Muller Thomas, Karl-Heinz-Ring 3, 80686, Allach
- Mueller Tomas, Karl-Heinz-Ring 3, 80686, Munich Allach
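To deduplicate an address data set, its records can be matched against a reference address data set in order to normalize their name and address notations. Records that resolve to the same reference entry can then be treated as duplicates and merged.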
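
The matching step can be illustrated with a minimal Python sketch. It is not a production approach: the reference list, the function names (normalize, similarity, match_to_reference, deduplicate), and the 0.8 threshold are all assumptions made for illustration, and the choice of "Mueller Thomas" as the canonical spelling is arbitrary. String similarity is computed with SequenceMatcher from the standard library's difflib module; a real system would more likely compare individual fields (name, street, postal code, city) with specialized matching rules.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional

# Hypothetical reference data set: one canonical record per real-world address.
REFERENCE = [
    "Mueller Thomas, Karl-Heinz-Ring 3, 80686, Munich Allach",
    "Schmidt Anna, Hauptstrasse 12, 10115, Berlin",
]


def normalize(record: str) -> str:
    """Lowercase the record and replace punctuation with single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in record.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two records."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_to_reference(
    record: str,
    reference: List[str] = REFERENCE,
    threshold: float = 0.8,  # assumed cut-off, would need tuning on real data
) -> Optional[str]:
    """Return the most similar reference record, or None if none is close enough."""
    best = max(reference, key=lambda ref: similarity(record, ref))
    return best if similarity(record, best) >= threshold else None


def deduplicate(records: List[str]) -> Dict[str, List[str]]:
    """Group records by the reference record they match.

    Records without a sufficiently similar reference entry are kept
    under their own text as the key.
    """
    groups: Dict[str, List[str]] = {}
    for record in records:
        key = match_to_reference(record) or record
        groups.setdefault(key, []).append(record)
    return groups


if __name__ == "__main__":
    records = [
        "Muller Thomas, Karl-Heinz-Ring 3, 80686, Allach",
        "Mueller Tomas, Karl-Heinz-Ring 3, 80686, Munich Allach",
    ]
    for canonical, duplicates in deduplicate(records).items():
        print(canonical, "<-", duplicates)
```

With this assumed reference list, both example addresses above resolve to the same reference entry and therefore end up in one group, which can then be merged into a single normalized record.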