Address Deduplication

An address data set that contains duplicates is matched against a reference address data set by similarity search. Each row of the address data set is assigned the row of the reference data set that has the minimum distance to it. The distance between two address rows is the mean of two 2-gram (bigram) Dice distances, one computed on the 'name' column and one on the 'address' column.
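The matching rule described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual implementation: the function names (bigrams, dice_distance, row_distance, match_rows) and the sample rows are hypothetical, the Dice distance is taken as one minus the Dice coefficient over sets of lowercased character 2-grams, and the search is a brute-force scan over all reference rows.

```python
def bigrams(text):
    """Return the set of character 2-grams of a lowercased string (assumption: sets, not multisets)."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_distance(a, b):
    """2-gram Dice distance: 1 - 2*|A & B| / (|A| + |B|)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Both strings are shorter than 2 characters: fall back to an equality check.
        return 0.0 if a.lower() == b.lower() else 1.0
    return 1.0 - 2.0 * len(x & y) / (len(x) + len(y))


def row_distance(row, ref):
    """Mean of the Dice distances of the 'name' and 'address' columns."""
    return (dice_distance(row["name"], ref["name"])
            + dice_distance(row["address"], ref["address"])) / 2.0


def match_rows(rows, reference):
    """For each row, return the index of the reference row with the minimum distance."""
    return [
        min(range(len(reference)), key=lambda j: row_distance(row, reference[j]))
        for row in rows
    ]


# Hypothetical sample data, for illustration only.
reference = [
    {"name": "John Smith", "address": "12 Main Street"},
    {"name": "Maria Garcia", "address": "5 Oak Avenue"},
]
rows = [
    {"name": "Jon Smith", "address": "12 Main St"},
    {"name": "Maria Garcia", "address": "5 Oak Ave."},
]

print(match_rows(rows, reference))  # → [0, 1]
```

Rows of the address data set that are assigned the same reference row can then be treated as duplicates of one another. The brute-force scan compares every row with every reference row, so for large data sets an indexed or blocked search would likely be needed.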

