Deduplication of Address Data

An address data set containing duplicates is matched against a reference address data set via similarity search. Each row of the address data set is assigned the row of the reference data set that has the minimum distance to it. The distance between two address rows is the mean of the 2-gram Dice distances of their 'name' and 'address' columns.
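Outside of KNIME, the matching idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow's actual implementation: it assumes the Dice distance is 1 minus the Dice coefficient computed on sets of lowercased character 2-grams, and the helper names (`bigrams`, `dice_distance`, `row_distance`, `match_to_reference`) are hypothetical. The KNIME nodes may tokenize or normalize strings differently.

```python
def bigrams(text):
    """Return the set of character 2-grams of a lowercased string."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_distance(a, b):
    """Assumed definition: 1 - 2*|A & B| / (|A| + |B|) on 2-gram sets."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga and not gb:
        # Neither string has any 2-grams; treat them as identical.
        return 0.0
    return 1.0 - 2.0 * len(ga & gb) / (len(ga) + len(gb))


def row_distance(row, ref):
    """Mean of the Dice distances of the 'name' and 'address' columns."""
    d_name = dice_distance(row["name"], ref["name"])
    d_address = dice_distance(row["address"], ref["address"])
    return (d_name + d_address) / 2.0


def match_to_reference(rows, reference):
    """For each row, return the index of the closest reference row."""
    return [
        min(range(len(reference)), key=lambda j: row_distance(row, reference[j]))
        for row in rows
    ]


# Made-up sample data, for illustration only.
reference = [
    {"name": "John Smith", "address": "12 Main Street"},
    {"name": "Maria Garcia", "address": "5 Oak Avenue"},
]
rows = [
    {"name": "Jon Smith", "address": "12 Main St."},
    {"name": "Maria Garcia", "address": "5 Oak Ave"},
]
matches = match_to_reference(rows, reference)
```

Rows that are assigned the same reference index can then be treated as duplicates of one another. The brute-force search compares every row with every reference row, which is fine for small tables but would need an index or blocking step for large ones.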

Resources

EXAMPLES Server: 50_Applications/13_Address_Deduplication/01_Deduplication_of_Address_Data*
Download a zip-archive

* Find more about the Examples Server here.
The link opens the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform installed with the Installer, version 3.2.0 or higher). Otherwise, please use the link to the zip-archive or open the provided path manually.