An address data set containing duplicates is matched against a reference address data set via similarity search. Each row of the address data set is assigned the row of the reference data set that has the minimum distance to it. The distance between two address rows is the mean of the 2-gram Dice distances computed on the 'name' and 'address' columns.
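The matching rule described above can be illustrated outside KNIME with a minimal Python sketch. It is not the workflow's actual implementation: the function names (`bigrams`, `dice_distance`, `row_distance`, `nearest_reference`) are hypothetical, and it assumes that 2-grams are compared as sets of lowercase character pairs and that the Dice distance is 1 minus the Dice coefficient. The KNIME nodes may tokenize or normalize differently.

```python
def bigrams(text):
    """Return the set of character 2-grams of a string (lowercased; an assumption)."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_distance(a, b):
    """1 - Dice coefficient of the two 2-gram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Both strings are too short to yield any 2-gram: treat them as identical.
        return 0.0
    return 1.0 - 2.0 * len(x & y) / (len(x) + len(y))


def row_distance(row, ref):
    """Mean of the Dice distances of the 'name' and 'address' columns."""
    return (dice_distance(row["name"], ref["name"])
            + dice_distance(row["address"], ref["address"])) / 2.0


def nearest_reference(row, reference):
    """Return the reference row with the minimum distance to the given row."""
    return min(reference, key=lambda ref: row_distance(row, ref))


# Illustrative data only, not taken from the workflow.
reference = [
    {"name": "John Smith", "address": "12 Main Street"},
    {"name": "Jane Doe", "address": "5 Oak Avenue"},
]
row = {"name": "Jon Smith", "address": "12 Main St"}
match = nearest_reference(row, reference)
```

In this sketch, every row of the address data set would be passed through `nearest_reference`, so rows that are duplicates of one another end up assigned to the same reference row.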
EXAMPLES Server: 50_Applications/13_Address_Deduplication/01_Deduplication_of_Address_Data*
Download a zip-archive
* Find more about the Examples Server here.
The link will open the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform must be installed with the Installer, version 3.2.0 or higher). In other cases, please use the link to the zip-archive or open the provided path manually.