Deduplication of Address Data

This workflow demonstrates the new distance measurement framework: possible matches are predicted with high accuracy using a minimal number of nodes and no preprocessing, simply by aggregating distances computed on different attributes. The data set is the "Restaurant data set" from http://www.cs.utexas.edu/users/ml/riddle/data.html, comprising 864 restaurant records and 112 duplicates. Each record contains a name, an address, a city, a type, and a class attribute. Records with an identical class value refer to the same real-world entity, in this case the same restaurant.
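The idea of aggregating per-attribute distances can be sketched outside KNIME in a few lines of Python. This is only an illustration under stated assumptions, not the workflow's actual nodes: the string distance (based on `difflib.SequenceMatcher`), the attribute weights, the threshold of 0.25, and the sample records are all hypothetical choices made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical attribute weights; the workflow's real settings may differ.
WEIGHTS = {"name": 0.5, "address": 0.3, "city": 0.2}


def string_distance(a, b):
    """Return a distance in [0, 1]: 0 for identical strings (ignoring case)."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_distance(r1, r2, weights=WEIGHTS):
    """Aggregate the per-attribute distances as a weighted mean."""
    total = sum(weights.values())
    weighted = sum(w * string_distance(r1[attr], r2[attr])
                   for attr, w in weights.items())
    return weighted / total


def candidate_duplicates(records, threshold=0.25):
    """Return index pairs whose aggregated distance is at most the threshold."""
    pairs = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        if record_distance(r1, r2) <= threshold:
            pairs.append((i, j))
    return pairs


# Made-up records for illustration; they are not taken from the data set.
records = [
    {"name": "Cafe Example", "address": "12 Main St.", "city": "Springfield"},
    {"name": "Cafe Example", "address": "12 Main Street", "city": "Springfield"},
    {"name": "Noodle House", "address": "99 Harbor Road", "city": "Lakeview"},
]

print(candidate_duplicates(records))  # [(0, 1)]
```

With labeled data such as the class attribute described above, the predicted pairs could be compared against the pairs that share a class value to estimate how well a given threshold works.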

Resources

EXAMPLES Server: 50_Applications/13_Address_Deduplication/01_Deduplication_of_Address_Data*
Download a zip-archive

* Find more about the Examples Server here.
The link opens the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform version 3.2.0 or higher, installed with the Installer). Otherwise, use the link to the zip archive or open the provided path manually.