Word Embedding Distance

Here we use word embeddings instead of one-hot encoding, by means of a Word2Vec Learner node. The hidden layer size is set to 10, producing an embedding with very small dimensionality. The output of the Word2Vec Learner node is a model. The Vocabulary Extractor node extracts the words from the model vocabulary and provides their embeddings in the form of collections. Collection items are isolated using a Split Collection Column node, and the distances between word embedding vectors are calculated. At the end, n selected words are visualized on a scatter plot, to show the proximity of semantically similar words across the embedding coordinates. A String Input node allows you to enter one selected word and retrieve the distances of all other words from that word. Smaller distances should correspond to words that are closer in context or in meaning.
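
For readers who prefer code, here is a minimal sketch of the same idea in Python using gensim. Only the embedding size of 10 is taken from the workflow; the toy corpus, the choice of Euclidean distance, and all other parameter values are illustrative assumptions, not settings from the KNIME nodes.

    from gensim.models import Word2Vec
    import numpy as np

    # Toy corpus standing in for the documents fed to the Word2Vec Learner node.
    sentences = [
        ["king", "queen", "royal", "palace"],
        ["man", "woman", "person"],
        ["king", "man", "ruler"],
        ["queen", "woman", "ruler"],
    ]

    # vector_size=10 mirrors the hidden layer size of 10 used in the workflow;
    # the remaining parameters are illustrative.
    model = Word2Vec(sentences, vector_size=10, window=3, min_count=1, seed=42)

    # Rough equivalent of the Vocabulary Extractor + Split Collection Column steps:
    # one row per word, one column per embedding dimension.
    vocab = list(model.wv.key_to_index)
    embeddings = np.array([model.wv[w] for w in vocab])

    # Distances of all words from one selected word (the String Input step).
    # Euclidean distance is used here; cosine distance is a common alternative.
    query = "king"
    q_vec = model.wv[query]
    distances = np.linalg.norm(embeddings - q_vec, axis=1)
    for word, dist in sorted(zip(vocab, distances), key=lambda p: p[1]):
        print(f"{word}: {dist:.3f}")

The word with distance 0 is the query itself; the next entries are its nearest neighbors in the 10-dimensional embedding space, the same neighborhoods the workflow's scatter plot visualizes in two of those coordinates.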


Resources

EXAMPLES Server: 08_Other_Analytics_Types/01_Text_Processing/21_Word_Embedding_Distance*
Download a zip-archive


* Find out more about the EXAMPLES Server here.
The link will open the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform installed with the installer, version 3.2.0 or higher). Otherwise, please use the link to the zip-archive or open the provided path manually.