Deeplearning4J Integration - Textprocessing
The Textprocessing Extension for the KNIME Deeplearning4J Integration adds the Word Vector functionality of Deeplearning4J to KNIME. This means that you can create so-called Neural Word Embeddings, which are useful in many applications.
Installation details can be found here.
Generally, a Neural Word Embedding is a numerical representation (a vector of real numbers) of a word or even a whole document. Such embeddings can be created with methods called Word2Vec and Doc2Vec, which train a neural network that learns the structure of the text. One nice property of this network is that the size of the output layer can be specified by the user, and it corresponds directly to the size of the resulting vector. Hence, we can choose the dimensionality of the output and effectively describe text in a low-dimensional vector space.
The following Nodes are contained in the Textprocessing Extension for the KNIME Deeplearning4J Integration:
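To make the relationship between the chosen size and the resulting vectors concrete, here is a minimal Python sketch. It does not train anything and is not how Deeplearning4J or the KNIME nodes work internally; the function name init_embeddings, the parameter vector_size and the random starting values are assumptions made purely for illustration. It only shows that every word ends up with a vector of exactly the length the user picked.

```python
import random


def init_embeddings(vocabulary, vector_size, seed=0):
    """Give every word an untrained vector of `vector_size` real numbers.

    Hypothetical helper for illustration only: a real Word2Vec model
    would learn these values from text instead of drawing them randomly.
    """
    rng = random.Random(seed)
    return {
        word: [rng.uniform(-0.5, 0.5) for _ in range(vector_size)]
        for word in vocabulary
    }


# The user-chosen size fixes the dimensionality of every word vector.
embeddings = init_embeddings(["knime", "node", "workflow"], vector_size=5)
for word, vector in embeddings.items():
    print(word, len(vector))
```

Choosing a small size, such as 5 here, is what makes the representation low-dimensional; a larger value gives every word a longer vector.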
Word Vector Learner
The Word Vector Learner Node is used to train a Word2Vec or Doc2Vec model. The input of the Node can be either a String column or a Document column from KNIME Textprocessing. If the Node is configured to learn a Doc2Vec model, it expects an additional String column containing a class attribute for each document. The output of the Node is a trained word vector model, which can be used by other nodes of this extension. All learning-specific parameters, such as the size of the output layer, can be adjusted in the Node Dialog.
Vocabulary Extractor
The Vocabulary Extractor Node takes a word vector model as input and outputs the learned words with their corresponding word vectors as a table. The output table contains one column holding the word and one collection column holding the word vector.
Word Vector Apply
The Word Vector Apply Node is used to apply a learned word vector model to documents, that is, to convert a document into a word vector representation using a previously trained model. By default, the Node replaces all words contained in the input document by the corresponding word vectors from the model. Words that are not contained in the model are skipped. This results in a collection of word vectors for each input document. Additionally, the Node has an option to calculate the mean of all word vectors, which results in one single vector that represents the document.
Example workflows can be found on the public Example Server.
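The behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Node's actual implementation: the model is represented as a plain dictionary from word to vector, the document is assumed to be already split into tokens, and the names apply_word_vectors, toy_model and use_mean are hypothetical. Returning None for a document without any known word is also an assumption.

```python
def apply_word_vectors(tokens, model, use_mean=False):
    """Replace each known token by its vector; unknown tokens are skipped.

    With use_mean=True the element-wise mean of the collected vectors is
    returned, giving one single vector for the whole document.
    """
    vectors = [model[token] for token in tokens if token in model]
    if not use_mean:
        return vectors
    if not vectors:
        return None  # assumption: no known words means no mean vector
    size = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(size)]


# Toy model with two-dimensional vectors (made-up values).
toy_model = {"deep": [1.0, 0.0], "learning": [0.0, 1.0]}
document = ["deep", "learning", "rocks"]  # "rocks" is not in the model

per_word = apply_word_vectors(document, toy_model)
doc_vector = apply_word_vectors(document, toy_model, use_mean=True)
print(per_word)    # [[1.0, 0.0], [0.0, 1.0]]
print(doc_vector)  # [0.5, 0.5]
```

The default mode keeps one vector per known word, so the result length varies with the document, while the mean always has the same dimensionality as the model's word vectors.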
Deeplearning4J Integration - Textprocessing and GPU
Due to the implementation of the textprocessing functionality in the Deeplearning4J library, it is recommended to execute all Deeplearning4J Integration - Textprocessing nodes using the CPU. Performance on GPU may be significantly lower.
The KNIME Deeplearning4J Integration is available under the same Licence as KNIME.