Topicriver Red-Riding-Hood

The workflow builds a text stream visualization of a story, showing how often each character is mentioned as the story progresses. A Stacked Area Chart is used for the visualization, and the story is Little Red Riding Hood, one of the most popular tales written by the Brothers Grimm.
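As a rough illustration of the counting step behind such a chart, the Python sketch below splits a text into equal-sized word segments and counts how often each character name occurs in each segment. Each resulting row corresponds to one position along the x axis of the stacked areas. It assumes single-word character names and uses a made-up miniature story; the function `mention_counts` is hypothetical and is not part of the workflow.

```python
import re
from collections import Counter

def mention_counts(text, characters, n_segments=5):
    """Split text into at most n_segments equal-sized word chunks and count
    how often each (single-word, hypothetical) character name occurs per chunk."""
    words = re.findall(r"[a-z]+", text.lower())  # crude lowercase tokenization
    size = max(1, -(-len(words) // n_segments))  # ceiling division
    rows = []
    for start in range(0, len(words), size):
        chunk = Counter(words[start:start + size])
        rows.append({name: chunk[name] for name in characters})
    return rows

# Made-up miniature story, for illustration only.
story = ("The wolf met the girl. The girl walked on. The wolf ran to grandmother. "
         "The wolf ate grandmother. The hunter found the wolf.")
rows = mention_counts(story, ["girl", "wolf", "grandmother", "hunter"], n_segments=3)
for i, row in enumerate(rows):
    print(i, row)
```

Stacking the per-segment counts of each character on top of one another gives the layered areas of the chart.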

Lemmatizer Preprocessing

This workflow shows a simple example of how to lemmatize terms in documents using the Stanford Lemmatizer node. It also shows exactly what the lemmatizer does to the terms of the input documents, in comparison to other preprocessing nodes such as the Snowball Stemmer.
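To make the difference concrete, the Python sketch below contrasts two deliberately tiny, hypothetical components: a dictionary-based lemmatizer that maps inflected forms to their dictionary form, and a naive suffix stripper. Neither is the Stanford Lemmatizer or the Snowball Stemmer; the lookup table and suffix list are made up for illustration, and real Snowball output differs from this toy. The point is only that a stemmer chops suffixes and may produce non-words, while a lemmatizer returns dictionary forms and can handle irregular words such as "went".

```python
# Hypothetical lookup table standing in for a real lemmatizer's model.
LEMMAS = {"went": "go", "mice": "mouse", "studies": "study", "running": "run"}

def toy_lemmatize(word):
    """Return the dictionary form if known, otherwise the word unchanged."""
    return LEMMAS.get(word, word)

# Naive suffix list; much cruder than the Snowball algorithm.
SUFFIXES = ("ing", "ies", "es", "s", "ed")

def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["running", "studies", "went", "mice"]:
    print(f"{w:10} stem={toy_stem(w):8} lemma={toy_lemmatize(w)}")
```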

Word Embedding Distance

Here we use word embeddings instead of one-hot encoding, learned with a Word2Vec Learner node. The hidden layer size is set to 10, which produces embeddings of very small dimensionality. The output of the Word2Vec Learner node is a model. The Vocabulary Extractor node extracts the words from the model vocabulary and provides their embeddings in the form of a collection. The collection items are split into separate columns using a Split Collection Column node, and the distances between the word embedding vectors are then calculated.
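The final distance step can be sketched in Python as follows. The 10-dimensional vectors are made-up values, not output of the Word2Vec Learner node, and since the description does not say which distance the workflow uses, both Euclidean and cosine distance are shown as options.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v (0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

# Hypothetical 10-dimensional embeddings (values invented for illustration).
embeddings = {
    "wolf":        [0.9, 0.1, 0.0, 0.3, 0.2, 0.0, 0.1, 0.4, 0.0, 0.1],
    "hunter":      [0.8, 0.2, 0.1, 0.3, 0.1, 0.0, 0.2, 0.3, 0.1, 0.1],
    "grandmother": [0.0, 0.7, 0.6, 0.1, 0.0, 0.5, 0.0, 0.1, 0.3, 0.2],
}

for a, b in combinations(embeddings, 2):
    u, v = embeddings[a], embeddings[b]
    print(f"{a}-{b}: euclidean={euclidean(u, v):.3f} cosine={cosine_distance(u, v):.3f}")
```

Cosine distance ignores vector length and compares only direction, which is a common choice for word embeddings; Euclidean distance is also sensitive to magnitude.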

Fuzzy String Matching

This workflow demonstrates how to apply fuzzy matching to two strings. The String Matcher node was designed exactly for this task, but it is limited to the Levenshtein distance. You can edit the parameters of the Levenshtein distance in the node's configuration dialog.
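For reference, the Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The Python sketch below implements it with the standard dynamic-programming recurrence and adds a simple best-match lookup. The per-operation cost parameters (`ins`, `dele`, `sub`) and the `max_distance` threshold are hypothetical stand-ins for the kind of settings a configuration dialog might expose, not the String Matcher node's actual options.

```python
def levenshtein(a, b, ins=1, dele=1, sub=1):
    """Weighted edit distance turning a into b (unit costs by default)."""
    prev = [j * ins for j in range(len(b) + 1)]  # "" -> b[:j] needs j insertions
    for i, ca in enumerate(a, start=1):
        curr = [i * dele]  # a[:i] -> "" needs i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + dele,                          # delete ca
                curr[j - 1] + ins,                       # insert cb
                prev[j - 1] + (0 if ca == cb else sub),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def best_match(query, candidates, max_distance=2):
    """Closest candidate to query, or None if none is within max_distance."""
    best = min(candidates, key=lambda c: levenshtein(query, c))
    return best if levenshtein(query, best) <= max_distance else None

print(levenshtein("kitten", "sitting"))
print(best_match("Grandmoter", ["Grandmother", "Hunter", "Wolf"]))
```

Raising one of the costs makes that kind of edit more expensive, so matches requiring it become less likely to fall under the threshold.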
