Document clustering

This workflow shows how to import textual data, preprocess documents by filtering and stemming, transform documents into a bag of words and document vectors, and finally cluster the documents based on their numerical representation.
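
The steps described above can be illustrated outside the workflow with a minimal Python sketch. Everything in it is an assumption made for illustration: the stop-word list, the crude "strip a trailing s" stemmer (a stand-in for a real stemmer), and the use of k-means with a deterministic start as the clustering method, which the description does not name.

```python
import re
from collections import Counter

# Illustrative stop-word list (assumption, not taken from the workflow).
STOP_WORDS = {"the", "a", "an", "and", "of", "is", "are", "on", "in", "to"}


def preprocess(text):
    """Lower-case, tokenize, filter stop words, apply a toy stemmer."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Toy stemmer: drop one trailing "s" from words longer than 3 letters.
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


def document_vectors(docs):
    """Turn each document into a bag of words, then into a count vector."""
    bags = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*bags))
    return vocab, [[bag[term] for term in vocab] for bag in bags]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, iterations=10):
    """Plain k-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = ["Cats chase mice", "Stocks and bonds fall",
        "The cat sleeps", "Bond markets rally"]
vocab, vectors = document_vectors(docs)
labels = k_means(vectors, k=2)
```

With these four toy documents, the two animal sentences end up in one cluster and the two finance sentences in the other, because stemming maps "Cats"/"cat" and "bonds"/"Bond" to shared terms.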

Sentiment Classification

This workflow shows how to import text from a CSV file, convert it to documents, preprocess the documents, and transform them into numerical document vectors. Finally, a predictive model is trained on the vectors to predict the sentiment class of the documents.
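
As a rough illustration of this pipeline, the sketch below reads labelled text from CSV data, tokenizes it, and trains a classifier on word counts. The description only says "a predictive model", so the multinomial naive Bayes learner with add-one smoothing is an assumption, as are the column names `text` and `sentiment` and the four example rows.

```python
import csv
import io
import math
from collections import Counter, defaultdict

# Hypothetical example data; a real run would read from a CSV file instead.
CSV_TEXT = """text,sentiment
great movie loved it,positive
wonderful acting great plot,positive
terrible movie hated it,negative
boring plot awful acting,negative
"""


def load_rows(csv_text):
    """Parse CSV text into (tokens, label) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["text"].split(), row["sentiment"]) for row in reader]


def train(rows):
    """Collect the counts a multinomial naive Bayes model needs."""
    class_counts = Counter(label for _, label in rows)
    word_counts = defaultdict(Counter)
    for tokens, label in rows:
        word_counts[label].update(tokens)
    vocab = {t for tokens, _ in rows for t in tokens}
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())

    def log_score(label):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for t in tokens:
            if t in vocab:  # words unseen in training are ignored
                score += math.log(
                    (word_counts[label][t] + 1) / (total_words + len(vocab))
                )
        return score

    return max(class_counts, key=log_score)


model = train(load_rows(CSV_TEXT))
prediction = predict(model, "loved the great acting".split())
```

Here `prediction` comes out as `positive`, since "loved" and "great" occur only in the positive training rows.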

Sentiment Classification with NGrams

This workflow shows how to import text from a CSV file, convert it to documents, preprocess the documents, and transform them into numerical document vectors consisting of single-word and 2-gram features.
Finally, two predictive models are trained on the vectors to predict the sentiment class of the documents. The two models are then compared via an ROC curve.
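
To make the two ideas in this description concrete, here is a small Python sketch of single-word plus 2-gram feature extraction, and of the area under the ROC curve as one number for comparing two models. The function names are hypothetical, and the pairwise AUC computation is a generic textbook formulation rather than whatever the workflow uses internally.

```python
from collections import Counter


def ngram_features(tokens, max_n=2):
    """Count all n-grams of length 1..max_n (single words and 2-grams)."""
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


features = ngram_features("not good at all".split())

# Hypothetical scores from two models on the same four labelled documents.
labels = [1, 1, 0, 0]
auc_model_a = auc([0.9, 0.8, 0.3, 0.1], labels)
auc_model_b = auc([0.9, 0.2, 0.8, 0.1], labels)
```

The 2-gram "not good" shows why such features can help with sentiment: taken as single words, "good" alone would point in the wrong direction. In the made-up scores, model A ranks every positive document above every negative one, so its AUC is higher than model B's.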

Fuzzy String Matching

This workflow demonstrates how to apply fuzzy matching to two strings. The string matcher was designed exactly for this task, but it is limited to the Levenshtein distance. You can edit the parameters of the Levenshtein distance in the configuration dialog.
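
For readers unfamiliar with the Levenshtein distance, the following Python sketch computes it with the standard dynamic-programming recurrence and uses it to pick the closest candidate string. The per-operation cost parameters are an assumption about the kind of setting such a dialog might expose; they are not taken from the node itself, and `best_match` is a hypothetical helper.

```python
def levenshtein(a, b, insert_cost=1, delete_cost=1, substitute_cost=1):
    """Minimum total cost of edits that turn string a into string b."""
    # previous[j] holds the cost of turning the processed prefix of a
    # into the first j characters of b.
    previous = [j * insert_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        current = [i * delete_cost]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + delete_cost,       # drop ca
                current[j - 1] + insert_cost,    # add cb
                previous[j - 1] + (0 if ca == cb else substitute_cost),
            ))
        previous = current
    return previous[-1]


def best_match(query, candidates):
    """Return the candidate with the smallest distance to the query."""
    return min(candidates, key=lambda c: levenshtein(query, c))


distance = levenshtein("kitten", "sitting")  # 3 with unit costs
closest = best_match("knime", ["prime", "knife", "lime"])
```

Keeping only two rows of the table at a time holds memory use proportional to the length of the second string, which matters when many string pairs are compared.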
