Discover Secret Ingredient

On one side we have a list of cookie recipes saved in a Word document on the local machine. On the other side we have a web page with a new recipe, retrieved through web crawling. We explore the ingredient lists on both sides to discover the secret ingredient for the ultimate Christmas cookie ... and yes! They blend.

Hierarchical Clustering Visualization

This workflow shows how to build a hierarchy of clusters and visualize it using the Sunburst Chart. It reads text data from a table. The data are taken from the 20 newsgroups dataset and fall into two categories, politics and sport. The texts are first converted into documents, then preprocessed (tagged, filtered, lemmatized, etc.), and finally converted into document vectors. The next step is the clustering: a distance matrix is calculated using the cosine distance measure, and based on that matrix the documents are clustered hierarchically.
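
The clustering step described above can be illustrated outside KNIME with a minimal Python sketch. It is not the workflow's implementation: the tokenization is a plain whitespace split with no tagging, filtering, or lemmatization, the linkage is assumed to be average linkage, and the four sample sentences and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def doc_vector(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0


def distance_matrix(vectors):
    """Symmetric matrix of pairwise cosine distances."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = cosine_distance(vectors[i], vectors[j])
    return dist


def hierarchical_clusters(dist, k):
    """Average-linkage agglomerative clustering; merge until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        def linkage(pair):
            a, b = clusters[pair[0]], clusters[pair[1]]
            return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

        # Merge the two closest clusters (x < y, so deleting y is safe).
        x, y = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical sample texts standing in for the politics and sport categories.
docs = [
    "the senate passed the election bill",
    "the election vote in the senate",
    "the team won the football match",
    "the football team lost the match",
]
vectors = [doc_vector(d) for d in docs]
clusters = hierarchical_clusters(distance_matrix(vectors), k=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Recording each merge instead of stopping at `k` clusters would yield the full hierarchy, which is the kind of nested structure a sunburst chart displays.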

Tika Parsing

This workflow shows how to parse files of various formats, as well as their attachments if any exist, using the Tika Parser nodes, and how to detect the language of the content using the Tika Language Detector. Based on the detected language, a filter is applied to keep only English texts, which are finally POS tagged.
