Just KNIME It!

Prove your KNIME knowledge and practice your workflow building skills by solving our weekly challenges.

Here is how the challenges work:

     We post a challenge on Wednesday
     You create a solution with KNIME
     Upload it to your public KNIME Hub Space
     Post it in the KNIME Forum

Our solution to the challenge comes out on the following Tuesday.

Challenge 18: Categorizing Notes

Level: Medium

Description: A common problem in mechanical and medical data is associating notes with a category. In this challenge, you will automate the process of sorting mechanical notes into their correct category. Here’s an example:

INPUT

  --List of Categories--

  1. Scratch
  2. Crack
  3. Defect

  --Notes--

  1. The product was defective.
  2. A crack was caused by client.
  3. Many scratches noted.

OUTPUT

  Note                              Category

  1. The product was defective.     Defect
  2. A crack was caused by client.  Crack
  3. Many scratches noted.          Scratch

Don't worry about using fancy machine learning or natural language processing models. This problem can be handled reasonably well using a total of 5 nodes (simple solution), and a more refined solution involves just 8 nodes (complex solution). Also don't worry about getting 100% accuracy.

Author: Victor Palacios

Datasets: Datasets with Mechanical Notes and Categories in the KNIME Hub

Solution Summary: After reading the inspection notes and the categories, a simple solution consisted of running a similarity search between categories and inspection notes to find which of the former best corresponded to each of the latter. A more complex solution involved lowercasing both categories and notes to improve matching, and then running a regular expression matcher to find all categories that correspond to each note (instead of just one category). Note: We did not implement any spellchecking in our solution, which would further increase matching quality.
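For readers who think in code, the simple approach summarized above can be sketched outside KNIME in a few lines of Python. This is only an illustration under stated assumptions: the helper names `similarity` and `best_category` are hypothetical, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever distance function the Similarity Search node is configured with. Each category is compared against every word of a note, and the closest category wins.

```python
import re
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings (difflib's ratio).

    Assumption: this is a stand-in metric, not the KNIME node's own distance.
    """
    return SequenceMatcher(None, a, b).ratio()


def best_category(note: str, categories: list[str]) -> str:
    """Return the category whose name is closest to any single word of the note."""
    words = re.findall(r"[a-z]+", note.lower())

    def score(category: str) -> float:
        # Best similarity between this category and any word in the note.
        return max((similarity(category.lower(), w) for w in words), default=0.0)

    return max(categories, key=score)


categories = ["Scratch", "Crack", "Defect"]
notes = [
    "The product was defective.",
    "A crack was caused by client.",
    "Many scratches noted.",
]

for note in notes:
    print(note, "->", best_category(note, categories))
```

Like the simple workflow, this sketch assigns exactly one category per note, even when a note mentions several.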

Solution Details: (Simple solution) After reading the inspection notes and categories with two Excel Reader nodes, we used the Similarity Search node to find the category that best matched each note. Next, we used the Joiner node to join each note with its most similar category, and then used the Column Filter node to remove temporary columns.

(Complex solution) We started by reading and lowercasing the inspection notes and categories with two Excel Reader and two String Manipulation nodes. Note that lowercasing both inputs was a cheap way of finding more matches between categories and notes. Next, we used the GroupBy node to create a concatenated regular expression containing all categories, and used the Table Row to Variable node to convert this regular expression into a variable. We then used the shared component Regex Find All to find all categories that corresponded to each inspection note, and finally used the Split Collection Column node to put each matched category into a separate column.
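The regular-expression idea behind the complex solution can also be sketched in Python. This is a rough analogue under stated assumptions, not the workflow itself: the function names `build_category_pattern`, `find_all_categories`, and `split_into_columns` are hypothetical, and the last sample note is invented here to show a note that matches more than one category.

```python
import re


def build_category_pattern(categories):
    """Join lowercased, escaped categories into one alternation regex."""
    return re.compile("|".join(re.escape(c.lower()) for c in categories))


def find_all_categories(note, pattern):
    """Return every distinct category found in the note, in order of first appearance."""
    matches = pattern.findall(note.lower())
    return list(dict.fromkeys(matches))  # drop repeats, keep order


def split_into_columns(matches, width):
    """Pad the match list so each matched category gets its own column."""
    return matches + [None] * (width - len(matches))


categories = ["Scratch", "Crack", "Defect"]
pattern = build_category_pattern(categories)

notes = [
    "The product was defective.",
    "A crack was caused by client.",
    "Many scratches noted.",
    "Crack and scratches found.",  # hypothetical note with two categories
]

for note in notes:
    found = find_all_categories(note, pattern)
    print(note, "->", split_into_columns(found, len(categories)))
```

Because the pattern matches substrings, "scratches" and "defective" still hit "scratch" and "defect" without any stemming. Misspelled notes would still be missed, which is where spellchecking would help.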

See our solution in the KNIME Hub

Never miss a challenge! Sign up for weekly reminder e-mails.

10 Challenge Club

Congratulations to the KNinjas who have aced 10 “Just KNIME It” challenges!

The 10 Challenge Club celebrates "Just KNIME It!" participants who have completed at least 10 challenges. How many challenges have you solved?

Previous Just KNIME It! Challenges

Congratulations to the KNinjas who have aced 10 “Just KNIME It” challenges!

lelloba KNIMEST
martinmunch MEPivnenko
eamendola ersy
AnilKS gonhaddock
berti093  

