At the beginning of this year, we sent out a “Help us to Help you with KNIME” survey to the KNIME community. The idea behind the questionnaire was to listen to what the KNIME community wanted and incorporate some of those suggestions into the next releases.
There were a few questions about how people are using KNIME Analytics Platform, and also questions designed to help us understand what kinds of new nodes and features people dream about. We additionally promised that we would select one dedicated node - the node most mentioned - and make sure that it would be part of our next major release.
In this post we present this "community node" and we've also put together five tips & tricks garnered from other answers given in the survey.
So, the node most requested by the community is [drum roll] the Duplicate Row Filter! It was implemented in KNIME Analytics Platform 4.0 (you'll find a full list of the features released in 4.0 here). We're sure you've already noticed this new node in the node repository and have played around with it.
Introducing: Duplicate Row Filter
Category: Manipulator
Feature: Easily detects duplicate rows
Extension: KNIME Core
With the Duplicate Row Filter, you can detect duplicate rows and decide what you want to do with them: you can remove duplicates based on a selected criterion, or you can select a flag method. For instance, the user can select if she wants to flag a row as unique, duplicate or as the chosen duplicate to keep. She can also choose to create a column listing which rows are duplicates while also indicating which rows they duplicate. Or she can simply remove some of the duplicate rows and keep the others. To select which rows to keep, the Duplicate Row Filter allows the user to select a row-keeping criterion: now it is easier to select which row to keep based on the minimum or maximum value of a feature, or based on the order of appearance of the duplicate row.
You can try out the Duplicate Row Filter node yourself in this example workflow, which can be downloaded from the KNIME Hub here.
Fig. 1. The workflow demonstrating how the Duplicate Row Filter works. It's available on the KNIME Hub here.
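To make the flagging and row-keeping options described above more concrete, here is a minimal Python sketch of the same idea. It is not how the node is implemented in KNIME; the function names, the status labels and the `keep` option values ("first", "last", "min", "max") are assumptions chosen for illustration.

```python
from collections import defaultdict


def flag_duplicates(rows, key_columns, keep="first", value_column=None):
    """Return one status per row: 'unique', 'chosen' or 'duplicate'.

    Rows are dicts; two rows are duplicates when they agree on all key_columns.
    The keep values and status labels are hypothetical, not KNIME's own.
    """
    # Group row positions by the values of the selected key columns.
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[tuple(row[c] for c in key_columns)].append(index)

    status = [None] * len(rows)
    for indices in groups.values():
        if len(indices) == 1:
            status[indices[0]] = "unique"
            continue
        # Pick the row to keep within each group of duplicates.
        if keep == "first":
            chosen = indices[0]
        elif keep == "last":
            chosen = indices[-1]
        elif keep == "min":
            chosen = min(indices, key=lambda i: rows[i][value_column])
        elif keep == "max":
            chosen = max(indices, key=lambda i: rows[i][value_column])
        else:
            raise ValueError(f"unknown keep option: {keep!r}")
        for i in indices:
            status[i] = "chosen" if i == chosen else "duplicate"
    return status


def remove_duplicates(rows, key_columns, keep="first", value_column=None):
    """Drop every row flagged 'duplicate'; keep 'unique' and 'chosen' rows."""
    status = flag_duplicates(rows, key_columns, keep, value_column)
    return [row for row, s in zip(rows, status) if s != "duplicate"]
```

Calling `flag_duplicates` corresponds roughly to the flag method, while `remove_duplicates` corresponds to removing the duplicates and keeping one row per group.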
The nice thing about the survey is that we have found out a lot about what people want and wish for in KNIME. We have kept everyone’s suggestions and these will be taken into consideration when planning new features, so there is a chance you will see your suggestions implemented in future versions of KNIME.
The survey not only gave us new ideas for nodes and features, but also pointed us to some important tips and tricks we could share with you, to help you do what you want to do in KNIME. So here are answers to some of the questions you sent us, which can already be solved using KNIME.
I'd like to be able to perform multiple string manipulations and mathematical operations in a single node. Is that possible?
Introducing: Column Expressions node
Category: Manipulator
Feature: Adds or replaces columns with custom expressions. It is also streamable.
Extension: KNIME Expressions
Did you know that the Column Expressions node allows you to perform multiple operations in different columns? This node lets you add or replace columns with custom expressions, which can mix string manipulation, math formulas, as well as your own set of rules with if-else statements using JavaScript! There is no limit on how simple or complex the statements can be. If you are curious about this node and want to know more, there is more information about it in this video on KNIME TV.
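The node itself uses its own JavaScript-based expression syntax, so the following Python sketch only illustrates the general idea of deriving several columns in one step by mixing string manipulation, arithmetic and if-else rules. The column names (`first_name`, `last_name`, `price`, `quantity`) and the size thresholds are made up for the example.

```python
def add_columns(row):
    """Derive three new columns from one input row (a dict) in a single step."""
    # String manipulation: trim whitespace and normalise the capitalisation.
    full_name = f"{row['first_name'].strip().title()} {row['last_name'].strip().upper()}"

    # Math formula: order total, rounded to two decimals.
    total = round(row["price"] * row["quantity"], 2)

    # If-else rule: classify the order by its total (thresholds are arbitrary).
    if total >= 100:
        order_size = "large"
    elif total >= 20:
        order_size = "medium"
    else:
        order_size = "small"

    # Return the original columns plus the three derived ones.
    return {**row, "full_name": full_name, "total": total, "order_size": order_size}
```

Applying `add_columns` to every row of a table mirrors what a single node configured with three expressions would do.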
I still haven't found what I'm looking for. Why can’t I find the node I want in the node repository?
Introducing: KNIME Hub
Category: Collaboration
Feature: The place to find and collaborate on KNIME workflows and nodes.
Have you ever wondered why you can’t see that node everyone is talking about in your node repository? Well, maybe it's part of a KNIME Extension you haven't installed yet. Installing KNIME Extensions is a simple process. Go to File > Install KNIME Extensions and select the desired extension by checking its name on the list. The screenshots below show the process for installing the KNIME Expressions extension. This is the extension that includes the Column Expressions node. Here we typed “expressions” into the search field, meaning that every extension including the word "expressions" is listed. You then select the extension(s) you want and click “next” until the installation starts. Don't forget that for your changes to take effect, you need to restart KNIME. The section on the website "Install Extensions and Integrations" provides a lot more information about this topic.
A new alternative for installing KNIME extensions is now provided by the KNIME Hub. You can just search for the desired extension in the KNIME Hub, select it, and then drag-and-drop it into your KNIME workbench to install it. The whole procedure can be seen in the GIF. Yes, it is as easy as that!
Fig. 2. Installing extensions directly from the KNIME Hub
I'd like to append an Excel sheet to an Excel file using KNIME. Can I?
Introducing: Excel Sheet Appender node
Category: Sink (Writer)
Feature: Not only reads and creates Excel files but modifies existing ones
Extensions: KNIME Excel Support
KNIME not only reads Excel files and creates new Excel files, but it can also modify existing ones. Appending sheets to your Excel file is an easy task in KNIME: Just try the Excel Sheet Appender node! It is a "sink" or "writer" node (the type of node that only writes data out of KNIME and that does not require any additional extension) and it works with xls, xlsx and xlsm files. In the meantime, there are also community extensions that allow you to format exported Excel sheets. You can find them on the KNIME Hub.
Is there a helper node that suggests related nodes to use in my workflow?
If you would like to discover new nodes that are related to the nodes you are currently using, you can benefit from the Wisdom of the Crowd with the KNIME workflow coach! When you start KNIME for the first time, you're asked if you would like to send anonymous information about your node usage to us. This community information is then used to create a recommendation system, which computes which node is most likely to follow the one you are using in your workflow. This is great for KNIME beginners who are still exploring all the interesting node possibilities. And it is important to remember that the information we receive is completely anonymous: it only concerns the nodes you are using (we receive no information at all about your data or identity).
If you are already using KNIME but can't find the workflow coach, you can enable it by going to File > Preferences > Workflow Coach and ticking the box that says “Node Recommendations by the community”.
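The idea of recommending "the most likely node to follow another one" can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the workflow coach's actual algorithm: it treats each workflow as a plain sequence of node names (real workflows are graphs), and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_successor_counts(workflows):
    """Count how often each node is directly followed by each other node."""
    counts = defaultdict(Counter)
    for nodes in workflows:
        # Pair every node with the node that comes right after it.
        for current, following in zip(nodes, nodes[1:]):
            counts[current][following] += 1
    return counts


def recommend(counts, node, top_n=3):
    """Return up to top_n node names that most often followed `node`."""
    followers = counts.get(node, Counter())
    return [name for name, _ in followers.most_common(top_n)]
```

Given usage data in which "Row Filter" follows "File Reader" more often than any other node, `recommend(counts, "File Reader", top_n=1)` would return `["Row Filter"]`; a node that never appears in the data yields an empty list.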
It would help my work if I could rename columns based on a dictionary. Is there a way to do this?
Introducing: Insert Column Header node
Category: Manipulator
Feature: Updates the column names of a table according to the mapping in a second (dictionary) table.
Extensions: KNIME Core
Column headers (names) can be easily converted based on a dictionary by using the Insert Column Header node! This node has two input ports: the first port should receive your data table and the second port receives the dictionary. The dictionary needs to contain a column with the old column names and another column with the new names. Once this is set, you can run the node and it will automatically convert the column names for you.
And that is all, everyone. I hope you all enjoyed the new Duplicate Row Filter and found these tips and tricks useful. If you have any comments or feedback, want to share your impressions of the new Duplicate Row Filter node, or want to tell us which tips and tricks you are still looking for, then join in the discussions on the Forum!
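As a rough illustration of the dictionary lookup described above, the following Python sketch renames a list of column headers using a two-column dictionary table. The key names `old_name` and `new_name` are placeholders, and leaving unmatched columns unchanged is an assumption made for this example rather than a statement about the node's behaviour.

```python
def rename_columns(header, dictionary_rows, old_key="old_name", new_key="new_name"):
    """Return the header with every name found in the dictionary replaced.

    header          -- list of current column names (the data table's header)
    dictionary_rows -- list of dicts, one per dictionary row, holding the
                       old name and the new name of a column
    """
    # Build the old-name -> new-name lookup from the dictionary table.
    mapping = {row[old_key]: row[new_key] for row in dictionary_rows}
    # Columns without a dictionary entry keep their name (an assumption here).
    return [mapping.get(name, name) for name in header]
```

For a header `["cust_id", "amt", "date"]` and a dictionary that maps `cust_id` to `Customer ID` and `amt` to `Amount`, the function returns `["Customer ID", "Amount", "date"]`.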