FAQ for KNIME Node Development

Common questions regarding node development. For FAQ on general KNIME usage, refer to the KNIME usage FAQs.

FAQ for Developers

How do I implement my own node?
Where do I find the KNIME source code?

Some of the source code is available on GitHub and Bitbucket. Once you've created your own node, the generated plugin will contain dependencies on the KNIME core code. You can browse the source code by right-clicking any KNIME core class that is referenced by the generated Java class (e.g. in the generated NodeModel) and choosing "Open Declaration". Alternatively, you can browse the source code in the "Package Explorer" under the "Plug-in Dependencies" folder. If you want to browse additional KNIME code, for instance the code underlying the chemistry extensions, you will need to add a dependency on the desired plugin. The source itself is always downloaded automatically when you install an extension. It is contained in the plugins directory in files such as org.knime.base.source_2.2.0.xxx.jar. (This will likely change in the next version of KNIME: the source will then no longer be installed automatically, but can be downloaded separately via the Update Site by selecting one of the Source features.)

Which version of Java / Eclipse do I need?

Starting with version 4.4.0, KNIME AP requires Java 11; earlier versions require Java 8. Take a look at the KNIME Analytics Platform SDK Setup for the requirements.

Where do I find the source code for the KNIME Report Designer?

The KNIME Report Designer comes with a different KNIME license, which is not open source. In fact, an open source release of the code is not possible in general: BIRT is covered by the EPL, and KNIME Reporting is based on both BIRT and KNIME, so a release of the combination under the GPL is not possible.

Where do I find the KNIME SDK?

With the release of KNIME Analytics Platform 3.6, we retired our KNIME SDK in favor of a more flexible setup. You can now find all the information you need to get started with developing new extensions for KNIME Analytics Platform on GitHub or Bitbucket.

My new node appears at the top level of the node repository. How do I put it into a (new) category?

A new category is defined using the extension point org.knime.workbench.repository.categories. Open the "plugin.xml" of your plugin project in the Eclipse plug-in editor, go to the "Extensions" tab, and click the "Add..." button to add the org.knime.workbench.repository.categories extension (if it is not already there). Then right-click the new list entry and choose "New", then "category".

How can I link to local files in node descriptions?

Sometimes you want to distribute additional information with one of your nodes. You can either insert external links (http:) into the node description or reference local files. Since KNIME's installation directory is not known in advance, there are two ways to use relative links in node descriptions (starting with KNIME 2.4). You can either put your additional material into the plugin and reference it from the plugin's root directory by using the bundle: protocol (e.g. href="bundle:plugin.xml"), or put it into a node's Java package and reference it relative to this package by using the node: protocol (e.g. href="node:MyFactory.xml").

What is exactly the idea behind the node model, node dialog and the node view? Where are the differences between the node dialog and the node view? Why doesn’t the node dialog write directly to the node model?

The underlying design follows the Model-View-Controller pattern. The dialog really acts as a controller rather than a view on the model, which is why the two are treated quite differently.

  • The views: they require access to the entire model as we do not know (and cannot know) which pieces of the (node) model are needed to display it properly. In some instances (Scatterplot...) it can be very useful to see different plots at the same time (imagine looking at three different combinations of variables).
  • The dialog: it changes the settings of the (node) model that control its operation (what to compute...). It has nothing to do with what is subsequently contained in the model itself; this is determined during execution, based on the settings that were changed by the dialog. So why do we not enable the dialog to write directly into the (node) model? There are two reasons:
    1. We want to be able to store those settings when the workflow is saved. If we make sure everything is transported from the dialog to the model in a clearly defined container (the NodeSettings object) we can serialize this object and be sure that nothing is lost.
    2. More importantly: we want to be able to cancel a dialog and to check that the new settings are correct before writing anything to the model. To do this, we cannot have the dialog write directly into the model, because then we would not be able to revert to the previous settings. This is also the reason for the separate validate and apply methods. When a dialog wants to apply the new settings, the model validates them first (and rejects them if they are incorrect), and only afterwards are the settings written to the model in their entirety.
  • The (node)model: in order for us to not only load the configuration (nodes, connections, and node settings) but also the content of the models and the data itself, we need to store what the node created during execution. Since we don't really have control over what happens inside the model during execute() we leave it to the user to write this out to a specific directory. For some nodes it may be sufficient to do nothing because all information is already contained in the NodeSettings (which are stored automatically) - for instance a column filter or a node computing some properties. Also the data provided at the data outport is stored automatically. For other nodes (such as a decision tree) we do need to store the entire tree. Note that this is not necessarily the same as what is transported to the Model-Ports - the tree inside the node also needs to remember which rows to hilite when a branch is selected. In almost all instances you only need to worry about writing and reading NodeContents if that node provides views. To summarize:
    • The validateSettings(), loadValidatedSettings() and saveSettings() methods have to be implemented if the node has a dialog (and therefore settings).
    • Use load/saveInternals() if the node provides views.
    • If the node has a model outport, implement load/saveModelContent().

When should the 'modelChanged()' function be called explicitly?

The modelChanged() function is essentially the notification to all views that a model has changed (reset or execute) inside the MVC model. It is called internally by the framework, so there is no need to call it explicitly.

Why are there different types (for example in the chem plugin the SMILES data type, etc.)? Wouldn’t it be easier to have only strings and the node to care about the content of the string?

We believe that whenever we have a string that actually represents something else and we want a subsequent node to only operate on strings representing this particular type, we should add this as a specific type X and either have a string-to-X converter (parser...) node or a file-reader that reads only files containing X.

My node generates values based on the values of the input data. Should I add this information or simply output the new values?

Nodes that produce additional columns based on information already existing in the table should, by default, attach this information to the table as a new column. If the node converts the information in one column to another format (parser, binner, ...), it should offer a "replace original column" checkbox, which is disabled by default.

How do I handle errors and exceptions during execution of the node model?

There are basically two ways to handle exceptions and errors that occur during execution:

  • If the error is so severe that no data can be provided at the outport, throw an exception. Then the node stays unexecuted and an error icon with the message of that exception is displayed.
  • If something unusual happened or you want to inform the user about some implicitly made decisions you can set a warning with setWarningMessage(String message) in the execute method. The node will be executed but with a warning icon displaying the text of the warning.
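
The two paths can be sketched in plain Java. This is only an illustration with stand-ins, not the KNIME API: the class `SketchNode`, its `execute` signature, and the `warning` field are hypothetical. In a real NodeModel you would throw from execute() and call setWarningMessage(String) instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a node model; not the KNIME API. */
class SketchNode {
    // Plays the role of the text passed to setWarningMessage(String).
    private String warning;

    String getWarning() {
        return warning;
    }

    /** Parses all rows; skips blank rows with a warning, fails on empty input. */
    List<Double> execute(List<String> rows) {
        if (rows.isEmpty()) {
            // Severe: no data can be provided at the outport, so throw.
            throw new IllegalStateException("Input table is empty");
        }
        List<Double> out = new ArrayList<>();
        int skipped = 0;
        for (String row : rows) {
            if (row.isBlank()) {
                skipped++; // an implicit decision the user should hear about
                continue;
            }
            out.add(Double.parseDouble(row.trim()));
        }
        if (skipped > 0) {
            // Unusual but not fatal: results are still produced.
            warning = "Skipped " + skipped + " blank row(s)";
        }
        return out;
    }
}
```

The first branch leaves the node unexecuted; the second still returns results and only records a message for the user.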

How do I set the progress bar correctly?

If the progress, for example, depends on the number of rows and there is only one task to do then it could be set in the execute method with: exec.setProgress(currentRowNr/(double)numberOfRows, "Processing row nr: " + currentRowNr); If the task to be done is divided into some subtasks then you can create a subprogress with the fraction of the whole task. With two equally long subtasks the code would be:

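// Each sub-monitor covers half of the overall progress bar.
ExecutionMonitor exec1 = exec.createSubProgress(0.5);
ExecutionMonitor exec2 = exec.createSubProgress(0.5);
// Task one reports its progress to exec1.
task1(input, exec1);
// Task two: the ExecutionContext creates the table and reports to exec2.
exec.createBufferedDataTable(result, exec2);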
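
To make the fraction arithmetic concrete, here is a small self-contained model of how sub-progress could roll up into overall progress. `SketchProgress` is a hypothetical stand-in written for this illustration; it is not how KNIME's ExecutionMonitor is implemented.

```java
/** Hypothetical stand-in for a progress monitor; not the KNIME API. */
class SketchProgress {
    private final SketchProgress parent;
    private final double fraction; // share of the parent's range this monitor covers
    private double own;            // progress in [0, 1] set directly on this monitor
    private double fromChildren;   // progress contributed by sub-monitors

    SketchProgress() {
        this(null, 1.0);
    }

    private SketchProgress(SketchProgress parent, double fraction) {
        this.parent = parent;
        this.fraction = fraction;
    }

    /** Creates a sub-monitor that covers the given fraction of this monitor. */
    SketchProgress createSubProgress(double subFraction) {
        return new SketchProgress(this, subFraction);
    }

    /** Sets this monitor's own progress and forwards the scaled change upwards. */
    void setProgress(double value) {
        double clamped = Math.max(0.0, Math.min(1.0, value));
        double delta = clamped - own;
        own = clamped;
        if (parent != null) {
            parent.addFromChild(delta * fraction);
        }
    }

    private void addFromChild(double delta) {
        fromChildren += delta;
        if (parent != null) {
            parent.addFromChild(delta * fraction);
        }
    }

    double getProgress() {
        return Math.min(1.0, own + fromChildren);
    }
}
```

With two sub-monitors of 0.5 each, finishing the first task moves the overall bar to one half, and reaching the midpoint of the second task moves it to three quarters.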

Where do I set default values for my user settings?

One good place is the NodeModel's configure method. There you can look at the incoming table spec and the current user values to decide if you can create default values. They will also appear in the node’s dialog as default settings. Sometimes you can’t just guess useful default settings, but you still need to show something, when the dialog opens. In this case the dialog’s loadSettings method is probably the appropriate place to insert these values. If the model has no (default) values, it will not write values into the settings object (in its saveSettings method), thus the dialog will miss these values when it tries to load settings. In this case it needs to set some default (or initial) values to be displayed in its components.
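
The interplay between model and dialog can be sketched as follows. This is an illustration only: a plain `Map` stands in for the settings object, and the class `DefaultsSketch`, the key `"column"`, and all method signatures are hypothetical, not the KNIME API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of where default values can come from. */
class DefaultsSketch {
    static final String KEY_COLUMN = "column"; // hypothetical settings key

    /** configure-style: guess a default from the incoming spec if nothing is set. */
    static String configure(String column, List<String> specColumns) {
        if (column == null && !specColumns.isEmpty()) {
            return specColumns.get(0);
        }
        return column;
    }

    /** Model side: a value is only written if the model has one. */
    static Map<String, String> saveSettings(String column) {
        Map<String, String> settings = new HashMap<>();
        if (column != null) {
            settings.put(KEY_COLUMN, column);
        }
        return settings;
    }

    /** Dialog side: fall back to an initial value if the model stored nothing. */
    static String loadSettingsForDialog(Map<String, String> settings, String initial) {
        return settings.getOrDefault(KEY_COLUMN, initial);
    }
}
```

If configure could guess a value, the dialog shows it; otherwise the dialog's own fallback is displayed.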

What's the difference between the 'configure' and the 'validateSettings' method?

In validateSettings you perform basic checks on the new values. In configure you check whether the node can run with the current settings and whether the values are consistent with the incoming table spec. In validateSettings, settings are rejected only if required values are missing or values are obviously invalid (e.g. you read a negative number when you know the value must be positive, or you get a null or empty string for a column name). You can also check the consistency of the values with each other (for example, a lower bound should be smaller than an upper bound). At this point it is not possible to check the consistency of the settings with respect to the incoming data table. This is done in the configure method. There you complain if a chosen column name doesn't exist in the incoming table spec or if a selected column has the wrong data type. If configure succeeds, the node will be in the executable state.
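
The division of labour might look like this in a simplified form. The sketch uses plain Java stand-ins: `BoundsSketch`, the parameter lists, and the use of standard unchecked exceptions are assumptions for illustration (in KNIME both methods report problems with an InvalidSettingsException).

```java
import java.util.Set;

/** Hypothetical sketch of the two kinds of checks; not the KNIME API. */
class BoundsSketch {

    /** validateSettings-style: looks only at the values themselves. */
    static void validateSettings(String column, double lower, double upper) {
        if (column == null || column.isEmpty()) {
            throw new IllegalArgumentException("No column name given");
        }
        if (lower >= upper) {
            throw new IllegalArgumentException(
                "Lower bound must be smaller than upper bound");
        }
    }

    /** configure-style: checks the settings against the incoming table spec. */
    static void configure(String column, Set<String> specColumns) {
        if (!specColumns.contains(column)) {
            throw new IllegalStateException(
                "Column '" + column + "' not found in input table");
        }
    }
}
```

Note that settings can pass the first check and still fail the second, for example when the upstream table changes after the dialog was closed.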

Why are 'validateSettings' and 'loadValidatedSettings' split into two methods? Isn't that duplicating code?

Sometimes the implementations of both methods do indeed look very similar. The work is split into two methods to ensure that the implementation either takes over the full set of new settings or rejects them entirely. It would be dreadful if, while loading, part of the settings were taken over (by assigning them to the internal variables), only to discover halfway through that some values are invalid and abort with an exception. Separating the validation step from the assigning (loading) step adds robustness to the application.
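
A minimal sketch of this all-or-nothing behaviour, assuming a plain `Map` as a stand-in for the settings object. The class `AtomicSettingsSketch`, the keys `"column"` and `"bins"`, and the `apply` helper are hypothetical names chosen for the illustration.

```java
import java.util.Map;

/** Hypothetical sketch: validate everything first, assign afterwards. */
class AtomicSettingsSketch {
    private String column = "A"; // current, known-good values
    private int bins = 10;

    String getColumn() { return column; }
    int getBins() { return bins; }

    /** Checks every value without touching any field. */
    void validateSettings(Map<String, String> s) {
        String col = s.get("column");
        if (col == null || col.isEmpty()) {
            throw new IllegalArgumentException("column is missing");
        }
        int b;
        try {
            b = Integer.parseInt(s.get("bins"));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("bins must be an integer");
        }
        if (b <= 0) {
            throw new IllegalArgumentException("bins must be positive");
        }
    }

    /** Assigns the values; only called after validation succeeded. */
    void loadValidatedSettings(Map<String, String> s) {
        column = s.get("column");
        bins = Integer.parseInt(s.get("bins"));
    }

    /** Plays the role of the framework: returns false if nothing was changed. */
    boolean apply(Map<String, String> s) {
        try {
            validateSettings(s);
        } catch (IllegalArgumentException e) {
            return false; // old settings stay fully intact
        }
        loadValidatedSettings(s);
        return true;
    }
}
```

Had the assignments been mixed into the validation, a valid column name followed by an invalid bin count would have left the object with a new column and an old bin count.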

How can I show debug messages for selected packages only?

KNIME currently uses Log4j for logging. Inside the .metadata directory of your runtime workspace (not your developing workspace), there is a subdirectory called knime containing the default log4j configuration (log4j.xml). Inside the file there is a small comment about how to enable debug messages for selected packages only. However, enabling debug messages in that way only affects the output written to stdout which will show up in the Console of your Eclipse IDE, but not in the KNIME Console.

If I use the 'javax.swing.JFileChooser' or KNIME's 'DialogComponentFileChooser', it takes a long time to open the dialog, or the dialog does not open at all causing KNIME to hang.

This is a known bug in the Java Runtime Environment. The problem occurs if you initialize one of those classes within the constructor or as a class member during class creation of your derived NodeDialogPane. Two possible solutions are:

  1. Initialize the file chooser on demand, that is, the first time you need to access the file system, or
  2. Add the creation to the event dispatching thread by using SwingUtilities.invokeAndWait(new Runnable() { ... });.
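
The first option, creating the chooser on demand, could be sketched like this. The helper class `Lazy` and the dialog stand-in `SketchDialog` are hypothetical names for this illustration; a real dialog would extend NodeDialogPane.

```java
import java.util.function.Supplier;
import javax.swing.JFileChooser;

/** Creates its value on first use instead of at construction time. */
final class Lazy<T> {
    private final Supplier<T> factory;
    private T value;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    synchronized T get() {
        if (value == null) {
            value = factory.get(); // runs at most once (for non-null results)
        }
        return value;
    }
}

/** Hypothetical dialog stand-in: the chooser is NOT built in the constructor. */
class SketchDialog {
    private final Lazy<JFileChooser> chooser = new Lazy<>(JFileChooser::new);

    /** Call this from a button listener, i.e. when file access is first needed. */
    JFileChooser fileChooser() {
        return chooser.get();
    }
}
```

Constructing the dialog stays cheap because the JFileChooser is only instantiated when fileChooser() is first called.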

How do I include and use external Java libraries in my new KNIME plugin?

Follow these steps:

  • Create a lib directory in your KNIME plugin.
  • Copy the file(s) into the lib directory. (Java libraries are packed either as a zip or jar archive.)
  • Edit the META-INF/MANIFEST.MF file with the "Plug-in Manifest Editor".
  • Go to the "Runtime" tab and add all necessary libraries to the "Classpath" list on the bottom-right corner using the "Add..." button.
  • Go to the "Build" tab and add the files to the list contained in the section "Extra classpath entries".
  • Make sure that the lib directory is selected in the "Binary Build" list (in the same tab).
  • (Please note that adding jar files to the plugin's build path, i.e. project context menu -> "Java Build Path" -> "Libraries", is not necessary.)
  • Native libraries such as Windows DLLs need to be copied into the plug-in as well and must be included in the manifest and build properties.

You should now be able to use the libraries within your node implementation.

Why do I get a SIGSEGV error whenever I try to start KNIME-AP from Eclipse on my MacBook?

This occurs on MacBooks with a Touch Bar whenever the Touch Bar is active (caused by this Eclipse bug). To work around this issue, follow these steps:

  • In Eclipse, select Run → Run Configurations... (or Debug Configurations...).
  • Select the Run/Debug configuration you want to launch.
  • Select the "Arguments" tab.
  • Add the argument "-nosplash" to the Program arguments.

You should now be able to launch KNIME-AP from Eclipse.

I'm running Linux and I am not even able to start the SDK package. Instead it creates an error log file and crashes.

KNIME 2.0 is based on Eclipse 3.3.2, which has a critical bug related to web browser detection. The error file that is generated looks similar to this one. Eclipse (and KNIME) use the system's web browser to render specific pages (such as the Eclipse welcome page or the node descriptions in KNIME). See the full Eclipse bug report here. To fix this problem, you have to instruct Eclipse to use a specific xulrunner (i.e. the system's web browser library, typically available in packages called 'xulrunner' or 'mozilla-xulrunner'). You do so by adding a line (as the last line) to the 'eclipse.ini' file located in the Eclipse directory. The line reads as follows:


whereby you have to adjust the path to reflect the xulrunner path of your system (e.g. /usr/lib/xulrunner- ).

