For Developers: Integration of Custom Tag Sets

This tutorial explains how to create a custom tag set, integrate it into the KNIME Textprocessing feature, and use it in tagging and filtering nodes. The first section describes how to set up the KNIME SDK, install all required features, and create a KNIME project. The second section describes the TagSet extension point and how to use it to integrate a tag set. The last section shows how to use the tag set in existing KNIME Textprocessing nodes.

KNIME Setup

This section describes how to set up KNIME SDK, install all required features and create a KNIME project to integrate a custom tag set. For this tutorial KNIME SDK version 2.8 was used.

First, download and install the KNIME SDK from the download section of the KNIME website. The SDK version is required because tag sets, like nodes, have to be integrated via a KNIME extension point. This means that Java classes implementing certain interfaces have to be created in order to integrate custom Java code into KNIME. Once the KNIME SDK has been downloaded and installed, start it, if necessary with customized arguments in the knime.ini.

Second, install the KNIME Textprocessing feature. KNIME features or plug-ins can be installed via the Eclipse update mechanism, which is described in detail here: http://www.knime.org/downloads/update. The Textprocessing feature can be found under the KNIME Labs Extensions section.

KNIME Project

Once the Textprocessing feature has been installed and the KNIME SDK has been restarted, a new KNIME project has to be created. To do so, right click in the Eclipse Package Explorer view, click New->Other and select Create a new KNIME Node-Extension. It is recommended to use the Node-Extension wizard also when creating a new TagSet extension. Click Next to get to the Create new KNIME Node-Extension view. Specify the project name, a name for the tagger node to create, a Java package name, the node vendor name, and a description. Uncheck Include sample code in generated classes if you do not want the sample code. The Node-Extension wizard creates classes for the new node. These classes are not strictly needed to create a new tag set and can be removed later on if you don’t want to implement a tagger node. Finally, click Finish to generate the project. The figure below shows a filled-in Node-Extension wizard view. In this example a sentiment tag set is created, thus the project is named SentimentTagging.

 

New KNIME Project

In this example the new project contains a Java package org.knime.example as well as the classes SentimentTaggerNodeDialog, SentimentTaggerNodeFactory, SentimentTaggerNodeModel, SentimentTaggerNodePlugin, and SentimentTaggerNodeView. Furthermore, there is a file SentimentTaggerNodeFactory.xml containing the node description, which is shown in the Node Description view of KNIME. Again, these classes are not required to integrate a custom tag set and can be removed, with one exception: the SentimentTaggerNodePlugin class is required to load the plugin and all contained resources, such as the tag set. Do not remove the class SentimentTaggerNodePlugin! The figure below shows the new project in the Eclipse Package Explorer view, with the only required class SentimentTaggerNodePlugin emphasized by a red frame.

 

Package view of new KNIME project.

In this tutorial the classes are not removed, since it is shown later on how to create a custom tagger node.

To check whether the project was created properly and the new node has been registered (provided you haven’t deleted its classes), run KNIME and see if the new node appears in the Node Repository view. To run KNIME, click Run->Run Configurations and create a new run configuration by double clicking Eclipse Application. Select Run a product and choose org.knime.product.KNIME_PRODUCT from the drop down menu. In the Arguments tab, Java parameters such as -Xmx and -Xms can be specified. The figure below shows the run configuration of the sentiment example.

 

Finally, click Run to start KNIME. If everything is working as it should, the new node shows up in the Node Repository view, as shown in the figure below.

 

If KNIME is running and the node shows up, the project was set up properly and the node has been registered and can be used. Now everything is prepared to create and integrate a custom tag set. Close KNIME to proceed.

Integration of a Tag Set

This section describes how to implement a custom tag set and integrate it into the KNIME Textprocessing feature via the TagSet extension point.

Similar to the Node-Extension, which is used to integrate custom nodes, a TagSet-Extension exists to integrate custom tag sets. For this, at least one Java class needs to be implemented. To separate the Java classes of the new node from those of the tag set, it is recommended to create a new package. In this tutorial the package org.knime.example.tagset is created in the sentiment project to store the tag set related classes.

To register a class as a tag set at the TagSet-Extension point, the class must implement the interface org.knime.ext.textprocessing.data.TagBuilder. This interface declares all methods that the KNIME Textprocessing feature needs in order to integrate and use the tag set; the implementing class has to provide them. First, however, the values of the tag set have to be defined. Using an enum for this is recommended but not mandatory. In this example an enum SentimentTag is created in the org.knime.example.tagset package. The enum contains the three values POSITIVE, NEGATIVE, and NEUTRAL.

public enum SentimentTag {
    /** Positive tag */
    POSITIVE,
    /** Negative tag */
    NEGATIVE,
    /** Neutral tag */
    NEUTRAL;
}

Now the corresponding TagBuilder has to be created; it is registered at the TagSet-Extension point later on to integrate the tag set. For this, a new class implementing the interface org.knime.ext.textprocessing.data.TagBuilder is needed. In this example the class is named SentimentTagSet. Like the enum SentimentTag, it is located in the package org.knime.example.tagset.

  public class SentimentTagSet implements TagBuilder  

To resolve and import the interface org.knime.ext.textprocessing.data.TagBuilder, make sure that all required dependencies are specified in the plugin.xml file of the project. To do so, double click the plugin.xml file and select the Dependencies tab. If org.knime.ext.textprocessing is not in the list of required plug-ins, click the Add button, select org.knime.ext.textprocessing as the plug-in to add, click OK, and finally save the plugin.xml file. Now the dependencies are set and all Textprocessing classes can be imported and used. The figure below shows the Dependencies tab of the plugin.xml editor and the list of required plug-in dependencies.

 

Now the class SentimentTagSet needs to be implemented, starting with its fields. Usually three fields are enough to cover all needs of a tag set: the type of the tag set, the default value, and a map that stores the string value of each tag together with its Tag instance.

/** Tag type. */
public static final String TAG_TYPE = "SENTIMENT";
     
/** Default tag value. */
public static final String DEFAULT_TAG = SentimentTag.NEUTRAL.toString();
 
/** Map storing all tags and their related values. */
private Map<String, Tag> m_tagMap;

The tag type is shown in the dialog of nodes such as the Dictionary Tagger, for which a certain type can be specified. Furthermore, in a data table containing TermCells, such as a bag-of-words data table, the tag value of a tagged term is displayed together with the corresponding type. In this case the type is SENTIMENT. For convenience the value of the default tag, which is NEUTRAL in this case, is stored in an extra field.

The constructor of the TagBuilder class has to be public, otherwise it can’t be instantiated when loading the tag set extension. In order to provide all tags necessary for this tag set the map needs to be initialized and filled.

/**
 * Constructor has to be public!
 */
public SentimentTagSet() {
    m_tagMap = new HashMap<String, Tag>();
    // Fill the map with all tag values and their related tags.
    for (SentimentTag tagValue : SentimentTag.values()) {
        m_tagMap.put(tagValue.toString(), new Tag(tagValue.toString(), TAG_TYPE));
    }
}

All org.knime.ext.textprocessing.data.Tag instances provided by the tag set are created at this point. A Tag contains a value and a type, which are both String fields and can be accessed via the corresponding getter methods.
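
A plausible reason for the public-constructor requirement is that extension classes are instantiated reflectively from the class name given in plugin.xml. How exactly KNIME and the Eclipse extension registry do this is an assumption here, so the following is only a plain-Java sketch of the general reflection rule. The class names PublicTagSet and HiddenTagSet and the helper canInstantiate are hypothetical and not part of KNIME.

```java
import java.lang.reflect.Constructor;

/** Hypothetical demo class, not part of KNIME. */
class ReflectiveLoadingSketch {

    /** Hypothetical tag set with a public no-argument constructor. */
    public static class PublicTagSet {
        public PublicTagSet() { }
    }

    /** Hypothetical tag set whose constructor is not public. */
    public static class HiddenTagSet {
        private HiddenTagSet() { }
    }

    /**
     * Returns true if the class has a public no-argument constructor
     * that can be invoked reflectively, false otherwise.
     */
    static boolean canInstantiate(final Class<?> clazz) {
        try {
            // getConstructor() only finds public constructors.
            Constructor<?> ctor = clazz.getConstructor();
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(final String[] args) {
        System.out.println(canInstantiate(PublicTagSet.class)); // true
        System.out.println(canInstantiate(HiddenTagSet.class)); // false
    }
}
```

Under these assumptions, a tag set class with a non-public constructor would compile without complaint but could not be created when the extension is loaded.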

The method getType() of the TagBuilder simply returns the specified type of the tag.

/* Returns the type of tag.
 * @see org.knime.ext.textprocessing.data.TagBuilder#getType()
 */
@Override
public String getType() {
    return SentimentTagSet.TAG_TYPE;
}

The method asStringList() returns all tag values as a list of strings. This method is called, e.g. by node dialogs which display all available tags of a tag set.

/* Returns the tags as list of strings.
 * @see org.knime.ext.textprocessing.data.TagBuilder#asStringList()
 */
@Override
public List<String> asStringList() {
    List<String> list = new ArrayList<String>(m_tagMap.size());
    list.addAll(m_tagMap.keySet());
    return list;
}
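
One detail worth noting: a HashMap does not guarantee any iteration order, so the list returned by asStringList() may not follow the order in which the enum values are declared. If the order of values in node dialogs matters, one option is to back the tag set with a LinkedHashMap, which iterates in insertion order. The sketch below is an optional variation, not the tutorial's implementation; the class OrderedTagValuesSketch is hypothetical, and plain Object placeholders stand in for the Tag instances so that the example runs without KNIME on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical demo class, not part of KNIME. */
class OrderedTagValuesSketch {

    /** Same values as the tutorial's enum, nested to keep the sketch self-contained. */
    enum SentimentTag { POSITIVE, NEGATIVE, NEUTRAL }

    /** LinkedHashMap keeps keys in insertion order; values are placeholders for Tag instances. */
    private final Map<String, Object> m_tagMap = new LinkedHashMap<String, Object>();

    OrderedTagValuesSketch() {
        // values() returns the constants in declaration order.
        for (SentimentTag tagValue : SentimentTag.values()) {
            m_tagMap.put(tagValue.toString(), new Object());
        }
    }

    /** Returns the tag values in the order in which they were inserted. */
    List<String> asStringList() {
        return new ArrayList<String>(m_tagMap.keySet());
    }

    public static void main(final String[] args) {
        // prints [POSITIVE, NEGATIVE, NEUTRAL]
        System.out.println(new OrderedTagValuesSketch().asStringList());
    }
}
```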

The method getTags() returns a set of all provided tags of the tag set.

/* Returns a set of all tags of the tag set.
 * @see org.knime.ext.textprocessing.data.TagBuilder#getTags()
 */
@Override
public Set<Tag> getTags() {
    Set<Tag> tagSet = new HashSet<Tag>(m_tagMap.size());
    tagSet.addAll(m_tagMap.values());
    return tagSet;
}

The last method to implement is buildTag(String), which takes a string as parameter and returns the corresponding Tag instance, looked up in the map. If no Tag instance exists for the given string value, the default tag should be returned so that the method never returns null.

/* Creates the corresponding Tag instance of the given string.
 * @see org.knime.ext.textprocessing.data.TagBuilder#buildTag(java.lang.String)
 */
@Override
public Tag buildTag(final String value) {
    Tag tag = m_tagMap.get(value);
    if (tag == null) {
        tag = m_tagMap.get(DEFAULT_TAG);
    }
    return tag;
}
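
To see how the pieces described above fit together, and to try out the fallback behaviour of buildTag without starting KNIME, the following self-contained sketch can be used. It is not the KNIME API: the nested Tag class and TagBuilder interface are simplified stand-ins for org.knime.ext.textprocessing.data.Tag and org.knime.ext.textprocessing.data.TagBuilder, and the getter names getTagValue() and getTagType() are assumptions of the stand-in. In a real plug-in the KNIME types are imported instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical demo class wrapping stand-ins for the KNIME types. */
class SentimentTagSetSketch {

    /** Stand-in for the KNIME Tag class: a value plus a type. */
    static final class Tag {
        private final String m_value;
        private final String m_type;

        Tag(final String value, final String type) {
            m_value = value;
            m_type = type;
        }

        String getTagValue() { return m_value; }

        String getTagType() { return m_type; }
    }

    /** Stand-in for the KNIME TagBuilder interface with the four methods from the tutorial. */
    interface TagBuilder {
        String getType();
        List<String> asStringList();
        Set<Tag> getTags();
        Tag buildTag(String value);
    }

    enum SentimentTag { POSITIVE, NEGATIVE, NEUTRAL }

    /** The tag set from the tutorial, written against the stand-in types. */
    static final class SentimentTagSet implements TagBuilder {
        static final String TAG_TYPE = "SENTIMENT";
        static final String DEFAULT_TAG = SentimentTag.NEUTRAL.toString();

        private final Map<String, Tag> m_tagMap = new HashMap<String, Tag>();

        public SentimentTagSet() {
            for (SentimentTag tagValue : SentimentTag.values()) {
                m_tagMap.put(tagValue.toString(), new Tag(tagValue.toString(), TAG_TYPE));
            }
        }

        @Override
        public String getType() { return TAG_TYPE; }

        @Override
        public List<String> asStringList() { return new ArrayList<String>(m_tagMap.keySet()); }

        @Override
        public Set<Tag> getTags() { return new HashSet<Tag>(m_tagMap.values()); }

        @Override
        public Tag buildTag(final String value) {
            Tag tag = m_tagMap.get(value);
            // Unknown values fall back to the default tag instead of null.
            return tag != null ? tag : m_tagMap.get(DEFAULT_TAG);
        }
    }

    public static void main(final String[] args) {
        TagBuilder tagSet = new SentimentTagSet();
        System.out.println(tagSet.buildTag("POSITIVE").getTagValue()); // POSITIVE
        System.out.println(tagSet.buildTag("unknown").getTagValue());  // NEUTRAL
        System.out.println(tagSet.getTags().size());                   // 3
    }
}
```

Note that the lookup is case sensitive in this sketch, so a value such as "positive" also falls back to NEUTRAL.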

Finally, the new class SentimentTagSet has to be registered as a tag set extension. To do so, double click the plugin.xml file again to open it in the Eclipse editor and select the Extensions tab. The panel All Extensions lists all registered extensions, such as nodes and tag sets. If the extension point org.knime.ext.textprocessing.TagSet is not listed, as shown in the figure below, click the Add button. The New Extension dialog will open. Select the tab Extension Points and type org.knime.ext.textprocessing.TagSet into the Extension Point filter text field. Select the corresponding list entry and click Finish to add the extension point.

 

Once the extension point is shown in the list, new classes can be registered at it. To do so, right click on the list entry and click New->TagSet, as shown in the figure below.

 

In the Extension Element Details panel on the right, the class implementing the TagBuilder interface needs to be specified. In this case the class to register is SentimentTagSet. Click the Browse button to open the dialog and select the class, as can be seen in the next figure. Then click OK and save the plugin.xml file.

 

Usage

This section briefly describes how to test whether a registered tag set has been properly added and can be used.

To check whether the tag set was properly registered, run KNIME from the KNIME SDK. Create a new workflow and use, e.g., the Dictionary Tagger node to tag a list of documents. The dialog of the Dictionary Tagger lists all registered tag types and their corresponding values. Open the Tag type drop down menu in the dialog of the node to check whether the newly registered sentiment tag set with type SENTIMENT is listed. Select the tag type and open the Tag value drop down menu to check whether all three values POSITIVE, NEGATIVE, and NEUTRAL are listed, as shown in the next figure. If so, the tag set is properly registered and can be used like any other existing tag set.