Transformation

Before regular KNIME nodes can be applied to texts, the textual data has to be transformed into numerical data. This is done by the Document vector or Term vector node. These nodes create a binary or numerical vector representation for each document or term, respectively, based on the filtered bag of words input data table. The vector representation can then be used by standard data mining methods, such as clustering or classification nodes. The transformation into numerical vectors is usually the final step performed by the Text Processing plugin.
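To illustrate the idea of a document vector outside of KNIME, the following is a minimal Python sketch. It is not the implementation of the Document vector node; the function name `document_vectors`, the `binary` flag and the input layout (a list of document/term pairs standing in for the bag of words table) are assumptions made for this example only.

```python
from collections import Counter


def document_vectors(bow, binary=True):
    """Build one vector per document from a bag of words.

    bow is a list of (document_id, term) pairs, one pair per term
    occurrence. This layout is a hypothetical stand-in for the
    filtered bag of words table, not KNIME's actual data format.
    """
    # Count how often each term occurs in each document.
    counts = {}
    for doc_id, term in bow:
        counts.setdefault(doc_id, Counter())[term] += 1

    # The sorted set of all terms defines the vector dimensions.
    vocabulary = sorted({term for _, term in bow})

    vectors = {}
    for doc_id, term_counts in counts.items():
        if binary:
            # 1 if the term occurs in the document, 0 otherwise.
            vectors[doc_id] = [1 if term_counts[t] > 0 else 0 for t in vocabulary]
        else:
            # Numerical variant: the absolute term frequency.
            vectors[doc_id] = [term_counts[t] for t in vocabulary]
    return vocabulary, vectors


example_bow = [
    ("d1", "knime"), ("d1", "text"), ("d1", "text"),
    ("d2", "mining"), ("d2", "text"),
]
vocabulary, binary_vectors = document_vectors(example_bow, binary=True)
_, count_vectors = document_vectors(example_bow, binary=False)
```

Each resulting vector has one entry per term of the vocabulary, so documents become rows of a purely numerical table that clustering or classification methods can consume. Absolute term frequency is used here as the numerical value only for simplicity; other term weights could be substituted.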

In addition to the Term vector and Document vector nodes there are other transformation nodes, which allow for the transformation of tags to strings, terms to strings, and vice versa. The Term to Structure node converts terms that have been recognized as chemical compounds into structures, e.g. SMILES structures, which can then be rendered in 2D later on. The Bow creator node, which transforms a list of documents into a bag of words, is available in the Transformation node category as well. A very useful transformation node is the Strings to Documents node. This node requires an input data table consisting of several string columns. For each row a document is created, and each of the contained columns is used as a specific text field of the created document. It can be specified which columns contain the abstracts, the full texts or the titles of the documents to create. This node comes in handy when the textual data is available only in a format such as CSV.
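The row-to-document mapping performed by the Strings to Documents node can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the node's implementation: the `Document` class, the `strings_to_documents` function, its parameters and the CSV column names (`headline`, `summary`, `body`) are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical, simplified document with three text fields.
    title: str
    abstract: str
    full_text: str


def strings_to_documents(rows, title_col, abstract_col=None, full_text_col=None):
    """Create one Document per row, mapping chosen columns to text fields."""
    documents = []
    for row in rows:
        documents.append(Document(
            title=row.get(title_col, ""),
            # Fields without an assigned column stay empty.
            abstract=row.get(abstract_col, "") if abstract_col else "",
            full_text=row.get(full_text_col, "") if full_text_col else "",
        ))
    return documents


# Example input: textual data that is only available as CSV.
CSV_TEXT = (
    "headline,summary,body\n"
    "KNIME intro,Short overview,Long text here\n"
    "Text mining,Another summary,More text\n"
)
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
docs = strings_to_documents(rows, "headline", "summary", "body")
```

The column-to-field assignment is passed explicitly, mirroring the way the node lets the user specify which columns contain the titles, abstracts and full texts.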