Changes from v1.3.5 to v2.0

The most dramatic changes were made under the hood, to KNIME's core classes and the workflow engine, in order to address future requirements such as workflow loops, flow variables and specialized port types. The changes most apparent to the end user are briefly listed below:

New features (subset)

  • New port types (providing API for own port type definition)
  • New Database functionality (using dedicated database port)
  • PMML support (Predictive Model Markup Language; model exchange format for standard mining nodes such as k-means, regression, decision tree construction); consumption of PMML models created by 3rd party vendors is in beta stage
  • Improved meta node handling (using wizard); see meta node usage guide
  • New view for favorite, last and most frequently used nodes
  • Workflow loops (disabled unless KNIME is started with expert flag being set)
  • Workflow variables (also disabled by default)

New nodes

  • General Usage
    • Column Resorter
    • Nominal Value Row Filter
    • Regex Split
    • Set Operator
    • Rule Engine
    • Lift Chart
    • Cluster Assigner
    • Cell Splitter By Position
    • MDS (Multi-dimensional scaling, replacing old MDS Pivot)
    • MDS Projection
    • Create Collection Column (using special collection cells)
    • Split Collection Column
    • CDK Substructure Search
  • Database (using new database port API)
    • Database Connection Reader
    • Database Connection Writer
    • Database Query
    • Database Connector
    • Database Column Filter
    • Database Row Filter
    • Database Looping (BETA)
  • PMML Support (currently BETA)
    • PMML Reader (BETA)
    • PMML Writer (BETA)
    • R To PMML (Local) (BETA)
  • Loop and flow variable support (BETA, disabled by default unless "expert flag" is set)
    • Generic Loop Start
    • Counting Loop Start
    • Row To Variable Loop Start
    • Loop End
    • Variable Condition Loop End
    • Variable Based File Reader
    • Extract Variables (Data)
    • Extract Variables (Database)
    • Inject Variables (Data)
    • Inject Variables (Database)
    • TableRow To Variable
    • Variable To Column
    • Variable To TableRow
    • Variables Loop (Data)
    • Variables Loop (Database)
  • About 70 new WEKA nodes for Clustering, Classification, Regression, Bagging, Boosting, Feature Selection, etc.

Noteworthy (for developers)

  • Nodes with model ports will not work with 2.0 (nodes created with the wizard are not affected!); specifically, the following methods/constructors have been removed:
    • Constructor NodeModel#NodeModel(int, int, int, int)
    • Method NodeModel#saveModelContent(int, ModelContentWO) and NodeModel#loadModelContent(int, ModelContentRO)
  • Specialized DataCell implementations need revision of DataCellSerializer interface (simple change of method signature)
  • Row IDs are based on a plain String, not a DataCell anymore, i.e. the method RowKey#getId() is not supported anymore
  • Implementations of HiliteListener need to be fixed in order to comply with the new RowKey concept. The methods in the class HiliteHandler were modified accordingly.
  • The classes NodeFactory and NodeView are now parameterized using the NodeModel to avoid type casting and ease node development (existing node implementations will still function, though the compiler will print a warning).
  • Redesigned bit vector cells / fingerprints (see Appendix (A))
  • Re-org of NodeView (see Appendix (B))
  • Changed Node.dtd for factory xmls: modelIn/predParamsIn and -Outs are not supported any more, use generic inPort/outPort
  • PreferencePages are divided into two parts: one for general preferences, one for GUI preferences (see Appendix (C))
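
To illustrate the parameterization of NodeFactory and NodeView mentioned in the list above, here is a minimal sketch of how a type parameter bound to the model removes the need for casts. It does not use the KNIME classes: SketchNodeModel, SketchNodeView, SketchNodeFactory and the Histogram* classes are hypothetical stand-ins, and the method names are assumptions chosen for the example.

```java
/** Hypothetical stand-in for a node model base class. */
abstract class SketchNodeModel { }

/** Hypothetical stand-in for a view that is parameterized with its model type. */
abstract class SketchNodeView<T extends SketchNodeModel> {
    private final T nodeModel;

    protected SketchNodeView(T nodeModel) { this.nodeModel = nodeModel; }

    /** Returns the model with its concrete type, so subclasses need no cast. */
    protected T getNodeModel() { return nodeModel; }

    abstract String render();
}

/** Hypothetical stand-in for a factory that ties model and view types together. */
abstract class SketchNodeFactory<T extends SketchNodeModel> {
    abstract T createNodeModel();
    abstract SketchNodeView<T> createNodeView(T nodeModel);
}

final class HistogramModel extends SketchNodeModel {
    int binCount() { return 10; }
}

final class HistogramView extends SketchNodeView<HistogramModel> {
    HistogramView(HistogramModel model) { super(model); }

    @Override
    String render() {
        // getNodeModel() is already a HistogramModel; no (HistogramModel) cast needed.
        return "bins=" + getNodeModel().binCount();
    }
}

final class HistogramFactory extends SketchNodeFactory<HistogramModel> {
    @Override
    HistogramModel createNodeModel() { return new HistogramModel(); }

    @Override
    SketchNodeView<HistogramModel> createNodeView(HistogramModel model) {
        return new HistogramView(model);
    }
}
```

Without the type parameter, the view would receive a plain SketchNodeModel and would have to cast it before calling binCount(). Using the raw types still compiles, but the compiler reports an unchecked warning, which matches the behavior described for existing node implementations.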

Unresolved Problems

Appendix (A) - BitVector redesign

As a result of our discussions at the last workshop, we have re-implemented bit vectors. The old implementation is therefore deprecated, but it has not been removed, so that old workflows can still be loaded.

The new implementations are DenseBitVector and SparseBitVector. DenseBitVectorCellFactory and SparseBitVectorCellFactory create the corresponding DataCells that hold bit vectors.

The differences between dense and sparse vectors are:

  1. Dense:
    • MemUsage: length div 8 (one bit per vector bit, independent of bit value)
    • Get/Set: direct access (O(1))
    • nextSetBit: iterates over bits (O(n))
    • max length: (MAX_INT - 1) * 64 = 137438953344
    • max number of 1s: = max length
  2. Sparse:
    • MemUsage: 64bit per 1 (set vector bit)
    • Get/Set: binary search (O(log n))
    • nextSetBit: binary search (O(log n))
    • max length: MAX_LONG = 9223372036854775807
    • max number of 1s: MAX_INT = 2147483647
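
The trade-offs in the list above can be illustrated with a minimal Java sketch. It does not use the KNIME classes: DenseBitsSketch and SparseBitsSketch are hypothetical stand-ins, assuming a dense vector backed by an array of 64-bit words and a sparse vector backed by a sorted array of set-bit indices. The real implementations may differ in detail.

```java
import java.util.Arrays;

/** Hypothetical dense vector: one bit of storage per position, set or not. */
final class DenseBitsSketch {
    private final long[] words;
    private final long length;

    DenseBitsSketch(long length) {
        this.length = length;
        this.words = new long[(int) ((length + 63) / 64)];
    }

    /** O(1): the word and the bit inside it are computed directly from the index. */
    void set(long index) {
        checkIndex(index);
        words[(int) (index >>> 6)] |= 1L << (index & 63);
    }

    /** O(1) direct access. */
    boolean get(long index) {
        checkIndex(index);
        return (words[(int) (index >>> 6)] & (1L << (index & 63))) != 0;
    }

    /** Scans bit by bit from 'from'; O(n) in the worst case. Returns -1 if none. */
    long nextSetBit(long from) {
        for (long i = Math.max(from, 0); i < length; i++) {
            if (get(i)) {
                return i;
            }
        }
        return -1;
    }

    /** Storage is about length / 8 bytes, regardless of how many bits are set. */
    long storageBytes() { return words.length * 8L; }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }
}

/** Hypothetical sparse vector: stores only the sorted indices of the set bits. */
final class SparseBitsSketch {
    private long[] setIndices = new long[0];

    /** Finds the slot by binary search; this sketch then copies the array to insert. */
    void set(long index) {
        int pos = Arrays.binarySearch(setIndices, index);
        if (pos >= 0) {
            return; // bit is already set
        }
        int insertAt = -(pos + 1);
        long[] grown = new long[setIndices.length + 1];
        System.arraycopy(setIndices, 0, grown, 0, insertAt);
        grown[insertAt] = index;
        System.arraycopy(setIndices, insertAt, grown, insertAt + 1, setIndices.length - insertAt);
        setIndices = grown;
    }

    /** O(log n) in the number of set bits. */
    boolean get(long index) { return Arrays.binarySearch(setIndices, index) >= 0; }

    /** O(log n): binary search for the first stored index >= 'from'. Returns -1 if none. */
    long nextSetBit(long from) {
        int pos = Arrays.binarySearch(setIndices, from);
        if (pos < 0) {
            pos = -(pos + 1);
        }
        return pos < setIndices.length ? setIndices[pos] : -1;
    }

    /** Storage is 64 bits (8 bytes) per set bit. */
    long storageBytes() { return setIndices.length * 8L; }
}
```

For a vector of length 1000 with two bits set, the dense sketch occupies 128 bytes and the sparse sketch 16 bytes. The balance tips toward the dense form once more than roughly one bit in 64 is set.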

We have implemented what we thought people might need, and we hope to get feedback and add more functions as people request them. With the current implementation you must decide upfront what kind of vector you want (sparse or dense). You should also always be aware of what kind of vector you have at hand before you perform any operations on it (like AND or OR). This might not always be the case (or might not be easy), so we thought of adding a utility class that provides these operations and determines the result type automatically (this class is not in the current alpha release though).

If you are using our old BitVector stuff, there is (hopefully) not much pain involved in migrating to the new implementation. You just don't operate on Java BitSets anymore but on DenseBitVectors (which have a similar interface). And you use the factory to create the BitVectorCell after defining the bitvector.

In 2.0 our nodes that create BitVectors (from Hex/Binary/Index strings) create the new dense bitvector cells. If you have nodes that expect columns of type BitVectorValue they will not accept bitvectors from our (new 2.0) nodes until you have migrated them to our new BitVectorValue interface.

Appendix (B) - NodeView re-organization

The sequence in which the methods of a node view implementation are called has changed. Previously, the construction, registration (for change notifications), and updates were not consistent. The new sequence is as follows:

  • User opens a view:
    1. NodeView construction
    2. onOpen()
    3. register view with corresponding NodeModel (for update notification)
    4. updateModel
    5. relayout
    6. visible = true
  • User closes view:
    1. unregister view from NodeModel
    2. onClose()
    3. visible = false
    4. dispose
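
The two sequences above can be sketched as a small self-contained simulation. ModelSketch and ViewSketch are hypothetical stand-ins rather than the KNIME NodeModel and NodeView classes, and the log entries are labels chosen for this example. The sketch shows why no update can reach a view before its construction has finished: the view is registered with the model only after the constructor and onOpen() have returned.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a node model that notifies its registered views. */
final class ModelSketch {
    private final List<ViewSketch> views = new ArrayList<>();

    void register(ViewSketch view) { views.add(view); }
    void unregister(ViewSketch view) { views.remove(view); }

    /** Called when the model changes; only currently registered views are updated. */
    void fireModelChanged() {
        for (ViewSketch view : new ArrayList<>(views)) {
            view.updateModel();
        }
    }
}

/** Hypothetical stand-in for a node view; records every lifecycle call in order. */
final class ViewSketch {
    final List<String> log = new ArrayList<>();
    private final ModelSketch model;
    private boolean visible;

    ViewSketch(ModelSketch model) {
        this.model = model;
        log.add("construct");
    }

    void onOpen() { log.add("onOpen"); }
    void updateModel() { log.add("updateModel"); }
    void relayout() { log.add("relayout"); }
    void onClose() { log.add("onClose"); }
    void dispose() { log.add("dispose"); }
    boolean isVisible() { return visible; }

    /** Opening sequence: construct, onOpen, register, updateModel, relayout, show. */
    static ViewSketch open(ModelSketch model) {
        ViewSketch view = new ViewSketch(model);
        view.onOpen();
        model.register(view);
        view.log.add("register");
        view.updateModel();
        view.relayout();
        view.visible = true;
        view.log.add("visible=true");
        return view;
    }

    /** Closing sequence: unregister, onClose, hide, dispose. */
    void close() {
        model.unregister(this);
        log.add("unregister");
        onClose();
        visible = false;
        log.add("visible=false");
        dispose();
    }
}
```

Because close() unregisters the view first, a model change that arrives after closing no longer triggers updateModel() on the disposed view.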

The re-org should not require any code changes. It may solve some problems (as, for example, update is not called until NodeView construction is finished).

Appendix (C) - Preference Pages

The preferences are divided into two parts: general runtime properties that are also necessary for batch execution (headless preferences), and preferences that are only relevant if the KNIME GUI is started as well (GUI preferences).

The headless preference page has the ID: org.knime.workbench.ui.preferences

The PreferenceStore is defined under the org.knime.workbench.repository plugin, i.e. KNIMECorePlugin.getDefault().getPreferenceStore() provides access to it. The constants for the entries are defined by org.knime.workbench.preferences.HeadlessPreferencesConstants.

The GUI preference page has the ID: org.knime.workbench.ui.preferences.gui

The PreferenceStore is maintained by the org.knime.workbench.ui plugin, i.e. KNIMEUIPlugin.getDefault().getPreferenceStore() provides access to it with the constants defined in org.knime.workbench.ui.preferences.PreferenceConstants.

