The New Iris Data - Modular Data Generators

Overview

This feature contains multiple nodes for generating data. Although each node provides only very basic functionality, combining them makes it possible to create many kinds of complex, heterogeneous data. For example, you can create simple categorical and numerical values, combine tables randomly, enrich tables, and insert rules, among other things.

For more information, see our paper "The New Iris Data: Modular Data Generators".

Installation

To get access to the data generation tool, first download the current version of KNIME. Afterwards, use the update mechanism ("File -> Install KNIME extension...") to include the data generation nodes. They can be found in the KNIME Labs category.

Examples

Many examples already show how to plug these nodes together. They can all be accessed on the public KNIME workflow server. Starting with version 2.2, KNIME provides a new view called "Server Workflow Projects", which can be found on the right side of the KNIME workbench. After you click the connect button, a listing of the server repository is displayed. The folder "007_ModularDataGeneration" contains the data generation example flows, which can be downloaded by right-clicking the desired flow and choosing "Download" in the context menu.

A detailed description of how to download those example flows can be found here.

Complete node documentation

Contained Nodes

In Categorical:

  • Conditional Label Assigner: Assigns classes to the rows based on the given probabilities.
  • One Rule Inserter: Inserts one specific rule into the given data set.
  • Random Item Inserter: Assigns labels to the rows based on the given probabilities.
  • Random Label Assigner: Assigns labels to the rows based on the given probabilities.
  • Random Label Assigner (Data): Assigns labels to the rows based on the given probabilities.
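
As an illustration of what "assigning labels based on probabilities" means, the idea can be sketched outside KNIME in a few lines of Python. This is only a conceptual sketch and not the code of the KNIME nodes; the function name `assign_random_labels`, the label names, and the probabilities are assumptions chosen for the example.

```python
import random
from collections import Counter


def assign_random_labels(n_rows, label_probs, seed=None):
    """Draw one label per row according to the given probabilities.

    `label_probs` maps each label to its probability (hypothetical input
    format, chosen for this sketch).
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    labels = list(label_probs)
    weights = [label_probs[label] for label in labels]
    return rng.choices(labels, weights=weights, k=n_rows)


# Example: 1000 rows, three classes with unequal probabilities.
probs = {"setosa": 0.5, "versicolor": 0.3, "virginica": 0.2}
labels = assign_random_labels(1000, probs, seed=42)
counts = Counter(labels)
print(counts)
```

The observed class frequencies only approximate the configured probabilities; the approximation gets closer as the number of rows grows.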

In Numerical:

  • Beta Distributed Assigner: Assigns a value based on the class column. This value is beta distributed.
  • Gamma Distributed Assigner: Assigns a value based on the class column. This value is gamma distributed.
  • Gaussian Distributed Assigner: Assigns a value based on the class column. This value is Gaussian distributed as defined in the configuration by its mean and standard deviation.
  • Random Number Assigner: Assigns a value based on the class column. This value is uniformly distributed between a given minimum and maximum.
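
The numerical nodes share one pattern: the class of a row selects the distribution parameters, and a value is drawn from that distribution. A minimal Python sketch of the Gaussian case might look as follows. It is not the KNIME implementation; the function name `assign_gaussian` and the per-class `(mean, standard deviation)` table are assumptions made for illustration.

```python
import random


def assign_gaussian(class_column, params, seed=None):
    """Draw one Gaussian value per row, with parameters chosen by class.

    `params` maps a class label to a (mean, standard deviation) pair
    (hypothetical input format, chosen for this sketch).
    """
    rng = random.Random(seed)
    values = []
    for label in class_column:
        mean, std = params[label]
        values.append(rng.gauss(mean, std))
    return values


# Example: two classes whose values cluster around different means.
classes = ["a", "b", "a", "b", "a"]
params = {"a": (0.0, 1.0), "b": (10.0, 0.5)}
values = assign_gaussian(classes, params, seed=1)
for label, value in zip(classes, values):
    print(label, round(value, 3))
```

Replacing `rng.gauss(mean, std)` with `rng.uniform(low, high)`, `rng.betavariate(alpha, beta)` or `rng.gammavariate(alpha, beta)` gives the analogous sketch for the other distributions listed above.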

In Misc:

  • Empty Table Creator: Creates an empty table that contains only rows with RowKeys (no columns).
  • One Row to Many: Creates duplicates of the rows, based on an integer column.
  • Random Matcher: Assigns the information from the second table randomly to the rows of the first table.
  • Stresser: Adds stress (outliers) to the values.
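
To make two of these table operations concrete, here is a small Python sketch of row duplication and random matching on lists of dictionaries. It only mirrors the descriptions above and is not the code of the KNIME nodes. The function names, the column names, and the choice to draw rows from the second table with replacement are all assumptions made for this example.

```python
import random


def one_row_to_many(rows, count_key):
    """Repeat each row as often as its integer column `count_key` says."""
    result = []
    for row in rows:
        # A count of 0 drops the row; each copy is an independent dict.
        result.extend(dict(row) for _ in range(row[count_key]))
    return result


def random_match(first, second, seed=None):
    """Attach a randomly chosen row of `second` to every row of `first`.

    Assumption of this sketch: rows of `second` are drawn with replacement.
    """
    rng = random.Random(seed)
    return [{**row, **rng.choice(second)} for row in first]


rows = [{"id": 1, "n": 2}, {"id": 2, "n": 0}, {"id": 3, "n": 1}]
expanded = one_row_to_many(rows, "n")

colors = [{"color": "red"}, {"color": "blue"}]
matched = random_match(expanded, colors, seed=0)
for row in matched:
    print(row)
```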