Missing Value

This node helps handle missing values found in cells of the input table. The first tab in the dialog (labeled "Default") provides default handling options for all columns of a given type. These settings apply to all columns in the input table that are not explicitly mentioned in the second tab, labeled "Individual". This second tab permits individual settings for each available column (thus overriding the default). To make use of this second approach, select a column or a list of columns that need extra handling, click "Add", and set the parameters. Clicking the label with the column name(s) selects all covered columns in the column list. To remove this extra handling (and use the default handling instead), click the "Remove" button for this column.
Options marked with an asterisk (*) will result in non-standard PMML. If you select such an option, the warning label in the dialog will become red and a warning will be shown during execution of the node. Non-standard PMML uses extensions that cannot be read by tools other than KNIME.

Dialog Options

Missing Value Handler Selection
Select and configure the missing value handler to be used for data types or columns. Handlers that do not produce valid PMML 4.2 are marked with an asterisk (*).

Mean
Calculates the mean value of all non-missing cells in a column and replaces the missing values with this mean. This missing value handler produces valid PMML 4.2.

Moving Average*
Calculates the mean of all values that are within the window given by the lookahead and lookbehind and replaces missing values with this mean. The number of cells to take into account before and after the current cell can be set using the options lookbehind and lookahead, respectively. This missing value handler does not produce standard PMML 4.2!
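The windowing idea can be illustrated with a minimal Python sketch. This is not KNIME's implementation: the function name moving_average_fill is hypothetical, missing cells are modeled as None, and the sketch assumes that only non-missing original values inside the window contribute to the mean and that a cell stays missing when its window contains no values.

```python
from typing import List, Optional


def moving_average_fill(
    values: List[Optional[float]], lookbehind: int, lookahead: int
) -> List[Optional[float]]:
    """Replace each None with the mean of the non-missing values in its window."""
    result = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        # Window covers `lookbehind` cells before and `lookahead` cells after row i.
        start = max(0, i - lookbehind)
        end = min(len(values), i + lookahead + 1)
        window = [x for x in values[start:end] if x is not None]
        if window:  # assumption: leave the cell missing if the window is empty
            result[i] = sum(window) / len(window)
    return result


print(moving_average_fill([1.0, None, 3.0, None, None, 9.0], 1, 1))
```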

Fix Value (Double)
Replaces missing values with a double given by the user. This missing value handler produces valid PMML 4.2.

Maximum
Finds the column's largest value and replaces all missing values with it. This missing value handler produces valid PMML 4.2.

Rounded Mean
Calculates the mean value of all non-missing cells in a column, rounds it to an integer, and replaces the missing values with this rounded mean. This missing value handler produces valid PMML 4.2.
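As an illustration of how a rounded mean differs from the plain mean, here is a minimal Python sketch. It is not KNIME's implementation: the function name rounded_mean_fill is hypothetical, missing cells are modeled as None, and rounding half up is an assumption about the rounding rule.

```python
import math
from typing import List, Optional


def rounded_mean_fill(values: List[Optional[int]]) -> List[Optional[int]]:
    """Replace each None with the column mean rounded to an integer."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to compute a mean from
    mean = sum(present) / len(present)
    fill = math.floor(mean + 0.5)  # assumption: round half up
    return [fill if v is None else v for v in values]


print(rounded_mean_fill([1, 2, None, 4]))
```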

Fix Value (Integer)
Replaces missing values with an integer number given by the user. This missing value handler produces valid PMML 4.2.

Minimum
Finds the column's smallest value and replaces all missing values with it. This missing value handler produces valid PMML 4.2.

Most Frequent Value
Calculates the most frequent value in a column and replaces the missing values with it. This missing value handler produces valid PMML 4.2.

Previous*
This missing value handler replaces missing values with the last encountered non-missing value in the column it is configured for. When dealing with tables that have a large number of rows but not too many columns that need missing value replacement, the option to use disk-backed statistics avoids flooding the main memory. This option should be used with caution, as it is generally much slower than in-memory statistics. This missing value handler does not produce standard PMML 4.2!
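The replacement rule amounts to a forward fill, which can be sketched in Python as follows. This is an illustration only, not KNIME's code: the function name previous_fill is hypothetical, missing cells are modeled as None, and leaving leading missing cells untouched (because no previous value exists) is an assumption.

```python
from typing import List, Optional


def previous_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each None with the last non-missing value seen above it."""
    result = []
    last = None  # no previous value known yet
    for value in values:
        if value is None:
            result.append(last)  # stays None until a value has been seen
        else:
            last = value
            result.append(value)
    return result


print(previous_fill([None, 1.0, None, None, 5.0]))
```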

Remove Row*
This missing value handler removes rows that have a missing value in the column it is configured for. This missing value handler does not produce standard PMML 4.2!

Median
Finds the column's median value and replaces all missing values with it. For large tables this might be computationally expensive because the table needs to be sorted to find the median. This missing value handler produces valid PMML 4.2.

Linear Interpolation*
This missing value handler replaces missing values with the linear interpolation between the last encountered and next non-missing value. The column 1 2 ? ? 5 6, for example, would be interpolated to 1 2 3 4 5 6. This missing value handler does not produce standard PMML 4.2!
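The example column above can be reproduced with a minimal Python sketch. This is not KNIME's implementation: the function name linear_interpolate is hypothetical, missing cells are modeled as None, and leaving leading or trailing missing cells untouched (where one of the two neighbors does not exist) is an assumption.

```python
from typing import List, Optional


def linear_interpolate(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill gaps by interpolating linearly between the surrounding known values."""
    result = list(values)
    prev_idx = None  # index of the last non-missing value seen
    for i, value in enumerate(values):
        if value is None:
            continue
        if prev_idx is not None and i - prev_idx > 1:
            # Equal step per row between the two known neighbors.
            step = (value - values[prev_idx]) / (i - prev_idx)
            for j in range(prev_idx + 1, i):
                result[j] = values[prev_idx] + step * (j - prev_idx)
        prev_idx = i
    return result


print(linear_interpolate([1, 2, None, None, 5, 6]))
```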

Linear Interpolation*
This missing value handler replaces missing values with the linear interpolation between the previous and next encountered non-missing value in the column it is configured for. When dealing with tables that have a large number of rows but not too many columns that need missing value replacement, the option to use disk-backed statistics avoids flooding the main memory. This option should be used with caution, as it is generally much slower than in-memory statistics. This missing value handler does not produce standard PMML 4.2!

Fix Value (String)
Replaces missing values with a string given by the user. This missing value handler produces valid PMML 4.2.

Average Interpolation*
This missing value handler replaces missing values with the average value of the previous and next encountered non-missing value in the column it is configured for. When dealing with tables that have a large number of rows but not too many columns that need missing value replacement, the option to use disk-backed statistics avoids flooding the main memory. This option should be used with caution, as it is generally much slower than in-memory statistics. This missing value handler does not produce standard PMML 4.2!
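Unlike linear interpolation, every cell in a gap receives the same value here. A minimal Python sketch of that rule follows. It is an illustration, not KNIME's code: the function name average_interpolate is hypothetical, missing cells are modeled as None, and leaving leading or trailing missing cells untouched is an assumption.

```python
from typing import List, Optional


def average_interpolate(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each gap with the average of the known values on either side of it."""
    result = list(values)
    prev_idx = None  # index of the last non-missing value seen
    for i, value in enumerate(values):
        if value is None:
            continue
        if prev_idx is not None and i - prev_idx > 1:
            average = (values[prev_idx] + value) / 2
            for j in range(prev_idx + 1, i):
                result[j] = average  # same value for the whole gap
        prev_idx = i
    return result


print(average_interpolate([1, None, None, 5]))
```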

Fix Value
No description provided.

Ports

Input Ports
0 Table with missing values
Output Ports
0 Table with replaced missing values
1 Table with PMML documenting the missing value replacement
This node is contained in KNIME Core provided by KNIME GmbH, Konstanz, Germany.