Continental Nodes for KNIME — XLS Formatter Nodes

The XLS Formatter nodes make it possible to add formatting and advanced settings to .xlsx files.

For details regarding the individual nodes, please also refer to the node descriptions within KNIME. This documentation page is rather intended as an overview, not as a repetition or replacement of the node descriptions.

XLS Formatting Concept

Figure 1 shows the concept behind the XLS Formatting nodes based on an example. The logical flow of this explanation follows the numbers (1) to (11) in the graphic.

An input table (1) needs to be written to an .xlsx file first (2). Our XLS Formatting concept only adds formatting on top of existing files that contain all content. The XLS Formatter (apply) node fulfills this task and applies collected formatting instructions to a target file (3). The flow variable connection between (2) and (3) can serve the purpose to transmit file destinations, but it is particularly important to establish a precondition: The XLS Formatter (apply) node may only start execution when its predecessor node has successfully written the file.

The inport of XLS Formatter (apply) is an own port type, displayed as a green square. This XLS Formatter type carries formatting instructions (4). You can imagine them as a stack of paper, holding a different instruction each. These instructions are generated by diverse XLS Formatting nodes, such as the XLS Border Formatter (5) in this example. Its optional 2nd inport takes a previously defined XLS Formatter instruction set (6) and enriches it by an additional instruction regarding border formatting. Hence, this special port type is enriched node by node until it is ultimately passed to the XLS Formatter (apply) node to actually modify the target file.

XLS Formatting Concept

Fig. 1: XLS Formatting Concept

The node dialogs of most XLS Formatting nodes specify the kind of format (such as colors, styles, properties, etc.). The information which parts of the destination XLS sheet to apply these formats to, comes via so called XLS Control Tables ((7) and (11)). These have column names A, B, C, … and row IDs 1, 2, 3, … – just as the table would look when laying it on top of the XLS sheet. An XLS Control Table’s columns are all of type String and the cell contents consist of (potentially comma-separated lists of) tags. These tags don’t represent commands in itself but are freely chosen by the user. As mentioned, they are the basis for the node dialogs of many XLS Formatter nodes to know where to apply certain formatting instructions to. E.g. in Figure 2, you would see the tag header being selected, just like defined in its incoming XLS Control Table (7).

XLS Border Formatter Dialog

Fig. 2: XLS Border Formatter Dialog (note the entry of header in the tag field which restricts the below border settings to only those XLS Control Table cells containing this tag)

XLS Control Tables can be generated in two different ways. The easiest is via the XLS Control Table from Cell Range node (8). It takes XLS-like ranges such as A1:B1 and assigns defined tags to these (e.g. tag header which would yield the result table (7)). While being straight-forward to configure, this strategy lacks the flexibility to handle changes of the underlying data table that shall be written and formatted in a KNIME workflow. To address this particular strength of KNIME was one of the design principles behind our extension. Hence, the XLS Control Table Generator (9) provides this flexibility. For our input table (1), its standard mode is shown in Fig. 3, its unpivoting mode is explained in Fig. 4.

XLS Control Table Generator - standard mode

Fig. 3: In its default mode, the XLS Control Table Generator takes the input data table, converts all column types to String, and adjusts column names & and row IDs to XLS like syntax.

Especially the unpivoting mode is very useful: In combination with more standard KNIME nodes (esp. Rule Engine) (10), it allows dynamic and flexible tagging of different parts of the input table ((1) to (11)).

XLS Control Table Generator - unpivoting mode

Fig. 4: In its unpivoting mode (shown here with the additional columns option activated), the XLS Control Table Generator creates an output table row per input table cell.

Control Table Nodes

XLS Control Table Generator Node

This node generates an XLS Control Table based on a input table.

The option write column header to first row usually matches the corresponding setting in the XLS writing operation. If activated, the generated XLS Control table’s first row consists of the input table’s column header, just like the XLS file would typically contain this information in row 1. The original data table’s first row would be found in the 2nd result table’s row, etc.

If the unpivot option is not chosen, this node only converts all column types to String and assigns XLS like column names & row IDs (cf. Fig. 3 above).

Most often, the unpivot option would be activated, also with the additional columns option. In this case, one result table row is created for every input table cell. The extra columns state various cell addressing variants and the original table’s corresponding column name and row ID. The Column (comparable) column is always 0-padded to three characters in order to be comparable and lexicographically sortable for tables holding more than 26 columns. Note that otherwise, the natural order of cells A, B, …, Y, Z, AA, AB would be lexicographically resorted as A, AA, AB, B, …, Y, Z. Hence, it is a best practice to rather use the Column (comparable) than the Column, esp. when pivoting the long table back into a wide XLS Control Table. Note that all XLS Formatting nodes allow the 0-padded version of column names when expecting an input table to be a XLS Control Table.

XLS Control Table Merger Node

This node merges two XLS Control Table nodes into one. In its append mode, cells which have tags in both tables are preserved by combining them as a comma-separated list of tags. In its overwrite mode, the 2nd input table wins in case both tables hold tags for a specific cell.

XLS Control Table from Cell Range Node

This node creates a XLS Control Table from a defined XLS cell range and assigns a single specified tag to each of the cells in this range. Instead of a range (such as A1:B2), a single cell entry (such as A1) is also allowed.

Formatter Nodes

Within the XLS Formatting plugin, the node section Formatter Nodes provides those nodes that add formatting instructions to our XLS Formatter port type. The green optional inport offers the chance to enrich a prior format.

XLS Background Colorizer Node

This node adds static background colors to cell backgrounds. The color can be defined in the node dialog and be matched to a certain tag in the incoming XLS Control Table.

In case there are many different colors to assign (e.g. in a custom heatmap), you would need many of these nodes manually chained. To avoid that the node also offers a direct mode. In this case, it does not expect tags to be present in the incoming XLS Control Table, but direct color codes per cell (or the missing value, where no color is to be set in the corresponding XLS cell):

XLS Background Colorizer in direct mode

Fig. 5: Background Colorizer example (direct mode)

In this mode, color codes in red-green-blue (RGB) are expected in either hex syntax #RRGGBB or decimal syntax R/G/B.

Note the related node XLS Conditional Formatter node, which assigns color rules to the XLS file which the spreadsheet application itself later processes and assigns according to the content of a number cell. In contrast, the XLS Background Colorizer can also add color to an empty cell’s background.

XLS Border Formatter Node

This node assigns borders to cells according to a specific tag in the incoming XLS Control Table.

XLS Border Formatter

Fig. 6: Example of applying the XLS Border Formatter twice.

Figure 6 shows an example XLS Control Table on the left and the application of two XLS Border Formatter nodes on the right. The first node would be configured to add a thick outer border in all four dimensions for tag x. Note that in cell C3, the tag x is missing, hence the sourrounding cells have an outer border towards C3. The second node to reach this example would set a standard border for both inner dimensions, also for x. As shown in this example, a XLS Control Table can hold multiple tags (even in one cell via a comma-separated list) and can hence be re-used for multiple XLS Formatter nodes.

Regarding the applies to all tags options, see section Advanced Features below.

XLS Cell Formatter Node

This node provides cell text formatting options, such as * horizontal alignment (left, center, right, justify), * vertical alignment (top, middle, bottom), * text rotation angle, and * word wrap.

In the dialogs lower half, it also allows to convert cells that were written by KNIME as String cells to an XLS cell type other than String (i.e. numeric or boolean). The additional date/time options offered are a hybrid, as date/times are stored in cell type numeric in the XLS format specification, but a default text format is added in order to see a date, not the calendar days since Jan 1st, 1900. Note that with this feature, you can create columns of mixed types: Write the original, unformatted XLS file with String columns and later on convert the non-String cells back to their natural type via this node.

Besides the default date/time XLS text format, there is of course the option to specify custom ones, e.g. for currency, decimal places, etc. You can choose and learn from a dropdown menu to its right. The syntax is defined in the XLS standard, not by us. Hence you can configure your favorite text format in your spreadsheet application and copy the customized syntax String to the XLS Cell Formatter node from there.

Note that some conditional formatting like behavior can be achieved with the text format feature: The pre-defined text format #,##0.00;[Red](#,##0.00) will display positive values in black and negative values in red.

XLS Cell Merger Node

This node combines cells in a (rectangular) range into one. It must appear last in the chain of XLS Formatter nodes. In order to format the resulting big cell, set corresponding (and consistent) formats of the underlying cells before merging. The cell content is taken from the top-left most cell of the merge range. All other cells’ content is ignored, it is however a typical best practice to have the very same content in all cells of the merge range.

Unlike in the XLS Border Formatter and its example in Figure 6, merging cells can only occur on strictly rectangular tag areas. Thereby, two rectangular areas for a single tag may not share any cell border (but sharing a cell corner is allowed).

Regarding the applies to all tags options, see section Advanced Features below.

XLS Conditional Formatter Node

This node defines a color scale based on numeric threshold values and is applied to cells via tags. The node’s dialog offers to either define a 2-step or 3-step color scheme with corresponding numeric thresholds. All cells with a value below or equal the minimum threshold will take the corresponding minimum color (and vice versa with maximum). All values inbetween the two (or three) thresholds will take a gradual intermediate color between the defined ones in the RGB color model.

Note that this feature works only with corresponding content written to the target XLS file, which is out of scope of the XLS Formatting plugin. Note also that you can convert String-typed cells to numeric type via the XLS Cell Formatter node (see above), should they be written as a String column and not take the intended conditional color automatically.

While the tag range detection logic of the XLS Conditional Formatter implementation can handle jagged areas, it is advisable to rather define few rectangular ranges for conditional formatting. There is no need to avoid too big conditional formatting ranges, as on String or empty cells it would not have an effect anyways.

Note the related XLS Background Colorizer node, which assigns static colors to cell backgrounds.

XLS Font Formatter Node

This node allows to format the cell font in respect to * bold, italic, underline, * font size, and * font color.

XLS Row and Column Sizer Node

This node allows to change row heights and column widths. In order to change a specific row or column, place a corresponding tag in any of this row’s / column’s cells in the incoming XLS Control Table.

The numeric values attempt to follow the logic used in most spreadsheet applications, they are subject to some variation though. Especially column width is dependend on default fonts and can hence appear to vary from system to system. Column widths can be chosen to be auto-sized, meaning they match the longest content in its current formatting. Hence, it is advisable to chain this node after e.g. any XLS Font Formatter nodes.

Regarding the Control Table Style section and its size from control table option, see the section Advanced Features | Direct Values From Input Table Instead of Tags below.

XLS Sheet Properties Node

This node exposes sheet-level property settings:

The freeze sheet option finds the top-left-most tagged cell and freezes the view at its top-left cell corner. When scrolling the XLS document, all content above and to the left will always remain in view.

The auto-filter option expects exactly one rectangular range matching the provided tag and defines an auto-filter on top of it. It is up to the user to address only the header row or the entire data table within the XLS sheet.

The hide columns option hides all columns that have a tagged cell.

The hide rows option hides all rows that have a tagged cell.

XLS Formatter (apply) Node

The XLS Formatter (apply) node takes a set of instructions which has been gathered over potentially dozens of other XLS Formatting nodes. It then opens the unformatted input file, adds the corresponding formatting, and saves this as the provided output file.

Without this node, all other XLS Formatting nodes do not have any effect. It is always the last part of a XLS Formatting workflow that actually applies the desired formatting.

Note that pre-formatted input files are not supported and will most likely generate an error message. (Technically, this warning mechanism is tolerating a very small number of prior defined styles, in order to compensate for KNIME’s XLS writing strategy of date/time types.) The XLS Formatting plugin’s recommended usage follows the strategy depicted in Figure 1: first write the XLS file in KNIME, then apply formatting to it.

Advanced Features

The concept of using the XLS Formatter plugin is consistent until this point: XLS Control tables define via tags where to apply a certain, dialog-configured formatting to, which itself is chained until final application in the XLS Formatter (apply) node.

For typical application scenarios, this concept would require that some nodes are repeated very often. In order to avoid this overly manual duplication of logic, we added advanced features to some of our nodes.

‘All tags’ Option

Both XLS Border Formatter and XLS Cell Merger offer an ‘all tags’ option. It is a convenient way to apply a formatting to all tags present in the incoming XLS Control Table separately. Figure 7 shows an example, where the original data table contains location abbreviations in row 2, each occuring twice. The desired formatting is to merge each pair of location cells and to draw a thick border around it. Instead of either needing three nodes each or inventing complex logic to mark these cells with alternating own tags (we would need something like x, x, y, y, x, x in row 2 here), the all tags option re-applies the node’s instructions to every different tag found in the XLS Control table.

All tags option

Fig. 7: Example of ‘all tags’ option for alternating tags based on original table content.

This even works with the original table’s cell contents directly, and no tag syntax checks are executed, meaning special characters are allowed, but also comma-separated lists of tags are not supported. In Figure 7, of course row 3 would need to be deleted first before feeding this control table into the XLS Border Formatter and XLS Cell Merger nodes, as otherwise its content would qualify as tags as well.

Direct Values from Input Table Instead of Tags

The XLS Background Colorizer and XLS Row and Column Sizer nodes are often applied with a large number of different settings. To avoid manual repetition of dozens of these nodes, we offer more flexibility in the incoming XLS Control Tables here.

When the XLS Background Colorizer is configured in the direct color codes in RGB format mode, the incoming XLS Control Table does not contain tags, but color values directly (see node explanation above). Correspondingly, all dialog elements are deactivated as the color settings come via the input table, not via the user interface.

Similarly, the XLS Row and Column Sizer has the option size from control table, which deactivates the user interface. Here, the user cannot actively switch between both modes, but they are automatically detected based on the input table’s specification. In case it is a valid XLS Control Table specification (column names A, B, C, … and all String-typed columns), the default tag-based mode is active. In case the column names are the same, but all columns are of type Double, the direct values mode is activated, and the user only needs to choose the dimension to modify. In this case, row sizes are only allowed in column A and column widths in row 1.