On the previous page, we introduced KNIME's built in reporting functionality using BIRT. We also introduced the Austin Restaurants dataset, from which we can produce tables and charts of interest. On this page, we'll present some advanced BIRT concepts, which you can use to generate more sophisticated visualizations of your data. These include:
- BIRT breadcrumbs, for easily selecting elements in a report
- A method for looping and grouping using KNIME workflow functionality
- A method for grouping using BIRT's built in capabilities
- Creating a native BIRT chart, and including dynamic text in BIRT using KNIME flow variables
Advanced Report Workflow
The workflow for this exercise is presented below. In it, we read in data and filter it by ZIP code, using global flow variables. We also perform a quick aggregation using a GroupBy node to obtain the most recent inspection date for each facility, and join that information back with the score data. The processed data is used to perform three tasks:
- In the green box, we present grouping Method 1. This method uses a KNIME Group Loop to produce, for each ZIP code, a list of the top 10 scoring locations, along with a histogram of scores for all locations. It then passes two separate tables to BIRT - one containing the top restaurants, and the other containing the histograms.
- In the blue box, we present grouping Method 2. Here we produce a table of locations with the highest and lowest restaurant scores in each ZIP code, and pass that table directly to BIRT.
- In the pink box, we present a method to create a box-and-whisker plot using BIRT's built in chart capabilities. For each ZIP code, the workflow calculates the minimum, maximum, and selected quartiles of the restaurant scores and passes that information to BIRT via a table.
Each of these methods is described in further detail in the sections that follow. The workflow and associated dataset is available at EXAMPLES/05_Reporting/01_BIRT/08_BIRT_Example_AdvancedEXAMPLES/05_Reporting/01_BIRT/08_BIRT_Example_Advanced*. You should download the workflow from the EXAMPLES server and execute it in full before continuing with the report creation steps below.
Sometimes, particularly when editing complex BIRT report layouts, it can be difficult to select the exact feature you need just by clicking. For instance, you might need to select an entire table, but instead you find yourself activating a grid cell or table row when you click. An easy way to avoid this problem is by using the breadcrumbs feature in BIRT.
You can turn breadcrumbs on or off by clicking the breadcrumbs button next to the zoom percentage dialog, as shown below. When you do, the breadcrumbs bar will appear at the top of the editor. Now, you can click directly on the breadcrumbs bar itself to drill down to the level of detail you need to select in the editor. The more complicated your reports are, the more often you will find yourself using breadcrumbs!
Grouping Method 1 - Loop Nodes
Since it's the most straightforward approach, let's start with Method 1 first in our report. After executing the workflow as described above, go ahead and switch over to the BIRT environment by clicking on the BIRT icon. To begin, let's create a grid of 2 columns by 2 rows at the top of the layout page. Set the left cells to be about one-third of the page width, and the right cells about two-thirds. Next, from the Data set view pane on the left-hand side of the screen, drag the Top 10 table into the top left grid cell. From the dialog that appears, keep the ZIP Code, Restaurant Name, and Score fields. Go ahead and set the header fields to be left-aligned and bold, and widen the Restaurant Name field within the table so that it can be read more easily.
Now that you've done some simple layout work, let's involve grouping. Select the table you've created - you can do this either by mousing over the table and clicking on the tooltip that appears, or by using the breadcrumbs as described above.
This will allow selection of the Groups tab in the Property Editor at the bottom of the screen. Click Add... to bring up the New Group dialog.
In the New Group dialog, under Group On, select Zip Code. In addition, under the Page Break section, change the After dropdown to Always. This will ensure that the Top 10 table you imported into BIRT will be displayed with a different Zip Code on each page. After clicking OK, our table will be displayed with a subgrouped header for ZIP code in the layout view - this header can be deleted.
Now that you've added the grouped Top 10 table, let's bring in the histogram images generated in KNIME. Similar to what was done above, drag the Hists by Zip table to the top right grid cell, keeping only the Zip Code field. Select the table and add a group on Zip Code, setting the Page Break After to be Always. Go ahead and remove the subgroup header for Zip Code. In fact, this time remove the Zip Code itself and all headers, leaving a blank table structure.
Into this blank grouped table, drag in the Image icon from the Report Items panel. The Edit Image Item dialog will appear. In this dialog, choose Dynamic Image, and click the Select Image Data button.
This will open a separate Select Data Binding dialog. Here, choose First(Image), then click OK. Click Insert on the previous dialog to return to the layout editor. This produces a red X in the grouped table of the layout editor, but that's all right - the image should display properly in our final report. At this point, the layout editor should look something like this:
This is all the setup that is needed for the Grouping Method 1 report. Before we continue, let's preview the report in the web browser to make sure everything is configured appropriately. Do this by clicking the View Report button in the KNIME toolbar. Note that when you do, you see a dialog in the web browser that looks like this:
This dialog is asking to confirm the lower and upper bounds for the ZIP codes that were defined, using global workflow variables, in the original KNIME workflow. If you wanted to produce a report based on a different range of ZIP codes, you could change that here. As it is, continue with the default values by clicking OK. The first page of the report should look something like this:
This page presents the top 10 scores for ZIP code 78750, along with a histogram of all scores for that ZIP. If you want to see information for the other ZIP codes, you can navigate the report using the arrows in the top right of the browser window.
Making a Box Plot Using the Chart Editor
Now let's make a simple box plot (also known as a box-and-whisker plot) based on our data. Box plots are used to identify, at a glance, how data is distributed. The box displays the Interquartile Range (IQR), which is the range between the 25th and 75th percentiles of the data, while the whiskers represent the range of the minimum and maximum values. (Note that are other ways that whiskers can be used to represent outlying data points; see https://en.wikipedia.org/wiki/Box_plot for more information on this topic.)
To build such a plot, we first calculate for each ZIP code the 25th and 75th percentiles of restaurant scores, along with maximum and minimum scores, using a GroupBy node. We then pass this data to BIRT to build the plot using BIRT's built-in Chart Editor.
To begin, drag in a Chart icon from the Report Items panel into the blank area below the 2x2 grid. This will open the Chart Editor GUI. On the initial Select Chart Type page, you need only select the Stock chart type from the bottom left panel - you'll use defaults for all of the other options. Click Next.
On the subsequent Select Data page, you'll define the data you wish to use, and how the variables should be represented in the box plot. First, under the Select Data section, click the Use Data from radio button and select SimpleBoxPlot from the dropdown. For the Category (X) Series, choose Zip Code. For the Value (Y) Series, you'll notice that four inputs are available: Open, High, Low, and Close. For each of the inputs, select the 75th quantile, Maximum, Minimum, and 25th quantiles, respectively. The Data Preview panel will now show color highlights to reflect your choices; if you click the Show data preview check box, you can review the actual calculated values from the SimpleBoxPlot dataset. Click Next.
On the final Format Chart page of the Chart Editor, you will see several options for changing the way the chart will be displayed in the BIRT report. Let's first provide a name for the horizontal axis by clicking on X-axis in the left-side pane, clicking the Visible checkbox to toggle display of the axis, and set the name to Zip Code.
You change the name of the chart by making a similar edit on the Title options - let's set it to Score Distributions by ZIP Code. Since you only have a single series to plot here, let's get rid of the legend by unchecking the Visible checkbox on the Legend pane as well. Let's also change the background color to a light green by using the Background dropdown on the Chart Area panel. Click Finish.
You should now see a dummy version of your chart displayed in the layout editor - the data displayed won't be correct, but the formatting options roughly should be. Drag the bottom corner of the chart to the bottom right to make it larger, and add a page break (After: Always) via the Property Editor as you have done before. At this point, the layout editor should look something like this:
Let's now take a look at our report in the web viewer. Click on the double arrows to advance to the final page in the report, and you should see something similar to this:
Here we see the final box plot as generated by BIRT. At a glance, you can see that although the extreme minimum values vary somewhat by ZIP code, overall the middle portion of the score distribution does not vary much.
Grouping Method 2 - Nested BIRT Tables, plus Dynamic Text
As a final exercise, let's create a table featuring the highest and lowest scoring locations in a given ZIP code, along with a simple bar chart that displays the relative scores. This time, though, we'll approaching grouping with a bit of a trick - by nesting a table inside another table, and using some creative filtering.
To begin, drag the Extremes by Zip dataset from the Data set view panel. When the Data Set Binding dialog appears, select only the ZIP Code column. Clear both the header and the cell content, leaving an empty table. Now for the tricky part: drag the same dataset into the empty table you just created. This time, in the Data Set Binding dialog, let's include all fields except rowID, Facility ID, Address, and Process Description (both First and Last). For the inner table you just created, add a filter by selecting the appropriate tab in the Property Editor and clicking Add....
Note. Use breadcrumbs to make sure you've selected the inner table, and not the outer table.
This brings up the New Filter Condition dialog. Here you should select ZIP Code from the dropdown on the left-hand side, and Equal to as a condition. Next you'll create an expression for the right-hand side of the dialog - you can do this either by clicking the ƒx button, or by selecting Build Expression... from the dropdown menu.
When you do, a separate Expression Builder dialog appears. Here you want to build a string that represents the grouping of the outer table. From the Category heading, choose Available Column Bindings, and from the Sub-Category, choose the bottom Table. Then, double-click on Zip Code to create the string in the Expression field. If you like, you can also just type this string manually into the Expression Builder, but you must be careful to get the syntax precisely correct.
Click OK in both dialogs to navigate back out. You've now created a filter for the data to be displayed in the inner table, that is actually based on the outer table.This has the practical effect of displaying the data for each individual ZIP code.
Let's do a bit of clean up on the layout to make things more legible by renaming the column headers. Also, go ahead and add a page break (After: Always) on the inner table by using the appropriate options in the Property Editor, as you've done before. Next, let's prepare to make space for our bar chart. Select the inner table, and right-click on the yellow data row icon, choosing Insert → Row → Below.
Shift-click to select all of the cells in the newly created row, and choose Merge Cells. You should now have a single blank cell, spanning the width of the table, into which you can drag a Chart icon from the Report Items panel. This will be a simple bar chart that displays the minimum and maximum scores for each ZIP Code, so navigate to to the Select Data tab in the New Chart dialog. The Select Data radio button will already be set to Inherit Columns and Groups, which is what you need. For the X series, choose ZIP Code. For Y Series 1, choose the Max score. Then, add a new series using the dropdown, and for Y Series 2, choose the Min Score.
Under the Format Chart tab, turn off the legend and title using methods shown above - you'll provide our own title dynamically in a moment. Let's fix the Y-axis scale to values between 0 and 100. Otherwise, the scale of the plot may change from page to page, which would be confusing to readers. Do this by clicking on the Scale button associated with the Y-axis, and defining the scale in the dialog that appears.
Click Finish to close the New Chart dialog and place the chart in the Layout Editor. Click-drag the corner of the chart to enlarge it, filling about two-thirds of the empty cell.
Finally, let's add a title for our report based on BIRT's dynamic text feature. This will allow you to change the text in the title based on the ZIP Codes defined in the global workflow variables used to build the report. To do this, drag a Dynamic Text icon from the Report Items panel below the chart you just made. This opens an Expression Builder dialog. Here you will use BIRT's concatenate function to combine some static text with the values of the workflow variables. Notice that the static text is contained within quotes, while the workflow variables are referenced as parameters within square brackets. Enter the text shown in the figure below into the Expression builder, and click OK.
Now that the layout editing is finished, you should have a layout editor that looks something like this:
If you preview the results of the report in your web browser and skip to the last page, this is the result:
Here you can see the table of minimum and maximum scores created, along with the associated bar chart, and the dynamically generated text. Results for each ZIP Code are presented on different pages.
Wrapup and a Final Caveat
At this point, you've been able to prepare some grouped reports a few different ways, using charts generated in both KNIME and BIRT's built-in Chart Editor. But there is an additional thing to be aware of before you go - please note that some exports formats from BIRT work better than others. We've primarily used the Web Viewer to preview the results of reports in the exercises above, but PDF and PPTX formats work well too. BIRT provides several other options to generate reports, but some of these may produce unexpected results. On top of that, a report that looks fine in one format may look very different in another, so always double check your exported reports to make sure you're getting the results you want.
* The link will open the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform must be installed with the Installer version 3.2.0 or higher)