Figure 1. You can find all date and time nodes in the Time Series category of KNIME Analytics Platform
For the KNIME Analytics Platform 3.4 release we did a full rewrite of our Date and Time support. This blog post will get you introduced to the new Date & Time integration and its features.
I will start with the new Date and Time column types, how they differ from the old Date and Time column type, and how they can be converted. Afterwards, I will talk about the features which were included into the new integration.
Before going into any details, I want to share my personal list of highlights with you. They are (in no particular order):
- Time zone support
- Multiple columns support
- Higher flexibility for measuring time differences
- Auto-guessing of string formats
- Flow variable support
The new Date & Time column types
Before this release we had a single column type that could be used for either date only, time only, or date and time. Now, we have a dedicated column type for each of these uses. In addition to these three column types, we now have a column type for date and time with a time zone. We thus have four new column types for representing date and time:
- Date (e.g. the first of November 2017)
- Time (e.g., 9:30 A.M.)
- Date &Time (e.g., the first of November 2017, 9:30 A.M.)
- Date & Time with zone (e.g., the first of November 2017, 9:30 A.M. in Europe/Berlin ).
The first three represent exactly what the old column types did; the fourth is adding new functionality.
Figure 2. The four new Date & Time column types
Highlight #1. Time Zone Support. The long awaited time zone support is now available. Especially useful for all dealing with databases.
Converting from one Date & Time column type to another
There are three nodes for converting from one date and time format to another, or for changing date, time, or time zone. The diagram below shows how they work.
For example, you can convert a Zoned Date Time into a Date & Time by removing the time zone with the Modify Time Zone node. Or you can remove the time from a Date & Time with the Modify Date node. All three nodes can be used to add, modify, or delete options.
Figure 3. The four new date and time column types can be converted to each other with the Modify nodes
Note. The legacy DateTime nodes still work. There is no need to replace them in your old workflows.
If you want to convert from a legacy Date and Time format to a new format, you can do so with the new “Legacy Date&Time to Date&Time” node. To convert in the other direction, you can use the new “Date&Time to legacy Date&Time” node.
Figure 4. The two new converter nodes
The “Legacy Date&Time to Date&Time” node will auto-guess the best matching date format based on the first selected cell. In the old Date and Time integration the time zone was always UTC. When converting from legacy Date&Time to new Date&Time, you can now specify the time zone.
Highlight #2. We have reached the second one of my highlights: auto-guessing. Everyone who has ever had to read date and time strings knows the problem of figuring out what the correct format is. To make this easier, we now have auto-guessing implemented in the String to Date&Time node.
Figure 5. The String to Date&Time node now supports converting multiple columns at once and automatically detects date format and type
Highlight #3. In the dialog of the String to Date&Time node above you see my next highlight: multiple column support. All Date & Time nodes now support multiple columns. Since most of the time you have only one timestamp per row, this usually doesn’t matter, but there are times when this feature saves you a node or two.
The Time Difference node also got a major rewrite. It still calculates the differences between two Date and Time columns, between a column and the execution time, or between two consecutive rows. In addition, you can now change how those differences are represented. You can:
- Choose a granular representation in terms of hours, days, weeks, or months by clicking on the “Granularity” option,
- or – and this is new – calculate duration in terms of hours and minutes.
The default visualization of the new date cell duration is “Short Duration”, but you can change it to “Long Duration” in the “Available Renderers” configuration/context menu.
Figure 6. Duration represents a timespan between two Date and Time cells. You can change the renderer to see a human-readable version of the duration cell
Highlights #4. Here is my 4th highlight: higher flexibility in measuring time differences. You can now also measure the differences with durations.
Extracting Time Windows
The durations are also very handy when you want to filter date and time cells. Suppose you want to have the last 5 months in your data. We have added a new node for filtering dates and times: the Date&Time-based Row Filter node. It can be configured with:
- a fixed start and end date
- the start date can also be set to the execution datetime
A duration from a start date using a duration of 5 months and current execution time as start date will automatically return the last 5 months of data. In Figure 7 you see how you need to configure the node for this task.
Figure 7. Filtering dates and times has become more flexible with the rewrite of the node. It is now possible to filter based on a duration or a number of timespans
Generating Fixed Time Intervals
An often-requested feature is the generation of fixed time intervals. Here durations are useful as well. The Create Date&Time Range is the new node for generating all kinds of date and time cells.
Figure 8. The Create Date&Time Range node has also been rewritten
Flow variable support
There is one of my highlights left, and some might claim I saved it for last in order to make you read the whole post.
Highlights #5. Yes, it is true: you are now able to use flow variables for date and time configurations in KNIME.
I want to illustrate this with a small use case of my own. I needed to filter the last full week of data from my data set. To do this, I had to know the date of the last Sunday before execution of my workflow. I used the Create Date&Time Range node to generate the last 7 days before executing the node. I then extracted the “day of the week” with the “Extract Date&Time Fields” node, selected Sunday with the Row Filter node, and added the time to the date cell with the Modify Time node.
Finally, I could convert the date into a flow variable. The flow variable was used in the Date&Time-based Row Filter node. In addition, I also used the duration, to get every timestamp before last Sunday, but at most one week before last Sunday.
You can find the workflow on our example server: 02_ETL_Data_Manipulation/06_Date_and_Time_Manipulation/04_Filter_TimeSeries_Data_Using_FlowVariables02_ETL_Data_Manipulation/06_Date_and_Time_Manipulation/04_Filter_TimeSeries_Data_Using_FlowVariables*
Figure 9. The workflow for filtering the last full week of data
These were my personal highlights from our new date and time integration. What about yours?
You can email your personal list of favorite date&time changes to firstname.lastname@example.org
Stay tuned to KNIME News to learn more about the upcoming KNIME Analytics Platform release!
* The link will open the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform must be installed with the Installer version 3.2.0 or higher)