Combining the power of KNIME and H2O.ai in a single integrated workflow

October 14, 2020, by Paul Treichler & Stephen Rauner

Expanding Partnership by adding KNIME H2O Driverless AI support

Today, we’d like to look at how customers of both H2O.ai and KNIME can benefit from a new integration that enables H2O Driverless AI to be used in KNIME. KNIME users can leverage Driverless AI in a workflow to provide automatic feature engineering, model validation, model tuning, model selection, machine learning interpretability, time-series, NLP, computer vision, and automatic pipeline generation for model scoring. H2O Driverless AI provides companies with a data science platform that addresses the needs of various use cases for every enterprise in every industry. 

We have just announced an expanded partnership and collaboration, which means that you can now use H2O Driverless AI in KNIME through a new KNIME Driverless AI extension available from the KNIME Hub. “The integration of Driverless AI offers KNIME users a strong, additional option to automate machine learning out of the box with a huge range of powerful algorithms. We believe that flexibility of choice brings most value to our users and customers, and H2O is a great addition to the mix,” said Michael Berthold, CEO and co-founder of KNIME.

The aim of this article is to give you more details about the integration: how to get started, how various personas can use it, where to find a sample workflow, and pointers to further resources.

Early Adopter Feedback

We have been working with a few early adopters to get their feedback. The response has been overwhelmingly positive, with real excitement about the integration and the productivity gains it promises. Vision Banco has been a long-term user of H2O.ai and KNIME, and its data science team is looking forward to simpler and faster development of data science projects. Below is a quote from Alejandro Lopez, Data Science Leader at Vision Banco, on how he thinks it will help them:

“We have been using KNIME and H2O Driverless AI for years, and we are very excited about this new integration and the automation and simplification that it will bring to our data science workflow.” Alejandro Lopez, Data Science Leader of Vision Banco

New to KNIME?

Learn more from the KNIME product page.

Fig. 1 Overview of KNIME Software

New to H2O Driverless AI?

Explore the product page or tutorials.

The KNIME H2O Driverless AI Extension

To use H2O Driverless AI within KNIME Analytics Platform, all you need to do is install the H2O Driverless AI extension, and you’re ready to go. If you do not know how to install a KNIME extension, check this video.

The integration offers an extensive set of nodes that encapsulate the functionality of the H2O Driverless AI automatic machine learning (AutoML) platform, making it easy to use its AutoML capabilities from a KNIME workflow without touching any code. Each H2O Driverless AI node looks and feels just like a normal KNIME node, but the workflow reaches out to the high-performance libraries of H2O during execution.

Fig. 2 The H2O Driverless AI nodes in KNIME

Use Cases By Persona

This new integration between H2O Driverless AI and KNIME helps various personas across the data science life cycle. Below is a short overview of the key personas and how the integration improves their workflow and productivity.

Data Engineers

For Data Engineers, this solution enables seamless data preprocessing connected to Driverless AI using the popular, easy-to-use, and free KNIME Analytics Platform. You can also use KNIME Server for additional deployment capabilities, automation, collaboration, cloud execution, and IT administration. With the new KNIME to H2O.ai connectors, customers can blend data from hundreds of data sources, including Salesforce, SharePoint, Oracle, SAP, SAP HANA, Snowflake, Spark, Databricks, Hadoop, TIBCO, Tableau, Power BI, AWS, Azure, and GCP.

Data Scientists

For data scientists and model operations teams, this solution provides additional flexibility by enabling a mix and match of automated and custom machine learning approaches. Data scientists can now collaborate with business stakeholders, gaining valuable input to achieve the optimal result. Once an initial model has been created, they can streamline it using Integrated Deployment from KNIME together with the Driverless AI AutoML and MOJO deployment artifacts. Having Driverless AI natively within a KNIME workflow gives data scientists an integrated, visual drag-and-drop way to create such a pipeline. They can now use the industry-leading AutoML in Driverless AI to train high-quality, explainable, production-ready models in less time.

Deployment Teams

For Deployment Teams, there is now additional flexibility in how and where models trained with H2O Driverless AI are automatically deployed as workflows: as visualizations, RESTful services, web applications, BI dashboards, or inputs to third-party tools, all with a no-code approach. Teams can now automatically and continuously deploy and update models, including the automated data access, preparation, and preprocessing steps of the workflow. This ensures that nothing is lost in translation between the creation and deployment of the model and that ideal compute resources are used for ongoing deployment.

Data Science Team Leaders

For Leaders of Data Science teams, this solution enables you to make the best use of your people, time, and technology resources in order to meet the needs of both the team and the enterprise. It provides an environment that empowers your data science team to use best-in-class AutoML alongside other best-in-class approaches and to collaborate on complex projects with the granular permissions and logging needed for team and project management. The deployment and management functionality makes it easy to productionize data science applications and services in a way that is transparent, secure, and able to be audited and governed as needed, and to deliver usable, reliable, and reproducible insights for the business.

Line of Business Leaders

This solution gives Line of Business Leaders insight into the entire process and data lineage, so that you can understand how and why decisions are made, from data access to deployment, and bring your domain expertise to bear in the process. This allows you to mitigate risks and ensure the best results are delivered quickly and at scale to drive the desired business outcome.

Four Steps to Getting Started

The four steps to get started with the KNIME Analytics Platform and H2O Driverless AI integration are:

  1. Get the tools
  2. Get the H2O Driverless AI KNIME Extension
  3. Configure KNIME to connect to H2O Driverless AI
  4. Start building your workflow

Below we will provide a quick overview of each step.

1. Get the tools

If you are interested in trying the Driverless AI integration with KNIME Server, please email partners@knime.com.

2. Get the H2O Driverless AI KNIME Extension

Download and install the Driverless AI KNIME Extension from the KNIME Hub by dragging and dropping the extension directly into your installation of KNIME Analytics Platform.

Fig. 3 Installing the H2O Driverless AI extension from the KNIME Hub.

3. Configure KNIME to connect to H2O Driverless AI

You are almost ready to start. Now you just need to enter the Driverless AI license key and configure KNIME to connect to H2O Driverless AI. Follow these instructions.

Fig. 4 Configuring KNIME to connect to H2O Driverless AI.

4. Start Building your workflow

Once you have successfully installed the Driverless AI Extension, restart KNIME Analytics Platform and you should see the following nodes in the node repository under KNIME Labs:

Fig. 5 The H2O Driverless AI nodes in the Node Repository.

Get an overview of how to start building your workflow below, and follow the KNIME H2O Driverless AI Integration User Guide.

Combining the Power of KNIME and H2O in a Single Workflow Example

In this section, we will walk through an example of the major steps of an end-to-end data science workflow using KNIME Analytics Platform and Driverless AI.

Step 1: Import the Driverless AI license

In order to use the H2O Driverless AI nodes, you will need to import an H2O Driverless AI license file into your KNIME preferences.

  • The Driverless AI license key is typically found at the following path:
    • /opt/h2oai/dai/home/.driverlessai/license.sig
  • Copy this file to the machine where your KNIME Analytics Platform is installed.
  • Import this file into KNIME by navigating to File -> Preferences -> KNIME -> H2O Driverless AI, as shown in Figure 6.
  • Now upload the .sig file provided by H2O.ai.
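
For readers who prefer to script the copy step, here is a minimal Python sketch. The source path is the typical location quoted above. The destination folder and the function name `stage_license` are hypothetical, chosen only for illustration, and the import into the KNIME preferences still happens through the dialog.

```python
import shutil
import tempfile
from pathlib import Path

# Typical license location on the Driverless AI host (quoted in the step above).
DEFAULT_LICENSE = Path("/opt/h2oai/dai/home/.driverlessai/license.sig")


def stage_license(dest_dir, source=DEFAULT_LICENSE):
    """Copy the license file into dest_dir and return the new path.

    dest_dir is a hypothetical staging folder on the machine running KNIME.
    """
    source = Path(source)
    if not source.is_file():
        raise FileNotFoundError(f"No license file at {source}")
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, dest_dir / source.name))


if __name__ == "__main__":
    # Self-contained demo: a placeholder file stands in for the real license.sig.
    with tempfile.TemporaryDirectory() as tmp:
        placeholder = Path(tmp) / "license.sig"
        placeholder.write_text("placeholder")
        print(stage_license(Path(tmp) / "knime", source=placeholder))
```

Failing early when the file is missing makes it easier to spot installations where the license lives somewhere other than the typical path.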

Fig. 6 Uploading the Driverless AI license to KNIME

Step 2: Importing Data

KNIME supports a wide array of data types. From flat files to dynamic Spark connections, KNIME makes it simple to read disparate data types and make them work together for use in machine learning algorithms. In the example below, joining a CSV file, two database tables, and a KNIME table is a simple drag-and-drop process.

Fig. 7 Joining a CSV file, two database tables and a KNIME table is a simple drag-and-drop process.
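
KNIME performs this blending visually. For readers who think in code, the sketch below shows roughly equivalent logic in plain Python, joining one CSV source with two database tables on a shared key. The table names, column names, sample values, and the in-memory SQLite database are all made up for illustration. This is not what KNIME runs internally, and the KNIME table source is left out because it has no standard-library counterpart.

```python
import csv
import io
import sqlite3

# Hypothetical CSV source: one row per customer.
CSV_TEXT = "customer_id,region\n1,North\n2,South\n3,East\n"


def blend_sources(csv_text):
    """Join CSV rows with two database tables on customer_id."""
    conn = sqlite3.connect(":memory:")

    # Two hypothetical database tables with sample rows.
    conn.execute("CREATE TABLE accounts (customer_id INTEGER, balance REAL)")
    conn.execute("CREATE TABLE churn (customer_id INTEGER, churned INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [(1, 250.0), (2, 80.5), (3, 0.0)]
    )
    conn.executemany("INSERT INTO churn VALUES (?, ?)", [(1, 0), (2, 1)])

    # Load the CSV into a third table so that one SQL join covers all sources.
    conn.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
    rows = [
        (int(r["customer_id"]), r["region"])
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)

    # Inner joins keep only customers that appear in every source.
    result = conn.execute(
        """
        SELECT c.customer_id, c.region, a.balance, ch.churned
        FROM customers c
        JOIN accounts a ON a.customer_id = c.customer_id
        JOIN churn ch ON ch.customer_id = c.customer_id
        ORDER BY c.customer_id
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in blend_sources(CSV_TEXT):
        print(row)
```

Customer 3 has no row in the `churn` table, so the inner join drops it. Choosing the join type is the same decision you make when configuring a join in a visual workflow.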

Step 3: Data Preparation

KNIME provides a rich set of data source connectors and data preparation nodes with a no-code drag-and-drop canvas to simplify data access and preparation. This empowers data analysts, data engineers, and data scientists to quickly build data preparation flows to prepare, wrangle, clean, join, and filter the data and get it ready for machine learning. Once the data is prepared, it can be connected to Driverless AI to build the machine learning models within the same drag-and-drop canvas.

Fig. 8 Data source connectors and data preparation nodes are connected via a no-code drag and drop canvas to simplify data access and preparation.
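
As a rough code analogue of the clean and filter steps that such preparation nodes perform, the sketch below drops rows with missing values, keeps rows that satisfy a condition, and returns the result as CSV text ready to hand to a modeling step. The column names, sample data, and filter rule are invented for illustration.

```python
import csv
import io

# Hypothetical raw input with two incomplete rows and one under-age customer.
RAW = "customer_id,age,balance\n1,34,250.0\n2,,80.5\n3,51,\n4,17,40.0\n"


def prepare(raw_csv, min_age=18):
    """Drop rows with empty fields, keep rows with age >= min_age, return CSV text."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if any(value == "" for value in row.values()):
            continue  # cleaning: skip incomplete records
        if int(row["age"]) < min_age:
            continue  # filtering: hypothetical business rule
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    print(prepare(RAW), end="")
```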

Step 4: Building Models with Driverless AI

To send KNIME data tables to Driverless AI, connect your workflow to the “Send to Driverless AI” node, then right-click the node and select Configure… from the context menu.

Fig. 9 Example workflow to push data from KNIME Analytics Platform to H2O Driverless AI

Before you push the data to Driverless AI you need to configure the connection.

Fig. 10 Configuring the connection to H2O Driverless AI

After you send the data to Driverless AI, right-click the “Send to Driverless AI” node and select “Interactive View: H2O Driverless AI Experiment View” to bring up the Driverless AI user interface. Use this interface to build an experiment, view the AutoReport, and generate Machine Learning Interpretability (MLI) metrics and graphs.

Fig. 11 Opening the interactive view to bring up the Driverless AI user interface, where you can build an experiment, view the AutoReport, and generate Machine Learning Interpretability (MLI) metrics and graphs.

Below is what the Driverless AI UI looks like within KNIME:

Fig. 12 H2O Driverless AI user interface in KNIME

Step 5: Deploy Model and Score New Data

KNIME can build machine learning production workflows that consume the trained models. H2O.ai provides production-ready, low-latency models and pipelines in the MOJO deployment artifact. A MOJO (Model Object, Optimized) is a standalone, low-latency model object designed to be easily embedded in production environments. To score data within a KNIME workflow, drag and drop an H2O Driverless AI MOJO Predictor node into it.

Conclusion

The expanded integration between H2O.ai and KNIME brings together all-encompassing, intuitive, automated machine learning from H2O.ai with the guided analytics from KNIME. Customers of H2O.ai and KNIME can now:

  • Develop an integrated data science workflow in KNIME Analytics Platform and KNIME Server, from data discovery and data preparation to production-ready predictive models
  • Deliver the power of automatic machine learning to business analysts, enabling more citizen data scientists with H2O Driverless AI
  • Reduce model deployment times, leveraging H2O Driverless AI and KNIME Server for reliably managing workflow, the model creation process, and production deployment

Additional Resources

Blog Articles

KNIME H2O.ai Extensions

KNIME Example Workflow

Community

Docs

Partner Pages
