KNIME and AWS Machine Learning Service Integration

Thu, 01/09/2020 - 10:00 jfalgout

Author: Jim Falgout (KNIME)

Organizations are using cloud services more and more to attain top levels of scalability, security, and performance. In recent years, the Amazon Web Services (AWS) cloud platform has released several services that support machine learning (ML) and artificial intelligence (AI) capabilities to enable developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale.

In our latest release of KNIME Analytics Platform, we added more functionality to our KNIME Amazon Machine Learning Integration. KNIME offers a quick and powerful way to get started with AWS services, and the integration now enables greater interaction between the various AWS services.

Today we want to focus on integrating AWS ML services with KNIME Analytics Platform to provide a simple, out-of-the-box way to get started with AWS. The specific services we want to discuss in this article are:

  • Comprehend - ML-based text processing
  • Translate - translating free text from a source to a target language
  • Personalize - a recommendation service using Amazon technology

The Comprehend, Translate, and Personalize Services already have an integration within KNIME. We’ll concentrate on those first and then show how to use the Python integration in KNIME to invoke other AWS ML Services such as Amazon Rekognition.

Using AWS Comprehend and Translate

KNIME Analytics Platform offers powerful text analysis capabilities. Integrating those capabilities with the AWS Comprehend and Translate services allows you to use them together quickly.

For example, KNIME supports the Comprehend Syntax and Comprehend Entities functions to tag words in text. Text tagging is a common way to prepare text for further analysis, using part-of-speech tags, entity tags, and other tag types for natural language processing.
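Outside of the dedicated KNIME nodes, the same kind of tagging can be sketched directly against the AWS SDK for Python (Boto3). The snippet below is a minimal illustration, not the implementation used by the KNIME nodes. The helper names `tag_text` and `make_comprehend_client` and the default region are assumptions made for this example; `detect_syntax` and `detect_entities` are the Comprehend client calls.

```python
def make_comprehend_client(region_name="us-east-1"):
    """Create a Comprehend client (the region is an assumption for this sketch)."""
    import boto3  # imported here so tag_text can be tried without AWS access
    return boto3.client("comprehend", region_name=region_name)


def tag_text(text, comprehend, language_code="en"):
    """Return part-of-speech tags and entity tags for one piece of text."""
    syntax = comprehend.detect_syntax(Text=text, LanguageCode=language_code)
    entities = comprehend.detect_entities(Text=text, LanguageCode=language_code)

    # Each syntax token carries its text and a part-of-speech tag.
    pos_tags = [
        (token["Text"], token["PartOfSpeech"]["Tag"])
        for token in syntax["SyntaxTokens"]
    ]
    # Each entity carries its text and a type such as LOCATION or PERSON.
    entity_tags = [(entity["Text"], entity["Type"]) for entity in entities["Entities"]]
    return pos_tags, entity_tags
```

With valid AWS credentials, something like `tag_text("Flights to Paris are delayed.", make_comprehend_client())` would return the two tag lists for that sentence.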

Text analysis of RSS feeds relating to travel alerts

This type of analysis could be used, for example, to put together a travel risk map for corporate safety. Due to the increasing internationalization of business, more and more employees are becoming globally mobile. Providing reliable information and protecting workers abroad is the employer’s duty. This means assessing risks and implementing risk management.

You can download and try out our example workflow, the Travel Risk Map using AWS Comprehend and Translate, from the KNIME Hub.

The following screenshot shows a fragment of our Travel Risk Map workflow. The workflow integrates multiple AWS ML Services. It captures RSS feeds relating to travel alerts. The alerts may be in different languages depending on the source. Alerts for the selected country are gathered and translated using Amazon Translate. Amazon Comprehend is then used to tag entities and discover key phrases within the alert text. Amazon Comprehend is also used to determine the sentiment of each alert.

The final result of the workflow is a word cloud with key phrases and entities from the alerts. Selecting a word in the word cloud provides the list of alerts for the selected word or phrase with its sentiment.

KNIME on AWS

Fig. 1. Example workflow showing how to preprocess data for AWS ML Services and post-process the results.
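For readers who prefer to see the same steps as code, the translate, tag, and score sequence can be approximated with Boto3. This is only a sketch under stated assumptions, not the logic of the KNIME workflow itself. The function names (`analyze_alert`, `term_frequencies`, `alerts_for_term`) are hypothetical, and the source language is left to Translate's automatic detection.

```python
from collections import Counter


def analyze_alert(text, translate, comprehend, target_language="en"):
    """Translate one alert, then collect its key phrases, entities, and sentiment."""
    translated = translate.translate_text(
        Text=text,
        SourceLanguageCode="auto",  # let Translate detect the source language
        TargetLanguageCode=target_language,
    )["TranslatedText"]

    phrases = comprehend.detect_key_phrases(
        Text=translated, LanguageCode=target_language
    )["KeyPhrases"]
    entities = comprehend.detect_entities(
        Text=translated, LanguageCode=target_language
    )["Entities"]
    sentiment = comprehend.detect_sentiment(
        Text=translated, LanguageCode=target_language
    )["Sentiment"]

    terms = [p["Text"].lower() for p in phrases] + [e["Text"].lower() for e in entities]
    return {"text": translated, "terms": terms, "sentiment": sentiment}


def term_frequencies(alerts):
    """Count how many alerts mention each term (the input for a word cloud)."""
    counts = Counter()
    for alert in alerts:
        counts.update(set(alert["terms"]))
    return counts


def alerts_for_term(alerts, term):
    """List (alert text, sentiment) pairs for alerts that contain the term."""
    return [(a["text"], a["sentiment"]) for a in alerts if term.lower() in a["terms"]]
```

The `translate` and `comprehend` arguments would be `boto3.client("translate")` and `boto3.client("comprehend")`. Passing the clients in keeps the helpers easy to try with stand-in objects before any AWS calls are made.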

Several other Comprehend functions along with Translate are integrated in the current KNIME release. A quick way to find these integration nodes and any sample workflows that use them is on the KNIME Hub. Go to http://hub.knime.com and search for “Amazon” to see the full breadth of AWS integrations in KNIME. You can even drag and drop a node or component into KNIME to start using it. Any needed extensions are installed automatically.

AWS Personalize Service

The Amazon Personalize Service supports importing users, items, and interaction data into a dataset group within the service. Once the data are imported, a personalization solution can be built using the data.

Typical solutions include the capability to recommend items for a user, find items related to a particular item, and rank items by preference for a user. To use a solution, a campaign is launched that provides a scalable interface. We’ll demonstrate how to use KNIME Analytics Platform to invoke a personalization campaign that provides item recommendations for a user.

Movie recommender

The KNIME workflow below demonstrates the full lifecycle of the Personalize service. The first set of nodes loads data into a dataset group within the service. The data used in the workflow come from the public MovieLens dataset. The users, movies (items), and ratings (interactions) are loaded into Personalize. After the data are loaded, the dataset group is used to create a solution. The solution uses the user personalization recipe, which predicts which items a particular user will prefer. Once the solution is created, it is deployed as a personalization campaign. The campaign provides an interface that accepts a set of users and returns a set of recommended items for each user.

Fig. 2. Workflow demonstrating usage of the Amazon Personalize service through the Amazon Personalize Movie Recommender workflow.
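To make the data loading step more concrete, here is a small sketch of reshaping MovieLens-style ratings into the column layout that a Personalize interactions dataset expects (`USER_ID`, `ITEM_ID`, `TIMESTAMP`). It assumes the input has the `userId`, `movieId`, `rating`, and `timestamp` columns found in the small MovieLens `ratings.csv`; other MovieLens variants use different layouts. The function name `to_interactions_csv` is made up for this example, and the KNIME nodes handle this step for you.

```python
import csv
import io


def to_interactions_csv(ratings_csv_text):
    """Convert MovieLens-style ratings CSV text into Personalize interactions CSV text."""
    reader = csv.DictReader(io.StringIO(ratings_csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")

    # Column names required by the Personalize interactions schema.
    writer.writerow(["USER_ID", "ITEM_ID", "TIMESTAMP"])
    for row in reader:
        # The timestamp is kept as Unix epoch seconds; the rating is dropped here.
        writer.writerow([row["userId"], row["movieId"], int(row["timestamp"])])
    return out.getvalue()
```

The resulting text could then be written to a file and uploaded for import. The rating column is omitted only to keep the sketch to the required fields.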

Once a campaign is deployed within the Amazon Personalize service, it can be used over and over again to make user recommendations. Combining this with KNIME Server’s ability to invoke workflows through a REST API extends the use of the recommendation model into production scenarios.
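As a rough illustration of what calling a deployed campaign looks like, the sketch below requests recommendations for a set of users through the Personalize runtime client in the AWS SDK for Python (Boto3). The helper name `recommend_for_users` is hypothetical, and the campaign ARN must come from your own deployment. This is not the code behind the KNIME nodes.

```python
def recommend_for_users(personalize_runtime, campaign_arn, user_ids, num_results=5):
    """Return a dict mapping each user ID to a list of recommended item IDs."""
    recommendations = {}
    for user_id in user_ids:
        response = personalize_runtime.get_recommendations(
            campaignArn=campaign_arn,
            userId=str(user_id),  # Personalize expects user IDs as strings
            numResults=num_results,
        )
        recommendations[user_id] = [item["itemId"] for item in response["itemList"]]
    return recommendations
```

Here `personalize_runtime` would be created with `boto3.client("personalize-runtime")`. The campaign is queried once per user, so a large user list results in one request per user.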

Using KNIME and Boto3

Boto3 is the Amazon Web Services software development kit (SDK) for Python. The Boto3 library enables Python developers to write software that makes use of AWS services. KNIME Analytics Platform includes a group of nodes that provide integration with Python. When used in conjunction with the Boto3 library, the Python nodes in KNIME can be used to build interactions with AWS services.

Earlier in this post we discussed using KNIME nodes that interface with the Amazon Comprehend, Translate, and Personalize services. But what if you want to use a service such as Amazon Rekognition in KNIME? That’s where using the KNIME Python nodes and the Boto3 library together can help.

The Amazon Rekognition service supports image analysis such as facial recognition. The facial recognition capability returns information about each face it detects, such as gender, predicted age, whether glasses are being worn, and position within the image, along with other details. A Python node in KNIME can run Python code that calls the Rekognition service to perform facial recognition on an image. The information returned from Rekognition can be gathered and output by the Python node. At that point, any other KNIME node can be used to process the information. How that information is used is then up to your imagination!
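The Python code inside such a node might look roughly like the following sketch. It calls Rekognition's `detect_faces` operation through Boto3 and flattens a few of the returned attributes into one row per face. The helper name `detect_faces_as_rows` and the choice of output columns are assumptions for illustration; how the rows are handed to the node's output depends on the KNIME Python node being used.

```python
def detect_faces_as_rows(image_bytes, rekognition):
    """Call Rekognition on raw image bytes and return one dict per detected face."""
    response = rekognition.detect_faces(
        Image={"Bytes": image_bytes},
        Attributes=["ALL"],  # request age range, gender, eyeglasses, and more
    )

    rows = []
    for face in response["FaceDetails"]:
        box = face["BoundingBox"]  # position as fractions of the image size
        rows.append({
            "gender": face["Gender"]["Value"],
            "age_low": face["AgeRange"]["Low"],
            "age_high": face["AgeRange"]["High"],
            "eyeglasses": face["Eyeglasses"]["Value"],
            "left": box["Left"],
            "top": box["Top"],
            "width": box["Width"],
            "height": box["Height"],
        })
    return rows
```

The `rekognition` argument would be `boto3.client("rekognition")`, and `image_bytes` the content of an image file read in binary mode. Since KNIME Python nodes exchange tables as pandas DataFrames, the list of rows could be wrapped in a DataFrame before being passed on.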

Using KNIME and Python to call the Amazon Rekognition service is just one example. As new services roll out from Amazon, the same combination can be used to prototype with them quickly.

KNIME as a quick and powerful start to AWS Services

KNIME Analytics Platform helps users get going with their data analysis more quickly because of its visual development environment. Users put together workflows piece by piece or, in KNIME-speak, "node by node". The process involves limited coding or no coding at all. As a result, KNIME lowers the threshold for users across fields to participate in building AI environments and enables the integration of multiple fields.

KNIME is a great quick start with AWS services and helps customers achieve greater speed to value. With more functionality now included in the integration with AWS machine learning and artificial intelligence services, users get richer interaction between KNIME and Amazon Web Services.

Stay tuned for more articles in this series on KNIME and cloud connectivity. We are excited to hear what you think about it. Send your comments to blog@knime.com.
