Blending different data sources
To get a comprehensive overview of a company’s current offers on the market, with a focus on customer reviews, several data sources must be integrated and merged: business data, publicly available customer ratings, news data, journalist reports, and press releases. These data are crawled from a variety of public sources using KNIME and an integrated Python script, then enriched with company and other relevant data. The result is a comprehensive database that can be fed into any dashboard software.
Deploying a Guided Analytics application
A KNIME workflow is built to crawl all these sources. After the data has been crawled and saved using a REST API, it is imported from the JSON files and converted into KNIME table format using filtering, mapping, and data encoding. A set of predefined dictionaries is then used to ensure that customers’ ratings are correctly mapped to the specific ocean carrier, the correct ship, and even the cabin the customer stayed in. From these data, high-level aggregations are formed to create a compact set of rating dimensions. The workflow is then deployed in an Azure cloud environment as a Guided Analytics application. This makes extensive computational resources available for in-depth descriptive analysis of data integrated from various sources, and enables alerts and notifications to company managers about improving or deteriorating products.
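The import, dictionary mapping, and aggregation steps described above can be illustrated outside KNIME with a minimal Python sketch. It is not the workflow itself: the review records, the `SHIP_TO_CARRIER` and `CABIN_TYPES` dictionaries, and the `to_rows` and `aggregate` helpers are all hypothetical, and simple substring matching stands in for whatever matching logic the real dictionaries use.

```python
import json
from collections import defaultdict
from statistics import mean

# Hypothetical crawled reviews, as they might look in a saved JSON file.
RAW_JSON = """
[
  {"text": "Great trip on the Sea Star, balcony cabin was spotless", "rating": 5},
  {"text": "sea star buffet was average", "rating": 3},
  {"text": "Ocean Pearl inside cabin was noisy", "rating": 2}
]
"""

# Hypothetical predefined dictionaries (assumptions for illustration only).
SHIP_TO_CARRIER = {"sea star": "Carrier A", "ocean pearl": "Carrier B"}
CABIN_TYPES = ["balcony cabin", "inside cabin"]


def to_rows(raw_json):
    """Convert JSON reviews into flat table rows, mapped via the dictionaries."""
    rows = []
    for review in json.loads(raw_json):
        text = review["text"].lower()
        # Take the first dictionary entry that occurs in the review text.
        ship = next((s for s in SHIP_TO_CARRIER if s in text), None)
        cabin = next((c for c in CABIN_TYPES if c in text), None)
        rows.append({
            "carrier": SHIP_TO_CARRIER.get(ship),
            "ship": ship,
            "cabin": cabin,
            "rating": review["rating"],
        })
    return rows


def aggregate(rows, key):
    """Average the ratings per value of `key`, skipping unmapped rows."""
    groups = defaultdict(list)
    for row in rows:
        if row[key] is not None:
            groups[row[key]].append(row["rating"])
    return {value: mean(ratings) for value, ratings in groups.items()}


rows = to_rows(RAW_JSON)
by_carrier = aggregate(rows, "carrier")
by_ship = aggregate(rows, "ship")
by_cabin = aggregate(rows, "cabin")
```

Calling `aggregate` with a different key ("carrier", "ship", or "cabin") yields one rating dimension per call, which mirrors the idea of reducing many individual reviews to a compact set of dimensions.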
With this analytical application, companies can manage their products in a more informed and intelligent way thanks to:
- Comprehensive data integration generated by KNIME
- An easy-to-use production system, automated for continuous data integration
- Notifications and alerts for improving or deteriorating company products
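As a rough illustration of how an alert for improving or deteriorating products could be derived, the sketch below compares the average rating of a recent period with that of the preceding period. The source does not describe the actual alert logic, so the `rating_trend` and `build_alerts` functions, the 0.5 threshold, and the sample ratings are all assumptions.

```python
from statistics import mean


def rating_trend(previous, recent, threshold=0.5):
    """Classify the change in mean rating between two periods.

    The threshold is a hypothetical value, not one taken from the source.
    """
    if not previous or not recent:
        return "insufficient data"
    delta = mean(recent) - mean(previous)
    if delta >= threshold:
        return "improving"
    if delta <= -threshold:
        return "deteriorating"
    return "stable"


def build_alerts(ratings_by_product, threshold=0.5):
    """Return only the products whose ratings moved by at least the threshold."""
    alerts = {}
    for product, (previous, recent) in ratings_by_product.items():
        trend = rating_trend(previous, recent, threshold)
        if trend in ("improving", "deteriorating"):
            alerts[product] = trend
    return alerts


# Hypothetical ratings per product: (previous period, recent period).
sample = {
    "Ship A": ([4, 4, 5], [3, 3, 2]),
    "Ship B": ([3, 3, 3], [3, 3, 4]),
    "Ship C": ([2, 3], [4, 4]),
}
alerts = build_alerts(sample)
```

Products with a stable rating or too little data produce no alert, so managers would only be notified about meaningful shifts.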
Why KNIME Software
KNIME provides the tools and resources to easily blend data from different sources in one visual workbench. The vast selection of available nodes, together with the procedures they make possible, makes it easy to create solutions such as this. In this case, the JPython Function, JSON to Table, String Manipulation, and Chunk/List Loop nodes were used heavily. Furthermore, deploying the workflow in a Microsoft Azure cloud environment provides additional computational resources when needed.