KNIME news, usage, and development

25 Sep 2017 RolandBurger

In this blog series we’ll be experimenting with the most interesting blends of data and tools. Whether it’s mixing traditional sources with modern data lakes, open-source devops on the cloud with protected internal legacy tools, SQL with noSQL, web-wisdom-of-the-crowd with in-house handwritten notes, or IoT sensor data with idle chatting, we’re curious to find out: will they blend? Want to find out what happens when IBM Watson meets Google News, Hadoop Hive meets Excel, R meets Python, or MS Word meets MongoDB?

Follow us here and send us your ideas for the next data blending challenge you’d like to see at willtheyblend@knime.com.

Today: SugarCRM meets Salesforce. Crossing Accounts and Opportunities

The Challenge

Businesses use Customer Relationship Management (CRM) systems to keep track of all their customer-related activities – creating leads and opportunities, managing contacts and accounts, sending quotes and invoices, etc. As long as it is somehow related to the stream of revenue, it is (or at least should be) stored in a CRM system.

Since there is more than one CRM solution on the market, there is a distinct chance that your organization uses multiple CRM platforms. While there might be sound reasons for this, it also poses a significant challenge: How do you combine data from several platforms? How do you generate a single, consolidated report that shows you how well the sales activities of your whole company are going?

One option is to export some tables, fire up your spreadsheet software of choice, and paste the stuff together. Then do the same thing next week. And the week after. And the week after that one (you get the point). Doesn’t sound too enticing? Fear not! This is KNIME, and one of our specialties is to save you the frustration of doing things manually. Fortunately, both SugarCRM and Salesforce allow their users to access their services via REST API, and that is exactly what we are going to do in this blog post.

There are a couple of prerequisites here. First of all, you obviously need accounts for SugarCRM and Salesforce. If you don’t have them but still want to try this yourself, you’ll be happy to see that both companies offer free trial licenses:

https://info.sugarcrm.com/trial-crm-software.html?utm_source=crmsoftware&utm_medium=referral&utm_campaign=crmsoftware-review

https://developer.salesforce.com/signup

You can learn more about how to use the REST APIs of SugarCRM and Salesforce here:

http://support.sugarcrm.com/Documentation/Sugar_Developer/Sugar_Developer_Guide_7.9/Integration/Web_Services/v10/

https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/intro_what_is_rest_api.htm
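In the KNIME workflow, these calls are issued with the REST nodes (GET Request and POST Request). Just to give an idea of what happens under the hood, here is a rough Python sketch of an authenticated query against the Salesforce REST API; the API version, the credentials, and the SOQL query are placeholders for illustration, not values taken from the workflow.

    import requests

    # --- Salesforce: obtain an OAuth2 access token (username-password flow) ---
    # All credential values are placeholders; the API version (v40.0) is an assumption.
    auth = requests.post(
        "https://login.salesforce.com/services/oauth2/token",
        data={
            "grant_type": "password",
            "client_id": "<consumer_key>",
            "client_secret": "<consumer_secret>",
            "username": "<user>",
            "password": "<password + security token>",
        },
    )
    auth.raise_for_status()
    token = auth.json()["access_token"]
    instance = auth.json()["instance_url"]

    # --- Query Opportunities with SOQL via the REST API ---
    resp = requests.get(
        f"{instance}/services/data/v40.0/query",
        headers={"Authorization": f"Bearer {token}"},
        params={"q": "SELECT Name, Amount, StageName FROM Opportunity"},
    )
    resp.raise_for_status()
    for record in resp.json()["records"]:
        print(record["Name"], record.get("Amount"), record["StageName"])

The SugarCRM v10 API follows a similar pattern: first request an OAuth token, then query the module endpoints (Accounts, Opportunities, and so on) with that token in the request header.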

Topic. Get a consolidated view of all customer data from two separate platforms

Challenge. Query data from SugarCRM and Salesforce via their APIs

Access Mode. KNIME REST Web Services

Read more


18 Sep 2017 berthold

We all know that just building a model is not the end of the line. However, deploying the model to put it into production is often also not the end of the story, although a complex task in itself (see our previous blog post on “The 7 Ways of Deployment”). Data scientists are increasingly often also tasked with regularly monitoring, fine-tuning, updating, retraining, replacing, and jump-starting models - and sometimes even hundreds or thousands of models at once.

In the following, we describe different flavors of model management in order of increasing complexity, starting with the management of single models and working our way up to building an entire model factory.

Step 1. Models in Action: Deployment

We need to start with actually putting the model into production, i.e. how do we use the result of our training procedure to score new incoming data? We will not dive into this issue here, as it was already covered in a separate blog post. To briefly recap: we have many options, such as scoring within the same system that was used for training, exporting models in standardized formats such as PMML, or pushing models into other systems, for example converting models to SQL for scoring within a database, or compiling models for processing in an entirely different runtime environment. From the model management perspective, we just need to be able to support all required options.

It is important to point out that in reality the model alone is often not very helpful unless at least part of the data processing (transformation/integration) is deployed as part of the “model” in production. This is where many deployment options show surprising weaknesses, in that they only support deployment of the predictive model alone.
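To make this concrete with a small sketch (using scikit-learn as a stand-in, not the approach from this post): if the transformation steps are bundled with the estimator into one pipeline, whatever you persist and deploy automatically contains both, and production code can score raw data directly. The data, file name, and model choice below are made up for illustration.

    import numpy as np
    import joblib
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression

    # Toy training data, just so the sketch runs end to end.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 3))
    y_train = rng.integers(0, 2, size=100)
    X_new = rng.normal(size=(5, 3))

    # Bundle preprocessing and model so that deployment ships both together.
    pipeline = Pipeline([
        ("scale", StandardScaler()),       # data transformation step
        ("model", LogisticRegression()),   # the actual predictive model
    ])
    pipeline.fit(X_train, y_train)

    # Persist the whole pipeline, not just the fitted model.
    joblib.dump(pipeline, "churn_pipeline.joblib")

    # In production: load and score new, raw data in one step.
    scorer = joblib.load("churn_pipeline.joblib")
    predictions = scorer.predict(X_new)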

To get a visual analogy started that we will use throughout this post, let us depict what this simple standard process looks like:

Read more


11 Sep 2017 jonfuller

Introduction

The aim of this blog post is to highlight some of the key features of the KNIME Deeplearning4J (DL4J) integration and to help newcomers to either deep learning or KNIME take their first steps with deep learning in KNIME Analytics Platform.

Useful Links

If you’re new to KNIME, here is a link to get familiar with the KNIME Analytics Platform:
https://www.knime.com/knime-online-self-training

If you’re new to Deep Learning, there are plenty of resources on the web, but these two worked well for me:
https://deeplearning4j.org/neuralnet-overview
http://playground.tensorflow.org/

If you are new to the KNIME nodes for deep learning, you can read more in the relevant section of the Node Guide:
https://www.knime.com/nodeguide/analytics/deep-learning

With a little bit of patience, you can run the example provided in this blog post on your laptop, since it uses a small dataset and only a few neural net layers. However, Deep Learning is a poster child for using GPUs to accelerate expensive computations. Fortunately DL4J includes GPU acceleration, which can be enabled within the KNIME Analytics Platform.

If you don’t happen to have a good GPU available, a particularly easy way to get access to one is to use a GPU-enabled KNIME Cloud Analytics Platform, which is the cloud version of KNIME Analytics Platform.

In the addendum at the end of this post we explain how to enable KNIME Analytics Platform to run deep learning on GPUs either on your machine or on the cloud for better performance.

Read more


04 Sep 2017 rs

The latest release, KNIME Analytics Platform 3.4, brings many new features, nodes, integrations, and example workflows. All of this is to give you a better all-round experience in data science, enterprise operations, usability, learning, and scalability.

Now, when we talk about scalability, the cloud often comes to mind. When we talk about the cloud, Microsoft Azure often comes to mind. That is the reason why KNIME has been integrating some of the Azure products and services.

The novelty of this latest release is the example material. If you currently access (or want to access in the future) Microsoft products on the cloud from your KNIME workflows, you can start by having a look at the 11_Partners/01_Microsoft folder on the EXAMPLES server and at the following page in the KNIME Node Guide: https://www.knime.com/nodeguide/partners/microsoft.

A little note for the neophytes among us. The KNIME EXAMPLES server is a public KNIME server hosting a constantly growing number of example workflows (see the YouTube video “KNIME EXAMPLES Server”). If you are new to a topic, let’s say “churn prediction”, and you are looking for a quick starting point, you could access the EXAMPLES server from the top left corner of the KNIME workbench, download the example workflow in 50_Applications/18_Churn_Prediction (50_Applications/18_Churn_Prediction/01_Training_a_Churn_Predictor*), and adapt it to your data and specific business problem. It is very easy and one of the most loved features of KNIME Analytics Platform.

Read more


25 Aug 2017 Vincenzo

We built a workflow to train a model. It works fast enough on our local, maybe not so powerful, machine. So far.

The data set is growing. Each month a considerable number of new records is added, and each month the training workflow becomes slower. Shall we start thinking about scalability? Shall we consider big data platforms? Could my neat and elegant KNIME workflow be replicated on a big data platform? Indeed it can.

The KNIME Big Data Extensions offer nodes to build and configure workflows that run on the big data platform of your choice. The cool feature of the KNIME Big Data Extensions is the node GUI: the configuration window of each big data node has been built to be as similar as possible to the configuration window of the corresponding KNIME node. The configuration window of a Spark Joiner node looks exactly the same as the configuration window of a Joiner node.

Thus, it is not only possible to replicate your original workflow on a big data platform, it is also extremely easy, since you do not need to learn new scripts or tool commands. The KNIME Big Data Extensions bring the ease of use of KNIME to the scalability of big data.

This video shows how we replicated an existing classical analytics workflow on a Big Data Platform.

The workflows used in the video can be found on the KNIME EXAMPLES server under 50_Applications/28_Predicting_Departure_Delays/02_Scaling_Analytics_w_BigData.knwf*

Read more


21 Aug 2017 gnu

Here's a familiar predicament: you have the data you want to analyze, and you have a trained model to analyze them. Now what? How do you deploy your model to analyze your data?

In this video we will look at seven ways of deploying a model with KNIME Analytics Platform and KNIME Server. This list has been prepared with an eye toward where the output of the deployment workflow goes:

  • to a file or database
  • to JSON via REST API
  • to a dashboard via KNIME's WebPortal
  • to a report and to email
  • to SQL execution via SQL recoding
  • to Java byte code execution via Java recoding
  • to an external application

Once you know these options, you will also know which one best satisfies your needs.
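As a rough illustration of the REST option, a client call against a workflow deployed as a REST service could look something like the sketch below; the URL and the JSON payload are entirely made up and will depend on your KNIME Server setup, so treat this as the shape of the interaction rather than a recipe.

    import requests

    # Hypothetical endpoint of a workflow deployed as a REST service;
    # URL, credentials, and payload structure are placeholders.
    url = "https://<your-knime-server>/<rest-endpoint-of-deployed-workflow>"

    payload = {"input-table": [{"age": 42, "plan": "premium", "calls_last_month": 7}]}

    resp = requests.post(url, json=payload, auth=("<user>", "<password>"))
    resp.raise_for_status()
    print(resp.json())   # JSON output returned by the deployed workflow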

The workflows used in the video can be found on the KNIME EXAMPLES server under 50_Applications/27_Deployment_Options*.

Read more


14 Aug 2017 rs

Do you remember the Iron Chef battles?

It was a televised series of cook-offs in which famous chefs rolled up their sleeves to compete in making the perfect dish. Based on a set theme, this involved using all their experience, creativity, and imagination to transform sometimes questionable ingredients into the ultimate meal.

Hey, isn’t that just like data transformation? Or data blending, or data manipulation, or ETL, or whatever new name is trending now? In this new blog series requested by popular vote, we will ask two data chefs to use all their knowledge and creativity to compete in extracting a given data set's most useful “flavors” via reductions, aggregations, measures, KPIs, and coordinate transformations. Delicious!

Want to find out how to prepare the ingredients for a delicious data dish by aggregating financial transactions, filtering out uninformative features or extracting the essence of the customer journey? Follow us here and send us your own ideas for the “Data Chef Battles” at datachef@knime.com.

Ingredient Theme: Energy Consumption Time Series. Behavioral Measures over Time and Seasonality Index from Auto-Correlation.

Author: Rosaria Silipo
Data Chefs: Haruto and Momoka

Ingredient Theme: Energy Consumption Time Series

Let’s talk today about electricity and its consumption. One of the hardest problems in the energy industry is matching supply and demand. On the one hand, over-production of energy can be a waste of resources; on the other hand, under-production can leave people without the basic commodities of modern life. The prediction of the electrical energy demand at each point in time is therefore a very important chapter in data analytics.

For this reason, a couple of years ago energy companies started to monitor the electricity consumption of each household, store, or other entity, by means of smart meters. A pilot project was launched in 2009 by the Irish Commission for Energy Regulation (CER).

The Smart Metering Electricity Customer Behaviour Trials (CBTs) took place during 2009 and 2010, with over 5,000 Irish homes and businesses participating. The purpose of the trials was to assess the impact on consumers’ electricity consumption, in order to inform the cost-benefit analysis for a national rollout. Electric Ireland residential and business customers and Bord Gáis Energy business customers who participated in the trials had an electricity smart meter installed in their homes or on their premises and agreed to take part in research to help establish how smart metering can help shape energy usage behaviors across a variety of demographics, lifestyles, and home sizes. The trials produced positive results. The reports are available from CER (Commission for Energy Regulation) along with further information on the Smart Metering Project. In order to get a copy of the data set, fill out this request form and email it to ISSDA.

The data set is just one very long time series: one column holds the smart meter ID, one the time, and one the amount of electricity used in the previous 30 minutes. The time is expressed as the number of minutes since 01.01.2009 00:00 and has to be transformed back into one of the classic date/time formats, for example dd.MM.yyyy HH:mm. The original sampling rate, at which the used energy is measured, is one measurement every 30 minutes.

The first data transformations, common to all data chefs, involve the date/time conversion and the extraction of year, month, day of month, day of week, hour, and minute from the raw date.
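In the KNIME workflows this is done with the date/time manipulation nodes; purely as an illustration of what the transformation amounts to, here is a small pandas sketch with made-up column names and values.

    import pandas as pd

    # Toy example: meter ID, minutes since 01.01.2009 00:00, and energy
    # used in the previous 30 minutes (column names are made up).
    df = pd.DataFrame({
        "meter_id": [1392, 1392, 1392],
        "minutes_since_2009": [0, 30, 60],
        "kwh": [0.14, 0.25, 0.11],
    })

    # Convert the minute offset back to a regular timestamp ...
    df["timestamp"] = pd.Timestamp("2009-01-01") + pd.to_timedelta(df["minutes_since_2009"], unit="m")

    # ... and extract the calendar fields used for the aggregations.
    df["year"] = df["timestamp"].dt.year
    df["month"] = df["timestamp"].dt.month
    df["day_of_month"] = df["timestamp"].dt.day
    df["day_of_week"] = df["timestamp"].dt.dayofweek   # 0 = Monday
    df["hour"] = df["timestamp"].dt.hour
    df["minute"] = df["timestamp"].dt.minute
    print(df)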

Topic. Energy Consumption Time Series

Challenge. From time series to behavioral measures and seasonality

Methods. Aggregations at multiple levels, Correlation

Data Manipulation Nodes. GroupBy, Pivoting, Linear Correlation, Lag Column
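As for the seasonality index, the general idea is to look at the auto-correlation of the aggregated consumption series and take the lag where the correlation peaks. The KNIME workflow relies on the Lag Column and Linear Correlation nodes for this; the following is just a minimal pandas sketch of the same idea on a synthetic daily series with a weekly pattern.

    import numpy as np
    import pandas as pd

    # Synthetic daily consumption with a weekly (7-day) seasonal pattern plus noise.
    rng = np.random.default_rng(42)
    days = pd.date_range("2009-07-01", periods=365, freq="D")
    values = 10 + 3 * np.sin(2 * np.pi * np.arange(365) / 7) + rng.normal(0, 0.5, 365)
    series = pd.Series(values, index=days)

    # Auto-correlation of the series with lagged copies of itself (lags 1..30 days).
    autocorr = {lag: series.autocorr(lag) for lag in range(1, 31)}

    # Multiples of the true period (14, 21, 28) score almost as high as 7,
    # so take the smallest lag that comes close to the maximum correlation.
    best = max(autocorr.values())
    season = min(lag for lag, ac in autocorr.items() if ac >= 0.95 * best)
    print(f"Seasonality index: {season} days (auto-correlation {autocorr[season]:.2f})")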

Read more


07 Aug 2017 thor

If you are a KNIME Server customer, you probably noticed that the changelog for the KNIME Server 4.5 release was rather short compared to the one in previous releases. This by no means implies that we were lazy! Alongside introducing new features and improving existing ones, we also started working on the next generation of KNIME Server. You can see a preview of what is to come in the so-called distributed executors. In this article I will explain what a distributed executor is and how it can be useful to you. I will also provide some technical details for the geeks among you, and finally I will give you a rough timeline for the distributed executors' final release.

Read more


31 Jul 2017 greglandrum

As part of the v3.4 release of KNIME Analytics Platform, we rewrote the Python extensions and added support for Python 3 as well as Python 2. Aside from the Python 3 support, the new nodes aren’t terribly different from a user perspective, but the changes to the backend give us more flexibility for future improvements to the integration. This blog post provides some advice on how to set up a Python environment that will work well with KNIME as well as how to tell KNIME about that environment.

The Python Environment

We recommend using the Anaconda Python distribution from Continuum Analytics. There are many reasons to like Anaconda, but the important things here are that it can be installed without administrator rights, supports all three major operating systems, and provides all of the packages needed for working with KNIME “out of the box”.

Get started by installing Anaconda from the link above. You’ll need to choose which version of Python you prefer (we recommend that you use Python 3 if possible) but this just affects your default Python environment; you can create environments with other Python versions without doing a new install. For example, if I install Anaconda3 I can still create Python 2 environments.
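Once an environment is created and activated, a tiny script like the one below is a quick way to check which interpreter and package versions it actually provides, and it can be pasted into a Python Script node later on to confirm that KNIME picked up the right environment. Here pandas is used only as an example of a package the KNIME Python nodes rely on for exchanging tables.

    import sys
    import pandas as pd

    # Print which interpreter and pandas version this environment provides.
    print("Python executable:", sys.executable)
    print("Python version:   ", sys.version.split()[0])
    print("pandas version:   ", pd.__version__)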

Read more


24 Jul 2017 amartin

In this blog series we’ll be experimenting with the most interesting blends of data and tools. Whether it’s mixing traditional sources with modern data lakes, open-source devops on the cloud with protected internal legacy tools, SQL with noSQL, web-wisdom-of-the-crowd with in-house handwritten notes, or IoT sensor data with idle chatting, we’re curious to find out: will they blend? Want to find out what happens when IBM Watson meets Google News, Hadoop Hive meets Excel, R meets Python, or MS Word meets MongoDB?

Follow us here and send us your ideas for the next data blending challenge you’d like to see at willtheyblend@knime.com.

Today: A Recipe for Delicious Data: Mashing Google and Excel Sheets

The Challenge

Don’t be confused! This is not one of the Data Chef Battles, but a “Will they blend?” experiment - which, just by chance, happens to be on a restaurant theme again.

A local restaurant has been running its business relatively successfully for a few years now. It is a small business: an Excel sheet was enough for the full 2016 accounting. To simplify collaboration, the restaurant owner decided to start using Google Sheets at the beginning of 2017. Now she faces the same task every month: calculating the monthly and YTD revenues for 2017 (in Google Sheets) and comparing them with the corresponding prior-year values for 2016 (in Microsoft Excel).

The technical challenge at the center of this experiment is definitely not a trivial matter: mashing the data from the Excel and Google spreadsheets into something delicious… and digestible. Will they blend?

Topic. Monthly and YTD revenue figures for a small local business.

Challenge. Blend together Microsoft Excel and Google Sheets.

Access Mode. Excel Reader and REST Google API for private and public documents.

Read more

