Will They Blend? Experiments in Data & Tool Blending. Today: BIRT meets Tableau and JavaScript. How was the Restaurant?

Mon, 06/11/2018 - 10:00 ScottF

In this blog series we’ll be experimenting with the most interesting blends of data and tools. Whether it’s mixing traditional sources with modern data lakes, open-source devops on the cloud with protected internal legacy tools, SQL with noSQL, web-wisdom-of-the-crowd with in-house handwritten notes, or IoT sensor data with idle chatting, we’re curious to find out: will they blend? Want to find out what happens when IBM Watson meets Google News, Hadoop Hive meets Excel, R meets Python, or MS Word meets MongoDB?

Follow us here and send us your ideas for the next data blending challenge you’d like to see at willtheyblend@knime.com.

Today: BIRT meets Tableau and JavaScript. How was the Restaurant?

The Challenge

How would your favorite restaurant perform in a health inspection? Are you sure you are eating in one of the most reliable and safest restaurants in town? If you live in Austin, Texas, we can check.

Over the last three calendar years, data.austintexas.gov has published a dataset of restaurant inspection scores for Austin (TX) restaurants. Inspection scores range between 0 (theoretically) and 100. An inspection score lower than 70 requires corrective measures; repeated low scores may even necessitate the closure of the restaurant.

We will use this dataset to visually explore:

  • How many restaurants have been inspected in each ZIP code area;
  • Of those restaurants, how many have scored between 70 and 80, 80 and 90, and 90 and 100, as well as perfect 100 scores;
  • And finally, the average scores for ZIP code locations in the Austin area.

For these three goals we will use, respectively, a:

  • pie chart;
  • grouped bar chart;
  • geolocation map.
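The three aggregations behind these charts can be sketched outside of KNIME in a few lines of Python. This is only an illustration: the sample records are made up, the bin boundaries (for example, whether 80 belongs to "70-79" or "80-89") are an assumption, and each record is assumed to describe one distinct restaurant.

```python
from collections import Counter, defaultdict

# Hypothetical sample records: (restaurant, ZIP code, inspection score).
# These values are invented for illustration, not taken from the Austin dataset.
inspections = [
    ("A", "78701", 100),
    ("B", "78701", 85),
    ("C", "78704", 92),
    ("D", "78704", 74),
    ("E", "78704", 68),
]

def score_bin(score):
    """Assign a score to a bin; the exact boundaries are an assumption."""
    if score == 100:
        return "100"
    if score >= 90:
        return "90-99"
    if score >= 80:
        return "80-89"
    if score >= 70:
        return "70-79"
    return "below 70"

# Pie chart input: number of inspected restaurants per ZIP code.
restaurants_per_zip = Counter(zip_code for _, zip_code, _ in inspections)

# Grouped bar chart input: number of restaurants per score bin.
bin_counts = Counter(score_bin(score) for _, _, score in inspections)

# Map input: average score per ZIP code.
scores_by_zip = defaultdict(list)
for _, zip_code, score in inspections:
    scores_by_zip[zip_code].append(score)
average_by_zip = {z: sum(s) / len(s) for z, s in scores_by_zip.items()}
```

In the blog post itself these steps are carried out with KNIME nodes; the sketch only shows the shape of the data each chart needs.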

So far so good. Now we need to choose the graphical tool for such representations.

Since data manipulation within a reporting environment might be clunky, many KNIME users prefer to run the data manipulation comfortably from KNIME Analytics Platform, and later export the data to their preferred reporting tool.

There are many ways to produce graphics in a report with KNIME Analytics Platform. Today we will consider three: native graphics in BIRT, JavaScript-based images exported to BIRT, and native graphics in Tableau.

BIRT (Business Intelligence and Reporting Tools) is a reporting tool that is, to a certain extent, distributed as open source. The KNIME Report Designer extension integrates BIRT within KNIME Analytics Platform. The “Open Report” button in the KNIME tool bar takes you to the BIRT report environment. The “KNIME” button in the BIRT report environment takes you back to the KNIME workbench.

Tableau Desktop is a reporting tool that requires a commercial license; 14-day trial licenses are available at the Tableau Desktop site. The KNIME Tableau extension allows you to communicate with Tableau Desktop via TDE files or directly with Tableau Server.

It is also possible to produce the graphic image in the KNIME workflow via the JavaScript nodes and subsequently export the image to the reporting tool of your choice.

In today’s Will They Blend, we want to highlight the process of generating a few simple graphics using BIRT, JavaScript, and Tableau. Can we produce three simple charts using those tools? What might that involve? Let’s find out!

Topic. Data Visualization using BIRT, Tableau, and JavaScript

Challenge. Create a Pie Chart, a Bar Chart, and a geo-location map using BIRT, Tableau, and JavaScript

Access Mode. KNIME Tableau extension, Report Designer extension, JavaScript Views extension, Open Street Map Integration extension

Will They Blend? Experiments in Data & Tool Blending. Today: The GDPR Force Meets Customer Intelligence – Is It The Dark Side?

Tue, 05/22/2018 - 15:17 admin

Today: The GDPR Force Meets Customer Intelligence – Is It The Dark Side?

Authors: Rosaria Silipo and Phil Winters

The Challenge

The European Union is introducing the General Data Protection Regulation on May 25, 2018. This new law has been called the world’s strongest and most far-reaching law aimed at strengthening citizens' fundamental rights in the digital age. The new law applies to any organization processing personal data about a citizen or resident in the EU, regardless of that organization’s location. It is definitely a force to be reckoned with, as the fines for non-compliance are extremely high.

Many of the law’s Articles deal with how personal data are collected and processed and include a special emphasis, guidelines, and restrictions on what the EU calls “automatic profiling with personal data”. So what has that to do with us data scientists? Many of us use methods and machine learning to create new fact-based insight from the customer or personal data that we collect so that we can make decisions, possibly automatically. So the law applies to us. But that Customer Intelligence we create is fundamental for the successful running of our organizations’ business.

Does this mean this new force will put limits on what we do? Or even worse, does it mean that we have to go over to the dark side to help our businesses? Absolutely not! The new law defines that we must gain permission, perform tests, and document. In no way does it restrict what honest data scientists can do. If anything, it provides us with opportunities.

Topic. GDPR meets Customer Intelligence

Challenge. Apply machine learning algorithms to create new fact-based insights taking the new GDPR into account

Will They Blend? Experiments in Data & Tool Blending. Today: Chinese meets English meets Thai meets German meets Italian meets Arabic meets Farsi meets Russian. Around the world in eight languages

Mon, 02/19/2018 - 09:26 admin

Today: Chinese meets English meets Thai meets German meets Italian meets Arabic meets Farsi meets Russian. Around the world in eight languages

Authors: Anna Martin, Hayley Smith, and Mallika Bose

The Challenge

No doubt you are familiar with the adventure novel “Around the World in 80 Days” in which British gentleman Phileas Fogg makes a bet that he can circumnavigate the world in 80 days. Today we will be attempting a similar journey. However, ours is unlikely to be quite as adventurous as the one Phileas made. We won’t be riding elephants across the Indian mainland, nor rescuing our travel companion from the circus. And we certainly won’t be getting attacked by Native American Sioux warriors!

Our adventure will begin from our offices on Lake Constance in Germany. From there we will travel down to Italy, stopping briefly to see the Colosseum. Then across the Mediterranean to see the Pyramids of Egypt and on through the Middle East to the ancient city of Persepolis. After a detour via Russia to see the Red Square in Moscow, our next stop will be the serene beaches of Thailand for a short break before we head off to walk the Great Wall of China (or at least part of it). On the way home, we will stop in and say hello to our colleagues in the Texas office.

Like all good travelers, we want to stay up-to-date with the news the entire time. Our goal is to read the local newspapers … in the local language of course! This means reading news in German, Italian, Arabic, Farsi, Chinese, Russian, Thai, and lastly, English. Impossible you say? Well, we’ll see.

The real question is: will all those languages blend?

Topic. Blending news in different languages

Challenge. Will the Text Processing nodes support all the different encodings?

Access Mode. Text Processing nodes and RSS Feed Reader node

Will They Blend? Experiments in Data & Tool Blending. Today: A Recipe for Delicious Data – Part 2: The new Google Sheets Nodes

Mon, 01/08/2018 - 12:32 admin

Today: A Recipe for Delicious Data – Part 2: The new Google Sheets Nodes

Authors: Rene Damyon and Oleg Yasnev

Post Update!

This is the updated version of the original blog post “A Recipe for Delicious Data: Mashing Google and Excel Sheets”, using the new Google Sheets nodes available in KNIME Analytics Platform 3.5.
 

The Challenge

Remember this blog post from July 2017?

A local restaurant has been keeping track of its business on Excel in 2016 and moved to Google Sheets in 2017. The challenge was then to include data from both sources to compare business trends in 2016 and in 2017, both as monthly total and Year To Date (YTD) revenues.
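As a rough illustration of the calculation itself, Year To Date revenue is the running sum of the monthly totals. The sketch below assumes the monthly totals have already been read from both sources; all figures are invented.

```python
from itertools import accumulate

# Hypothetical monthly revenue totals for January to March (invented values).
revenue_2016 = [1000.0, 1200.0, 900.0]   # e.g. read from the Excel sheet
revenue_2017 = [1100.0, 1150.0, 1000.0]  # e.g. read from the Google Sheet

def year_to_date(monthly_totals):
    """Return the running (cumulative) sum of the monthly totals."""
    return list(accumulate(monthly_totals))

ytd_2016 = year_to_date(revenue_2016)
ytd_2017 = year_to_date(revenue_2017)

# Year-over-year difference of the YTD figures, month by month.
ytd_change = [current - previous for previous, current in zip(ytd_2016, ytd_2017)]
```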

The technical challenge of this experiment was then of the “Will they blend?” type: mashing the data from the Excel and Google spreadsheets into something delicious… and digestible. The data blending was indeed possible and easy for public Google Sheets. However, it became more cumbersome for private Google Sheets, by requiring a few external steps for user authentication.

Building on the experience of that blog post, a few dedicated Google Sheets nodes have been built and released with the new KNIME Analytics Platform 3.5. These new nodes connect to a private or public Google Sheet and read, write, update, and append cells, rows, and columns.

The technical challenge then has become easier: accessing Google Sheets with these new dedicated nodes and mashing the data with data from an Excel Sheet. Will they blend?

Topic. Monthly and YTD revenue figures for a small local business.

Challenge. Retrieve data from Google Sheets using the new Google Sheets nodes available in KNIME Analytics Platform 3.5.

Access Mode. Excel Reader node and Google Sheets Reader node for private and public documents.

Will They Blend? Experiments in Data & Tool Blending. Today: SparkSQL meets HiveQL. Women, Men, and Age in the State of Maine

Mon, 12/11/2017 - 10:20 admin

Today: SparkSQL meets HiveQL. Women, Men, and Age in the State of Maine

Authors: Rosaria Silipo and Anna Martin

The Challenge

After seeing the foliage in Maine, I gave serious thought to moving up there, to the beauty of nature and the peace of a quieter life. I then started doing some research on Maine, its economy, and its population.

As it happens, I do have the sampled demographics data for the state of Maine for the years 2009-2014, as part of the CENSUS dataset.

I have the whole CENSUS dataset stored on an Apache Hive installation on a Cloudera cluster running on the Amazon cloud. It could then be processed on Apache Hive or on Apache Spark using the KNIME Big Data Extensions.

News!!! KNIME Big Data Extensions have been open sourced with the last release of KNIME Analytics Platform 3.5. All Big Data nodes in the Node Repository now require no license to run. Check the “What’s new in KNIME 3.5” page for more details on the new release.

KNIME Big Data Extensions offer a variety of nodes to execute Apache Spark or Apache Hive scripts. Hive execution relies on the nodes for in-database processing. Spark execution has its own dedicated nodes, and the Spark integration also provides a way to run SQL queries on the Apache Spark execution engine.

We set our goal here to investigate the age distribution of Maine residents, men and women, using SQL queries. On Apache Hive or on Apache Spark? Why not both? We could use SparkSQL to extract men’s age distribution and HiveQL to extract women’s age distribution. We could then compare the two distributions and see if they show any difference.
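The shape of such a query can be sketched as follows. This is not the HiveQL or SparkSQL used in the post: an in-memory SQLite database stands in for both engines so that the example runs anywhere, and the table name `census`, the columns `age` and `sex`, and all rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Hive and Spark here (an assumption for
# illustration only); table name, column names, and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (age INTEGER, sex TEXT)")
conn.executemany(
    "INSERT INTO census VALUES (?, ?)",
    [(25, "M"), (25, "M"), (40, "M"), (25, "F"), (40, "F"), (40, "F")],
)

# One aggregation query, run once per group, mimicking the idea of
# extracting each distribution with a separate engine.
AGE_QUERY = (
    "SELECT age, COUNT(*) AS n FROM census "
    "WHERE sex = ? GROUP BY age ORDER BY age"
)
men = dict(conn.execute(AGE_QUERY, ("M",)).fetchall())
women = dict(conn.execute(AGE_QUERY, ("F",)).fetchall())

# Blend the two result sets: age -> (count of men, count of women).
comparison = {
    age: (men.get(age, 0), women.get(age, 0))
    for age in sorted(set(men) | set(women))
}
conn.close()
```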

But the main question, as usual, is: will SparkSQL queries and HiveQL queries blend?

Topic. Age distribution for men and women in the US state of Maine

Challenge. Blend results from Hive SQL and Spark SQL queries.

Access Mode. Apache Spark and Apache Hive nodes for SQL processing

Will They Blend? Experiments in Data & Tool Blending. Today: Google Big Query meets SQLite. The Business of Baseball Games

Mon, 11/13/2017 - 10:16 admin

Today: Google Big Query meets SQLite. The Business of Baseball Games

Author: Dorottya Kiss, EPAM

The Challenge

They say if you want to know American society, first you have to learn baseball. As reported in a New York Times article, America had baseball even in times of war and depression, and it still reflects American society. Whether it is playing, watching, or betting on the games, baseball is in some way always connected to the lives of Americans.

According to Accuweather, different weather conditions play a significant role in determining the outcome of a baseball game. Air temperature influences the trajectory of the baseball; air density has an impact on the distance covered by the ball; temperature influences the pitcher’s grip; cloud coverage affects the visibility of the ball; and wind conditions - and weather in general - have various degrees of influence on the physical wellbeing of the players.

Another interesting article on Crowdhitter describes the fans’ attendance of the games and how this affects the home team’s success. Fan attendance at baseball games is indeed a key factor, in terms of both emotional and monetary support. So, what are the key factors determining attendance? On a pleasant day, are fans more likely to show up in the evening or during the day, or does it all just depend on the opposing team?

Some time ago we downloaded the data about attendance at baseball games for the 2016 season from Google’s Big Query Public data set and stored them on our own Google Big Query database. For the purpose of this blending experiment we also downloaded data about the weather during games from Weather Underground and stored these data on a SQLite database.

The goal of this blending experiment is to merge attendance data at baseball games from Google Big Query with weather data from SQLite. Since we have only data about one baseball season, it will be hard to train a model for reliable predictions of attendance. However, we have enough data for a multivariate visualization of the various factors influencing attendance.
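Conceptually, the merge is a join of the two tables on a shared game key. The sketch below assumes such a key exists; the identifiers, field names, and values are all hypothetical and do not reflect the actual Big Query or Weather Underground schemas.

```python
# Hypothetical attendance rows (standing in for the Big Query table),
# keyed by a made-up game identifier.
attendance = {
    "game-001": {"home_team": "Team A", "attendance": 31000},
    "game-002": {"home_team": "Team B", "attendance": 28000},
    "game-003": {"home_team": "Team A", "attendance": 35000},
}

# Hypothetical weather rows (standing in for the SQLite table).
weather = {
    "game-001": {"temperature_f": 72, "condition": "clear"},
    "game-002": {"temperature_f": 58, "condition": "rain"},
}

# Inner join: keep only games that appear in both sources.
blended = [
    {"game_id": game_id, **game, **weather[game_id]}
    for game_id, game in attendance.items()
    if game_id in weather
]
```

An inner join drops games without weather data; whether that is the right choice depends on how complete the weather table is.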

Topic. Multivariate visual investigation of weather influence on attendance of baseball games.

Challenge. Blend attendance data from Google Big Query and weather data from SQLite.

Access Mode. Database Connector node with the Simba 4.2 JDBC driver for access to Google Big Query, and the dedicated SQLite Connector node.

Will They Blend? Experiments in Data & Tool Blending. Today: Finnish meets Italian and Portuguese through the Google Translate API. Preventing weather from getting lost in translation

Mon, 10/09/2017 - 11:18 admin

Today: Finnish meets Italian and Portuguese through the Google Translate API. Preventing weather from getting lost in translation

Will They Blend? Experiments in Data & Tool Blending. Today: SugarCRM meets Salesforce. Crossing Accounts and Opportunities

Mon, 09/25/2017 - 10:30 RolandBurger

Today: SugarCRM meets Salesforce. Crossing Accounts and Opportunities

The Challenge

Businesses use Customer Relationship Management (CRM) systems to keep track of all their customer related activities – creating leads and opportunities, managing contacts and accounts, sending quotes and invoices, etc. As long as it is somehow related to the stream of revenue, it is (or at least should be) stored in a CRM system.

Since there is more than one CRM solution on the market, there is a distinct chance that your organization uses multiple CRM platforms. While there might be sound reasons for this, it also poses a significant challenge: How do you combine data from several platforms? How do you generate a single, consolidated report that shows you how well the sales activities of your whole company are going?

One option is to export some tables, fire up your spreadsheet software of choice, and paste the stuff together. Then do the same thing next week. And the week after. And the week after that one (you get the point). Doesn’t sound too enticing? Fear not! This is KNIME, and one of our specialties is to save you the frustration of doing things manually. Fortunately, both SugarCRM and Salesforce allow their users to access their services via REST API, and that is exactly what we are going to do in this blog post.
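Once both APIs have been queried, the consolidation step amounts to mapping each platform's records onto a common schema and aggregating. The sketch below skips the REST calls entirely and starts from already-parsed records; the field names are illustrative assumptions, not the real SugarCRM or Salesforce response formats.

```python
from collections import defaultdict

# Hypothetical, simplified records as they might look after parsing each
# API's JSON response. Field names and values are invented.
sugar_opportunities = [
    {"account_name": "Acme", "amount": "1500.00"},
    {"account_name": "Globex", "amount": "700.00"},
]
salesforce_opportunities = [
    {"Account": "Acme", "Amount": 2500.0},
]

def normalize(record, account_key, amount_key, source):
    """Map a platform-specific record onto a common schema."""
    return {
        "account": record[account_key],
        "amount": float(record[amount_key]),
        "source": source,
    }

consolidated = [
    normalize(r, "account_name", "amount", "SugarCRM")
    for r in sugar_opportunities
] + [
    normalize(r, "Account", "Amount", "Salesforce")
    for r in salesforce_opportunities
]

# Total opportunity amount per account across both platforms.
total_by_account = defaultdict(float)
for row in consolidated:
    total_by_account[row["account"]] += row["amount"]
```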

There are a couple of prerequisites here. First of all, you obviously need accounts for SugarCRM and Salesforce. If you don’t have them but still want to try this yourself, you’ll be happy to see that both companies offer free trial licenses:

https://info.sugarcrm.com/trial-crm-software.html?utm_source=crmsoftware&utm_medium=referral&utm_campaign=crmsoftware-review

https://developer.salesforce.com/signup

You can learn more about how to use the REST APIs of SugarCRM and Salesforce here:

http://support.sugarcrm.com/Documentation/Sugar_Developer/Sugar_Developer_Guide_7.9/Integration/Web_Services/v10/

https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/intro_what_is_rest_api.htm

Topic. Get a consolidated view of all customer data from two separate platforms

Challenge. Query data from SugarCRM and Salesforce via their APIs

Access Mode. KNIME REST Web Services

Will They Blend? Experiments in Data & Tool Blending. Today: A Recipe for Delicious Data: Mashing Google and Excel Sheets

Mon, 07/24/2017 - 10:47 amartin

Today: A Recipe for Delicious Data: Mashing Google and Excel Sheets

A newer version of this blog post and workflow is available at https://www.knime.com/blog/GoogleSheet-meets-Excel-part2 using the new Google Sheets nodes available with KNIME Analytics Platform 3.5. These nodes make accessing the Google Sheets (private or public) a much easier task!

The Challenge

Don’t be confused! This is not one of the data chef battles, but a “Will they blend?” experiment - which, just by chance, happens to be on a restaurant theme again.

A local restaurant has been running its business relatively successfully for a few years now. It is a small business. An Excel Sheet was enough for the full accounting in 2016. To simplify collaboration, the restaurant owner decided to start using Google Sheets at the beginning of 2017. Now (2017 with Google Sheets) she faces the same task every month of calculating the monthly and YTD revenues and comparing them with the corresponding prior-year values (2016 with Microsoft Excel). 

The technical challenge at the center of this experiment is definitely not a trivial matter: mashing the data from the Excel and Google spreadsheets into something delicious… and digestible. Will they blend?

Topic. Monthly and YTD revenue figures for a small local business.

Challenge. Blend together Microsoft Excel and Google Sheets.

Access Mode. Excel Reader and REST Google API for private and public documents.

Will They Blend? Experiments in Data & Tool Blending. Today: OCR on Xerox Copies meets Semantic Web. Have Evolutionary Theories changed?

Mon, 07/03/2017 - 11:04 Dario Cannone
