Will They Blend? Experiments in Data & Tool Blending. Today: A Recipe for Delicious Data – Part 2: The new Google Sheets Nodes

Mon, 01/08/2018 - 12:32 admin

In this blog series we’ll be experimenting with the most interesting blends of data and tools. Whether it’s mixing traditional sources with modern data lakes, open-source devops on the cloud with protected internal legacy tools, SQL with noSQL, web-wisdom-of-the-crowd with in-house handwritten notes, or IoT sensor data with idle chatting, we’re curious to find out: will they blend? Want to find out what happens when IBM Watson meets Google News, Hadoop Hive meets Excel, R meets Python, or MS Word meets MongoDB?

Follow us here and send us your ideas for the next data blending challenge you’d like to see at willtheyblend@knime.com.

Today: A Recipe for Delicious Data – Part 2: The new Google Sheets Nodes

Authors: Rene Damyon and Oleg Yasnev

Post Update!

This is the updated version of the original blog post “A Recipe for Delicious Data: Mashing Google and Excel Sheets”, using the new Google Sheets nodes available in KNIME Analytics Platform 3.5.

The Challenge

Remember this blog post from July 2017?

A local restaurant has been keeping track of its business on Excel in 2016 and moved to Google Sheets in 2017. The challenge was then to include data from both sources to compare business trends in 2016 and in 2017, both as monthly total and Year To Date (YTD) revenues.

The technical challenge of this experiment was then of the “Will they blend?” type: mashing the data from the Excel and Google spreadsheets into something delicious… and digestible. The data blending was indeed possible and easy for public Google Sheets. However, it became more cumbersome for private Google Sheets, by requiring a few external steps for user authentication.

From the experience of such a blog post, a few Google Sheets dedicated nodes have been built and released with the new KNIME Analytics 3.5. A number of new nodes indeed are now available to connect, read, write, update, and append cells, rows, and columns into a private or public Google Sheet.

The technical challenge then has become easier: accessing Google Sheets with these new dedicated nodes and mashing the data with data from an Excel Sheet. Will they blend?

Topic. Monthly and YTD revenue figures for a small local business.

Challenge. Retrieve data from Google Sheets using the new Google Sheets nodes available in KNIME Analytics Platform 3.5.

Access Mode. Excel Reader node and Google Sheets Reader node for private and public documents.

Before diving into the technicalities, let’s spend a few words on the data. Both the Excel file and the Google spreadsheet contain the restaurant bill transactions with:

  • date and time (DATE_TIME),
  • table number (TABLE)
  • bill amount (SUM)

The Experiment

The Excel Spreadsheet can be read easily with an Excel Reader node (see blog post: “Will they blend? Hadoop Hive meets Excel”).

Accessing a Google Sheet Document

From a web browser:

  • Create a Google account at www.google.com (if you do not have one already). This account will be used for user authentication to access the Google Sheets document.
  • Open the Google Sheets document in a browser while logged into that Google account. This will allow the Google Sheets Reader node to list it alongside other documents when configuring the node.

Now in the workflow:

  • First, we need to authenticate a Google account using the Google Sheets Interactive Service Provider node.
  • Then we need to select and read the document with the Google Sheets Reader node. Here we can also, specify the range of data cells to be read. At the output port the content of the specified data cells is provided.

We are using just two Google Sheets nodes: Google Sheets Interactive Service Provider - to connect to Google’s services and store our authorization credentials - and Google Sheets Reader - to select the document and return the selected data.

Note. We use the same step sequence as well as the same nodes to access a public or a private Google Sheet document. There is no need any more to differentiate the access procedure!

The final workflow, to access data from public and private Google Sheets, is shown in figure 1 and can be downloaded from the KNIME EXAMPLES server from 01_Data_Access/01_Common_Type_Files /08_Google_Sheet-meets-Excel01_Data_Access/01_Common_Type_Files /08_Google_Sheet-meets-Excel*

Figure 1. This workflow blends together data from an Excel spreadsheet and data from a public or private Google Sheet document. This workflow is available on the KNIME EXAMPLES server under 01_Data_Access/01_Common_Type_Files /08_Google_Sheet-meets-Excel01_Data_Access/01_Common_Type_Files /08_Google_Sheet-meets-Excel*.

Authentication of Google User

The node to authenticate the Google user is the Google Sheets Interactive Service Provider node. This node connects to Google services and authenticates the user via the Google authentication page.

If you open the node configuration window, just click the button named (Re-)Authentication (Fig. 2). This takes you to the Google authentication page on your web browser. If you insert here your credentials, the node can then try to establish a connection with Google services.

If connection was successful, the (Re-)Authentication button will turn some shade of green: full green on Mac, green contour on Windows.

These authentication credentials can be saved as part of the workflow (option “Default”) or they can be persisted to a separate file to share with other workflows (option “Custom”).

Note. When exporting your workflow or when in need to make the workflow forget the authentication credentials, just click the button named Forget Default Credentials.

Figure 2. The configuration dialog for the Google Sheets Interactive Service Provider node

Selecting and Reading the Google Sheet Document

Connecting a Google Sheets Reader node with our Google Sheets Interactive Service Provider node allows to select the document and, if needed, also a particular sheet or a range of data cells.

Configuration is as simple as clicking the Select button to display a list of available documents and choosing the one we want. The “Open in Browser” option in the configuration window of the Google Sheets Reader node opens the document on a web browser for a quick inspection.

Remember! If a document doesn’t appear in this list, make sure that you have permissions to access it and that you’ve opened it once within a browser to associate it with your Google account.

For this experiment, we accessed and read the following Google Sheet document Google Sheets sample data: Google Sheets Public Data 2017.

We know that our data has header information in the first row of the document. Selecting “Has Column Header” will set these values as the names of the columns in the node output.

Figure 3. The configuration dialog for the Google Sheets Reader node

The rest of the workflow calculates the revenues by month as total sum and as Year To Date (YTD) cumulative sum. While the monthly total sum is calculated with a GroupBy node, the YTD cumulative sum is calculated with a Moving Aggregation node.

The Results

The two bar charts below show the restaurant revenues in euros as a total monthly sum and as a Year To Date (YTD) monthly cumulative sum respectively, for both groups of transactions in 2016 (light orange, from the Excel file) and in 2017 (darker orange, from the Google Sheet document).

We are happy to see that the small restaurant we used as an example has increased its business sales in 2017 with respect to the same months in 2016.

We are also happy to see that new Google Sheets nodes available now in KNIME Analytics Platform 3.5!

In this experiment we retrieved and blended data from an Excel spreadsheet and a Google Sheet document. Our Excel spreadsheets and Google Sheets documents blended easily, to produce a delicious dish for our restaurant business.

Figure 4. Total monthly revenues for our restaurant in year 2016 (light orange on the left) and in year 2017 (darker orange on the right). Business in 2017 seems to be better than in 2016.

Figure 5. Cumulative YTD revenues for our restaurant in year 2016 (light orange on the left) and in year 2017 (darker orange on the right). Also here, business in 2017 seems to be better than in 2016.

So, if you are asked by your friend running a local business whether you can blend data from Excel spreadsheets and Google Sheet documents, the answer is: Yes, they blend … and please use the new Google Sheets nodes!

Coming Next

If you enjoyed this, please share it generously and let us know your ideas for future blends.


* The link will open the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform must be installed with the Installer version 3.2.0 or higher)