In this blog series we’ll be experimenting with the most interesting blends of data and tools. Whether it’s mixing traditional sources with modern data lakes, open-source devops on the cloud with protected internal legacy tools, SQL with noSQL, web-wisdom-of-the-crowd with in-house handwritten notes, or IoT sensor data with idle chatting, we’re curious to find out: will they blend? Want to find out what happens when IBM Watson meets Google News, Hadoop Hive meets Excel, R meets Python, or MS Word meets MongoDB?
Today: IBM Watson meets Google API
The Challenge
It’s said that you should always get your news from several different sources, then compare and contrast them to form your own independent opinion. At the time of writing we are only days away from the US election, and all the news we could find is about the election race between Hillary Clinton and Donald Trump. Our question then is: What has Obama been doing?
So what about blending IBM Watson’s News Service with Google News to find out?
My free subscription for IBM Watson is running out in a few days, so I’d better hurry up and try out how I can use it inside KNIME Analytics Platform. Google News, on the other hand, is a free but limited service.
Let’s see what happens when we blend IBM Watson News and Google News within KNIME Analytics Platform. Shall we?
Topic. Barack Obama in the news.
Challenge. Extract and combine news headlines from Google News and IBM Watson News.
Access Mode. REST service for both Google API and IBM Watson.
The Experiment
Querying IBM Watson News Service
- Create an account with IBM Watson Console at https://console.ng.bluemix.net (free for 30 days) and enable it for the Alchemy category, since the News service is part of the Alchemy category.
- Identify the API Key that was produced with your registration.
- Define the template REST request for IBM Watson. In the case of the News service, it would look something like this: https://access.alchemyapi.com/calls/data/GetNews?apikey=<API_Key>&return=enriched.url.title&start=1477008000&end=now&q.enriched.url.cleanedTitle=InsertTitle&count=100&outputMode=xml
Where:
- access.alchemyapi.com/calls/data/GetNews is the REST service
- <API_Key> is the API key you got at registration
- return specifies which part of the news we get back, in this case just the title
- start and end are the start and end of the time window for the news search, given in seconds since the Unix epoch (UTC); end also accepts the keyword now
- q.enriched.url.cleanedTitle is the topic string (InsertTitle is just a placeholder. It would be Barack+Obama in our case)
- count is the maximum number of returned results
- outputMode is the response format: XML or JSON
Notice that <API_Key> and InsertTitle are just placeholders for the real values. <API_Key> would be the API Key you got at registration and InsertTitle would be the search topic, in our case Barack+Obama.
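The start value in the template is a plain count of seconds, which is hard to read at a glance. As a small sketch (the helper name to_epoch_seconds is ours, not part of any Watson API), this is how such a value can be derived from a calendar date in Python:

```python
from datetime import datetime, timezone


def to_epoch_seconds(year, month, day):
    """Return midnight UTC of the given date as whole seconds since 1970-01-01."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# 21 October 2016 is the start of the search window used in the template request.
start = to_epoch_seconds(2016, 10, 21)
print(start)  # 1477008000
```

Passing the time zone explicitly matters here: a naive datetime would be interpreted in the local time zone of the machine and could shift the window by several hours.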
- We used Quickform String Input nodes to define two flow variables: the search topic and the API Key.
- The template REST request was written into a Table Creator node and passed to a String Manipulation node to override the search topic and API Key with the corresponding flow variable values
- The request is then sent to IBM Watson News through the GET Request node
- The response is received back in XML format as selected in the query and parsed with an XPath node
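For readers who want to see the same steps outside of KNIME, here is a rough Python sketch of what the String Manipulation and XPath nodes do: fill the placeholders in the template, then pull the titles out of the XML response. The function names, the dummy API key and the sample XML are illustrative assumptions. In particular, the sample only mimics a response with nested title elements and is not a recorded IBM Watson answer, and no request is actually sent.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

# Template request as given in the post, with its two placeholders.
WATSON_TEMPLATE = (
    "https://access.alchemyapi.com/calls/data/GetNews?apikey=<API_Key>"
    "&return=enriched.url.title&start=1477008000&end=now"
    "&q.enriched.url.cleanedTitle=InsertTitle&count=100&outputMode=xml"
)


def build_watson_url(topic, api_key):
    """Replace the placeholders; quote_plus turns 'Barack Obama' into 'Barack+Obama'."""
    return (WATSON_TEMPLATE
            .replace("<API_Key>", api_key)
            .replace("InsertTitle", quote_plus(topic)))


def extract_titles(xml_text):
    """Return the text of every <title> element, wherever it is nested."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("title") if el.text]


# Hypothetical response body, made up for this example.
SAMPLE_XML = """<results><result><docs>
  <element><source><enriched><url><title>Sample headline one</title></url></enriched></source></element>
  <element><source><enriched><url><title>Sample headline two</title></url></enriched></source></element>
</docs></result></results>"""

watson_url = build_watson_url("Barack Obama", "DUMMY_KEY")
watson_titles = extract_titles(SAMPLE_XML)
print(watson_url)
print(watson_titles)
```

Searching for title elements at any depth keeps the sketch independent of the exact nesting of the real response, which we do not reproduce here.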
Now we’ll send a similar request to the Google News API service.
- Create an account with Google Console at https://console.developers.google.com and enable it for the Custom Search category. There used to be a Google News API, but it no longer seems to be available. So we will query the Custom Search service for a specific topic on the Google News page.
- Identify the API Key that was produced with your registration.
- Create a custom search engine at https://cse.google.com to be used in the Custom Search query and note its engine ID, that is, its cx parameter.
- Define the template REST request for Google News. In the case of the custom search service on the Google News page, it would look something like this:
https://www.googleapis.com/customsearch/v1?q=InsertTitle&cref=News+google&cx=<Your-cx-id.number>&key={YOUR_API_KEY}
Where:
- googleapis.com/customsearch/v1 is the REST service
- key is the API key you got at registration
- q is the topic string (InsertTitle is just a placeholder. It would be Barack+Obama in our case)
- cref is the name of the custom search engine we created
- cx is the code for the custom search engine we created
Notice that {YOUR_API_KEY} and InsertTitle are just placeholders for the real values. {YOUR_API_KEY} would be the API Key value you got and InsertTitle would be the search topic, in our case Barack+Obama.
Notice also that you need to insert your own cx value in the template REST request.
- We used Quickform String Input nodes to define two flow variables: the search topic and the API Key.
- The template REST request was written into a Table Creator node and passed to a String Manipulation node to override the topic and the API Key with the corresponding flow variable values
- The request is then sent to the Google API service through the GET Request node
- The response is received back in JSON format and parsed with a JSON Path node followed by an Ungroup node.
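The Google side can be sketched in Python along the same lines. The code below fills the three placeholders of the template and then mimics the JSON Path plus Ungroup step by collecting one title per entry of the items array, which is where the Custom Search JSON API returns its results. The function names, dummy credentials and sample payload are assumptions made up for illustration, and no request is sent.

```python
import json
from urllib.parse import quote_plus

# Template request as given in the post, with its three placeholders.
GOOGLE_TEMPLATE = (
    "https://www.googleapis.com/customsearch/v1?q=InsertTitle"
    "&cref=News+google&cx=<Your-cx-id.number>&key={YOUR_API_KEY}"
)


def build_google_url(topic, cx, api_key):
    """Fill in the search topic, the search engine ID and the API key."""
    return (GOOGLE_TEMPLATE
            .replace("InsertTitle", quote_plus(topic))
            .replace("<Your-cx-id.number>", cx)
            .replace("{YOUR_API_KEY}", api_key))


def extract_google_titles(json_text):
    """Return one title per entry of the "items" array (empty list if absent)."""
    payload = json.loads(json_text)
    return [item["title"] for item in payload.get("items", []) if "title" in item]


# Hypothetical, heavily shortened response body, made up for this example.
SAMPLE_JSON = """{"items": [
  {"title": "Sample headline A", "link": "https://example.com/a"},
  {"title": "Sample headline B", "link": "https://example.com/b"}
]}"""

google_url = build_google_url("Barack Obama", "DUMMY_CX", "DUMMY_KEY")
google_titles = extract_google_titles(SAMPLE_JSON)
print(google_url)
print(google_titles)
```

Using payload.get("items", []) means a search without hits yields an empty list rather than an error.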
The results from both queries are then labeled with their source value (Google News or IBM Watson News) and concatenated together to form a new data table.
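In code, the labeling and concatenation step might look roughly like the sketch below. The headline strings are invented placeholders, and the column names title and source are our own choice rather than the ones used in the workflow.

```python
def label(titles, source):
    """Turn a list of headlines into rows tagged with the service they came from."""
    return [{"title": title, "source": source} for title in titles]


# Placeholder headlines standing in for the two parsed responses.
watson_titles = ["Sample headline one", "Sample headline two"]
google_titles = ["Sample headline A"]

# Concatenate the two labeled tables into one.
combined = label(watson_titles, "IBM Watson News") + label(google_titles, "Google News")

for row in combined:
    print(row["source"], "|", row["title"])
```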
Hint: You could also encapsulate and encrypt the String Input nodes containing the API keys. In this way, the end user would be able to run the workflow and query the two REST services, but could not use the API keys for anything else. Since most of you will not have such a KNIME tool, the workflow has been uploaded without encryption and without API keys to the EXAMPLES server under 01_Data_Access/05_REST_Web_Services. The API keys will be your responsibility.
Note: In this workflow we also blend data in XML format with data in JSON format. But this will be the topic for another “Will they blend?” challenge.
The workflow is available, without the API keys, on the KNIME EXAMPLES server under 01_Data_Access/05_REST_Web_Services/02_IBM_Watson_News-Google_News*.
The Results
Yes, they blend!
All in all, we got back 100 headlines from IBM Watson News as the maximum number we had specified, and 10 headlines from Google News - the maximum number the Google API free service allows us to retrieve.
From those headlines we can see that President Barack Obama was very active around Halloween, that he can apparently predict the outcome of the NBA tournament, and that he is of course actively campaigning with celebrities and politicians. Some of the articles are actually about his wife, Michelle. A somewhat surprising article is the one about Obama’s dietary habits: “How to eat like Barack Obama”. Given that he weathered his years as president remarkably well, maybe it’s worth a look?
If we check the intersection of headlines from IBM Watson News and Google News with a Reference Row Filter node, we find that none of the 10 headlines retrieved from Google News was also present in the results from IBM Watson News. It seems that diversifying news sources in the digital era still makes sense.
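The intersection check can be approximated in a few lines of Python. This is only a sketch under the assumption that headlines are compared as exact strings; the function name and the sample headlines are made up, and the real workflow may be configured differently.

```python
def reference_row_filter(rows, reference_rows):
    """Keep the rows that also occur in the reference table (exact string match)."""
    reference = set(reference_rows)
    return [row for row in rows if row in reference]


# Placeholder headlines; in the experiment these were 10 and 100 real titles.
google_titles = ["Sample headline A", "Sample headline B"]
watson_titles = ["Sample headline one", "Sample headline two"]

shared = reference_row_filter(google_titles, watson_titles)
print(len(shared))  # 0
```

With exact matching, two services reporting the same story under slightly different wording still count as different headlines, so an empty intersection says more about the titles than about the underlying stories.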
But the most important conclusion is: Yes, they blend!
Coming Next …
If you enjoyed this, please share it generously and let us know your ideas for future blends.
We’re looking forward to the next challenge. There we will find out if we can blend Open Street Map with Google Geocode API passing through a CSV text file.
* The link will open the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform must be installed with the Installer version 3.2.0 or higher)