Querying Google Analytics in KNIME

Mon, 10/06/2014 - 00:00 winter

The KNIME Google API extension (available since version 2.10) lets KNIME connect to and interact with Google APIs. For now, it provides nodes to request and load data from Google Analytics.

 

Setup

To connect to the Google Analytics API you need a Google account that has access to Google Analytics. You also need to set up a service account and configure the Google API Connector and Google Analytics Connector nodes. The following steps walk you through this process:
1. Go to https://console.developers.google.com/project
2. Click 'Create Project'
3. Fill out the form
4. Go to 'APIS & AUTH' → 'APIs' and switch on the Analytics API
5. Go to 'APIS & AUTH' → 'Credentials'
6. Click 'Create new Client ID'
7. Select 'Service Account' and click 'Create Client ID'
8. Download a P12 key file by clicking 'Generate new P12 key'
9. Go to https://www.google.com/analytics/web/#management/Settings/, click 'User Management', and add the service account email as a user with the 'Read & Analyze' permission

 

Now everything is set to configure the Google API Connector node. This node connects to the general Google API using a service account email, a scope (e.g. https://www.googleapis.com/auth/analytics.readonly), and the P12 key file, all of which have to be specified in the node dialog.

 

Next, add the Google Analytics Connector node. This node connects to Google Analytics via the Google API connection configured in the previous step. In the node dialog, the Google Analytics account, web property, profile, and profile ID need to be specified. Once the node has executed successfully, a connection to the API is established.

Finally, the Google Analytics Query node can be used to specify a query and load the results from Google Analytics.

 

Google Analytics Query Options

The query as well as all query parameters can be specified in the dialog of the Google Analytics Query node.

 

Dimensions and Metrics

The dimensions and metrics can be selected in the drop-down menus at the top of the Settings tab and added to the query. The panel below the drop-down menus shows information about the selected dimension and metric.

Dimensions are categories such as full referrer, session count, keywords, country, operating system, and many more. Metrics are aggregations of data over the selected dimensions, such as page views, users, new users, bounces, or AdSense revenue.

For instance, if operating system and browser are specified as dimensions and users as the metric, each value represents the sum of users for the given combination of operating system and browser. With country as the dimension and page load time as the metric, the resulting value is the average page load time for users from that country.
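To make the relationship between dimensions and metrics concrete, here is a minimal Python sketch of the first case. It is not how the KNIME node or Google Analytics works internally; the record layout, the values, and the helper name `sum_metric` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical raw records; the numbers are invented for this illustration.
records = [
    {"os": "Windows", "browser": "Chrome", "users": 3},
    {"os": "Windows", "browser": "Chrome", "users": 2},
    {"os": "Windows", "browser": "Firefox", "users": 4},
    {"os": "Linux", "browser": "Firefox", "users": 1},
]


def sum_metric(rows, dimensions, metric):
    """Sum `metric` for every distinct combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[metric]
    return dict(totals)


users_by_os_and_browser = sum_metric(records, ["os", "browser"], "users")
```

Each key of `users_by_os_and_browser` is one combination of operating system and browser, and each value is the summed metric for that combination, which mirrors one row of the query result.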

Dimensions and metrics can be reordered using the arrow up and down buttons. New dimensions and metrics can be added either with the selector or by clicking the add button (+) and typing the name. The remove button (X) removes the selected dimension or metric from the list.

Segment

Segments filter the data before the metrics are calculated. The drop-down menu offers predefined segments, e.g. mobile devices. Alternatively, you can specify your own segment by setting up a filter based on dimensions. The syntax is the same as for regular filters.

Filters

Filters restrict the data after the metrics have been calculated. Possible operations include less than, greater than, regex matches, and many more. Filters can also be combined with a logical AND or OR. For a full list of available operations and details about the syntax, please see the node description or the Google Analytics developer documentation.
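As a rough illustration of how combined filters might be written, the following Python sketch assembles filter strings. It assumes the Google Analytics Core Reporting API convention of a comma for OR and a semicolon for AND, with OR binding more tightly than AND; the helper names `any_of` and `all_of` are hypothetical, and the node description remains the authoritative source for the syntax.

```python
def any_of(*conditions):
    """Join conditions with a logical OR (assumed separator: comma)."""
    return ",".join(conditions)


def all_of(*conditions):
    """Join conditions with a logical AND (assumed separator: semicolon)."""
    return ";".join(conditions)


# At least 100 page views AND (country is Germany OR country is Switzerland).
combined_filter = all_of(
    "ga:pageviews>=100",
    any_of("ga:country==Germany", "ga:country==Switzerland"),
)
```

The resulting string could then be pasted into the filter field of the query node.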

Sort

Sorts the results by the given dimension or metric. The sort order can be changed to descending by prepending a dash.

Start date and end date

Specifies the time frame for the returned data. Both start and end date are inclusive.

Start index

The API limits a single query to a maximum of 10000 rows. To retrieve more rows, the start index parameter can be used as a pagination mechanism.

Max results

The number of rows that should be returned. The maximum is 10000.

For details about the parameters and settings see the node description or the Google Analytics developer documentation.
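The interplay of start index and max results can be sketched as a simple paging loop. This is only an illustration in Python, not part of the KNIME nodes: `run_query` stands for any hypothetical function that executes one query with a given start index and max results, and the start index is assumed to be 1-based, as in the Google Analytics API.

```python
def fetch_all(run_query, page_size=10000):
    """Collect all rows by requesting one page after another.

    `run_query(start_index, max_results)` is a hypothetical callable that
    returns the rows of a single query as a list.
    """
    rows = []
    start_index = 1  # assumed to be 1-based
    while True:
        page = run_query(start_index, page_size)
        rows.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            break
        start_index += page_size
    return rows


def make_fake_query(data):
    """Build a stand-in for a real query that serves slices of `data`."""
    def run_query(start_index, max_results):
        offset = start_index - 1
        return data[offset:offset + max_results]
    return run_query


all_rows = fetch_all(make_fake_query(list(range(25))), page_size=10)
```

In a KNIME workflow the same effect would be achieved by running the query node several times with an increasing start index, for instance inside a loop.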

Example 1

 

This configuration returns the top 100 referrals that brought new users to your website. The dimensions source and referralPath contain the source address and the path of the page from which the new users came to your website. The metric new users gives the number of new users for every referring page. By sorting by new users in descending order and limiting the max results to 100, we get only the 100 most relevant referrals.
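Written out as plain query parameters, the settings described above might look like the following Python sketch. The `ga:` prefixed names and the parameter keys follow the Google Analytics API naming as far as we know it; the helper `build_query` is hypothetical and only assembles a dictionary, it does not contact Google Analytics.

```python
def build_query(dimensions, metrics, sort=None, max_results=None):
    """Assemble query parameters; lists become comma-separated strings."""
    params = {
        "dimensions": ",".join(dimensions),
        "metrics": ",".join(metrics),
    }
    if sort is not None:
        params["sort"] = sort
    if max_results is not None:
        params["max-results"] = max_results
    return params


top_referrals = build_query(
    dimensions=["ga:source", "ga:referralPath"],
    metrics=["ga:newUsers"],
    sort="-ga:newUsers",  # the leading dash requests descending order
    max_results=100,
)
```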

Example 2

 

This configuration returns the most relevant forum topics of the last month. The dimension pagePath is used to keep only forum topics, and pageTitle contains the name of the forum topic. The metric counts the number of views of each page. The specified filters remove all pages with fewer than 100 views and all pages that are not a topic in the forum (note: this path is very site specific). The result is sorted by page views in descending order to get the most relevant pages first. The specified start and end dates keep only the page views of the last month.
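The following Python sketch spells out what such a configuration could look like as query parameters. Several things here are assumptions: "last month" is read as the previous calendar month, the forum path pattern `^/forum/` is a placeholder for the site-specific path, and the filter syntax (semicolon for AND, `=~` for a regex match) follows the Google Analytics API convention as far as we know it.

```python
from datetime import date, timedelta


def previous_month(today):
    """Return the first and last day of the calendar month before `today`."""
    first_of_this_month = today.replace(day=1)
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous


def forum_topics_query(today, forum_path_regex):
    """Assemble the parameters; `forum_path_regex` is site specific."""
    start, end = previous_month(today)
    return {
        "dimensions": "ga:pagePath,ga:pageTitle",
        "metrics": "ga:pageviews",
        # Keep pages with at least 100 views AND a path matching the forum.
        "filters": "ga:pageviews>=100;ga:pagePath=~" + forum_path_regex,
        "sort": "-ga:pageviews",
        "start-date": start.isoformat(),
        "end-date": end.isoformat(),
    }


# Hypothetical forum path pattern; replace it with the one for your site.
query = forum_topics_query(date(2014, 10, 6), "^/forum/")
```

Because both start and end date are inclusive, the range covers the first through the last day of the previous month.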

A KNIME workflow with the two examples is available for download in the attachment section of the blog.
