
# Choropleth with R and KNIME

August 19, 2014 — by Aaron Hart

Familiar from common applications such as demographic analysis, visualizing election results, and mapping outbreaks of disease, choropleths are an important technique for visualizing aggregated spatial data. Today I want to discuss some techniques for generating such graphics in KNIME. To get us started, let's look at a very simple workflow that builds such a visualisation.

## US State Map from R

Conceptually, choropleths are relatively simple: you just need a map and some data to color it with. For the purpose of this post, I'm pulling our map from ggplot2, a very nice graphics package available in the R programming language. The data is formatted as a series of lat/long points that are grouped together by region and ordered so that plotting them consecutively traces out each region's polygon. Using the new R Source node, we can write some very simple R code to bring this data into KNIME:

`library(ggplot2); knime.out <- map_data("state")`

This snippet assigns the data frame containing shape data for US states to a variable named `knime.out`. (`map_data` needs the maps package to be installed alongside ggplot2.) From here, the R Source node grabs the data and automatically converts it into a KNIME table.
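To make the layout of that table concrete, here is a minimal sketch in base R. It builds a toy table with the same column names that `map_data()` returns (`long`, `lat`, `group`, `order`, `region`); the two square "regions" and their coordinates are invented purely for illustration.

```r
# Toy stand-in for the shape table: two made-up square "regions".
# Column names mirror map_data() output; the coordinates are invented.
shapes <- data.frame(
  long   = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat    = c(0, 0, 1, 1, 0, 0, 1, 1),
  group  = rep(1:2, each = 4),
  order  = 1:8,
  region = rep(c("left square", "right square"), each = 4)
)

# Every row is one corner point; 'group' says which polygon it belongs to.
points_per_group <- table(shapes$group)

# Split by group and sort by 'order' to recover each polygon's outline.
outlines <- lapply(split(shapes, shapes$group), function(g) {
  g <- g[order(g$order), ]
  g[, c("long", "lat")]
})

print(points_per_group)
print(outlines[["1"]])
```

Each element of `outlines` is the ordered list of corner points for one polygon, which is what a plotting function connects when it draws the region.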

Once we have the shape data, all we need to do is find a way to join it with whatever information we want to overlay on our maps. In this example, we are looking at KNIME download statistics. The downloads data set was derived from 3,000 randomly selected registered downloads from users who self-identified as being located in the USA.
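The post does not spell out how the join is done, and it could just as well be handled with KNIME nodes. As a rough sketch of the idea in base R, the example below joins a toy shape table to per-state counts. The `State` and `Downloads` column names and all of the numbers are assumptions made up for illustration.

```r
# Toy shape table (same layout as map_data() output; coordinates invented).
shapes <- data.frame(
  long   = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat    = c(0, 0, 1, 1, 0, 0, 1, 1),
  group  = rep(1:2, each = 4),
  order  = 1:8,
  region = rep(c("california", "nevada"), each = 4)
)

# Hypothetical per-state counts; column names and values are assumptions.
downloads <- data.frame(
  State     = c("California", "Nevada"),
  Downloads = c(500, 20)
)

# map_data() uses lower-case region names, so normalise the join key first.
downloads$region <- tolower(downloads$State)

# Left join: keep every polygon point, even for states without a count.
joined <- merge(shapes, downloads[, c("region", "Downloads")],
                by = "region", all.x = TRUE)

# merge() may reorder rows; restore the drawing order of the polygon points.
joined <- joined[order(joined$order), ]

print(joined)
```

The re-sort at the end matters: if the points of a polygon are plotted out of order, the outline is drawn with stray lines across the map.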

Our initial code looks like:

```
library(ggplot2)

# Fill value for each polygon point: the joined download count
Downloads <- knime.in$"Downloads"

qplot(long,
      lat,
      data = knime.in,
      group = group,
      fill = Downloads,
      geom = "polygon"
)
```

This gives the following:

## Information Scaling

So KNIME is big in California! Unfortunately, little else about the data is easily discernible. The problem is one of scaling: there are so many downloads in California that no other trends are really visible. In this case, we can likely make the rest of the information in the plot more visible by taking the log of the number of downloads in the R code:

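```
library(ggplot2)

# The log compresses the range so that smaller states stay distinguishable
Downloads <- log(knime.in$"Downloads")

qplot(long,
      lat,
      data = knime.in,
      group = group,
      fill = Downloads,
      geom = "polygon"
)
```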

Which gives:

Further refinements to the plot to optimise its appearance are possible, but might fit better in a future post dedicated to plotting in ggplot2.
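One small refinement is worth sketching here, since it concerns the log scaling rather than ggplot2 itself. A state with zero downloads would produce `log(0)`, which is `-Inf` in R, so a variant such as `log1p()` (which computes `log(1 + x)`) may be more robust. The counts below are hypothetical and only serve to illustrate the effect.

```r
# Hypothetical download counts, including one state with zero downloads.
downloads <- c(california = 500, texas = 40, nevada = 20, wyoming = 0)

# Linear scale: everything is tiny relative to the largest value.
raw_share <- downloads / max(downloads)

# Plain log: the zero count becomes -Inf.
log_vals <- log(downloads)

# log1p(x) = log(1 + x): zero stays at zero and all values remain finite.
log1p_vals <- log1p(downloads)

print(round(raw_share, 2))
print(log_vals)
print(round(log1p_vals / max(log1p_vals), 2))
```

Whether `log1p()` or a plain `log()` is the better choice depends on the data; if every state has at least one download, the two give very similar pictures.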

The KNIME workflow used to create the images is available for download in the attachment section of the blog.

Hopefully this was useful. If you have any follow-up questions, please write us an email.