The #KNIME Connection. Where Are You?

Mon, 10/22/2018 - 15:00 admin

Authors: Marten Pfannenschmidt and Paolo Tamagnini

There are two main analytics streams when it comes to social media: the topic and tone of the conversations and the network of connections. You can learn a lot about a user from their connection network!

Let’s take Twitter, for example. The number of followers is often taken as an index of popularity, and the number of retweets quantifies the popularity of a topic. The number of mutual retweets between two connections indicates the liveliness and strength of the connection. And there are many more such metrics.

@KNIME on Twitter counts more than 4500 followers (as of October 2018): the social niche of the KNIME real-life community. How many of them are expert KNIME users, how many are data scientists, how many are attentive followers of posted content?

Let’s check the 20 most active followers of @KNIME on Twitter and arrange them on a chord diagram (Fig. 1).

Are you one of them?

Figure 1: Chord diagram visualizing interactions from the top 20 Twitter users around #knime. Nodes are represented as arcs along the outer circle and connected to each other via chords. The total number of retweeted tweets defines the size of the circle portion (the node) assigned to the user. A chord (the connection area) shows how often a user’s tweets have been retweeted by a specific user, and is in the retweeter’s color.

A chord diagram is another graphical representation of a graph. The nodes are represented as arcs along the outer circle and are connected to each other via chords.

The chord diagram displayed above refers to tweets including #knime during the week following July 12, 2018. This was the week immediately after the KNIME 3.6 release. The number of retweeted tweets defines the size of the circle portion (the node). Each node/user has been assigned a random color. For example @KNIME is olive, @DMR_Rosaria is orange, and @KilianThiel is green.

Since this was the week after the release of KNIME Analytics Platform 3.6, it is not surprising that @KNIME occupies such a large space on the outer circle.

The number of retweets by another user defines the connection area (chord), which is then displayed in the color of the retweeter. @DMR_Rosaria is an avid retweeter. She retweeted the tweets by @KNIME and KNIME followers disproportionately more than everybody else, which makes orange the dominant color of this chart.

However, moving on from orange, we can see that the second most active retweeter of KNIME tweets that week was @KilianThiel.

We will update this chart regularly during the KNIME Fall Summit in Austin on Nov 6-9.

Help us change this chart from an orange monochromatic scale into a polychromatic picture by tweeting and retweeting tweets with #knimesummit2018.

How to Build a Chord Diagram in KNIME Analytics Platform

We’d now like to show how we built the chord diagram using KNIME Analytics Platform.

Data Access

We access the data by using the Twitter nodes included in the KNIME Twitter API extensions. If you want to learn more about how to access Twitter data with KNIME, have a look at the Twitter Data Collection workflow or one of the other Twitter workflows on the KNIME Workflow Hub.

We gathered the sample of data around the hashtag #knime during the week following the release of KNIME Analytics Platform 3.6 on July 12, 2018. Each record consists of the user name, the tweet itself, the posting date, the number of reactions and retweets, and, if applicable, who retweeted it.

Let’s build the network of retweeters. A network contains edges and nodes. The users are the nodes, and their relations, i.e. how often user A retweets user B, are the edges. Let’s build the edges first:

  1. We filter out all tweets with no retweets or that consist of auto-retweets only.
  2. We count the number of times each user has retweeted the tweets of another user.

To clean the data and compute the edges of the network, all you need are two Row Filter nodes and a GroupBy node.
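For readers who think in code rather than nodes, the two filters and the count can be sketched in a few lines of JavaScript. This is only an illustration of the logic, not what the KNIME nodes execute internally. The field names `user` and `retweetedBy` are hypothetical, and we assume that an "auto-retweet" means a user retweeting their own tweet.

```javascript
// Hypothetical record shape: { user: "author", retweetedBy: "retweeter" or null }.
// These field names are assumptions for illustration, not the Twitter nodes' column names.
function buildEdges(tweets) {
  const counts = new Map();
  for (const t of tweets) {
    // First Row Filter: drop tweets that nobody retweeted.
    if (!t.retweetedBy) continue;
    // Second Row Filter: drop auto-retweets (assumed: a user retweeting themselves).
    if (t.retweetedBy === t.user) continue;
    // GroupBy: count rows per (author, retweeter) pair.
    const key = JSON.stringify([t.user, t.retweetedBy]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Turn the grouped counts back into (username1, username2, count) rows.
  return Array.from(counts, ([key, count]) => {
    const [user, retweetedBy] = JSON.parse(key);
    return { user, retweetedBy, count };
  });
}

// Made-up sample data, for illustration only.
const sampleTweets = [
  { user: "userA", retweetedBy: "userB" },
  { user: "userA", retweetedBy: "userB" },
  { user: "userA", retweetedBy: null },    // never retweeted: filtered out
  { user: "userB", retweetedBy: "userB" }, // auto-retweet: filtered out
  { user: "userB", retweetedBy: "userA" },
];
const sampleEdges = buildEdges(sampleTweets);
console.log(sampleEdges);
```

Each resulting row is one weighted edge: the author, the retweeter, and the number of retweets between them.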

The Matrix of Nodes and Interactions

Now we want to build a weighted adjacency matrix of the network, with usernames as both column headers and row IDs, and the number of times one user retweeted the other in each data cell. We achieve that through the following steps (Fig. 2).

Figure 2: This metanode builds the matrix of interactions between Twitter usernames around #knime.

  1. We build a comprehensive list of all users (usernames), both tweeting and retweeting, and count the number of times their tweets have been retweeted overall. These numbers will fill the nodes of the network.
  2. We narrow our analysis down to the 20 most retweeted users. That is why we sort the users in descending order by the number of retweets on their tweets and keep only the top 20.
  3. Using a Cross Joiner node, we build the pairs of users. From the original set of 20 users we end up with 400 (20 × 20) user pairs.
  4. To these user pairs we add the previously computed edges by using a Joiner node.
  5. The Pivoting node then creates the matrix structure from the (username1, username2, count of retweets) data table.

You can read it like this: “The user named in the Row ID was retweeted n times by the user named in the column header.”
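The same five steps can be sketched in plain JavaScript. This is a simplified illustration, not the metanode's implementation: the edge fields `user`, `retweetedBy`, and `count` are hypothetical names, and the per-user totals are derived from the edges themselves rather than from the retweet counter of each tweet.

```javascript
// edges: rows of (retweeted user, retweeter, count); the field names are assumptions.
function buildMatrix(edges, topN) {
  // Step 1: total retweets received by each user; pure retweeters start at 0.
  const totals = new Map();
  for (const e of edges) {
    totals.set(e.user, (totals.get(e.user) || 0) + e.count);
    if (!totals.has(e.retweetedBy)) totals.set(e.retweetedBy, 0);
  }
  // Step 2: sort in descending order of retweets received and keep the top N users.
  const users = Array.from(totals.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([name]) => name);
  // Steps 3 to 5: one cell per user pair (cross join), filled with the edge
  // count where an edge exists (join) and laid out as rows by columns (pivot).
  const index = new Map(users.map((name, i) => [name, i]));
  const matrix = users.map(() => new Array(users.length).fill(0));
  for (const e of edges) {
    const row = index.get(e.user);        // row: the user who was retweeted
    const col = index.get(e.retweetedBy); // column: the user who retweeted
    if (row !== undefined && col !== undefined) matrix[row][col] += e.count;
  }
  return { users, matrix };
}

// Made-up edges, for illustration only.
const demoEdges = [
  { user: "userA", retweetedBy: "userB", count: 3 },
  { user: "userA", retweetedBy: "userC", count: 1 },
  { user: "userB", retweetedBy: "userA", count: 2 },
];
const demo = buildMatrix(demoEdges, 2);
console.log(demo.users);  // [ 'userA', 'userB' ]
console.log(demo.matrix); // [ [ 0, 3 ], [ 2, 0 ] ]
```

In this toy example, the first row says that userA was retweeted 3 times by userB, and the edge from userC is dropped because userC is not among the top 2 users.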

Drawing the Chord Plot

  • The matrix we created is the data input for a Generic JavaScript View node.
  • The Generic JavaScript View node draws the chord diagram.
  • To draw the chord diagram, we need the D3 library, which can be added to the code in the Generic JavaScript View node.
  • The JS code required to draw this chart is relatively simple, and its crucial parts are shown here.

 

// creating the chord layout given the entire matrix of connections.
var g = svg.append("g")
    .attr("transform", "translate(" + width / 2 + "," + height / 2 + ")")
    .datum(chord(matrix));

// creating groups, one for each twitter user.
// each group will have a donut chart segment, ticks and labels.
var group = g.append("g")
    .attr("class", "groups")
    .selectAll("g")
    .data(function(chords) { return chords.groups; })
    .enter().append("g")
    .on("mouseover", mouseover)
    .on("mouseout", mouseout)
    .on("click", click);

// creating the donut chart segments in the groups.
group.append("path")
    .style("fill", function(d) { return color(d.index); })
    .style("stroke", function(d) { return d3.rgb(color(d.index)).darker(); })
    .attr("d", arc)
    .attr("id", function(d) { return "group" + d.index; });

// creating the chords (also called ribbons) connections,
// one for each twitter users pair with at least 1 retweet.
g.append("g")
    .attr("class", "ribbons")
    .selectAll("path")
    .data(function(chords) { return chords; })
    .enter().append("path")
    .attr("d", ribbon)
    .style("fill", function(d) { return color(d.target.index); })
    .style("stroke", function(d) { return d3.rgb(color(d.target.index)).darker(); });
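The call to chord(matrix) in the snippet is where the arcs get their sizes: the layout gives each user a share of the circle proportional to the sum of that user's matrix row, i.e. the number of times their tweets were retweeted by the other users in the matrix. The plain JavaScript function below is a simplified sketch of that calculation, written for illustration. It is not D3's source code, and the name `groupArcs` is our own.

```javascript
// Simplified sketch of how a chord layout sizes the outer arcs.
// padAngle is the gap (in radians) left between two neighboring arcs.
function groupArcs(matrix, padAngle) {
  const n = matrix.length;
  // Row sum = total retweets received by the user in that row.
  const rowSums = matrix.map((row) => row.reduce((a, b) => a + b, 0));
  const total = rowSums.reduce((a, b) => a + b, 0);
  // Radians per retweet, after reserving the gaps between the arcs.
  const k = total > 0 ? Math.max(0, 2 * Math.PI - padAngle * n) / total : 0;
  let angle = 0;
  return rowSums.map((value, index) => {
    const startAngle = angle;
    angle += value * k;
    const endAngle = angle;
    angle += padAngle;
    return { index, value, startAngle, endAngle };
  });
}

// Toy 2 x 2 matrix: user 0 was retweeted 3 times, user 1 was retweeted 2 times.
const arcs = groupArcs([[0, 3], [2, 0]], 0);
for (const a of arcs) {
  const share = (a.endAngle - a.startAngle) / (2 * Math.PI);
  console.log("user " + a.index + " covers " + (share * 100).toFixed(0) + "% of the circle");
}
```

With no padding, the two users in the toy matrix cover 60% and 40% of the circle, which is why a heavily retweeted account such as @KNIME takes up so much of the outer ring.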

Conclusion

In today’s blog post we have shown an alternative to the more traditional network visualization techniques: the chord diagram. We built it by leveraging the flexibility of the Generic JavaScript View node. The JavaScript code required is an adaptation of an existing D3 template, and its crucial parts are displayed above.

During the week of our Fall Summit in Austin on November 6-9, 2018, we will be collecting your tweets around the #knimesummit2018 hashtag and regularly rebuilding this chord diagram.

We are looking forward to seeing your contribution to the retweet jam!

References:

The workflow is available on KNIME EXAMPLES Server under

50_Applications/19_TwitterAnalysis/03_Visualizing_Twitter_Network_with_a_Chord_Diagram