Do you remember the Iron Chef battles?
It was a televised series of cook-offs in which famous chefs rolled up their sleeves to compete in making the perfect dish. Based on a set theme, this involved using all their experience, creativity, and imagination to transform sometimes questionable ingredients into the ultimate meal.
Hey, isn’t that just like data transformation? Or data blending, or data manipulation, or ETL, or whatever new name is trending now? In this new blog series requested by popular vote, we will ask two data chefs to use all their knowledge and creativity to compete in extracting a given data set's most useful “flavors” via reductions, aggregations, measures, KPIs, and coordinate transformations. Delicious!
Want to find out how to prepare the ingredients for a delicious data dish by aggregating financial transactions, filtering out uninformative features or extracting the essence of the customer journey? Follow us here and send us your own ideas for the “Data Chef Battles” at firstname.lastname@example.org.
Ingredient Theme: A Social Forum. Sentiment vs. Influence
Authors: Rosaria Silipo & Kilian Thiel
Data Chefs: Haruto and Momoka
Ingredient Theme: A Social Forum
Today we have decided to go vintage and show the analysis implemented in the first KNIME whitepaper by Tobias Koetter, Kilian Thiel, and Phil Winters, where text processing met network analytics.
We propose the data from the year 1999 of the Slashdot News Forum. Slashdot (sometimes abbreviated as “/.”) is a social news website for science and technology, founded in 1997. Users can post news and stories about diverse topics and receive online comments from other users (cf. Wikipedia).
Some years ago, we started a debate on whether the loudest customers were as important as everybody, including the customers themselves, thought. We started looking for public data on customer interactions about a given product and stumbled upon the Slashdot data set. Users in the Slashdot data set are not strictly customers; they interact via a social forum about a given topic. If the topic were a product, they would be customers. So, assuming that talking about a product is a particular instance of talking about a generic topic, we decided to adopt the Slashdot data set for the analysis. We propose this same data set here again for today’s challenge.