17 Oct 2019 admin

Authors: Paolo Tamagnini and Rosaria Silipo

The ugly truth behind all that data

We are in the age of data. In recent years, many companies have started collecting large amounts of data about their business, while many others are only starting now. If you work at one of these companies, you might be wondering what can be done with all that data.

What about using the data to train a supervised machine learning (ML) algorithm? The ML algorithm could perform the same classification task a human would, only much faster! It could reduce costs and inefficiencies. It could work on your blended data, like images, text documents, and plain numbers. It could do all those things and even get you that edge over the competition.

Read more


14 Oct 2019 admin

Authors: Scott Fincher, Paolo Tamagnini, Maarit Widmann

Guided Visualization and Exploration

Whether we are experienced data scientists or business analysts, one of our daily routines is extracting the relevant information from our data easily and smoothly, regardless of the kind of analysis we are facing.

A good practice for this is to use data visualizations: charts and graphs to visually summarize the complexity in the data. The required expertise for data visualization can be divided into two main areas:

  • The ability to correctly prepare and select a subset of the dataset columns and visualize them in the right chart
  • The ability to interpret the visual results and take the right business decisions based on what is displayed

Read more


07 Oct 2019 Maarit

Authors: Alfredo Roccato (Data Science Trainer and Consultant) and Maarit Widmann (KNIME)

Wheeling like a hamster in the data science cycle? Don’t know when to stop training your model?

Model evaluation is an important part of a data science project and it’s exactly this part that quantifies how good your model is, how much it has improved from the previous version, how much better it is than your colleague’s model, and how much room for improvement there still is.

In this series of blog posts, we review different scoring metrics: for classification, numeric prediction, unbalanced datasets, and other more or less challenging model evaluation problems.

Read more


30 Sep 2019 admin

Author: Angus Veitch

KNIME: a gateway to computational social science and digital humanities

I discovered KNIME by chance when I started my PhD in 2014. This discovery changed the course of my PhD and my career. Well, who knows: perhaps I would have eventually learned how to do things like text processing, topic modelling and named entity extraction in R or Python. But with no previous programming experience, I did not feel ready to take the plunge into those platforms. KNIME gave me the opportunity to learn a new skill set while still having time to think and write about what the results actually meant in the context of media studies and social science, which was the subject of my PhD research.

KNIME is still my go-to tool for data analysis of all kinds, textual and otherwise. I use it not only to analyse contemporary text data from news and social media, but to analyse historical texts as well. In fact, I think the accessibility of KNIME makes it the perfect tool for scholars in the field known as the digital humanities, where computational methods are being applied to the study of history, literature and art.

Read more


23 Sep 2019 admin

The task: train and evaluate a simple time series model using a random forest of regression trees and the NYC Yellow taxi dataset

Authors: Andisa Dewi and Rosaria Silipo

I think we all agree that knowing what lies ahead in the future makes life much easier. This is true for life events as well as for prices of washing machines and refrigerators, or the demand for electrical energy in an entire city. Knowing how many bottles of olive oil customers will want tomorrow or next week allows for better restocking plans in the retail store. Knowing the likely increase in the price of gas or diesel allows a trucking company to better plan its finances. There are countless examples where this kind of knowledge can be of help.

Read more


16 Sep 2019 Corey

Author: Corey Weisinger

You’ve always been able to fine-tune and modify your networks in KNIME Analytics Platform by using the Deep Learning Python nodes, such as the DL Python Network Editor or DL Python Learner. With recent updates to KNIME Analytics Platform and the KNIME Deep Learning Keras Integration, however, more tools are available to do this without leaving the familiar KNIME GUI.

Read more


09 Sep 2019 longoka

Author: Kenneth Longo

The cheminformatics world is replete with software tools and file formats for the design, manipulation and management of small molecules and libraries thereof. Those tools and formats are often specialized for analyzing small molecules of ~500 daltons, give or take a few, or those molecules that can reasonably be drawn and understood using classic ball-and-stick or molecular coordinate frameworks. Perhaps not coincidentally, this neatly envelops the needs of small molecule drug discovery, where it is not uncommon to find both public and privately held repositories of hundreds of thousands (to millions) of such molecules, for use in molecular or phenotypic screening assays. The small size and elemental simplicity of these molecules have resulted in a variety of storage file formats (e.g., mol, SMILES, sdf) and many supporting software packages (e.g., RDKit, CDK, ChemAxon) for visualization and manipulation. KNIME Analytics Platform provides easy access to those file formats and software packages.

Read more


02 Sep 2019 admin

Recently on social media we asked you for tips on tidying up and improving workflows. Our aim was to find out how you declutter to make your workflows not just superficially neater, but faster, more efficient, and smaller: ultimately an elegant masterpiece! Check out the original posts on LinkedIn and Twitter.

Declutter - Four Tips for an Efficient, Fast Workflow
Fig. 1 From confusion to clarity - decluttering your workflow

Read more


29 Jul 2019 admin

Using Okta to Modernize LDAP

Author: James Weakley

We’d like to introduce James Weakley, a Data Architect at nib health funds, who recently wrote a short blog post on the topic of KNIME Server and Okta. James has given us permission to republish it here. But first a few words about James.

Read more


22 Jul 2019 berthold

The data science dilemma: Automation, APIs, or custom data science?

As companies place an increasing premium on data science, there is some debate about which approach is best to adopt — and there is no straight up, one-size-fits-all answer. It really depends on your organization’s needs and what you hope to accomplish.

There are three main approaches that have been discussed over the past couple of years; it’s worth taking a look at the merits and limitations of each as well as the human element involved. After all, the capabilities of your team and the people you’re attempting to serve with data science heavily influence how you implement it.

Read more

