Authors: Christian Dietz, Paolo Tamagnini, Simon Schmid, Michael Berthold
In recent months, a wealth of tools has appeared that claim to automate all or parts of the data science cycle. These tools often automate only a few phases of the cycle, tend to consider just a small subset of the available models, and are limited to relatively simple data formats.
At KNIME we take a different stance: automation should not result in black boxes that hide the interesting pieces from everyone; a modern data science environment should allow automation and interaction to be combined flexibly. If the data science team works on a well-defined type of analysis scenario, then more automation may make sense. More often than not, however, the interesting analysis scenarios are not that easy to control, and a certain amount of interaction with the users is actually highly desirable.
We have already described the principles of Guided Analytics and how KNIME workflows naturally support them (see the blog post "Principles of Guided Analytics"), and briefly discussed how this way of creating analytical applications lets automation and interaction be mixed and matched. Since then, we have put together a more comprehensive workflow that serves as a blueprint for anyone who wants to build their own Guided Analytics application, combining just the right amount of automation and interaction for a specific set of problems. The workflow provides reusable pieces for data transformation and cleaning, feature selection and engineering, and model optimization and selection; at the end, it even allows the user to download and inspect the resulting scoring workflow. The workflow is available on our new Workflow Hub, and the following video walks through the different steps and explains the underlying techniques.
For us, "Guided Analytics for Machine Learning Automation" is just a starting point. We will continue to provide more customized variants, and we ask our community to do the same: share them on our new Community Workflow Hub! We will take a look ourselves, and maybe you will get the chance to present your version of Guided Analytics for Data Science Automation at one of our Summits.
The workflow running behind this web-based application is available here on the KNIME Workflow Hub. It is also available on the EXAMPLES server under /50_Applications/36_Guided_Analytics_for_ML_Automation.