Tips and tricks using KNIME.

Anomaly Detection Techniques: Defining Normal

Thu, 12/12/2019 - 10:00 admin

Author: Rosaria Silipo (KNIME). As first published in DarkReading.

The challenge is to identify suspicious events in training sets where no anomalies are encountered. Part two of a two-part series.

The problem of anomaly detection is not new, and a number of solutions have already been proposed over the years. However, before starting with the list of techniques, let's agree on a necessary premise: All anomaly detection techniques must involve a training set where no anomaly examples are encountered. The challenge consists of identifying suspicious events, even in the absence of examples.
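The premise above, learning only from data that contains no anomalies, can be illustrated with a very small sketch. It is not the technique the article goes on to list; it assumes, purely for illustration, that "normal" can be summarized by the mean and standard deviation of a single numeric signal, and that anything further than three standard deviations from that mean is suspicious. The readings, function names, and threshold are all hypothetical.

```python
from statistics import mean, stdev

def fit_normal_profile(training_values):
    """Summarize anomaly-free training data by its mean and sample standard deviation."""
    return mean(training_values), stdev(training_values)

def is_suspicious(value, profile, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the training mean."""
    mu, sigma = profile
    if sigma == 0:
        # Degenerate case: every training value was identical.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical readings recorded while the system behaved normally (no anomalies).
normal_readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7]
profile = fit_normal_profile(normal_readings)

# New, unlabeled readings are scored only against the learned notion of "normal".
new_readings = [10.1, 9.9, 14.5]
flags = [is_suspicious(r, profile) for r in new_readings]
print(flags)
```

Note that the anomaly class never appears during training: the model only describes normal behavior, and an event is flagged because it deviates from that description.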

The Importance of Community in Data Science

Thu, 11/21/2019 - 10:00 paolotamag

Authors: Rosaria Silipo and Paolo Tamagnini (KNIME)

Nobody is an island. Even less so a data scientist. Assembling predictive analytics workflows benefits from help and reviews: on processes and algorithms by data science colleagues; on IT infrastructure to deploy, manage, and monitor the AI-based solutions by IT professionals; on dashboards and reporting features to communicate the final results by data visualization experts; as well as on automation features for workflow execution by system administrators. It really seems that a data scientist can benefit from a community of experts!

Data Anonymization in KNIME. A Redfield Privacy Extension Walkthrough

Mon, 11/18/2019 - 10:00 Redfield

Anonymization is a hot topic of discussion. We are generating and collecting huge amounts of data, more than ever before. A lot of this data is personal and needs to be handled sensitively. In recent times, we’ve also seen the introduction of the GDPR stipulating that only anonymized data may be used extensively and without privacy restrictions.

The 80/20 Challenge: From Classic to Innovative Data Science Projects

Thu, 11/14/2019 - 10:00 admin

Author: Rosaria Silipo (KNIME)

As first published in Dataversity

Sometimes when you talk to data scientists, you get this vibe as if you’re talking to priests of an ancient religion. Obscure formulas, complex algorithms, a slang for the initiated, and on top of that, some new required script. If you get these vibes for all projects, you are probably talking to the wrong data scientists.

From Modeling to Scoring: Correcting Predicted Class Probabilities in Imbalanced Datasets

Mon, 10/07/2019 - 10:00 Maarit

Authors: Alfredo Roccato (Data Science Trainer and Consultant) and Maarit Widmann (KNIME)

Wheeling like a hamster in the data science cycle? Don’t know when to stop training your model?

Model evaluation is an important part of a data science project and it’s exactly this part that quantifies how good your model is, how much it has improved from the previous version, how much better it is than your colleague’s model, and how much room for improvement there still is.

In this series of blog posts, we review different scoring metrics: for classification, numeric prediction, unbalanced datasets, and other similar more or less challenging model evaluation problems.

Time Series Analysis: A Simple Example with KNIME and Spark

Mon, 09/23/2019 - 10:00 admin

The task: train and evaluate a simple time series model using a random forest of regression trees and the NYC Yellow taxi dataset

Authors: Andisa Dewi and Rosaria Silipo

I think we all agree that knowing what lies ahead in the future makes life much easier. This is true for life events as well as for prices of washing machines and refrigerators, or the demand for electrical energy in an entire city. Knowing how many bottles of olive oil customers will want tomorrow or next week allows for better restocking plans in the retail store. Knowing the likely increase in the price of gas or diesel allows a trucking company to better plan its finances. There are countless examples where this kind of knowledge can be of help.
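As a rough sketch of how a series can be prepared for a regression model such as the random forest named in the task, the snippet below turns a sequence of values into rows of past values plus the next value to predict, and splits those rows in time order for training and evaluation. The use of lagged values as inputs, the number of lags, the 80/20 split, and the sample numbers are assumptions made for illustration, not details taken from the post.

```python
def make_lagged_rows(series, n_lags):
    """Build (past n_lags values, next value) pairs from a list of numbers."""
    rows = []
    for t in range(n_lags, len(series)):
        rows.append((series[t - n_lags:t], series[t]))
    return rows

def split_in_time(rows, train_fraction=0.8):
    """Keep time order: earlier rows for training, later rows for evaluation."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

# Hypothetical hourly trip counts; not real NYC taxi data.
hourly_trips = [120, 95, 80, 110, 150, 170, 160, 140, 130, 125]

rows = make_lagged_rows(hourly_trips, n_lags=3)
train_rows, test_rows = split_in_time(rows)

print(rows[0])
print(len(train_rows), len(test_rows))
```

Splitting by position rather than at random keeps future values out of the training set, which matters when the model is later judged on how well it forecasts unseen time steps.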

Transfer Learning Made Easy with Deep Learning Keras Integration

Mon, 09/16/2019 - 10:00 Corey

Author: Corey Weisinger

You’ve always been able to fine-tune and modify your networks in KNIME Analytics Platform by using the Deep Learning Python nodes such as the DL Python Network Editor or DL Python Learner, but with recent updates to KNIME Analytics Platform and the KNIME Deep Learning Keras Integration there are more tools available to do this without leaving the familiar KNIME GUI.

Accessing the HELM Monomer Library with KNIME

Mon, 09/09/2019 - 10:00 longoka

Author: Kenneth Longo

The cheminformatics world is replete with software tools and file formats for the design, manipulation and management of small molecules and libraries thereof. Those tools and formats are often specialized in analyzing small molecules of ~500 daltons, give or take a few, or those molecules that can reasonably be drawn and understood using classic ball-and-stick or molecular coordinate frameworks. Perhaps not coincidentally, this neatly envelops the needs of small molecule drug discovery, where it is not uncommon to find both public and privately held repositories of hundreds of thousands (to millions) of such molecules, for use in molecular or phenotypic screening assays. The small size and elemental simplicity of these molecules have resulted in a variety of storage file formats (e.g., mol, SMILES, sdf, etc.) and many supporting software packages (e.g., RDKit, CDK, ChemAxon, etc.) for visualization and manipulation. KNIME Analytics Platform provides easy access to those file formats and software packages.

KNIME Analytics Platform 4.0: Components are for Sharing

Thu, 06/27/2019 - 10:00 michael.berthold

With this release we are continuing our progress toward a community-oriented data science platform, adding lots of functionality that enables easier sharing with the KNIME Community. The most noticeable addition is, of course, the KNIME Hub itself, but there are also a number of changes in KNIME Analytics Platform that make sharing and collaborating with the community easier...