High-throughput screening, data analysis, processing, and hit identification

Fri, 06/19/2020 - 10:26 Jordi

High-throughput biochemical and phenotypic screening (HTS) enables scientists to test thousands of samples simultaneously. Using automation, the effects of thousands of compounds can be evaluated on cultured cells or in biochemical in vitro assays. The goal of HTS is to identify “hit” compounds, that is, compounds that match certain properties. Because HTS is usually conducted on very large compound libraries, the volume of raw data produced is usually huge. This calls for an analysis tool that can handle large volumes of data easily.

Anomaly Detection Techniques: Defining Normal

Thu, 12/12/2019 - 10:00 admin

Author: Rosaria Silipo (KNIME). As first published in DarkReading.

The challenge is to identify suspicious events in training sets where no anomalies are encountered. Part two of a two-part series.

The problem of anomaly detection is not new, and a number of solutions have already been proposed over the years. However, before starting with the list of techniques, let's agree on a necessary premise: All anomaly detection techniques must involve a training set where no anomaly examples are encountered. The challenge consists of identifying suspicious events, even in the absence of examples.
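To make the premise concrete, here is a minimal sketch of one way to flag suspicious events when the training set contains only normal data. It is an illustration under stated assumptions, not necessarily one of the techniques covered in the article: the sensor readings are hypothetical, and the rule (distance from the training mean, measured in standard deviations, with threshold `k`) is just one simple choice.

```python
import statistics

def fit_normal_profile(training_values):
    """Learn what "normal" looks like from anomaly-free training data."""
    mean = statistics.fmean(training_values)
    std = statistics.stdev(training_values)
    return mean, std

def is_suspicious(value, profile, k=3.0):
    """Flag a value lying more than k standard deviations from the training mean."""
    mean, std = profile
    return abs(value - mean) > k * std

# Hypothetical readings recorded while the system was working normally.
# No anomaly examples are needed to build the profile.
normal_readings = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2, 20.3, 19.7]
profile = fit_normal_profile(normal_readings)

# New, unseen readings are compared against the learned profile.
new_readings = [20.1, 19.6, 27.5]
flags = [is_suspicious(v, profile) for v in new_readings]
print(flags)
```

The key point is that the model only ever describes normal behavior; anything far enough from that description is treated as suspicious.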

The Importance of Community in Data Science

Thu, 11/21/2019 - 10:00 paolotamag

Authors: Rosaria Silipo and Paolo Tamagnini (KNIME)

Nobody is an island. Even less so a data scientist. Assembling predictive analytics workflows benefits from help and reviews: on processes and algorithms by data science colleagues; on IT infrastructure to deploy, manage, and monitor the AI-based solutions by IT professionals; on dashboards and reporting features to communicate the final results by data visualization experts; as well as on automation features for workflow execution by system administrators. It really seems that a data scientist can benefit from a community of experts!

Data Anonymization in KNIME. A Redfield Privacy Extension Walkthrough

Mon, 11/18/2019 - 10:00 Redfield

Anonymization is a hot topic of discussion. We are generating and collecting huge amounts of data, more than ever before. A lot of this data is personal and needs to be handled sensitively. In recent times, we’ve also seen the introduction of the GDPR stipulating that only anonymized data may be used extensively and without privacy restrictions.

The 80/20 Challenge: From Classic to Innovative Data Science Projects

Thu, 11/14/2019 - 10:00 admin

Author: Rosaria Silipo (KNIME)

As first published in Dataversity

Sometimes when you talk to data scientists, you get this vibe as if you’re talking to priests of an ancient religion. Obscure formulas, complex algorithms, a slang for the initiated, and on top of that, some new required script. If you get these vibes for all projects, you are probably talking to the wrong data scientists.

From Modeling to Scoring: Correcting Predicted Class Probabilities in Imbalanced Datasets

Mon, 10/07/2019 - 10:00 Maarit

Authors: Alfredo Roccato (Data Science Trainer and Consultant) and Maarit Widmann (KNIME)

Wheeling like a hamster in the data science cycle? Don’t know when to stop training your model?

Model evaluation is an important part of a data science project and it’s exactly this part that quantifies how good your model is, how much it has improved from the previous version, how much better it is than your colleague’s model, and how much room for improvement there still is.

In this series of blog posts, we review different scoring metrics: for classification, numeric prediction, imbalanced datasets, and other similar, more or less challenging, model evaluation problems.
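As a small illustration of why the choice of scoring metric matters when classes are imbalanced, the sketch below computes accuracy, sensitivity, and specificity from a confusion matrix. The data and the always-negative "model" are hypothetical, and these three metrics are assumed here only as examples; the series itself may discuss others.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return tp, tn, fp, fn

def scores(actual, predicted, positive=1):
    """Return accuracy, sensitivity, and specificity as a dictionary."""
    tp, tn, fp, fn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / len(actual),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }

# Hypothetical imbalanced dataset: 95 negatives and 5 positives.
actual = [0] * 95 + [1] * 5
# A useless model that always predicts the majority class.
predicted = [0] * 100

result = scores(actual, predicted)
print(result)
```

The accuracy looks excellent even though the model never finds a single positive case, which is why accuracy alone can mislead on imbalanced data.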

Time Series Analysis: A Simple Example with KNIME and Spark

Mon, 09/23/2019 - 10:00 admin

The task: train and evaluate a simple time series model using a random forest of regression trees and the NYC Yellow taxi dataset

Authors: Andisa Dewi and Rosaria Silipo

I think we all agree that knowing what lies ahead makes life much easier. This is true for life events as well as for prices of washing machines and refrigerators, or the demand for electrical energy in an entire city. Knowing how many bottles of olive oil customers will want tomorrow or next week allows for better restocking plans in the retail store. Knowing the likely increase in the price of gas or diesel allows a trucking company to better plan its finances. There are countless examples where this kind of knowledge can be of help.
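A regression model such as the random forest named in the task needs a table of input columns and a target column. One common way to get that from a time series, assumed here rather than taken from the article, is to use the previous N values as inputs for predicting the next one. The sketch below only builds that table and splits it chronologically; it does not train the forest. The trip counts and the function name `make_lagged_table` are hypothetical.

```python
def make_lagged_table(series, n_lags):
    """Turn a series into (previous n_lags values, next value) rows."""
    rows = []
    for i in range(n_lags, len(series)):
        features = series[i - n_lags:i]  # the n_lags values just before position i
        target = series[i]               # the value to predict
        rows.append((features, target))
    return rows

# Hypothetical hourly taxi trip counts.
hourly_trips = [120, 135, 150, 160, 155, 140, 130]
table = make_lagged_table(hourly_trips, n_lags=3)

# Split by time, not at random: the model is evaluated only on
# values that come after everything it was trained on.
split = int(len(table) * 0.75)
train, test = table[:split], table[split:]

for features, target in table:
    print(features, "->", target)
```

Keeping the split chronological avoids evaluating the model on the past using information from the future.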

