Anonymization is a hot topic of discussion. We are generating and collecting more data than ever before, much of it personal and in need of sensitive handling. In recent times, we’ve also seen the introduction of the GDPR, which stipulates that only anonymized data may be used freely, without privacy restrictions.
Author: Rosaria Silipo (KNIME)
As first published in Dataversity.
Sometimes when you talk to data scientists, you get the vibe that you’re talking to priests of an ancient religion: obscure formulas, complex algorithms, slang for the initiated, and on top of that, yet another required script. If you get these vibes on every project, you are probably talking to the wrong data scientists.
Author: Armin Ghassemi Rudd (Data Scientist & Consultant)
Are you trying to build an attractive CV? Maybe you’ve been searching the web for online CV builders? With these tools, you fill out a form and enter your information: name, contact details, skills, experience, and so on. A few online CV builders ease the job further by asking for permission to access your LinkedIn profile and read your information there. They are certainly great tools, but they have downsides as well.
As first published in Harvard Data Science Review.
Given recent claims that data science can be fully automated or made accessible to non-data scientists through easy-to-use tools, I describe different types of data science roles within an organization. I then provide a view on the required skill sets of successful data scientists and how they can be obtained, concluding that data science requires both a profound understanding of the underlying methods and exhaustive experience gained from real-world data science projects. Despite some easy wins in specific areas using automation or easy-to-use tools, successful data science projects still require education and training.
Authors: Rosaria Silipo and Mykhailo Lisovyi
Today’s style: Caravaggio or Picasso?
While surfing the internet a few months ago, we came across a study [1] promising to train a neural network to alter any image according to your preferred painter’s style. These kinds of studies unleash your imagination (or at least ours).
Two decades into the AI revolution, deep learning is becoming a standard part of the analytics toolkit. Here’s what it means
By Michael Berthold, KNIME
Pick up a magazine, scroll through the tech blogs, or simply chat with your peers at an industry conference. You’ll quickly notice that almost everything coming out of the technology world seems to have some element of artificial intelligence or machine learning to it. The way artificial intelligence is discussed, it’s starting to sound almost like propaganda. Here is the one true technology that can solve all of your needs! AI is here to save us all!
Authors: Ana Vedoveli and Iris Adä (KNIME)
At the beginning of this year, we sent out a “Help us to Help you with KNIME” survey to the KNIME community. The idea behind the questionnaire was to listen to what the KNIME community wanted and incorporate some of those suggestions into the next releases. There were a few questions about how people are using KNIME Analytics Platform, and also questions designed to help us understand what kinds of new nodes and features people dream about. We additionally promised that we would select one dedicated node - the node most mentioned - and make sure that it would be part of our next major release.
In this post, we present this "community node" and share five tips and tricks garnered from other answers given in the survey.
Authors: Kathrin Melcher, Rosaria Silipo
- Fraud detection techniques mostly stem from the anomaly detection branch of data science
- If the dataset contains a sufficient number of fraud examples, supervised machine learning algorithms for classification, such as random forest or logistic regression, can be used for fraud detection
- If the dataset contains no fraud examples, we can use either an outlier detection approach based on the isolation forest technique or anomaly detection with a neural autoencoder
- After the machine learning model has been trained, it's evaluated on the test set using metrics such as sensitivity and specificity, or Cohen’s Kappa
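The evaluation metrics in the last bullet can be computed directly from a binary confusion matrix. Below is a minimal sketch in plain Python; the counts are invented for illustration, and `evaluate` is a hypothetical helper, not a function from any library mentioned above.

```python
# Sketch: sensitivity, specificity, and Cohen's kappa from a binary
# confusion matrix, with fraud treated as the positive class.
# The counts passed in below are made-up illustration values.

def evaluate(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # true positive rate: frauds caught
    specificity = tn / (tn + fp)   # true negative rate: legit kept legit
    accuracy = (tp + tn) / total   # observed agreement
    # Cohen's kappa: agreement beyond what chance alone would produce.
    # Chance agreement = P(both say fraud) + P(both say non-fraud).
    p_fraud = ((tp + fp) / total) * ((tp + fn) / total)
    p_legit = ((fn + tn) / total) * ((fp + tn) / total)
    p_chance = p_fraud + p_legit
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return sensitivity, specificity, kappa

sens, spec, kappa = evaluate(tp=80, fp=20, fn=20, tn=880)
print(round(sens, 3), round(spec, 3), round(kappa, 3))  # → 0.8 0.978 0.778
```

Note how accuracy alone (0.96 here) can look flattering on imbalanced fraud data, while kappa discounts the agreement a trivial "always legit" classifier would get by chance.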
Author: Julian Bunzel
Keeping track of the latest developments in research is becoming increasingly difficult with all the information published on the Internet. This is why Information Extraction (IE) tasks are gaining popularity in many different domains. Reading literature and retrieving information is extremely exhausting, so why not automate it? At least a bit. Using text processing approaches to retrieve information about drugs has been an important task over the last few years and is getting more and more important [1].
Authors: Paolo Tamagnini and Rosaria Silipo
The ugly truth behind all that data
We are in the age of data. In recent years, many companies have already started collecting large amounts of data about their business. On the other hand, many companies are just starting now. If you are working in one of these companies, you might be wondering what can be done with all that data.
What about using the data to train a supervised machine learning (ML) algorithm? The ML algorithm could perform the same classification task a human would, just so much faster! It could reduce costs and inefficiencies. It could work on your blended data, such as images, text documents, and plain numbers. It could do all those things and even get you that edge over the competition.