Extracting Risk Information

Remove the need for manual work by automatically gathering and harmonizing text-based information.


The Challenge

A Safety Data Sheet (SDS) is a standardized document that chemical manufacturers use to communicate a chemical’s hazard information to chemical handlers. It typically covers chemical properties, health and environmental hazards, protective measures, and safety precautions for storing, handling, and transporting chemicals. Chemical handlers usually extract information from an SDS by reading the section of interest, but this manual approach does not scale when the Health, Safety & Environment manager needs information about every chemical used in the company in order to put an adequate risk management plan in place. This KNIME workflow makes it possible to automatically extract hazard information from thousands of SDSs.

Our Solution

SDSs from different sources, customers, and providers are gathered. The user uploads a single PDF, a library of PDFs, or a folder containing PDFs, together with an Excel file listing all the requested phrases to be updated, to a KNIME workflow, which can be deployed on KNIME Server if more computational power is needed. Text mining nodes are applied to the output of the Tika Parser to split each file into its sentences. Each sentence is then analyzed with string or regex manipulation to find the Chemical Abstracts Service (CAS) number, the product name, and all risk phrases. A try/catch construct handles the large variations in the input files. The results report the file name, the product name, all CAS numbers retrieved from each document, and all retrieved phrases, matched against the user-defined list.
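
The sentence-level analysis can be illustrated with a minimal sketch. It is written in Python for readability and is not the workflow's actual Java Snippet code. The function names and the sample phrases are hypothetical. The only facts assumed are the published CAS Registry Number layout (2 to 7 digits, 2 digits, 1 check digit, separated by hyphens) and its check-digit rule, which is used here to discard look-alike strings such as dates.

```python
import re

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
# The lookarounds stop the pattern from matching inside longer digit runs.
CAS_PATTERN = re.compile(r"(?<!\d)\d{2,7}-\d{2}-\d(?!\d)")


def is_valid_cas(candidate: str) -> bool:
    """Verify the check digit: weight the other digits 1, 2, 3, ... from the right."""
    digits = candidate.replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


def extract_cas_numbers(sentence: str) -> list[str]:
    """Return the distinct, checksum-valid CAS numbers in order of appearance."""
    found = []
    for match in CAS_PATTERN.finditer(sentence):
        cas = match.group(0)
        if is_valid_cas(cas) and cas not in found:
            found.append(cas)
    return found


def match_phrases(sentence: str, requested: list[str]) -> list[str]:
    """Return the requested phrases that occur in the sentence, ignoring case."""
    lowered = sentence.lower()
    return [phrase for phrase in requested if phrase.lower() in lowered]


# Hypothetical sample input: two real CAS numbers plus a date that must be ignored.
sentence = "Acetone (CAS 67-64-1) in water (7732-18-5), revised 2020-01-15."
print(extract_cas_numbers(sentence))
```

In this sketch the user's Excel list corresponds to the `requested` argument of `match_phrases`; how the workflow itself reads and matches that list is not specified in this note.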

Why KNIME Software

The open source KNIME Analytics Platform not only makes this task faster but also reduces the risk of human error. The Tika Parser node retrieves meta information from each file, the try/catch construct keeps problematic inputs from causing workflow errors, and regex code in a Java Snippet isolates CAS numbers from the PDFs.
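
The idea behind the try/catch construct can be sketched outside KNIME as follows. This is a Python illustration under stated assumptions, not the workflow's implementation: `process_files` is a hypothetical name, and `parse` stands in for whatever turns a file into text (the Tika Parser node in the workflow).

```python
from typing import Callable


def process_files(
    paths: list[str], parse: Callable[[str], str]
) -> tuple[dict[str, str], dict[str, str]]:
    """Parse each file and collect failures instead of aborting the whole run."""
    parsed: dict[str, str] = {}
    failed: dict[str, str] = {}
    for path in paths:
        try:
            parsed[path] = parse(path)
        except Exception as exc:  # broad on purpose: input files vary widely
            failed[path] = f"{type(exc).__name__}: {exc}"
    return parsed, failed
```

Keeping the failures in a separate mapping means one unreadable document does not stop the batch, and the files that need manual review remain visible afterwards.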

This Innovation Note was written by our trusted partner S-IN Soluzioni Informatiche.
