Extracting Risk Information

Remove the need for manual work by automatically gathering and harmonizing text-based information.

The Challenge: Automatically Extract Hazard Information

A Safety Data Sheet (SDS) is a standardized document by which chemical manufacturers communicate a chemical’s hazard information to chemical handlers. It typically contains chemical properties, health and environmental hazards, protective measures, and safety precautions for storing, handling, and transporting chemicals. Chemical handlers usually extract information from an SDS by reading the section of interest, but this manual workflow does not scale when the Health, Safety & Environment manager needs to gather information about every chemical used in the company in order to put an adequate risk management plan in place. This KNIME workflow makes it possible to automatically extract hazard information from thousands of SDSs.

The Solution: Text Mining Workflow in KNIME

SDSs from different sources, customers, and providers are gathered. The user uploads either a single PDF, a library of PDFs, or a folder containing PDFs, together with an Excel file listing all the requested phrases to be updated, to a KNIME workflow, which can be deployed on KNIME Server if more computational power is needed. Text mining nodes are applied to the output of the Tika Parser to extract the sentences that make up each file. Each sentence is then searched, using string or regex manipulation, for the Chemical Abstracts Service (CAS) number, the product name, and all risk phrases. A try/catch construct handles the large variations in the input files. The results report the file name, the product name, all the CAS numbers retrieved from each document, and all the retrieved phrases, which are matched against the user-defined list.
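The sentence-level search described above can be illustrated outside KNIME with a minimal Python sketch. It is not the workflow's actual Java Snippet code: the function names (`is_valid_cas`, `extract_cas_numbers`, `match_phrases`) and the sample text are hypothetical, and the check-digit validation is an extra step assumed here to filter out look-alike strings. The regex follows the standard CAS Registry Number layout of 2 to 7 digits, 2 digits, and 1 check digit.

```python
import re

# CAS layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
# The lookarounds stop the pattern from matching inside longer digit runs.
CAS_PATTERN = re.compile(r"(?<!\d)\d{2,7}-\d{2}-\d(?!\d)")


def is_valid_cas(candidate: str) -> bool:
    """Check the CAS check digit: weight the other digits 1, 2, 3, ...
    from right to left, sum them, and compare the sum modulo 10."""
    digits = candidate.replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


def extract_cas_numbers(text: str) -> list[str]:
    """Return the distinct, checksum-valid CAS numbers in order of appearance."""
    found: list[str] = []
    for match in CAS_PATTERN.finditer(text):
        cas = match.group(0)
        if is_valid_cas(cas) and cas not in found:
            found.append(cas)
    return found


def match_phrases(sentences: list[str], requested: list[str]) -> list[str]:
    """Return the requested phrases that occur in at least one sentence
    (case-insensitive substring match, an assumption of this sketch)."""
    lowered = [s.casefold() for s in sentences]
    return [p for p in requested if any(p.casefold() in s for s in lowered)]


if __name__ == "__main__":
    # Hypothetical sentences standing in for parsed SDS text.
    sentences = [
        "Product name: Ethanol, CAS No. 64-17-5.",
        "H225 Highly flammable liquid and vapour.",
    ]
    print(extract_cas_numbers(" ".join(sentences)))
    print(match_phrases(sentences, ["highly flammable liquid", "Toxic if swallowed"]))
```

Validating the check digit is a design choice of this sketch: a bare pattern match would also accept part numbers or phone fragments that happen to share the CAS layout.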

Download Workflow from KNIME Hub


Why KNIME Software

The open source KNIME Analytics Platform not only makes this task faster but also reduces the risk of human error. The Tika Parser node retrieves meta information from each file, the try/catch construct prevents workflow errors, and regex code in a Java Snippet node isolates CAS numbers from the PDFs.
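The role of the try/catch construct can be sketched in Python as a per-file loop that records a failure instead of stopping the whole batch. This is only an analogy to the KNIME construct, not its implementation: `process_files` and the `extract` callback are hypothetical names, and the callback stands in for whatever parsing and extraction is applied to each file.

```python
from typing import Callable, Iterable


def process_files(
    paths: Iterable[str],
    extract: Callable[[str], dict],
) -> tuple[list[dict], list[dict]]:
    """Apply `extract` to every path; collect results and failures separately
    so that one unreadable file does not abort the remaining files."""
    results: list[dict] = []
    failures: list[dict] = []
    for path in paths:
        try:
            record = extract(path)
            results.append({"file": path, **record})
        except Exception as exc:  # broad on purpose: inputs vary widely
            failures.append({"file": path, "error": str(exc)})
    return results, failures
```

Keeping the failures in their own list makes it possible to review the problematic files afterwards rather than losing them silently.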

Download this Innovation Note as a PDF

Reach out to Soluzioni Informatiche - a KNIME Trusted Partner - for consultation and support on your data projects.

Download KNIME

Download the free and open source KNIME Analytics Platform.


