Automate hazard information extraction for stronger risk management

September 11, 2023
ML 201 & AI
KNIME software is adept at pulling chemical hazard information from Safety Data Sheets (SDS).

SDS are standardized documents by which chemical manufacturers communicate a chemical’s hazard information to chemical handlers. They typically contain chemical properties, health and environmental hazards, protective measures, as well as safety precautions for storing, handling, and transporting chemicals.

The European Union requires that the SDS for every hazardous chemical clearly state the chemical's risks and handling precautions. A company that uses multiple hazardous chemicals needs to gather the risk and precaution information for each one and make that information available to anyone working with a chemical substance.

To put the relevant risk management plans together, the Health, Safety, and Environment (HSE) manager has to compile the hazard information for every single chemical a company uses. HSE managers often have to review these documents manually to find the relevant information, which is not an efficient process on a company-wide basis.

You can remove the need for manual work by automatically gathering and harmonizing text-based information. KNIME software can automatically pull hazard information from thousands of SDS without manual effort, for stronger risk management planning. It combines text mining and string manipulation to extract the risk information from a collection of SDS. Then, it can categorize that information by how dangerous the substance is and make the result available for the user to download.

Here's how:

The user compiles the SDS from various sources, customers, and providers, and uploads them to a KNIME workflow as a single PDF, a library of PDFs, or a folder containing PDFs. The user also uploads an Excel file listing the requested phrases, which can be updated as needed. If the volume requires more computational power, the workflow can also be deployed to KNIME Business Hub. The Tika Parser node extracts the content of each PDF, and text mining nodes then process the result: using string and regex manipulation, the workflow analyzes each sentence, searching for Chemical Abstracts Service (CAS) numbers and for the requested phrases.
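The CAS number search can be illustrated outside KNIME. The following Python sketch is not the workflow's own code; it assumes the parsed SDS text is already available as a string, and the names `CAS_PATTERN`, `is_valid_cas`, and `extract_cas_numbers` are hypothetical. It relies on the standard CAS Registry Number layout (two to seven digits, two digits, and one check digit, separated by hyphens) and uses the check digit to discard look-alike strings such as dates or revision codes.

```python
import re

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def is_valid_cas(candidate: str) -> bool:
    """Return True if the candidate passes the CAS check-digit rule."""
    match = CAS_PATTERN.fullmatch(candidate)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


def extract_cas_numbers(text: str) -> list[str]:
    """Return distinct, checksum-valid CAS numbers in order of first appearance."""
    found: list[str] = []
    for match in CAS_PATTERN.finditer(text):
        cas = match.group(0)
        if is_valid_cas(cas) and cas not in found:
            found.append(cas)
    return found
```

For example, `extract_cas_numbers("Ethanol (CAS 64-17-5), water 7732-18-5, rev. 12-34-5")` keeps the two real CAS numbers and drops `12-34-5`, whose check digit does not match.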

The results report the file name, the product name, all the CAS numbers retrieved in each document, and all the retrieved phrases, which are matched against the user-defined list. In accordance with regulation, the software saves the SDS reporting codes that concern mutagenic or carcinogenic hazards in a second Excel file.

SDS phrases are extracted by the KNIME workflow

Fig. 1: Graphical output of the workflow. The table shows the file name of the SDS, the product name, all the retrieved CAS numbers, and all the phrases contained in the document.
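The phrase matching and the separate mutagenic/carcinogenic report can be sketched in Python as follows. This is an illustration under stated assumptions, not the workflow's implementation: it assumes the phrases appear as GHS/CLP hazard statement codes (for example `H225`), that the codes H340, H341, H350, and H351 are the ones routed to the second report, and that the requested phrases have already been read from the Excel file into a set. The function name `build_report_row` and the dictionary keys are hypothetical, and the CAS number column is left out for brevity.

```python
import re

# Assumption: phrases are hazard statement codes such as H225 or H350i.
H_CODE = re.compile(r"\bH\d{3}[A-Za-z]{0,2}\b")

# Assumption: codes for germ cell mutagenicity (H340, H341) and
# carcinogenicity (H350, H351) go to the second report.
CMR_PREFIXES = ("H340", "H341", "H350", "H351")


def build_report_row(file_name, product_name, text, requested_codes):
    """Build one result row for a parsed SDS (hypothetical layout)."""
    # All distinct codes found in the document, sorted for stable output.
    found = sorted(set(H_CODE.findall(text)))
    # Codes that also appear on the user-defined list.
    matched = [code for code in found if code in requested_codes]
    # Codes flagged for the mutagenic/carcinogenic report.
    flagged = [code for code in found if code.startswith(CMR_PREFIXES)]
    return {
        "file": file_name,
        "product": product_name,
        "phrases": found,
        "matched": matched,
        "mutagenic_or_carcinogenic": flagged,
    }


row = build_report_row(
    "solvent_x.pdf",  # hypothetical file name
    "Solvent X",      # hypothetical product name
    "H225 Highly flammable liquid and vapour. H350 May cause cancer. "
    "H319 Causes serious eye irritation.",
    {"H225", "H350", "H302"},
)
```

Because the requested codes are passed in as data, updating the list changes what is matched without touching the extraction logic.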

1000+ PDF files parsed in under 60 minutes:

KNIME Analytics Platform recovered the complete range of risk phrases. All the SDS present in a medium-size company, a few thousand PDF files, were parsed in less than an hour.

Notable results:

  • Significant time saved in repetitive operations (from about two minutes to a few seconds for each SDS)

  • Useful for both single and batch processing of SDS files

  • Deprecated terms are avoided because the risk phrase list can be updated through an Excel file

Fig. 2: The KNIME workflow to automate chemical hazard information extraction from safety data sheets

An efficient process for strong risk management plans

KNIME not only makes this task faster but also reduces the risk of human error. The Tika Parser node retrieves meta information from each file. The Try/Catch Errors construct handles errors so that they do not interrupt the workflow, and regex code in a Java snippet isolates CAS numbers from the PDFs.
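The error-handling idea can be sketched in Python as a batch loop that records failures instead of aborting. This is an analogy to the try/catch construct, not a description of how the KNIME nodes work internally; `process_batch` and `fake_parse` are hypothetical names, and `fake_parse` stands in for a real PDF parser.

```python
from typing import Callable, Iterable


def process_batch(paths: Iterable[str], parse: Callable[[str], str]):
    """Parse every file; collect failures instead of stopping the batch."""
    results: dict[str, str] = {}
    failures: dict[str, str] = {}
    for path in paths:
        try:
            results[path] = parse(path)
        except Exception as exc:  # deliberately broad: record any parser failure
            failures[path] = f"{type(exc).__name__}: {exc}"
    return results, failures


def fake_parse(path: str) -> str:
    """Hypothetical stand-in for a PDF parser; fails on one file."""
    if path.endswith("broken.pdf"):
        raise ValueError("unreadable PDF")
    return f"text of {path}"


results, failures = process_batch(["a.pdf", "broken.pdf", "b.pdf"], fake_parse)
```

After the run, `results` holds the text of the two readable files and `failures` maps `broken.pdf` to its error message, so one bad file does not cost the rest of the batch.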

Try it out yourself!

Download the SDS Risk Phrase Extraction workflow from KNIME Community Hub.
