File Access and Transformation with KNIME

File Handling Extension makes ETL with KNIME even more powerful

Work with Data Files Easily

KNIME Analytics Platform provides a visual interface for working with many file formats, including CSV and Excel, across different file systems. Whether you work with a single file or with many files in the cloud, in your data center, on your local hard drive, or any combination of these, KNIME makes it easy to combine, extract, and analyze the information they contain. The combined and transformed data can then be used for further analysis or visualization: you can write it to files or databases, or send it to one of the integrated data sinks or reporting solutions such as Spotfire, Tableau, or Microsoft Power BI.

Quick Processing of Data: No Limits on the Amount, Almost No Limits on Formats

The number of files and the amount of data you can process are limited only by the physical resources of your machine, thanks to KNIME’s sophisticated memory management and caching strategies. Whether you have to process a single Excel file or thousands, your analysis flow will look the same.
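The idea that the flow looks the same for one file or for thousands can be illustrated outside KNIME with a minimal Python sketch. This is not how KNIME is implemented; it is only an analogy using the standard library. The file names, the `amount` column, and the helper functions `read_rows` and `total_amount` are hypothetical and exist only for this example.

```python
import csv
import tempfile
from pathlib import Path


def read_rows(paths):
    """Yield rows (as dicts) from every CSV file in `paths`, one row at a time."""
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                yield row


def total_amount(paths):
    """Sum the hypothetical `amount` column across all given files."""
    return sum(float(row["amount"]) for row in read_rows(paths))


# Build three small sample files in a temporary folder (illustrative data only).
workdir = Path(tempfile.mkdtemp())
for index in range(3):
    with open(workdir / f"sales_{index}.csv", "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["region", "amount"])
        writer.writerow(["north", 10 * (index + 1)])
        writer.writerow(["south", 5])

all_files = sorted(workdir.glob("sales_*.csv"))

# The same function handles a single file and the whole folder.
single_total = total_amount(all_files[:1])
combined_total = total_amount(all_files)
print(single_total, combined_total)
```

Because rows are streamed one at a time rather than loaded all at once, the sketch's memory use does not grow with the number of files, which loosely mirrors the point made above.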

KNIME supports a vast number of different data formats. From structured data (simple text CSV files, Excel tables, JSON, or XML files) to unstructured data (documents, images, or audio files), there are dedicated reader and writer nodes plus matching processing nodes that will allow you to work efficiently with your data. KNIME also provides integrations for highly specialized file formats such as sensor, biological, or chemical data.

Note: If your machine isn’t powerful enough, you can use the KNIME Database Integration to push the data processing down into the database. Alternatively, the KNIME Big Data Extension can orchestrate an entire big data system from within your KNIME workflow.
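What "pushing processing down into the database" means can be sketched in plain Python with the standard `sqlite3` module. This is an illustration of the general technique, not of the KNIME Database Integration itself; the `readings` table and its contents are made up for the example.

```python
import sqlite3

# In-memory database with a small, made-up table of sensor readings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", 30.0)],
)

# Client-side processing: every row is transferred, then aggregated locally.
rows = conn.execute("SELECT sensor, value FROM readings").fetchall()
sums = {}
for sensor, value in rows:
    total, count = sums.get(sensor, (0.0, 0))
    sums[sensor] = (total + value, count + 1)
local_avg = {sensor: total / count for sensor, (total, count) in sums.items()}

# Pushed-down processing: the database aggregates and returns one row per sensor.
pushed_avg = dict(
    conn.execute("SELECT sensor, AVG(value) FROM readings GROUP BY sensor")
)

print(local_avg == pushed_avg)
conn.close()
```

Both approaches give the same averages, but the pushed-down query only returns one row per sensor, so the client machine never has to hold the full table.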

 

Example KNIME Workflows

Working with Utility Nodes

This workflow demonstrates the use of various file utility nodes, such as the Decompress Files node, and shows how to delete and move processed files.

View on KNIME Hub
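For readers who think in scripts, the decompress, move, and delete steps of the utility workflow correspond roughly to the following Python sketch. It uses only the standard library and is an analogy, not the workflow's actual implementation; the folder names (`incoming`, `extracted`, `processed`) and the sample archive are assumptions made for the example.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

# Hypothetical folder layout inside a temporary directory.
root = Path(tempfile.mkdtemp())
incoming = root / "incoming"
extracted = root / "extracted"
processed = root / "processed"
for folder in (incoming, extracted, processed):
    folder.mkdir()

# Create a small sample archive so the sketch is self-contained.
archive = incoming / "batch.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("a.csv", "id,value\n1,10\n")
    zf.writestr("b.csv", "id,value\n2,20\n")

# 1. Decompress the archive.
with zipfile.ZipFile(archive) as zf:
    zf.extractall(extracted)

# 2. Move each extracted file into the "processed" folder.
moved = []
for path in sorted(extracted.glob("*.csv")):
    target = processed / path.name
    shutil.move(str(path), str(target))
    moved.append(target.name)

# 3. Delete the original archive once its contents are handled.
archive.unlink()

print(moved)
```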

Data Transfer Between Clouds

This workflow demonstrates the use of the file system connection nodes, for example reading files from SharePoint into KNIME and writing them to Google Drive.

View on KNIME Hub

Visual Data Preparation Makes it Easy

Cleaning and preparing data for analysis or visualization is often the most time-consuming task. KNIME has hundreds of data processing nodes (each node is a basic operation in a workflow) that let you visually create self-documenting, reproducible, and shareable data processing workflows. You can easily add, exchange, or remove nodes in the workflow and inspect the intermediate results at any stage for fast prototyping and sanity checking.

Once the workflow is in place, you can deploy it to KNIME Server to share it with your colleagues or to put it into production, where it can be executed automatically on a regular schedule, by an external trigger such as a REST call, or via the KNIME WebPortal.

 

Supported File Systems and Formats

Supported File Systems

  • Cloud: Amazon Web Services, Microsoft Azure, Google Cloud, Databricks

  • Big Data file systems: HDFS, HttpFS, WebHDFS, DBFS, …

  • KNIME Server Repository

  • Local File System (Linux, Windows, Mac)

  • Other file systems (e.g. SSH, FTP, …)

Supported File Types

  • Excel, CSV, txt, PDF, mdf, mol2, sdf, JSON, XML, RDF, email formats, …

  • Images, Audio, Network, …

A free e-book containing 32 chapters describing data blending techniques for more than 50 data sources and external tools.

Download KNIME

Download the free and open source KNIME Analytics Platform and get started.


Documentation

Read or download the relevant technical documentation.

Read Now

Read Blog

Read the blog post on the move of File Handling out of labs and into production.
