File Access and Transformation with KNIME
File Handling Extension makes ETL with KNIME even more powerful
Working with data files made easy
KNIME Analytics Platform provides a visual interface to work with different file formats, including CSV, Excel, and many others, across different file systems. Whether you work with single files or multiple files in the cloud, in your data center, on your local hard drive, or any combination of these, KNIME makes it easy to combine, extract, and analyze information from all these files. The combined and transformed data can then be used for further analysis or visualization; you can write it out as files or into databases, or send it to one of the integrated data sinks or reporting solutions such as Spotfire, Tableau, or Microsoft Power BI.
Quick processing of data: no limits on the amount, almost no limits on formats
The number of files and amount of data you can process is only limited by the physical resources of your machine. This is due to KNIME’s sophisticated memory management and caching strategies. Whether you have to process a single Excel file or thousands, your analysis flow will look the same.
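To see why the flow looks the same for one file or thousands, here is a minimal sketch in plain Python of the same idea behind KNIME's reader nodes, which accept a single file or a whole folder. The file and column names are purely illustrative, and this is not KNIME code, just the underlying pattern:

```python
import csv
import glob
import os
import tempfile

def read_rows(paths):
    """Read and concatenate rows from any number of CSV files;
    the same code handles one file or thousands."""
    rows = []
    for path in paths:
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows

# Demo data: three small CSV files in a temporary folder.
folder = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(folder, f"part{i}.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "value"])
        writer.writerow([i, i * 10])

# Reading one file or the whole folder uses the identical "flow":
one = read_rows([os.path.join(folder, "part0.csv")])
all_rows = read_rows(sorted(glob.glob(os.path.join(folder, "*.csv"))))
print(len(one), len(all_rows))  # 1 3
```

In a KNIME workflow this loop disappears entirely: the reader node is pointed at a file or a folder, and the rest of the workflow stays unchanged.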
KNIME supports a vast number of different data formats. From structured data (simple text CSV files, Excel tables, JSON, or XML files) to unstructured data (documents, images, or audio files), there are dedicated reader and writer nodes plus matching processing nodes that will allow you to work efficiently with your data. KNIME also provides integrations for highly specialized file formats such as sensor, biological, or chemical data.
Note: If your machine isn’t powerful enough you can use the KNIME Database Integration to push the data processing down into the database. Alternatively, the KNIME Big Data Extension can orchestrate an entire big data system from within your KNIME workflow.
Two example workflows
Working with utility nodes
This workflow demonstrates the use of various file utility nodes, e.g. the Decompress Files node, as well as how to delete and move processed files.
Data transfer between clouds
This workflow demonstrates the use of the file system connection nodes, e.g. reading files from SharePoint into KNIME and writing them to Google Drive.
Visual data preparation makes it easy
Cleaning and preparing data for analysis or visualization is often the most time-consuming task. KNIME has hundreds of data processing nodes (a node is the basic operation in a workflow) for visually creating self-documenting, reproducible, and shareable data processing workflows. Easily add, exchange, or remove nodes from the workflow and inspect the intermediate results at any stage for fast prototyping and sanity checking.
Once the workflow is in place, you can deploy it to KNIME Server to share it with your colleagues or to put it into production and have it executed automatically, either on a regular schedule or via external triggers such as a REST call or the KNIME WebPortal.
Supported file systems and formats
Supported file systems
- Cloud: Amazon Web Services, Microsoft Azure, Google Cloud, Databricks
- Big Data file systems: HDFS, HttpFS, WebHDFS, DBFS, ...
- KNIME Server Repository
- Local File System (Linux, Windows, Mac)
- Other file systems (e.g. SSH, FTP, …)
Supported file types
- Excel, CSV, txt, PDF, mdf, mol2, sdf, JSON, XML, RDF, email formats, …
- Images, Audio, Network, …
Watch Bernd talk about these new features at KNIME Fall Summit 2020.
Download the free and open source KNIME Analytics Platform and get started.
Read or download the relevant technical documentation.
Read the blog post on the move of File Handling out of labs and into production.