Build data science workflows
Nodes for the entire data science life cycle
Model each step of your analysis, control the flow of data, and ensure your work is always current.
Blend tools from different domains in a single workflow with KNIME's native nodes, including scripting in R & Python, machine learning libraries, and connectors to Apache Spark.
Get started quickly
Check out the KNIME Hub and the hundreds of publicly available workflows, or use the integrated workflow coach.
Blend data from any source
Open and combine simple file formats (CSV, PDF, XLS, JSON, XML, etc.), unstructured data types (images, documents, networks, molecules, etc.), or time series data.
Connect to a host of databases and data warehouses to integrate data from Oracle, Microsoft SQL Server, Apache Hive, and more. Load Avro, Parquet, or ORC files from HDFS, S3, or Azure.
Access and retrieve data from sources such as Salesforce, SharePoint, SAP Reader (Theobald), Twitter, AWS S3, Google Sheets, Azure, and more.
Shape your data
Derive statistics, including mean, quantiles, and standard deviation, or apply statistical tests to validate a hypothesis. Integrate dimensionality reduction, correlation analysis, and more into your workflows.
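As an illustrative sketch, the summary statistics named above can be computed in plain Python, the kind of snippet you might drop into a Python Script node. The column values here are made-up sample data:

```python
import statistics

# Hypothetical numeric column from a table
values = [4.1, 5.0, 3.8, 6.2, 5.5, 4.9, 7.1, 5.3]

mean = statistics.mean(values)    # arithmetic mean
stdev = statistics.stdev(values)  # sample standard deviation

# statistics.quantiles returns n-1 cut points; n=4 gives the quartiles
q1, median, q3 = statistics.quantiles(values, n=4)

print(f"mean={mean:.2f} stdev={stdev:.2f} median={median:.2f}")
```

In a real workflow these numbers would come from dedicated statistics nodes rather than hand-written code; the sketch only shows what those nodes compute.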
Aggregate, sort, filter, and join data on your local machine, in-database, or in distributed big data environments.
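To make those four operations concrete, here is a minimal plain-Python sketch over invented rows, roughly what KNIME's GroupBy, Joiner, Sorter, and Row Filter nodes do without any code:

```python
from collections import defaultdict

# Hypothetical rows standing in for two input tables
orders = [
    {"customer": "A", "amount": 120.0},
    {"customer": "B", "amount": 75.0},
    {"customer": "A", "amount": 30.0},
]
regions = {"A": "EMEA", "B": "APAC"}  # lookup table for the join

# Aggregate: total amount per customer
totals = defaultdict(float)
for row in orders:
    totals[row["customer"]] += row["amount"]

# Join the region onto each total, filter out small totals,
# and sort the result by total, descending
report = sorted(
    ({"customer": c, "region": regions[c], "total": t}
     for c, t in totals.items() if t > 50),
    key=lambda r: r["total"],
    reverse=True,
)
print(report)
```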
Clean data through normalization, data type conversion, and missing value handling. Detect out-of-range values with outlier and anomaly detection algorithms.
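The three cleaning steps can be sketched in a few lines of Python on an invented column; this shows the underlying operations (median imputation, min-max normalization, and a simple 2-sigma outlier rule), not KNIME's actual implementation:

```python
import statistics

# Hypothetical column with a gap (None) and one extreme reading
raw = [10, 12, 11, 13, 12, 11, 10, 13, None, 98]

# Missing value handling: impute gaps with the column median
present = [v for v in raw if v is not None]
filled = [statistics.median(present) if v is None else v for v in raw]

# Min-max normalization to the range [0, 1]
lo, hi = min(filled), max(filled)
normalized = [(v - lo) / (hi - lo) for v in filled]

# Outlier detection: flag values more than 2 standard deviations from the mean
mean = statistics.mean(filled)
stdev = statistics.stdev(filled)
outliers = [v for v in filled if abs(v - mean) > 2 * stdev]
print(outliers)
```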
Extract and select features (or construct new ones) to prepare your dataset for machine learning, using genetic algorithms, random search, or backward and forward feature elimination. Manipulate text, apply formulas to numerical data, and apply rules to filter out or mark samples.
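As a toy illustration of backward feature elimination: start with all features, repeatedly drop the one whose removal most improves a score, and stop when no removal helps. The dataset and error function below are invented for the example (the target happens to equal the sum of features 0 and 2, so feature 1 is noise):

```python
# Hypothetical dataset: target equals x0 + x2, so x1 is pure noise
rows = [
    ((1.0, 9.0, 2.0), 3.0),
    ((2.0, 1.0, 5.0), 7.0),
    ((4.0, 7.0, 1.0), 5.0),
    ((3.0, 2.0, 3.0), 6.0),
]

def error(features):
    """Toy score: squared error of predicting the target as the
    sum of the selected feature columns (lower is better)."""
    return sum((sum(x[i] for i in features) - y) ** 2 for x, y in rows)

# Backward elimination: drop whichever feature most reduces the error,
# until no single removal helps any more
selected = {0, 1, 2}
while len(selected) > 1:
    best = min(selected, key=lambda i: error(selected - {i}))
    if error(selected - {best}) < error(selected):
        selected.remove(best)
    else:
        break
print(sorted(selected))
```

The noise feature is eliminated; a real workflow would use a proper model and validation data as the scoring function rather than this toy error.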
Leverage Machine Learning & AI
Build machine learning models for classification, regression, dimension reduction, or clustering, using advanced algorithms including deep learning, tree-based methods, and logistic regression.
Optimize model performance with hyperparameter optimization, boosting, bagging, stacking, or complex ensembles.
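At its simplest, hyperparameter optimization is a search over candidate settings, scoring each on held-out data. The sketch below tunes a single made-up hyperparameter (a decision threshold) by grid search over an invented validation set:

```python
# Hypothetical validation set of (model score, true label) pairs
validation = [(0.9, 1), (0.8, 1), (0.65, 1), (0.6, 0), (0.3, 0), (0.2, 0)]

def accuracy(threshold):
    """Fraction of validation examples the thresholded model gets right."""
    hits = sum((score > threshold) == bool(label) for score, label in validation)
    return hits / len(validation)

# Grid search: evaluate every candidate setting and keep the best one
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
best_threshold = max(grid, key=accuracy)
print(best_threshold)
```

In KNIME this kind of search is driven by loop nodes rather than hand-written code; the sketch only shows the idea.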
Validate models by applying performance metrics such as accuracy, R², AUC, and ROC curves. Perform cross-validation to assess model stability.
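Cross-validation itself is simple to sketch: split the data into k folds, train on k-1 of them, score on the held-out fold, and average. Everything below (the fold-splitting helper, the majority-vote toy classifier, the dataset) is invented for illustration:

```python
def k_fold_accuracy(data, k, train_and_score):
    """Split data into k folds; train on k-1 folds, score on the held-out one."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(train_and_score(train, held_out))
    return sum(scores) / k

def majority_classifier(train, test):
    """Toy model: always predict the majority label of the training split."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return sum(y == majority for _, y in test) / len(test)

# Hypothetical dataset of (feature, label) pairs
data = [(x, int(x > 2)) for x in range(10)]
score = k_fold_accuracy(data, 5, majority_classifier)
print(score)
```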
Explain machine learning models with LIME and SHAP (Shapley values). Understand model predictions with interactive partial dependence and ICE plots.
Make predictions using validated models directly, or with industry-leading PMML, including on Apache Spark.
Discover and share insights
Visualize data with classic charts (bar chart, scatter plot) as well as advanced ones (parallel coordinates, sunburst, network graph, heat map), and customize them to your needs.
Display summary statistics about columns in a KNIME table and filter out anything that's irrelevant.
Export reports as PDF, PowerPoint, or other formats for presenting results to stakeholders.
Store processed data or analytics results in many common file formats or databases.
Scale execution with demands
Build workflow prototypes to explore various analysis approaches. Inspect and save intermediate results to ensure fast feedback and efficient discovery of new, creative solutions.
Scale workflow performance through in-memory streaming and multi-threaded data processing.
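Streamed execution means rows flow through the workflow one at a time instead of each node materializing its full output table. A generator pipeline over made-up rows illustrates the principle:

```python
def read_rows():
    """Stand-in for a streamed data source: yields rows one at a time
    instead of loading the whole table into memory."""
    for i in range(100_000):
        yield {"id": i, "value": i % 100}

def keep_large(rows):
    """Stand-in for a filter step in the middle of the stream."""
    for row in rows:
        if row["value"] >= 95:
            yield row

# Each row flows through the pipeline as it is produced, so memory
# use stays constant no matter how many rows the source emits
total = sum(row["value"] for row in keep_large(read_rows()))
print(total)
```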
Harness in-database processing or distributed computing on Apache Spark to further increase computation performance.
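The point of in-database processing is that filtering and aggregation execute inside the database, so only the small result set crosses the wire. An in-memory SQLite database with invented sales rows stands in for a production warehouse here:

```python
import sqlite3

# In-memory SQLite database standing in for a real warehouse
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 75.0)],
)

# Pushed-down aggregation: the database computes the per-region
# totals; the client receives only one row per region
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)
conn.close()
```

KNIME's database nodes build this kind of SQL for you; the snippet only demonstrates why pushing work into the database pays off.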