Groundbreaking approach removes the gap between creating data science and using it in production
Virtual KNIME Spring Summit 2020 dives into the future of data science
ZURICH and BERLIN — April 1, 2020 — KNIME today unveiled a groundbreaking approach — Integrated Deployment — to eliminate the gap between the creation of data science models and their use in production.
Integrated Deployment allows not just a model but all of its associated preparation and post-processing steps to be identified and automatically reused in production with no changes or manual work required. From within the KNIME platform, organizations can replicate the process repeatedly with ease to maintain model performance.
- This not only saves massive amounts of time and frees data science and model operations resources, it also dramatically reduces the risk of errors that can occur when moving from creating a model to deploying a complete production process based on that model.
- Another benefit is that good governance and compliance reporting for regulations such as GDPR and CCPA are fully supported, since the entire creation and production processes are captured and stored in self-documenting workflows.
“Our open approach and close collaboration with the community means that KNIME is always at the forefront of what is possible in data science. Integrated Deployment represents another big step forward,” said Michael Berthold, CEO and co-founder of KNIME. “This solves perhaps one of the biggest problems in data science today by completely eliminating the gap between the art of data science creation and moving the results into production.”
Integrated Deployment is being unveiled today by Berthold in his livestreamed keynote presentation during the virtual KNIME Spring Summit 2020: www.knime.com/integrated-deployment.
Closing the gap: why integrated deployment matters
Integrated Deployment is significant because virtually every business area that uses decision science is affected by this gap. For example, a mobile provider might develop a model to predict whether customers will renew their contracts. This model relies on call transaction data, payment data, and information about support provided. The iterative model creation process discovers that the best model is made by combining 15 pieces of data. Nine of these pieces do not exist in the raw data but were created using both traditional mathematics and advanced techniques. The settings of the modeling method itself have also been tuned for best performance.
Until now, the process of moving that model into production and applying it to new customers has required manual replication of the exact data creation and model settings to ensure that the model could be usable in production. With KNIME Integrated Deployment, however, the created model as well as all required steps and settings are automatically captured and packaged so that the entire production process is, for the first time, instantly available for production use.
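The idea of packaging a model together with the steps that create its input data can be illustrated with a minimal Python sketch. This is an analogy only, not KNIME's implementation: KNIME workflows are built visually, and every name below (`CapturedProcess`, the feature functions, the weights and the threshold) is a hypothetical stand-in chosen for the mobile provider example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, float]


@dataclass(frozen=True)
class CapturedProcess:
    """Hypothetical bundle: derived-feature steps, a scoring function and its tuned threshold."""
    feature_steps: List[Callable[[Record], Record]]
    score: Callable[[Record], float]
    threshold: float

    def predict(self, raw: Record) -> bool:
        # Re-create the derived data in the same order as during model creation.
        row = dict(raw)
        for step in self.feature_steps:
            row.update(step(row))
        return self.score(row) >= self.threshold


def add_avg_call_minutes(row: Record) -> Record:
    calls = row["call_count"]
    return {"avg_call_minutes": row["call_minutes"] / calls if calls else 0.0}


def add_late_payment_rate(row: Record) -> Record:
    payments = row["payments"]
    return {"late_payment_rate": row["late_payments"] / payments if payments else 0.0}


def renewal_score(row: Record) -> float:
    # Assumed hand-picked weights standing in for a trained model.
    return (0.5
            + 0.01 * row["avg_call_minutes"]
            - 0.8 * row["late_payment_rate"]
            - 0.05 * row["support_tickets"])


process = CapturedProcess(
    feature_steps=[add_avg_call_minutes, add_late_payment_rate],
    score=renewal_score,
    threshold=0.5,
)

# A new customer arrives as raw data only; the derived fields are rebuilt automatically.
new_customer = {
    "call_minutes": 300.0, "call_count": 60.0,
    "payments": 12.0, "late_payments": 1.0, "support_tickets": 2.0,
}
will_renew = process.predict(new_customer)
```

Because the derived fields and the tuned threshold travel with the scoring function, nobody has to re-create them by hand when the process is applied to new customers.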
Back to basics: KNIME refines end-to-end data science
KNIME’s Integrated Deployment approach represents the next step in the evolution of data science. Traditionally, the end-to-end data science process starts with raw data and ends with the creation of a model, but the model cannot be moved into daily production use without a lot of additional work. This is because every machine learning model uses data that have been specially optimized for it. When that model is made available in production, it requires the data in exactly the correct form.
Data science offerings to date have allowed data scientists to save the model and provide access to their library for production use, but the process of recreating the exact data required by the model is manual and involves investigating the optimized creation process to identify just those final steps required. This is then followed by manually recoding or moving portions of that creation process to generate a production process. In some cases, data scientists even need to leave an environment and rebuild something different to be able to put the model in production. No matter which approach is used, it takes time and introduces a risk of errors creeping into the productionizing process.
How it works in KNIME
KNIME’s Integrated Deployment is the first approach to address these challenges effectively. Using the open-source KNIME Analytics Platform, a workflow is created to generate an optimal model. Integrated Deployment allows a data scientist to mark the portions of the workflow that would be necessary for running in a production environment, including data creation and preparation as well as the model itself, and save them automatically as workflows with all appropriate settings and transformations saved. There is no limitation in this identification process: it can be as simple or as advanced (and complex) as required.
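The marking step described above can be pictured with a short Python sketch. It is a conceptual illustration under stated assumptions rather than KNIME's mechanism: the `Step` type, the `capture` flag and the helper functions are all hypothetical, and the single-number "data" is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    run: Callable[[float], float]
    capture: bool = False  # hypothetical flag: True if the step is needed in production


def extract_production(steps: List[Step]) -> List[Step]:
    """Keep only the marked steps, preserving their original order."""
    return [s for s in steps if s.capture]


def run_all(steps: List[Step], value: float) -> float:
    """Apply each step in turn to the input value."""
    for s in steps:
        value = s.run(value)
    return value


# The creation workflow mixes production-relevant steps with creation-only work.
creation_workflow = [
    Step("normalize", lambda x: x / 100.0, capture=True),
    Step("tune_settings", lambda x: x),  # creation-only placeholder, not captured
    Step("apply_model", lambda x: 1.0 if x > 0.5 else 0.0, capture=True),
]

production_workflow = extract_production(creation_workflow)
```

Here the production workflow is derived from the creation workflow instead of being rewritten, so the two cannot drift apart through manual recoding.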
KNIME Integrated Deployment automatically creates production data science
With KNIME Server in production, these captured workflows are then referenced and reused. There is no need to rewrite or recode any of the process. Moving an optimized process from creation to production can be totally automated or done manually with a simple drag-and-drop from the KNIME Analytics Platform creation environment to the KNIME Server production environment. As all production workflows are also KNIME workflows, users gain all the advantages of documentation, version control, security and collaboration.
For organizations with many production models, this setup gives the additional benefits of being able to take the optimized creation workflows and use them in a scheduled or triggered environment. In doing so, when new models are required in production, the same KNIME Server setup can rerun the creation and optimization workflow automatically, delivering the newly updated and automated production workflows to the business.
To find out more about Integrated Deployment, please visit www.knime.com/integrated-deployment.
KNIME provides open-source software for fast and intuitive access to advanced data science. At the core is the open-source KNIME Analytics Platform, a visual workbench providing a wide range of state-of-the-art analytics tools and techniques to handle any use case — from basics to highly advanced. It is complemented by the commercial KNIME Server which makes data science productive in the enterprise, while staying in the same software environment for deployment, collaboration, management and optimization. Headquartered in Zurich, KNIME has offices in Austin TX, Konstanz and Berlin. Learn more at www.knime.com.
KNIME, KNIME Analytics Platform, and KNIME Server are trademarks of KNIME. All other brand names and product names are trademarks or registered trademarks of their respective companies.