
KNIME (and data science) in the age of GenAI

April 10, 2024

You may wonder why KNIME isn’t going all in on GenAI, abandoning data science and calling itself KN-AI-ME. For one, it doesn’t fit on a German license plate. But also, we strongly believe that GenAI will continue to have an impressive impact in areas we aren’t even considering yet – and that the broader range of tools and technologies needed to make sense of data will not go away any time soon. KNIME needs to give intuitive access to working with all tools and techniques – not only GenAI. 

GenAI will make data science technology a lot more powerful – but it will not eliminate the need for humans working with it. Quite the opposite, actually.

We have seen similar trends before. In 2013, big data was all the hype: new companies popped up, and existing companies were quick to reposition their software to focus exclusively on big data environments. A few years later, traditional data storage technologies were still around, and what truly mattered was the ability to add big data to the (typically much more diverse) mix. Then, in 2017, autoML came up, and the end of data scientists was proclaimed. Ultimately, however, we learned that autoML can only automate some of the work of data scientists. It became just another tool in the data scientist's toolbelt, helping them work more efficiently – but certainly not making them redundant.

Undoubtedly, GenAI has had, and will continue to have, a more profound impact on data science and on many other aspects of our work than these two examples – not least because it touches many more aspects of our lives. However, not everything in data will be AI, and AI will not replace data scientists, just as it will not replace programmers. It will take care of many of the more routine tasks, and it will continue to stimulate the more creative aspects of our work, but it will not completely replace human intuition, domain knowledge, and experience.

So at KNIME, we will once again not abandon all the other critical pieces of data work to focus exclusively on GenAI. Instead, we will ensure that data teams are supported by, and can work with, the latest and greatest GenAI technologies, so they can hand off the mundane parts of building data science solutions and focus on what's really interesting. We'll also do our best to integrate GenAI into our software so that people can get started more easily and dive deeper into the fields of data science and data analytics. And, just like with big data and autoML, we will continue to make sure developments in the field are accessible from a KNIME workflow. Sometimes you will want to add GenAI to your workflow. Sometimes you will want to wrap GenAI in a workflow to add some sanity checking for the data going in and coming out. And sometimes you will want to use GenAI to summarize your workflow and explain it, or its output, to others. And I am sure we – and, more likely, our community – will come up with even more ways to augment data and data science using AI.

However, with that flexibility comes a fear: How can we make sure that flexible use of GenAI is safe? Can we ensure that only anonymized data is sent to someone else’s AI? Can we somehow guarantee that the output is not complete nonsense (or, worse, unethical or plain illegal)? And can we make sure that usage within your own organization is channeled to the most economical AIs so that consumption doesn’t spiral out of control?

The interesting thing is: None of this is truly new. Building models has always carried the risk of including confidential data or producing biased outputs. Governing data science processes has been – or at least should have been – on people’s radars for years. But the risks have reached a new scale. Before, we worried about this when data science went into production. As long as our team was only exploring new technologies, those risks were small. But now even a small, experimental data science workflow has the power to share extremely confidential information with a shady third-party AI that promises cheaper yet even cooler responses.

It therefore helps that KNIME Hub was built from the beginning with the continuous, safe deployment of data science processes in mind. The mechanisms to validate and safeguard data science processes are already in place, so it was easy for us to add controls around which GenAI is used when working with the Analytics Platform itself and which AI models can be accessed from within KNIME workflows.

It also helps that we have always focused on one unifying and very transparent way to work with and make sense of data. The visual workflow paradigm that underlies virtually every aspect of what we do allows documentation, auditing, and reproducibility every step of the way. And the modular Continuous Deployment of Data Science (CDDS) framework, which is also built using workflows, enables transparent validation of data and workflows to ensure that no unwanted data leaks into models (or to the outside world) and that outputs are sanity checked systematically. This stringent approach extends naturally to GenAI. In the end, we can ensure that no confidential data is sent to an AI model on the outside or used for training. Incidentally, the processes that sanity check the responses from an AI model bear a lot of similarities to those that safeguard classic data science processes.
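To make the idea of wrapping GenAI with checks on the way in and on the way out a bit more tangible, here is a minimal Python sketch. It is not how KNIME Hub or CDDS implement these controls; it only illustrates the pattern. The model name `internal-llm`, the allow-list, the email-only redaction rule, and the `guarded_call` helper are all assumptions made up for this example, and a real setup would need far more thorough anonymization and output validation.

```python
import re
from typing import Callable

# Hypothetical allow-list of approved models (an assumption for this sketch).
APPROVED_MODELS = {"internal-llm"}

# Deliberately simple pattern: only catches email-like strings.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)


def guarded_call(
    model_name: str,
    prompt: str,
    call_model: Callable[[str], str],
    max_chars: int = 2000,
) -> str:
    """Check the model and the data going in, then sanity check what comes out."""
    # Gate 1: only approved models may be called at all.
    if model_name not in APPROVED_MODELS:
        raise PermissionError(f"Model '{model_name}' is not on the allow-list")

    # Gate 2: scrub the input before it leaves our environment.
    safe_prompt = redact_emails(prompt)

    response = call_model(safe_prompt)

    # Gate 3: basic sanity checks on the output.
    if not response.strip():
        raise ValueError("Model returned an empty response")
    if len(response) > max_chars:
        raise ValueError("Model response is suspiciously long")
    if EMAIL_PATTERN.search(response):
        raise ValueError("Model response contains an email address")
    return response


if __name__ == "__main__":
    # Stand-in for a real model: simply echoes the prompt it receives.
    def fake_model(prompt: str) -> str:
        return "Echo: " + prompt

    print(guarded_call("internal-llm", "Contact jane.doe@example.com about Q3", fake_model))
```

The model is passed in as a plain function, so the same three gates apply no matter which AI sits behind it, which is the same reason these checks resemble the ones that safeguard classic data science processes.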

In our view, GenAI will not make data scientists run out of work. It will also not make their lives easier. But it will make their work more fun! It will eliminate the mundane, boring tasks and allow data scientists to focus on the truly interesting, usually complex activities that actually make sense of data. That will never be easy, but we aim to make it as intuitive as possible so the human brain can find even more interesting stuff in data.