Data Science Team Managers Name Two Key Factors
I host a monthly podcast on LinkedIn Live called My Data Guest. Each episode features an interview with an expert in data, education, management, and more. All of them are also technical experts in KNIME.
Today let’s see what we can learn from them about implementing an AI strategy within a company.
Note. All 12 episodes are available on my YouTube channel, each offering something to learn.
1st Key Factor: The Talents
Let’s start with the first key factor: the employees. We all know by now that the idea of a unicorn data scientist who can run a whole project by themselves is a myth. You need data analysts for the dashboards, data engineers for the plumbing, data scientists for the machine learning models, and domain and business experts to ensure the trained models are useful. So how can you assemble all those different professionals in a single lab? Where do you start?
We asked Andrea De Mauro (Head of Data & Analytics at Vodafone, Milan, Italy) how he has built his many data science teams over the years. In response, he turned the question around and shifted the focus to the talent already available within a company: people who are just waiting to be educated. The first step in any AI strategy is to repurpose and re-educate those employees who wish for a career change.
“Let’s start where you don't start from! You don’t start by hiring dozens of generic data professionals without first looking at the talents that you already have in your family. You can definitely grow data scientists from the current talents you already have in your company by upskilling those who have curiosity, passion, and willingness to learn.”
This approach has two additional advantages. It will boost the morale of the current lab crew and leverage the domain and business knowledge of the current employees.
“Following this course of action, in my opinion, has two major benefits, which I feel are worth mentioning. Only the person with knowledge and understanding about how the company operates and how data flows through it can really understand the real opportunities and build some meaningful data analytics. It’s refreshing for the professionals, no matter what their background is, to boost their career path and development by getting serious on data analytics. This opportunity should not be restricted to those who have a technical or an IT background.”
2nd Key Factor: The Tools
The second key factor is the tools. The KNIME product landscape offers two software tools for data science:
KNIME Analytics Platform, low code and open source, helps with the creation of workflows for the analysis, transformation, and visualization of the data.
KNIME Business Hub, our commercial offering, provides the IT infrastructure for easy productionization of the applications implemented with KNIME Analytics Platform.
Both tools work in tandem. One supports the development process of data science applications, and the other the productionization and monitoring process. In an enterprise environment, the pure development cycle must be supported by a controlled productionization cycle. Let’s see what our experts think about that.
KNIME Analytics Platform for Development
Our experts value these features of KNIME Analytics Platform for the data science creation process.
Visual programming enables fast development, simple debugging, clear documentation, and easy collaboration. Ease of use comes from the ability to blend many different data types, powerful ETL operations, integration with other languages such as Python and R, and reusability via components.
Evan Bristow (Senior Principal Analyst at Genesys, USA) highlights data blending, especially when a variety of data sources is involved:
“The ability to integrate different data sources and technologies is probably one of the best things in KNIME. I can pull data from essentially anywhere and store it essentially anywhere without worrying about if it will blend. That’s something you often have to deal with in business: you’ve got data stored on a server, you have data in a smartsheet somewhere, and someone sends you an Excel file. With KNIME you can bring all those scattered pieces of information together, build an analysis out of it, and easily put it back on your server to create a visualization.”
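To make the blending idea concrete, here is a minimal sketch in plain Python. It is not how KNIME does it, and every name in it is hypothetical: an in-memory SQLite database stands in for "a server", and a small CSV string stands in for the Excel file someone sends you. The two sources are joined on a customer ID, aggregated, and written back to the database.

```python
import csv
import io
import sqlite3

# Source 1: a database table (in-memory SQLite standing in for "a server").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Source 2: a spreadsheet export (CSV text standing in for an emailed Excel file).
export = io.StringIO("customer_id,revenue\n1,1200.50\n2,980.00\n1,300.00\n")


def blend(conn, csv_file):
    """Join customer names from the database with summed revenue from the file."""
    names = dict(conn.execute("SELECT customer_id, name FROM customers"))
    totals = {}
    for row in csv.DictReader(csv_file):
        cid = int(row["customer_id"])
        totals[cid] = totals.get(cid, 0.0) + float(row["revenue"])
    # One output row per known customer, ordered by ID.
    return [(cid, names[cid], totals.get(cid, 0.0)) for cid in sorted(names)]


blended = blend(conn, export)

# Write the blended result back to the "server" for a downstream visualization.
conn.execute(
    "CREATE TABLE revenue_by_customer (customer_id INTEGER, name TEXT, revenue REAL)"
)
conn.executemany("INSERT INTO revenue_by_customer VALUES (?, ?, ?)", blended)
conn.commit()
```

In a visual workflow, each of these steps (read, join, aggregate, write) would be a separate node, which is what makes the blend easy to inspect.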
Philipp Kowalski (Digitalization Evangelist at Siemens, Germany) vouches for ETL operations:
“In the beginning of a project I use a lot of ETL nodes for data cleaning, data preprocessing, data summarizing, or data export.”
A lot of appreciation for KNIME comes from its intuitive interface, which speeds up development. This story by Vijaykrishna Venkataram (Senior Manager Data Analytics at Relevantz, Chennai, India) is enlightening:
“Probably [the best feature of KNIME Analytics] is speed of implementation. Okay, let me give an example. As you know, the loans in the banking sector are issued in multiple currencies and the currencies keep fluctuating every day. Now, let’s say I’m doing a portfolio review and I want to see how my portfolio looks today. So I need the latest currency value to convert the loans of my portfolio into the actual currency.”
“We have asked our IT team whether it is possible to build an app that updates the exchange rates for about 50-60 currencies on a daily basis. However, it would’ve taken too long for them to build such a pipeline.”
“So, the next best solution was to make use of KNIME. Using the KNIME REST nodes, we created a workflow that automatically crawls the exchange rates provided by the respective currency provider on a daily basis and then updates the exchange rates and writes them back to the database. This solution was literally implemented in a couple of days. Thanks to the KNIME Server we could orchestrate the workflow to run it every day at 12 o’clock.”
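For readers who think in code, the pipeline described in this story can be sketched roughly as follows. This is an illustrative Python analogue under stated assumptions, not the workflow Vijaykrishna's team built: `fetch_rates` is a hypothetical stand-in for the REST call and returns made-up values, the table name `fx_rates` is invented, and an in-memory SQLite database replaces the real one. Scheduling is left out, since in the story that part is handled by KNIME Server.

```python
import sqlite3
from datetime import date


def fetch_rates():
    """Hypothetical stand-in for the REST call to a currency provider."""
    return {"USD": 1.0, "EUR": 1.10, "INR": 0.012}  # made-up rates to USD


def update_rates(conn, rates, as_of):
    """Write the day's rates to the database; rerunning the same day overwrites."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fx_rates ("
        "currency TEXT, as_of TEXT, rate_to_usd REAL, "
        "PRIMARY KEY (currency, as_of))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO fx_rates VALUES (?, ?, ?)",
        [(cur, as_of.isoformat(), rate) for cur, rate in rates.items()],
    )
    conn.commit()


def portfolio_value_usd(conn, loans, as_of):
    """Convert each (currency, amount) loan with that day's rate and sum."""
    total = 0.0
    for currency, amount in loans:
        row = conn.execute(
            "SELECT rate_to_usd FROM fx_rates WHERE currency = ? AND as_of = ?",
            (currency, as_of.isoformat()),
        ).fetchone()
        if row is None:
            raise KeyError(f"no rate for {currency} on {as_of}")
        total += amount * row[0]
    return total


conn = sqlite3.connect(":memory:")
today = date(2024, 1, 15)  # fixed date so the example is reproducible
update_rates(conn, fetch_rates(), today)
update_rates(conn, fetch_rates(), today)  # a second run must not duplicate rows

loans = [("USD", 1000.0), ("EUR", 2000.0), ("INR", 500000.0)]
total = portfolio_value_usd(conn, loans, today)
```

Keying the table on currency and date keeps the daily job safe to rerun, which matters once a scheduler triggers it automatically.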
We heard a similar story from Malik Yousef (Professor, The Head of the Galilee Digital Health Research Center, Israel):
“I save time using KNIME just because it is faster doing it in KNIME. I used Matlab before and, for example, to write the code in Matlab it took me more than one month but to do the same thing in KNIME took me only one and a half days.”
A graphical user interface also facilitates debugging. Again according to Yousef, “with KNIME I don’t have to spend a lot of time on debugging only to find a small bug. It gives me more time to focus on my research.”
KNIME's visual programming environment improves communication among professionals with different backgrounds. “Most biologists are no computational guys. Showing them actual Python, R, or Java code would be difficult. However, using KNIME and showing them a workflow in KNIME Analytics Platform made it easier to communicate with them,” says Malik Yousef.
The integration with Python and R is listed as one of the key features of KNIME Analytics Platform enabling good workflow development. “The ability to combine Python or R with KNIME is very powerful. For a lot of things I’m doing, I use R and Python code. With KNIME, it’s possible to use existing approaches, tools, or algorithms and to combine them,” comments Yousef.
Finally, Bristow reports that components are also a winning factor for KNIME Analytics Platform: “The ability to abstract segments and processes of your workflow into a component. That helps you focus on what you are doing instead of what you are writing.”
KNIME Business Hub for Production
KNIME Business Hub is our commercial offering, designed to help enterprises facilitate and accelerate their productionization process. Unlike individuals working on data science projects, organizations need a reliable and robust process to support the production queue.
Here is how Vijaykrishna explains it:
“How to automatically trigger the workflows and how to set things up? This is where KNIME [Business Hub] becomes really valuable, as you can schedule jobs. For example, this dashboard here is what the end user would see without having to work through the workflows behind it. For someone at the business side, this is all that person needs to see.”
How can a single data professional convince upper management that they need KNIME Business Hub to support their processes? This is what Vijaykrishna suggests:
“Create a tricky business case and solve the problem using KNIME. Show them, for example, the KNIME WebPortal so they see the benefits of it without requiring KNIME knowledge per se.”
Integrating Your Tool of Choice with KNIME
We have also seen that one of the most attractive features of KNIME Analytics Platform is its ability to integrate with Python, R, and many other tools. According to Vijaykrishna, data scientists use KNIME for everyday work and Python for special machine learning tasks:
“Initially, we have been using proprietary software for running our scoring models. We started with SAS but the license got very costly and SAS talents in the market were getting dearer. The younger generation was using more open source software which we wanted to adapt to. So we started using Python for most of our machine learning work and SQL for all our data munging. Now we’re using KNIME for all our ETL tasks and still rely on Python for all the machine learning tasks. However, we are slowly migrating some of our machine learning work from Python to KNIME. Especially with the latest v4.6 release, we can now create nodes with Python.”
Vijaykrishna also mentions BI tools and big data scripts for more specialized tasks. All of these tools (Python, BI, and big data) can be easily incorporated into and controlled from a KNIME workflow:
“Besides KNIME and Python, we use other Business Intelligence software, predominantly PowerBI. Now that we are experimenting with increased size of data, we are looking at distributed systems. Spark looks promising because of its speed and because a KNIME Extension is available.”
No one tool fits all the needs of an AI / data science lab. Rather, an integration of a number of tools is often necessary to address different aspects of the data science applications. KNIME Analytics Platform has an open architecture, which allows for seamless integration of data sources as well as external tools, like Python, R, and many reporting tools.
Combining Talent & Tool for a Successful AI Strategy
There are two main factors in a successful AI strategy: human talent and the right tools. According to our data science and management experts, talent should be fostered internally before looking elsewhere, and tools should cover both development and productionization while remaining open to integration with other external tools.