
Just KNIME It!


Correcting Postal Addresses

Challenge 10

Level: Medium

Description: You work as a data analyst for a delivery company, and some packages were not delivered last week due to address typos. Thanks to the postal carriers, addresses that were not found due to typos were marked as such. Given a dataset with successful deliveries (due to no typos) and unsuccessful ones (due to typos), your goal is to automatically fix the incorrect addresses by leveraging the correct ones.

Author: Aline Bessa

Dataset: Postal Data on KNIME Community Hub

Solution Summary:
To tackle this challenge, we first separate the addresses that have typos from those that do not. Next, for each address with a typo, we find the correct address that is the most similar to it, and then replace it.

Solution Details: After reading the dataset with postal addresses with the CSV Reader node, we use the Row Splitter node to separate those with typos from those that are correct. Next, we remove duplicate addresses with the Duplicate Row Filter node and use the String Matcher node to identify, for each incorrect address, the most similar correct one. This information is used to fix each incorrect address with the String Replacer (Dictionary) node.
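The core of the correction step — matching each misspelled address to its most similar correct counterpart — can be sketched outside KNIME with Python's standard-library `difflib` (a minimal sketch with made-up addresses; the String Matcher node uses its own distance settings, and the 0.6 cutoff here is an assumption):

```python
import difflib

def fix_addresses(typo_addresses, correct_addresses):
    """For each misspelled address, substitute the most similar correct one."""
    fixed = {}
    for addr in typo_addresses:
        # get_close_matches returns the best candidates above a similarity cutoff
        matches = difflib.get_close_matches(addr, correct_addresses, n=1, cutoff=0.6)
        fixed[addr] = matches[0] if matches else addr  # keep as-is if nothing is close
    return fixed

corrections = fix_addresses(
    ["123 Mian Street", "45 Oka Avenue"],
    ["123 Main Street", "45 Oak Avenue", "7 Elm Road"],
)
```

The dictionary returned here plays the role of the String Replacer (Dictionary) node's lookup table: keys are the typo'd addresses, values are their fixes.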
See our Solution in KNIME Community Hub

Previous Challenges


Level: Medium

Description: Recently you became more interested in finance, and since you want to learn more about web scraping for work, you decided to unite both interests. Using the KNIME Web Interaction extension, can you navigate to the Economic News section on Yahoo Finance, extract the headers of only the most recent topics that pop up on the webpage, and then make sense of the results visually? Remember to filter out any ads or unrelated banners/headers/content. Hint: Find class tags in the news' XML that are unique to the content you are scraping.

Author: Thor Landstrom

Solution Summary:
To tackle this challenge, we connect to a browser through KNIME Analytics Platform, fetch the most recent content from the Economic News section on Yahoo Finance, extract its headers, and then visualize their corresponding topics as a table.

Solution Details:
We start our solution by connecting to a browser with the Web Interaction Start node. We then navigate to the Economic News page on Yahoo Finance using the Navigator node. We retrieve the XML content of this page, including heading tags and text, with the Content Retriever node. In parallel, since we do not need a browser connection anymore, we close it with the Web Interaction End node. As for the retrieved content, we use the Row Filter node to remove headings that are not tagged as "h3", and then a combination of the XPath node and a second instance of the Row Filter node to identify and remove rows that contain ads or unrelated pages. Finally, we isolate the heading texts with the Column Filter node and visualize them as a table with the Table View node.
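The filter-by-tag-and-class idea behind the XPath and Row Filter steps can be sketched in Python on a toy fragment (a minimal sketch: the real Yahoo Finance markup, and the class name `story-title`, are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

# Toy snippet standing in for the retrieved page content; the real
# Yahoo Finance markup and its class names will differ.
snippet = """
<div>
  <h3 class="story-title">Inflation cools in the eurozone</h3>
  <h3 class="ad-banner">Sponsored: open an account today</h3>
  <h2 class="story-title">Not a headline (wrong tag)</h2>
  <h3 class="story-title">Central bank holds rates steady</h3>
</div>
"""

root = ET.fromstring(snippet)
# Keep only <h3> elements whose class marks them as story content,
# mirroring the Row Filter + XPath combination in the workflow.
headlines = [el.text for el in root.iter("h3") if el.get("class") == "story-title"]
```

Filtering on both the tag (`h3`) and a content-specific class is what drops the ads and unrelated banners, exactly as the challenge's hint suggests.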

See our Solution in KNIME Community Hub


Level: Medium

Description: You are reorganizing a data warehouse in your company, working with a filesystem that creates parent folders if you give it a reference for a child folder. For example, if you ask the filesystem to create “folder1/folder2” and neither folder1 nor folder2 exists, it will create both, with folder2 inside folder1, without raising an error. Given a list of folders, you want to keep only the longest unique child folders, filtering out references to parent folders that will be generated anyway, for efficiency.

Here's an example of an initial list of folders:

- folder1/folder3
- folder1/folder3/folder22
- folder1/folder3/folder22/folder47

After executing your workflow, the list above should only contain a reference for folder1/folder3/folder22/folder47.

Author: Emilio Silvestri

Datasets: Folder Data in the KNIME Community Hub

Solution Summary:
After reading the list of folders, we calculate the depth (number of levels) of each one. Next, we iterate over all folders, searching for parent folders and keeping only the deepest reference among those that share a path. We also keep track of the excluded paths, indicating which deeper child folder made each of them redundant.

Solution Details:
After reading the list of folders with the Table Reader node, we split them into columns with the Cell Splitter node and, with a combination of nodes (Column Aggregator, Math Formula, Column Filter), end up with the number of levels (depth, or cardinality) present in each folder reference. Next, we use the Table Row to Variable Loop Start and Variable Loop End nodes to iterate over all folders. At each iteration, we use the String Manipulation and Row Filter nodes to find and isolate all parent folders of the folder in question. By combining the Table Row to Variable and Rule Engine Variable nodes, we only keep the folder in question if it is the deepest, using the identified parent folders for comparison.
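The overall effect of the loop — dropping every path that some other path extends — can be sketched in a few lines of Python (a minimal sketch; the workflow's loop-based approach additionally records which child made each parent redundant):

```python
def keep_deepest(paths):
    """Drop any path that is a parent of another path in the list."""
    unique = set(paths)
    return sorted(
        p for p in unique
        # p is redundant if some other path extends it by at least one level
        if not any(q != p and q.startswith(p + "/") for q in unique)
    )

folders = [
    "folder1/folder3",
    "folder1/folder3/folder22",
    "folder1/folder3/folder22/folder47",
    "folder2/data",
]
result = keep_deepest(folders)
```

Note the `p + "/"` in the prefix test: it ensures `folder1/folder3` is treated as a parent of `folder1/folder3/folder22` but not of a sibling like `folder1/folder30`.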

See our Solution in KNIME Community Hub


Level: Medium

Description: You work as a freelance photo reporter for wildlife magazines. In your daily work you take a lot of pictures, usually in .JPG format and in different sizes. To be able to sell your photographs to magazines, you need to accommodate their different sizing and formatting requests. To streamline this process, you decide to build a workflow that automates the following steps, sequentially: (1) image resizing -- create a configurable component with three options: do nothing, reduce to a fixed size (150x150), or reduce size keeping the ratio; (2) image format conversion -- create a configurable component with two options: .PNG or .SVG; (3) saving the edited images on your machine.

Author: Roberto Cadili

Datasets: Image Data in the KNIME Community Hub

Solution Summary:
Our solution to this challenge contains two configurable components that let users (1) resize their images in different ways (do nothing, keep the ratio, or use a fixed 150x150 format), and then (2) convert them into .PNG or .SVG format. The images are then saved locally.

Solution Details:
We start our solution by using the List Files/Folders node to get a list of local images in .JPG format. We then use the Path to String node to facilitate the reading of these images, and import them into KNIME Analytics Platform with the Image Reader (Table) node. The images are sent to our first component, Image Resizer. With the Single Selection Configuration node, we allow users to configure this component by choosing a resizing option (do nothing, keep the ratio, or use a fixed 150x150 format). A CASE Switch Start node gets the chosen option and either activates no branch (going straight to the CASE Switch End node), resizes the images to a fixed size with the Image Resizer node, or resizes them using a ratio of 0.3 with another instance of the Image Resizer node. The resized images are then passed to a second component, named Image Converter. This component also uses an instance of the Single Selection Configuration node to let users pick the format they want the images to be in (.PNG or .SVG). The images inside the component are initially converted to .PNG and then passed as input to an instance of the CASE Switch Start node. If the chosen option is .PNG, this node activates a branch that simply ungroups the images with the Ungroup node. If the chosen option is .SVG, the images are also ungrouped with another instance of the Ungroup node but are turned into .SVG with the Renderer to Image node. Both branches meet as inputs for the CASE Switch End node. Outside this second component, we use the String Manipulation node to create a filename column, and then save the edited images locally with the Image Writer (Table) node.
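The CASE Switch dispatch on the resize option boils down to choosing output dimensions per option, which can be sketched in plain Python (a minimal sketch; the option labels, the 0.3 ratio, and the 150x150 fixed size follow the workflow, while the function name and rounding are illustrative choices):

```python
def target_size(width, height, option, fixed=(150, 150), ratio=0.3):
    """Mimic the CASE-switch logic: pick output dimensions per resize option."""
    if option == "do nothing":
        return (width, height)
    if option == "fixed size":
        return fixed                      # 150x150, ignoring the aspect ratio
    if option == "keep ratio":
        return (round(width * ratio), round(height * ratio))
    raise ValueError(f"unknown option: {option}")

sizes = {opt: target_size(1200, 800, opt)
         for opt in ("do nothing", "fixed size", "keep ratio")}
```

In the workflow, each `if` branch corresponds to one output port of the CASE Switch Start node, and the three branches reunite at the CASE Switch End node.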

See our Solution in KNIME Community Hub

Level: Medium

Description: As the 2024 European Football Championship (UEFA) unfolds, let's dive into football history with a data challenge. Today you are asked to create a data app that allows users to check, for any timeframe, which three teams had the most football victories. Who are the top three teams of all time? And who were the top three teams in the 1980s?

Author: Michele Bassa

Datasets: Football Data in the KNIME Community Hub

Solution Summary:
After reading the football data and determining wins, losses, and ties, we create a data app that allows users to pick a temporal interval and then check which three teams had the most victories.

Solution Details:
We start our solution by reading the football data with the CSV Reader node, transforming dates into Date format in the node's Transformation tab. Next, we use the Rule Engine node to determine wins, losses, and ties for home teams. This data is then sent to a component (data app) that allows for the temporal filtering of the data. Two instances of the Date&Time Widget node let users select the start and end dates of a temporal period, for which a team ranking will be calculated. The selected dates are passed to two instances of the Date&Time-based Row Filter node, reducing the data to a specific period. After that, two parallel branches use the Row Filter, Column Filter, and GroupBy nodes to select those matches in which the home team (top branch) or away team (bottom branch) wins. Both victory numbers are combined with the Joiner node, and then the Top k Row Filter node selects the top three teams for the selected period. This information is then plotted with the Bar Chart node.
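The date filtering and win counting at the heart of the data app can be sketched in Python (a minimal sketch over a few made-up matches; the record layout and team names are illustrative, not the challenge dataset):

```python
from collections import Counter
from datetime import date

# Hypothetical match records: (date, home_team, away_team, home_goals, away_goals)
matches = [
    (date(1984, 6, 12), "France", "Denmark", 1, 0),
    (date(1984, 6, 27), "France", "Spain", 2, 0),
    (date(1988, 6, 25), "Netherlands", "USSR", 2, 0),
    (date(2000, 7, 2), "France", "Italy", 2, 1),
]

def top_teams(matches, start, end, k=3):
    """Count wins (home or away) inside [start, end] and return the top k."""
    wins = Counter()
    for played, home, away, hg, ag in matches:
        if not (start <= played <= end):
            continue
        if hg > ag:
            wins[home] += 1
        elif ag > hg:
            wins[away] += 1   # ties award no win to either side
    return wins.most_common(k)

eighties = top_teams(matches, date(1980, 1, 1), date(1989, 12, 31))
```

The two `if` branches correspond to the workflow's two parallel branches (home-team wins on top, away-team wins on the bottom), and `most_common(k)` plays the role of the Top k Row Filter node.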

See our Solution in KNIME Community Hub

Level: Medium

Description: As a member of a think tank, your task is to craft a report on LGBTQIA+ representation in political discourse. Given an EU dataset gathering responses from LGBTQIA+ individuals across all member states, you decide to start your work by investigating the answers to the following question: "In your opinion, how widespread is offensive language about lesbian, gay, bisexual, and/or transgender people by politicians in the country where you live?”.

Use a map to present the results effectively.

Author: Michele Bassa

Datasets: LGBTQIA+ Survey Data in the KNIME Community Hub

Solution Summary:
To tackle this challenge, we reduce the scope of the data to question "In your opinion, how widespread is offensive language about lesbian, gay, bisexual and/or transgender people by politicians in the country where you live?". We then filter the answers and only keep the most common ones: "rare" and "widespread". This facilitates the understanding of trends and patterns across countries. We compute the percentages of answer "widespread" for every country and also compute their map coordinates. Finally, we join the geospatial information and the computed percentages and plot them in a map.

Solution Details:
After reading the survey dataset with the CSV Reader node, we prepare the data by reducing it to question "In your opinion, how widespread is offensive language about lesbian, gay, bisexual and/or transgender people by politicians in the country where you live?", and to its two most common answers, "rare" and "widespread". We also group the data by country, keeping the totals for both answers. We loop over this data (Group Loop Start and Loop End nodes) to compute the percentages of answer "widespread" for every country, using the Math Formula node (we compute the denominator for these percentages with the Moving Aggregator node). Next, we run another loop (Table Row to Variable Loop Start and Loop End nodes) to find the map coordinates of each country with the OSM Boundary Map node. We join the previously calculated percentages to the data with the map coordinates (Joiner node), use the Projection node to improve formatting for visualization, filter irrelevant data with the Row Filter node, and then finally plot the computed information with the Geospatial View node.

See our Solution in KNIME Community Hub

Level: Medium

Description: You work for the United Nations and want to discuss how the causes of death vary across the European Union (EU). You know how to analyze data and generate insightful visualizations, but the data you have at hand is a bit challenging: the meaning of its different columns and codes is not clear. To conclude your work well, you will have to integrate this data with some metadata in XML format, making sense of the different death causes and data attributes. What patterns can you find in the different countries?

Author: Emilio Silvestri  

Datasets: Demographic Data from the EU in the KNIME Community Hub

Solution Summary:
Our solution to this challenge can be split into two steps. First, we identify the code for the top cause of death in each country, regardless of sex or age; next, we match these codes with metadata describing what they are and sort the countries based on these descriptions. For 27 (out of 35) countries, "diseases of the circulatory system" is the main cause of death; for 8 (out of 35) countries, the top cause of death is "neoplasms".

Solution Details: With the CSV Reader node, we ingest the dataset on EU death causes in 2021. Next, with a series of Column Filter and Row Filter nodes, we reduce the dataset to what is pertinent to the analysis: codes for causes of death per country, regardless of sex and age. We then use a loop (Group Loop Start and Loop End nodes) with the Top k Row Filter node to identify the top cause-of-death code per country. At the end of this branch, we have the codes that correspond to the top death causes all over the EU, but cannot make sense of them yet. To this end, in parallel, we ingest metadata on the death causes with the XML Reader node. Using a series of XPath nodes, we extract column names, descriptions, and other values from the metadata. The descriptions and values come in lists, and to facilitate matching them later with the death cause codes from the original dataset, we use the Ungroup node to break the lists into single tokens. We filter the resulting data to only keep rows that correspond to causes of death (Row Filter node), and then use the Value Lookup node to match these causes with their codes in the original dataset. Finally, we sort the data with the Sorter node and conclude that the top cause of death in most EU countries has to do with diseases of the circulatory system.
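The two-step logic — a group-wise maximum per country, then a lookup of each code's description — can be sketched in Python (a minimal sketch; the row layout, the death counts, and the two-entry metadata table are invented for illustration, though the two descriptions match those named in the summary):

```python
# Hypothetical rows: (country, cause_code, deaths), plus a metadata lookup
# table mapping codes to descriptions (the role of the XML metadata).
rows = [
    ("DE", "I00-I99", 340_000), ("DE", "C00-D48", 230_000),
    ("FR", "C00-D48", 160_000), ("FR", "I00-I99", 140_000),
]
code_meta = {"I00-I99": "diseases of the circulatory system",
             "C00-D48": "neoplasms"}

def top_cause_per_country(rows, meta):
    """Pick each country's deadliest cause code and attach its description."""
    best = {}
    for country, code, deaths in rows:
        # keep the code with the highest death count seen so far per country
        if country not in best or deaths > best[country][1]:
            best[country] = (code, deaths)
    return {c: meta[code] for c, (code, _) in best.items()}

top_causes = top_cause_per_country(rows, code_meta)
```

The per-country maximum mirrors the Group Loop + Top k Row Filter combination, and the dictionary lookup mirrors the Value Lookup node.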

See our Solution in KNIME Community Hub

Level: Easy

Description: You are a real estate agent working in a new city, and to perform well, your first task is to better understand the houses in the region. A colleague shares a dataset with you, and now it’s time for you to explore it. What has been the average housing price, lot size (in acres), and living space (in sqft) in this city, according to her dataset? How are prices distributed and correlated with housing features? What other insights can you gather from this dataset?

Author: Thor Landstrom 

Dataset: Real Estate Data in the KNIME Community Hub

Solution Summary: To tackle this challenge, we compute some general statistics of the dataset, such as average price, lot size, and living space. We also calculate the linear correlation for all pairs of numerical features, uncovering which housing attributes correlate most strongly with price. On average, central Seattle is the priciest area in the region, but there are a few other relevant clusters to the south and to the east.

Solution Details: After ingesting the housing data with the CSV Reader node, we compute Pearson's linear correlation for all pairs of numerical attributes with the Linear Correlation node. The results are plotted with the Heatmap (JavaScript) node, revealing which housing attributes relate the most to price. In parallel, we use the Column Filter node to remove unnecessary columns, and convert the lot size information into acres with the Math Formula node. We use the Statistics View node to get important housing summaries, including average lot size and price, and group the data by zipcode with the GroupBy node. In the aggregation, we calculate the average housing price per zipcode and the median latitude and longitude values. The Lat/Lon to Geometry node uses the median values per zipcode to generate geometries, which are then visualized with the Spatial Heatmap node.
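Pearson's linear correlation, which the Linear Correlation node computes for every pair of numerical attributes, can be written out directly (a minimal sketch with invented listing data; the real dataset has many more features and rows):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation: covariance divided by the product of std deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical listings: living space (sqft) vs. price (USD)
sqft = [900, 1200, 1500, 2000, 2600]
price = [250_000, 310_000, 400_000, 520_000, 700_000]
r = pearson(sqft, price)
```

A value of `r` close to 1 indicates a strong positive linear relationship, which is what the heatmap of pairwise correlations makes visible at a glance.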

See our Solution in KNIME Community Hub

Level: Easy

Description: You are a climate scientist studying CO₂ emissions. To make your research insights more accessible to your colleagues, and then write a paper about it, you decide to build a report-enabled component in KNIME that allows users to check how emissions vary for different regions and sources. What are the most alarming insights illustrated in such report?

Authors: Armin Ghassemi Rudd and Marina Kobzeva  

Dataset: CO₂ Emissions Data in the KNIME Community Hub

Solution Summary: To tackle this challenge, we manually select the country that ranks highest in terms of CO₂ emissions and create a PDF report showing its historical emissions, how they vary per capita throughout the years, and what sources they are mostly tied to. Different countries can be selected based on their ranking, leading to different visualizations and reports.

Solution Details: After reading the dataset with the Table Reader node, we use the Row Filter node to select a country based on its CO₂ emissions' ranking. Next, we finish our preprocessing by using the Number Format Manager node, selecting how many decimals we want to use in the CO₂ and CO₂ per capita numbers of our report. We create a component named "Report" that contains a few visualizations for our data: two line plots (Line Plot node) for the historical emissions of CO₂ and CO₂ per capita, and a bar chart (Bar Chart node) showing a breakdown of these emissions for different sources. To turn these visualizations into a PDF report, we feed this component with a report template (A4 Landscape) that is specified with the Report Template Creator node. After the component executes, its visualizations are saved as a PDF report with the Report PDF Writer node.

See our Solution in KNIME Community Hub

Level: Easy

Description: You work in finance and one of your clients wants to understand the value of different company stocks over time. Given a dataset of stock prices, you decide to use simple moving averages (window length = 20) to tackle this task. What companies have an upward trend for the most recent data? And what companies have a downward trend?

Author: Thor Landstrom

Dataset: Stock Data in the KNIME Community Hub

Solution Summary: We propose two different solutions to this challenge. The simplest one involves manually filtering the data for a specific company, calculating its moving average, and then visualizing it with a line plot. The second one relies on a simple data app: a company is picked from a dropdown box, its stock prices are filtered, a moving average is computed, and the resulting points are plotted as a line plot.

Solution Details: Both solutions have a core part in common. After the rows for a company are selected, we use the Column Filter node to isolate dates and close prices, do some typecasting with the String to Date&Time node, sort the data from oldest to most recent with the Sorter node, and then use the Moving Average node to compute simple moving averages (window length = 20). Next, we visualize the results with the Line Plot node. In the simplest solution, we use the configuration of the Row Filter node to select the data for a company. In the more complex solution, we get all company names with the "Get company names" metanode, and then pass them, along with the original data, to the "Visualize company stock prices" component. Inside this component, a Single Selection Widget node allows the selection of one of the company names, which in turn is used to control an instance of the Row Filter node. After that, this solution is basically equivalent to the simplest one.
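The simple moving average computed by the Moving Average node can be sketched in a few lines of Python (a minimal sketch; the toy prices and the short window of 3 are for illustration, while the challenge uses a window length of 20):

```python
def simple_moving_average(values, window):
    """Mean over a sliding window; the first full average appears at index window-1."""
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]

# Toy close prices, sorted oldest to most recent as in the workflow
closes = [10, 11, 12, 13, 14, 15]
sma = simple_moving_average(closes, window=3)
```

Comparing the last few averages against the earlier ones is what reveals whether a stock is trending upward or downward in the most recent data.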

See our Solution in KNIME Community Hub

Here is how the challenges work:

1. We post a challenge on Wednesday.
2. You create a solution with KNIME Analytics Platform.
3. You upload it to your public KNIME Community Hub Space.
4. You check your rank on the Just KNIME It! Leaderboard.

Our solution to the challenge comes out on the following Tuesday.

Enjoying our challenges?

They are a great way of preparing for our certifications.

Explore Certification Program

Just KNIME It! Leaderboard

KNIME community members are working hard to solve the latest "Just KNIME It!" challenge - and some of you have solved dozens of them already! Who are the KNIME KNinjas who have completed the most challenges? Click over to the leaderboard on the KNIME Forum to find out! How many challenges have you solved?

Sign up for reminder emails

 

*KNIME uses the information you provide to share relevant content and product updates and to better understand our community. You may unsubscribe from these emails at any time.

Previous Just KNIME It! Challenges

Check out previous seasons of Just KNIME It!