KNIME logo
Contact usDownload
Read time: 10 min

The spreadsheet user’s guide to data mastery

Meeting a new standard for data literacy

September 19, 2023
Automation inspiration
Stacked TrianglesPanel BG

The seed that grew into the killer app we all have on our computers was planted in Dan Bricklin’s brain in 1979. He was watching a professor create a financial model on a blackboard in a Harvard Business School classroom. Whenever the professor changed a parameter or identified an error, he erased and rewrote. Bricklin replicated the process on a computer with an electronic spreadsheet. Changing a parameter here now caused a ripple through the table, changing all related values instantly. VisiCalc was born.

And it was revolutionary. It was one of the most critical factors in making personal computers more appealing to ordinary individuals, because it was no longer necessary to be a programmer to use one. Accountants turned 20 hours per week of bookkeeping chores into a few minutes of data entry; budget planners replaced their paper ledgers with it to ask “what if” questions about numerical data and get immediate responses (which wasn’t even possible for giant corporations using mainframe computer systems).

Today, Excel spreadsheets are more than simple calculators. They’ve continued to transform the way businesses handle data and calculations by adding data analysis features and taking increasing advantage of computer processing power and performance. But we’ve reached the point at which spreadsheets can break down.

The table of columns and rows for the corporate balance sheet made perfect sense five years ago. Now, though, we’re dealing with a lot more data, data types, and data sources. That one neat table has become an intricate web of worksheets. This is the point at which spreadsheets become hard to follow, hard to edit, and hard to debug. It’s when volume and complexity swamp what’s achievable within a spreadsheet.

Modern analysts want to have all their critical data at their fingertips to use easily, at any time, for more informed decision making. With stored data growing five times faster than the world economy, there’s an increasing need to use data analysis techniques capable of generating insights from massive volumes of data—data too large to be manipulated by tools with less information processing power.

A new data literacy standard

Data literate individuals need to understand, interpret, and communicate with data. While Excel is a valuable tool to becoming data literate, it’s just one piece of the puzzle. To truly make sense of all the information we have, we need to expand our data literacy skills to include tools and techniques that deal elegantly with the volumes and complexity of today’s data.

To meet the new data literacy standard, we need to improve our data handling skills to contend with large datasets; we need to learn data integration skills to merge complex data in multiple formats from multiple locations; we need to acquire more expertise in advanced analytics techniques to swiftly reveal insight otherwise concealed from us; we need to gain skills in building interactive data visualizations to inject domain expertise as we explore our data; and learn how to scale our work to share and communicate analytic insight with colleagues. We need a data literacy skill set with the analytic breadth and depth that enables us to work with data throughout its lifecycle.

The biggest entry barrier to gaining these skills is user-friendliness of the tool. This takes us back to VisiCalc again.

Remember how VisiCalc gave us a user interface to do accounting more easily (Dan Bricklin calls it a natural, program-by-example interface)? Today, low-code, no-code data science platforms give us an intuitive user interface to access and apply advanced analytics techniques.

The user-friendliness of these tools makes it easier for individuals to not just build an analytics process, but understand the flow of processes and relationships between the different elements. Data analysis processes are assembled by dragging and dropping building bricks that each perform discrete actions on the data. When you run the analysis process, your step-by-step instructions are applied to the data. This clear, intuitive environment enables anyone to start dipping their toes into the world of data science.

Low-code, no-code data science tools let you visualize the process of what you're doing with data and provide you with that "program by example" view – that's a table shown for every output.

A low-code KNIME workflow: Here, the second VLookup node adds a date to the table. You can see what the function has done in the table, which is automatically shown in the bottom half of the screen.

A shift in mindset from worksheets to workflows

The first steps to any analytic process are the same. You gather the data, clean it up, and then apply some calculations. We’re essentially following those steps no matter what, whether we’re preparing a balance sheet or a predictive model. These steps are very repetitive.

In Excel, we start with our data. We perform a number of calculations on it and get it into shape. When we’re finished, we have our table of clean data, but no historical record of what happened. We’ve lost the process, and next time, we have to repeat it.

In the data science tool, we start with the process. We define the series of operations to build a blueprint or “model” process. We run our process by sending the data through it. We never modify the source data. And we keep our process.

Let’s have a closer look at some Excel automation examples.

Automate prep and cleaning chores for data integrity

Bad data integrity is like a bad phone connection. When you’re missing half the words, the conversation won’t be very informational. Data is also pointless if you can’t get any information out of it. When it’s full of missing or duplicate values, it’s not workable. To make it informational, you need to prep and clean the data to make it workable. While there are ways to clean data with Excel, coverage of data operations is low, and you have to do each operation manually for each new task.

You can transpose these steps into a low-code workflow that you can reuse and share again and again. For the purposes of this guide, we'll use KNIME workflows as an example, but there are other low-code tools you can use. (Note, however, KNIME is open source and free to download.) Each step of the workflow is documented so that when shared, each team member easily understands what is happening to the data. You can export the clean data to any format your team members prefer, to their favorite BI tools, or even back to Excel. The entire workflow can also be bundled into a so-called component, and like a macro, used and shared in any workflow. When a change is required in the component, all edits are automatically rippled through to all the workflows that contain this component.

Explore these Excel automation resources around prepping data:

Merge more than 2 data sources? Are you kidding?

When we’re using spreadsheets for data management, data is usually stored in one or more source columns in the spreadsheet, while the other columns contain the macros and formulas for the processing. For example, when you reuse the spreadsheet for this month’s analysis, you copy and paste the new data manually into the dedicated source columns. But when the data rows for the current month are more than the data rows for the previous month, the copy/paste of the data covers regions of the sheet where macros are as yet undefined, producing wrong, unverified sums and macro results.

The lack of a verified, reliable, repetitive way to collect data from multiple sources limits reusability of spreadsheets to very simple processes.

With more sophisticated techniques to pull and import data from multiple sources, you a) don’t have to use Excel anymore as your data source, but you can collect the data from where “it’s at” e.g., scrape the web, pick up data from an API, collect it from a database, and more. And b) you’ll get a verifiable, reliable, and reusable way to collect data. Using data science techniques, you’ll also have no limitations on the volume of data you’re pulling, and you’ll have an accelerated way of processing data.

Here’s some Excel automation inspiration for connecting to source data:

Data visualization

The most important step in the whole data analytics process is communicating the results effectively. But “you have to undergo all manner of coding contortionism to programmatically export a picture via a damn chart object” said a VBA user on Reddit on a post about the worst and best of VBA. One of the most commonly (tediously) developed macros is one that generates summary statistics and pivot tables, and creates charts from raw data.

With the data science tool, you can choose from countless visualizations to display data in colorful charts—no contortionism required. You can choose from a large group of visualization components to simply plug in to your workflow. If you have a Python scripter on your team, they’ll be pleased to know that you can access the vast range of Python-based visualization libraries through KNIME—all without needing to learn to code in Python.

Here are three Excel automation resources on storytelling with data, including a few data viz examples:

Are macros the reason people are looking for Excel alternatives?

Talk to a VBA expert and you’ll hear how brilliant macros are: pieces of code bundled to run repetitive tasks automatically, saving time, and hiking up efficiencies. Talk to IT and they’ll warn against all the harmful code they could contain; Excel’s own security feature even blocks or warns against running them. Talk to a non-VBA expert and you’ll hear about their frustrations when they need to adjust some of the macro’s functionality and then battle with code to do the edits.

While KNIME does require a basic understanding of data analytics concepts, the visual interface for building repeatable, reusable workflows removes the need to code. That means your workflows for data automation can be shared and understood by non-technical users. They’re versatile and can handle complex data pipelines and aren’t limited to a single application. Teams can use them to involve multiple data sources, transformations, and analytical processes across various applications.

Hear from two spreadsheet users in accounting and auditing on the topic of Excel automation with these two resources:

Collaborate with 20/20 vision

There are times when you have to work with a spreadsheet someone else created. You open it and close it immediately. It’s so unwieldy. Your colleague works differently than you, and you know that before you can start working with it, you’re going to have to decipher formulas that extend over multiple rows, and maybe look into the VBA code of the macros. You’re thinking it’ll be faster if you just redesign it.

Manual modifications to spreadsheets are often undocumented and consequently lost over time, making it hard to reproduce results”, said Michael Raithel, Senior Systems Analyst at Westat. Collaborating on spreadsheets that are not-documented is practically impossible.

In the low-code, no-code data science environment, the data underlies the formulas now, which makes it easy to follow. Arrows guide any user through the task; explanatory notes can be inserted underneath each step of the process; longer annotations can be added around groups of related steps. It’s also easier to make changes to individual parts of the process because you can isolate them. That’s trickier with a spreadsheet, because of all the dependencies that are created without any clear structure explaining them.

The inherent good documentation of the process provides transparency, easier sharing and re-application of processes throughout the team.

Automating data analysis in KNIME, a low-code, no-code analytics platform

Learn how commonly used Excel functions and techniques are done in KNIME:

Join the data science club

When you’re considering what tools might be best to automate Excel data analysis, KNIME—as an open source, low-code, no-code platform—gives all users free access to a full data science framework, and all the advanced capabilities and automation this provides.

Most notably, KNIME Analytics Platform is:

  • Open. There’s no need to replace Excel if you don’t want to. KNIME is integrative, so it will sit atop, alongside, or even inside your existing setup.

  • Collaborative. Data teams can create and share work easily and integrate their favorite tools and technologies.

  • Reusable. Repeatable processes are written once and can be wrapped into a “component” (like a macro).

  • A community. You can get your questions answered quickly by community members on the KNIME Community Forum, and tap into KNIME Community Hub to find free solutions to download and use in your own projects.

Try it out yourself. Download KNIME—it’s open source and free of charge.

The Perfect On-Ramp for Spreadsheet Users

The Perfect On-Ramp for Spreadsheet Users

March 15, 2023 | by Dipti Panchwadkar