Good Data Makes Good Agents

Agentic systems are only as good as the data they have access to, and their effectiveness depends on how representative and complete that data is. Just as humans struggled with the data lake concept (which, way too often, turned into a data swamp), AI can use a bit of help putting some structure around the data wilderness at the heart of most companies.

Agents need broad data access to be effective

Let’s start with an easy example: the infamous smart fridge. Yes, I can ask the built-in AI what dishes I can prepare from its contents. But the moment I ask it “what shall I cook tonight?”, it is at a loss and starts asking all sorts of questions: How many guests are you cooking for? Are there vegetarians? What do they like? When does it need to be ready? A truly useful “kitchen agent” needs access to my calendar, some memory of past events, and ideally even a bit of a rating system (“dinner went well, everybody liked it”).

In this example, we can simply create a system that combines all of that additional information (inside or outside the fridge AI, it doesn’t matter) and give the agentic system access to it. With this data, it’ll likely figure out what I should be cooking. Unless I am a professional host who has invited guests every evening for the past two decades, the amount of data should remain manageable.
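To make the idea of combining that information concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the `DinnerContext` fields, the shape of the calendar event, and the 1-to-5 rating scale are hypothetical and not taken from any real fridge or calendar API.

```python
from dataclasses import dataclass


@dataclass
class DinnerContext:
    """Everything the kitchen agent should know before suggesting a dish."""
    guests: int
    vegetarians: int
    ready_by: str
    fridge: list[str]
    liked_dishes: list[str]


def build_dinner_context(calendar_event: dict, fridge: list[str],
                         past_dinners: list[dict]) -> DinnerContext:
    """Merge calendar, fridge contents and dinner history into one object."""
    # Assumption: a rating of 4 or 5 (out of 5) means "everybody liked it".
    liked = [d["dish"] for d in past_dinners if d["rating"] >= 4]
    return DinnerContext(
        guests=len(calendar_event["guests"]),
        vegetarians=sum(g["vegetarian"] for g in calendar_event["guests"]),
        ready_by=calendar_event["start"],
        fridge=fridge,
        liked_dishes=liked,
    )


# Hypothetical sample data standing in for the real calendar and history.
event = {"start": "19:30", "guests": [{"name": "Ana", "vegetarian": True},
                                      {"name": "Ben", "vegetarian": False}]}
history = [{"dish": "mushroom risotto", "rating": 5},
           {"dish": "fish stew", "rating": 2}]

context = build_dinner_context(event, ["eggs", "mushrooms", "rice"], history)
print(context)
```

With a context like this in hand, the agent no longer has to ask about guests, diets, or timing before making a suggestion.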

Agents need context

In the real world, it’s more complex. There are many more data sets to consider, and they are massive. Just consider asking a question about a customer in a larger company and expecting a holistic response: the agent will need access to your CRM, ERP, ticketing system, forum setup, call logs, and many other resources that might contain information about possible customer touch points. You may even want to add some publicly available information to the mix.

With access to all of these sources, agents can make deeply informed decisions. Many vendors, including those behind your CRM, ERP, and ticketing system, are now trying to upsell you on tool-specific AI features. The example above already illustrates why it is probably not wise to spend lots of money adding a plethora of individual AIs to all of your tools and data sources. They can help you operate that tool, of course, but they can use only the data available within their silo, which makes their insights very limited.

On the other hand, your data platform vendor will now say: why don’t you just copy all of this data over to our system, and our AI will take it from there? However, by copying things around, you are not only wasting time and space but, more importantly, you inadvertently lose structure and metadata. The results will never be as good as if the agent had access to the original sources.

Agents need well-integrated data

Before continuing this thought, let us take a quick peek under the hood: how *does* an agent benefit from all of this information? Agents don’t work in isolation but make use of more specialized agents and sometimes also of less agentic systems referred to as “tools”.

Those tools provide information or take care of tasks that the parent agent doesn’t know how to do. Remember the inability of early chatbots to do math? Well, nowadays they reach out to a tool that does math properly. And similarly, in our example above, the agent will reach out to a tool that knows how to get complete information about a customer. Note that this tool does not just provide access to the CRM and all those other systems directly. Why? Because the agent couldn’t care less about which systems you are using to store various parts of customer info. It only wants information (and all of it, please!) about customers so it can continue doing its job using a holistic perspective.

If this sounds a bit like a data integration challenge, that’s because it is. For truly powerful agents, we need to provide them with the right type of tools: tools that give them the information they need, not just more or less blind access to data sources.
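A minimal sketch of such a tool might look like the following. All names here (`CustomerSource`, `InMemorySource`, `get_customer_overview`) are hypothetical, and the in-memory dictionaries merely stand in for real systems such as a CRM or a ticketing system. The point is only that the agent calls one function and never learns which system holds which piece of customer information.

```python
from dataclasses import dataclass, field
from typing import Protocol


class CustomerSource(Protocol):
    """Anything that can report what it knows about a customer."""
    name: str

    def lookup(self, customer_id: str) -> dict: ...


@dataclass
class InMemorySource:
    """Stand-in for a real system (CRM, ticketing, ...); purely illustrative."""
    name: str
    records: dict[str, dict] = field(default_factory=dict)

    def lookup(self, customer_id: str) -> dict:
        return self.records.get(customer_id, {})


def get_customer_overview(customer_id: str,
                          sources: list[CustomerSource]) -> dict:
    """The one tool the agent calls: a merged view, grouped by source."""
    overview: dict = {"customer_id": customer_id}
    for source in sources:
        data = source.lookup(customer_id)
        if data:  # skip systems that know nothing about this customer
            overview[source.name] = data
    return overview


# Hypothetical sample data for three of the systems mentioned above.
crm = InMemorySource("crm", {"c-42": {"name": "Acme Corp", "tier": "gold"}})
tickets = InMemorySource("ticketing", {"c-42": {"open_tickets": 2}})
calls = InMemorySource("call_logs")  # no entries for this customer

overview = get_customer_overview("c-42", [crm, tickets, calls])
print(overview)
```

In a real setup, each source would wrap an actual system and the merge step would likely do more than group results by source (deduplication, conflict resolution, access control), but the agent-facing signature could stay this simple.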

That was and is a data engineering job. Can we sometimes ask another AI to take care of that? Of course! But for the types of information that are needed regularly (such as: what do we know about this customer), spending the extra effort and providing our agents with the appropriate view on that information is time well spent. Because the less grunt work our agent has to do, the more it can focus on the real job, just like us humans.

And just like before, decoupling data integration from the data platforms future-proofs your investment in agents. That way, when you swap out your ticketing system, you only need to adjust that middle layer of integration tools. The data mesh is alive and well: good data makes humans effective, and it now helps agents do their jobs better, too.
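The ticketing swap can be sketched as follows. Both backends and their record formats are invented for illustration, as are the function names. What the sketch shows is that only the adapter changes, while the tool the agent calls keeps the same output shape.

```python
from typing import Callable

# Two hypothetical ticketing backends with different raw formats.
OLD_SYSTEM = {"c-42": [{"id": 1, "state": "open"},
                       {"id": 2, "state": "closed"}]}
NEW_SYSTEM = {"c-42": {"tickets": [{"key": "T-1", "resolved": False},
                                   {"key": "T-2", "resolved": True}]}}


def open_tickets_old(customer_id: str) -> int:
    """Adapter for the old system: counts tickets whose state is 'open'."""
    return sum(t["state"] == "open" for t in OLD_SYSTEM.get(customer_id, []))


def open_tickets_new(customer_id: str) -> int:
    """Adapter for the new system: counts tickets not yet resolved."""
    tickets = NEW_SYSTEM.get(customer_id, {}).get("tickets", [])
    return sum(not t["resolved"] for t in tickets)


def make_support_tool(count_open: Callable[[str], int]) -> Callable[[str], dict]:
    """Build the agent-facing tool; its output shape never changes."""
    def support_status(customer_id: str) -> dict:
        return {"customer_id": customer_id,
                "open_tickets": count_open(customer_id)}
    return support_status


before = make_support_tool(open_tickets_old)
after = make_support_tool(open_tickets_new)  # the swap: one argument changes
print(before("c-42") == after("c-42"))
```

The agent's prompt, tool description, and downstream logic stay untouched; the migration is confined to one adapter function.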
