
Perspective
As large language models (LLMs) rapidly advance, their ability to execute tasks by leveraging external “tools” has expanded what AI can achieve. These tools — API calls, database queries, computational functions, or custom utilities — supercharge LLMs, moving them beyond simple text generation into operational systems. This is the beating heart of the AI revolution. But an AI that can think like a Ph.D. yet can’t send an email, retrieve information, or connect to enterprise data will have limited real-world impact. We don’t want AI intelligence sitting idle; we want it orchestrating operations across legacy and modern systems, fueling mission-critical decision-making and automating execution.
This is the premise for “agentic AI,” where AI agents take actions on your behalf. However, after conducting our own informal literature review, we’ve confirmed that when you give agentic AI too many tools, it stumbles: it picks the wrong tool for the job, doesn’t realize it has the tools it needs, or misuses the tools it has.
The good news is that the breaking points of agentic AI are knowable, and so is the solution: structured workflows.
Too many tools
In theory, agentic AI can leverage a tool as an external function for a task like booking a flight. The AI uses descriptions of the available tools (one for browsing flights, another for processing payments, and so on) to determine which ones to harness. Successful execution requires the AI to:
- Understand tool descriptions, parameters, and contexts.
- Select the right tool for the query from all available options.
- Use the tool correctly so the agent takes the action the user intends.
- Process results from the tool and integrate them into the task at hand.
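The steps above can be sketched in miniature. The snippet below shows a hypothetical tool definition in the JSON-schema style most LLM APIs use, plus a dispatcher that selects the named tool, validates its arguments, executes it, and returns the result; all names and parameters are illustrative, not from any specific platform.

```python
# Hypothetical tool definition in the JSON-schema style most LLM APIs use.
SEARCH_FLIGHTS = {
    "name": "search_flights",
    "description": "Search available flights between two airports on a date.",
    "parameters": {
        "type": "object",
        "properties": {
            "origin": {"type": "string", "description": "IATA code, e.g. 'JFK'"},
            "destination": {"type": "string", "description": "IATA code, e.g. 'LHR'"},
            "date": {"type": "string", "description": "ISO date, e.g. '2026-03-01'"},
        },
        "required": ["origin", "destination", "date"],
    },
}

# Toy implementation standing in for a real flight-search API.
def search_flights(origin: str, destination: str, date: str) -> list:
    return [{"flight": "XY123", "origin": origin, "destination": destination,
             "date": date, "price_usd": 412}]

REGISTRY = {SEARCH_FLIGHTS["name"]: (SEARCH_FLIGHTS, search_flights)}

def dispatch(tool_call: dict) -> object:
    """Steps 2-4 in miniature: look up the named tool, validate the
    arguments against its schema, execute it, and return the result."""
    entry = REGISTRY.get(tool_call["name"])
    if entry is None:
        raise ValueError(f"unknown tool {tool_call['name']!r}")
    schema, fn = entry
    missing = [p for p in schema["parameters"]["required"]
               if p not in tool_call["arguments"]]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return fn(**tool_call["arguments"])
```

Note that the model never runs the tool itself; it only emits a structured call like the one the dispatcher consumes, which is why clear descriptions and strict validation matter so much.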
Many commercial providers are increasing the number of tools they allow you to give each agent, expanding their theoretical ability to perform. But our research found significant variance between products, with limits ranging from 80 to 512 tools per agent.
While those ceilings sound impressive, just because a platform allows that many tools doesn’t mean they will all work well. Adding a new tool may expand theoretical capability, but in practice it increases cognitive load. Like a human digging through an overstuffed toolbox, the agent can spend more time searching than executing.
Token budgets are real budgets
Even if your AI can technically access hundreds of tools, it can only effectively manage a fraction of them at a time. Long before you approach platform tool limits, you’ll encounter a more practical constraint: the context window.
Every LLM is limited by the maximum number of tokens — or blocks of language — it can process at one time. This “context window” defines how much information the model can remember and apply in a single interaction. If you’ve ever had a long conversation with a chatbot and suddenly it forgets what you already told it, you’ve likely exceeded its context window. For most conversations that isn’t an issue, but it’s a major problem when you start trying to use tools.
Each tool is effectively a block of text instructions — tokens that describe its purpose, parameters, and usage. Every tool definition must fit into the model’s context window alongside the user’s request and the model’s reasoning steps.
Here’s where scale starts breaking down. A single, complex tool can consume upwards of 1,600 tokens. You’ll run out of tokens in your model’s context window well before you hit the limits on the number of tools you're allowed to use across these platforms. And because token usage directly affects cost, every unnecessary — or ineffective — tool increases both financial overhead and computational burden. If every query loads every tool definition, you may be burning hundreds of thousands of tokens per interaction.
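A back-of-envelope check makes the budget concrete. The sketch below estimates the tokens consumed by a set of tool definitions using the common rough heuristic of about four characters per token; real counts require the model’s own tokenizer, and the reserve figure is an illustrative assumption.

```python
import json

CHARS_PER_TOKEN = 4  # rough heuristic; exact counts need the model's tokenizer

def estimate_tokens(tool_definition: dict) -> int:
    """Approximate the prompt tokens a single tool definition consumes."""
    return len(json.dumps(tool_definition)) // CHARS_PER_TOKEN

def definitions_fit(tool_definitions: list, context_window: int,
                    reserve: int = 8_000) -> bool:
    """Check whether all definitions fit in the context window while
    reserving room for the user's request and the model's reasoning."""
    used = sum(estimate_tokens(t) for t in tool_definitions)
    return used + reserve <= context_window
```

Running this kind of check before every deployment, rather than trusting a platform’s advertised tool ceiling, is what keeps token spend predictable.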
Attention is finite
Even when tools technically fit inside the context window, performance degrades as that window fills. LLMs use “attention” to focus on the most important parts of the text and make connections between different parts of your query. But just like humans, their attention is finite. If you spread the model’s attention over dozens of irrelevant tools, its focus will be diluted and critical context can be overlooked.
This isn’t just a theoretical concern; in practice, models begin making tool selection errors well before reaching their maximum context limits. Studies show that when more than five tools are made available, LLMs increasingly pick inappropriate options or provide incorrect parameters, a result of hallucination, confusion, or even sycophancy effects (where the model guesses in order to please the user, even if no tool is appropriate).
Worse yet, tool errors compound over multiple steps, further limiting the ability of LLMs to manage sprawling toolkits. One incorrect tool selection in a chain can cascade into broader task failure. Performance data backs this up. For example, one study notes that state-of-the-art models only picked the right tool 65% of the time, even when provided with all the necessary tool definitions for solving tasks across multi-tool domains. The implication is clear: increasing tool count does not increase reliability.
Why workflows outperform stacked tools
If overloading a single agent with more tools degrades performance, the answer is to distribute those tools across workflows. Instead of one agent juggling dozens of tools, tasks are divided into structured steps and routed to domain-specific agents equipped only with the tools they need to do their respective jobs.
So, rather than asking one agent to be a jack-of-all-trades, you design a workflow that breaks complex processes into subtasks and assigns each to the appropriate specialist — whether finance, logistics, data processing, or knowledge retrieval.
Each agent stays well within the performance sweet spot — a handful of tools, clear context, minimal attention dilution. The platform orchestrates; the specialists execute.
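The routing pattern looks roughly like this. All agent names, tool names, and keywords below are illustrative; a production router would use an LLM or a trained classifier rather than keyword matching, but the principle — each subtask reaches a specialist carrying only a handful of tools — is the same.

```python
# Each specialist agent carries only a few domain tools (names are hypothetical).
SPECIALISTS = {
    "finance": ["create_invoice", "check_budget"],
    "logistics": ["search_flights", "book_hotel"],
    "retrieval": ["search_docs", "fetch_record"],
}

def route(subtask: str) -> str:
    """Naive keyword router standing in for an LLM- or classifier-based one."""
    keywords = {
        "finance": ("invoice", "budget", "expense"),
        "logistics": ("flight", "hotel", "shipment"),
        "retrieval": ("find", "lookup", "document"),
    }
    for agent, words in keywords.items():
        if any(w in subtask.lower() for w in words):
            return agent
    return "retrieval"  # default specialist for unmatched subtasks

def run_workflow(subtasks: list) -> list:
    """Assign every subtask to a specialist; each step then sees only that
    specialist's small toolset, staying in the few-tool sweet spot."""
    return [(route(t), t) for t in subtasks]
```

The key design choice is that no single prompt ever loads the union of all tools; the orchestration layer holds the full inventory, and each agent’s context stays small.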
At a high level, the process of building a workflow is relatively simple:
- Define objectives: Start by clearly identifying the problem you hope to solve to ensure that AI tools are truly aligned with the mission.
- Assess the current state: Evaluate existing processes, data systems, and technological infrastructure to determine if there are any gaps that need to be addressed.
- Validate data: AI workflows are data-intensive, so ensure you’re using high-quality, clean, and secure data sets.
- Select tools: Choose the right tools for each step of the process. Agentic workflows can combine tools, leveraging machine learning (ML) models, existing APIs, SQL queries, or simple “if-then” rules into a single complete process that uses the right technology for each step of the task.
- Test, iterate, and improve: AI systems require active monitoring and iterative improvements, so build feedback loops and performance metrics into workflows to ensure long-term success and adaptability.
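The mixed-technology idea in the tool-selection step can be sketched as a small pipeline. Each function below is a hypothetical stand-in: a plain “if-then” rule, an existing API or SQL lookup, and an ML/LLM step used only where it adds value.

```python
def classify_priority(ticket: dict) -> str:
    # Simple "if-then" rule: a deterministic check needs no model at all.
    return "urgent" if ticket["amount_usd"] > 10_000 else "routine"

def lookup_account(ticket: dict) -> dict:
    # Stand-in for an existing API call or SQL query.
    return {**ticket, "account_status": "active"}

def summarize(ticket: dict) -> dict:
    # Stand-in for an ML/LLM step, invoked only where it adds value.
    return {**ticket, "summary": f"{ticket['priority']} ticket for {ticket['customer']}"}

def run_pipeline(ticket: dict) -> dict:
    """One complete process that uses the right technology at each step."""
    ticket = {**ticket, "priority": classify_priority(ticket)}
    ticket = lookup_account(ticket)
    return summarize(ticket)
```

Because each step is an ordinary function, the feedback loops and performance metrics from the last bullet can be attached per step rather than to one monolithic agent.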
This is the workflow architecture that turns agentic AI from a demo into an enterprise capability.
Solutions like LIGER® provide the orchestration layer, bundling tools logically, enforcing security controls, and routing tasks across agents, so organizations don’t have to build and manage that complexity themselves.
The bottom line: less is (reliably) more
Agentic AI works best when agents are lean and workflows are clean. The answer isn't bigger toolkits — it's smarter architecture. Use workflows to divide and conquer: break complex tasks into simple steps that today's LLMs can reliably complete. Then run these workflows in an orchestration layer that lets the agents tie it all together across the enterprise.
That's how you move from a chatbot juggling five tools to an enterprise capability delivering dozens of coordinated services at scale — reliably, securely, and repeatably.