Agentic AI in Banking: Where the Real Opportunities and Constraints Are

Agentic AI is quickly becoming one of the most discussed ideas in banking. Yet the term is still used loosely, often covering everything from copilots and assistants to workflow automation and autonomous decisioning. In a recent interview, Wesley Wuyts, Senior Customer Advisor at SAS, offered a more grounded view. Rather than treating agentic AI as the next fashionable label, he framed it as a response to a practical gap that banks have been wrestling with for years: the distance between insight and action.

That starting point matters. For Wesley, the real promise of agentic AI is not simply that it can generate language or summarise documents. Its significance lies in its ability to help close what he described as “the decision execution gap”. Banks have long used statistical models and machine learning to produce scores, probabilities and recommendations. What has often been missing is a reliable mechanism to turn those outputs into timely action within operational processes. In that sense, agentic AI is being positioned as a means of connecting analysis to execution.

Wesley also pointed to two other drivers behind banks’ interest. One is the pressure to reduce cost to serve, especially in operationally heavy areas such as KYC investigations, complaint handling, and other document-intensive functions. The other is productivity, particularly in software development environments, where the conversation is moving beyond question-and-answer support and towards systems that can take real actions within a workflow.

Together, these use cases show why banks are paying attention. The attraction is not only intelligence, but also orchestration and follow-through.

What makes a system genuinely agentic?

One of the most useful parts of the conversation was Wesley’s attempt to draw a clearer boundary around what should count as agentic AI. This is important because the market is already full of systems being described as agentic when they are little more than improved interfaces on top of existing automation.

Wesley highlighted three core attributes:

Firstly, an agentic system must work towards an objective and break that objective down into the tasks required to achieve it.
Secondly, its planning must be dynamic. In other words, it should not simply follow a fixed script or a hard-coded set of business rules. It should respond to the specific input it receives and determine an appropriate course of action in that moment.
Thirdly, it must have a degree of autonomy in selecting the tools it uses. If every step and every tool choice has already been predefined, then the system is far closer to deterministic automation than to agency.

This definition is useful for banking leaders because it cuts through some of the hype. It also exposes why agentic AI is creating new governance questions. Once a system is decomposing objectives, planning dynamically and choosing its own tools, the issue is no longer simply whether a model is accurate. The challenge becomes whether the wider system can be trusted, monitored and constrained in a regulated setting.

Why some of the current hype is running ahead of reality

Wesley was notably cautious about the extent of automation that banks should hand over to agentic systems today. His view was that some use cases are being oversold, especially when the conversation shifts to fully autonomous end-to-end workflows.

He used KYC as an example. While there are many parts of the customer journey that can be automated or accelerated, Wesley does not believe banks should be comfortable giving complete end-to-end control to an agentic system without a human in the loop. The reason is straightforward. If something goes wrong, the consequences can be serious, from regulatory exposure to reputational damage. That makes blanket autonomy a poor fit for many banking environments, however compelling it may appear in a product demonstration.

He was similarly sceptical about the current enthusiasm for multi-agent systems.

In principle, there may well be a future for multiple agents working together. In practice, Wesley’s point was that many firms are still struggling with the governance and reliability of a single agent in production. If managing one agent in a regulated environment is already difficult, a swarm of interacting agents introduces even more complexity. It may work well in a demo, but that is a very different standard from production-grade reliability in banking.

This distinction between what is impressive in a demonstration and what is robust in production is one of the strongest themes from the interview. It is also one that financial services professionals should pay close attention to.

The market narrative often rewards ambition and novelty. Banks, by contrast, have to live with operational failure, supervisory scrutiny and customer impact.

Why do pilots stall when they meet production conditions?

Wesley identified three practical barriers that frequently appear when organisations try to move from pilot to live deployment.

The first is data access. In a pilot, teams typically work with curated extracts of data in a controlled environment. In production, the picture changes. Data freshness matters. Personally identifiable information must be handled correctly. Access controls come into play. Most importantly, the data seen in a live environment may differ materially from the data used during development. Wesley’s point was that agentic systems are already prone to hallucination by nature. If the underlying data context also changes, that problem can be amplified.

The second barrier is integration depth. Many banks still rely on systems that have closed APIs or, in some cases, no usable APIs at all. Yet an agent can only create value if it can access the tools and systems needed to complete its objective. Where integrations are shallow or fragmented, the promise of autonomous execution quickly runs into hard architectural limits.

The third challenge is auditability. If an organisation wants to run these systems in production, it must be able to understand how outcomes were reached and which tools were used. Wesley was clear that banks cannot simply ask the model to explain itself and treat that as an audit trail. That would amount to asking the very system under review to produce its own account of reasoning. Instead, deterministic logic needs to be built around the agent so that the decision process can be evidenced independently. That is a subtle but important point, and it goes to the heart of control design in regulated settings.
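To make the auditability point concrete, here is a minimal Python sketch of deterministic logic wrapped around an agent's tool calls. It is an illustration under stated assumptions, not a description of any SAS product or of Wesley's own design: the class name `AuditedToolbox`, the `lookup_customer` tool and the hash-chained log format are all hypothetical. The idea it shows is that every tool invocation is recorded by ordinary code outside the model, so the evidence does not depend on the model's own account of its reasoning.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


@dataclass
class AuditedToolbox:
    """Hypothetical wrapper: the agent may only reach tools through call()."""

    tools: Dict[str, Callable[..., Any]]
    log: List[Dict[str, Any]] = field(default_factory=list)

    def call(self, tool_name: str, **kwargs: Any) -> Any:
        # Unknown tools are refused, and the refusal itself is recorded.
        if tool_name not in self.tools:
            self._record(tool_name, kwargs, "rejected", None)
            raise KeyError(f"tool not registered: {tool_name}")
        result = self.tools[tool_name](**kwargs)
        self._record(tool_name, kwargs, "ok", result)
        return result

    def _record(self, tool_name: str, kwargs: Dict[str, Any],
                status: str, result: Any) -> None:
        # Each entry stores the hash of the previous one, so later edits
        # to the log would break the chain and be detectable.
        prev_hash = self.log[-1]["entry_hash"] if self.log else ""
        entry = {
            "seq": len(self.log),
            "at": datetime.now(timezone.utc).isoformat(),
            "tool": tool_name,
            "args": kwargs,
            "status": status,
            "result": result,
            "prev_hash": prev_hash,
        }
        body = json.dumps(entry, sort_keys=True, default=str)
        entry["entry_hash"] = hashlib.sha256(body.encode("utf-8")).hexdigest()
        self.log.append(entry)


# Usage with a made-up tool: the log, not the model, is the audit trail.
box = AuditedToolbox(tools={"lookup_customer": lambda customer_id: {"id": customer_id}})
box.call("lookup_customer", customer_id="C-1")
```

A real deployment would also need to record failed tool executions, protect the log store itself and decide how much of each result may be retained. Those choices are left out of this sketch.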

Accountability does not reduce just because the agent does more work

Our interview also addressed one of the most important questions in agentic AI: accountability. Wesley’s view was clear and compelling. Even if an agent does 80 per cent of the work, the human’s accountability is not reduced by 80 per cent. What changes is the nature of that accountability.

In his framing, the human becomes accountable not for carrying out every individual action themselves, but for the conditions under which the system was allowed to act on their behalf. That means banks need to define clear boundaries of autonomy, what Wesley described as a kind of “decision rights envelope”. The system must know where it can operate freely, where it must escalate, and where it cannot go at all.
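One way to picture such an envelope is as a small, explicit policy object that is checked before the agent acts. The Python sketch below is only an illustration: the names `Envelope` and `Verdict`, the example actions and the amount threshold are all assumptions made for this example, not something described in the interview.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Verdict(Enum):
    ALLOW = "allow"        # the agent may act on its own
    ESCALATE = "escalate"  # a human must decide
    DENY = "deny"          # outside the envelope entirely


@dataclass(frozen=True)
class Envelope:
    """Hypothetical decision rights envelope for one agent."""

    free_actions: FrozenSet[str]
    escalate_actions: FrozenSet[str]
    max_autonomous_amount: float  # illustrative limit, e.g. in EUR

    def check(self, action: str, amount: float = 0.0) -> Verdict:
        if action in self.free_actions:
            # Even a permitted action escalates above the amount limit.
            if amount > self.max_autonomous_amount:
                return Verdict.ESCALATE
            return Verdict.ALLOW
        if action in self.escalate_actions:
            return Verdict.ESCALATE
        # Anything not explicitly listed is refused by default.
        return Verdict.DENY


envelope = Envelope(
    free_actions=frozenset({"request_document", "refund"}),
    escalate_actions=frozenset({"close_account"}),
    max_autonomous_amount=100.0,
)
print(envelope.check("refund", amount=25.0))   # Verdict.ALLOW
print(envelope.check("refund", amount=500.0))  # Verdict.ESCALATE
print(envelope.check("change_credit_limit"))   # Verdict.DENY
```

The default-deny branch is the design choice that matters most here: the humans who define the envelope are accountable for what is listed, and the agent cannot drift into actions nobody reviewed.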

This is a useful way to think about oversight because it avoids a false binary. The question is not whether a human touches every step. It is whether the rules of delegation are clear, defensible and appropriate to the risk involved. That is a much more relevant governance question for banks than the simplistic idea that a person can remain accountable merely by sitting at the end of the process and clicking approve.

The danger of human oversight becoming rubber stamping

That said, Wesley did acknowledge the risk that human oversight can become hollow if workflows are badly designed. In areas such as fraud management, he suggested that the better approach is to use the system to route high-risk or complex cases to humans while giving more autonomy in lower-risk cases. In that model, the point is not to make the human review everything. It is to focus human attention where judgment is most needed.

Just as importantly, when an agent does escalate a case, it should provide the full context behind that escalation. A simple prompt asking the human to approve or reject is not enough. The reviewer needs visibility of the evidence and context the system used so that their intervention is informed rather than ceremonial. This is a crucial design principle. Without context, human oversight easily turns into a rubber-stamping exercise. With context, it has a far better chance of remaining meaningful.
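The routing and context principles above can be sketched in a few lines of Python. Everything named here is hypothetical: the `Case` and `Escalation` types, the 0.7 threshold and the assumption that an upstream model supplies a risk score between 0 and 1. The point is only that low-risk cases proceed without review, while an escalation carries its evidence with it instead of arriving as a bare approve-or-reject prompt.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Case:
    case_id: str
    risk_score: float  # assumed to come from an upstream model, 0.0 to 1.0
    evidence: List[str] = field(default_factory=list)


@dataclass
class Escalation:
    case_id: str
    risk_score: float
    reason: str
    evidence: List[str]  # what the reviewer needs in order to judge


def route(case: Case, threshold: float = 0.7) -> Optional[Escalation]:
    """Return None if the agent may proceed, else an escalation with context."""
    if case.risk_score < threshold:
        return None
    return Escalation(
        case_id=case.case_id,
        risk_score=case.risk_score,
        reason=f"risk score {case.risk_score:.2f} is at or above {threshold:.2f}",
        evidence=list(case.evidence),
    )


low = Case("F-001", 0.12, ["card present", "usual merchant"])
high = Case("F-002", 0.91, ["new device", "three failed PIN attempts"])
print(route(low))   # None: handled autonomously
print(route(high))  # Escalation carrying the reason and the evidence
```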

Build versus buy: where banks should hold their ground

On architecture, Wesley took a pragmatic, layered view. Not every part of the stack needs to be built in-house. In his view, foundation models are increasingly becoming commodities. Since broad access is available across the market, this is an area where buying makes sense.

Where banks should retain more control is in the layers that encode their own decision logic, policy rules, prompting and routing logic. These are the elements that reflect the institution’s risk appetite, operating model and differentiation. According to Wesley, these are precisely the areas that should not simply be bought off the shelf, because doing so weakens a bank’s ability to maintain an edge and complicates accountability if something goes wrong. Vendors can provide frameworks, versioning and reversibility, but the substantive content should remain in the bank’s hands.

That is a sensible distinction. It recognises that banks do not need to reinvent everything, but nor should they outsource the parts of the system that define how decisions are made and governed.

Measuring agents is harder than measuring models

The final major insight concerned measurement. Wesley noted that there are emerging good practices for evaluating agents before deployment, but that this area is still less mature than the measurement of traditional statistical models. That is an important point for banks that may assume model risk techniques can simply be transferred across unchanged.

The difficulty is that agents are not just models. They are systems made up of models, tools, prompts, policies and logic. Measuring them at runtime is even harder. Uptime is easy enough to track, but it says little about whether the agent is doing the right job well. Accuracy sounds like the obvious metric, but Wesley questioned what accuracy really means in this context. Ultimately, banks need to decide which outcomes matter most and define quality in organisational terms, rather than rely on a generic metric supplied from outside.

He also highlighted the limits of using one probabilistic system to judge another, such as having one large language model evaluate the output of another. That approach may be useful, but it is not the whole answer. Wesley pointed to more deterministic methods, including classical NLP techniques, as valuable ways to assess answer quality in a more factual and statistical manner. His underlying message was that robust measurement will require deliberate design, not blind faith in automated scoring.
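The interview does not name specific techniques, so the following is only one illustrative possibility: token-overlap F1, a classical metric that compares an agent's answer with a reference answer without involving a second language model. The function names and the simple tokeniser are assumptions made for this sketch.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric runs; deliberately simple.
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(answer: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against a reference."""
    answer_counts = Counter(tokenize(answer))
    reference_counts = Counter(tokenize(reference))
    # Multiset intersection: each shared token counts at most as often
    # as it appears on both sides.
    overlap = sum((answer_counts & reference_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(answer_counts.values())
    recall = overlap / sum(reference_counts.values())
    return 2 * precision * recall / (precision + recall)


score = token_f1("account frozen", "The account is frozen")
print(round(score, 3))  # 0.667: every answer token matches, half the reference is covered
```

A score like this is repeatable and easy to explain, but it only measures word overlap. It would sit alongside, not replace, the organisation-specific definition of quality discussed above.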

A more realistic view of what comes next

Taken together, Wesley’s comments offer a measured and useful perspective for financial services professionals. He is not dismissing agentic AI. On the contrary, he sees real potential in using it to close the gap between decision and execution, improve operational efficiency and extend productivity gains into more action-oriented workflows.

But his argument is that the real work begins where the demo ends. Banks need to think carefully about data access, integration depth, auditability, boundaries of autonomy, accountability design and runtime measurement. They need to resist the temptation to confuse agency with automation theatre. And they need to focus on where human judgment adds the most value, rather than preserving human involvement in name only.

For banking leaders, that may be the most important takeaway of all.

The future of agentic AI in banking is unlikely to be defined by headline-grabbing visions of autonomous systems running entire workflows alone. It is more likely to be shaped by the quieter, harder task of designing systems that can act usefully within clear limits, produce evidence that stands up to scrutiny, and support people rather than merely bypass them.

On that front, Wesley’s contribution was both practical and timely.

This interview is included as a reference source for our white paper, "AI and The Agentic Future of Banking", which features exclusive interviews with senior banking executives, technology leaders and AI experts, together with insights gathered through The Banking Scene's research, think tanks, industry events and third-party papers. You can download it via the link below.
