
Move Fast and Break Trust: The Real Cost of Irresponsible AI Deployments


The story of 2025 in enterprise AI was not the capability curve. It was the governance gap. Models got more useful, agents got more autonomous, and organisations shipped them into production faster than they could write down what “acceptable behaviour” even meant.

This is a short field report — five failure patterns we saw repeatedly in the last twelve months, with the governance control that would have caught each one. None of these are hypothetical. The specifics are disguised, the patterns are real.

1. Hallucinated professional advice

A regional firm deploys a customer-facing assistant. It answers product questions well. It also, when asked, confidently invents a clause of local law and tells a customer they are entitled to a refund they are not entitled to. The customer cites it in a complaint. The firm honours the refund to avoid a dispute, then has to either retrain the bot or disable it.

What would have caught it: A scoped system prompt that explicitly refuses legal, medical, or financial advice and redirects to a human. A red-team pass before launch that tries exactly this. A logging and review pipeline where outputs about policy are sampled weekly.

The cost of the control is roughly one week of careful prompt design and evaluation. The cost of not having it compounds with every customer conversation.
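The scope control above can be sketched as a simple pre-filter in front of the model. This is a minimal illustration, not a production classifier: the topic patterns and redirect copy are placeholder assumptions, and a real deployment would pair this with model-side refusals and the weekly output sampling described above.

```python
import re
from typing import Optional

# Illustrative out-of-scope topics; real patterns need tuning and evaluation.
OUT_OF_SCOPE = {
    "legal": re.compile(r"\b(law|legal|entitled|statute|clause|liab\w+)\b", re.I),
    "medical": re.compile(r"\b(diagnos\w+|symptom|prescri\w+|dosage)\b", re.I),
    "financial": re.compile(r"\b(invest\w+|tax advice|refund polic\w*)\b", re.I),
}

REDIRECT = ("I can't give {topic} advice. I've flagged this conversation "
            "for a human colleague who will follow up.")

def scope_guard(user_message: str) -> Optional[str]:
    """Return a refusal-and-redirect message if the request is out of scope,
    or None to let the model answer normally."""
    for topic, pattern in OUT_OF_SCOPE.items():
        if pattern.search(user_message):
            return REDIRECT.format(topic=topic)
    return None
```

A keyword filter alone will miss paraphrases; it exists to catch the obvious cases cheaply so the red-team pass and output sampling can focus on the subtle ones.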

2. Leaked system prompts and data

A startup ships a coding assistant. Within a week a user posts its full system prompt, including an internal URL and the rough shape of a vendor integration, to a public forum. Within two weeks a competitor has a suspiciously similar product. The prompt was not a secret — but nobody had ever written down that it wasn’t, either.

What would have caught it: Treating every system prompt as eventually public. No secrets, no internal URLs, no “don’t tell the user” instructions you would be embarrassed to see screenshotted. Secrets belong in tool responses behind an authenticated boundary, not in prompt text.

The control is cultural, not technical. It needs to be explicit on day one.
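Even a cultural rule benefits from a cheap tripwire. A hedged sketch of a prompt-hygiene lint that could run in CI; the patterns here are illustrative heuristics, not a complete secret scanner:

```python
import re

# Heuristic patterns for things that should never live in a system prompt.
# Illustrative only; extend with your own internal hostnames and key formats.
SUSPECT_PATTERNS = [
    (r"https?://[\w.-]*internal[\w./-]*", "internal URL"),
    (r"\b(sk|key|token)[-_][A-Za-z0-9]{16,}", "credential-shaped string"),
    (r"(?i)do(n't| not) (tell|reveal|mention)", "hidden-instruction phrasing"),
]

def lint_prompt(prompt: str) -> list:
    """Return a list of findings; an empty list means the prompt passed."""
    findings = []
    for pattern, label in SUSPECT_PATTERNS:
        if re.search(pattern, prompt):
            findings.append(label)
    return findings
```

The lint does not make the prompt safe; it makes the "assume it will be public" rule enforceable instead of aspirational.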

3. Autonomous agents spending real money

A client pilots an agent that can file expense reports. The scope creeps to “also pay small recurring vendors.” A prompt injection embedded in an invoice PDF — classic, well-known, entirely preventable — convinces the agent to change a vendor’s bank details. Two invoices clear before anyone notices.

What would have caught it: Authority separation. An agent can propose payments; a human or a second system with a different attack surface approves them. Anything that moves money has a per-transaction limit, a per-day limit, and an anomaly check. Invoices ingested from untrusted sources are treated as untrusted input — because they are.

The correct mental model is that every LLM is a gullible new hire on their first day. You would not give that person the company credit card.
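Authority separation can be as small as a propose/approve split with hard limits. A minimal sketch under stated assumptions: the limit values are placeholders, and in a real system the approval step would live in a separate service with a different attack surface, not the same process.

```python
from dataclasses import dataclass, field

# Illustrative limits; real values belong in reviewed configuration.
PER_TXN_LIMIT = 500.00
PER_DAY_LIMIT = 2000.00

@dataclass
class PaymentQueue:
    """The agent proposes; a human (or a second system) approves."""
    approved_today: float = 0.0
    pending: list = field(default_factory=list)

    def propose(self, vendor: str, amount: float) -> str:
        # The agent can call this; it cannot move money by itself.
        if amount > PER_TXN_LIMIT:
            return "rejected: over per-transaction limit"
        self.pending.append((vendor, amount))
        return "queued for human approval"

    def approve(self, index: int, approver: str) -> str:
        vendor, amount = self.pending.pop(index)
        if self.approved_today + amount > PER_DAY_LIMIT:
            self.pending.insert(index, (vendor, amount))
            return "rejected: over daily limit"
        self.approved_today += amount
        # In a real system: write approver + decision to the audit log here.
        return f"paid {vendor}: {amount:.2f} (approved by {approver})"
```

The injected invoice can still fool the agent into *proposing* a bad payment; it cannot get one *approved*, and the anomaly ("vendor changed bank details") surfaces at the human step.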

4. RAG pipelines that leak private data

A support assistant is connected to an internal knowledge base. A customer asks a clever question. The retrieval layer, which has no concept of who is asking, returns a snippet from an HR document. The model, doing its job, incorporates the snippet into a helpful answer.

What would have caught it: Access-controlled retrieval. Every document in the index is tagged with who is allowed to see it. The retrieval query is filtered by the caller’s identity before the LLM ever sees the results. “Trust the LLM to redact” is not a strategy; it is a wish.

If your RAG architecture diagram shows the vector store as a single cloud without access control inside it, you have this bug. You just haven’t hit it yet.
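The fix is small enough to sketch. Assuming every document is tagged with its allowed audience at ingest time (the `Doc` shape here is illustrative), the filter runs on the retrieval results before anything reaches the model:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    text: str
    allowed_groups: frozenset  # tagged at ingest time, e.g. {"hr"} or {"everyone"}

def retrieve(query_hits: list, caller_groups: set) -> list:
    """Filter retrieval results by the caller's identity BEFORE the LLM
    sees them. The model never has the chance to 'helpfully' leak."""
    return [d for d in query_hits if d.allowed_groups & caller_groups]
```

In production this filter belongs inside the vector store query itself (as a metadata filter), so unauthorised snippets are never even fetched; the post-filter above is the minimum viable version of the same control.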

5. Autonomous deployment without rollback

A team integrates an AI agent into their CI/CD pipeline. The agent can open pull requests, run tests, and — because it was expedient — merge them when tests pass. A weekend incident produces a stream of well-meaning but subtly wrong merges. By Monday the main branch is in a state no human fully understands.

What would have caught it: The same rules you apply to junior engineers. Agents can open PRs; they cannot merge to protected branches. Every agent action is attributable in the audit log. There is a kill switch, and someone has tested that it works.

AI agents in production are not a new category of thing. They are a new category of employee. Treat them that way.
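Those rules reduce to a gate that every agent action passes through. A hedged sketch, with assumed names throughout (`PROTECTED`, `KILL_SWITCH`, the action vocabulary); the point is the shape — deny-by-policy, log everything, one switch to stop it all:

```python
import json
import time

# Assumed policy: agents may open PRs; they never merge to protected branches.
PROTECTED = {"main", "release"}
KILL_SWITCH = False  # flip to True to halt all agent actions; test that it works

AUDIT_LOG = []

def agent_action(agent_id: str, action: str, branch: str) -> bool:
    """Gate every agent action and log it with enough context to
    reconstruct the decision later."""
    allowed = (not KILL_SWITCH
               and not (action == "merge" and branch in PROTECTED))
    AUDIT_LOG.append(json.dumps({
        "ts": time.time(), "agent": agent_id,
        "action": action, "branch": branch, "allowed": allowed,
    }))
    return allowed
```

On a real platform the same policy is expressed declaratively (e.g. branch protection rules plus a dedicated machine identity for the agent), which is preferable to application code; the audit-log requirement stays either way.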

A short responsible-deployment checklist

If you take nothing else from this, take the list:

  • System prompts are written assuming they will be public.
  • Scope is explicit: the agent refuses out-of-scope requests and says why.
  • Retrieval is access-controlled, not “filtered by prompt.”
  • Any action that spends money, sends external communication, or changes records requires a second authority.
  • Untrusted inputs (PDFs, emails, scraped pages) are clearly marked as untrusted through the stack.
  • Every action taken by the agent is logged with enough context to reconstruct the decision.
  • There is a kill switch. You have tested it.
  • A human reviews a sample of outputs weekly, and writes down what they find.
  • A red-team pass was run before launch and is rerun on significant changes.

The pattern behind the patterns

Every failure on this list shares a shape: a team treated an LLM as a safe, bounded function instead of a probabilistic, adversarially-targetable system. The controls that catch these problems are not exotic. They are the same controls you would apply to a new hire with unknown judgement — scope, supervision, authority limits, logging, and the ability to pull the plug.

We’ve helped several clients unwind exactly these kinds of incidents and stand up the governance frameworks that prevent them from happening again. If you’re about to ship something agentic and want a second opinion before it goes live, that is an hour well spent.