
AI Guardrails: What the Status Quo Gets Wrong

Vivek Khandelwal
Chief Business Officer, CoFounder @ CogniSwitch
Feb 13, 2026·11 Min Read·Updated May 2, 2026
Reviewed by: Dilip Ittyera — CEO & Co-Founder, CogniSwitch

In the last blog, we covered why evals are not audits. Claude rephrased it nicely: evals are vibe audits. The natural next question: can we audit agent responses in real time and catch policy violations as they happen?

That's what guardrails are supposed to be. Before we get there, let's step back and ask: how were guardrails enforced before Nov '22? Did we even have guardrails before GPT?

Guardrails Are Not New

Of course we did. We just didn't call them that. Remember how sign-up forms required a business email and rejected Gmail addresses? That was a classic marketing guardrail.

  • Every claims processing system had validation rules.
  • Every EHR had required fields and range checks.
  • Every prior auth workflow had decision trees that enforced policy before a human saw the output.

If a dosage fell outside the approved range, the system rejected it. Not probabilistically. Deterministically.

These weren't sexy. Nobody built a pitch deck around them. But they worked. A business rule that says "pediatric dosage cannot exceed X mg/kg" doesn't hallucinate. It doesn't drift.
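A rule like that is a few lines of ordinary code. A minimal sketch, assuming an illustrative ceiling and field names (not values from any real formulary):

```python
# Deterministic guardrail: a pediatric dosage ceiling expressed as a plain
# business rule. The same input always produces the same verdict.
MAX_PEDIATRIC_MG_PER_KG = 10.0  # illustrative ceiling, not a clinical value

def check_dosage(dose_mg: float, weight_kg: float) -> tuple[bool, str]:
    """Return (allowed, reason). Out-of-range values are rejected outright."""
    if weight_kg <= 0:
        return False, "invalid weight"
    mg_per_kg = dose_mg / weight_kg
    if mg_per_kg > MAX_PEDIATRIC_MG_PER_KG:
        return False, f"dosage {mg_per_kg:.1f} mg/kg exceeds ceiling"
    return True, "within approved range"
```

No model, no sampling, no drift: `check_dosage(500, 20)` rejects every time, for the same stated reason.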

What Actually Changed in the Post-GPT World

Pre-GPT vs Post-GPT: Prior Authorization Example (Fig 1)

  • Input: structured form (CPT code, ICD-10, patient age) → free-text clinical note with narrative
  • Processing: decision tree matches codes, binary (present or absent) → agent interprets narrative, infers meaning
  • Output: approved/denied with a reason code, done → natural language explanation, generated text
  • Guardrail: 500mg > 200mg ceiling, reject, no ambiguity → content safety checks the surface, not the substance

"Both input and output went from structured to unstructured. The guardrails that worked on structured data have nothing to grab onto."

When faced with a hard question, we answer an easier one instead, and don't notice we've done it.

Daniel Kahneman

The Substitution Effect

Fig 2: The Substitution Effect

The hard question: "Does this AI output comply with our SOPs?" Requires reasoning over policy logic, version checks, patient-category matching. Slow, deliberate, System 2 work. Needs a formal knowledge structure: ontology, versioned policy rules.

The easy question that gets answered instead: "Is this output harmful or off-topic?" A fast, surface-level check. System 1 work. Needs only a content classifier.

Guardrails Are THE Feedback Layer

Here's the thing most teams miss: guardrails aren't just a safety or compliance enforcement system. They provide feedback and set the foundation for a closed-loop learning system.

The point isn't just to block bad outputs. It's to catch what went wrong, figure out why, and feed that back into the system so it gets better.

  • Shallow feedback: "Toxic content detected" → block the response, move on. System learns nothing useful.
  • Rich feedback: "Response cited Protocol v2021 when v2024 changed the threshold from 5.0 to 7.5. Knowledge base needs update." → System learns exactly what to fix.

The first is whack-a-mole. The second is an actual learning loop. Most "guardrails" stop at the first. Regulated industries need the second.
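The difference between the two is visible in the shape of the feedback record itself. A hypothetical sketch (the field names are ours, not a real API):

```python
from dataclasses import dataclass

@dataclass
class ShallowCatch:
    # All the shallow layer can say: something was blocked.
    label: str            # e.g. "toxicity"
    blocked: bool = True

@dataclass
class RichCatch:
    # A rich catch points back at a specific, fixable cause.
    violated_clause: str   # which policy clause the response broke
    cited_version: str     # what the response actually relied on
    expected_version: str  # what policy currently requires
    remediation: str       # the concrete fix the loop feeds back

shallow = ShallowCatch(label="toxicity")
rich = RichCatch(
    violated_clause="Protocol threshold check",
    cited_version="v2021",
    expected_version="v2024",
    remediation="Knowledge base still serves v2021; reingest v2024 (threshold 5.0 -> 7.5)",
)
```

Nothing downstream can learn from `shallow`; everything downstream can act on `rich`.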

Shallow vs Rich Feedback Loops (Fig 3)

  • Toxicity detected: shallow, block and move on → whack-a-mole, no learning
  • PII leak found: shallow, block and move on → whack-a-mole, no learning
  • Jailbreak blocked: shallow, block and move on → whack-a-mole, no learning
  • Protocol version mismatch: rich, traces to a specific source → system learns exactly what to fix

Most guardrails are shallow. Regulated industries need the rich feedback loop.

Guardrails = Real-Time Audits

If evals are post-deployment quality checks, and audits are post-deployment compliance proof — then guardrails should be audits done in real time, with enforcement. Not "did the AI say something harmful?" but "did the AI follow Protocol X, Section 4.2, using the correct version?"

The Guardrails 2×2

  • Runtime, basic quality: content safety filters (the status quo)
  • Runtime, SOP compliance: the coverage gap
  • Post-deployment, basic quality: evals
  • Post-deployment, SOP compliance: audits

Meet Human-In-The-Loop

"Don't worry, we have a human-expert-in-the-loop." This is the phrase that ends the safety conversation too early. And to be fair — HITL makes sense as a bridge.

But here's where it breaks: human-in-the-loop distributes liability without solving the underlying problem. Throughput beats accuracy. Every time.

We expanded this into a full diagnostic: Phantom Human-In-The-Loop →

The Alternative: Neuro-Symbolic Guardrails

Guardrails need to be grounded in a source of truth — not 1,000 embeddings of chunks of text.

The LLM handles what it's good at — interpreting unstructured language, extracting clinical intent. Then a symbolic reasoning layer — ontologies, decision graphs, versioned policy rules — checks that intent against your actual SOPs. Deterministically. With traceability.
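A minimal sketch of that composition, assuming the LLM has already extracted structured intent from the free-text note (the rule set, versions, and field names here are illustrative):

```python
# Symbolic layer: versioned policy rules, checked deterministically against
# the intent the LLM extracted from unstructured text.
POLICY_RULES = {
    ("protocol_x", "v2024"): {"threshold": 7.5},
    ("protocol_x", "v2021"): {"threshold": 5.0},  # superseded
}
ACTIVE_VERSION = {"protocol_x": "v2024"}

def audit(intent: dict) -> dict:
    """Check extracted intent against the active policy version, with a trace."""
    protocol = intent["protocol"]
    active = ACTIVE_VERSION[protocol]
    rule = POLICY_RULES[(protocol, active)]
    return {
        "compliant": intent["value"] <= rule["threshold"],
        "checked_against": f"{protocol} {active}",  # traceability, not vibes
        "threshold": rule["threshold"],
        "value": intent["value"],
    }

# The LLM's job ends once it has turned the narrative note into this dict.
result = audit({"protocol": "protocol_x", "value": 6.0})
```

Note that a value of 6.0 passes under v2024 but would have failed under the superseded v2021 rule, which is exactly why the verdict carries the version it was checked against.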

That's how you get from "does this sound right?" to "does this provably follow the right protocol?" — which is what guardrails were always supposed to do.

Explore the architecture: Neuro-Symbolic AI — A Practitioner's Taxonomy →

Frequently Asked Questions

Most enterprise deployments layer SOP compliance checks on top of content safety guardrails. What's wrong with that sequence?

Nothing is wrong with the sequence. The problem is when the content safety dashboard shows green and the compliance conversation stops there. Content safety asks: is this output harmful or off-topic? SOP compliance asks: does this output follow the specific version of Protocol 4.2.1 active on this date for this patient category? The second requires reasoning over policy logic, version checks, and patient-category matching. A green content safety dashboard says nothing about whether the right protocol was applied. They're different failure modes with different fixes.

Neuro-symbolic guardrails require formalized ontologies and versioned policy rules. We have 200 SOPs updated quarterly. How does this scale operationally?

Without a structured layer, those 200 SOPs are fed to an LLM that blends them by proximity and can't tell you which version governed any decision. The practical path: identify which SOP clauses directly affect your highest-stakes automated decisions, formalize those first. The quarterly update cadence is manageable when changes to specific clauses trigger a targeted impact analysis rather than full reingestion of everything.

HITL is a deliberate design choice — it's how enterprises manage liability when AI is wrong. Why is it the wrong architecture?

HITL is the right architecture as a bridge. The problem is when it becomes permanent and unmeasured. Human review distributes liability without solving the underlying problem — throughput beats accuracy every time. A reviewer checking fifty outputs per hour isn't catching semantic compliance errors; they're checking whether the output looks reasonable. The question isn't whether to have a human — it's what you're actually asking the human to verify.

Most teams treat guardrails as output blocking, not as a learning signal. What does the feedback loop actually look like in practice?

The shallow version blocks an output and logs 'policy violation.' The system learns nothing. The deep version traces the violation to a specific source: 'Response cited Protocol v2021 when v2024 changed the threshold from 5.0 to 7.5.' That trace tells you exactly what to fix in the knowledge base. In practice, the feedback loop requires three things: a violation catch with a specific clause reference, a path back to the source document, and a human decision about whether to update the knowledge or the protocol.
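Those three pieces can be sketched as a single record (a hypothetical shape, not a real schema; the source path is illustrative):

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class ViolationTrace:
    clause_ref: str   # 1. the violation catch, tied to a specific clause
    source_doc: str   # 2. the path back to the governing source document
    decision: Literal["update_knowledge", "update_protocol", "no_action"]  # 3. the human call

trace = ViolationTrace(
    clause_ref="Protocol v2024 threshold 7.5; response cited v2021's 5.0",
    source_doc="sop/protocol_x/v2024.md",
    decision="update_knowledge",
)
```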

Pre-GPT validation rules and decision trees were deterministic and worked. Why not just reimplement them instead of adding a neuro-symbolic layer?

Pre-GPT guardrails worked for structured inputs — form validation, range checks, required fields. The new failure mode is unstructured language. The neuro-symbolic layer recomposes rather than replaces: the LLM interprets unstructured language and extracts clinical intent, then the symbolic layer checks that intent against your actual SOPs. Deterministically. That composition gives you both natural language handling and compliance-grade verification.

About the Author
Vivek Khandelwal

Chief Business Officer, CoFounder @ CogniSwitch·M.Sc. Chemistry, IIT Bombay

Vivek Khandelwal is the Chief Business Officer at CogniSwitch, where he leads go-to-market strategy, enterprise partnerships, and the company's thought leadership programs. He is the author of Signal, CogniSwitch's weekly newsletter that translates the complex machinery of enterprise AI infrastructure into clear, actionable intelligence for practitioners and executives in regulated industries.