
Garbage In, Garbage Out: The AI Knowledge Quality Problem

Vivek Khandelwal
Chief Business Officer, CoFounder @ CogniSwitch
Mar 15, 2026·10 Min Read·Updated May 2, 2026
Reviewed by: Dilip Ittyera — CEO & Co-Founder, CogniSwitch

The phrase "Garbage In, Garbage Out" pops up naturally in most AI conversations. Everyone nods. It's self-explanatory. But when it comes to talking about the fix, most draw a blank.

Before we jump into GIGO, it's important to understand why the phrase was coined, what it meant, and why it has swung back into relevance.

Origin Story of Garbage In, Garbage Out

In 1957, a US Army specialist named William Mellin was explaining early computers to a reporter. These machines cannot think, he said. Feed them a bad calculation, they'll process it faithfully and return something useless. Programmers started calling it GIGO — Garbage In, Garbage Out. By 1963 it was common jargon at IRS processing centers, where mispunched cards produced wrong refunds and literal bins of discarded magnetic tape. The problem was physical. The output was obviously broken.

That problem, obviously, got solved. Code quality and data quality are now well-understood terms, mature disciplines. But those systems deal with structured data inputs. Not knowledge.

The Shift

AI activates knowledge, in all its forms, and net new problems start to surface: a specific clause in an amendment that contradicts the master agreement, for instance. This is not a data quality problem. This is a knowledge quality problem. And systems haven't been built for this yet.

The problem gets pushed downstream when agents execute on poor knowledge: agents that read every protocol, every policy, every guideline, and apply them across thousands of decisions. The failure happens when enterprises jump straight from "we have documents" to "we have an AI agent", skipping the step in between entirely.

Garbage Out: Then vs Now

Fig 1: 1957 — Visible Failure

A mispunched card enters the system. The computer processes it faithfully — and returns something obviously broken.

Output: Wrong refund amount. Literal bins of discarded magnetic tape. Physical, visible, traceable.

Fix: Find the bad card. Re-punch it. Feed it back in. Problem solved.


How Do You Know If You Have a Garbage In Problem?

You spot the garbage out first. A clinician pulls up a patient summary from an AI bot. It's coherent. Well structured. Sourced from the same clinical protocols the hospital spent months documenting. She trusts it. Why wouldn't she?

But what if two of those protocols disagree on the dosing threshold? One was updated three months ago. The other wasn't. Both were pushed to the same knowledge base. The model blends them into a single answer.

You Spotted Garbage Out. What Next?

The typical response: "This doesn't seem correct. The knowledge must be incomplete; add more." The knowledge base grows. The problem compounds. More documents means more versions of the same facts. More versions means more surface area for contradiction.

Context windows don't filter for authority. The model blends across all of it, and the blend gets smoother and more confident the more material it has.

3 Problems in Messy Knowledge Sources (Fig 2)

Contradictions
What it looks like: Two documents say different things about the same fact.
Why it's dangerous: The model blends them into a single confident answer.

Staleness
What it looks like: One version is outdated but still sits in the knowledge base.
Why it's dangerous: Context windows don't filter for authority or recency.

Incompleteness
What it looks like: Critical knowledge exists only in Slack, email, WhatsApp.
Why it's dangerous: The formal document pile is always incomplete in ways you can't see.
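To make the first two concrete, here is a minimal sketch of what contradiction and staleness checks at ingest could look like. It assumes documents have already been reduced to simple (key, value, last-updated) facts; the names `KnowledgeFact`, `find_conflicts`, and `find_stale`, along with the dosing values in the example, are made up for illustration, not any particular product's API.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class KnowledgeFact:
    doc_id: str          # source document the claim came from
    key: str             # e.g. "drug_x.dosing_threshold_mg"
    value: str           # what the document asserts
    last_updated: date   # when the source was last revised

def find_conflicts(facts: list[KnowledgeFact]) -> dict[str, list[KnowledgeFact]]:
    """Group facts by key and keep only the keys where sources disagree."""
    by_key: dict[str, list[KnowledgeFact]] = {}
    for f in facts:
        by_key.setdefault(f.key, []).append(f)
    return {k: v for k, v in by_key.items() if len({f.value for f in v}) > 1}

def find_stale(facts: list[KnowledgeFact], max_age_days: int = 365) -> list[KnowledgeFact]:
    """Flag facts whose source hasn't been revised within the allowed window."""
    cutoff = date.today() - timedelta(days=max_age_days)
    return [f for f in facts if f.last_updated < cutoff]

# Hypothetical example: two protocols disagree on the same dosing threshold.
facts = [
    KnowledgeFact("protocol_v2.pdf", "drug_x.dosing_threshold_mg", "40", date(2026, 2, 1)),
    KnowledgeFact("protocol_v1.pdf", "drug_x.dosing_threshold_mg", "60", date(2024, 5, 10)),
]
for key, sources in find_conflicts(facts).items():
    print(key, "has conflicting sources:", [s.doc_id for s in sources])
```

The point isn't the code. It's that both checks run at ingest, before anything reaches a context window.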

"The garbage doesn't look like garbage. You're not dealing with typos or missing fields."

Why Audit Trails Are Critical for Correct Diagnosis

LLMs will always generate things outside your knowledge. That's what they do. You cannot engineer your way to zero hallucination.

What one can do, and this is the part I wish I'd understood earlier, is tell the two kinds of failure apart. Because without an audit trail, everything looks the same.

Two Kinds of Failure

Fig 3: Knowledge Failure (Fix Upstream)

The source was wrong, conflicted, or outdated. The model did exactly what it was supposed to do.

Fix: A neuro-symbolic approach gives you the feedback loop: traceable to the source document, resolvable at the root. Fix the knowledge. Output improves.

System: Knowledge management with audit trails, version control, conflict resolution.


How Do You Fix Garbage In?

AI cannot resolve truth. It can surface conflict - flag that two documents disagree, identify which version is newer. But the decision about which version is actually true? That belongs to a human. A domain expert.

Truth in a regulated industry is a judgment call, and it always needs a name attached to it. You need audit trails for the knowledge itself - who resolved a conflict, when, what the previous version said, why it changed.
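If it helps to picture it, here is a hedged sketch of what one such resolution record might contain. The field names and the example values are assumptions made up for illustration, not a reference to CogniSwitch's or anyone else's schema; the point is that the record is produced and signed by a person, not the model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class ConflictResolution:
    """Audit-trail entry: a human declares which version of a disputed fact is true."""
    fact_key: str                 # e.g. "drug_x.dosing_threshold_mg"
    winning_doc: str              # source the expert declared authoritative
    losing_docs: tuple[str, ...]  # sources retired or demoted by this decision
    previous_value: str           # what the knowledge base said before
    resolved_value: str           # what it says now
    resolved_by: str              # the name attached to the judgment call
    reason: str                   # why it changed
    resolved_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Hypothetical example values; the AI only surfaced the conflict, a person resolved it.
resolution = ConflictResolution(
    fact_key="drug_x.dosing_threshold_mg",
    winning_doc="protocol_v2.pdf",
    losing_docs=("protocol_v1.pdf",),
    previous_value="60",
    resolved_value="40",
    resolved_by="Dr. A. Rao, Clinical Pharmacology",
    reason="v1 superseded by the Feb 2026 protocol revision and retired from the knowledge base.",
)
```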

Same thing applies when external contracts and policies change. Someone inside your organization needs to explicitly map that change to your internal SOPs and sign off.

Then there is the operational reality. Businesses run on Slack, WhatsApp, email chains. That knowledge is real. Your agents don't have it. Until there's a curation layer for conversational sources, your knowledge base will always be incomplete.

Knowledge Matrix — What You Know vs Don't

A matrix crossing four states of knowledge (correct, outdated, contradictory, missing) against whether you know it's there or you don't, with the action each combination requires.

Production-Ready Knowledge

Not more documents. Not better retrieval. A governed, versioned source of truth where every update has an owner, a timestamp, and a record of what came before. This is what a neuro-symbolic approach to knowledge management allows for.
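As a rough picture of what "a record of what came before" can mean in practice, here is a minimal sketch of an append-only version chain per fact. The class and field names are illustrative assumptions; a real governed store would carry far more, but the shape is the point: updates never overwrite, they supersede.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class KnowledgeVersion:
    """One immutable version of a governed fact."""
    version: int
    value: str
    owner: str                 # who signed off on this update
    timestamp: datetime
    supersedes: Optional[int]  # version this update replaced, if any

class GovernedFact:
    """Append-only history: every update has an owner, a timestamp, and a predecessor."""
    def __init__(self, key: str):
        self.key = key
        self.history: list[KnowledgeVersion] = []

    def update(self, value: str, owner: str) -> KnowledgeVersion:
        prev = self.history[-1].version if self.history else None
        v = KnowledgeVersion(
            version=len(self.history) + 1,
            value=value,
            owner=owner,
            timestamp=datetime.now(timezone.utc),
            supersedes=prev,
        )
        self.history.append(v)
        return v

    def current(self) -> KnowledgeVersion:
        return self.history[-1]
```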

The Self-Diagnostic

I built a self-diagnostic around this. Seven questions; answering them takes less time than making instant coffee. Most organizations I've shown it to get stuck by question two.

Take the Knowledge Audit →

Frequently Asked Questions

More high-quality data has historically fixed model performance problems. Why is enterprise knowledge quality different?

More training data improves model capability. More documents in your knowledge base is a different problem — you're retrieving and reasoning, not training. When two clinical protocols disagree on a dosing threshold and you add a third document to clarify, the model blends all three. The answer gets more fluent and more confident, but the conflict persists. The fix is conflict detection at ingest, not volume at the point of failure.

You distinguish two failure modes — bad knowledge vs. model error on good knowledge. How do I tell them apart without a controlled experiment?

The audit trail is how you tell them apart. Without one, both failures look the same. With one, the diagnostic is: trace the wrong output to the source documents the agent retrieved. If those documents contain the information needed for a correct answer, you have a model failure. If the documents don't contain the correct information or contradict each other, you have a knowledge failure — fix it at the source.
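A toy version of that diagnostic, assuming you already log which documents the agent retrieved for each answer and a reviewer can mark whether each retrieved document supported or contradicted the correct answer. The function name and the flags are hypothetical, sketched only to show the branching logic.

```python
def classify_failure(answer_is_wrong: bool, retrieved_docs: list[dict]) -> str:
    """
    Knowledge failure: the retrieved sources are missing, wrong, or contradict each other.
    Model failure: the sources contained correct, consistent information and the answer
    was still wrong. Each retrieved doc is assumed to carry reviewer-set flags
    'supports_correct_answer' and 'contradicts_correct_answer'.
    """
    if not answer_is_wrong:
        return "no_failure"
    if not retrieved_docs:
        return "knowledge_failure: nothing relevant was retrievable"
    supporting = [d for d in retrieved_docs if d.get("supports_correct_answer")]
    conflicting = [d for d in retrieved_docs if d.get("contradicts_correct_answer")]
    if conflicting or not supporting:
        return "knowledge_failure: fix the sources upstream"
    return "model_failure: sources were right, generation was not"
```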

AI surfaces conflicts but can't resolve truth — the domain expert decides. We don't have domain experts on-call for every conflict. How does this scale?

Scale doesn't require an expert on every conflict. It requires an expert on the high-stakes conflicts, where two documents disagree on a fact that directly affects an automated decision. The resolution is stored with an audit trail: who resolved it, when, what the previous version said. That resolution governs every downstream query until revisited. Most conflicts are version drift: a document that was accurate until it was updated, but the old version was never retired. Those are resolvable systematically.

You flag Slack and email as knowledge gaps — but extracting knowledge from conversational data is a hard NLP problem. Is that realistic?

The pragmatic path isn't 'mine all of Slack.' It's identifying which conversational decisions became stable practice and formalizing them. When a senior clinician effectively overrides a protocol in a message — and that practice becomes consistent — the curation question is: does this get formalized with an audit trail, or stay in Slack where it's invisible to the agent? The technology for extraction is secondary to the organizational decision about whether informal overrides should be captured at all.

Chain-of-thought reasoning can identify when sources conflict and flag uncertainty. Why isn't that sufficient?

An LLM with CoT can flag when two passages seem contradictory — useful, but different from resolving the conflict. What it cannot do: determine which document is authoritative for a specific decision context, or attach a name and timestamp to the resolution. Truth in a regulated industry is a judgment call with accountability attached. The CoT flag says the model noticed a conflict. It doesn't produce the resolution record an auditor can examine.

About the Author
Vivek Khandelwal

Chief Business Officer, CoFounder @ CogniSwitch·M.Sc. Chemistry, IIT Bombay

Vivek Khandelwal is the Chief Business Officer at CogniSwitch, where he leads go-to-market strategy, enterprise partnerships, and the company's thought leadership programs. He is the author of Signal, CogniSwitch's weekly newsletter that translates the complex machinery of enterprise AI infrastructure into clear, actionable intelligence for practitioners and executives in regulated industries.