Burden of Proof — What this series is

Every AI vendor shows up to a demo with a pre-cooked setup. None of them generate a document live and run it cold. They won't even change a single word in the prompt.

Burden of Proof (BoP) is a weekly LinkedIn Live where CogniSwitch co-founders Vivek and Josh put their own product on the spot: live, unscripted, on data generated in front of you. We generate a synthetic enterprise document during the session, run questions through CogniSwitch, and show you whether we get the same answer every time, including when we don't.

No slides · No cherry-picked datasets · Claude for doc generation · Live
PL-001 | May 16, 2026 | Burden of Proof
Knowledge Domain: Healthcare | Status: Verified
Topic

CONSISTENCY

The Claim

Context graph retrieval is deterministic: the same ontology entities and triples come back regardless of how the question is phrased. The LLM that writes the answer will still vary its wording and omit parts of that data. We tested both. We proved both.

Findings
Validated with exceptions
Retrieval — identical repeats

Q1, asked 3×, returned the same 21 entities and 16 triples every time. Retrieval is deterministic under identical conditions.
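
A repeat-run check of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not CogniSwitch's API: `retrieve` is a hypothetical stand-in for whatever call returns entities and triples, and `stub_retrieve` returns a small fixed result so the example runs on its own.

```python
from typing import Callable, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Retrieval = Tuple[FrozenSet[str], FrozenSet[Triple]]


def is_consistent(retrieve: Callable[[str], Retrieval], question: str, runs: int = 3) -> bool:
    """Return True if every run yields the same entity and triple sets."""
    first = retrieve(question)
    # One call above plus (runs - 1) calls here gives `runs` calls in total.
    return all(retrieve(question) == first for _ in range(runs - 1))


def stub_retrieve(question: str) -> Retrieval:
    # Hypothetical stand-in; a real check would call the retrieval layer here.
    entities = frozenset({"bacterial_infection", "empiric_treatment", "clinical_evidence"})
    triples = frozenset({
        ("clinical_evidence", "supports", "antibiotic_prescription"),
        ("empiric_treatment", "requires", "high_clinical_probability"),
    })
    return entities, triples


consistent = is_consistent(stub_retrieve, "Under what conditions should antibiotics be prescribed?")
print(consistent)
```

Comparing frozen sets rather than lists makes the check insensitive to the order in which entities come back, which may or may not be the property you want to assert.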

Retrieval — semantic variations

Q2 and Q3 used different phrasing for the same clinical intent. Equivalent concept sets were retrieved. The graph handles paraphrasing.

Retrieval — synonym boundary gap

"children" ≠ "pediatric" in SNOMED CT. Q4 returned 9 of 18 expected entities. A retrieval failure, not a generation failure.

LLMs generate and omit — also proved

Even from identical, deterministic retrieval, LLM verbosity and emphasis shifted across runs. Open-ended questions got longer answers. Pointed ones got shorter ones. This is the generation layer — a separate problem from retrieval.

A. Setup Trace
Document gen (Claude)
Ingested into CogniSwitch
Ontology auto-map (SNOMED CT)
Questions asked
Graph responses retrieved
B. Source Document

Pediatric Antibiotic Prescribing Protocol

Generated live using Claude during the session · Format: Markdown / PDF

C. Experiment Runs
Layer 1 — Identical repeats
Q1 × 3

Same question, three runs. Does retrieval return the same result every time?

✓ Consistent
Layer 2 — Semantic variations
Q2 · Q3 · Q4

Same intent, different phrasing. Does retrieval generalize across paraphrases?

✓ ✓ ✗ — Gap at Q4
Graph Coverage: 100%
Q1 · Asked 3× · retrieval consistency test
Under what conditions should antibiotics be prescribed?
Knowledge Graph Retrieval
Deterministic · fetched from context graph
21 entities · 16 triples
bacterial_infection · empiric_treatment · clinical_evidence · laboratory_evidence · antibiotic_resistance · prescription · high_clinical_probability · + 14 more
[antibiotics] requires bacterial_infection OR empiric_justification
[clinical_evidence] supports antibiotic_prescription
[empiric_treatment] requires high_clinical_probability
Retrieval result: RETRIEVAL CONSISTENT × 3
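
The triples displayed for Q1 can be modeled as (subject, predicate, object) records. The sketch below is only one plausible representation; the `Triple` class and `index_by_subject` helper are hypothetical names introduced for illustration, and only the three triples shown on the page are included.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The three triples displayed for Q1 (the page reports 16 in total).
Q1_TRIPLES = [
    Triple("antibiotics", "requires", "bacterial_infection OR empiric_justification"),
    Triple("clinical_evidence", "supports", "antibiotic_prescription"),
    Triple("empiric_treatment", "requires", "high_clinical_probability"),
]


def index_by_subject(triples):
    """Group triples by subject so facts about one entity can be looked up directly."""
    index = defaultdict(list)
    for t in triples:
        index[t.subject].append(t)
    return dict(index)


by_subject = index_by_subject(Q1_TRIPLES)
for t in by_subject["empiric_treatment"]:
    print(f"[{t.subject}] {t.predicate} {t.obj}")
```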
LLM Generation
Natural language from retrieved context

Antibiotics should be prescribed only when there is clinical or laboratory evidence of bacterial infection, or when empiric treatment is warranted due to high clinical probability of bacterial origin.

Q2 · Semantic variation of Q1
When should antibiotics be prescribed?
Knowledge Graph Retrieval
Deterministic · fetched from context graph
23 entities · 19 triples
bacterial_infection · clinical_severity · prescription_timing · clinical_decision_support · antibiotic_stewardship · + 18 more
[antibiotics] requires bacterial_infection
[prescription_timing] depends_on clinical_severity
[early_treatment] indicated_for severe_cases
Retrieval result: RETRIEVAL CONSISTENT
LLM Generation
Natural language from retrieved context

Antibiotic prescription is indicated when bacterial infection is confirmed or strongly suspected clinically. Timing is critical — early in severe cases, watchful waiting in mild viral presentations.

Q3 · Dosing, semantic variation
How are antibiotic doses calculated for pediatric patients?
Knowledge Graph Retrieval
Deterministic · fetched from context graph
18 entities · 14 triples
weight_based_dosing · neonates_0_to_28_days · infants_1_to_12_months · toddlers_1_to_5_years · school_age_6_to_12_years · renal_function · dose_per_kg · + 11 more
[dosage] calculated_by patient_weight_kg
[age_band] modifies dose_calculation
[renal_impairment] requires dosage_adjustment
Retrieval result: RETRIEVAL CONSISTENT
LLM Generation
Natural language from retrieved context

Pediatric antibiotic dosing is weight-based (mg/kg). Age-band adjustments apply across neonates, infants, toddlers, school-age, and adolescents. Renal and hepatic function further modify dosing. Maximum daily doses cap weight-based calculations.

Incident Record
Q4 · Synonym variation, gap found
How are antibiotic doses calculated for children patients?
Knowledge Graph Retrieval
Deterministic · fetched from context graph
9 entities · 7 triples
weight_based_dosing · patient_weight_kg · azithromycin · amoxicillin · dose_per_kg
✗ age_band_breakdowns not retrieved
[dosage] calculated_by patient_weight_kg
[dose_per_kg] has_maximum maximum_daily_dose
Path discontinuity at [children]: 9 of 18 expected entities retrieved
Retrieval result: GAP FOUND
LLM Generation
Natural language from retrieved context

Antibiotic doses are calculated based on patient weight in kilograms, with specific mg/kg ratios varying by antibiotic. Maximum daily doses apply.

Incomplete retrieval → incomplete generation. The LLM had no age-band data to work with.
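
The gap can be expressed as a simple coverage ratio. The sketch below is an illustration only: the `coverage` function is a hypothetical helper, and because the page displays only a subset of each entity list, the set comparison is limited to the four age-band entities shown for Q3 and the five entities shown for Q4. The headline ratio uses the reported counts directly.

```python
def coverage(retrieved: set, expected: set) -> float:
    """Fraction of expected entities that were actually retrieved."""
    if not expected:
        raise ValueError("expected set must not be empty")
    return len(retrieved & expected) / len(expected)


# Age-band entities displayed for Q3, reported as not retrieved for Q4.
age_bands = {
    "neonates_0_to_28_days",
    "infants_1_to_12_months",
    "toddlers_1_to_5_years",
    "school_age_6_to_12_years",
}
# Entities displayed for Q4 (the visible subset of the 9 retrieved).
q4_visible = {"weight_based_dosing", "patient_weight_kg", "azithromycin", "amoxicillin", "dose_per_kg"}

missing_age_bands = sorted(age_bands - q4_visible)
age_band_coverage = coverage(q4_visible, age_bands)

# Headline figure from the counts reported on the page: 9 of 18 expected entities.
reported_ratio = 9 / 18
print(missing_age_bands, age_band_coverage, reported_ratio)
```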

D. The Gap

Semantic Discontinuity

Broken chain: [pediatric] (ontology head) → ??? → [children] (unmapped variant)
Explanation

The term "pediatric" in the source document is encoded in SNOMED CT. The term "children" is a natural-language synonym that is not mapped as an equivalent concept in the ontology. When Q4 used "children" instead of "pediatric", the system traversed a different graph path and returned only 9 of the 18 expected entities; the age-band breakdowns were not retrieved.

Fix

Define the synonym relationship once in the ontology. It resolves consistently thereafter. This is a graph problem with a graph solution — not a prompt engineering workaround.
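
The effect of defining the synonym once can be sketched on a toy graph. Everything here is an assumption made for illustration: the `GRAPH` adjacency map, the `reachable` helper, and the `synonyms` table are hypothetical and do not reflect CogniSwitch's internals or the SNOMED CT data model.

```python
from collections import deque

# Toy graph: edges from a concept to related entities (illustrative only).
GRAPH = {
    "pediatric": ["age_band", "weight_based_dosing"],
    "age_band": ["neonates_0_to_28_days", "infants_1_to_12_months"],
    "weight_based_dosing": ["dose_per_kg"],
    "children": ["weight_based_dosing"],  # unmapped variant: no route to age_band
}


def reachable(graph, start, synonyms=None):
    """Breadth-first traversal from `start`, after resolving it through `synonyms`."""
    start = (synonyms or {}).get(start, start)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


# Before the fix: "children" never reaches the age-band entities.
before = reachable(GRAPH, "children")
# After the fix: the synonym is resolved to the ontology head before traversal.
after = reachable(GRAPH, "children", synonyms={"children": "pediatric"})
print(sorted(before))
print(after == reachable(GRAPH, "pediatric"))
```

Resolving the synonym before traversal, rather than patching each query, mirrors the point above: the mapping is defined in one place and every later question benefits from it.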

Index Reference

Gap types like this are categorized and scored in the CogniSwitch Context Quality Index.

E. What This Doesn't Prove
Limitation 01

Single synthetic document. Real enterprise data is messier, multi-source, and contradictory. Cross-document retrieval consistency is untested here.

Limitation 02

The ontology gap was surfaced, not fixed. We did not re-run Q4 after adding the synonym to confirm the gap closes. That is the next experiment.

Limitation 03

One domain, one session. We tested in healthcare because SNOMED CT is a well-structured ontology. Other domains may have different gap profiles.

The fix is architectural.

Deterministic retrieval doesn't fail because an LLM forgot something. It fails when the ontology has a gap. The distinction matters because one failure mode can be permanently fixed — and the other can only be patched.

Live Session Recording
PL-001 · Burden of Proof · May 16, 2026 · Open on YouTube →
Engineer's Note
The failure at Q4 is not a reasoning failure, but a mapping failure. Unlike an LLM hallucination, this failure is predictable, localized, and permanently fixable.
Receipts & Artifacts