Burden of Proof — What this series is

Every AI vendor shows up to a demo with a pre-cooked setup. None of them generate a document live and run it cold. They won't even change a single word in the prompt.

Burden of Proof (BoP) is a weekly LinkedIn Live where CogniSwitch co-founders Vivek and Josh put their own product on the spot: live, unscripted, on data generated in front of you. We create a synthetic enterprise document during the session, run questions through CogniSwitch, and show you whether we get the same answer every time, including when we don't.

No slides · No cherry-picked datasets · Claude for doc generation · Live

PL-001 | May 16, 2026 | Burden of Proof

Knowledge Domain: Healthcare · Status: Verified

Topic

CONSISTENCY

Run by: Vivek Khandelwal · Joshua Thomas

The Claim

Context graph retrieval is deterministic: the same ontology entities and triples come back regardless of how the question is phrased. The LLM on top will still vary what it includes and what it omits when generating from that data. We tested both. We proved both.

Findings

Validated with exceptions

✓

Retrieval — identical repeats

Q1 asked 3× returned the same 21 entities and 16 triples every time. Retrieval is deterministic under identical conditions.

✓

Retrieval — semantic variations

Q2 and Q3 used different phrasing for the same clinical intent, and equivalent concept sets were retrieved. The graph handles paraphrasing.

✗

Retrieval — synonym boundary gap

"children" ≠ "pediatric" in SNOMED CT. Q4 returned 9 of 18 expected entities. A retrieval failure, not a generation failure.

LLMs generate and omit — also proved

Even from identical, deterministic retrieval, LLM verbosity and emphasis shifted across runs. Open-ended questions got longer answers. Pointed ones got shorter ones. This is the generation layer — a separate problem from retrieval.
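The two-layer distinction above can be sketched as a small attribution check. This is a minimal illustration, not the tooling used in the session: the `variance_layer` function, the fingerprint strings, and the sample answers are all hypothetical.

```python
from typing import Hashable, List, Tuple

# One run = (retrieval fingerprint, generated answer text).
# The fingerprint is any hashable summary of the retrieved entities/triples.
Run = Tuple[Hashable, str]


def variance_layer(runs: List[Run]) -> str:
    """Say which layer, if any, differed across repeated runs of one question."""
    fingerprints = {fp for fp, _ in runs}
    answers = {answer for _, answer in runs}
    if len(fingerprints) > 1:
        return "retrieval"   # the graph returned different context
    if len(answers) > 1:
        return "generation"  # same context, different wording from the LLM
    return "none"


# Hypothetical runs: identical retrieval, different answer length.
runs = [
    ("fp-1", "Prescribe only with evidence of bacterial infection."),
    ("fp-1", "Antibiotics should be prescribed only when evidence supports it."),
]
print(variance_layer(runs))  # generation
```

Checking the retrieval fingerprint first means a wording change is only blamed on the LLM once the retrieved context is confirmed identical.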

A. Setup Trace

Document gen (Claude)

Ingested into CogniSwitch

Ontology auto-map (SNOMED CT)

Questions asked

Graph responses retrieved

B. Source Document

Pediatric Antibiotic Prescribing Protocol

Generated live using Claude during the session · Format: Markdown / PDF

C. Experiment Runs

Layer 1 — Identical repeats

Q1 × 3

Same question, three runs. Does retrieval return the same result every time?

✓ Consistent
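A repeat test like Layer 1 can be sketched as follows. This is an illustration under stated assumptions: `is_deterministic` and `stub_retrieve` are hypothetical names, and the stub stands in for the real retriever by returning only the three triples displayed for Q1 on this page, not the full 21 entities and 16 triples.

```python
from typing import Callable, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Retrieval = Tuple[FrozenSet[str], FrozenSet[Triple]]


def is_deterministic(
    retrieve: Callable[[str], Retrieval], question: str, runs: int = 3
) -> bool:
    """Return True if every run yields the same entity set and triple set."""
    results = [retrieve(question) for _ in range(runs)]
    return all(result == results[0] for result in results)


def stub_retrieve(question: str) -> Retrieval:
    """Hypothetical stand-in for the real retriever (ignores the question)."""
    triples = frozenset({
        ("antibiotics", "requires", "bacterial_infection OR empiric_justification"),
        ("clinical_evidence", "supports", "antibiotic_prescription"),
        ("empiric_treatment", "requires", "high_clinical_probability"),
    })
    entities = frozenset(subject for subject, _, _ in triples)
    return entities, triples


q1 = "Under what conditions should antibiotics be prescribed?"
print(is_deterministic(stub_retrieve, q1, runs=3))  # True
```

Comparing frozensets rather than lists makes the check insensitive to the order in which entities and triples are returned, so only a change in content counts as inconsistency.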

Layer 2 — Semantic variations

Q2 · Q3 · Q4

Same intent, different phrasing. Does retrieval generalize across paraphrases?

✓ ✓ ✗ — Gap at Q4

Graph Coverage: 100%

Q1 · Asked 3× (retrieval consistency test)

Under what conditions should antibiotics be prescribed?

Knowledge Graph Retrieval

Deterministic · fetched from context graph

21 entities · 16 triples

bacterial_infection · empiric_treatment · clinical_evidence · laboratory_evidence · antibiotic_resistance · prescription · high_clinical_probability · (+ 14 more)

[antibiotics] → requires → bacterial_infection OR empiric_justification

[clinical_evidence] → supports → antibiotic_prescription

[empiric_treatment] → requires → high_clinical_probability

Retrieval result: RETRIEVAL CONSISTENT × 3

LLM Generation

Natural language from retrieved context

Antibiotics should be prescribed only when there is clinical or laboratory evidence of bacterial infection, or when empiric treatment is warranted due to high clinical probability of bacterial origin.

Q2 · Semantic variation of Q1

When should antibiotics be prescribed?

Knowledge Graph Retrieval

Deterministic · fetched from context graph

23 entities · 19 triples

bacterial_infection · clinical_severity · prescription_timing · clinical_decision_support · antibiotic_stewardship · (+ 18 more)

[antibiotics] → requires → bacterial_infection

[prescription_timing] → depends_on → clinical_severity

[early_treatment] → indicated_for → severe_cases

Retrieval result: RETRIEVAL CONSISTENT

LLM Generation

Natural language from retrieved context

Antibiotic prescription is indicated when bacterial infection is confirmed or strongly suspected clinically. Timing is critical — early in severe cases, watchful waiting in mild viral presentations.

Q3 · Dosing (semantic variation)

How are antibiotic doses calculated for pediatric patients?

Knowledge Graph Retrieval

Deterministic · fetched from context graph

18 entities · 14 triples

weight_based_dosing · neonates_0_to_28_days · infants_1_to_12_months · toddlers_1_to_5_years · school_age_6_to_12_years · renal_function · dose_per_kg · (+ 11 more)

[dosage] → calculated_by → patient_weight_kg

[age_band] → modifies → dose_calculation

[renal_impairment] → requires → dosage_adjustment

Retrieval result: RETRIEVAL CONSISTENT

LLM Generation

Natural language from retrieved context

Pediatric antibiotic dosing is weight-based (mg/kg). Age-band adjustments apply across neonates, infants, toddlers, school-age, and adolescents. Renal and hepatic function further modify dosing. Maximum daily doses cap weight-based calculations.

Incident Record

Q4 · Synonym variation (gap found)

How are antibiotic doses calculated for children patients?

Knowledge Graph Retrieval

Deterministic · fetched from context graph

9 entities · 7 triples

weight_based_dosing · patient_weight_kg · azithromycin · amoxicillin · dose_per_kg

✗ age_band_breakdowns not retrieved

[dosage] → calculated_by → patient_weight_kg

[dose_per_kg] → has_maximum → maximum_daily_dose

Path discontinuity at [children]: 9 of 18 expected entities retrieved

Retrieval result: GAP FOUND

LLM Generation

Natural language from retrieved context

Antibiotic doses are calculated based on patient weight in kilograms, with specific mg/kg ratios varying by antibiotic. Maximum daily doses apply.

Incomplete retrieval → incomplete generation. The LLM had no age-band data to work with.
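The size of the Q4 gap can be made explicit with a little arithmetic. The sketch below is illustrative: `coverage` and `missing_entities` are hypothetical helpers, the ratio is not the page's "Graph Coverage" figure, and the entity sets contain only the names displayed on this page for Q3 and Q4, not the full retrieval results.

```python
from typing import Set


def coverage(retrieved: int, expected: int) -> float:
    """Fraction of expected entities that retrieval actually returned."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return retrieved / expected


def missing_entities(expected: Set[str], retrieved: Set[str]) -> Set[str]:
    """Entities that were expected but not returned."""
    return expected - retrieved


# Counts reported for Q4: 9 of 18 expected entities.
print(f"{coverage(9, 18):.0%}")  # 50%

# Age-band entities displayed for Q3 vs. the entities displayed for Q4.
age_bands = {
    "neonates_0_to_28_days",
    "infants_1_to_12_months",
    "toddlers_1_to_5_years",
    "school_age_6_to_12_years",
}
q4_shown = {
    "weight_based_dosing",
    "patient_weight_kg",
    "azithromycin",
    "amoxicillin",
    "dose_per_kg",
}
print(sorted(missing_entities(age_bands, q4_shown)))
```

Because the answer can only draw on what retrieval returns, a 50% entity coverage sets a ceiling on how complete the generated answer can be.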

D. The Gap

Semantic discontinuity (broken chain): [pediatric] (ontology head) → ??? → [children] (unmapped variant)

Explanation

In the source document, the term pediatric is mapped to a SNOMED CT concept. The term children is a natural-language synonym that is not mapped as an equivalent concept in the ontology. When Q4 used children instead of pediatric, the system traversed a different graph path and returned only 9 of the 18 expected entities; the age-band breakdowns were not retrieved.

Fix

Define the synonym relationship once in the ontology. It resolves consistently thereafter. This is a graph problem with a graph solution — not a prompt engineering workaround.
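One way to picture a synonym defined once and resolved everywhere is a lookup applied before graph traversal. This is a minimal sketch, not CogniSwitch's implementation: the `SYNONYMS` table and `canonicalize` function are hypothetical, and the session did not re-run Q4 to confirm that a mapping like this closes the gap.

```python
import re
from typing import Dict, List

# Hypothetical synonym table: surface term -> canonical ontology term.
SYNONYMS: Dict[str, str] = {"children": "pediatric"}


def canonicalize(question: str, synonyms: Dict[str, str]) -> List[str]:
    """Lowercase, tokenize, and replace each known synonym with its canonical term."""
    tokens = re.findall(r"[a-z]+", question.lower())
    return [synonyms.get(token, token) for token in tokens]


q3 = "How are antibiotic doses calculated for pediatric patients?"
q4 = "How are antibiotic doses calculated for children patients?"

# With the mapping in place, both phrasings resolve to the same terms.
print(canonicalize(q3, SYNONYMS) == canonicalize(q4, SYNONYMS))  # True
```

The mapping lives in one table rather than in each prompt, which is the sense in which the fix belongs to the graph rather than to prompt engineering.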

Index Reference

Gap types like this are categorized and scored in the CogniSwitch Context Quality Index.

E. What This Doesn't Prove

Limitation 01

Single synthetic document. Real enterprise data is messier, multi-source, and contradictory. Cross-document retrieval consistency is untested here.

Limitation 02

The ontology gap was surfaced, not fixed. We did not re-run Q4 after adding the synonym to confirm the gap closes. That is the next experiment.

Limitation 03

One domain, one session. We tested in healthcare because SNOMED CT is a well-structured ontology. Other domains may have different gap profiles.