Criteria Authoring Workbench
Transforming SOPs from subjective documents into executable logic. A practical manual for writing audit-ready criteria.
Why Criteria Fail
Your agent works. Sort of. It handles the happy path. It sounds professional. Demo goes great. But when you deploy to production, automation rates stall at 15-20%. Customer complaints trickle in. Your engineering team asks: "What exactly should we fix?"
You don't know. Because your quality audits aren't telling you anything useful. The root cause isn't your agent. It's your criteria.
The Hidden Cost of Vague Criteria
Consider the criterion "Clear explanation provided." Clear to whom? Which explanation? Three auditors will return three different verdicts, and none of them can point to evidence in the transcript.
The Courtroom Test
If your criterion would make sense in a courtroom deposition but feels strange in a normal conversation, you've written it wrong.
Courtroom: "State your full legal name for the record."
Natural: "And I have you as John Doe, correct?"
Criteria should allow for natural flexibility. Judge the outcome (the name was confirmed), not the exact action used to get there.
Grouping Strategy
Before you write a single criterion, decide how you'll organize them. This isn't about neatness. It's about diagnosis. When your agent fails 23 interactions, knowing they all failed in "Clinical Data Collection" tells you exactly which LLM chain to debug.
Admin
The mechanical stuff. Identity verification, call setup, consent. Low complexity, high compliance.
Core Process
The actual job. Clinical assessment, lending qualification. Where domain expertise lives.
Empathy / CX
The human layer. Active listening, tone adjustment. Hardest to quantify.
Compliance
Mandatory disclosures. Verbatim requirements. Regulatory boxes that must be checked.
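The grouping can be made concrete as a plain mapping from category to criterion names (all names here are illustrative), so a failure roll-up points at one pipeline stage instead of a flat list of fails:

```python
# Minimal sketch: criteria grouped by category, with a roll-up of
# failures per group. Criterion names are invented for illustration.
CRITERIA = {
    "admin": ["identity_verified", "consent_recorded"],
    "core_process": ["clinical_assessment_complete"],
    "empathy_cx": ["no_rudeness_detected"],
    "compliance": ["recording_disclosure_verbatim"],
}

def failures_by_group(results):
    """Map each group to the criteria that failed in it."""
    report = {}
    for group, names in CRITERIA.items():
        failed = [n for n in names if results.get(n) == "fail"]
        if failed:
            report[group] = failed
    return report

results = {"identity_verified": "pass", "consent_recorded": "fail",
           "clinical_assessment_complete": "fail"}
# failures_by_group(results) -> {"admin": ["consent_recorded"],
#                                "core_process": ["clinical_assessment_complete"]}
```

A spike in one group now tells you which LLM chain to debug, which is the whole point of grouping.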
The Granularity Trap
Each conditional you bolt on feels justified. "But what if the customer spells their name wrong?" "What if they give a nickname?"
Stop. The 80/20 Rule applies here. Does this edge case occur in more than 5% of interactions? If not, let it fail the primary criterion.
The 3-Deep Rule
"For any single SOP requirement, you get one primary criterion and up to two conditionals. That's it."
- 1 Primary Criterion (Core)
- Max 2 Conditional Criteria
- Any more is over-engineering
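The rule is easy to enforce structurally. A hypothetical sketch using a dataclass that rejects a third conditional (the requirement wording is invented):

```python
# Hypothetical structure enforcing the 3-Deep Rule: one primary
# criterion plus at most two conditionals per SOP requirement.
from dataclasses import dataclass, field

@dataclass
class Requirement:
    primary: str
    conditionals: list = field(default_factory=list)

    def __post_init__(self):
        if len(self.conditionals) > 2:
            raise ValueError("3-Deep Rule: max 2 conditional criteria")

name_capture = Requirement(
    primary="Agent confirms the customer's name",
    conditionals=["If spelling is unclear, agent asks them to spell it",
                  "If a nickname is given, agent confirms the legal name"],
)
```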
The Transformation Method
You have an SOP written for humans. You need criteria machines can evaluate. Here is the algorithm to get from paragraphs to atomic logic.
Decompose
Read the SOP aloud. Every time you hear 'and' or a comma, that's likely a split point.
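This split step can be roughly mechanized (the SOP sentence below is illustrative):

```python
# Rough sketch of the decompose step: split an SOP sentence on
# commas and "and" to surface candidate atomic criteria.
import re

def decompose(sop_sentence):
    parts = re.split(r",|\band\b", sop_sentence)
    return [p.strip() for p in parts if p.strip()]

sop = "Verify the caller's identity, confirm consent to record, and state the call purpose"
# decompose(sop) -> ["Verify the caller's identity",
#                    "confirm consent to record",
#                    "state the call purpose"]
```

The output is a candidate list, not a final one: a human still decides which fragments are real criteria.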
Observable Moments
The Transcript Test: If you can't Ctrl+F for evidence, the criterion isn't observable.
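The Transcript Test can be run literally, assuming a plain-text transcript; the patterns below are illustrative:

```python
# Mechanized Transcript Test: if no search pattern can locate
# evidence in the transcript, the criterion is not observable.
import re

def has_evidence(transcript, pattern):
    return re.search(pattern, transcript, re.IGNORECASE) is not None

transcript = "Agent: This call may be recorded for quality purposes."

has_evidence(transcript, r"call may be recorded")  # observable
has_evidence(transcript, r"built rapport")         # not Ctrl+F-able
```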
Strong Fail Definitions
The fail definition isn't just "the opposite of pass." It must cover partial completion.
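One way to sketch this, with hypothetical verification states: partial completion gets its own named fail instead of collapsing into "not pass":

```python
# Sketch of a fail definition that names partial completion
# explicitly. States and labels are invented for illustration.
def grade_verification(spelled_back, customer_confirmed):
    if spelled_back and customer_confirmed:
        return "pass"
    if spelled_back and not customer_confirmed:
        # Partial completion is still a fail: evidence without confirmation.
        return "fail:unconfirmed"
    return "fail:not_attempted"

grade_verification(True, True)    # "pass"
grade_verification(True, False)   # "fail:unconfirmed"
grade_verification(False, False)  # "fail:not_attempted"
```

Naming the partial state matters for diagnosis: "fail:unconfirmed" and "fail:not_attempted" point at different fixes.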
Quantification
Quantified criteria remove the last traces of subjectivity. There is no debate about whether 94% is less than 95%.
Use the Baseline Method: Don't guess thresholds. Run 20 calls, plot the data, and find the natural gap between "clearly bad" and "clearly good".
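The Baseline Method can be sketched as finding the widest gap in sorted baseline scores and placing the threshold in its midpoint (the scores below are invented for illustration):

```python
# Minimal sketch of the Baseline Method: sort baseline scores and
# put the threshold in the largest gap between adjacent values.
def baseline_threshold(scores):
    s = sorted(scores)
    gaps = [(s[i + 1] - s[i], i) for i in range(len(s) - 1)]
    width, i = max(gaps)
    return (s[i] + s[i + 1]) / 2  # midpoint of the widest gap

# Illustrative scores from a baseline batch (abridged):
scores = [0.52, 0.55, 0.58, 0.61, 0.63,        # clearly bad
          0.88, 0.90, 0.91, 0.93, 0.94, 0.95]  # clearly good
# baseline_threshold(scores) ≈ 0.755, the midpoint of the 0.63–0.88 gap
```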
Ratio-Based
Measures one thing relative to another.
Count-Based
Measures occurrences of specific behaviors.
Time-Based
Measures when something happens or duration.
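The three shapes, sketched against a toy transcript model (field names are assumptions, not a real schema):

```python
# Toy transcript model illustrating the three quantification shapes.
turns = [
    {"speaker": "agent",    "start_s": 0,  "interrupted": False},
    {"speaker": "customer", "start_s": 9,  "interrupted": True},
    {"speaker": "agent",    "start_s": 15, "interrupted": False},
]

# Ratio-based: agent turns relative to all turns.
agent_talk_ratio = sum(t["speaker"] == "agent" for t in turns) / len(turns)

# Count-based: occurrences of a specific behavior.
interruptions = sum(t["interrupted"] for t in turns)

# Time-based: when something first happens.
first_agent_turn_s = min(t["start_s"] for t in turns if t["speaker"] == "agent")
```

Each produces a number you can put a threshold on, which is what makes the criterion debatable-proof.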
The Default State Problem
Checking for "Empathy" often yields 98% pass rates because professional behavior is the baseline. This is noise. To find the signal (the 2% of bad calls), flip the criterion to hunt for "Rudeness."
You run a criterion against 500 interactions. 487 Pass, 13 Fail. Your report says "97.4% Empathy Rate".
You spent tokens evaluating 487 interactions that were... fine. Normal. Default professional behavior.
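Flipping the criterion can be as simple as hunting for violation markers instead of grading every call (the marker list is illustrative, not a production rudeness detector):

```python
# Sketch of a flipped default-state criterion: flag the rare
# rudeness violations instead of "passing" empathy 487 times.
RUDE_MARKERS = ["that's not my problem", "calm down", "like i said"]

def flag_rudeness(transcript):
    text = transcript.lower()
    return [m for m in RUDE_MARKERS if m in text]

calls = [
    "Agent: Happy to help with that today.",
    "Agent: Like I said, calm down and listen.",
]
flagged = [c for c in calls if flag_rudeness(c)]
# Only the violation surfaces for review; default-professional
# calls are never individually evaluated.
```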
Logic Gates
You write 25 criteria for "Debt Collection." But 8 only apply to employed borrowers, and 6 only apply to hardship cases.
If you evaluate a hardship call against employed criteria (e.g., "Set Payment Date"), your agent fails. These are False Failures. They destroy trust in your audit data.
The False Failure Simulator
Without Logic Gates, the system evaluates every criterion blindly: the agent is penalized for not asking a jobless person for money. With gates, inapplicable criteria are skipped instead of failed, and your audit scores reflect what actually went wrong.
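A minimal sketch of the gate pattern, where each criterion declares when it applies (criterion names and call fields are illustrative):

```python
# Logic gates: each criterion carries an applicability predicate,
# and inapplicable criteria are skipped, not failed.
CRITERIA = [
    {"name": "set_payment_date",   "applies": lambda c: c["employed"]},
    {"name": "offer_hardship_plan", "applies": lambda c: c["hardship"]},
    {"name": "verify_identity",    "applies": lambda c: True},
]

def applicable(call):
    return [c["name"] for c in CRITERIA if c["applies"](call)]

hardship_call = {"employed": False, "hardship": True}
# applicable(hardship_call) -> ["offer_hardship_plan", "verify_identity"]
# "set_payment_date" never fires, so no False Failure is recorded.
```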
Common Pitfalls
A quick reference checklist. If you see these patterns, refactor immediately.
The 'Appropriately' Trap
"Your criterion uses words like 'appropriately', 'properly', 'correctly', 'adequately'."
Replace the judgment word with the observable behavior that defines 'appropriate'.
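A before/after sketch (wording invented for illustration):

```python
# The judgment word, replaced by the observable behaviors that
# define it. Both versions are illustrative examples.
before = "Agent responds appropriately to customer frustration"

after = [
    "Agent acknowledges the frustration in their next turn",
    "Agent does not interrupt the customer mid-sentence",
    "Agent offers a concrete next step before ending the call",
]
```

Each line in the rewritten version survives the Transcript Test; the original does not.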
Put Your Criteria to Work
You've written audit-ready criteria. Now enforce them in real-time and verify compliance at scale.