Criteria Authoring Workshop

Your agent works.
Your criteria don't.

Vague evaluation criteria create noise, not signal. Engineering asks "what should we fix?" and you don't have an answer. This is the method to fix that — in 5 steps.

"Two different evaluators, reviewing the same transcript, will reach the same conclusion — 100% of the time."

The Framework

Step
What you do
The shift
Decompose
Split compound requirements into single behaviors
"Verify identity and consent" → 2 criteria
Observe
Find the transcript evidence
"Demonstrates empathy" → "Uses phrase"
Pass
Write for outcome, not action
"Bot asks for name" → "Name is captured"
Fail
Specify what failure looks like
"Didn't pass" → "Proceeded without DOB"
Stress Test
Two evaluators, same answer
Ambiguity → Precision

Click "Next" below to begin →

WelcomeOverview