Test AI Agents with Semantic Validation
Use AI to judge if responses are semantically correct.
Test tool calls, multi-turn conversations, and streaming responses.
Semantic Matching
“2 PM”, “14:00”, and “two in the afternoon” all pass validation
Tool Validation
Validate tool calls, arguments, and execution order automatically
Composable
Build tests as JSON pipelines. Works with any API or AI system
Why Semantic Testing?
String matching fails whenever the wording changes, even if the meaning is identical. Semantic validation passes any response that means the same thing, regardless of format or phrasing.
How It Works
1. AI Response
Your AI system generates a response with text and tool calls
2. LLM Judge
AI evaluates if the response semantically matches expectations
3. Pass/Fail
Get a score with detailed reasoning about what passed or failed
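The three steps above can be sketched in plain Python. This is an illustrative sketch, not the tool's actual API: a real setup sends the response and the expected behavior to an LLM judge, while here the judge is stubbed with a trivial word-overlap heuristic just so the flow is runnable end to end.

```python
def judge(response_text: str, expected_behavior: str, threshold: float = 0.6):
    """Score how well a response matches an expectation, then pass/fail.

    Stand-in for an LLM judge: scores by crude word overlap. The real
    judge would return a score plus natural-language reasoning.
    """
    expected_terms = {w.lower().strip(".,:!?") for w in expected_behavior.split()}
    response_terms = {w.lower().strip(".,:!?") for w in response_text.split()}
    overlap = expected_terms & response_terms
    score = len(overlap) / max(len(expected_terms), 1)
    return {
        "score": round(score, 2),
        "passed": score >= threshold,            # step 3: pass/fail with a threshold
        "reasoning": f"matched terms: {sorted(overlap)}",
    }

# Step 1: the AI response; step 2: the judge evaluates it against expectations.
result = judge(
    "Your meeting is booked: the time is 2 PM and the room is Conference Room A.",
    "mention the meeting time and room",
)
print(result["passed"], result["score"])  # → True 0.83
```

The shape of the result (a score, a boolean, and reasoning) mirrors the flow described above; only the scoring internals differ from a real LLM judge.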
All of These Pass ✓
Because semantic testing understands meaning, not just exact strings. No more false failures from different formats or phrasings!
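A concrete illustration of the false-failure problem, using only the Python standard library (this is not part of the tool):

```python
from datetime import datetime

# Exact string comparison flags "14:00" as wrong even though it means 2 PM.
assert "2 PM" != "14:00"  # a string-matching test fails here

# Parsed into actual times, the two formats are identical.
t1 = datetime.strptime("2 PM", "%I %p").time()
t2 = datetime.strptime("14:00", "%H:%M").time()
assert t1 == t2  # both are 14:00

# "two in the afternoon" has no parseable format at all --
# recognizing it requires understanding meaning, which is the judge's job.
```

Format normalization only rescues cases with a known parser; free-form phrasings are why an LLM judge is used instead.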
Building Blocks
Mix and match these blocks to create powerful test pipelines.
💡 Quick Pipeline Example
Chain blocks together to create powerful test scenarios. Each block's output becomes available to the next block.
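One way such a pipeline might look as JSON. The block names and field names below are illustrative guesses, not the tool's actual schema; the point is the chaining, where each block reads the output of an earlier one:

```json
{
  "pipeline": [
    {
      "block": "http_request",
      "url": "https://api.example.com/chat",
      "body": { "message": "Book a meeting tomorrow at 2 PM" }
    },
    {
      "block": "llm_judge",
      "input": "http_request.response",
      "expected_behavior": "Confirms the meeting time and mentions the location",
      "pass_if": { "score": { "gte": 0.6 } }
    }
  ]
}
```

Here the second block's `input` references the first block's output by name, which is the chaining described above.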
Try the Playground
Run a semantic test right in your browser. See how AI evaluates responses for semantic correctness.
1. Mock AI Response
Text: “I've scheduled your meeting for tomorrow at 2 PM in Conference Room A. Invitations have been sent to john@example.com.”
Tools Called: create_event
2. LLM Judge Evaluation
Expected Behavior:
“Should check for conflicts and create meeting with confirmation mentioning time, location, and attendees”
Pass if: score gte 0.6
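Expressed as a test definition, the scenario above might look like this. The field names are illustrative, not the tool's actual schema; the values come from the example:

```json
{
  "block": "llm_judge",
  "expected_behavior": "Should check for conflicts and create meeting with confirmation mentioning time, location, and attendees",
  "expected_tools": ["create_event"],
  "pass_if": { "score": { "gte": 0.6 } }
}
```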
Get Started in 30 Seconds
Install
That's it. No accounts, no API keys required to start.
Create Test
Write tests in simple JSON. Use the playground above to get started.
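A minimal test file might look like the following. The schema is a hypothetical sketch, not the tool's documented format; it uses only non-LLM blocks, so no API key is involved:

```json
{
  "name": "greeting responds politely",
  "steps": [
    { "block": "mock_response", "text": "Hello! How can I help you today?" },
    { "block": "contains", "input": "mock_response.text", "value": "help" }
  ]
}
```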
Run Tests
Run tests locally on your machine. Full control, no cloud required.
Want semantic validation with AI?
Add an OpenAI API key to use the LLM Judge block for semantic testing. This is optional: every other block works without it.
You pay OpenAI directly. No middleman fees.