Test AI Agents with Semantic Validation

Use AI to judge whether responses are semantically correct.
Test tool calls, multi-turn conversations, and streaming responses.


Semantic Matching

“2 PM”, “14:00”, and “two in the afternoon” all pass validation

🔧

Tool Validation

Validate tool calls, arguments, and execution order automatically

📦

Composable

Build tests as JSON pipelines. Works with any API or AI system
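
To make the tool validation idea concrete, here is a minimal sketch of checking call order and arguments by hand. It is not the semantic-test API: the `ToolCall` shape, both helper functions, and the `check_conflicts` tool name are assumptions made for illustration. Only `create_event` appears on this page.

```typescript
// Hypothetical shape of a recorded tool call (an assumption, not the library's type).
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// True when every expected tool name appears in `calls` in the given
// relative order. Other calls may be interleaved between them.
function calledInOrder(calls: ToolCall[], expected: string[]): boolean {
  let next = 0;
  for (const call of calls) {
    if (next < expected.length && call.name === expected[next]) {
      next++;
    }
  }
  return next === expected.length;
}

// True when the first call to `name` carries every expected argument value.
function hasArgs(
  calls: ToolCall[],
  name: string,
  expected: Record<string, unknown>
): boolean {
  for (const call of calls) {
    if (call.name !== name) continue;
    for (const key of Object.keys(expected)) {
      if (call.args[key] !== expected[key]) return false;
    }
    return true;
  }
  return false; // the tool was never called
}

const calls: ToolCall[] = [
  { name: "check_conflicts", args: { date: "tomorrow" } }, // hypothetical tool
  { name: "create_event", args: { time: "14:00", room: "Conference Room A" } },
];

console.log(calledInOrder(calls, ["check_conflicts", "create_event"])); // true
console.log(calledInOrder(calls, ["create_event", "check_conflicts"])); // false
console.log(hasArgs(calls, "create_event", { room: "Conference Room A" })); // true
```

Argument values are compared with strict equality here, so this sketch only suits primitive arguments. Nested objects would need a deep comparison.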

Why Semantic Testing?

String Matching

Expected:
"Meeting at 2 PM"
Got:
"Meeting at 14:00"
❌ FAIL - Strings don't match
Even though the meaning is identical!

Semantic Validation

Expected:
"Meeting at 2 PM"
Got:
"Meeting at 14:00"
✅ PASS - Same meaning
LLM Judge understands they're equivalent
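
The contrast above can be sketched in a few lines. The tool itself relies on an LLM judge. The example below swaps in a small rule-based time normaliser as a deterministic stand-in, purely to show why comparing meaning succeeds where comparing strings fails. `toMinutes` and `sameTime` are hypothetical helpers, not part of the package.

```typescript
// Parses a few common time formats into minutes since midnight.
// Returns null when no time is found. Stand-in for an LLM judge.
function toMinutes(text: string): number | null {
  const match = /(\d{1,2})(?::(\d{2}))?\s*(AM|PM)?/i.exec(text);
  if (!match) return null;
  let hours = parseInt(match[1], 10);
  const minutes = match[2] ? parseInt(match[2], 10) : 0;
  const meridiem = match[3] ? match[3].toUpperCase() : "";
  if (meridiem === "PM" && hours < 12) hours += 12;
  if (meridiem === "AM" && hours === 12) hours = 0;
  return hours * 60 + minutes;
}

// Two strings "mean the same" here if they name the same time of day.
function sameTime(a: string, b: string): boolean {
  const left = toMinutes(a);
  const right = toMinutes(b);
  return left !== null && left === right;
}

const expectedText = "Meeting at 2 PM";
const actualText = "Meeting at 14:00";

console.log(expectedText === actualText); // false: the strings differ
console.log(sameTime(expectedText, actualText)); // true: both mean 14:00
```

A rule-based normaliser like this only covers the formats it was written for. Phrasings such as "two in the afternoon" are where an LLM judge earns its keep.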

How It Works

📝

1. AI Response

Your AI system generates a response with text and tool calls

🧑‍⚖️

2. LLM Judge

AI evaluates if the response semantically matches expectations

3. Pass/Fail

Get a score with detailed reasoning about what passed or failed
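
The three steps above can be sketched as a small function. Every type and name below is an assumption made for illustration, not the package's API. The judge is a stub that scores by phrase matching. A real judge would send the response and the expectation to an LLM, and would normally be asynchronous.

```typescript
// Hypothetical types; not the semantic-test API.
interface AIResponse {
  text: string;
  toolCalls: string[];
}

interface Expectation {
  description: string;
  mustMention: string[]; // used only by the stub judge below
}

interface Verdict {
  score: number; // between 0 and 1
  reasoning: string;
}

type Judge = (response: AIResponse, expectation: Expectation) => Verdict;

// Stub judge: the score is the fraction of required phrases found in the text.
const stubJudge: Judge = (response, expectation) => {
  const text = response.text.toLowerCase();
  const missing: string[] = [];
  for (const phrase of expectation.mustMention) {
    if (text.indexOf(phrase.toLowerCase()) === -1) missing.push(phrase);
  }
  const total = expectation.mustMention.length;
  const score = total === 0 ? 1 : (total - missing.length) / total;
  const reasoning =
    missing.length === 0
      ? "All expected details are present."
      : "Missing: " + missing.join(", ");
  return { score, reasoning };
};

// Steps 2 and 3: ask the judge for a verdict, then apply the pass threshold.
function evaluate(
  response: AIResponse,
  expectation: Expectation,
  judge: Judge,
  threshold: number
): { passed: boolean; score: number; reasoning: string } {
  const verdict = judge(response, expectation);
  return {
    passed: verdict.score >= threshold,
    score: verdict.score,
    reasoning: verdict.reasoning,
  };
}

// Step 1: a response from the system under test.
const aiResponse: AIResponse = {
  text: "I've scheduled your meeting for tomorrow at 2 PM in Conference Room A.",
  toolCalls: ["create_event"],
};

const expectation: Expectation = {
  description: "Confirm the meeting with time, location, and attendees",
  mustMention: ["2 PM", "Conference Room A", "john@example.com"],
};

const outcome = evaluate(aiResponse, expectation, stubJudge, 0.6);
console.log(outcome.passed, outcome.score.toFixed(2), outcome.reasoning);
```

Because the judge is passed in as a parameter, the stub can be replaced with an LLM-backed implementation without touching the pass/fail logic.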

All of These Pass ✓

2 PM ✓ Pass
14:00 ✓ Pass
2:00 PM ✓ Pass
two in the afternoon ✓ Pass

These all pass because semantic testing compares meaning, not exact strings. Different formats or phrasings no longer cause false failures.

Building Blocks

Mix and match these blocks to create test pipelines.

💡 Quick Pipeline Example

📦 MockData → 🧑‍⚖️ LLMJudge → ✓ ValidateContent = ✅ Complete Test

Chain blocks together to create powerful test scenarios. Each block's output becomes available to the next block.
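
One way to picture this chaining is a loop that stores each block's output in a shared context under the block's name. This is a sketch under assumptions: the `PipelineBlock` interface, the context layout, and the block bodies are hypothetical and do not reflect how the package implements MockData, LLMJudge, or ValidateContent.

```typescript
// Hypothetical types: each block returns a plain object, and the context
// maps block names to the outputs produced so far.
type BlockOutput = Record<string, unknown>;
type PipelineContext = Record<string, BlockOutput>;

interface PipelineBlock {
  name: string;
  run: (ctx: PipelineContext) => BlockOutput;
}

// Runs blocks in order. Each block can read every earlier block's output.
function runPipeline(blocks: PipelineBlock[]): PipelineContext {
  const ctx: PipelineContext = {};
  for (const block of blocks) {
    ctx[block.name] = block.run(ctx);
  }
  return ctx;
}

const pipeline: PipelineBlock[] = [
  // Supplies a canned response instead of calling a real AI system.
  { name: "MockData", run: () => ({ text: "Meeting at 14:00" }) },
  // Stub judge: a real one would ask an LLM to score the text.
  {
    name: "LLMJudge",
    run: (ctx) => {
      const text = String(ctx["MockData"]["text"]);
      return { score: text.indexOf("14:00") !== -1 ? 1 : 0 };
    },
  },
  // Turns the judge's score into a pass/fail result.
  {
    name: "ValidateContent",
    run: (ctx) => ({ passed: Number(ctx["LLMJudge"]["score"]) >= 0.6 }),
  },
];

const pipelineResult = runPipeline(pipeline);
console.log(pipelineResult["ValidateContent"]["passed"]); // true
```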

Try the Playground

Run a semantic test right in your browser. See how AI evaluates responses for semantic correctness.


1. Mock AI Response

Text: I've scheduled your meeting for tomorrow at 2 PM in Conference Room A. Invitations have been sent to john@example.com.

Tools Called: create_event

2. LLM Judge Evaluation

Expected Behavior:

Should check for conflicts and create meeting with confirmation mentioning time, location, and attendees

Pass if: score gte 0.6
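
A pass condition such as "score gte 0.6" reads as field, operator, value. The sketch below shows one way such a condition could be parsed and checked. Only `gte` appears on this page. The other operators, the `Condition` type, and both function names are assumptions, and the package's actual condition syntax may differ.

```typescript
// Hypothetical operator set; only "gte" is taken from the example above.
type Operator = "gte" | "gt" | "lte" | "lt" | "eq";

interface Condition {
  field: string;
  op: Operator;
  value: number;
}

// Parses text of the form "<field> <operator> <number>".
function parseCondition(text: string): Condition {
  const parts = text.trim().split(/\s+/);
  if (parts.length !== 3) {
    throw new Error("Expected '<field> <operator> <number>'");
  }
  const op = parts[1];
  if (op !== "gte" && op !== "gt" && op !== "lte" && op !== "lt" && op !== "eq") {
    throw new Error("Unknown operator: " + op);
  }
  const value = Number(parts[2]);
  if (isNaN(value)) {
    throw new Error("Not a number: " + parts[2]);
  }
  return { field: parts[0], op, value };
}

// Checks the condition against a set of named numeric results.
function checkCondition(cond: Condition, results: Record<string, number>): boolean {
  const actual = results[cond.field];
  if (actual === undefined) return false; // field missing from results
  switch (cond.op) {
    case "gte": return actual >= cond.value;
    case "gt": return actual > cond.value;
    case "lte": return actual <= cond.value;
    case "lt": return actual < cond.value;
    case "eq": return actual === cond.value;
  }
}

const passIf = parseCondition("score gte 0.6");
console.log(checkCondition(passIf, { score: 0.72 })); // true
console.log(checkCondition(passIf, { score: 0.6 })); // true: gte includes equality
console.log(checkCondition(passIf, { score: 0.59 })); // false
```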


Get Started in 30 Seconds

✓ 100% Free  ✓ Open Source  ✓ Runs Locally  ✓ No Vendor Lock-in
📦

1. Install

npm install @blade47/semantic-test

That's it. No accounts, no API keys required to start.

📝

2. Create Test

{ "pipeline": [...] }

Write tests in simple JSON. Use the playground above to get started.

🚀

3. Run Tests

npx semtest test.json

Run tests locally on your machine. Full control, no cloud required.

💡

Want semantic validation with AI?

Add an OpenAI API key to use LLM Judge for semantic testing. Optional - you can use all other blocks without it.

export OPENAI_API_KEY="your-key-here"

You pay OpenAI directly. No middleman fees.