Test AI Agents with Semantic Validation
Use AI to judge if responses are semantically correct.
Test tool calls, multi-turn conversations, and streaming responses.
Semantic Matching
“2 PM”, “14:00”, and “two in the afternoon” all pass validation
Tool Validation
Validate tool calls, arguments, and execution order automatically
Composable
Build tests as JSON pipelines. Works with any API or AI system
Why Semantic Testing?
String matching fails whenever the wording changes, even if the meaning is identical. Semantic validation passes any response that means the same thing, regardless of format or phrasing.
How It Works
1. AI Response
Your AI system generates a response with text and tool calls
2. LLM Judge
AI evaluates if the response semantically matches expectations
3. Pass/Fail
Get a score with detailed reasoning about what passed or failed
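The three steps above can be sketched in plain Python. This is an illustrative sketch, not the tool's actual API: a real setup sends the response and the expected behavior to an LLM judge, while here the judge is stubbed with a trivial word-overlap heuristic just so the flow is runnable end to end.

```python
def judge(response_text: str, expected_behavior: str, threshold: float = 0.6):
    """Score how well a response matches an expectation, then pass/fail.

    Stand-in for an LLM judge: scores by crude word overlap. The real
    judge would return a score plus natural-language reasoning.
    """
    expected_terms = {w.lower().strip(".,:!?") for w in expected_behavior.split()}
    response_terms = {w.lower().strip(".,:!?") for w in response_text.split()}
    overlap = expected_terms & response_terms
    score = len(overlap) / max(len(expected_terms), 1)
    return {
        "score": round(score, 2),
        "passed": score >= threshold,            # step 3: pass/fail with a threshold
        "reasoning": f"matched terms: {sorted(overlap)}",
    }

# Step 1: the AI response; step 2: the judge evaluates it against expectations.
result = judge(
    "Your meeting is booked: the time is 2 PM and the room is Conference Room A.",
    "mention the meeting time and room",
)
print(result["passed"], result["score"])  # → True 0.83
```

The shape of the result (a score, a boolean, and reasoning) mirrors the flow described above; only the scoring internals differ from a real LLM judge.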
All of These Pass ✓
Because semantic testing understands meaning, not just exact strings. No more false failures from different formats or phrasings!
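A concrete illustration of the false-failure problem, using only the Python standard library (this is not part of the tool):

```python
from datetime import datetime

# Exact string comparison flags "14:00" as wrong even though it means 2 PM.
assert "2 PM" != "14:00"  # a string-matching test fails here

# Parsed into actual times, the two formats are identical.
t1 = datetime.strptime("2 PM", "%I %p").time()
t2 = datetime.strptime("14:00", "%H:%M").time()
assert t1 == t2  # both are 14:00

# "two in the afternoon" has no parseable format at all --
# recognizing it requires understanding meaning, which is the judge's job.
```

Format normalization only rescues cases with a known parser; free-form phrasings are why an LLM judge is used instead.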
Building Blocks
Mix and match these blocks to create powerful test pipelines.
💡 Quick Pipeline Example
Chain blocks together to create powerful test scenarios. Each block's output becomes available to the next block.
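One way such a pipeline might look as JSON. The block names and field names below are illustrative guesses, not the tool's actual schema; the point is the chaining, where each block reads the output of an earlier one:

```json
{
  "pipeline": [
    {
      "block": "http_request",
      "url": "https://api.example.com/chat",
      "body": { "message": "Book a meeting tomorrow at 2 PM" }
    },
    {
      "block": "llm_judge",
      "input": "http_request.response",
      "expected_behavior": "Confirms the meeting time and mentions the location",
      "pass_if": { "score": { "gte": 0.6 } }
    }
  ]
}
```

Here the second block's `input` references the first block's output by name, which is the chaining described above.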
Try the Playground
Run a semantic test right in your browser. See how AI evaluates responses for semantic correctness.
1. Mock AI Response
Text: “I've scheduled your meeting for tomorrow at 2 PM in Conference Room A. Invitations have been sent to john@example.com.”
Tools Called: create_event
2. LLM Judge Evaluation
Expected Behavior:
“Should check for conflicts and create meeting with confirmation mentioning time, location, and attendees”
Pass if: score gte 0.6
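Expressed as a test definition, the scenario above might look like this. The field names are illustrative, not the tool's actual schema; the values come from the example:

```json
{
  "block": "llm_judge",
  "expected_behavior": "Should check for conflicts and create meeting with confirmation mentioning time, location, and attendees",
  "expected_tools": ["create_event"],
  "pass_if": { "score": { "gte": 0.6 } }
}
```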
Get Started in 30 Seconds
Install
That's it. No accounts, no API keys required to start.
Create Test
Write tests in simple JSON. Use the playground above to get started.
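A minimal test file might look like the following. The schema is a hypothetical sketch, not the tool's documented format; it uses only non-LLM blocks, so no API key is involved:

```json
{
  "name": "greeting responds politely",
  "steps": [
    { "block": "mock_response", "text": "Hello! How can I help you today?" },
    { "block": "contains", "input": "mock_response.text", "value": "help" }
  ]
}
```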
Run Tests
Run tests locally on your machine. Full control, no cloud required.
Want semantic validation with AI?
Add an OpenAI API key to use the LLM Judge block for semantic testing. This is optional: every other block works without it.
You pay OpenAI directly. No middleman fees.