Agentest

🤖

Scenario-based Tests

Define user personas, goals, and knowledge — Agentest generates realistic multi-turn conversations

🔧

Tool-call Mocks

Intercept and control tool calls with functions, sequences, and error simulation

✅

Trajectory Assertions

Verify tool call order and arguments with strict, contains, unordered, and within match modes

📊

LLM-as-judge Metrics

Helpfulness, coherence, relevance, faithfulness, goal completion, behavior failure detection

🔄

Comparison Mode

Run the same scenarios against multiple models/configs side-by-side

🚀

CI-ready CLI

Exit codes, JSON reporter, GitHub Actions annotations, watch mode

Quick Start

Install Agentest:

npm install @agentesting/agentest --save-dev

Create a config file:

// agentest.config.ts
import { defineConfig } from '@agentesting/agentest'

export default defineConfig({
  agent: {
    name: 'my-agent',
    endpoint: 'http://localhost:3000/api/chat',
  },
})

Write a scenario:

// tests/booking.sim.ts
import { scenario, sequence } from '@agentesting/agentest'

scenario('user books a morning slot', {
  profile: 'Busy professional who prefers mornings.',
  goal: 'Book a haircut for next Tuesday morning.',

  mocks: {
    tools: {
      check_availability: (args) => ({
        available: true,
        slots: ['09:00', '09:45', '10:30'],
      }),
      create_booking: sequence([
        { success: true, bookingId: 'BK-001' },
      ]),
    },
  },

  assertions: {
    toolCalls: {
      matchMode: 'contains',
      expected: [
        { name: 'check_availability', argMatchMode: 'ignore' },
        { name: 'create_booking', argMatchMode: 'ignore' },
      ],
    },
  },
})
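The `matchMode: 'contains'` assertion above compares the recorded tool calls with the expected list. As a rough picture of how such modes could differ, here is a standalone sketch that does not use Agentest itself. The helper `matchesTrajectory`, the tool name `get_pricing`, and the exact semantics given to each mode are assumptions made for illustration. The `within` mode is left out because its semantics are not described here.

```typescript
// Hypothetical types and helper for illustration; not Agentest's API.
type MatchMode = 'strict' | 'contains' | 'unordered'

function matchesTrajectory(
  actual: string[],
  expected: string[],
  mode: MatchMode,
): boolean {
  if (mode === 'strict') {
    // Assumed: exactly the expected calls, in the expected order.
    return (
      actual.length === expected.length &&
      actual.every((name, i) => name === expected[i])
    )
  }
  if (mode === 'contains') {
    // Assumed: expected calls appear in order; extra calls are allowed.
    let next = 0
    for (const name of actual) {
      if (next < expected.length && name === expected[next]) next++
    }
    return next === expected.length
  }
  // 'unordered' (assumed): the same set of calls, in any order.
  const sortedActual = [...actual].sort()
  const sortedExpected = [...expected].sort()
  return (
    sortedActual.length === sortedExpected.length &&
    sortedActual.every((name, i) => name === sortedExpected[i])
  )
}

// 'get_pricing' is a made-up extra call the agent might make.
const recorded = ['check_availability', 'get_pricing', 'create_booking']
const expected = ['check_availability', 'create_booking']

console.log(matchesTrajectory(recorded, expected, 'strict'))    // false
console.log(matchesTrajectory(recorded, expected, 'contains'))  // true
console.log(matchesTrajectory(recorded, expected, 'unordered')) // false
```

Under these assumed semantics, `contains` is the forgiving choice when the agent may make extra calls along the way, while `strict` fails as soon as the trajectory has any call you did not list.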

Run:

# If your agent runs on localhost, allow private endpoints:
AGENTEST_ALLOW_PRIVATE_ENDPOINTS=1 npx agentest run

Why Agentest?

Testing agents is not like testing regular APIs. Traditional API tests send a request and assert on the response. Agent tests need to handle multi-turn conversations where the agent decides which tools to call, in what order, with what arguments — and the "correct" output is subjective.

Agentest solves this by:

Simulating realistic users with LLM-powered personas that talk to your agent

Intercepting tool calls and resolving them through your mocks

Verifying trajectories with deterministic assertions on tool call order

Evaluating quality with LLM-as-judge metrics across 8 dimensions
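The first three steps above can be pictured as a single loop. The sketch below is a self-contained illustration, not Agentest's implementation: the simulated user and the agent are scripted stubs standing in for LLM calls, and every name in it (`runScenario`, `AgentStep`, the local `sequence`, and so on) is hypothetical. The local `sequence` helper repeats its last value once exhausted, which is an assumption and may differ from how Agentest's `sequence` behaves. The LLM-as-judge step is omitted.

```typescript
// Illustrative stand-ins only; none of these names are Agentest's API.
type ToolArgs = Record<string, unknown>
type ToolMock = (args: ToolArgs) => unknown
type ToolCall = { name: string; args: ToolArgs; result: unknown }
type AgentStep =
  | { kind: 'tool'; name: string; args: ToolArgs }
  | { kind: 'reply'; text: string }

// Returns queued results one by one; repeats the last one when exhausted (assumed).
function sequence(results: unknown[]): ToolMock {
  let i = 0
  return () => results[Math.min(i++, results.length - 1)]
}

const mocks: Record<string, ToolMock> = {
  check_availability: () => ({ available: true, slots: ['09:00', '09:45'] }),
  create_booking: sequence([{ success: true, bookingId: 'BK-001' }]),
}

// Scripted user turns; a real run would generate these from the persona and goal.
const userTurns = [
  'I need a haircut next Tuesday morning.',
  '09:00 works, please book it.',
]

// Scripted agent: the steps it takes in response to each user turn.
const agentScript: AgentStep[][] = [
  [
    { kind: 'tool', name: 'check_availability', args: { day: 'Tuesday' } },
    { kind: 'reply', text: 'I have 09:00 or 09:45 free.' },
  ],
  [
    { kind: 'tool', name: 'create_booking', args: { slot: '09:00' } },
    { kind: 'reply', text: 'Booked. Your reference is BK-001.' },
  ],
]

function runScenario(): { trajectory: ToolCall[]; transcript: string[] } {
  const trajectory: ToolCall[] = []
  const transcript: string[] = []
  userTurns.forEach((message, turn) => {
    transcript.push(`user: ${message}`)
    for (const step of agentScript[turn] ?? []) {
      if (step.kind === 'tool') {
        // Intercept the tool call and resolve it through the mock.
        const mock = mocks[step.name]
        if (!mock) throw new Error(`No mock for tool ${step.name}`)
        trajectory.push({ name: step.name, args: step.args, result: mock(step.args) })
      } else {
        transcript.push(`agent: ${step.text}`)
      }
    }
  })
  return { trajectory, transcript }
}

const { trajectory, transcript } = runScenario()
// The recorded trajectory is what deterministic assertions would inspect.
console.log(trajectory.map((call) => call.name))
console.log(transcript.join('\n'))
```

The point of the design is that the tool-call trajectory is recorded separately from the conversation text, so order and arguments can be checked deterministically even though the wording of each reply varies between runs.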

Agentest complements eval platforms and observability tools — it doesn't replace them. Use Agentest to run your agent through test scenarios in CI. Use LangSmith/Langfuse to observe your agent in production.

Agentest: Vitest for AI agents