
Agentest: Vitest for AI agents

Scenario-based testing with simulated users, tool-call mocks, and LLM-as-judge evaluation


Quick Start

Install Agentest:

```bash
npm install @agentesting/agentest --save-dev
```

Create a config file:

```ts
// agentest.config.ts
import { defineConfig } from 'agentest'

export default defineConfig({
  agent: {
    name: 'my-agent',
    endpoint: 'http://localhost:3000/api/chat',
  },
})
```
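`defineConfig` most likely follows the same pattern as the Vite and Vitest helper of the same name: an identity function whose only job is to give the compiler and your editor a type to check the config object against. The sketch below illustrates that pattern under this assumption. The `AgentestConfig` and `AgentConfig` types are hypothetical and cover only the two fields used above; they are not Agentest's real config type.

```ts
// Hypothetical config types: only `agent.name` and `agent.endpoint`
// appear in the Agentest example, so nothing else is modeled here.
interface AgentConfig {
  name: string
  endpoint: string
}

interface AgentestConfig {
  agent: AgentConfig
}

// Identity helper: returns its argument unchanged. Wrapping the object in
// this call is what lets TypeScript type-check and autocomplete its fields.
function defineConfig(config: AgentestConfig): AgentestConfig {
  return config
}

const config = defineConfig({
  agent: {
    name: 'my-agent',
    endpoint: 'http://localhost:3000/api/chat',
  },
})

console.log(config.agent.endpoint) // http://localhost:3000/api/chat
```

Because the helper does no work at runtime, a typo such as `endpont` is caught by the type checker instead of surfacing as a failed run.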

Write a scenario:

```ts
// tests/booking.sim.ts
import { scenario, sequence } from 'agentest'

scenario('user books a morning slot', {
  // Persona and objective for the simulated user
  profile: 'Busy professional who prefers mornings.',
  goal: 'Book a haircut for next Tuesday morning.',

  mocks: {
    tools: {
      // Function mock: receives the tool-call arguments and returns a response
      check_availability: (args) => ({
        available: true,
        slots: ['09:00', '09:45', '10:30'],
      }),
      // Sequence mock: canned responses returned in order, one per call
      create_booking: sequence([
        { success: true, bookingId: 'BK-001' },
      ]),
    },
  },

  assertions: {
    toolCalls: {
      // The trajectory must contain these calls; arguments are not compared
      matchMode: 'contains',
      expected: [
        { name: 'check_availability', argMatchMode: 'ignore' },
        { name: 'create_booking', argMatchMode: 'ignore' },
      ],
    },
  },
})
```
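To make the `mocks` block concrete, here is a self-contained sketch of how an intercepted tool call could be resolved against a mock table like the one above. It assumes a function mock is called with the tool's arguments on every call and that a `sequence` mock hands out its results in order, one per call. What Agentest actually does when a sequence runs out is not stated here, so throwing is an assumption. `sequence`, `resolveToolCall`, and the types are hypothetical stand-ins, not Agentest's exports.

```ts
type ToolArgs = Record<string, unknown>
type ToolResult = unknown

// A mock is either a function of the call's arguments or a list of canned results.
type MockFn = (args: ToolArgs) => ToolResult
interface SequenceMock {
  kind: 'sequence'
  results: ToolResult[]
  next: number // index of the result to return on the next call
}
type ToolMock = MockFn | SequenceMock

// Hypothetical stand-in for Agentest's `sequence`.
function sequence(results: ToolResult[]): SequenceMock {
  return { kind: 'sequence', results, next: 0 }
}

// Resolves one intercepted tool call against the mock table.
function resolveToolCall(
  mocks: Record<string, ToolMock>,
  name: string,
  args: ToolArgs,
): ToolResult {
  const mock = mocks[name]
  if (mock === undefined) {
    throw new Error(`No mock registered for tool "${name}"`)
  }
  if (typeof mock === 'function') {
    return mock(args)
  }
  // Assumption: calling a sequence more times than it has results is an error.
  if (mock.next >= mock.results.length) {
    throw new Error(`Sequence mock for "${name}" is exhausted`)
  }
  return mock.results[mock.next++]
}

const mocks: Record<string, ToolMock> = {
  check_availability: () => ({ available: true, slots: ['09:00', '09:45', '10:30'] }),
  create_booking: sequence([{ success: true, bookingId: 'BK-001' }]),
}

const booking = resolveToolCall(mocks, 'create_booking', { slot: '09:00' })
console.log(booking) // { success: true, bookingId: 'BK-001' }
```

A second `create_booking` call in this sketch throws, which is one way a test could surface an agent that books twice.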

Run:

```bash
# If your agent runs on localhost, allow private endpoints:
AGENTEST_ALLOW_PRIVATE_ENDPOINTS=1 npx agentest run
```
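The environment variable implies that Agentest refuses loopback and private-network endpoints unless you opt in, a common guard against server-side request forgery. The check below is a hypothetical sketch of such a guard, not Agentest's code. `isPrivateHost` and `checkEndpoint` are illustrative names; the sketch covers only `localhost`, the IPv6 loopback, and the IPv4 loopback, RFC 1918, and link-local ranges; and it assumes the variable must equal `1`.

```ts
// Hypothetical private-endpoint guard; not Agentest's implementation.
// Returns true for localhost, the IPv6 loopback, and IPv4 loopback/private/link-local hosts.
function isPrivateHost(hostname: string): boolean {
  if (hostname === 'localhost' || hostname === '[::1]' || hostname === '::1') {
    return true
  }
  const parts = hostname.split('.')
  // Anything that is not a dotted-quad IPv4 literal is treated as public here.
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) {
    return false
  }
  const [a, b] = parts.map(Number)
  return (
    a === 127 || // 127.0.0.0/8 loopback
    a === 10 || // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) || // 192.168.0.0/16
    (a === 169 && b === 254) // 169.254.0.0/16 link-local
  )
}

// Throws unless the endpoint is public or the opt-in variable equals '1' (assumed value).
function checkEndpoint(endpoint: string, env: Record<string, string | undefined>): void {
  const { hostname } = new URL(endpoint)
  if (isPrivateHost(hostname) && env['AGENTEST_ALLOW_PRIVATE_ENDPOINTS'] !== '1') {
    throw new Error(
      `Refusing private endpoint ${endpoint}; set AGENTEST_ALLOW_PRIVATE_ENDPOINTS=1 to allow it`,
    )
  }
}

// Allowed: the opt-in variable is set.
checkEndpoint('http://localhost:3000/api/chat', { AGENTEST_ALLOW_PRIVATE_ENDPOINTS: '1' })
```

In Node you would pass `process.env` as the second argument; taking `env` as a parameter keeps the check easy to test.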

Why Agentest?

Testing agents is not like testing regular APIs. A traditional API test sends a request and asserts on the response. An agent test has to handle multi-turn conversations in which the agent decides which tools to call, in what order, and with what arguments, and in which the "correct" output is often subjective.

Agentest solves this by:

  • Simulating realistic users with LLM-powered personas that talk to your agent
  • Intercepting tool calls and resolving them through your mocks
  • Verifying trajectories with deterministic assertions on tool call order
  • Evaluating quality with LLM-as-judge metrics across 8 dimensions
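A deterministic trajectory assertion can be pictured as a subsequence match over the recorded tool calls. The sketch below assumes `matchMode: 'contains'` means the expected calls must appear in the actual trajectory in the listed order, with other calls allowed in between, and that `argMatchMode: 'ignore'` skips argument comparison. Both readings are assumptions, as are the `'exact'` mode (a naive JSON comparison that is sensitive to key order), the `send_reminder` tool, and the helper names.

```ts
interface ToolCall {
  name: string
  args: Record<string, unknown>
}

interface ExpectedCall {
  name: string
  args?: Record<string, unknown>
  // Hypothetical subset of modes: 'ignore' skips arguments,
  // 'exact' compares JSON serializations (key order matters).
  argMatchMode: 'ignore' | 'exact'
}

function callMatches(actual: ToolCall, expected: ExpectedCall): boolean {
  if (actual.name !== expected.name) return false
  if (expected.argMatchMode === 'ignore') return true
  return JSON.stringify(actual.args) === JSON.stringify(expected.args ?? {})
}

// Assumed 'contains' semantics: an ordered subsequence check.
// Greedily advances through `expected` as matching calls are found.
function containsInOrder(actual: ToolCall[], expected: ExpectedCall[]): boolean {
  let next = 0 // index of the next expected call still to be found
  for (const call of actual) {
    if (next < expected.length && callMatches(call, expected[next])) {
      next++
    }
  }
  return next === expected.length
}

const trajectory: ToolCall[] = [
  { name: 'check_availability', args: { day: 'Tuesday' } },
  { name: 'send_reminder', args: {} }, // extra call, allowed under 'contains'
  { name: 'create_booking', args: { slot: '09:00' } },
]

const expected: ExpectedCall[] = [
  { name: 'check_availability', argMatchMode: 'ignore' },
  { name: 'create_booking', argMatchMode: 'ignore' },
]

console.log(containsInOrder(trajectory, expected)) // true
console.log(containsInOrder([...trajectory].reverse(), expected)) // false
```

Because the check only inspects which tools were called and in what order, it gives the same verdict on every run regardless of how the agent phrased its replies, which is what makes it usable as a CI gate alongside the non-deterministic LLM-as-judge metrics.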

Agentest complements eval platforms and observability tools rather than replacing them. Use Agentest to run your agent through test scenarios in CI; use an observability tool such as LangSmith or Langfuse to monitor your agent in production.

Released under the MIT License.