Configuration

Complete guide to configuring Agentest.

Configuration File

Create agentest.config.ts (or agentest.config.yaml) in your project root. Agentest auto-detects the format.

import { defineConfig } from 'agentest'

export default defineConfig({
  // Configuration options
})

The defineConfig helper provides:

TypeScript type safety and autocomplete
Zod schema validation
Environment variable interpolation (${VAR} syntax)

For YAML configuration, see the YAML Config Guide.

Agent Configuration

Agentest supports two agent types: chat completions (HTTP endpoint) and custom (in-process handler function).

Chat Completions (HTTP Endpoint)

The default agent type. Your agent must expose an OpenAI-compatible chat completions endpoint.

agent: {
  name: 'my-agent',
  type: 'chat_completions',  // default, can be omitted
  endpoint: 'http://localhost:3000/api/chat',

  // Optional: headers for authentication
  headers: {
    Authorization: 'Bearer ${AGENT_API_KEY}',  // interpolated from process.env
    'X-Custom-Header': 'value',
  },

  // Optional: merged into every request
  body: {
    model: 'gpt-4o',
    temperature: 0.7,
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
    ],
  },
}

Required Fields

name (string) — Identifies your agent in test output
endpoint (string) — HTTP/HTTPS URL for your agent's chat endpoint

Optional Fields

type ('chat_completions') — Default agent type (can be omitted)
headers (Record<string, string>) — HTTP headers sent with every request
body (Record<string, any>) — Shallow-merged into every request body
streaming (boolean) — Enable Server-Sent Events (SSE) streaming mode

Environment Variable Interpolation

Headers support ${VAR} syntax for environment variables:

headers: {
  Authorization: 'Bearer ${AGENT_API_KEY}',
  'X-Org-ID': '${ORG_ID}',
}

Important: If a referenced variable is not set in process.env, config loading fails with a clear error. This keeps secrets out of your config files and prevents accidental runs with missing credentials.

Set environment variables in your shell or CI:

export AGENT_API_KEY=your-key-here
export ORG_ID=org-123
npx agentest run
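
The fail-fast behavior described above can be sketched in a few lines. This is an illustration only, not Agentest's actual implementation: interpolateEnv and interpolateHeaders are hypothetical helper names, and the environment is passed in as a plain object so the example runs anywhere.

```typescript
// Hypothetical helpers; Agentest's real interpolation code may differ.
type Env = Record<string, string | undefined>

function interpolateEnv(value: string, env: Env): string {
  // Replace every ${VAR} occurrence with the matching environment value.
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, varName: string) => {
    const resolved = env[varName]
    if (resolved === undefined) {
      // Fail fast instead of sending a request with a blank credential.
      throw new Error(`Environment variable ${varName} is not set`)
    }
    return resolved
  })
}

function interpolateHeaders(headers: Record<string, string>, env: Env): Record<string, string> {
  const out: Record<string, string> = {}
  for (const [key, value] of Object.entries(headers)) {
    out[key] = interpolateEnv(value, env)
  }
  return out
}

const resolvedHeaders = interpolateHeaders(
  { Authorization: 'Bearer ${AGENT_API_KEY}', 'X-Org-ID': '${ORG_ID}' },
  { AGENT_API_KEY: 'your-key-here', ORG_ID: 'org-123' },
)
```

In a real process you would pass process.env as the second argument. Calling interpolateEnv with a variable that is missing from the environment throws, which mirrors the documented config-loading failure.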

Request Body Merging

The body field is shallow-merged into every request:

agent: {
  endpoint: 'http://localhost:3000/api/chat',
  body: {
    model: 'gpt-4o',
    temperature: 0.7,
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
    ],
  },
}

Agentest sends:

{
  "model": "gpt-4o",
  "temperature": 0.7,
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "User's first message" },
    ...
  ]
}

The messages array is handled specially: instead of replacing the conversation, the messages in body are prepended to it, which is useful for system prompts.
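
As a rough sketch of the merge described above, the following shows one way the request body could be assembled. buildRequestBody is a hypothetical name, and the exact merge logic inside Agentest is an assumption based on the example output.

```typescript
// Hypothetical sketch; not Agentest's actual request builder.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
}

type RequestBody = Record<string, unknown> & { messages: ChatMessage[] }

function buildRequestBody(
  configBody: Record<string, unknown>,
  conversation: ChatMessage[],
): RequestBody {
  // Separate the configured messages from the other top-level keys.
  const { messages, ...rest } = configBody
  const preset = Array.isArray(messages) ? (messages as ChatMessage[]) : []
  // Top-level keys are copied as-is; configured messages come first.
  return { ...rest, messages: [...preset, ...conversation] }
}

const requestBody = buildRequestBody(
  {
    model: 'gpt-4o',
    temperature: 0.7,
    messages: [{ role: 'system', content: 'You are a helpful assistant.' }],
  },
  [{ role: 'user', content: "User's first message" }],
)
```

Here requestBody.messages holds the system prompt followed by the user message, and model and temperature are carried over unchanged.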

Streaming Support

Enable Server-Sent Events (SSE) streaming mode when your agent endpoint streams responses by default:

agent: {
  name: 'streaming-agent',
  endpoint: 'https://api.openai.com/v1/chat/completions',
  streaming: true,
  headers: {
    Authorization: 'Bearer ${OPENAI_API_KEY}',
  },
  body: {
    model: 'gpt-4o',
  },
}

When streaming: true, Agentest:

Sends stream: true in the request body
Parses the chunked SSE response
Accumulates delta.content and delta.tool_calls into a complete message
Returns the full message to the simulation loop

This works with any endpoint that follows OpenAI's streaming format:

data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" world"}}]}
data: [DONE]
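
To make the accumulation step concrete, here is a minimal sketch that folds OpenAI-style stream chunks into one assistant message. accumulateSse and its types are hypothetical names, and the function takes the complete SSE text as input. A production parser would also need to handle chunks split across network reads.

```typescript
// Hypothetical sketch of SSE accumulation; not Agentest's actual parser.
interface ToolCallDelta {
  index: number
  id?: string
  function?: { name?: string; arguments?: string }
}

interface StreamChunk {
  choices: { delta: { content?: string | null; tool_calls?: ToolCallDelta[] } }[]
}

interface AccumulatedToolCall {
  id: string
  type: 'function'
  function: { name: string; arguments: string }
}

interface AccumulatedMessage {
  role: 'assistant'
  content: string
  tool_calls?: AccumulatedToolCall[]
}

function accumulateSse(raw: string): AccumulatedMessage {
  let content = ''
  const toolCalls: AccumulatedToolCall[] = []
  for (const line of raw.split('\n')) {
    const trimmed = line.trim()
    if (!trimmed.startsWith('data:')) continue // skip blank lines and comments
    const payload = trimmed.slice('data:'.length).trim()
    if (payload === '[DONE]') break // end-of-stream sentinel
    const chunk = JSON.parse(payload) as StreamChunk
    const delta = chunk.choices[0]?.delta
    if (!delta) continue
    if (delta.content) content += delta.content
    for (const part of delta.tool_calls ?? []) {
      // Tool call fragments are keyed by index and arrive piece by piece.
      let call = toolCalls[part.index]
      if (!call) {
        call = { id: '', type: 'function', function: { name: '', arguments: '' } }
        toolCalls[part.index] = call
      }
      if (part.id) call.id = part.id
      if (part.function?.name) call.function.name += part.function.name
      if (part.function?.arguments) call.function.arguments += part.function.arguments
    }
  }
  return toolCalls.length > 0
    ? { role: 'assistant', content, tool_calls: toolCalls }
    : { role: 'assistant', content }
}

const streamed = accumulateSse(
  [
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":" world"}}]}',
    'data: [DONE]',
  ].join('\n'),
)
// streamed.content is 'Hello world'
```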

Custom Handler (In-Process)

Use type: 'custom' to connect any agent that doesn't have an OpenAI-compatible HTTP endpoint:

import { defineConfig, type ChatMessage } from 'agentest'
import { myAgent } from './src/agent.js'

export default defineConfig({
  agent: {
    type: 'custom',
    name: 'my-agent',
    handler: async (messages: ChatMessage[]) => {
      // Call your agent however you want
      const result = await myAgent.chat(messages)

      return {
        role: 'assistant' as const,
        content: result.text,

        // Optional: return tool calls only when your agent requests them
        // (hard-coded here for illustration)
        tool_calls: [
          {
            id: 'call_123',
            type: 'function',
            function: {
              name: 'get_weather',
              arguments: '{"location":"Seattle"}',
            },
          },
        ],
      }
    },
  },
})

Handler Signature

type AgentHandler = (messages: ChatMessage[]) => Promise<ChatMessage>

interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
  tool_calls?: ToolCall[]  // if role is 'assistant'
  tool_call_id?: string    // if role is 'tool'
}

The handler:

Receives the full message history (same format used internally)
Must return an assistant message
If the response includes tool_calls, Agentest runs them through mocks and calls your handler again with the tool results
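
The loop implied by the last point can be sketched roughly as follows. Everything here is illustrative: runAgentTurn, ToolMock, demoHandler, and the maxHandlerCalls guard are hypothetical and are not part of Agentest's API. The ToolCall shape is inferred from the handler example above.

```typescript
// Hypothetical sketch of the handler/tool loop; not Agentest's actual runner.
interface ToolCall {
  id: string
  type: 'function'
  function: { name: string; arguments: string }
}

interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
  tool_calls?: ToolCall[]
  tool_call_id?: string
}

type AgentHandler = (messages: ChatMessage[]) => Promise<ChatMessage>
type ToolMock = (args: Record<string, unknown>) => unknown

async function runAgentTurn(
  handler: AgentHandler,
  history: ChatMessage[],
  mocks: Record<string, ToolMock>,
  maxHandlerCalls = 5, // assumed safety limit so a misbehaving agent cannot loop forever
): Promise<ChatMessage[]> {
  const messages = [...history]
  for (let call = 0; call < maxHandlerCalls; call++) {
    const reply = await handler(messages)
    messages.push(reply)
    // A reply without tool calls ends the turn.
    if (!reply.tool_calls || reply.tool_calls.length === 0) return messages
    for (const toolCall of reply.tool_calls) {
      const mock = mocks[toolCall.function.name]
      if (!mock) throw new Error(`No mock for tool ${toolCall.function.name}`)
      const result = mock(JSON.parse(toolCall.function.arguments))
      // Feed the mocked result back as a tool message.
      messages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(result ?? null),
      })
    }
  }
  throw new Error('Tool-call loop did not finish')
}

// Demo handler: asks for the weather once, then answers from the tool result.
const demoHandler: AgentHandler = async (messages) => {
  const last = messages[messages.length - 1]
  if (last && last.role === 'tool') {
    return { role: 'assistant', content: `Weather report: ${last.content}` }
  }
  return {
    role: 'assistant',
    content: '',
    tool_calls: [
      { id: 'call_123', type: 'function', function: { name: 'get_weather', arguments: '{"location":"Seattle"}' } },
    ],
  }
}
```

Running runAgentTurn with demoHandler, a single user message, and a get_weather mock yields a four-message transcript: user, assistant (with tool_calls), tool, assistant.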

When to Use Custom Handlers

Non-OpenAI APIs — Anthropic, Google, custom protocols
In-process agents — No HTTP server needed
Framework SDKs — LangChain, Vercel AI SDK, Mastra, etc.
Custom request/response mapping — Your agent uses a different format

See Framework Integration for examples with LangChain, Vercel AI SDK, and more.

LLM Provider

The LLM provider is used for the simulated user and evaluation judges — not for your agent.

Cloud Providers

// Anthropic (default)
provider: 'anthropic',
model: 'claude-sonnet-4-20250514',

// OpenAI
provider: 'openai',
model: 'gpt-4o',

// Google
provider: 'google',
model: 'gemini-2.0-flash',

Local Providers

// Ollama (defaults to http://localhost:11434/v1)
provider: 'ollama',
model: 'llama3.2',

// OpenAI-compatible (LM Studio, vLLM, llama.cpp)
provider: 'openai-compatible',
model: 'my-model',
providerOptions: {
  baseURL: 'http://localhost:1234/v1',  // required
  apiKey: 'optional-key',                // optional, defaults to 'not-needed'
},

Environment Variables for API Keys

Cloud providers read API keys from standard environment variables:

export ANTHROPIC_API_KEY=your-key-here
export OPENAI_API_KEY=your-key-here
export GOOGLE_GENERATIVE_AI_API_KEY=your-key-here

You don't need to configure these in Agentest — they're read automatically.

Provider Options

Option	Type	Default	Description
`provider`	`string`	`'anthropic'`	`'anthropic'`, `'openai'`, `'google'`, `'ollama'`, `'openai-compatible'`
`model`	`string`	`'claude-sonnet-4-20250514'`	Model ID passed to the provider
`providerOptions.baseURL`	`string`	—	Base URL override. Required for `openai-compatible`, optional for `ollama`
`providerOptions.apiKey`	`string`	—	API key override. For local LLMs this is typically not needed

See Local LLMs for more details on running with local models.
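
The defaults in the table above can be summarized in a small resolution function. This is a sketch under stated assumptions: resolveProvider and the ResolvedProvider shape are hypothetical, and only the default values and the baseURL requirement come from this page.

```typescript
// Hypothetical sketch of provider defaulting; not Agentest's actual code.
type Provider = 'anthropic' | 'openai' | 'google' | 'ollama' | 'openai-compatible'

interface ProviderOptions {
  baseURL?: string
  apiKey?: string
}

interface ResolvedProvider {
  provider: Provider
  model: string
  baseURL?: string
  apiKey?: string
}

function resolveProvider(
  provider: Provider = 'anthropic',
  model: string = 'claude-sonnet-4-20250514',
  options: ProviderOptions = {},
): ResolvedProvider {
  switch (provider) {
    case 'ollama':
      // Ollama falls back to its local OpenAI-compatible endpoint.
      return { provider, model, baseURL: options.baseURL ?? 'http://localhost:11434/v1', apiKey: options.apiKey }
    case 'openai-compatible':
      if (!options.baseURL) {
        throw new Error('providerOptions.baseURL is required for openai-compatible')
      }
      return { provider, model, baseURL: options.baseURL, apiKey: options.apiKey ?? 'not-needed' }
    default:
      // Cloud providers read their keys from the environment unless overridden.
      return { provider, model, baseURL: options.baseURL, apiKey: options.apiKey }
  }
}

const localOllama = resolveProvider('ollama', 'llama3.2')
const lmStudio = resolveProvider('openai-compatible', 'my-model', { baseURL: 'http://localhost:1234/v1' })
```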

Simulation Parameters

Control how many conversations to run and how they're executed:

conversationsPerScenario: 3,  // run 3 independent conversations per scenario
maxTurns: 8,                  // max user↔agent exchanges per conversation
concurrency: 20,              // max parallel LLM calls across all scenarios

Option	Type	Default	Description
`conversationsPerScenario`	`number`	`3`	How many independent conversations to run per scenario. More conversations = more statistical confidence, but more LLM cost.
`maxTurns`	`number`	`8`	Upper bound on user↔agent exchanges. The simulated user can stop earlier if the goal is met.
`concurrency`	`number`	`20`	Max parallel LLM calls. Controls both simulation and evaluation parallelism. Lower this if you're hitting rate limits.

Both conversationsPerScenario and maxTurns can be overridden per-scenario:

scenario('complex workflow', {
  conversationsPerScenario: 10,  // needs more confidence
  maxTurns: 15,                  // needs more turns
  // ...
})
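
One way to picture the override order is scenario value first, then config value, then built-in default. The sketch below assumes that precedence; resolveSimulationSettings and maxExchanges are hypothetical helpers, and only the option names and defaults come from the table above.

```typescript
// Hypothetical sketch of per-scenario overrides; not Agentest's actual code.
interface SimulationSettings {
  conversationsPerScenario: number
  maxTurns: number
}

const SIMULATION_DEFAULTS: SimulationSettings = { conversationsPerScenario: 3, maxTurns: 8 }

function resolveSimulationSettings(
  config: Partial<SimulationSettings>,
  scenario: Partial<SimulationSettings>,
): SimulationSettings {
  return {
    conversationsPerScenario:
      scenario.conversationsPerScenario ??
      config.conversationsPerScenario ??
      SIMULATION_DEFAULTS.conversationsPerScenario,
    maxTurns: scenario.maxTurns ?? config.maxTurns ?? SIMULATION_DEFAULTS.maxTurns,
  }
}

// Upper bound on user-agent exchanges for one scenario; the simulated user may stop earlier.
function maxExchanges(settings: SimulationSettings): number {
  return settings.conversationsPerScenario * settings.maxTurns
}

const complexWorkflow = resolveSimulationSettings(
  { conversationsPerScenario: 5 },
  { conversationsPerScenario: 10, maxTurns: 15 },
)
const plainScenario = resolveSimulationSettings({ conversationsPerScenario: 5 }, {})
```

With these inputs, complexWorkflow allows at most 10 × 15 = 150 exchanges, while plainScenario falls back to 5 conversations of up to 8 turns, or 40 exchanges. This is a quick way to estimate the cost impact of an override.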

Evaluation

Configure which metrics to run and their pass/fail thresholds:

// Run only specific metrics (default: all)
metrics: [
  'helpfulness',
  'coherence',
  'relevance',
  'faithfulness',
  'goal_completion',
  'agent_behavior_failure',
],

// Set minimum score thresholds (fail the run if average is below)
thresholds: {
  helpfulness: 3.5,      // average must be >= 3.5
  goal_completion: 0.7,  // 70%+ of conversations must complete goal
  coherence: 4.0,
},

// Minimum error severity that fails a scenario
failOnErrorSeverity: 'critical',  // 'low' | 'medium' | 'high' | 'critical'

// Custom metrics (advanced)
customMetrics: [new ToneMetric(), new BrandVoiceMetric()],

Available Metrics

Quantitative (1-5 scale):

helpfulness — How effectively the agent addresses user needs
coherence — Logical flow and consistency
relevance — How on-topic responses are
faithfulness — No contradictions with knowledge or tool results
verbosity — Appropriate response length

Goal Completion (0/1):

goal_completion — Was the goal fully achieved?

Qualitative (failure detection):

agent_behavior_failure — Detects repetition, false info, unsafe actions
tool_call_behavior_failure — Wrong tools, missing calls, ignored results

Evaluation Options

Option	Type	Default	Description
`metrics`	`string[]`	all 8 metrics	Which evaluation metrics to run. Omit to run all.
`thresholds`	`Record<string, number>`	`{}`	Minimum average scores per metric. Values are 0-5 for quantitative, 0-1 for goal_completion. If any metric's average falls below its threshold, the run fails.
`failOnErrorSeverity`	`string`	`'critical'`	Minimum error severity that fails a scenario. One of `'low'`, `'medium'`, `'high'`, `'critical'`.
`customMetrics`	`Metric[]`	`[]`	Custom metric instances (see Custom Metrics).

See Evaluation Metrics for detailed explanations of each metric.
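
The pass/fail rules above reduce to two comparisons: a per-metric average against its threshold, and an error severity against failOnErrorSeverity. The sketch below illustrates both. failedThresholds and severityFails are hypothetical names, and skipping metrics that produced no scores is an assumption.

```typescript
// Hypothetical sketch of the pass/fail checks; not Agentest's actual evaluator.
type Severity = 'low' | 'medium' | 'high' | 'critical'

const SEVERITY_ORDER: Severity[] = ['low', 'medium', 'high', 'critical']

function average(values: number[]): number {
  return values.reduce((sum, value) => sum + value, 0) / values.length
}

// Returns the metrics whose average score falls below the configured minimum.
function failedThresholds(
  scores: Record<string, number[]>,
  thresholds: Record<string, number>,
): string[] {
  const failures: string[] = []
  for (const [metric, minimum] of Object.entries(thresholds)) {
    const values = scores[metric]
    if (!values || values.length === 0) continue // assumption: metrics without scores are skipped
    if (average(values) < minimum) failures.push(metric)
  }
  return failures
}

// An error fails the scenario when it is at least as severe as the configured level.
function severityFails(observed: Severity, failOn: Severity): boolean {
  return SEVERITY_ORDER.indexOf(observed) >= SEVERITY_ORDER.indexOf(failOn)
}

const failures = failedThresholds(
  {
    helpfulness: [4, 3, 3], // average below 3.5
    coherence: [4, 4, 5], // average above 4.0
    goal_completion: [1, 1, 0, 1], // 75% of conversations completed the goal
  },
  { helpfulness: 3.5, coherence: 4.0, goal_completion: 0.7 },
)
```

In this example only helpfulness misses its threshold. With failOnErrorSeverity set to 'critical', a 'high' error would not fail the scenario, but it would with the setting at 'high'.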

File Discovery

Configure which files to search for scenarios:

include: ['**/*.sim.ts'],      // glob patterns for scenario files
exclude: ['node_modules/**'],  // glob patterns to ignore

Option	Type	Default	Description
`include`	`string[]`	`['**/*.sim.ts']`	Glob patterns to find scenario files. Relative to the working directory.
`exclude`	`string[]`	`['node_modules/**']`	Glob patterns to exclude from discovery.

Scenario files are TypeScript files that call scenario() when imported. Agentest uses jiti to import them directly — no build step needed.
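
To illustrate how include and exclude interact, here is a deliberately small glob matcher that supports only * and **. It is a teaching sketch with hypothetical names (globToRegExp, isScenarioFile). Agentest's actual discovery presumably relies on a full glob library with more features.

```typescript
// Hypothetical, minimal glob matching; supports only `*` and `**`.
function globToRegExp(glob: string): RegExp {
  let source = ''
  let i = 0
  while (i < glob.length) {
    if (glob.startsWith('**/', i)) {
      source += '(?:.*/)?' // zero or more leading directories
      i += 3
    } else if (glob.startsWith('**', i)) {
      source += '.*' // anything, including slashes
      i += 2
    } else if (glob.charAt(i) === '*') {
      source += '[^/]*' // anything within a single path segment
      i += 1
    } else {
      // Escape characters that are special in regular expressions.
      source += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, '\\$&')
      i += 1
    }
  }
  return new RegExp('^' + source + '$')
}

function isScenarioFile(path: string, include: string[], exclude: string[]): boolean {
  const matches = (pattern: string) => globToRegExp(pattern).test(path)
  return include.some(matches) && !exclude.some(matches)
}

const include = ['**/*.sim.ts']
const exclude = ['node_modules/**']

const discovered = [
  'tests/booking.sim.ts',
  'booking.sim.ts',
  'src/agent.ts',
  'node_modules/pkg/demo.sim.ts',
].filter((path) => isScenarioFile(path, include, exclude))
```

Of the four sample paths, only tests/booking.sim.ts and booking.sim.ts survive: src/agent.ts does not match the include pattern, and the node_modules file is excluded.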

Reporters

Configure output formats:

reporters: ['console', 'json', 'github-actions'],

Reporter	Description
`console`	Colored pass/fail output with live progress spinner, metric scores, and error summaries. Default. In non-TTY environments (CI), falls back to line-by-line progress output.
`json`	Writes full results to `.agentest/results.json` including all turns, evaluations, and errors. Useful for programmatic analysis and custom reporting.
`github-actions`	Writes a markdown summary table to `$GITHUB_STEP_SUMMARY` and emits `::error`/`::warning`/`::notice` annotations that surface inline on PRs.

See the Reporters Guide for detailed documentation.

Mock Behavior

Control what happens when the agent calls a tool that has no mock defined:

unmockedTools: 'error',       // default: throw AgentestError
unmockedTools: 'passthrough', // return undefined (no-op)

Setting	Behavior
`'error'` (default)	Throws `AgentestError` with a helpful message suggesting how to add the mock. The conversation is recorded as an error.
`'passthrough'`	Returns `undefined` as the tool result. The agent sees `null` in the response.

'error' mode is recommended — it catches unexpected tool usage early. Switch to 'passthrough' if your agent uses many tools and you only want to mock a subset.
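
The two modes can be sketched as a single lookup function. The names resolveToolResult and ToolMock are hypothetical, the AgentestError class below is a local stand-in for the library's error type, and serializing a missing result as JSON null is an assumption consistent with the table above.

```typescript
// Hypothetical sketch of unmocked-tool handling; not Agentest's actual code.
type UnmockedToolsMode = 'error' | 'passthrough'
type ToolMock = (args: Record<string, unknown>) => unknown

// Local stand-in for the library's error class.
class AgentestError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'AgentestError'
  }
}

function resolveToolResult(
  toolName: string,
  args: Record<string, unknown>,
  mocks: Record<string, ToolMock>,
  mode: UnmockedToolsMode,
): string {
  const mock = mocks[toolName]
  if (mock) {
    // undefined is not valid JSON, so it is sent to the agent as null.
    return JSON.stringify(mock(args) ?? null)
  }
  if (mode === 'error') {
    throw new AgentestError(`No mock defined for tool "${toolName}". Add one to the scenario's mocks.`)
  }
  // passthrough: no-op result, which the agent sees as null
  return JSON.stringify(null)
}

const mocked = resolveToolResult('get_weather', { location: 'Seattle' }, { get_weather: () => ({ tempF: 55 }) }, 'error')
const passedThrough = resolveToolResult('send_email', {}, {}, 'passthrough')
```

Here mocked is the JSON string for the mock's return value and passedThrough is the string 'null'. Calling the same function for send_email in 'error' mode throws instead.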

See Mocks Guide for more details.

Comparison Mode

Run the same scenarios against multiple models or agent configurations side-by-side:

export default defineConfig({
  agent: {
    name: 'gpt-4o',
    endpoint: 'http://localhost:3000/api/chat',
    body: { model: 'gpt-4o' },
  },

  // Each entry inherits endpoint, headers, body from agent above
  // Only specify what differs
  compare: [
    { name: 'gpt-4o-mini', body: { model: 'gpt-4o-mini' } },
    { name: 'claude-sonnet', body: { model: 'claude-sonnet-4-20250514' } },
  ],
})

See Comparison Mode for examples and output.
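
The inheritance rule can be pictured as an object spread over the base agent. The sketch below assumes that a field specified on a compare entry replaces the base field wholesale. Whether Agentest instead merges nested fields such as body is not stated here, so treat that as an open assumption. expandComparison is a hypothetical name.

```typescript
// Hypothetical sketch of compare-entry inheritance; not Agentest's actual code.
interface AgentConfig {
  name: string
  endpoint: string
  headers?: Record<string, string>
  body?: Record<string, unknown>
}

type CompareEntry = Partial<AgentConfig> & { name: string }

function expandComparison(agent: AgentConfig, compare: CompareEntry[]): AgentConfig[] {
  // Each variant starts from the base agent and overrides only what the entry specifies.
  const variants = compare.map((entry) => ({ ...agent, ...entry }))
  return [agent, ...variants]
}

const agents = expandComparison(
  {
    name: 'gpt-4o',
    endpoint: 'http://localhost:3000/api/chat',
    body: { model: 'gpt-4o' },
  },
  [
    { name: 'gpt-4o-mini', body: { model: 'gpt-4o-mini' } },
    { name: 'claude-sonnet', body: { model: 'claude-sonnet-4-20250514' } },
  ],
)
```

The result contains three agent configurations that share one endpoint and differ only in name and body.model.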

Full Config Example

import { defineConfig } from 'agentest'
import { ToneMetric } from './metrics/tone.js'

export default defineConfig({
  // Agent configuration
  agent: {
    name: 'support-bot',
    endpoint: 'http://localhost:3000/api/chat',
    streaming: false,
    headers: {
      Authorization: 'Bearer ${AGENT_API_KEY}',
      'X-Org-ID': '${ORG_ID}',
    },
    body: {
      model: 'gpt-4o',
      temperature: 0.7,
      messages: [
        { role: 'system', content: 'You are a customer support agent for Acme Inc.' },
      ],
    },
  },

  // LLM provider for simulation and evaluation
  provider: 'anthropic',
  model: 'claude-sonnet-4-20250514',

  // Simulation settings
  conversationsPerScenario: 5,
  maxTurns: 10,
  concurrency: 10,

  // Evaluation
  metrics: [
    'helpfulness',
    'coherence',
    'relevance',
    'faithfulness',
    'goal_completion',
    'agent_behavior_failure',
    'tool_call_behavior_failure',
  ],

  thresholds: {
    helpfulness: 3.5,
    coherence: 4.0,
    goal_completion: 0.8,
  },

  failOnErrorSeverity: 'high',

  customMetrics: [new ToneMetric()],

  // File discovery
  include: ['tests/**/*.sim.ts'],
  exclude: ['node_modules/**', 'dist/**'],

  // Output
  reporters: ['console', 'json', 'github-actions'],

  // Mock behavior
  unmockedTools: 'error',
})

Next Steps

Scenarios - Write test scenarios
Mocks - Control tool behavior
Evaluation Metrics - Understand quality metrics
Configuration API - Complete API reference
YAML Config - Use YAML instead of TypeScript
