Configuration
Complete guide to configuring Agentest.
Configuration File
Create agentest.config.ts (or agentest.config.yaml) in your project root. Agentest auto-detects the format.
import { defineConfig } from 'agentest'
export default defineConfig({
// Configuration options
})
The defineConfig helper provides:
- TypeScript type safety and autocomplete
- Zod schema validation
- Environment variable interpolation (${VAR} syntax)
For YAML configuration, see the YAML Config Guide.
Agent Configuration
Agentest supports two agent types: chat completions (HTTP endpoint) and custom (in-process handler function).
Chat Completions (HTTP Endpoint)
The default agent type. Your agent must expose an OpenAI-compatible chat completions endpoint.
agent: {
name: 'my-agent',
type: 'chat_completions', // default, can be omitted
endpoint: 'http://localhost:3000/api/chat',
// Optional: headers for authentication
headers: {
Authorization: 'Bearer ${AGENT_API_KEY}', // interpolated from process.env
'X-Custom-Header': 'value',
},
// Optional: merged into every request
body: {
model: 'gpt-4o',
temperature: 0.7,
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
],
},
}
Required Fields
- name (string): Identifies your agent in test output
- endpoint (string): HTTP/HTTPS URL for your agent's chat endpoint
Optional Fields
- type ('chat_completions'): Default agent type (can be omitted)
- headers (Record<string, string>): HTTP headers sent with every request
- body (Record<string, any>): Shallow-merged into every request body
- streaming (boolean): Enable Server-Sent Events (SSE) streaming mode
Environment Variable Interpolation
Headers support ${VAR} syntax for environment variables:
headers: {
Authorization: 'Bearer ${AGENT_API_KEY}',
'X-Org-ID': '${ORG_ID}',
}
Important: If a referenced variable is not set in process.env, config loading fails with a clear error. This keeps secrets out of your config files and prevents accidental runs with missing credentials.
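The fail-fast behavior described above can be sketched as follows. This is a minimal illustration, not Agentest's actual implementation: the `interpolate` and `interpolateHeaders` names are hypothetical, and the environment is passed in as a plain object so the example stays self-contained.

```typescript
// Hypothetical helpers for illustration only; Agentest's internals may differ.
type Env = Record<string, string | undefined>

// Replace each ${NAME} placeholder with env[NAME]; throw when NAME is unset.
function interpolate(value: string, env: Env): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const resolved = env[name]
    if (resolved === undefined) {
      throw new Error(`Environment variable ${name} is not set`)
    }
    return resolved
  })
}

// Apply interpolation to every header value.
function interpolateHeaders(headers: Record<string, string>, env: Env): Record<string, string> {
  const out: Record<string, string> = {}
  for (const [key, value] of Object.entries(headers)) {
    out[key] = interpolate(value, env)
  }
  return out
}
```

In a real run the second argument would be `process.env`. Throwing on the first missing variable means a run cannot start with a literal `${AGENT_API_KEY}` string sent as a credential.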
Set environment variables in your shell or CI:
export AGENT_API_KEY=your-key-here
export ORG_ID=org-123
npx agentest run
Request Body Merging
The body field is shallow-merged into every request:
agent: {
endpoint: 'http://localhost:3000/api/chat',
body: {
model: 'gpt-4o',
temperature: 0.7,
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
],
},
}
Agentest sends:
{
"model": "gpt-4o",
"temperature": 0.7,
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "User's first message" },
...
]
}
The messages array in body is prepended to conversation messages, which is useful for system prompts.
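One way to picture the merge is the sketch below. It assumes the behavior stated above (top-level keys copied as-is, configured messages placed before the conversation); `buildRequestBody` is a hypothetical name, not part of the Agentest API.

```typescript
interface Message {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
}

// Hypothetical sketch: copy top-level keys from the configured body and
// prepend its messages (if any) to the live conversation.
function buildRequestBody(
  configBody: Record<string, unknown>,
  conversation: Message[],
): Record<string, unknown> {
  const { messages: configured, ...rest } = configBody
  const prefix = Array.isArray(configured) ? (configured as Message[]) : []
  return { ...rest, messages: [...prefix, ...conversation] }
}
```

Because the merge is shallow, nested objects in body are copied by reference rather than merged key by key; only the messages array gets the special prepend treatment.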
Streaming Support
Enable Server-Sent Events (SSE) streaming mode when your agent endpoint streams responses by default:
agent: {
name: 'streaming-agent',
endpoint: 'https://api.openai.com/v1/chat/completions',
streaming: true,
headers: {
Authorization: 'Bearer ${OPENAI_API_KEY}',
},
body: {
model: 'gpt-4o',
},
}
When streaming: true:
- Agentest sends stream: true in the request body
- Parses the chunked SSE response
- Accumulates delta.content and delta.tool_calls into a complete message
- Returns the full message to the simulation loop
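The accumulation step can be sketched roughly as below. This is an illustration under assumptions, not Agentest's parser: it takes the whole SSE body as one string (a real client must also handle lines split across network chunks), assumes OpenAI-style `data:` lines ending with `[DONE]`, and `accumulateSse` is a hypothetical name.

```typescript
interface ToolCallDelta {
  index: number
  id?: string
  function?: { name?: string; arguments?: string }
}
interface AccumulatedToolCall {
  id: string
  type: 'function'
  function: { name: string; arguments: string }
}
interface AssistantMessage {
  role: 'assistant'
  content: string
  tool_calls?: AccumulatedToolCall[]
}

// Hypothetical sketch: fold OpenAI-style streaming deltas into one message.
function accumulateSse(raw: string): AssistantMessage {
  let content = ''
  const toolCalls: AccumulatedToolCall[] = []
  for (const line of raw.split('\n')) {
    const trimmed = line.trim()
    if (!trimmed.startsWith('data:')) continue // skip blank lines and comments
    const payload = trimmed.slice('data:'.length).trim()
    if (payload === '[DONE]') break // end-of-stream sentinel
    const chunk = JSON.parse(payload)
    const delta = chunk.choices?.[0]?.delta ?? {}
    if (typeof delta.content === 'string') content += delta.content
    for (const part of (delta.tool_calls ?? []) as ToolCallDelta[]) {
      // Tool call fragments are keyed by index; arguments arrive as string pieces.
      let call = toolCalls[part.index]
      if (!call) {
        call = { id: '', type: 'function', function: { name: '', arguments: '' } }
        toolCalls[part.index] = call
      }
      if (part.id) call.id = part.id
      if (part.function?.name) call.function.name += part.function.name
      if (part.function?.arguments) call.function.arguments += part.function.arguments
    }
  }
  return toolCalls.length > 0
    ? { role: 'assistant', content, tool_calls: toolCalls }
    : { role: 'assistant', content }
}
```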
This works with any endpoint that follows OpenAI's streaming format:
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" world"}}]}
data: [DONE]
Custom Handler (In-Process)
Use type: 'custom' to connect any agent that doesn't have an OpenAI-compatible HTTP endpoint:
import { defineConfig, type ChatMessage } from 'agentest'
import { myAgent } from './src/agent.js'
export default defineConfig({
agent: {
type: 'custom',
name: 'my-agent',
handler: async (messages: ChatMessage[]) => {
// Call your agent however you want
const result = await myAgent.chat(messages)
return {
role: 'assistant' as const,
content: result.text,
// Optional: return tool calls if your agent uses tools
tool_calls: [
{
id: 'call_123',
type: 'function',
function: {
name: 'get_weather',
arguments: '{"location":"Seattle"}',
},
},
],
}
},
},
})
Handler Signature
type AgentHandler = (messages: ChatMessage[]) => Promise<ChatMessage>
interface ChatMessage {
role: 'system' | 'user' | 'assistant' | 'tool'
content: string
tool_calls?: ToolCall[] // if role is 'assistant'
tool_call_id?: string // if role is 'tool'
}
The handler:
- Receives the full message history (same format used internally)
- Must return an assistant message
- If the response includes tool_calls, Agentest runs them through mocks and calls your handler again with the tool results
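The call-again behavior in the last bullet can be pictured with the loop below. It is a sketch under assumptions rather than Agentest's real loop: `runAgentTurn`, the `Mocks` registry shape, and the `maxRounds` safety cap are all hypothetical, and the types are redeclared locally so the example is self-contained.

```typescript
interface ToolCall {
  id: string
  type: 'function'
  function: { name: string; arguments: string }
}
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
  tool_calls?: ToolCall[]
  tool_call_id?: string
}
type AgentHandler = (messages: ChatMessage[]) => Promise<ChatMessage>
// Hypothetical mock registry: tool name -> function producing a result.
type Mocks = Record<string, (args: unknown) => unknown>

async function runAgentTurn(
  handler: AgentHandler,
  history: ChatMessage[],
  mocks: Mocks,
  maxRounds = 5, // assumed safety cap, not a documented Agentest option
): Promise<ChatMessage[]> {
  const messages = [...history]
  for (let round = 0; round < maxRounds; round++) {
    const reply = await handler(messages)
    messages.push(reply)
    // A reply without tool calls ends the turn.
    if (!reply.tool_calls || reply.tool_calls.length === 0) break
    for (const call of reply.tool_calls) {
      const mock = mocks[call.function.name]
      if (!mock) throw new Error(`No mock defined for tool ${call.function.name}`)
      const result = mock(JSON.parse(call.function.arguments))
      // Feed each result back as a tool message tied to the originating call.
      messages.push({ role: 'tool', tool_call_id: call.id, content: JSON.stringify(result ?? null) })
    }
  }
  return messages
}
```

The practical consequence for handler authors: the handler should inspect the incoming history for tool messages and respond to them, rather than returning the same tool_calls on every invocation.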
When to Use Custom Handlers
- Non-OpenAI APIs — Anthropic, Google, custom protocols
- In-process agents — No HTTP server needed
- Framework SDKs — LangChain, Vercel AI SDK, Mastra, etc.
- Custom request/response mapping — Your agent uses a different format
See Framework Integration for examples with LangChain, Vercel AI SDK, and more.
LLM Provider
The LLM provider is used for the simulated user and evaluation judges — not for your agent.
Cloud Providers
// Anthropic (default)
provider: 'anthropic',
model: 'claude-sonnet-4-20250514',
// OpenAI
provider: 'openai',
model: 'gpt-4o',
// Google
provider: 'google',
model: 'gemini-2.0-flash',
Local Providers
// Ollama (defaults to http://localhost:11434/v1)
provider: 'ollama',
model: 'llama3.2',
// OpenAI-compatible (LM Studio, vLLM, llama.cpp)
provider: 'openai-compatible',
model: 'my-model',
providerOptions: {
baseURL: 'http://localhost:1234/v1', // required
apiKey: 'optional-key', // optional, defaults to 'not-needed'
},
Environment Variables for API Keys
Cloud providers read API keys from standard environment variables:
export ANTHROPIC_API_KEY=your-key-here
export OPENAI_API_KEY=your-key-here
export GOOGLE_GENERATIVE_AI_API_KEY=your-key-here
You don't need to configure these in Agentest; they're read automatically.
Provider Options
| Option | Type | Default | Description |
|---|---|---|---|
| provider | string | 'anthropic' | 'anthropic', 'openai', 'google', 'ollama', 'openai-compatible' |
| model | string | 'claude-sonnet-4-20250514' | Model ID passed to the provider |
| providerOptions.baseURL | string | — | Base URL override. Required for openai-compatible, optional for ollama |
| providerOptions.apiKey | string | — | API key override. For local LLMs this is typically not needed |
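The defaulting rules for local providers can be summarized in a short sketch. The default values come from the examples above; the `resolveProviderOptions` function itself is hypothetical and only illustrates the rules, not how Agentest resolves them internally.

```typescript
type Provider = 'anthropic' | 'openai' | 'google' | 'ollama' | 'openai-compatible'
interface ProviderOptions {
  baseURL?: string
  apiKey?: string
}

// Hypothetical sketch of the documented defaults for local providers.
function resolveProviderOptions(provider: Provider, options: ProviderOptions = {}): ProviderOptions {
  switch (provider) {
    case 'ollama':
      // baseURL is optional for Ollama and falls back to the local default.
      return { ...options, baseURL: options.baseURL ?? 'http://localhost:11434/v1' }
    case 'openai-compatible':
      // baseURL is required; apiKey falls back to a placeholder.
      if (!options.baseURL) {
        throw new Error("providerOptions.baseURL is required for 'openai-compatible'")
      }
      return { ...options, apiKey: options.apiKey ?? 'not-needed' }
    default:
      // Cloud providers read their keys from environment variables.
      return options
  }
}
```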
See Local LLMs for more details on running with local models.
Simulation Parameters
Control how many conversations to run and how they're executed:
conversationsPerScenario: 3, // run 3 independent conversations per scenario
maxTurns: 8, // max user↔agent exchanges per conversation
concurrency: 20, // max parallel LLM calls across all scenarios
| Option | Type | Default | Description |
|---|---|---|---|
| conversationsPerScenario | number | 3 | How many independent conversations to run per scenario. More conversations = more statistical confidence, but more LLM cost. |
| maxTurns | number | 8 | Upper bound on user↔agent exchanges. The simulated user can stop earlier if the goal is met. |
| concurrency | number | 20 | Max parallel LLM calls. Controls both simulation and evaluation parallelism. Lower this if you're hitting rate limits. |
Both conversationsPerScenario and maxTurns can be overridden per-scenario:
scenario('complex workflow', {
conversationsPerScenario: 10, // needs more confidence
maxTurns: 15, // needs more turns
// ...
})
Evaluation
Configure which metrics to run and their pass/fail thresholds:
// Run only specific metrics (default: all)
metrics: [
'helpfulness',
'coherence',
'relevance',
'faithfulness',
'goal_completion',
'agent_behavior_failure',
],
// Set minimum score thresholds (fail the run if average is below)
thresholds: {
helpfulness: 3.5, // average must be >= 3.5
goal_completion: 0.7, // 70%+ of conversations must complete goal
coherence: 4.0,
},
// Minimum error severity that fails a scenario
failOnErrorSeverity: 'critical', // 'low' | 'medium' | 'high' | 'critical'
// Custom metrics (advanced)
customMetrics: [new ToneMetric(), new BrandVoiceMetric()],
Available Metrics
Quantitative (1-5 scale):
- helpfulness: How effectively the agent addresses user needs
- coherence: Logical flow and consistency
- relevance: How on-topic responses are
- faithfulness: No contradictions with knowledge or tool results
- verbosity: Appropriate response length
Goal Completion (0/1):
- goal_completion: Was the goal fully achieved?
Qualitative (failure detection):
- agent_behavior_failure: Detects repetition, false info, unsafe actions
- tool_call_behavior_failure: Wrong tools, missing calls, ignored results
Evaluation Options
| Option | Type | Default | Description |
|---|---|---|---|
| metrics | string[] | all 8 metrics | Which evaluation metrics to run. Omit to run all. |
| thresholds | Record<string, number> | {} | Minimum average scores per metric. Values are 0-5 for quantitative, 0-1 for goal_completion. If any metric's average falls below its threshold, the run fails. |
| failOnErrorSeverity | string | 'critical' | Minimum error severity that fails a scenario. One of 'low', 'medium', 'high', 'critical'. |
| customMetrics | Metric[] | [] | Custom metric instances (see Custom Metrics). |
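The pass/fail rules in the table can be sketched as two small checks. This is an illustrative reading of the documented behavior, not Agentest's evaluator: `failedThresholds` and `severityFails` are hypothetical names, and skipping metrics that have no scores is an assumption.

```typescript
type Severity = 'low' | 'medium' | 'high' | 'critical'
const SEVERITY_ORDER: Severity[] = ['low', 'medium', 'high', 'critical']

function average(scores: number[]): number {
  return scores.reduce((sum, score) => sum + score, 0) / scores.length
}

// Return the metrics whose average score falls below the configured minimum.
function failedThresholds(
  scores: Record<string, number[]>,
  thresholds: Record<string, number>,
): string[] {
  return Object.entries(thresholds)
    .filter(([metric, minimum]) => {
      const values = scores[metric]
      if (!values || values.length === 0) return false // assumption: no scores, no failure
      return average(values) < minimum
    })
    .map(([metric]) => metric)
}

// An error fails the scenario when it is at least as severe as failOnErrorSeverity.
function severityFails(found: Severity, failOn: Severity): boolean {
  return SEVERITY_ORDER.indexOf(found) >= SEVERITY_ORDER.indexOf(failOn)
}
```

Since goal_completion is scored 0 or 1 per conversation, its average is the fraction of conversations that reached the goal, which is why a threshold of 0.7 reads as "at least 70% of conversations".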
See Evaluation Metrics for detailed explanations of each metric.
File Discovery
Configure which files to search for scenarios:
include: ['**/*.sim.ts'], // glob patterns for scenario files
exclude: ['node_modules/**'], // glob patterns to ignore
| Option | Type | Default | Description |
|---|---|---|---|
| include | string[] | ['**/*.sim.ts'] | Glob patterns to find scenario files. Relative to the working directory. |
| exclude | string[] | ['node_modules/**'] | Glob patterns to exclude from discovery. |
Scenario files are TypeScript files that call scenario() when imported. Agentest uses jiti to import them directly — no build step needed.
Reporters
Configure output formats:
reporters: ['console', 'json', 'github-actions'],
| Reporter | Description |
|---|---|
| console | Colored pass/fail output with live progress spinner, metric scores, and error summaries. Default. In non-TTY environments (CI), falls back to line-by-line progress output. |
| json | Writes full results to .agentest/results.json including all turns, evaluations, and errors. Useful for programmatic analysis and custom reporting. |
| github-actions | Writes a markdown summary table to $GITHUB_STEP_SUMMARY and emits ::error/::warning/::notice annotations that surface inline on PRs. |
See the Reporters Guide for detailed documentation.
Mock Behavior
Control what happens when the agent calls a tool that has no mock defined:
unmockedTools: 'error', // default: throw AgentestError
unmockedTools: 'passthrough', // return undefined (no-op)
| Setting | Behavior |
|---|---|
| 'error' (default) | Throws AgentestError with a helpful message suggesting how to add the mock. The conversation is recorded as an error. |
| 'passthrough' | Returns undefined as the tool result. The agent sees null in the response. |
'error' mode is recommended — it catches unexpected tool usage early. Switch to 'passthrough' if your agent uses many tools and you only want to mock a subset.
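The two modes can be contrasted in a short sketch. It is illustrative only: `resolveToolCall` and `toToolMessageContent` are hypothetical names, a plain Error stands in for AgentestError, and the undefined-to-null step is one plausible explanation for why the agent sees null, not a confirmed detail of the implementation.

```typescript
type UnmockedToolsMode = 'error' | 'passthrough'
type Mocks = Record<string, (args: unknown) => unknown>

// Hypothetical sketch of how a tool call might be resolved against mocks.
function resolveToolCall(
  name: string,
  args: unknown,
  mocks: Mocks,
  mode: UnmockedToolsMode,
): unknown {
  const mock = mocks[name]
  if (mock) return mock(args)
  if (mode === 'error') {
    // Stand-in for AgentestError.
    throw new Error(`No mock defined for tool "${name}". Add one to the scenario's mocks.`)
  }
  return undefined // 'passthrough': no-op result
}

// undefined has no JSON representation, so it is sent to the agent as null.
function toToolMessageContent(result: unknown): string {
  return JSON.stringify(result === undefined ? null : result)
}
```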
See Mocks Guide for more details.
Comparison Mode
Run the same scenarios against multiple models or agent configurations side-by-side:
export default defineConfig({
agent: {
name: 'gpt-4o',
endpoint: 'http://localhost:3000/api/chat',
body: { model: 'gpt-4o' },
},
// Each entry inherits endpoint, headers, body from agent above
// Only specify what differs
compare: [
{ name: 'gpt-4o-mini', body: { model: 'gpt-4o-mini' } },
{ name: 'claude-sonnet', body: { model: 'claude-sonnet-4-20250514' } },
],
})
See Comparison Mode for examples and output.
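The inheritance described in the config comments can be pictured as the sketch below. The source states only that compare entries inherit endpoint, headers, and body from the base agent, so the exact merge rules are an assumption here: this version overrides top-level fields and shallow-merges body. `expandComparison` and the local types are hypothetical.

```typescript
interface AgentConfig {
  name: string
  endpoint: string
  headers?: Record<string, string>
  body?: Record<string, unknown>
}
type CompareEntry = { name: string } & Partial<Omit<AgentConfig, 'name'>>

// Hypothetical sketch: the base agent plus one derived config per compare entry.
function expandComparison(base: AgentConfig, compare: CompareEntry[]): AgentConfig[] {
  const variants = compare.map((entry) => ({
    ...base,
    ...entry,
    // Assumption: body keys are shallow-merged over the base body.
    body: { ...base.body, ...entry.body },
  }))
  return [base, ...variants]
}
```

With the config above, this would yield three agent configurations that share one endpoint and differ only in name and body.model.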
Full Config Example
import { defineConfig } from 'agentest'
import { ToneMetric } from './metrics/tone.js'
export default defineConfig({
// Agent configuration
agent: {
name: 'support-bot',
endpoint: 'http://localhost:3000/api/chat',
streaming: false,
headers: {
Authorization: 'Bearer ${AGENT_API_KEY}',
'X-Org-ID': '${ORG_ID}',
},
body: {
model: 'gpt-4o',
temperature: 0.7,
messages: [
{ role: 'system', content: 'You are a customer support agent for Acme Inc.' },
],
},
},
// LLM provider for simulation and evaluation
provider: 'anthropic',
model: 'claude-sonnet-4-20250514',
// Simulation settings
conversationsPerScenario: 5,
maxTurns: 10,
concurrency: 10,
// Evaluation
metrics: [
'helpfulness',
'coherence',
'relevance',
'faithfulness',
'goal_completion',
'agent_behavior_failure',
'tool_call_behavior_failure',
],
thresholds: {
helpfulness: 3.5,
coherence: 4.0,
goal_completion: 0.8,
},
failOnErrorSeverity: 'high',
customMetrics: [new ToneMetric()],
// File discovery
include: ['tests/**/*.sim.ts'],
exclude: ['node_modules/**', 'dist/**'],
// Output
reporters: ['console', 'json', 'github-actions'],
// Mock behavior
unmockedTools: 'error',
})
Next Steps
- Scenarios - Write test scenarios
- Mocks - Control tool behavior
- Evaluation Metrics - Understand quality metrics
- Configuration API - Complete API reference
- YAML Config - Use YAML instead of TypeScript