# CLI Reference

Complete reference for all Agentest CLI commands and options.

## Commands

### agentest run

Run all scenarios or filter by name.

```bash
npx agentest run [options]
```

Options:
| Option | Alias | Description |
|---|---|---|
| `--config <path>` | `-c` | Path to config file (default: auto-detect) |
| `--cwd <path>` | | Working directory (default: current directory) |
| `--scenario <name>` | `-s` | Filter scenarios by name (case-insensitive substring match) |
| `--verbose` | `-v` | Print full conversation transcripts |
| `--watch` | `-w` | Watch mode: re-run on file changes |
Examples:

```bash
# Run all scenarios
npx agentest run

# Custom config file (TypeScript or YAML)
npx agentest run --config path/to/config.ts
npx agentest run --config agentest.config.yaml

# Different working directory
npx agentest run --cwd ./packages/my-agent

# Filter scenarios by name
npx agentest run --scenario "booking"
npx agentest run --scenario "cancel"

# Print full conversation transcripts
npx agentest run --verbose

# Watch mode
npx agentest run --watch
npx agentest run -w

# Combine flags
npx agentest run --watch --verbose --scenario "booking"
```

### agentest show-prompts
Inspect the LLM judge prompts used for evaluation.
```bash
npx agentest show-prompts [options]
```

Options:

| Option | Description |
|---|---|
| `--metric <name>` | Show a specific metric's prompt |

Available metrics:

- `helpfulness`
- `coherence`
- `relevance`
- `faithfulness`
- `verbosity`
- `goal_completion`
- `agent_behavior_failure`
- `tool_call_behavior_failure`
- `error_deduplication`
Examples:

```bash
# Show all prompts
npx agentest show-prompts

# Show a specific metric's prompt
npx agentest show-prompts --metric helpfulness
npx agentest show-prompts --metric agent_behavior_failure
```

## Exit Codes

| Code | Meaning |
|---|---|
| 0 | All scenarios passed |
| 1 | One or more scenarios failed, or no scenarios found |
Use exit codes in CI to fail builds on test failures:
```yaml
# GitHub Actions example
- name: Run agent tests
  run: npx agentest run
  # Fails the workflow if exit code is 1
```

## Watch Mode

Watch mode monitors:

- All `.sim.ts` and `.sim.js` scenario files
- The `agentest.config.*` config file
On any change, it clears the console, re-loads the config, re-discovers scenarios, and re-runs. Changes are debounced (300ms) to batch rapid saves.
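The trigger rules above can be sketched as a small path matcher; `shouldRerun` is a hypothetical helper for illustration, not part of the Agentest API:

```ts
// Sketch of the watch-mode trigger rules described above.
// `shouldRerun` is illustrative only, not Agentest's implementation.
function shouldRerun(changedPath: string): boolean {
  const file = changedPath.split("/").pop() ?? changedPath;
  // Scenario files: *.sim.ts and *.sim.js
  if (/\.sim\.(ts|js)$/.test(file)) return true;
  // Config files: agentest.config.*
  if (/^agentest\.config\./.test(file)) return true;
  return false;
}

console.log(shouldRerun("scenarios/booking.sim.ts")); // true
console.log(shouldRerun("src/agent.ts"));             // false
```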
```bash
npx agentest run --watch
```

Combine with other flags:

```bash
npx agentest run --watch --scenario "booking" --verbose
```

Press Ctrl+C to stop.
## Configuration Discovery

Agentest auto-detects configuration files in this order:

1. `agentest.config.ts`
2. `agentest.config.yaml`
3. `agentest.config.yml`
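The lookup order can be expressed as a first-match search; `resolveConfig` is a hypothetical helper shown only to illustrate the precedence:

```ts
// Illustrates the auto-detection order; `resolveConfig` is not part of Agentest.
const CONFIG_CANDIDATES = [
  "agentest.config.ts",
  "agentest.config.yaml",
  "agentest.config.yml",
];

function resolveConfig(filesInCwd: string[]): string | undefined {
  // The first candidate that exists in the working directory wins.
  return CONFIG_CANDIDATES.find((name) => filesInCwd.includes(name));
}

console.log(resolveConfig(["agentest.config.yml", "agentest.config.ts"]));
// "agentest.config.ts": the TypeScript config takes precedence
```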
You can override with `--config`:

```bash
npx agentest run --config custom-config.ts
```

## Environment Variables
Agentest uses environment variables for:

LLM provider API keys (for the simulated user and evaluation):

- `ANTHROPIC_API_KEY` (default provider)
- `OPENAI_API_KEY`
- `GOOGLE_GENERATIVE_AI_API_KEY`

Agent endpoint authentication (interpolated in config):

```ts
headers: { Authorization: 'Bearer ${AGENT_API_KEY}' }
```
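A minimal sketch of how `${VAR}` interpolation into config strings could work; the actual implementation may differ:

```ts
// Illustrative sketch of ${VAR} substitution from the environment.
function interpolate(value: string, env: Record<string, string | undefined>): string {
  // Replace each ${NAME} placeholder with the matching env value (or empty string).
  return value.replace(/\$\{(\w+)\}/g, (_, name) => env[name] ?? "");
}

const header = interpolate("Bearer ${AGENT_API_KEY}", { AGENT_API_KEY: "sk-123" });
console.log(header); // "Bearer sk-123"
```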
Set them in your shell or CI environment:
```bash
export ANTHROPIC_API_KEY=your-key-here
export AGENT_API_KEY=your-agent-key
npx agentest run
```

## Reporters
Configure reporters in your config file:
```ts
export default defineConfig({
  reporters: ['console', 'json', 'github-actions'],
})
```

### Console Reporter (default)
Colored pass/fail output with:
- Live progress spinner during simulation/evaluation
- Per-conversation results
- Metric scores
- Unique error summaries
- Threshold violations
In non-TTY environments (CI), falls back to line-by-line progress output.
### JSON Reporter

Writes full results to `.agentest/results.json`:

```json
{
  "scenarios": [
    {
      "scenarioId": "...",
      "scenarioName": "user books a morning slot",
      "conversations": [...],
      "summary": {...}
    }
  ]
}
```

Useful for:
- Programmatic analysis
- Custom reporting
- CI integrations
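For example, a CI script might read the results file and derive a failure count. The boolean `passed` field assumed on each scenario summary is a guess here, since the schema above elides summary contents:

```ts
// Hypothetical pass/fail tally over parsed results.json contents.
// Assumes each scenario summary exposes `passed: boolean`, which the
// elided schema above does not confirm.
interface ScenarioResult {
  scenarioId: string;
  scenarioName: string;
  summary: { passed: boolean };
}

function countFailures(results: { scenarios: ScenarioResult[] }): number {
  return results.scenarios.filter((s) => !s.summary.passed).length;
}

const results = {
  scenarios: [
    { scenarioId: "a", scenarioName: "user books a morning slot", summary: { passed: true } },
    { scenarioId: "b", scenarioName: "user cancels a booking", summary: { passed: false } },
  ],
};
console.log(countFailures(results)); // 1
```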
### GitHub Actions Reporter

Writes a markdown summary table to `$GITHUB_STEP_SUMMARY` and emits:

- `::error` annotations for critical failures
- `::warning` annotations for warnings
- `::notice` annotations for informational messages
Annotations surface inline on PRs.
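GitHub's workflow-command syntax for these annotations is `::level::message` (optionally with properties such as `file=`). A minimal formatter sketch, not Agentest's actual reporter code:

```ts
// Sketch of GitHub Actions workflow-command formatting; illustrative only.
type Level = "error" | "warning" | "notice";

function annotation(level: Level, message: string, file?: string): string {
  const props = file ? ` file=${file}` : "";
  return `::${level}${props}::${message}`;
}

console.log(annotation("error", "Scenario failed: booking"));
// ::error::Scenario failed: booking
console.log(annotation("warning", "Slow response", "scenarios/booking.sim.ts"));
// ::warning file=scenarios/booking.sim.ts::Slow response
```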
## Next Steps
- Configuration API - Full configuration reference
- Scenario API - Scenario definition reference
- Metrics API - Custom metrics reference