Mocks

Control tool behavior with mocks and sequences. Mocks intercept tool calls from your agent and return controlled responses, allowing you to test agent behavior without real external services.

Overview

When your agent calls a tool during simulation, Agentest intercepts the call and resolves it through your mock instead of calling the real tool. This gives you complete control over:

What data the agent receives
How tools behave in different scenarios
Error conditions and edge cases
Multi-step workflows with sequences

No instrumentation needed — your agent code doesn't change. Agentest owns the tool-call loop and transparently injects mock results.

Function Mocks

The most common type of mock. Define a function that receives the tool arguments and returns the mock result:

mocks: {
  tools: {
    get_weather: (args) => ({
      temperature: 72,
      condition: 'sunny',
      location: args.city,
    }),
  },
},

Function Signature

type ToolMockFn = (
  args: Record<string, unknown>,
  ctx: MockContext
) => unknown | Promise<unknown>

Parameters:

Parameter	Type	Description
`args`	`Record<string, unknown>`	Parsed tool call arguments from the agent
`ctx`	`MockContext`	Context about the current call (see below)

Returns: The mock result, either a JSON-serializable value or a Promise that resolves to one
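A function mock is an ordinary function, so you can also call it directly (for example, in a unit test) before using it in a scenario. The sketch below re-declares MockContext and ToolMockFn locally from the shapes documented on this page instead of importing them. Whether Agentest exports these types under these names is an assumption, and makeCtx is a hypothetical helper.

```typescript
// Local copies of the documented shapes; not imported from Agentest.
interface MockContext {
  callIndex: number
  conversationId: string
  turnIndex: number
}

type ToolMockFn = (
  args: Record<string, unknown>,
  ctx: MockContext
) => unknown | Promise<unknown>

// Hypothetical helper: builds a context so a mock can be called by hand.
function makeCtx(overrides: Partial<MockContext> = {}): MockContext {
  return { callIndex: 0, conversationId: 'conv-test', turnIndex: 0, ...overrides }
}

// The get_weather mock from above, with an explicit type.
const getWeather: ToolMockFn = (args) => ({
  temperature: 72,
  condition: 'sunny',
  location: args.city,
})

const result = getWeather({ city: 'Paris' }, makeCtx())
console.log(result) // { temperature: 72, condition: 'sunny', location: 'Paris' }
```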

Using Arguments

Access the agent's tool call arguments to return appropriate data:

mocks: {
  tools: {
    search_products: (args) => {
      // args.query, args.category, args.maxResults, etc.
      if (args.query === 'laptop') {
        return {
          results: [
            { id: 1, name: 'MacBook Pro', price: 2399 },
            { id: 2, name: 'ThinkPad X1', price: 1899 },
          ],
          total: 2,
        }
      }
      return { results: [], total: 0 }
    },
  },
},
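Because args is typed as Record<string, unknown>, TypeScript won't let you treat args.query as a string or args.maxResults as a number until you narrow them. One option is a pair of small helpers that read an argument and fall back to a default when it is missing or has the wrong type. The str and num helpers and the product catalog below are hypothetical, not part of Agentest.

```typescript
type Args = Record<string, unknown>

// Hypothetical helpers: read an argument, or fall back when it is missing or mistyped.
function str(args: Args, key: string, fallback = ''): string {
  const value = args[key]
  return typeof value === 'string' ? value : fallback
}

function num(args: Args, key: string, fallback: number): number {
  const value = args[key]
  return typeof value === 'number' && Number.isFinite(value) ? value : fallback
}

// Illustrative catalog standing in for real product data.
const catalog = [
  { id: 1, name: 'MacBook Pro', price: 2399, category: 'laptop' },
  { id: 2, name: 'ThinkPad X1', price: 1899, category: 'laptop' },
  { id: 3, name: 'Magic Mouse', price: 79, category: 'accessory' },
]

// Like search_products above, but with typed access to the arguments.
const searchProducts = (args: Args) => {
  const query = str(args, 'query').toLowerCase()
  const maxResults = num(args, 'maxResults', 10)
  const matches = catalog.filter((p) => p.category === query)
  return {
    results: matches.slice(0, maxResults).map(({ id, name, price }) => ({ id, name, price })),
    total: matches.length,
  }
}

const page = searchProducts({ query: 'Laptop', maxResults: 1 })
console.log(page.results.length, page.total) // 1 2
```

Here total counts all matches while results is capped by maxResults, which is one reasonable reading of those fields; adjust to whatever your real tool returns.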

Async Mocks

Mocks can also be async, which is useful for simulating network latency. As with synchronous mocks, you can still branch on the arguments:

mocks: {
  tools: {
    search_database: async (args) => {
      // Simulate network latency
      await new Promise(resolve => setTimeout(resolve, 100))

      // Conditional logic
      if (args.query === 'not found') {
        return { results: [] }
      }

      return {
        results: [
          { id: 1, title: 'Result 1' },
          { id: 2, title: 'Result 2' },
        ],
      }
    },
  },
},
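If several mocks need simulated latency, you could factor the setTimeout call into a wrapper instead of repeating it in each one. withLatency and sleep below are hypothetical helpers, not Agentest APIs, and the types are local copies of the shapes documented under Mock Context.

```typescript
interface MockContext { callIndex: number; conversationId: string; turnIndex: number }
type ToolMockFn = (args: Record<string, unknown>, ctx: MockContext) => unknown | Promise<unknown>

// Resolve after `ms` milliseconds.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Hypothetical wrapper: delays any mock (sync or async) by `ms` milliseconds.
function withLatency(mock: ToolMockFn, ms: number): ToolMockFn {
  return async (args, ctx) => {
    await sleep(ms)
    return mock(args, ctx)
  }
}

// The search_database mock from above, minus the inline setTimeout.
const searchDatabase = withLatency((args) => {
  if (args.query === 'not found') {
    return { results: [] }
  }
  return {
    results: [
      { id: 1, title: 'Result 1' },
      { id: 2, title: 'Result 2' },
    ],
  }
}, 100)
```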

Mock Context

Every mock receives a ctx parameter with information about the current call:

interface MockContext {
  callIndex: number       // 0-based index of this call to this tool in the current conversation
  conversationId: string  // Current conversation ID (e.g., 'conv-1-abc123')
  turnIndex: number       // Current turn number (0-indexed)
}

Use Cases for Context

1. Return different data on successive calls:

mocks: {
  tools: {
    check_status: (args, ctx) => {
      if (ctx.callIndex === 0) {
        return { status: 'processing' }
      } else if (ctx.callIndex === 1) {
        return { status: 'complete' }
      }
      return { status: 'archived' }
    },
  },
},

2. Identify specific conversations:

mocks: {
  tools: {
    get_user_id: (args, ctx) => ({
      userId: `user-${ctx.conversationId}`,
      // Each conversation gets a unique user ID
    }),
  },
},

3. Track turn-specific behavior:

mocks: {
  tools: {
    log_event: (args, ctx) => {
      console.log(`Turn ${ctx.turnIndex}: ${args.event}`)
      return { logged: true }
    },
  },
},

Important: ctx.callIndex resets to 0 at the start of each conversation, so every conversation gets a fresh call count.
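To make the counting rules concrete, here is a rough model of how callIndex could be derived: one counter per tool name, cleared whenever a new conversation starts. CallCounter is a hypothetical illustration of the behavior described above, not Agentest's internal implementation.

```typescript
// Hypothetical: tracks per-tool call counts for one conversation at a time.
class CallCounter {
  private counts = new Map<string, number>()
  private conversationId: string | null = null

  // Returns the 0-based callIndex for this call, resetting on a new conversation.
  next(conversationId: string, toolName: string): number {
    if (conversationId !== this.conversationId) {
      this.conversationId = conversationId
      this.counts.clear()
    }
    const index = this.counts.get(toolName) ?? 0
    this.counts.set(toolName, index + 1)
    return index
  }
}

const counter = new CallCounter()
console.log(counter.next('conv-1', 'check_status')) // 0
console.log(counter.next('conv-1', 'check_status')) // 1
console.log(counter.next('conv-1', 'get_user_id'))  // 0 (counted separately per tool)
console.log(counter.next('conv-2', 'check_status')) // 0 (new conversation)
```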

Sequence Mocks

Use sequence() to return a different value on each successive call. This is useful for testing multi-step workflows.

import { sequence } from '@agentesting/agentest'

mocks: {
  tools: {
    create_order: sequence([
      { success: true, orderId: 'ORD-001' },   // first call
      { success: false, error: 'duplicate' },  // second call
      { success: true, orderId: 'ORD-002' },   // third call
    ]),
  },
},

How Sequences Work

First call returns the first value
Second call returns the second value
Third call returns the third value
Fourth call and beyond repeat the last value

Example:

create_order: sequence([
  { success: true, orderId: 'ORD-001' },
  { success: true, orderId: 'ORD-002' },
])

// Calls:
// 1st call → { success: true, orderId: 'ORD-001' }
// 2nd call → { success: true, orderId: 'ORD-002' }
// 3rd call → { success: true, orderId: 'ORD-002' }  ← repeats last
// 4th call → { success: true, orderId: 'ORD-002' }  ← repeats last
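The repeat-last rule amounts to clamping ctx.callIndex to the last position in the list. sequenceSketch below is a hypothetical re-implementation for illustration only; in scenarios, use the real sequence() from @agentesting/agentest. Because the sketch reads ctx.callIndex, which resets per conversation, it also starts over in each conversation, consistent with the reset behavior described below.

```typescript
interface MockContext { callIndex: number; conversationId: string; turnIndex: number }

// Hypothetical stand-in for sequence(): values[i] on call i, then the last value forever.
function sequenceSketch<T>(values: readonly T[]) {
  if (values.length === 0) throw new Error('sequence needs at least one value')
  return (_args: Record<string, unknown>, ctx: MockContext): T =>
    values[Math.min(ctx.callIndex, values.length - 1)]
}

const createOrder = sequenceSketch([
  { success: true, orderId: 'ORD-001' },
  { success: true, orderId: 'ORD-002' },
])

for (let callIndex = 0; callIndex < 4; callIndex++) {
  const result = createOrder({}, { callIndex, conversationId: 'conv-1', turnIndex: 0 })
  console.log(callIndex + 1, result.orderId)
}
// 1 ORD-001
// 2 ORD-002
// 3 ORD-002
// 4 ORD-002
```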

Sequence Reset Behavior

Sequences reset at the start of each conversation, so every conversation starts with the first value.

create_order: sequence([
  { success: true, orderId: 'ORD-001' },
  { success: false, error: 'duplicate' },
])

// Conversation 1:
//   1st call → ORD-001
//   2nd call → duplicate error

// Conversation 2:  ← resets
//   1st call → ORD-001  ← back to first value
//   2nd call → duplicate error

// Conversation 3:  ← resets again
//   1st call → ORD-001

This ensures consistent behavior across all conversations in a scenario.

When to Use Sequences

Use sequences when:

Testing retry logic (first call fails, second succeeds)
Simulating progressive state changes (pending → processing → complete)
Testing workflows that require multiple tool calls
Verifying the agent handles changing data correctly

Example: Testing retry logic

scenario('agent retries on failure', {
  profile: 'User trying to book an appointment.',
  goal: 'Successfully book an appointment.',

  mocks: {
    tools: {
      // First call fails, second succeeds
      create_booking: sequence([
        { success: false, error: 'Server busy, please retry' },
        { success: true, bookingId: 'BK-001' },
      ]),
    },
  },

  assertions: {
    toolCalls: {
      matchMode: 'strict',
      expected: [
        { name: 'create_booking', argMatchMode: 'ignore' },  // first attempt
        { name: 'create_booking', argMatchMode: 'ignore' },  // retry
      ],
    },
  },
})

Error Simulation

Throw from a mock to simulate a tool failure. The error message is injected back to the agent as a tool result.

mocks: {
  tools: {
    flaky_api: (args) => {
      throw new Error('Connection timeout')
    },
  },
},

The agent receives:


{
  "error": "Connection timeout"
}
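A harness could produce this result by catching whatever the mock throws and wrapping the message in an object. invokeMock below is a hypothetical sketch of that behavior, not Agentest's source; in particular, how it handles thrown values that are not Error instances is an assumption.

```typescript
interface MockContext { callIndex: number; conversationId: string; turnIndex: number }
type ToolMockFn = (args: Record<string, unknown>, ctx: MockContext) => unknown | Promise<unknown>

// Hypothetical: run a mock and turn a thrown error into a tool result the agent can read.
async function invokeMock(
  mock: ToolMockFn,
  args: Record<string, unknown>,
  ctx: MockContext
): Promise<unknown> {
  try {
    // Calling inside try catches synchronous throws; await catches rejected Promises.
    return await mock(args, ctx)
  } catch (err) {
    // Assumption: non-Error values are stringified.
    return { error: err instanceof Error ? err.message : String(err) }
  }
}

const flakyApi: ToolMockFn = () => {
  throw new Error('Connection timeout')
}

invokeMock(flakyApi, {}, { callIndex: 0, conversationId: 'conv-1', turnIndex: 0 }).then((result) =>
  console.log(JSON.stringify(result)) // {"error":"Connection timeout"}
)
```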

Conditional Errors

Use context to simulate intermittent failures:

mocks: {
  tools: {
    unreliable_service: (args, ctx) => {
      // Fail on first call, succeed on retry
      if (ctx.callIndex === 0) {
        throw new Error('Service temporarily unavailable')
      }
      return { success: true, data: 'result' }
    },
  },
},
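When several tools need the same fail-then-recover behavior, a small factory avoids repeating the callIndex check. failFirst is a hypothetical helper built only on the documented ctx.callIndex; it is not exported by Agentest.

```typescript
interface MockContext { callIndex: number; conversationId: string; turnIndex: number }
type ToolMockFn = (args: Record<string, unknown>, ctx: MockContext) => unknown | Promise<unknown>

// Hypothetical: throw `message` on the first `failures` calls, then return `result`.
function failFirst(failures: number, message: string, result: unknown): ToolMockFn {
  return (_args, ctx) => {
    if (ctx.callIndex < failures) {
      throw new Error(message)
    }
    return result
  }
}

// Equivalent to the unreliable_service mock above.
const unreliableService = failFirst(1, 'Service temporarily unavailable', {
  success: true,
  data: 'result',
})
```

In a scenario, the factory would slot straight into mocks.tools, for example unreliable_service: failFirst(1, 'Service temporarily unavailable', { success: true, data: 'result' }).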

Testing Error Handling

Use errors to verify that the agent:

Retries appropriately
Provides helpful error messages to the user
Doesn't crash or give up immediately
Falls back to alternative tools

Example:

scenario('agent handles API errors gracefully', {
  profile: 'User asking for weather information.',
  goal: 'Get the weather forecast.',

  mocks: {
    tools: {
      get_weather_api: (args, ctx) => {
        // First attempt fails
        if (ctx.callIndex === 0) {
          throw new Error('API rate limit exceeded')
        }
        // Retry succeeds
        return { temperature: 72, condition: 'sunny' }
      },
    },
  },
})

The agent should:

Call get_weather_api → receive error
Inform user about the issue
Retry the call → receive successful result
Provide weather information to user

Unmocked Tools

What happens when the agent calls a tool that has no mock defined?

Error Mode (default)

unmockedTools: 'error',  // default

Agentest throws an AgentestError with a helpful message:

ERROR: Tool 'send_confirmation_email' was called but no mock is defined.

Add a mock in your scenario:

mocks: {
  tools: {
    send_confirmation_email: (args) => ({
      success: true,
      messageId: 'msg-123',
    }),
  },
}

The conversation is recorded as an error and the scenario fails.

When to use:

You want to catch unexpected tool usage
You're mocking all tools the agent should call
You want strict control over tool behavior

Passthrough Mode

unmockedTools: 'passthrough',

Agentest returns undefined as the tool result. The agent sees null in the response.

When to use:

Your agent uses many tools and you only want to mock a subset
You're testing specific tool interactions and don't care about others
The agent can handle null tool results gracefully

Example:

export default defineConfig({
  unmockedTools: 'passthrough',
  // ...
})

scenario('test booking flow', {
  mocks: {
    tools: {
      // Only mock the critical tools
      check_availability: (args) => ({ available: true }),
      create_booking: (args) => ({ success: true }),
      // send_email, log_analytics, etc. will return undefined
    },
  },
})
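Both modes boil down to a lookup with a fallback policy. resolveToolCall below is a hypothetical sketch of that decision: it throws a plain Error where Agentest throws an AgentestError, and it does not model how results are serialized before they reach the agent.

```typescript
interface MockContext { callIndex: number; conversationId: string; turnIndex: number }
type ToolMockFn = (args: Record<string, unknown>, ctx: MockContext) => unknown | Promise<unknown>
type UnmockedTools = 'error' | 'passthrough'

// Hypothetical: find the mock for a tool, applying the unmockedTools policy if none exists.
function resolveToolCall(
  mocks: Record<string, ToolMockFn>,
  name: string,
  args: Record<string, unknown>,
  ctx: MockContext,
  unmockedTools: UnmockedTools = 'error'
): unknown | Promise<unknown> {
  // Own-property check so names like 'toString' don't resolve to Object.prototype.
  const mock = Object.prototype.hasOwnProperty.call(mocks, name) ? mocks[name] : undefined
  if (mock) return mock(args, ctx)
  if (unmockedTools === 'passthrough') return undefined
  throw new Error(`Tool '${name}' was called but no mock is defined.`)
}

const mocks: Record<string, ToolMockFn> = {
  check_availability: () => ({ available: true }),
}
const ctx: MockContext = { callIndex: 0, conversationId: 'conv-1', turnIndex: 0 }

console.log(resolveToolCall(mocks, 'check_availability', {}, ctx))        // { available: true }
console.log(resolveToolCall(mocks, 'send_email', {}, ctx, 'passthrough')) // undefined
```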

Advanced Mock Patterns

Stateful Mocks

Track state across calls within a conversation:

// Outside the scenario definition
const cartState = new Map<string, any[]>()

scenario('user manages shopping cart', {
  mocks: {
    tools: {
      add_to_cart: (args, ctx) => {
        const cart = cartState.get(ctx.conversationId) || []
        cart.push(args.item)
        cartState.set(ctx.conversationId, cart)
        return { cartSize: cart.length }
      },

      get_cart: (args, ctx) => {
        const cart = cartState.get(ctx.conversationId) || []
        return { items: cart, total: cart.length }
      },
    },
  },
})

Note: Because the map is keyed by ctx.conversationId, state persists across turns within a conversation but is not shared between conversations.
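The cartState map above is declared at module level, so it is shared by every scenario in the file and is never cleared. If that matters for your tests, one alternative is a factory that creates fresh state each time it is called. createCartMocks below is a hypothetical sketch, assuming you build the mocks.tools object yourself.

```typescript
interface MockContext { callIndex: number; conversationId: string; turnIndex: number }

// Hypothetical factory: each call creates cart state owned by the mocks it returns.
function createCartMocks() {
  const carts = new Map<string, unknown[]>()

  // Get (or lazily create) the cart for the current conversation.
  const cartFor = (ctx: MockContext): unknown[] => {
    let cart = carts.get(ctx.conversationId)
    if (!cart) {
      cart = []
      carts.set(ctx.conversationId, cart)
    }
    return cart
  }

  return {
    add_to_cart: (args: Record<string, unknown>, ctx: MockContext) => {
      const cart = cartFor(ctx)
      cart.push(args.item)
      return { cartSize: cart.length }
    },
    get_cart: (_args: Record<string, unknown>, ctx: MockContext) => {
      const cart = cartFor(ctx)
      // Return a copy so callers can't mutate the stored cart.
      return { items: [...cart], total: cart.length }
    },
  }
}
```

In a scenario you would then write mocks: { tools: createCartMocks() }, so each scenario gets its own map while conversations within it stay separated by ctx.conversationId.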

Dynamic Data Generation

Generate realistic data based on arguments:

mocks: {
  tools: {
    search: (args) => ({
      results: Array.from({ length: Number(args.limit) || 5 }, (_, i) => ({
        id: i + 1,
        title: `Result for "${args.query}" #${i + 1}`,
        score: Math.random(),
      })),
      query: args.query,
    }),
  },
},
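If you want scores that vary but stay identical from run to run, you can replace Math.random() with a seeded generator. This sketch seeds Mulberry32, a small non-cryptographic PRNG, with an FNV-1a hash of the query, so the same arguments always produce the same results. Neither algorithm is part of Agentest, and rounding scores to three decimals is an arbitrary choice.

```typescript
// FNV-1a 32-bit hash over UTF-16 code units, used here only as a seed.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

// Mulberry32: tiny seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0
  return () => {
    state = (state + 0x6d2b79f5) >>> 0
    let t = state
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Same shape as the search mock above, but reproducible for a given query.
const search = (args: Record<string, unknown>) => {
  const query = typeof args.query === 'string' ? args.query : ''
  const limit = typeof args.limit === 'number' ? args.limit : 5
  const random = mulberry32(fnv1a(query))
  return {
    results: Array.from({ length: limit }, (_, i) => ({
      id: i + 1,
      title: `Result for "${query}" #${i + 1}`,
      score: Number(random().toFixed(3)),
    })),
    query,
  }
}
```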

Combining Sequences and Functions

You can't pull values out of a sequence from inside a function mock, so wrapping one like this doesn't work:

import { sequence } from '@agentesting/agentest'

const statusSequence = sequence(['pending', 'processing', 'complete'])

mocks: {
  tools: {
    check_status: (args, ctx) => {
      const status = statusSequence.next()  // Note: this won't work as shown
      return { status, orderId: args.orderId }
    },
  },
}

Instead, step through the values with ctx.callIndex. Clamping the index to the last position repeats the final value, just like a sequence:

mocks: {
  tools: {
    check_status: (args, ctx) => {
      const statuses = ['pending', 'processing', 'complete']
      const status = statuses[Math.min(ctx.callIndex, statuses.length - 1)]
      return { status, orderId: args.orderId }
    },
  },
},

Testing Tool Call Order

Mocks work seamlessly with trajectory assertions:

scenario('user completes checkout', {
  profile: 'Customer ready to purchase.',
  goal: 'Complete the checkout process.',

  mocks: {
    tools: {
      validate_cart: (args) => ({ valid: true, total: 49.99 }),
      process_payment: (args) => ({ success: true, transactionId: 'TXN-001' }),
      send_confirmation: (args) => ({ sent: true, emailId: 'EMAIL-001' }),
    },
  },

  assertions: {
    toolCalls: {
      matchMode: 'strict',  // exact order required
      expected: [
        { name: 'validate_cart', argMatchMode: 'ignore' },
        { name: 'process_payment', argMatchMode: 'ignore' },
        { name: 'send_confirmation', argMatchMode: 'ignore' },
      ],
    },
  },
})

If the agent calls tools in the wrong order (e.g., process_payment before validate_cart), the trajectory assertion fails.
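For the case above, where every expected entry uses argMatchMode: 'ignore', a strict check reduces to comparing tool names position by position, assuming strict means the recorded calls must match the expected list exactly, with nothing missing or extra. matchesStrictByName below is a hypothetical sketch of that rule, not Agentest's matcher, which also handles argument matching and the other match modes.

```typescript
interface ToolCall {
  name: string
  args: Record<string, unknown>
}

// Hypothetical: strict, name-only check. Same length, same names, same order.
function matchesStrictByName(actual: ToolCall[], expectedNames: string[]): boolean {
  return (
    actual.length === expectedNames.length &&
    actual.every((call, i) => call.name === expectedNames[i])
  )
}

const expected = ['validate_cart', 'process_payment', 'send_confirmation']

const inOrder: ToolCall[] = [
  { name: 'validate_cart', args: {} },
  { name: 'process_payment', args: { amount: 49.99 } },
  { name: 'send_confirmation', args: {} },
]
const paymentFirst: ToolCall[] = [inOrder[1], inOrder[0], inOrder[2]]
const repeated: ToolCall[] = [...inOrder, inOrder[2]]

console.log(matchesStrictByName(inOrder, expected))      // true
console.log(matchesStrictByName(paymentFirst, expected)) // false
console.log(matchesStrictByName(repeated, expected))     // false
```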

See Trajectory Assertions for match modes and argument matching.

Best Practices

1. Return Realistic Data

Mocks should return data that matches what the real tool would return:

// Good: realistic structure
get_user: (args) => ({
  id: args.userId,
  name: 'Alice Johnson',
  email: 'alice@example.com',
  preferences: {
    notifications: true,
    theme: 'dark',
  },
})

// Bad: minimal/unrealistic
get_user: (args) => ({ name: 'User' })
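One way to keep a mock's shape realistic is to annotate its return type with an interface that mirrors the real tool's response, so the compiler flags missing or misspelled fields. The User interface below is an assumed response shape for illustration; in practice you would reuse the type your real tool already returns.

```typescript
// Assumed response shape of the real get_user tool (illustrative only).
interface User {
  id: string
  name: string
  email: string
  preferences: {
    notifications: boolean
    theme: 'light' | 'dark'
  }
}

// The return annotation makes TypeScript reject a minimal mock like ({ name: 'User' }).
const getUser = (args: Record<string, unknown>): User => ({
  id: String(args.userId ?? 'user-1'),
  name: 'Alice Johnson',
  email: 'alice@example.com',
  preferences: {
    notifications: true,
    theme: 'dark',
  },
})
```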

2. Test Edge Cases

Use mocks to test scenarios that are hard to reproduce with real tools:

mocks: {
  tools: {
    check_inventory: (args) => {
      // Test out-of-stock scenario
      if (args.productId === 'SKU-123') {
        return { available: false, stock: 0 }
      }
      // Test low stock warning
      if (args.productId === 'SKU-456') {
        return { available: true, stock: 2, lowStockWarning: true }
      }
      return { available: true, stock: 100 }
    },
  },
},
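As edge cases accumulate, a lookup table can be easier to scan than a chain of if statements. This is a restructuring of the check_inventory mock above with the same outputs; the Inventory type is an assumption about the response shape.

```typescript
interface Inventory {
  available: boolean
  stock: number
  lowStockWarning?: boolean
}

// Edge-case fixtures keyed by product ID; anything else is in stock.
const inventoryFixtures: Record<string, Inventory> = {
  'SKU-123': { available: false, stock: 0 },                       // out of stock
  'SKU-456': { available: true, stock: 2, lowStockWarning: true }, // low stock warning
}
const defaultInventory: Inventory = { available: true, stock: 100 }

const checkInventory = (args: Record<string, unknown>): Inventory => {
  const productId = typeof args.productId === 'string' ? args.productId : ''
  return Object.prototype.hasOwnProperty.call(inventoryFixtures, productId)
    ? inventoryFixtures[productId]
    : defaultInventory
}
```

Adding a new edge case then means adding one entry to inventoryFixtures rather than another branch.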

3. Use Knowledge to Document Expected Behavior

Combine knowledge items with mocks to set clear expectations:

scenario('test discount application', {
  knowledge: [
    { content: 'VIP customers get 20% discount on all orders.' },
    { content: 'Discount code SAVE10 gives 10% off.' },
  ],

  mocks: {
    tools: {
      apply_discount: (args) => {
        if (args.code === 'SAVE10') {
          return { discountPercent: 10, finalPrice: 44.99 }
        }
        if (args.customerType === 'VIP') {
          return { discountPercent: 20, finalPrice: 39.99 }
        }
        return { discountPercent: 0, finalPrice: 49.99 }
      },
    },
  },
})

The faithfulness metric then checks whether the agent's responses are consistent with this discount logic.
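Hardcoded prices such as 44.99 and 39.99 can drift out of sync with the knowledge items when either one changes. A sketch that derives finalPrice from a single base price instead; the 49.99 base is taken from the example above, and priceAfterDiscount is a hypothetical helper.

```typescript
const BASE_PRICE = 49.99

// Apply a percentage discount and round to whole cents.
function priceAfterDiscount(base: number, percent: number): number {
  return Math.round(base * (1 - percent / 100) * 100) / 100
}

// Same rules as apply_discount above: SAVE10 is checked first, then VIP.
const applyDiscount = (args: Record<string, unknown>) => {
  const discountPercent =
    args.code === 'SAVE10' ? 10 : args.customerType === 'VIP' ? 20 : 0
  return { discountPercent, finalPrice: priceAfterDiscount(BASE_PRICE, discountPercent) }
}

console.log(applyDiscount({ code: 'SAVE10' }).finalPrice)      // 44.99
console.log(applyDiscount({ customerType: 'VIP' }).finalPrice) // 39.99
```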

4. Keep Mocks Simple

Avoid complex logic in mocks — they should be predictable and easy to understand:

// Good: simple and clear
get_weather: (args) => ({
  temperature: 72,
  condition: 'sunny',
})

// Avoid: complex logic that's hard to reason about
get_weather: (args) => {
  const day = new Date(args.date).getDay()
  const isWeekend = day === 0 || day === 6
  const baseTemp = isWeekend ? 75 : 68
  const variance = Math.random() * 10
  // ... 20 more lines
}

If you need complex behavior, consider multiple scenarios instead.

5. Document Why, Not What

Add comments for non-obvious mock behavior:

mocks: {
  tools: {
    // Simulates eventual consistency — first call returns stale data
    get_order_status: (args, ctx) => {
      if (ctx.callIndex === 0) {
        return { status: 'pending' }  // stale cache
      }
      return { status: 'shipped' }  // updated status
    },
  },
},

Common Patterns

Pattern: Testing Idempotency

Verify the agent doesn't call the same tool redundantly:

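// One count per conversation, since a scenario can run several conversations
const callCounts = new Map<string, number>()

mocks: {
  tools: {
    get_config: (args, ctx) => {
      callCounts.set(ctx.conversationId, (callCounts.get(ctx.conversationId) ?? 0) + 1)
      return { setting: 'value' }
    },
  },
},

// After simulation, check that every count in callCounts is 1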

Better: use trajectory assertions with strict mode to catch repeated calls.

Pattern: Multi-Step Workflows

mocks: {
  tools: {
    start_process: (args) => ({ processId: 'PROC-001', status: 'started' }),
    check_progress: sequence([
      { processId: 'PROC-001', status: 'running', progress: 33 },
      { processId: 'PROC-001', status: 'running', progress: 66 },
      { processId: 'PROC-001', status: 'complete', progress: 100 },
    ]),
    finalize: (args) => ({ success: true, result: 'data' }),
  },
}

Pattern: Dependent Tool Calls

A mock can check that the agent passes along data returned by an earlier call:

mocks: {
  tools: {
    create_session: (args) => ({
      sessionId: 'SESSION-123',
      expiresAt: '2026-03-30T12:00:00Z',
    }),

    // Expects sessionId from create_session
    upload_file: (args) => {
      if (args.sessionId !== 'SESSION-123') {
        throw new Error('Invalid session')
      }
      return { fileId: 'FILE-456', uploaded: true }
    },
  },
}
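The hardcoded check above passes only because both mocks agree on one literal. A stateful variant can issue a distinct session ID per conversation and have upload_file accept only that ID. The createSessionMocks factory and its ID format are hypothetical.

```typescript
interface MockContext { callIndex: number; conversationId: string; turnIndex: number }

// Hypothetical factory: sessions are issued and validated per conversation.
function createSessionMocks() {
  const sessions = new Map<string, string>() // conversationId -> latest sessionId

  return {
    create_session: (_args: Record<string, unknown>, ctx: MockContext) => {
      const sessionId = `SESSION-${ctx.conversationId}-${ctx.callIndex}`
      sessions.set(ctx.conversationId, sessionId)
      return { sessionId, expiresAt: '2026-03-30T12:00:00Z' }
    },

    // Rejects calls made before create_session or with a different ID.
    upload_file: (args: Record<string, unknown>, ctx: MockContext) => {
      const expected = sessions.get(ctx.conversationId)
      if (expected === undefined || args.sessionId !== expected) {
        throw new Error('Invalid session')
      }
      return { fileId: 'FILE-456', uploaded: true }
    },
  }
}
```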

Next Steps

Trajectory Assertions - Verify tool call sequences
Scenarios - Define test scenarios with mocks
Evaluation Metrics - Understand how mocks affect evaluation
Examples - See complex mock patterns in action
