Why Test AI Agents?

AI agents are non-deterministic by nature: the same input can produce different outputs. Tests therefore target properties that stay stable across runs rather than exact output text:

Behavioral correctness — does the agent call the right tools?
Tool execution — do tools produce expected results?
Workflow logic — do DAG nodes execute in the right order?
Swarm coordination — do agents communicate correctly?
Error handling — does the system recover gracefully?
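
Because wording varies between runs, behavioral assertions usually compare which tools were called, and with which arguments, instead of matching response text. The following is a minimal sketch of that idea. The `ToolCall` shape and the `diffToolCalls` helper are hypothetical names introduced for illustration and are not part of Cogitator.

```typescript
// Hypothetical shape of a recorded tool call; not a Cogitator type.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Returns a list of mismatches between recorded and expected calls.
// An empty list means the expected tools were called in the expected order.
// Only the argument keys listed in `expected` are compared, so extra
// arguments chosen by the model do not fail the check.
function diffToolCalls(actual: ToolCall[], expected: ToolCall[]): string[] {
  const problems: string[] = [];
  if (actual.length !== expected.length) {
    problems.push(`expected ${expected.length} calls, got ${actual.length}`);
  }
  const shared = Math.min(actual.length, expected.length);
  for (let i = 0; i < shared; i++) {
    if (actual[i].name !== expected[i].name) {
      problems.push(`call ${i}: expected ${expected[i].name}, got ${actual[i].name}`);
      continue;
    }
    for (const [key, value] of Object.entries(expected[i].arguments)) {
      if (JSON.stringify(actual[i].arguments[key]) !== JSON.stringify(value)) {
        problems.push(`call ${i}: argument "${key}" differs`);
      }
    }
  }
  return problems;
}

// A recorded run: the model added a `units` argument the test does not care about.
const recorded: ToolCall[] = [
  { name: 'get_weather', arguments: { city: 'Tokyo', units: 'metric' } },
];

// Prints an empty array because the name and the `city` argument match.
console.log(diffToolCalls(recorded, [{ name: 'get_weather', arguments: { city: 'Tokyo' } }]));
```

In a Vitest suite the same check could be written as `expect(diffToolCalls(recorded, expected)).toEqual([])`, which prints the mismatch list when the assertion fails.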

Testing Strategy

Unit Tests

Test individual components in isolation:

import { tool } from '@cogitator-ai/core';
import { z } from 'zod';
import { describe, it, expect } from 'vitest';

const calculator = tool({
  name: 'calculator',
  description: 'Evaluate math expressions',
  parameters: z.object({ expression: z.string() }),
  execute: async ({ expression }) => {
    // eval keeps the example short; never evaluate untrusted input this way.
    return { result: eval(expression) };
  },
});

describe('calculator tool', () => {
  it('evaluates simple expressions', async () => {
    const result = await calculator.execute({ expression: '2 + 2' });
    expect(result.result).toBe(4);
  });
});
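
Unit tests are also the cheapest place to cover failure paths, such as input a tool should reject. The sketch below is one possible approach and not Cogitator code: `evaluateArithmetic` and `throwsWith` are hypothetical helpers, and the character whitelist is an assumption about what a calculator tool should accept.

```typescript
// Accept only digits, whitespace, arithmetic operators, parentheses, and dots.
// With no letters allowed, the expression cannot reference identifiers.
const ARITHMETIC_ONLY = /^[\d\s+\-*/().]+$/;

// Hypothetical stand-in for the body of a calculator tool's execute function.
function evaluateArithmetic(expression: string): number {
  if (!ARITHMETIC_ONLY.test(expression)) {
    throw new Error(`Unsupported expression: ${expression}`);
  }
  let value: unknown;
  try {
    value = new Function(`"use strict"; return (${expression});`)();
  } catch (_err) {
    // Syntax errors such as a dangling operator end up here.
    throw new Error(`Malformed expression: ${expression}`);
  }
  if (typeof value !== 'number' || !Number.isFinite(value)) {
    // Covers division by zero (Infinity) and 0/0 (NaN).
    throw new Error(`No finite numeric result: ${expression}`);
  }
  return value;
}

// Small helper so the checks below run without a test framework.
function throwsWith(fn: () => unknown, fragment: string): boolean {
  try {
    fn();
  } catch (err) {
    return err instanceof Error && err.message.includes(fragment);
  }
  return false;
}

console.log(evaluateArithmetic('2 + 2')); // 4
console.log(throwsWith(() => evaluateArithmetic('process.exit()'), 'Unsupported')); // true
console.log(throwsWith(() => evaluateArithmetic('2 +'), 'Malformed')); // true
console.log(throwsWith(() => evaluateArithmetic('1 / 0'), 'No finite')); // true
```

With Vitest, the rejection cases map to `expect(() => evaluateArithmetic('2 +')).toThrow('Malformed')`.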

Integration Tests with MockLLMBackend

Use the mock backend to test agent behavior without real LLM calls:

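import { Cogitator, Agent, tool } from '@cogitator-ai/core';
import { MockLLMBackend } from '@cogitator-ai/core/testing';
import { z } from 'zod';
import { expect } from 'vitest';

// Stub tool for this example (hypothetical); it returns fixed data so the test stays deterministic.
const weatherTool = tool({
  name: 'get_weather',
  description: 'Get the current weather for a city',
  parameters: z.object({ city: z.string() }),
  execute: async ({ city }) => ({ city, condition: 'sunny', temperatureC: 22 }),
});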

const mock = new MockLLMBackend({
  responses: [
    {
      text: 'I will check the weather.',
      toolCalls: [{ name: 'get_weather', arguments: { city: 'Tokyo' } }],
    },
    { text: 'The weather in Tokyo is sunny and 22°C.' },
  ],
});

const cogitator = new Cogitator({
  llm: { defaultProvider: 'mock', providers: { mock: { backend: mock } } },
});

const agent = new Agent({
  name: 'test-agent',
  instructions: 'You are a weather assistant.',
  tools: [weatherTool],
});

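const result = await cogitator.run(agent, 'What is the weather in Tokyo?');
// The final text comes from the second scripted response.
expect(result.text).toContain('Tokyo');
// Two LLM calls: the tool-call turn, then the final answer.
expect(mock.calls).toHaveLength(2);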
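
To make the scripted-response idea concrete, here is a self-contained sketch of how such a mock can work: it replays canned responses in order and records every prompt it receives. This is an illustration under assumptions and not the implementation of `MockLLMBackend`; the names `ScriptedBackend`, `runAgentLoop`, and `scriptedWeatherRun` are hypothetical.

```typescript
// Hypothetical types for illustration; Cogitator's real types may differ.
interface ScriptedResponse {
  text: string;
  toolCalls?: { name: string; arguments: Record<string, unknown> }[];
}
type ToolFn = (args: Record<string, unknown>) => Promise<string>;

// Replays responses in order and records each prompt for later assertions.
class ScriptedBackend {
  readonly calls: string[] = [];
  private readonly responses: ScriptedResponse[];
  private cursor = 0;

  constructor(responses: ScriptedResponse[]) {
    this.responses = responses;
  }

  async complete(prompt: string): Promise<ScriptedResponse> {
    this.calls.push(prompt);
    if (this.cursor >= this.responses.length) {
      // Failing loudly exposes an agent that makes more calls than the test scripted.
      throw new Error(`No scripted response left for call ${this.calls.length}`);
    }
    return this.responses[this.cursor++];
  }
}

// Minimal agent loop: run requested tools, feed results back, stop on plain text.
async function runAgentLoop(
  backend: ScriptedBackend,
  tools: Record<string, ToolFn>,
  input: string,
): Promise<string> {
  let prompt = input;
  for (let step = 0; step < 5; step++) {
    const response = await backend.complete(prompt);
    if (!response.toolCalls || response.toolCalls.length === 0) {
      return response.text;
    }
    const results: string[] = [];
    for (const call of response.toolCalls) {
      const toolFn = tools[call.name];
      if (!toolFn) throw new Error(`Unknown tool: ${call.name}`);
      results.push(await toolFn(call.arguments));
    }
    prompt = results.join('\n');
  }
  throw new Error('Step limit reached');
}

// Same scenario as the integration test: one tool-call turn, then a final answer.
async function scriptedWeatherRun(): Promise<{ answer: string; calls: string[] }> {
  const backend = new ScriptedBackend([
    { text: 'I will check the weather.', toolCalls: [{ name: 'get_weather', arguments: { city: 'Tokyo' } }] },
    { text: 'The weather in Tokyo is sunny and 22°C.' },
  ]);
  const tools: Record<string, ToolFn> = {
    get_weather: async (args) => `sunny, 22°C in ${String(args.city)}`,
  };
  const answer = await runAgentLoop(backend, tools, 'What is the weather in Tokyo?');
  return { answer, calls: backend.calls };
}

scriptedWeatherRun().then(({ answer, calls }) => console.log(answer, calls.length));
```

Throwing when the script runs out is a deliberate choice in this sketch: a silent fallback response would hide an agent that loops more often than the test expects.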

Workflow Tests

Test workflow execution with deterministic node outputs:

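// functionNode is assumed to be exported from the same package.
import { WorkflowBuilder, WorkflowExecutor, functionNode } from '@cogitator-ai/workflows';
import { expect } from 'vitest';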

const workflow = new WorkflowBuilder()
  .addNode(
    'fetch',
    functionNode(async () => ({ data: 'test' }))
  )
  .addNode(
    'process',
    functionNode(async (input) => ({ processed: true, ...input }))
  )
  .addEdge('fetch', 'process')
  .build();

const executor = new WorkflowExecutor(workflow);
const result = await executor.execute({});

expect(result.processed).toBe(true);
expect(result.data).toBe('test');
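
The assertions above check the merged output but not the order in which nodes ran. One way to test ordering is to wrap each node so it appends its name to a log. The sketch below is self-contained and does not use the Cogitator workflow API: `recordOrder`, `runDag`, and `orderedRun` are hypothetical helpers, and `runDag` is a small sequential stand-in for a real executor.

```typescript
type NodeFn = (input: Record<string, unknown>) => Promise<Record<string, unknown>>;

// Wraps a node so that each execution appends its name to `log`.
function recordOrder(name: string, log: string[], fn: NodeFn): NodeFn {
  return async (input) => {
    log.push(name);
    return fn(input);
  };
}

// Stand-in executor: runs nodes one at a time in topological order
// (Kahn's algorithm) and merges each node's output into shared state.
async function runDag(
  nodes: Record<string, NodeFn>,
  edges: [string, string][],
): Promise<Record<string, unknown>> {
  const indegree = new Map<string, number>();
  for (const name of Object.keys(nodes)) indegree.set(name, 0);
  for (const [, to] of edges) indegree.set(to, (indegree.get(to) ?? 0) + 1);

  const ready = Array.from(indegree.entries())
    .filter(([, degree]) => degree === 0)
    .map(([name]) => name);

  let state: Record<string, unknown> = {};
  let executed = 0;
  while (ready.length > 0) {
    const name = ready.shift()!;
    state = { ...state, ...(await nodes[name](state)) };
    executed++;
    for (const [from, to] of edges) {
      if (from !== name) continue;
      const remaining = indegree.get(to)! - 1;
      indegree.set(to, remaining);
      if (remaining === 0) ready.push(to);
    }
  }
  if (executed !== Object.keys(nodes).length) {
    throw new Error('Cycle detected: some nodes never became ready');
  }
  return state;
}

// Nodes are declared in the "wrong" order on purpose; the edge decides execution order.
async function orderedRun(): Promise<{ order: string[]; result: Record<string, unknown> }> {
  const order: string[] = [];
  const nodes: Record<string, NodeFn> = {
    process: recordOrder('process', order, async (input) => ({ processed: true, ...input })),
    fetch: recordOrder('fetch', order, async () => ({ data: 'test' })),
  };
  const result = await runDag(nodes, [['fetch', 'process']]);
  return { order, result };
}

orderedRun().then(({ order, result }) => console.log(order, result));
```

The same wrapper idea should carry over to real workflow nodes: pass the wrapped functions to the node constructor and assert on the log after execution.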

Running Tests

pnpm test              # run all tests
pnpm test --watch      # watch mode
pnpm test --coverage   # with coverage report
