# Time-Travel Debugging
Capture execution snapshots at any step, replay from checkpoints, fork alternate timelines, and compare runs side-by-side to debug agent behavior.
## Overview
Time-travel debugging lets you capture snapshots of agent execution at any point, then replay, fork, or compare runs. Instead of re-running an entire agent from scratch to debug a failure at step 47, you load the checkpoint at step 46 and explore what happens with different inputs, tools, or context.
## Setting Up TimeTravel
```typescript
import { Cogitator, Agent } from '@cogitator-ai/core';
import { TimeTravel } from '@cogitator-ai/core/time-travel';

const cogitator = new Cogitator({
  /* ... */
});

const timeTravel = new TimeTravel(cogitator, {
  config: {
    enabled: true,
    autoCheckpoint: true,
    maxCheckpointsPerTrace: 50,
    checkpointInterval: 1,
  },
});
```

## Creating Checkpoints
Checkpoints capture the full state at a given step: messages, tool results, and pending tool calls.
```typescript
const result = await cogitator.run(agent, { input: 'Research quantum computing advances' });

// checkpoint a specific step
const checkpoint = await timeTravel.checkpoint(result, 3, 'after-web-search');

// checkpoint every step
const allCheckpoints = await timeTravel.checkpointAll(result, 'research-run');

// checkpoint every N steps
const sparseCheckpoints = await timeTravel.checkpointEvery(result, 2, 'sparse');
```

Each checkpoint stores:
- The conversation messages up to that step
- All tool results collected so far
- Any pending tool calls at that exact moment
- An optional label for easy retrieval
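The stored fields above can be modeled roughly as follows. This is an illustrative sketch for explanation only; the field and type names are assumptions, not the library's actual exported types:

```typescript
// Illustrative sketch of the data a checkpoint carries.
// All names here are assumptions, not @cogitator-ai/core types.
interface Message {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

interface Checkpoint {
  id: string;
  stepIndex: number;                    // step the snapshot was taken at
  label?: string;                       // optional label for easy retrieval
  messages: Message[];                  // conversation up to this step
  toolResults: Record<string, unknown>; // tool results collected so far
  pendingToolCalls: string[];           // tool calls in flight at this moment
  createdAt: Date;
}

// Example: a checkpoint taken after a web search at step 3.
const example: Checkpoint = {
  id: 'ckpt_example',
  stepIndex: 3,
  label: 'after-web-search',
  messages: [{ role: 'user', content: 'Research quantum computing advances' }],
  toolResults: { web_search: { results: ['...'] } },
  pendingToolCalls: [],
  createdAt: new Date(),
};
```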
## Browsing Checkpoints
Retrieve checkpoints by trace, agent, or label:
```typescript
const checkpoints = await timeTravel.getCheckpoints(result.trace.traceId);

for (const cp of checkpoints) {
  console.log(`Step ${cp.stepIndex}: ${cp.label ?? 'unlabeled'}`);
  console.log(`  Messages: ${cp.messages.length}`);
  console.log(`  Tool results: ${Object.keys(cp.toolResults).length}`);
  console.log(`  Created: ${cp.createdAt.toISOString()}`);
}

const specific = await timeTravel.getCheckpoint('ckpt_abc123');
```

## Replaying from a Checkpoint
Replay re-executes the agent from a saved checkpoint. Two modes are available:
### Deterministic Replay
Returns the exact state at the checkpoint without making any new LLM calls:
```typescript
const replay = await timeTravel.replayDeterministic(agent, checkpoint.id);

console.log('Output:', replay.output);
console.log('Steps replayed:', replay.stepsReplayed);
console.log('Steps executed:', replay.stepsExecuted); // 0 in deterministic mode
```

### Live Replay
Resumes execution from the checkpoint, letting the agent continue with fresh LLM calls:
```typescript
const replay = await timeTravel.replayLive(agent, checkpoint.id);

console.log('Output:', replay.output);
console.log('Steps replayed:', replay.stepsReplayed);
console.log('Steps executed:', replay.stepsExecuted);

if (replay.divergedAt !== undefined) {
  console.log(`Execution diverged at step ${replay.divergedAt}`);
}
```

The divergence point tells you exactly where the new execution took a different path from the original.
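Conceptually, the divergence point is the index of the first step where the new run's action differs from the original's. A minimal self-contained sketch of that idea (the `StepSummary` shape and comparison rule are simplifying assumptions, not the library's internals):

```typescript
// Hypothetical summary of one executed step: which tool was called
// (if any) and a fingerprint of the action taken.
interface StepSummary {
  tool?: string;
  action: string;
}

// Return the index of the first step where two runs differ,
// or undefined if the shared prefix is identical.
function findDivergence(
  original: StepSummary[],
  replay: StepSummary[]
): number | undefined {
  const shared = Math.min(original.length, replay.length);
  for (let i = 0; i < shared; i++) {
    if (
      original[i].tool !== replay[i].tool ||
      original[i].action !== replay[i].action
    ) {
      return i;
    }
  }
  return undefined;
}

const originalRun: StepSummary[] = [
  { tool: 'web_search', action: 'search quantum advances' },
  { tool: 'web_search', action: 'search error correction' },
];
const replayRun: StepSummary[] = [
  { tool: 'web_search', action: 'search quantum advances' },
  { tool: 'database_query', action: 'lookup error correction' },
];
// step 0 matches, step 1 differs, so the runs diverged at step 1
```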
## Forking Execution
Forking creates an alternate timeline from a checkpoint with modified conditions. This is the core debugging primitive.
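In essence, a fork copies the checkpoint's state and overlays the requested modifications before resuming execution. A hypothetical sketch of that overlay step, with assumed state and option shapes (not the library's real types):

```typescript
// Assumed shapes for illustration only.
interface ForkOptions {
  additionalContext?: string;
  mockToolResults?: Record<string, unknown>;
  input?: string;
}

interface ForkState {
  systemContext: string[];
  toolResults: Record<string, unknown>;
  input: string;
}

// Produce a new state for the alternate timeline without
// mutating the original checkpoint state.
function applyFork(base: ForkState, opts: ForkOptions): ForkState {
  return {
    // extra context is appended to the system prompt material
    systemContext: opts.additionalContext
      ? [...base.systemContext, opts.additionalContext]
      : [...base.systemContext],
    // mocked tool results override the recorded ones by name
    toolResults: { ...base.toolResults, ...(opts.mockToolResults ?? {}) },
    // a new input replaces the original question
    input: opts.input ?? base.input,
  };
}

const baseState: ForkState = {
  systemContext: ['You are a research assistant.'],
  toolResults: { web_search: { results: ['paper A'] } },
  input: 'Research quantum computing advances',
};

const forked = applyFork(baseState, {
  mockToolResults: { web_search: { results: [], error: 'Service unavailable' } },
});
```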
### Fork with Additional Context
Inject new information into the system prompt:
```typescript
const fork = await timeTravel.forkWithContext(
  agent,
  checkpoint.id,
  'The user is a premium subscriber with access to advanced features',
  'premium-context'
);

console.log('Fork output:', fork.result.output);
```

### Fork with Mocked Tools
Override tool results to test "what if" scenarios:
```typescript
const fork = await timeTravel.forkWithMockedTool(
  agent,
  checkpoint.id,
  'web_search',
  { results: [], error: 'Service unavailable' },
  'search-failure'
);

// mock multiple tools at once
const multiFork = await timeTravel.forkWithMockedTools(
  agent,
  checkpoint.id,
  {
    web_search: { results: [] },
    database_query: { rows: [], error: 'Connection timeout' },
  },
  'all-services-down'
);
```

### Fork with New Input
Change the user's original question while preserving all context up to the checkpoint:
```typescript
const fork = await timeTravel.forkWithNewInput(
  agent,
  checkpoint.id,
  'Now focus specifically on quantum error correction',
  'refined-question'
);
```

### Fork Multiple Variants
Explore several alternatives from the same checkpoint in one call:
```typescript
const forks = await timeTravel.forkMultiple(agent, checkpoint.id, [
  { additionalContext: 'Focus on practical applications', label: 'practical' },
  { additionalContext: 'Focus on theoretical foundations', label: 'theoretical' },
  { mockToolResults: { web_search: { results: [] } }, label: 'no-search' },
  { input: 'Explain quantum entanglement instead', label: 'different-topic' },
]);

for (const fork of forks) {
  console.log(`${fork.checkpoint.label}: ${fork.result.output.slice(0, 100)}...`);
}
```

## Comparing Runs
The TraceComparator produces structured diffs between any two execution traces:
```typescript
const diff = await timeTravel.compare(originalTraceId, replayTraceId);

console.log(`Common steps: ${diff.commonSteps}`);
console.log(`Only in original: ${diff.trace1OnlySteps}`);
console.log(`Only in replay: ${diff.trace2OnlySteps}`);

if (diff.divergencePoint !== undefined) {
  console.log(`Diverged at step ${diff.divergencePoint}`);
}

console.log(`Score delta: ${diff.metricsDiff.score.delta}`);
console.log(`Token delta: ${diff.metricsDiff.tokens.delta}`);
console.log(`Duration delta: ${diff.metricsDiff.duration.delta}ms`);
```

After a replay, compare directly against the original:
```typescript
const diff = await timeTravel.compareWithOriginal(replayResult);
console.log(timeTravel.formatDiff(diff));
```

The formatted diff output looks like:
```
═══════════════════════════════════════════
            TRACE COMPARISON
═══════════════════════════════════════════
Trace 1: trace_abc123
Trace 2: trace_def456

⚠ Traces diverged at step 3

─── Summary ───
Common steps: 3
Only in trace 1: 2
Only in trace 2: 1

─── Metrics ───
Success: true → true
Score: 0.850 → 0.920 (+0.070)
Tokens: 4200 → 3800 (-400)
Duration: 2300ms → 1900ms (-400ms)

─── Step Differences ───
✗ Step 3: different
   └─ Tool: web_search → database_query
≈ Step 4: similar
   └─ LLM response differs
```

Step diff statuses: `identical` (exact match), `similar` (same structure, different LLM text), `different` (structural change), `only_in_1`, or `only_in_2`.
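The classification of a step pair can be sketched as follows. The step shape and comparison rules here are simplified assumptions for illustration, not the comparator's actual implementation:

```typescript
type StepStatus = 'identical' | 'similar' | 'different' | 'only_in_1' | 'only_in_2';

// Assumed minimal step shape: the tool invoked (if any) and the
// model's output text at that step.
interface Step {
  tool?: string;
  llmText: string;
}

// Classify the steps at the same index across two traces.
function classifyStep(s1?: Step, s2?: Step): StepStatus {
  if (s1 && !s2) return 'only_in_1';   // trace 2 ended earlier
  if (!s1 && s2) return 'only_in_2';   // trace 1 ended earlier
  if (!s1 || !s2) throw new Error('at least one step required');
  if (s1.tool !== s2.tool) return 'different'; // structural change
  if (s1.llmText !== s2.llmText) return 'similar'; // same structure, new text
  return 'identical';
}
```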
## Debugging Workflow
A typical debugging session:
- Run the agent and observe the failure
- Create checkpoints for the entire run with `checkpointAll()`
- Identify the last good step by browsing checkpoints
- Fork from that step with different conditions to isolate the cause
- Compare the fork against the original to quantify the improvement
- Use insights to fix the agent's instructions or tool configuration