# Agent Learning & Optimization
Automatically improve agent performance over time with DSPy-style compilation, A/B testing, prompt monitoring, and metric-driven optimization.
## Overview
The learning system captures execution traces, evaluates agent performance against metrics, and optimizes instructions and examples through an iterative compilation process. It brings ideas from DSPy — bootstrapped demonstrations, instruction optimization, and automated evaluation — into the Cogitator runtime.
## AgentOptimizer
The AgentOptimizer is the central class that coordinates trace capture, demo selection, and instruction optimization:
```typescript
import { AgentOptimizer } from '@cogitator-ai/core/learning';

const optimizer = new AgentOptimizer({
  llm: backend,
  model: 'openai/gpt-4o',
  config: {
    enabled: true,
    captureTraces: true,
    autoOptimize: false,
    maxDemosPerAgent: 5,
    minScoreForDemo: 0.8,
    defaultMetrics: ['success', 'tool_accuracy', 'efficiency'],
  },
});
```

## Capturing Traces
Every agent run can be captured as an ExecutionTrace with step-by-step details and computed metrics:
```typescript
const result = await cogitator.run(agent, { input: 'Summarize this article...' });

const trace = await optimizer.captureTrace(result, 'Summarize this article...', {
  expected: 'A concise 3-sentence summary covering key points',
  labels: ['summarization', 'production'],
});

console.log(`Score: ${trace.score}`);
console.log(`Tool accuracy: ${trace.metrics.toolAccuracy}`);
console.log(`Efficiency: ${trace.metrics.efficiency}`);
```

The trace records every LLM call, tool execution, and reflection, along with token usage, latency, and cost. Metrics such as `toolAccuracy` (the ratio of successful tool calls) and `efficiency` (token economy) are computed automatically.
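The exact metric formulas live inside the optimizer, but a ratio-style metric like tool accuracy follows directly from the definition above. A minimal sketch, where `ToolStep` is a simplified stand-in for the real trace step types:

```typescript
// Simplified stand-in for the real trace step records (assumption:
// the actual ExecutionTrace stores richer per-step data).
interface ToolStep {
  tool: string;
  success: boolean;
}

// Tool accuracy as described above: successful tool calls divided by
// total tool calls (1 when the agent called no tools at all).
function toolAccuracy(steps: ToolStep[]): number {
  if (steps.length === 0) return 1;
  const ok = steps.filter((s) => s.success).length;
  return ok / steps.length;
}

const steps: ToolStep[] = [
  { tool: 'search', success: true },
  { tool: 'search', success: true },
  { tool: 'fetch', success: false },
];
console.log(toolAccuracy(steps)); // 2 of 3 calls succeeded
```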
## Compiling (DSPy-Style Optimization)
The compile() method runs multi-round optimization that improves both instructions and in-context examples:
```typescript
const result = await optimizer.compile(agent, trainset, {
  maxRounds: 3,
  maxBootstrappedDemos: 5,
  optimizeInstructions: true,
});

console.log(`Score: ${result.scoreBefore} → ${result.scoreAfter}`);
console.log(`Improvement: ${(result.improvement * 100).toFixed(1)}%`);
console.log(`Demos added: ${result.demosAdded.length}`);
console.log(`New instructions: ${result.instructionsAfter}`);
```

Each round proceeds in three steps: high-scoring traces become bootstrapped demos, the `InstructionOptimizer` analyzes failures and generates improved instruction candidates, and the best candidate is selected through LLM-based evaluation.
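The selection at the end of each round is, at its core, an argmax over evaluated candidates. A minimal sketch, with the `score` field standing in for the LLM-based evaluation result (the `Candidate` shape here is illustrative, not the library's real type):

```typescript
interface Candidate {
  instructions: string;
  score: number; // assigned by LLM-based evaluation in the real system
}

// Pick the highest-scoring instruction candidate; keep the current
// instructions when no candidate beats them.
function selectBest(current: Candidate, candidates: Candidate[]): Candidate {
  return [current, ...candidates].reduce((best, c) =>
    c.score > best.score ? c : best
  );
}

const best = selectBest(
  { instructions: 'original', score: 0.71 },
  [
    { instructions: 'variant A', score: 0.78 },
    { instructions: 'variant B', score: 0.75 },
  ]
);
console.log(best.instructions); // "variant A"
```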
## Bootstrapping Demos
Demos are high-quality execution examples that get injected into the agent's prompt as few-shot examples:
```typescript
const demos = await optimizer.bootstrapDemos('research-agent');

const relevantDemos = await optimizer.getDemosForPrompt(
  'research-agent',
  'Find papers on transformer architectures',
  3
);

const formatted = optimizer.formatDemosForPrompt(relevantDemos);
```

Only traces scoring above `minScoreForDemo` (default 0.8) are promoted to demos. The `DemoSelector` picks the most relevant demos for each new input based on content similarity.
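The similarity measure the `DemoSelector` uses is not specified here; token-overlap (Jaccard) ranking is one simple stand-in that illustrates the idea of matching demos to an incoming input:

```typescript
// Lowercased word set for a piece of text.
function tokens(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: string, b: string): number {
  const ta = tokens(a);
  const tb = tokens(b);
  const inter = [...ta].filter((t) => tb.has(t)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 0 : inter / union;
}

// Rank stored demo inputs by similarity to the new input, keep top k.
function topK(input: string, demoInputs: string[], k: number): string[] {
  return [...demoInputs]
    .sort((a, b) => jaccard(input, b) - jaccard(input, a))
    .slice(0, k);
}

const picked = topK(
  'Find papers on transformer architectures',
  [
    'Find papers on convolutional networks',
    'Summarize the transformer architectures survey',
    'Book a flight to Berlin',
  ],
  2
);
```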
## A/B Testing
Test instruction changes in production with statistical rigor using the ABTestingFramework:
```typescript
import { ABTestingFramework } from '@cogitator-ai/core/learning';

const abTesting = new ABTestingFramework({
  store: abTestStore,
  defaultConfidenceLevel: 0.95,
  defaultMinSampleSize: 50,
  autoDeployWinner: false,
});

const test = await abTesting.createTest({
  agentId: 'support-agent',
  name: 'Concise vs detailed instructions',
  controlInstructions: 'You are a helpful support agent...',
  treatmentInstructions: 'You are a support agent. Be concise, use bullet points...',
  treatmentAllocation: 0.5,
  metricToOptimize: 'score',
});

await abTesting.startTest(test.id);
```

For each incoming request, the framework selects a variant and tracks results:
```typescript
const variant = abTesting.selectVariant(test);
const instructions = abTesting.getInstructionsForVariant(test, variant);

await abTesting.recordResult(test.id, variant, score, latency, cost);

const outcome = await abTesting.checkAndCompleteIfReady(test.id);
if (outcome?.isSignificant) {
  console.log(`Winner: ${outcome.winner} (p=${outcome.pValue.toFixed(4)})`);
  console.log(outcome.recommendation);
}
```

The framework uses Welch's t-test for statistical significance with configurable confidence levels.
## Prompt Monitoring
Track prompt performance in real-time and detect degradation with the PromptMonitor:
```typescript
import { PromptMonitor } from '@cogitator-ai/core/learning';

const monitor = new PromptMonitor({
  windowSize: 60 * 60 * 1000, // 1 hour window
  scoreDropThreshold: 0.15, // 15% score drop triggers alert
  latencySpikeThreshold: 2.0, // 2x latency increase
  errorRateThreshold: 0.1, // 10% error rate
  enableAutoRollback: true,
  onAlert: (alert) => {
    console.log(
      `[${alert.severity}] ${alert.type}: ${alert.currentValue} vs baseline ${alert.baselineValue}`
    );
  },
});

monitor.setBaseline('support-agent', baselineMetrics);

const alerts = monitor.recordExecution(trace);
for (const alert of alerts) {
  if (alert.severity === 'critical') {
    console.log('Critical degradation detected!');
  }
}
```

The monitor tracks four alert types: `score_drop`, `latency_spike`, `error_rate_increase`, and `cost_spike`. When `enableAutoRollback` is set, critical alerts trigger an automatic rollback to the previous instruction version.
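The alert conditions are relative comparisons against the stored baseline. A sketch of the score-drop check using the `scoreDropThreshold` from the config above; the 2x escalation rule for `critical` severity is an illustrative assumption, not documented behavior:

```typescript
interface DegradationAlert {
  type: 'score_drop';
  severity: 'warning' | 'critical';
  currentValue: number;
  baselineValue: number;
}

// Flag a score drop when the windowed average falls more than
// `threshold` (e.g. 0.15 = 15%) below the baseline score.
// Escalating to 'critical' at 2x the threshold is an assumption.
function checkScoreDrop(
  baseline: number,
  windowAvg: number,
  threshold: number
): DegradationAlert | null {
  const drop = (baseline - windowAvg) / baseline;
  if (drop <= threshold) return null;
  return {
    type: 'score_drop',
    severity: drop > threshold * 2 ? 'critical' : 'warning',
    currentValue: windowAvg,
    baselineValue: baseline,
  };
}

const alert = checkScoreDrop(0.8, 0.6, 0.15); // 25% drop -> alert fires
```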
## AutoOptimizer
The AutoOptimizer ties everything together into a fully automated improvement loop:
```typescript
import { AutoOptimizer } from '@cogitator-ai/core/learning';

const autoOptimizer = new AutoOptimizer({
  enabled: true,
  triggerAfterRuns: 100,
  minRunsForOptimization: 20,
  requireABTest: true,
  maxOptimizationsPerDay: 3,
  agentOptimizer: optimizer,
  abTesting,
  monitor,
  rollbackManager,
  onOptimizationComplete: (run) => {
    console.log(`Optimization ${run.id}: ${run.status}`);
  },
});

await autoOptimizer.recordExecution(trace);
```

After every N runs, the AutoOptimizer triggers compilation, creates an A/B test for the new instructions, monitors performance, and either deploys the winner or rolls back, all without manual intervention.
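The trigger policy reduces to a couple of counters. A sketch of when an optimization run would fire, assuming counters like the ones below (the real AutoOptimizer tracks this state internally):

```typescript
// Hypothetical counter state; the real implementation's fields differ.
interface TriggerState {
  runsSinceLastOptimization: number;
  optimizationsToday: number;
}

// Fire when enough runs have accumulated (triggerAfterRuns) and the
// daily budget (maxOptimizationsPerDay) has not been spent.
function shouldOptimize(
  state: TriggerState,
  triggerAfterRuns: number,
  maxOptimizationsPerDay: number
): boolean {
  return (
    state.runsSinceLastOptimization >= triggerAfterRuns &&
    state.optimizationsToday < maxOptimizationsPerDay
  );
}

const fire = shouldOptimize(
  { runsSinceLastOptimization: 100, optimizationsToday: 1 },
  100,
  3
);
```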
## Learning Stats
```typescript
const stats = await optimizer.getStats('support-agent');

console.log(`Traces: ${stats.traces.total}`);
console.log(`Demos: ${stats.demos.total}`);
console.log(`Optimization runs: ${stats.optimization.runsOptimized}`);
console.log(`Avg improvement: ${(stats.optimization.averageImprovement * 100).toFixed(1)}%`);
```

## Tree-of-Thought Reasoning
Explore multiple reasoning paths simultaneously using the ThoughtTreeExecutor, evaluate branches with configurable strategies, and select the best reasoning chain.
## Constitutional AI
Define safety principles, filter harmful inputs and outputs, guard tool calls, and apply critique-revise loops to ensure agents behave within defined boundaries.