# Context Management
Automatically manage conversation length with truncation, summarization, sliding window, and smart hybrid compression strategies.
## Overview
Long-running agents accumulate conversation history that can exceed a model's context window. The ContextManager automatically detects when messages are approaching the limit and compresses them using one of four strategies -- keeping agents running indefinitely without losing critical context.
```ts
import { Cogitator } from '@cogitator-ai/core';

const cog = new Cogitator({
  llm: { providers: { openai: { apiKey: process.env.OPENAI_API_KEY! } } },
  context: {
    enabled: true,
    strategy: 'hybrid',
    compressionThreshold: 0.8,
    outputReserve: 0.15,
    windowSize: 10,
  },
});
```

Context management is built into the runtime -- when enabled, it runs transparently before each LLM call.
## Configuration
```ts
interface ContextManagerConfig {
  enabled?: boolean;             // default: true
  strategy?: 'truncate' | 'sliding-window' | 'summarize' | 'hybrid';
  compressionThreshold?: number; // 0-1, triggers at this fraction of max tokens (default: 0.8)
  outputReserve?: number;        // fraction of context reserved for output (default: 0.15)
  summaryModel?: string;         // model used for summarization (defaults to the agent's model)
  windowSize?: number;           // messages to keep in the sliding window (default: 10)
  windowOverlap?: number;        // overlap between windows (default: 2)
}
```

The `compressionThreshold` controls when compression kicks in. At 0.8, compression triggers once token usage exceeds 80% of the available context (after reserving space for output tokens).
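As a worked example, here is the arithmetic with the defaults above and a 128k-token model. These numbers are illustrative only, not library output:

```ts
// Illustrative arithmetic mirroring the default configuration values.
const modelLimit = 128_000;       // e.g. gpt-4o's context window
const outputReserve = 0.15;       // fraction reserved for the model's output
const compressionThreshold = 0.8; // compress above 80% utilization

// Space actually available for conversation history:
const effectiveLimit = Math.round(modelLimit * (1 - outputReserve)); // 108800

// Compression kicks in once estimated usage crosses this point:
const triggerPoint = Math.round(effectiveLimit * compressionThreshold); // 87040
```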
## Token Counting
Cogitator uses a fast heuristic token counter (4 characters per token + message overhead) for real-time decisions, and knows the context window sizes for all major models:
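A rough sketch of such a heuristic counter follows. The per-message overhead constant here is an assumption for illustration, not the library's actual value:

```ts
interface Msg {
  role: string;
  content: string;
}

const CHARS_PER_TOKEN = 4;
const PER_MESSAGE_OVERHEAD = 4; // hypothetical fixed cost per message

// Estimate tokens as ceil(chars / 4) per message, plus a fixed overhead.
function estimateTokens(messages: Msg[]): number {
  return messages.reduce(
    (sum, m) =>
      sum + Math.ceil(m.content.length / CHARS_PER_TOKEN) + PER_MESSAGE_OVERHEAD,
    0
  );
}

// 'Hello, world!' is 13 chars → ceil(13 / 4) = 4 tokens, + 4 overhead = 8
const est = estimateTokens([{ role: 'user', content: 'Hello, world!' }]);
```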
```ts
import { ContextManager } from '@cogitator-ai/core';

const manager = new ContextManager({
  enabled: true,
  strategy: 'hybrid',
  compressionThreshold: 0.8,
});

const limit = manager.getModelContextLimit('openai/gpt-4o'); // 128000
const limit2 = manager.getModelContextLimit('anthropic/claude-sonnet-4-20250514'); // 200000
```

### Checking Context State
Before compressing, you can inspect the current state:
```ts
const state = manager.checkState(messages, 'openai/gpt-4o');

console.log(state.currentTokens);      // estimated tokens used
console.log(state.maxTokens);          // effective limit (minus output reserve)
console.log(state.availableTokens);    // remaining space
console.log(state.utilizationPercent); // e.g. 85.3
console.log(state.needsCompression);   // true if above threshold
```

## Strategies
### Truncate
Drops the oldest non-system messages to fit within the token budget. Fast and stateless -- no LLM call required.
```ts
const cog = new Cogitator({
  context: { strategy: 'truncate', compressionThreshold: 0.8 },
  // ...
});
```

System messages are always preserved. History is trimmed from the beginning, keeping the most recent exchanges intact. Best for scenarios where older context is rarely needed.
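A minimal sketch of this kind of truncation pass -- illustrative only, not the library's internal implementation:

```ts
interface Msg {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Crude size estimate: 1 token ≈ 4 characters.
const estimate = (msgs: Msg[]) =>
  msgs.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);

// Keep system messages; drop the oldest non-system messages until we fit.
function truncate(messages: Msg[], budget: number): Msg[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  while (rest.length > 0 && estimate([...system, ...rest]) > budget) {
    rest.shift(); // drop from the front (oldest first)
  }
  return [...system, ...rest];
}

const out = truncate(
  [
    { role: 'system', content: 'be brief' }, // 2 tokens, always kept
    { role: 'user', content: 'aaaa' },       // 1 token, oldest → dropped
    { role: 'assistant', content: 'bbbb' },  // 1 token
    { role: 'user', content: 'cccc' },       // 1 token
  ],
  4
);
```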
### Sliding Window
Keeps a fixed window of recent messages and optionally summarizes older ones. Provides a good balance between context preservation and token efficiency.
```ts
const cog = new Cogitator({
  context: {
    strategy: 'sliding-window',
    windowSize: 15,
    summaryModel: 'openai/gpt-4o-mini',
  },
  // ...
});
```

When an LLM backend is available, older messages outside the window are summarized into a single system message. Without an LLM, a basic extractive summary is generated from the most recent user and assistant messages.
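One way to cut such a window can be sketched as follows. This assumes `windowOverlap` extends the kept region backwards past the window boundary; the library's exact slicing may differ:

```ts
// Split messages into a summarization candidate region and a kept window.
function splitWindow<T>(messages: T[], windowSize: number, overlap: number) {
  const keepFrom = Math.max(0, messages.length - windowSize - overlap);
  return {
    toSummarize: messages.slice(0, keepFrom), // candidates for the summary
    window: messages.slice(keepFrom),         // kept verbatim
  };
}

// 20 messages, windowSize 10, overlap 2 → first 8 summarized, last 12 kept.
const { toSummarize, window: kept } = splitWindow([...Array(20).keys()], 10, 2);
```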
### Summarize
Uses an LLM to generate a comprehensive summary of older conversation history, preserving key facts, decisions, and pending tasks.
```ts
const cog = new Cogitator({
  context: {
    strategy: 'summarize',
    summaryModel: 'openai/gpt-4o-mini',
    windowSize: 5,
  },
  // ...
});
```

The summarizer keeps the most recent 20% of messages (minimum 2) as-is and compresses everything else. The summary prompt instructs the LLM to preserve user preferences, key decisions, and critical context. It falls back to extractive summarization if the LLM call fails.
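The split point can be sketched like this. Only the 20%/minimum-2 rule comes from the behavior described above; the rounding choice is an assumption:

```ts
// Keep the most recent 20% of messages (at least 2); summarize the rest.
function splitForSummary<T>(messages: T[]): { toSummarize: T[]; keep: T[] } {
  const keepCount = Math.max(2, Math.round(messages.length * 0.2));
  return {
    toSummarize: messages.slice(0, messages.length - keepCount),
    keep: messages.slice(messages.length - keepCount),
  };
}

// 30 messages → keep the last 6 as-is, summarize the first 24.
const { toSummarize, keep } = splitForSummary(
  Array.from({ length: 30 }, (_, i) => i)
);
```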
### Hybrid (Default)
Automatically selects the best strategy based on how far over the limit the conversation is:
| Utilization | Strategy | Reasoning |
|---|---|---|
| Up to 1.5x | Sliding Window | Mild overflow, window is sufficient |
| 1.5x+ (with LLM) | Summarize | Heavy overflow, need intelligent compression |
| 1.5x-2x (no LLM) | Sliding Window | Best effort without LLM |
| 2x+ | Truncate | Extreme overflow, aggressive trimming needed |
```ts
const cog = new Cogitator({
  context: { strategy: 'hybrid' },
  // ...
});
```

This is the recommended default -- it adapts to the situation without manual tuning.
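The decision table above can be sketched as a small selector. This is a hypothetical helper for illustration -- the real picker lives inside ContextManager:

```ts
type Strategy = 'sliding-window' | 'summarize' | 'truncate';

// Pick a strategy based on how far over the effective limit we are.
function pickStrategy(
  currentTokens: number,
  maxTokens: number,
  hasLLM: boolean
): Strategy {
  const ratio = currentTokens / maxTokens;
  if (ratio >= 2) return 'truncate'; // extreme overflow: trim aggressively
  if (ratio >= 1.5) return hasLLM ? 'summarize' : 'sliding-window';
  return 'sliding-window'; // mild overflow: the window is sufficient
}
```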
## Compression Results
Every compression call returns detailed metrics:
```ts
const result = await manager.compress(messages, 'openai/gpt-4o');

console.log(result.messages);         // compressed message array
console.log(result.originalTokens);   // tokens before compression
console.log(result.compressedTokens); // tokens after compression
console.log(result.strategy);         // which strategy was used
console.log(result.truncated);        // messages dropped (truncate)
console.log(result.summarized);       // messages summarized (summarize/sliding-window)
```

## Standalone Usage
You can use ContextManager outside of the Cogitator runtime for custom pipelines:
```ts
import { ContextManager } from '@cogitator-ai/core';

const manager = new ContextManager(
  { enabled: true, strategy: 'sliding-window', windowSize: 10 },
  { getBackend: (model) => myBackendRegistry.get(model) }
);

if (manager.shouldCompress(messages, 'openai/gpt-4o')) {
  const { messages: compressed } = await manager.compress(messages, 'openai/gpt-4o');
  messages = compressed;
}
```

The second constructor argument provides a `getBackend` function so the manager can access an LLM for summarization strategies.
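A minimal registry backing that hook might look like this. The shape is hypothetical -- in real use the stored values would be LLM backend instances whose type comes from `@cogitator-ai/core`:

```ts
// Placeholder registry keyed by model id; values stand in for backends.
const myBackendRegistry = new Map<string, unknown>();
myBackendRegistry.set('openai/gpt-4o-mini', { name: 'stub-backend' });

const hooks = {
  // Return the backend registered for a model id, or undefined if none is.
  getBackend: (model: string) => myBackendRegistry.get(model),
};
```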