# Tool Caching

Cache tool results with TTL, LRU eviction, and semantic similarity matching, using in-memory or Redis storage.

## Overview

The `withCache()` function wraps any tool with a caching layer. Repeated calls with the same (or semantically similar) parameters return cached results instantly, saving API calls, database queries, and compute time.
```typescript
import { withCache } from '@cogitator-ai/core';

const cachedSearch = withCache(webSearch, {
  strategy: 'exact',
  ttl: '10m',
  maxSize: 500,
  storage: 'memory',
});

const agent = new Agent({
  name: 'assistant',
  model: 'openai/gpt-4o',
  tools: [cachedSearch],
});
```

The cached tool is a drop-in replacement -- it has the same name, description, and parameters as the original, plus a `.cache` object for management.
## withCache() API

```typescript
function withCache<TParams, TResult>(
  tool: Tool<TParams, TResult>,
  config: WithCacheOptions
): CachedTool<TParams, TResult>;
```

### Configuration
```typescript
interface WithCacheOptions {
  strategy: 'exact' | 'semantic';
  ttl: DurationString;
  maxSize: number;
  storage: 'memory' | 'redis';
  similarity?: number;
  keyPrefix?: string;
  embeddingService?: EmbeddingService;
  redisClient?: RedisClientLike;
  onHit?: (key: string, params: unknown) => void;
  onMiss?: (key: string, params: unknown) => void;
  onEvict?: (key: string) => void;
}
```

| Field | Type | Default | Description |
|---|---|---|---|
| `strategy` | `'exact' \| 'semantic'` | required | Cache matching strategy |
| `ttl` | `DurationString` | required | Time-to-live: `"30s"`, `"5m"`, `"2h"`, `"1d"` |
| `maxSize` | `number` | required | Maximum number of cached entries |
| `storage` | `'memory' \| 'redis'` | required | Storage backend |
| `similarity` | `number` | `0.95` | Cosine similarity threshold for semantic matching |
| `keyPrefix` | `string` | `"toolcache"` | Key prefix for namespacing |
| `embeddingService` | `EmbeddingService` | -- | Required for the `semantic` strategy |
| `redisClient` | `RedisClientLike` | -- | Required for `redis` storage |
| `onHit` | `function` | -- | Callback on cache hit |
| `onMiss` | `function` | -- | Callback on cache miss |
| `onEvict` | `function` | -- | Callback on entry eviction |
### Duration Strings

TTL values use human-readable duration strings:

| Suffix | Meaning | Example |
|---|---|---|
| `ms` | Milliseconds | `"500ms"` |
| `s` | Seconds | `"30s"` |
| `m` | Minutes | `"10m"` |
| `h` | Hours | `"2h"` |
| `d` | Days | `"1d"` |
| `w` | Weeks | `"1w"` |
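A parser consistent with the table above might look like the following sketch. `parseDuration` and `UNIT_MS` are illustrative names, not part of the library's exported API:

```typescript
// Illustrative duration-string parser matching the suffix table above.
// Not the library's actual implementation.
const UNIT_MS: Record<string, number> = {
  ms: 1,
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
  w: 604_800_000,
};

function parseDuration(ttl: string): number {
  // Anchored regex: an integer count followed by exactly one known suffix.
  const match = /^(\d+)(ms|s|m|h|d|w)$/.exec(ttl);
  if (!match) throw new Error(`Invalid duration string: ${ttl}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

parseDuration('10m'); // 600000 (milliseconds)
```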
## Cache Strategies

### Exact Matching

The exact strategy generates a deterministic cache key from the tool name and the serialized parameters; two calls with identical parameters hit the same cache entry.
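Deterministic key generation in this spirit can be sketched as follows -- a hedged illustration only; `cacheKey` and its key-sorting canonicalization are assumptions, not the library's actual scheme:

```typescript
import { createHash } from 'node:crypto';

// Illustrative key derivation: hash the tool name plus canonical JSON of the
// parameters. Object keys are sorted so property order does not change the
// key. (Sketch only -- the library's real scheme may differ.)
function cacheKey(toolName: string, params: unknown, prefix = 'toolcache'): string {
  const canonical = JSON.stringify(params, (_key, value) =>
    value !== null && typeof value === 'object' && !Array.isArray(value)
      ? Object.fromEntries(
          Object.entries(value).sort(([a], [b]) => a.localeCompare(b))
        )
      : value
  );
  const digest = createHash('sha256').update(canonical).digest('hex');
  return `${prefix}:${toolName}:${digest}`;
}
```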
```typescript
const cachedHash = withCache(hash, {
  strategy: 'exact',
  ttl: '1h',
  maxSize: 1000,
  storage: 'memory',
});
```

### Semantic Matching
The semantic strategy uses embedding vectors to find cached results for semantically similar inputs. It requires an `embeddingService` that converts parameter strings to vectors.

```typescript
const cachedSearch = withCache(webSearch, {
  strategy: 'semantic',
  ttl: '30m',
  maxSize: 200,
  storage: 'memory',
  similarity: 0.92,
  embeddingService: myEmbeddingService,
});
```

With semantic matching, a query like "TypeScript best practices" might match a cached result for "best practices for TypeScript development" if their cosine similarity exceeds the threshold.
The lookup flow:

1. Check for an exact key match first.
2. If there is no exact match, embed the current parameters and search for cached entries whose similarity exceeds the `similarity` threshold.
3. If a similar entry is found, return its cached result.
4. Otherwise, execute the tool and cache the result along with its embedding.
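The flow above can be sketched as follows. The entry shape, function names, and the linear scan over stored embeddings are assumptions for illustration, not the library's internals:

```typescript
// Illustrative sketch of the semantic lookup flow.
interface CachedEntry {
  embedding: number[];
  result: unknown;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function semanticLookup(
  store: Map<string, CachedEntry>,
  exactKey: string,
  embed: (text: string) => Promise<number[]>,
  paramsText: string,
  threshold = 0.95
): Promise<{ hit: boolean; result?: unknown }> {
  // 1. Exact key match first.
  const exact = store.get(exactKey);
  if (exact) return { hit: true, result: exact.result };

  // 2. No exact match: embed the parameters and scan for a similar entry.
  const query = await embed(paramsText);
  for (const entry of store.values()) {
    if (cosine(query, entry.embedding) >= threshold) {
      return { hit: true, result: entry.result };
    }
  }

  // 3. Miss: the caller executes the tool and caches result + embedding.
  return { hit: false };
}
```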
## Storage Backends

### In-Memory (LRU)

Entries are stored in a `Map` with LRU eviction -- when `maxSize` is reached, the least recently accessed entry is evicted.
```typescript
const cached = withCache(sqlQuery, {
  strategy: 'exact',
  ttl: '5m',
  maxSize: 500,
  storage: 'memory',
});
```

In-memory storage is fast and requires no external dependencies, but entries are lost when the process restarts.
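The LRU behavior described above can be sketched with a `Map`, which preserves insertion order; re-inserting a key on access moves it to the "most recent" end. This is an illustrative sketch, not the library's implementation:

```typescript
// Minimal LRU cache sketch. The oldest entry is the first key in the Map's
// iteration order, so eviction pops from the front.
class LruCache<V> {
  private entries = new Map<string, V>();
  constructor(private maxSize: number) {}

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key)!;
    // Re-insert to mark this key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxSize) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }
}
```

Refreshing recency on `get` is what makes eviction least-recently-*used* rather than least-recently-*written*.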
### Redis
For persistent, shared caching across multiple processes or server instances. Uses Redis sorted sets for LRU tracking and key-value storage for entries.
```typescript
import { createClient } from 'redis';

const redis = createClient({ url: 'redis://localhost:6379' });
await redis.connect();

const cached = withCache(webScrape, {
  strategy: 'exact',
  ttl: '1h',
  maxSize: 5000,
  storage: 'redis',
  redisClient: redis,
  keyPrefix: 'myapp:tools',
});
```

The `RedisClientLike` interface is compatible with the standard `redis` npm package. It requires these methods:
```typescript
interface RedisClientLike {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<string>;
  setex(key: string, seconds: number, value: string): Promise<string>;
  del(...keys: string[]): Promise<number>;
  keys(pattern: string): Promise<string[]>;
  mget(...keys: string[]): Promise<(string | null)[]>;
  zadd(key: string, score: number, member: string): Promise<number>;
  zrange(key: string, start: number, stop: number): Promise<string[]>;
  zrem(key: string, ...members: string[]): Promise<number>;
}
```

You can also create storage instances directly:

```typescript
import { createToolCacheStorage } from '@cogitator-ai/core';

const memoryStorage = createToolCacheStorage('memory', { maxSize: 1000 });

const redisStorage = createToolCacheStorage('redis', {
  redisClient: redis,
  keyPrefix: 'cache:',
  maxSize: 5000,
});
```

## Cache Management
Every cached tool exposes a `.cache` object for runtime management:

### Stats
```typescript
const stats = cachedSearch.cache.stats();
// {
//   hits: 142,
//   misses: 38,
//   size: 38,
//   evictions: 12,
//   hitRate: 0.789,
// }
```

### Clear

```typescript
await cachedSearch.cache.clear();
```

### Invalidate
Remove a specific entry by its parameters:

```typescript
const existed = await cachedSearch.cache.invalidate({
  query: 'TypeScript tutorials',
  maxResults: 5,
});
```

### Warmup
Pre-populate the cache with known parameter-result pairs:

```typescript
await cachedSearch.cache.warmup([
  {
    params: { query: 'React hooks guide', maxResults: 5 },
    result: { query: 'React hooks guide', provider: 'tavily', results: [...] },
  },
  {
    params: { query: 'Node.js streams', maxResults: 5 },
    result: { query: 'Node.js streams', provider: 'tavily', results: [...] },
  },
]);
```

## When to Cache
Cache tools that:
- Call external APIs (web search, scraping) -- save quota and reduce latency
- Run expensive queries (SQL, vector search) -- avoid duplicate database load
- Perform deterministic computation (hashing, math) -- same input always gives same output
- Have rate limits -- cache prevents hitting API rate limits on repeated queries
Avoid caching tools that:
- Have side effects (`file_write`, `send_email`, `exec`) -- caching would silently skip the action
- Must return real-time data (`datetime`, `random_number`) -- cached values would be stale
- Depend on external state that changes frequently between calls
## Observability Callbacks

Use the `onHit`, `onMiss`, and `onEvict` callbacks to integrate with your monitoring:
```typescript
const cached = withCache(webSearch, {
  strategy: 'exact',
  ttl: '10m',
  maxSize: 500,
  storage: 'memory',
  onHit: (key, params) => {
    metrics.increment('tool_cache.hit', { tool: 'web_search' });
  },
  onMiss: (key, params) => {
    metrics.increment('tool_cache.miss', { tool: 'web_search' });
  },
  onEvict: (key) => {
    metrics.increment('tool_cache.evict', { tool: 'web_search' });
  },
});
```