# Reranking
Improve retrieval precision by re-scoring results with an LLM or Cohere's reranking API.
## Why Rerank?
Vector similarity is a good first pass, but it can miss nuance. A document might be semantically close to the query but not actually answer the question. Reranking takes the top retrieval results and re-scores them with a more powerful model that reads both the query and each document, producing a more accurate ranking.
Rerankers implement the `Reranker` interface:

```ts
interface Reranker {
  rerank(
    query: string,
    results: RetrievalResult[],
    topN?: number
  ): Promise<RetrievalResult[]>;
}
```

## LLM Reranker
Uses any LLM to score each document's relevance to the query on a 0–10 scale. You provide a `generateFn` that takes a prompt string and returns the LLM's text response.
```ts
import { LLMReranker } from '@cogitator-ai/rag';

const reranker = new LLMReranker({
  generateFn: async (prompt) => {
    const response = await agent.run(prompt);
    return response.text;
  },
});

const reranked = await reranker.rerank(
  'How do agents handle tool errors?',
  retrievalResults,
  5
);
```

The reranker builds a prompt asking the LLM to score each document and return a JSON array of `{ index, score }` objects. Scores are normalized to 0–1 in the output. If the LLM response can't be parsed, the original order is preserved as a fallback.
| Option | Type | Description |
|---|---|---|
| `generateFn` | `(prompt: string) => Promise<string>` | LLM generation function |
When to use: When you already have an LLM backend in your stack and want better precision without an external reranking service. Works with any model — local Ollama, OpenAI, Anthropic.
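The score-parsing fallback described above can be sketched as follows. This is a simplified illustration, not the library's internal code: `parseScores` is a hypothetical helper, and the exact validation the library performs may differ.

```typescript
interface ScoredIndex {
  index: number;
  score: number;
}

// Parse the LLM's response into { index, score } pairs.
// Scores on the 0-10 scale are normalized to 0-1. If the response is not
// valid JSON, return null so the caller can keep the original order.
function parseScores(llmResponse: string, count: number): ScoredIndex[] | null {
  try {
    const parsed = JSON.parse(llmResponse) as ScoredIndex[];
    if (!Array.isArray(parsed)) return null;
    return parsed
      // Drop entries pointing at documents that don't exist.
      .filter((s) => Number.isInteger(s.index) && s.index >= 0 && s.index < count)
      // Clamp to [0, 10] before normalizing, in case the model overshoots.
      .map((s) => ({ index: s.index, score: Math.min(Math.max(s.score / 10, 0), 1) }));
  } catch {
    return null; // unparsable response: caller preserves retrieval order
  }
}
```

A `null` return (rather than an empty array) lets the caller distinguish "model gave no usable scores, keep retrieval order" from "model scored everything low".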
## Cohere Reranker
Uses the Cohere Rerank API for high-quality cross-encoder reranking. No LLM prompt engineering needed — Cohere's model is trained specifically for relevance scoring.
```ts
import { CohereReranker } from '@cogitator-ai/rag';

const reranker = new CohereReranker({
  apiKey: process.env.COHERE_API_KEY!,
  model: 'rerank-v3.5',
});

const reranked = await reranker.rerank(
  'How do agents handle tool errors?',
  retrievalResults,
  5
);
```

| Option | Type | Default | Description |
|---|---|---|---|
| `apiKey` | `string` | — | Cohere API key |
| `model` | `string` | `rerank-v3.5` | Cohere rerank model name |
When to use: Production deployments where retrieval quality is critical. Cohere's reranking models are fast, accurate, and purpose-built for this task.
## Using with RAGPipeline
Pass a reranker to the builder and enable it in the config:
```ts
import { RAGPipelineBuilder, TextLoader, CohereReranker } from '@cogitator-ai/rag';

const pipeline = new RAGPipelineBuilder()
  .withLoader(new TextLoader())
  .withEmbeddingService(embeddings)
  .withEmbeddingAdapter(store)
  .withReranker(new CohereReranker({
    apiKey: process.env.COHERE_API_KEY!,
  }))
  .withConfig({
    chunking: { strategy: 'recursive', chunkSize: 512, chunkOverlap: 50 },
    reranking: { enabled: true, topN: 5 },
  })
  .build();

const results = await pipeline.query('What is the agent lifecycle?');
```

The pipeline first retrieves candidates using the configured retriever, then reranks them if `reranking.enabled` is true. The `topN` parameter controls how many results survive reranking.
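The retrieve-then-rerank flow can be sketched as a standalone function. The simplified types, the `topK` candidate count, and its default of 20 are assumptions for illustration; the real pipeline wires the configured retriever and reranker in internally.

```typescript
// Simplified shapes, not the library's actual types.
interface RetrievalResult {
  content: string;
  score: number;
}

type Retriever = (query: string, topK: number) => Promise<RetrievalResult[]>;
type RerankFn = (
  query: string,
  results: RetrievalResult[],
  topN?: number
) => Promise<RetrievalResult[]>;

async function queryWithReranking(
  query: string,
  retrieve: Retriever,
  rerank: RerankFn,
  config: { reranking: { enabled: boolean; topN: number }; topK?: number }
): Promise<RetrievalResult[]> {
  // First pass: cheap vector retrieval over a wider candidate set.
  const candidates = await retrieve(query, config.topK ?? 20);
  // Second pass: re-score with the reranker only when enabled;
  // topN controls how many results survive.
  if (!config.reranking.enabled) return candidates;
  return rerank(query, candidates, config.reranking.topN);
}
```

Retrieving more candidates than you ultimately keep (e.g. 20 in, 5 out) is the usual pattern: the wide first pass gives the reranker enough material to recover relevant documents that vector similarity ranked too low.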
## Comparison
| Reranker | Latency | Quality | Cost | Dependencies |
|---|---|---|---|---|
| LLM | Variable | Good | Per-token LLM cost | Any LLM backend |
| Cohere | ~100ms | Excellent | Per-request | Cohere API key |
## Custom Rerankers
Implement the `Reranker` interface:

```ts
import type { Reranker, RetrievalResult } from '@cogitator-ai/types';

class CrossEncoderReranker implements Reranker {
  async rerank(
    query: string,
    results: RetrievalResult[],
    topN?: number
  ): Promise<RetrievalResult[]> {
    // Score every result against the query in parallel.
    const scored = await Promise.all(
      results.map(async (result) => {
        const score = await this.crossEncode(query, result.content);
        return { ...result, score };
      })
    );
    // Highest score first; keep only the top N if requested.
    scored.sort((a, b) => b.score - a.score);
    return topN ? scored.slice(0, topN) : scored;
  }

  private async crossEncode(query: string, document: string): Promise<number> {
    // call your cross-encoder model
    return 0;
  }
}
```

## Retrieval Strategies
Retrieve relevant document chunks using similarity search, MMR diversity, hybrid vector+keyword search, or multi-query expansion.
## Evaluation Framework
Systematically evaluate LLM agents with @cogitator-ai/evals — dataset-driven testing, deterministic and LLM-as-judge metrics, assertions, A/B comparison, and CI-ready reporters.