You’ll use this when…
- Queries are nuanced and require semantic understanding beyond vector distance.
- Large memory collections produce too many near matches to review manually.
- You want consistent scoring across providers by delegating ranking to a dedicated model.
Reranking raises latency and, for hosted models, API spend. Benchmark with production traffic and define a fallback path for latency-sensitive requests.
All configuration snippets translate directly to the TypeScript SDK—swap dictionaries for objects while keeping the same keys (provider, config, rerank flags).
Feature anatomy
- Initial vector search: Retrieve candidate memories by similarity.
- Reranker pass: A specialized model scores each candidate against the original query.
- Reordered results: Mem0 sorts responses using the reranker’s scores before returning them.
- Optional fallbacks: Toggle reranking per request or disable it entirely if performance or cost becomes a concern.
Supported providers
- Cohere – Multilingual hosted reranker with API-based scoring.
- Sentence Transformer – Local Hugging Face cross-encoders for GPU or CPU.
- Hugging Face – Bring any hosted or on-prem reranker model ID.
- LLM Reranker – Use your preferred LLM (OpenAI, etc.) for prompt-driven scoring.
- Zero Entropy – High-quality neural reranking tuned for retrieval tasks.
Provider comparison
| Provider | Latency | Quality | Cost | Local deploy |
|---|---|---|---|---|
| Cohere | Medium | High | API cost | ❌ |
| Sentence Transformer | Low | Good | Free | ✅ |
| Hugging Face | Low–Medium | Variable | Free | ✅ |
| LLM Reranker | High | Very high | API cost | Depends |
Configure it
Basic setup
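A minimal Python sketch of the basic setup. The shape of the `reranker` block, its key names, and the Cohere model id are assumptions modeled on Mem0's usual provider/config pattern; check your SDK version for the exact keys.

```python
import os

from mem0 import Memory

# Reranker block assumed to follow Mem0's provider/config pattern.
config = {
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",           # assumed model id
            "api_key": os.environ["COHERE_API_KEY"],  # keep keys in env vars
            "top_k": 10,                              # candidate pool to rerank
        },
    },
}

m = Memory.from_config(config)
results = m.search("What does the user prefer for breakfast?", user_id="alice")
print(results["results"][0]["score"])  # reranker score, if the pass ran
```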
Confirm results["results"][0]["score"] reflects the reranker output—if the field is missing, the reranker was not applied.
Set top_k to the smallest candidate pool that still captures relevant hits. Smaller pools keep reranking costs down.
Provider-specific options
Keep authentication keys in environment variables when you plug these configs into production projects.
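Two illustrative provider configs, one hosted and one local. The key names are assumed to mirror the basic setup; the cross-encoder model id is a real Hugging Face checkpoint, while the Cohere model id is an assumption to verify against your account.

```python
import os

# Hosted Cohere reranker: multilingual, API-based scoring.
cohere_reranker = {
    "provider": "cohere",
    "config": {
        "model": "rerank-english-v3.0",              # assumed model id
        "api_key": os.environ.get("COHERE_API_KEY"),  # never hard-code keys
    },
}

# Local cross-encoder via Sentence Transformers: free, runs on CPU or GPU.
local_reranker = {
    "provider": "sentence_transformer",
    "config": {
        "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
        "device": "cpu",  # or "cuda"
    },
}
```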
Full stack example
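An end-to-end sketch combining an LLM, an embedder, and a local reranker. Exact config keys and model ids may differ across Mem0 versions; treat this as a starting template rather than a canonical setup.

```python
from mem0 import Memory

# Assumed config shape: llm + embedder + reranker blocks side by side.
config = {
    "llm": {"provider": "openai", "config": {"model": "gpt-4o-mini"}},
    "embedder": {"provider": "openai", "config": {"model": "text-embedding-3-small"}},
    "reranker": {
        "provider": "sentence_transformer",
        "config": {"model": "cross-encoder/ms-marco-MiniLM-L-6-v2"},
    },
}

m = Memory.from_config(config)
m.add("I prefer oat milk in my coffee", user_id="alice")
results = m.search("What milk does the user like?", user_id="alice")
for r in results["results"]:
    print(r["memory"], r.get("score"))  # memory text plus its score
```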
A quick search should now return results with both vector and reranker scores, letting you compare improvements immediately.
Async support
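A sketch of the async variant. The `AsyncMemory` import and its call shapes are assumed to mirror the synchronous `Memory` client; verify the names against your SDK version.

```python
import asyncio

from mem0 import AsyncMemory  # async counterpart to Memory; import path assumed

config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {"model": "cross-encoder/ms-marco-MiniLM-L-6-v2"},
    },
}

async def main() -> None:
    m = AsyncMemory.from_config(config)  # call shape assumed to mirror Memory
    results = await m.search("What milk does the user like?", user_id="alice")
    print(results["results"][0].get("score"))  # should match the sync client

asyncio.run(main())
```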
Inspect the async response to confirm reranking still applies; the scores should match the synchronous implementation.
Tune performance and cost
Use heuristics (query length, user tier) to decide when to rerank so high-signal queries benefit without taxing every request.
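One way to encode such a heuristic. The word-count threshold and tier names are illustrative, and the commented usage assumes the per-request rerank flag described above.

```python
def should_rerank(query: str, user_tier: str, min_words: int = 4) -> bool:
    """Rerank only longer queries from paying tiers (illustrative policy)."""
    high_signal = len(query.split()) >= min_words
    paid_tier = user_tier in {"pro", "enterprise"}
    return high_signal and paid_tier

# Usage (assuming the per-request flag):
# results = m.search(query, user_id=uid, rerank=should_rerank(query, tier))
```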
Handle failures gracefully
Always fall back to vector-only search—dropped queries introduce bigger accuracy issues than slightly less relevant ordering.
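A minimal fallback wrapper sketch. It takes your client's search callable (e.g. `m.search`) so it stays provider-agnostic; the `rerank` keyword assumes the per-request flag described above.

```python
def search_with_fallback(search_fn, query: str, **kwargs):
    """Try a reranked search; on any reranker failure, retry vector-only."""
    try:
        return search_fn(query, rerank=True, **kwargs)
    except Exception:
        # A dropped query hurts more than slightly worse ordering.
        return search_fn(query, rerank=False, **kwargs)
```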
Migrate from v0.x
See it in action
Basic reranked search
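A basic search sketch against a reranker-enabled client. The config keys follow the assumed shape from the setup section; any supported provider works here.

```python
from mem0 import Memory

# Assumed reranker config shape, as in the setup section.
config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {"model": "cross-encoder/ms-marco-MiniLM-L-6-v2"},
    },
}
m = Memory.from_config(config)

results = m.search("favorite breakfast foods", user_id="alice")
for r in results["results"]:
    print(f"{r['memory']}: {r.get('score')}")
```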
Expect each result to list the reranker-adjusted score so you can compare ordering against baseline vector results.
Toggle reranking per request
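A sketch of toggling the pass per request. The `rerank` keyword assumes the per-request flag this guide describes; the default when the flag is omitted may vary by version.

```python
from mem0 import Memory

config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {"model": "cross-encoder/ms-marco-MiniLM-L-6-v2"},
    },
}
m = Memory.from_config(config)

query = "favorite breakfast foods"
reranked = m.search(query, user_id="alice", rerank=True)   # reranker pass
baseline = m.search(query, user_id="alice", rerank=False)  # vector order only
```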
Log the reranked vs. non-reranked lists during rollout so stakeholders can see the improvement before enforcing it everywhere.
You should see the same memories in both lists, but the reranked response will reorder them based on semantic relevance.
Combine with metadata filters
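A sketch combining filters with reranking. The filter field names are hypothetical; the point is that filters constrain the candidate set before the reranker reorders it.

```python
from mem0 import Memory

# Any reranker provider works here; config shape assumed as in the setup section.
config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {"model": "cross-encoder/ms-marco-MiniLM-L-6-v2"},
    },
}
m = Memory.from_config(config)

results = m.search(
    "notes about project deadlines",
    user_id="alice",
    filters={"category": "work"},  # filters are applied before reranking
    rerank=True,                   # reranking reorders only the filtered set
)
```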
Verify filtered reranked searches still respect every metadata clause—reranking only reorders candidates, it never bypasses filters.
Real-world playbooks
Customer support
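A support-desk sketch. The user id and filter fields are hypothetical placeholders; only the reranker config and search call shape carry over from the sections above.

```python
from mem0 import Memory

config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {"model": "cross-encoder/ms-marco-MiniLM-L-6-v2"},
    },
}
m = Memory.from_config(config)

results = m.search(
    "user cannot log in after password reset",
    user_id="support_agent_1",                              # hypothetical id
    filters={"type": "ticket", "category": "authentication"},  # hypothetical fields
    rerank=True,
)
```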
Top results should highlight tickets matching the login issue context so agents can respond faster.
Content recommendation
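A recommendation sketch following the same pattern; the filter fields are again hypothetical.

```python
from mem0 import Memory

config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {"model": "cross-encoder/ms-marco-MiniLM-L-6-v2"},
    },
}
m = Memory.from_config(config)

results = m.search(
    "science fiction books the user enjoyed",
    user_id="reader_42",                             # hypothetical id
    filters={"type": "book", "status": "finished"},  # hypothetical fields
    rerank=True,
)
```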
Expect high-scoring recommendations that match both the requested theme and any metadata limits you applied.
Personal assistant
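An assistant sketch; swap the query and the (hypothetical) filter values for other lifestyle use cases without touching the reranker config.

```python
from mem0 import Memory

config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {"model": "cross-encoder/ms-marco-MiniLM-L-6-v2"},
    },
}
m = Memory.from_config(config)

results = m.search(
    "dietary restrictions to respect when booking dinner",
    user_id="alice",
    filters={"category": "lifestyle"},  # hypothetical field
    rerank=True,
)
```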
Reuse this pattern for other lifestyle queries—swap the filters and prompt text without changing the rerank configuration.
Each workflow keeps the same m.search(...) signature, so you can template these queries across agents with only the prompt and filters changing.
Verify the feature is working
- Inspect result payloads for both the vector score and the reranker score; mismatched fields indicate the reranker didn’t execute.
- Track latency before and after enabling reranking to ensure SLAs hold.
- Review provider logs or dashboards for throttling or quota warnings.
- Run A/B comparisons (rerank on/off) to validate improved relevance before defaulting to reranked responses.
Best practices
- Start local: Try Sentence Transformer models to prove value before paying for hosted APIs.
- Monitor latency: Add metrics around reranker duration so you notice regressions quickly.
- Control spend: Use top_k and selective toggles to cap hosted reranker costs.
- Keep a fallback: Always catch reranker failures and continue with vector-only ordering.
- Experiment often: Swap providers or models to find the best fit for your domain and language mix.