Reranker-enhanced search adds a second scoring pass after vector retrieval so Mem0 can return the most relevant memories first. Enable it when vector similarity alone misses nuance or when you need the highest-confidence context for an agent decision.
You’ll use this when…
  • Queries are nuanced and require semantic understanding beyond vector distance.
  • Large memory collections produce too many near matches to review manually.
  • You want consistent scoring across providers by delegating ranking to a dedicated model.
Reranking raises latency and, for hosted models, API spend. Benchmark with production traffic and define a fallback path for latency-sensitive requests.
All configuration snippets translate directly to the TypeScript SDK—swap dictionaries for objects while keeping the same keys (provider, config, rerank flags).

Feature anatomy

  • Initial vector search: Retrieve candidate memories by similarity.
  • Reranker pass: A specialized model scores each candidate against the original query (a standalone sketch follows this list).
  • Reordered results: Mem0 sorts responses using the reranker’s scores before returning them.
  • Optional fallbacks: Toggle reranking per request or disable it entirely if performance or cost becomes a concern.
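Conceptually, the reranker pass is independent of how candidates are retrieved. Below is a standalone sketch of cross-encoder scoring using the sentence-transformers library directly; Mem0 wires the equivalent step up for you once a reranker is configured.

# Standalone illustration of the rerank pass (Mem0 runs this step internally)
from sentence_transformers import CrossEncoder

query = "What are my food preferences?"
candidates = [
    "User is vegetarian",
    "User visited Paris in 2023",
    "User dislikes spicy food",
]

# A cross-encoder scores each (query, candidate) pair jointly
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = model.predict([(query, doc) for doc in candidates])

# Reorder candidates by reranker score, highest first
for doc, score in sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True):
    print(f"{score:.3f}  {doc}")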
Supported providers

  • Cohere – Multilingual hosted reranker with API-based scoring.
  • Sentence Transformer – Local Hugging Face cross-encoders for GPU or CPU.
  • Hugging Face – Bring any hosted or on-prem reranker model ID.
  • LLM Reranker – Use your preferred LLM (OpenAI, etc.) for prompt-driven scoring.
  • Zero Entropy – High-quality neural reranking tuned for retrieval tasks.
Provider | Latency | Quality | Cost | Local deploy
--- | --- | --- | --- | ---
Cohere | Medium | High | API cost | No
Sentence Transformer | Low | Good | Free | Yes
Hugging Face | Low–Medium | Variable | Free | Yes
LLM Reranker | High | Very high | API cost | Depends

Configure it

Basic setup

from mem0 import Memory

config = {
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key"
        }
    }
}

m = Memory.from_config(config)
Confirm results["results"][0]["score"] reflects the reranker output—if the field is missing, the reranker was not applied.
Set top_k to the smallest candidate pool that still captures relevant hits. Smaller pools keep reranking costs down.
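To confirm the pipeline end to end, run a search and inspect the returned score (a quick check, assuming you have already added memories for this user):

# Quick check that the reranker actually ran
results = m.search("What are my preferences?", user_id="alice", rerank=True)

top = results["results"][0]
print(top["memory"], top["score"])  # score here should be the reranker's output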

Provider-specific options

# Cohere reranker
config = {
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key",
            "top_k": 10,
            "return_documents": True
        }
    }
}

# Sentence Transformer reranker
config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cuda",
            "max_length": 512
        }
    }
}

# Hugging Face reranker
config = {
    "reranker": {
        "provider": "huggingface",
        "config": {
            "model": "BAAI/bge-reranker-base",
            "device": "cuda",
            "batch_size": 32
        }
    }
}

# LLM-based reranker
config = {
    "reranker": {
        "provider": "llm_reranker",
        "config": {
            "llm": {
                "provider": "openai",
                "config": {
                    "model": "gpt-4",
                    "api_key": "your-openai-api-key"
                }
            },
            "top_k": 5
        }
    }
}
Keep authentication keys in environment variables when you plug these configs into production projects.
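For example, read the Cohere key with os.getenv instead of hard-coding it (standard-library approach; COHERE_API_KEY is a conventional variable name, set it in your deployment environment):

import os

# Pull the key from the environment instead of committing it to source control
config = {
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": os.getenv("COHERE_API_KEY"),
            "top_k": 10
        }
    }
}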

Full stack example

config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "host": "localhost",
            "port": 6333
        }
    },
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4",
            "api_key": "your-openai-api-key"
        }
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small",
            "api_key": "your-openai-api-key"
        }
    },
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key",
            "top_k": 15,
            "return_documents": True
        }
    }
}

m = Memory.from_config(config)
A quick search should now return results with both vector and reranker scores, letting you compare improvements immediately.

Async support

import asyncio

from mem0 import AsyncMemory

async_memory = AsyncMemory.from_config(config)

async def search_with_rerank():
    return await async_memory.search(
        "What are my preferences?",
        user_id="alice",
        rerank=True
    )

results = asyncio.run(search_with_rerank())
Inspect the async response to confirm reranking still applies; the scores should match the synchronous implementation.
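Because the client is async, you can also fan several reranked searches out concurrently. A sketch using asyncio.gather with the async_memory instance from above:

# Run several reranked searches concurrently on one event loop
async def batch_search(queries, user_id):
    tasks = [
        async_memory.search(q, user_id=user_id, rerank=True)
        for q in queries
    ]
    return await asyncio.gather(*tasks)

all_results = asyncio.run(
    batch_search(["What are my preferences?", "What movies do I like?"], "alice")
)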

Tune performance and cost

# GPU-friendly local reranker configuration
config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cuda",
            "batch_size": 32,
            "top_k": 10,
            "max_length": 256
        }
    }
}

# Smart toggle for hosted rerankers
def smart_search(query, user_id, use_rerank=None):
    # Default heuristic: rerank only longer, more nuanced queries
    if use_rerank is None:
        use_rerank = len(query.split()) > 3
    return m.search(query, user_id=user_id, rerank=use_rerank)
Use heuristics (query length, user tier) to decide when to rerank so high-signal queries benefit without taxing every request.
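The same pattern extends to other signals. A sketch of a tier-based toggle, where get_user_tier is a placeholder for your own account lookup:

# Hypothetical tier-based toggle: rerank only for premium accounts
def tiered_search(query, user_id):
    tier = get_user_tier(user_id)  # placeholder for your own account lookup
    return m.search(query, user_id=user_id, rerank=(tier == "premium"))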

Handle failures gracefully

try:
    results = m.search("test query", user_id="alice", rerank=True)
except Exception as exc:
    print(f"Reranking failed: {exc}")
    # Retry with vector-only ordering rather than dropping the request
    results = m.search("test query", user_id="alice", rerank=False)
Always fall back to vector-only search; a dropped query hurts users more than slightly less relevant ordering.

Migrate from v0.x

# Before: basic vector search
results = m.search("query", user_id="alice")

# After: same API with reranking enabled via config
config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2"
        }
    }
}

m = Memory.from_config(config)
results = m.search("query", user_id="alice")

See it in action

results = m.search(
    "What are my food preferences?",
    user_id="alice"
)

for result in results["results"]:
    print(f"Memory: {result['memory']}")
    print(f"Score: {result['score']}")
Expect each result to list the reranker-adjusted score so you can compare ordering against baseline vector results.

Toggle reranking per request

results_with_rerank = m.search(
    "What movies do I like?",
    user_id="alice",
    rerank=True
)

results_without_rerank = m.search(
    "What movies do I like?",
    user_id="alice",
    rerank=False
)
Log the reranked vs. non-reranked lists during rollout so stakeholders can see the improvement before enforcing it everywhere.
You should see the same memories in both lists, but the reranked response will reorder them based on semantic relevance.

Combine with metadata filters

results = m.search(
    "important work tasks",
    user_id="alice",
    filters={
        "AND": [
            {"category": "work"},
            {"priority": {"gte": 7}}
        ]
    },
    rerank=True,
    limit=20
)
Verify that filtered, reranked searches still respect every metadata clause: reranking only reorders candidates; it never bypasses filters.
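A spot check for that guarantee, assuming each returned result exposes its stored metadata under a metadata key:

# Confirm every reranked result still satisfies the metadata filter
for result in results["results"]:
    meta = result.get("metadata") or {}
    assert meta.get("category") == "work"
    assert meta.get("priority", 0) >= 7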

Real-world playbooks

Customer support

config = {
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key"
        }
    }
}

m = Memory.from_config(config)

results = m.search(
    "customer having login issues with mobile app",
    agent_id="support_bot",
    filters={"category": "technical_support"},
    rerank=True
)
Top results should highlight tickets matching the login issue context so agents can respond faster.

Content recommendation

results = m.search(
    "science fiction books with space exploration themes",
    user_id="reader123",
    filters={"content_type": "book_recommendation"},
    rerank=True,
    limit=10
)

for result in results["results"]:
    print(f"Recommendation: {result['memory']}")
    print(f"Relevance: {result['score']:.3f}")
Expect high-scoring recommendations that match both the requested theme and any metadata limits you applied.

Personal assistant

results = m.search(
    "What restaurants did I enjoy last month that had good vegetarian options?",
    user_id="foodie_user",
    filters={
        "AND": [
            {"category": "dining"},
            {"rating": {"gte": 4}},
            {"date": {"gte": "2024-01-01"}}
        ]
    },
    rerank=True
)
Reuse this pattern for other lifestyle queries—swap the filters and prompt text without changing the rerank configuration.
Each workflow keeps the same m.search(...) signature, so you can template these queries across agents with only the prompt and filters changing.
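A small helper makes that templating explicit; a sketch that fixes the rerank setting and varies only the prompt, filters, and scoping IDs:

# Template a reranked search: only prompt, filters, and IDs vary per workflow
def playbook_search(prompt, filters, **scope):
    return m.search(prompt, filters=filters, rerank=True, **scope)

support_results = playbook_search(
    "customer having login issues with mobile app",
    {"category": "technical_support"},
    agent_id="support_bot",
)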

Verify the feature is working

  • Inspect result payloads for both score (vector) and reranker scores; mismatched fields indicate the reranker didn’t execute.
  • Track latency before and after enabling reranking to ensure SLAs hold.
  • Review provider logs or dashboards for throttling or quota warnings.
  • Run A/B comparisons (rerank on/off) to validate improved relevance before defaulting to reranked responses; see the sketch below.
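A minimal A/B harness covering the latency and relevance checks above, built only on the search API shown earlier:

import time

# Compare ordering and latency with reranking on vs. off
def ab_compare(query, user_id):
    timings, ordering = {}, {}
    for rerank in (True, False):
        start = time.perf_counter()
        results = m.search(query, user_id=user_id, rerank=rerank)
        timings[rerank] = time.perf_counter() - start
        ordering[rerank] = [r["memory"] for r in results["results"]]
    print(f"rerank on: {timings[True]:.3f}s, off: {timings[False]:.3f}s")
    print("order changed:", ordering[True] != ordering[False])

ab_compare("What are my preferences?", "alice")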

Best practices

  1. Start local: Try Sentence Transformer models to prove value before paying for hosted APIs.
  2. Monitor latency: Add metrics around reranker duration so you notice regressions quickly.
  3. Control spend: Use top_k and selective toggles to cap hosted reranker costs.
  4. Keep a fallback: Always catch reranker failures and continue with vector-only ordering.
  5. Experiment often: Swap providers or models to find the best fit for your domain and language mix.