> ## Documentation Index
> Fetch the complete documentation index at: https://docs.mem0.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Reranker-Enhanced Search

> Boost relevance by reordering vector hits with reranking models.

Reranker-enhanced search adds a second scoring pass after vector retrieval so Mem0 can return the most relevant memories first. Enable it when keyword similarity alone misses nuance or when you need the highest-confidence context for an agent decision.

<Info>
  **You’ll use this when…**

  * Queries are nuanced and require semantic understanding beyond vector distance.
  * Large memory collections produce too many near matches to review manually.
  * You want consistent scoring across providers by delegating ranking to a dedicated model.
</Info>

<Warning>
  Reranking raises latency and, for hosted models, API spend. Benchmark with production traffic and define a fallback path for latency-sensitive requests.
</Warning>

<Note>
  All configuration snippets translate directly to the TypeScript SDK—swap dictionaries for objects while keeping the same keys (`provider`, `config`, `rerank` flags).
</Note>

***

## Feature anatomy

* **Initial vector search:** Retrieve candidate memories by similarity.
* **Reranker pass:** A specialized model scores each candidate against the original query.
* **Reordered results:** Mem0 sorts responses using the reranker’s scores before returning them.
* **Optional fallbacks:** Toggle reranking per request or disable it entirely if performance or cost becomes a concern.

<AccordionGroup>
  <Accordion title="Supported providers">
    - **[Cohere](/components/rerankers/models/cohere)** – Multilingual hosted reranker with API-based scoring.
    - **[Sentence Transformer](/components/rerankers/models/sentence_transformer)** – Local Hugging Face cross-encoders for GPU or CPU.
    - **[Hugging Face](/components/rerankers/models/huggingface)** – Bring any hosted or on-prem reranker model ID.
    - **[LLM Reranker](/components/rerankers/models/llm_reranker)** – Use your preferred LLM (OpenAI, etc.) for prompt-driven scoring.
    - **[Zero Entropy](/components/rerankers/models/zero_entropy)** – High-quality neural reranking tuned for retrieval tasks.
  </Accordion>

  <Accordion title="Provider comparison">
    | Provider             | Latency    | Quality   | Cost     | Local deploy |
    | -------------------- | ---------- | --------- | -------- | ------------ |
    | Cohere               | Medium     | High      | API cost | ❌            |
    | Sentence Transformer | Low        | Good      | Free     | ✅            |
    | Hugging Face         | Low–Medium | Variable  | Free     | ✅            |
    | LLM Reranker         | High       | Very high | API cost | Depends      |
  </Accordion>
</AccordionGroup>

***

## Configure it

### Basic setup

```python theme={null}
from mem0 import Memory

config = {
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key"
        }
    }
}

m = Memory.from_config(config)
```

<Info icon="check">
  Confirm `results["results"][0]["score"]` reflects the reranker output—if the field is missing, the reranker was not applied.
</Info>

<Tip>
  Set `top_k` to the smallest candidate pool that still captures relevant hits. Smaller pools keep reranking costs down.
</Tip>

### Provider-specific options

```python theme={null}
# Cohere reranker
config = {
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key",
            "top_k": 10,
            "return_documents": True
        }
    }
}

# Sentence Transformer reranker
config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cuda",
            "max_length": 512
        }
    }
}

# Hugging Face reranker
config = {
    "reranker": {
        "provider": "huggingface",
        "config": {
            "model": "BAAI/bge-reranker-base",
            "device": "cuda",
            "batch_size": 32
        }
    }
}

# LLM-based reranker
config = {
    "reranker": {
        "provider": "llm_reranker",
        "config": {
            "provider": "openai",
            "model": "gpt-5-mini",
            "api_key": "your-openai-api-key",
            "top_k": 5
        }
    }
}
```

<Note>
  Keep authentication keys in environment variables when you plug these configs into production projects.
</Note>

### Full stack example

```python theme={null}
config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "host": "localhost",
            "port": 6333
        }
    },
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-5-mini",
            "api_key": "your-openai-api-key"
        }
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small",
            "api_key": "your-openai-api-key"
        }
    },
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key",
            "top_k": 15,
            "return_documents": True
        }
    }
}

m = Memory.from_config(config)
```

<Info icon="check">
  A quick search should now return results with both vector and reranker scores, letting you compare improvements immediately.
</Info>

### Async support

```python theme={null}
from mem0 import AsyncMemory

async_memory = AsyncMemory.from_config(config)

async def search_with_rerank():
    return await async_memory.search(
        "What are my preferences?",
        filters={"user_id": "alice"},
        rerank=True
    )

import asyncio
results = asyncio.run(search_with_rerank())
```

<Info icon="check">
  Inspect the async response to confirm reranking still applies; the scores should match the synchronous implementation.
</Info>

### Tune performance and cost

```python theme={null}
# GPU-friendly local reranker configuration
config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cuda",
            "batch_size": 32,
            "top_k": 10,
            "max_length": 256
        }
    }
}

# Smart toggle for hosted rerankers
def smart_search(query, user_id, use_rerank=None):
    if use_rerank is None:
        use_rerank = len(query.split()) > 3
    return m.search(query, filters={"user_id": user_id}, rerank=use_rerank)
```

<Tip>
  Use heuristics (query length, user tier) to decide when to rerank so high-signal queries benefit without taxing every request.
</Tip>

### Handle failures gracefully

```python theme={null}
try:
    results = m.search("test query", filters={"user_id": "alice"}, rerank=True)
except Exception as exc:
    print(f"Reranking failed: {exc}")
    results = m.search("test query", filters={"user_id": "alice"}, rerank=False)
```

<Warning>
  Always fall back to vector-only search—dropped queries introduce bigger accuracy issues than slightly less relevant ordering.
</Warning>

### Migrate from v0.x

```python theme={null}
# Before: basic vector search
results = m.search("query", filters={"user_id": "alice"})

# After: same API with reranking enabled via config
config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2"
        }
    }
}

m = Memory.from_config(config)
results = m.search("query", filters={"user_id": "alice"})
```

***

## See it in action

### Basic reranked search

```python theme={null}
results = m.search(
    "What are my food preferences?",
    filters={"user_id": "alice"}
)

for result in results["results"]:
    print(f"Memory: {result['memory']}")
    print(f"Score: {result['score']}")
```

<Info icon="check">
  Expect each result to list the reranker-adjusted score so you can compare ordering against baseline vector results.
</Info>

### Toggle reranking per request

```python theme={null}
results_with_rerank = m.search(
    "What movies do I like?",
    filters={"user_id": "alice"},
    rerank=True
)

results_without_rerank = m.search(
    "What movies do I like?",
    filters={"user_id": "alice"},
    rerank=False
)
```

<Tip>
  Log the reranked vs. non-reranked lists during rollout so stakeholders can see the improvement before enforcing it everywhere.
</Tip>

<Info icon="check">
  You should see the same memories in both lists, but the reranked response will reorder them based on semantic relevance.
</Info>

### Combine with metadata filters

```python theme={null}
results = m.search(
    "important work tasks",
    filters={
        "AND": [
            {"user_id": "alice"},
            {"category": "work"},
            {"priority": {"gte": 7}}
        ]
    },
    rerank=True,
    top_k=20
)
```

<Info icon="check">
  Verify filtered reranked searches still respect every metadata clause—reranking only reorders candidates, it never bypasses filters.
</Info>

### Real-world playbooks

#### Customer support

```python theme={null}
config = {
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key"
        }
    }
}

m = Memory.from_config(config)

results = m.search(
    "customer having login issues with mobile app",
    filters={"agent_id": "support_bot", "category": "technical_support"},
    rerank=True
)
```

<Info icon="check">
  Top results should highlight tickets matching the login issue context so agents can respond faster.
</Info>

#### Content recommendation

```python theme={null}
results = m.search(
    "science fiction books with space exploration themes",
    filters={"user_id": "reader123", "content_type": "book_recommendation"},
    rerank=True,
    top_k=10
)

for result in results["results"]:
    print(f"Recommendation: {result['memory']}")
    print(f"Relevance: {result['score']:.3f}")
```

<Info icon="check">
  Expect high-scoring recommendations that match both the requested theme and any metadata limits you applied.
</Info>

#### Personal assistant

```python theme={null}
results = m.search(
    "What restaurants did I enjoy last month that had good vegetarian options?",
    filters={
        "AND": [
            {"user_id": "foodie_user"},
            {"category": "dining"},
            {"rating": {"gte": 4}},
            {"date": {"gte": "2024-01-01"}}
        ]
    },
    rerank=True
)
```

<Tip>
  Reuse this pattern for other lifestyle queries—swap the filters and prompt text without changing the rerank configuration.
</Tip>

<Note>
  Each workflow keeps the same `m.search(...)` signature, so you can template these queries across agents with only the prompt and filters changing.
</Note>

***

## Verify the feature is working

* Inspect result payloads for both `score` (vector) and reranker scores; mismatched fields indicate the reranker didn’t execute.
* Track latency before and after enabling reranking to ensure SLAs hold.
* Review provider logs or dashboards for throttling or quota warnings.
* Run A/B comparisons (rerank on/off) to validate improved relevance before defaulting to reranked responses.

***

## Best practices

1. **Start local:** Try Sentence Transformer models to prove value before paying for hosted APIs.
2. **Monitor latency:** Add metrics around reranker duration so you notice regressions quickly.
3. **Control spend:** Use `top_k` and selective toggles to cap hosted reranker costs.
4. **Keep a fallback:** Always catch reranker failures and continue with vector-only ordering.
5. **Experiment often:** Swap providers or models to find the best fit for your domain and language mix.

***

<CardGroup cols={2}>
  <Card title="Configure Rerankers" icon="sliders" href="/components/rerankers/config">
    Review provider fields, defaults, and environment variables before going live.
  </Card>

  <Card title="Build a Custom LLM Reranker" icon="sparkles" href="/components/rerankers/models/llm_reranker">
    Extend scoring with prompt-tuned LLM rerankers for niche workflows.
  </Card>
</CardGroup>
