The Sentence Transformer reranker provides local reranking with HuggingFace cross-encoder models, making it a good fit for privacy-focused deployments where data must stay on-premises.

Models

Any HuggingFace cross-encoder model can be used (a scoring sketch follows this list). Popular choices include:
  • cross-encoder/ms-marco-MiniLM-L-6-v2: Default, good balance of speed and accuracy
  • cross-encoder/ms-marco-TinyBERT-L-2-v2: Fastest, smaller model size
  • cross-encoder/ms-marco-electra-base: Higher accuracy, larger model
  • cross-encoder/stsb-distilroberta-base: Good for semantic similarity tasks
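
Under the hood, a cross-encoder scores each query-document pair jointly rather than embedding query and document separately. Here is a minimal sketch using the sentence-transformers CrossEncoder class directly; the query and documents are illustrative:
Python
from sentence_transformers import CrossEncoder

# Loads the default cross-encoder used by this reranker
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What books does the user like?"
documents = [
    "I love reading science fiction novels",
    "I also enjoy watching sci-fi movies",
    "The weather was nice yesterday",
]

# Each (query, document) pair is scored jointly; higher means more relevant
scores = model.predict([(query, doc) for doc in documents])
for doc, score in sorted(zip(documents, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")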

Installation

pip install sentence-transformers
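
To verify the installation and pre-download the default model (a one-time fetch from the HuggingFace Hub), you can load it directly. A minimal sketch:
Python
from sentence_transformers import CrossEncoder

# Downloads the model on first run; later runs use the local cache
CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
print("sentence-transformers is installed and the model is cached")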

Configuration

Python
from mem0 import Memory

config = {
    "vector_store": {
        "provider": "chroma",
        "config": {
            "collection_name": "my_memories",
            "path": "./chroma_db"
        }
    },
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4o-mini"
        }
    },
    "rerank": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cpu",  # or "cuda" for GPU
            "batch_size": 32,
            "show_progress_bar": False,
            "top_k": 5
        }
    }
}

memory = Memory.from_config(config)

GPU Acceleration

For better performance, use GPU acceleration:
Python
config = {
    "rerank": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cuda",  # Use GPU
            "batch_size": 64   # high batch size for high memory GPUs
        }
    }
}
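
If the same configuration must run on machines with and without a GPU, one option is to pick the device at runtime with PyTorch, which sentence-transformers already depends on. A sketch of that pattern:
Python
import torch

# Fall back to CPU automatically when no CUDA device is present
device = "cuda" if torch.cuda.is_available() else "cpu"

config = {
    "rerank": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": device,
            # Larger batches generally pay off on GPU; tune to your memory budget
            "batch_size": 64 if device == "cuda" else 32,
        }
    }
}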

Usage Example

Python
from mem0 import Memory

# Initialize memory with local reranker
config = {
    "vector_store": {"provider": "chroma"},
    "llm": {"provider": "openai", "config": {"model": "gpt-4o-mini"}},
    "rerank": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cpu"
        }
    }
}

memory = Memory.from_config(config)

# Add memories
messages = [
    {"role": "user", "content": "I love reading science fiction novels"},
    {"role": "user", "content": "My favorite author is Isaac Asimov"},
    {"role": "user", "content": "I also enjoy watching sci-fi movies"}
]

memory.add(messages, user_id="charlie")

# Search with local reranking
results = memory.search("What books does the user like?", user_id="charlie")

for result in results['results']:
    print(f"Memory: {result['memory']}")
    print(f"Vector Score: {result['score']:.3f}")
    print(f"Rerank Score: {result['rerank_score']:.3f}")
    print()

Custom Models

You can use any HuggingFace cross-encoder model:
Python
# Using a different model
config = {
    "rerank": {
        "provider": "sentence_transformer", 
        "config": {
            "model": "cross-encoder/stsb-distilroberta-base",
            "device": "cpu"
        }
    }
}

Configuration Parameters

| Parameter | Description | Type | Default |
|---|---|---|---|
| model | HuggingFace cross-encoder model name | str | "cross-encoder/ms-marco-MiniLM-L-6-v2" |
| device | Device to run the model on ("cpu", "cuda", etc.) | str | None |
| batch_size | Batch size for processing documents | int | 32 |
| show_progress_bar | Show a progress bar during processing | bool | False |
| top_k | Maximum number of documents to return | int | None |

Advantages

  • Privacy: Complete local processing, no external API calls
  • Cost: No per-token charges after initial model download
  • Customization: Use any HuggingFace cross-encoder model
  • Offline: Works without an internet connection once the model is downloaded (see the sketch below)
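
For fully offline operation, pre-download the model once, then set the HuggingFace offline environment variables so no network requests are attempted. A sketch, assuming the model is already in the local cache:
Python
import os

# Must be set before the model is loaded; forces use of the local cache only
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from sentence_transformers import CrossEncoder

# Succeeds only if the model was previously downloaded and cached
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")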

Performance Considerations

  • First Run: The initial model download from the HuggingFace Hub can take time
  • Memory Usage: Models occupy GPU/CPU memory while loaded
  • Batch Size: Tune the batch size to the memory you have available (see the timing sketch below)
  • Device: GPU acceleration significantly improves throughput
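
One way to tune the batch size empirically is to time CrossEncoder.predict at a few candidate sizes. A rough sketch with synthetic pairs; timings will vary by hardware:
Python
import time
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [("example query", f"candidate document {i}") for i in range(256)]

# Score the same pairs at several batch sizes and compare wall-clock time
for batch_size in (8, 32, 64, 128):
    start = time.perf_counter()
    model.predict(pairs, batch_size=batch_size, show_progress_bar=False)
    print(f"batch_size={batch_size}: {time.perf_counter() - start:.2f}s")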

Best Practices

  1. Model Selection: Choose a model based on your accuracy-versus-speed requirements
  2. Device Management: Use a GPU when available for better performance
  3. Batch Processing: Process multiple documents together for efficiency
  4. Memory Monitoring: Watch system and GPU memory usage with larger models (see the sketch after this list)
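
To gauge how much GPU memory a larger model occupies, you can compare free device memory before and after loading it. A sketch using torch.cuda.mem_get_info, which reports free and total device memory in bytes:
Python
import torch
from sentence_transformers import CrossEncoder

if torch.cuda.is_available():
    free_before, total = torch.cuda.mem_get_info()
    # One of the larger models listed above
    model = CrossEncoder("cross-encoder/ms-marco-electra-base", device="cuda")
    free_after, _ = torch.cuda.mem_get_info()
    print(f"Model uses ~{(free_before - free_after) / 1e6:.0f} MB "
          f"of {total / 1e9:.1f} GB total")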