Models
Any HuggingFace cross-encoder model can be used. Popular choices include:

- cross-encoder/ms-marco-MiniLM-L-6-v2: Default; good balance of speed and accuracy
- cross-encoder/ms-marco-TinyBERT-L-2-v2: Fastest; smallest model size
- cross-encoder/ms-marco-electra-base: Higher accuracy; larger model
- cross-encoder/stsb-distilroberta-base: Good for semantic similarity tasks
Installation
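Assuming the reranker builds on the sentence-transformers package (the backend implied by the CrossEncoder models above), install it with pip; torch and transformers are pulled in as dependencies:

```bash
pip install sentence-transformers
```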
Configuration
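A minimal configuration sketch, assuming the sentence-transformers CrossEncoder is the underlying model class (the values mirror the defaults in the Configuration Parameters table below):

```python
from sentence_transformers import CrossEncoder

# Default setup: MiniLM cross-encoder, device auto-selected
# (CUDA if available, otherwise CPU)
reranker = CrossEncoder(
    "cross-encoder/ms-marco-MiniLM-L-6-v2",  # model
    device=None,                             # device
)
```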
GPU Acceleration
For better performance, use GPU acceleration:
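A sketch under the same sentence-transformers assumption, pinning the model to a CUDA device instead of auto-selecting:

```python
from sentence_transformers import CrossEncoder

# Run the cross-encoder on the first CUDA device
reranker = CrossEncoder(
    "cross-encoder/ms-marco-MiniLM-L-6-v2",
    device="cuda",  # "cuda:0" for a specific GPU, "mps" on Apple Silicon
)
```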
Usage Example
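A sketch of a full rerank pass, again assuming the sentence-transformers CrossEncoder API: score each (query, document) pair, then sort by score and truncate to top_k:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do cross-encoder rerankers work?"
documents = [
    "Cross-encoders jointly encode the query and document to produce a relevance score.",
    "Paris is the capital of France.",
    "Rerankers reorder retrieved documents by relevance to the query.",
]

# Score every (query, document) pair in batches
scores = reranker.predict(
    [(query, doc) for doc in documents],
    batch_size=32,            # batch_size
    show_progress_bar=False,  # show_progress_bar
)

# Sort by descending relevance and keep the top_k documents
top_k = 2
ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
for doc, score in ranked[:top_k]:
    print(f"{score:.4f}  {doc}")
```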
Custom Models
You can use any HuggingFace cross-encoder model:
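For example, swapping in two of the checkpoints listed under Models (any Hub model trained as a cross-encoder works the same way):

```python
from sentence_transformers import CrossEncoder

# A larger checkpoint: higher accuracy at the cost of speed
reranker = CrossEncoder("cross-encoder/ms-marco-electra-base")

# A model tuned for semantic-similarity (STS-style) scoring
sts_reranker = CrossEncoder("cross-encoder/stsb-distilroberta-base")
```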
Configuration Parameters
| Parameter | Description | Type | Default |
|---|---|---|---|
| model | HuggingFace cross-encoder model name | str | "cross-encoder/ms-marco-MiniLM-L-6-v2" |
| device | Device to run the model on ("cpu", "cuda", etc.) | str | None |
| batch_size | Batch size for processing documents | int | 32 |
| show_progress_bar | Show a progress bar during processing | bool | False |
| top_k | Maximum number of documents to return | int | None |
Advantages
- Privacy: Complete local processing, no external API calls
- Cost: No per-token charges; the only cost is the one-time model download
- Customization: Use any HuggingFace cross-encoder model
- Offline: Works without internet connection after model download
Performance Considerations
- First Run: The model is downloaded on first use, which may take a while for larger checkpoints
- Memory Usage: Models require GPU/CPU memory
- Batch Size: Optimize batch size based on available memory
- Device: GPU acceleration significantly improves speed
Best Practices
- Model Selection: Choose model based on accuracy vs speed requirements
- Device Management: Use GPU when available for better performance
- Batch Processing: Process multiple documents together for efficiency
- Memory Monitoring: Monitor system memory usage with larger models