Overview
The graph store threshold parameter controls how strictly nodes are matched during graph data ingestion based on embedding similarity. This feature allows you to customize the matching behavior to prevent false matches or enable entity merging based on your specific use case.
Configuration
Add the threshold parameter to your graph store configuration.
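A minimal sketch of where the parameter might sit, assuming a dictionary-style configuration. Only threshold and its default of 0.7 come from this page; the graph_store key, the provider value, the connection fields, and the exact nesting are illustrative assumptions, so check your library's configuration reference for the actual layout.

```python
# Hypothetical configuration layout: only "threshold" is documented on this page.
config = {
    "graph_store": {
        "provider": "neo4j",  # assumed key; any supported provider
        "config": {
            # Placeholder connection settings (assumed field names)
            "url": "bolt://localhost:7687",
            "username": "neo4j",
            "password": "password",
        },
        "threshold": 0.7,  # default; valid range is 0.0 to 1.0
    }
}
```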
Parameters
Parameter | Type | Default | Range | Description |
---|---|---|---|---|
threshold | float | 0.7 | 0.0 - 1.0 | Minimum embedding similarity score required to match existing nodes during graph ingestion |
Use Cases
Strict Matching (UUIDs, IDs)
Use higher thresholds (0.95-0.99) when working with identifiers that should remain distinct. For example, a high threshold prevents MXxBUE18QVBQTElDQVRJT058MjM3MTM4NjI5 from being matched with MXxBUE18QVBQTElDQVRJT058MjA2OTYxMzM.
Permissive Matching (Natural Language)
Use lower thresholds (0.6-0.7) when entity variations should be merged.
Threshold Guidelines
Use Case | Recommended Threshold | Behavior |
---|---|---|
UUIDs, IDs, Keys | 0.95 - 0.99 | Prevent false matches between similar identifiers |
Structured Data | 0.85 - 0.9 | Balanced precision and recall |
General Purpose | 0.7 - 0.8 | Default recommendation |
Natural Language | 0.6 - 0.7 | Allow entity variations to merge |
Examples
Example 1: Preventing Data Loss with UUIDs
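A small illustration of why a strict threshold matters for identifiers. The two IDs are the ones quoted earlier on this page; the similarity score of 0.92 is a made-up value chosen for the example, not a measured one, and count_nodes is a hypothetical helper.

```python
ID_A = "MXxBUE18QVBQTElDQVRJT058MjM3MTM4NjI5"
ID_B = "MXxBUE18QVBQTElDQVRJT058MjA2OTYxMzM"

# Assumption: the embeddings of the two IDs score 0.92 (illustrative value only).
ASSUMED_SIMILARITY = 0.92

def count_nodes(threshold):
    """ID_A is already stored and ID_B arrives next; return the node count."""
    merged = ASSUMED_SIMILARITY >= threshold
    return 1 if merged else 2

print(count_nodes(0.7))   # 1: ID_B is folded into ID_A, so one identifier is lost
print(count_nodes(0.98))  # 2: both identifiers keep their own node
```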
Example 2: Merging Entity Variations
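A sketch of how a permissive threshold lets variations of one entity collapse into a single node. The mentions and their pairwise scores are invented for illustration, and the greedy ingest helper is a hypothetical stand-in for the real matching logic.

```python
# Made-up similarity scores between entity mentions, for illustration only.
ASSUMED_SCORES = {
    frozenset({"New York City", "NYC"}): 0.68,
    frozenset({"New York City", "New York"}): 0.88,
    frozenset({"NYC", "New York"}): 0.65,
}

def ingest(mentions, threshold):
    """Map each mention to the first stored node that clears the threshold."""
    nodes = []
    for mention in mentions:
        match = next(
            (n for n in nodes
             if ASSUMED_SCORES[frozenset({mention, n})] >= threshold),
            None,
        )
        if match is None:
            nodes.append(mention)  # nothing similar enough: create a new node
    return nodes

mentions = ["New York City", "NYC", "New York"]
print(ingest(mentions, threshold=0.6))  # ['New York City']
print(ingest(mentions, threshold=0.9))  # ['New York City', 'NYC', 'New York']
```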
Example 3: Different Thresholds for Different Clients
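One way to keep a separate threshold per client is to build one configuration per client. This is a sketch under assumptions: graph_config is a hypothetical helper, and the graph_store and provider keys are illustrative rather than a confirmed schema.

```python
def graph_config(threshold):
    """Build a hypothetical graph store config with the given threshold."""
    return {"graph_store": {"provider": "neo4j", "threshold": threshold}}

# One config per client, each tuned to the kind of data it ingests.
id_client_config = graph_config(0.98)   # identifiers must stay distinct
chat_client_config = graph_config(0.6)  # name variations should merge
```

Each client would then be constructed from its own config, so a strict setting for one does not affect the other.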
Supported Graph Providers
The threshold parameter works with all graph store providers:
- ✅ Neo4j
- ✅ Memgraph
- ✅ Kuzu
- ✅ Neptune (both Analytics and DB)
How It Works
When adding a relation to the graph:
- Embedding Generation: The system generates embeddings for source and destination entities
- Node Search: Searches for existing nodes with similar embeddings
- Threshold Comparison: Compares similarity scores against the configured threshold
- Decision:
- If similarity ≥ threshold: Uses the existing node
- If similarity < threshold: Creates a new node
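The steps above can be sketched in a few lines. This is an illustration, not the actual implementation: it assumes cosine similarity as the similarity measure, uses a plain dictionary in place of a graph database, and resolve_node is a hypothetical name.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def resolve_node(name, embedding, nodes, threshold=0.7):
    """Return the node to use for `name`, adding a new node if nothing matches."""
    best_name, best_score = None, -1.0
    # Node search: find the most similar existing node.
    for existing_name, existing_embedding in nodes.items():
        score = cosine_similarity(embedding, existing_embedding)
        if score > best_score:
            best_name, best_score = existing_name, score
    # Threshold comparison and decision.
    if best_name is not None and best_score >= threshold:
        return best_name       # similarity >= threshold: use the existing node
    nodes[name] = embedding    # similarity < threshold: create a new node
    return name

# Toy 2-D vectors stand in for real embeddings; their similarity is roughly 0.949.
nodes = {"alice": [1.0, 0.0]}
print(resolve_node("alice_smith", [0.9, 0.3], nodes, threshold=0.7))   # alice
print(resolve_node("alice_smith", [0.9, 0.3], nodes, threshold=0.97))  # alice_smith
```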
Troubleshooting
Issue: Duplicate nodes being created
Symptom: Expected nodes to merge, but they’re created separately
Solution: Lower the threshold
Issue: Unrelated entities being merged
Symptom: Different entities incorrectly matched as the same node
Solution: Raise the threshold
Issue: Validation error
Symptom: ValidationError: threshold must be between 0.0 and 1.0
Solution: Ensure the threshold is within the valid range
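To catch an out-of-range value before it reaches the graph store, a check along these lines may help. It is a sketch only: validate_threshold is a hypothetical helper, and it raises the standard library's ValueError rather than the ValidationError shown above.

```python
def validate_threshold(value):
    """Return the threshold as a float, or raise if it is outside [0.0, 1.0]."""
    threshold = float(value)
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return threshold

print(validate_threshold(0.95))  # 0.95

try:
    validate_threshold(1.5)
except ValueError as exc:
    print(exc)  # threshold must be between 0.0 and 1.0
```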
Backward Compatibility
- Default Value: 0.7 (maintains existing behavior)
- Optional Parameter: Existing code works without any changes
- No Breaking Changes: Graceful fallback if not specified