Usage
Config
Here are the parameters available for configuring Databricks Vector Search:

Parameter | Description | Default Value |
---|---|---|
workspace_url | The URL of your Databricks workspace | Required |
access_token | Personal Access Token for authentication | None |
service_principal_client_id | Service principal client ID (alternative to access_token) | None |
service_principal_client_secret | Service principal client secret (required with client_id) | None |
endpoint_name | Name of the Vector Search endpoint | Required |
index_name | Name of the vector index (Unity Catalog format: catalog.schema.index) | Required |
source_table_name | Name of the source Delta table (Unity Catalog format: catalog.schema.table) | Required |
embedding_dimension | Dimension of self-managed embeddings | 1536 |
embedding_source_column | Column name for text when using Databricks-computed embeddings | None |
embedding_model_endpoint_name | Databricks serving endpoint for embeddings | None |
embedding_vector_column | Column name for self-managed embedding vectors | embedding |
endpoint_type | Type of endpoint (STANDARD or STORAGE_OPTIMIZED) | STANDARD |
sync_computed_embeddings | Whether to sync computed embeddings automatically | True |
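As a rough illustration of how the defaults in the table combine with user-supplied values, the sketch below merges the two and checks the required parameters. The parameter names and defaults are taken from the table; the `build_config` helper, the plain-dictionary representation, and the placeholder URL and names are assumptions made for this example, not part of any library API.

```python
# Defaults copied from the parameter table.
DEFAULTS = {
    "access_token": None,
    "service_principal_client_id": None,
    "service_principal_client_secret": None,
    "embedding_dimension": 1536,
    "embedding_source_column": None,
    "embedding_model_endpoint_name": None,
    "embedding_vector_column": "embedding",
    "endpoint_type": "STANDARD",
    "sync_computed_embeddings": True,
}

# Parameters the table marks as "Required".
REQUIRED = ("workspace_url", "endpoint_name", "index_name", "source_table_name")


def build_config(**overrides):
    """Hypothetical helper: merge user values over the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS) - set(REQUIRED)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    missing = [name for name in REQUIRED if not overrides.get(name)]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    config = {**DEFAULTS, **overrides}
    if config["endpoint_type"] not in ("STANDARD", "STORAGE_OPTIMIZED"):
        raise ValueError("endpoint_type must be STANDARD or STORAGE_OPTIMIZED")
    return config


config = build_config(
    workspace_url="https://example.cloud.databricks.com",  # placeholder URL
    endpoint_name="my-endpoint",                           # placeholder name
    index_name="main.default.my_index",
    source_table_name="main.default.my_table",
)
```

Only the four required values are passed here, so every other key in `config` carries the default listed in the table.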
Authentication
Databricks Vector Search supports two authentication methods:
Service Principal (Recommended for Production)
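A minimal sketch of a service principal configuration, assuming the settings are held in a plain dictionary keyed by the parameter names from the table. The helper `service_principal_auth`, the environment variable names, and all placeholder values are choices made for this example.

```python
import os


def service_principal_auth(env=os.environ):
    """Hypothetical helper: read service principal credentials from the environment."""
    client_id = env.get("DATABRICKS_CLIENT_ID")
    client_secret = env.get("DATABRICKS_CLIENT_SECRET")
    # The table states that the secret is required whenever a client ID is given.
    if not client_id or not client_secret:
        raise ValueError("both client ID and client secret are required")
    return {
        "service_principal_client_id": client_id,
        "service_principal_client_secret": client_secret,
    }


# Demonstration with a stand-in environment; real code would rely on os.environ.
fake_env = {
    "DATABRICKS_CLIENT_ID": "my-client-id",
    "DATABRICKS_CLIENT_SECRET": "my-secret",
}
config = {
    "workspace_url": "https://example.cloud.databricks.com",  # placeholder URL
    "endpoint_name": "my-endpoint",                           # placeholder name
    "index_name": "main.default.my_index",
    "source_table_name": "main.default.my_table",
    **service_principal_auth(fake_env),
}
```

Reading the credentials from the environment keeps the client secret out of source control, which is the usual reason to prefer this route in production.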
Personal Access Token (for Development)
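For development, a token-based configuration might look like the sketch below. It assumes the token is stored in an environment variable named `DATABRICKS_TOKEN`; the helper `token_auth` and the example token value are hypothetical.

```python
import os


def token_auth(env=os.environ):
    """Hypothetical helper: read a Personal Access Token from the environment."""
    token = env.get("DATABRICKS_TOKEN")
    if not token:
        raise ValueError("DATABRICKS_TOKEN is not set")
    # access_token is the parameter name used in the table.
    return {"access_token": token}


# Demonstration with a stand-in environment; real code would rely on os.environ.
auth = token_auth({"DATABRICKS_TOKEN": "example-token"})
```

The resulting `auth` dictionary would be merged with the required parameters (`workspace_url`, `endpoint_name`, `index_name`, `source_table_name`) to form the full configuration.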
Embedding Options
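The two options in this section differ in which parameters from the table they set. The sketch below groups them side by side. The helper names, the `text` column, and the endpoint name are hypothetical, and the split into two exclusive groups is inferred from the parameter descriptions rather than stated outright.

```python
def self_managed_embedding_params(dimension=1536, vector_column="embedding"):
    """Parameters for vectors you compute yourself (defaults from the table)."""
    return {
        "embedding_dimension": dimension,
        "embedding_vector_column": vector_column,
    }


def databricks_computed_embedding_params(source_column, model_endpoint_name):
    """Parameters for embeddings computed by a Databricks serving endpoint."""
    return {
        "embedding_source_column": source_column,
        "embedding_model_endpoint_name": model_endpoint_name,
    }


self_managed = self_managed_embedding_params()
computed = databricks_computed_embedding_params(
    source_column="text",                         # hypothetical text column
    model_endpoint_name="my-embedding-endpoint",  # hypothetical serving endpoint
)
```

Either dictionary would be merged into the rest of the configuration alongside the required parameters.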
Self-Managed Embeddings (Default)
Use your own embedding model and provide vectors directly.
Databricks-Computed Embeddings
Let Databricks compute embeddings from text using a serving endpoint.
Important Notes
- Delta Sync Index: This implementation uses Delta Sync Index, which automatically syncs with your source Delta table. Direct vector insertion/deletion/update operations will log warnings as they’re not supported with Delta Sync.
- Unity Catalog: Both the source table and index must be in Unity Catalog format (catalog.schema.table_name).
- Endpoint Auto-Creation: If the specified endpoint doesn’t exist, it will be created automatically.
- Index Auto-Creation: If the specified index doesn’t exist, it will be created automatically with the provided configuration.
- Filter Support: Supports filtering by metadata fields, with different syntax for STANDARD vs STORAGE_OPTIMIZED endpoints.
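The Unity Catalog note in the list above can be checked before any request is made. The following is a minimal sketch; `is_three_part_name` is a hypothetical helper that only verifies the three-part shape and does not validate identifier characters or quoting.

```python
def is_three_part_name(name):
    """Return True if name looks like catalog.schema.object (three non-empty parts)."""
    parts = name.split(".")
    return len(parts) == 3 and all(part.strip() for part in parts)


index_ok = is_three_part_name("main.default.my_index")  # three parts
table_ok = is_three_part_name("default.my_table")       # catalog is missing
```

Running a check like this on both `index_name` and `source_table_name` surfaces a malformed name early, before auto-creation of the endpoint or index is attempted.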