Usage
Config
Here are the parameters available for configuring Databricks Vector Search:

| Parameter | Description | Default Value |
|---|---|---|
| `workspace_url` | The URL of your Databricks workspace | Required |
| `access_token` | Personal Access Token for authentication | None |
| `client_id` | Service principal client ID (alternative to `access_token`) | None |
| `client_secret` | Service principal client secret (required with `client_id`) | None |
| `azure_client_id` | Azure AD application client ID (for Azure Databricks) | None |
| `azure_client_secret` | Azure AD application client secret (for Azure Databricks) | None |
| `endpoint_name` | Name of the Vector Search endpoint | Required |
| `catalog` | Unity Catalog catalog name | Required |
| `schema` | Unity Catalog schema name | Required |
| `table_name` | Source Delta table name | Required |
| `collection_name` | Vector search index name | `mem0` |
| `index_type` | Index type: `DELTA_SYNC` or `DIRECT_ACCESS` | `DELTA_SYNC` |
| `embedding_model_endpoint_name` | Databricks serving endpoint for embeddings | None |
| `embedding_dimension` | Dimension of self-managed embeddings | 1536 |
| `endpoint_type` | Type of endpoint: `STANDARD` or `STORAGE_OPTIMIZED` | `STANDARD` |
| `pipeline_type` | Sync pipeline type: `TRIGGERED` or `CONTINUOUS` | `TRIGGERED` |
| `warehouse_name` | Databricks SQL warehouse name (if using SQL warehouse) | None |
| `query_type` | Query type: `ANN` or `HYBRID` | `ANN` |
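
The table above can be turned into a small validation step before the configuration is handed to the library. The following is a minimal sketch, not the library's own code: `build_vector_store_config` is a hypothetical helper, the provider name `"databricks"` and the `vector_store` wrapper are assumptions, and the URL, names, and token are placeholders.

```python
from typing import Any

# Defaults copied from the parameter table above.
DEFAULTS: dict[str, Any] = {
    "collection_name": "mem0",
    "index_type": "DELTA_SYNC",
    "embedding_dimension": 1536,
    "endpoint_type": "STANDARD",
    "pipeline_type": "TRIGGERED",
    "query_type": "ANN",
}

# Parameters the table marks as Required.
REQUIRED = ("workspace_url", "endpoint_name", "catalog", "schema", "table_name")

# Allowed values for the enumerated parameters, as listed in the table.
CHOICES = {
    "index_type": {"DELTA_SYNC", "DIRECT_ACCESS"},
    "endpoint_type": {"STANDARD", "STORAGE_OPTIMIZED"},
    "pipeline_type": {"TRIGGERED", "CONTINUOUS"},
    "query_type": {"ANN", "HYBRID"},
}


def build_vector_store_config(**params: Any) -> dict[str, Any]:
    """Hypothetical helper: merge the table defaults and validate the result."""
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    merged = {**DEFAULTS, **params}
    for name, allowed in CHOICES.items():
        if merged[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    # Assumption: the store is selected with the provider name "databricks".
    return {"vector_store": {"provider": "databricks", "config": merged}}


config = build_vector_store_config(
    workspace_url="https://example.cloud.databricks.com",  # placeholder URL
    endpoint_name="my-endpoint",                           # placeholder name
    catalog="main",
    schema="default",
    table_name="memories",
    access_token="dapi-placeholder",                       # placeholder token
)
```

Validating enumerated values up front surfaces a typo such as `query_type="HYBRD"` as a local `ValueError` rather than as a remote API failure.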
Authentication
Databricks Vector Search supports two authentication methods:

- Service Principal (Recommended for Production)
- Personal Access Token (for Development)
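
One way to choose between the two methods is to read credentials from the environment and prefer the service principal when it is present. This is a sketch under stated assumptions: `auth_params` is a hypothetical helper, and the environment variable names are illustrative choices, not something this page prescribes. Only the returned keys (`client_id`, `client_secret`, `access_token`) come from the parameter table.

```python
import os
from typing import Mapping


def auth_params(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Pick credential fields for the config, preferring a service principal."""
    # Assumed variable names; adjust to whatever your deployment uses.
    client_id = env.get("DATABRICKS_CLIENT_ID")
    client_secret = env.get("DATABRICKS_CLIENT_SECRET")
    token = env.get("DATABRICKS_TOKEN")

    if client_id and client_secret:
        # Service principal: recommended for production.
        return {"client_id": client_id, "client_secret": client_secret}
    if client_id or client_secret:
        # The parameter table states client_secret is required with client_id.
        raise ValueError("client_id and client_secret must be set together")
    if token:
        # Personal Access Token: intended for development.
        return {"access_token": token}
    raise ValueError("no Databricks credentials found")


# Example with an explicit mapping instead of the real environment.
creds = auth_params({"DATABRICKS_TOKEN": "dapi-placeholder"})
```

The returned dictionary can be merged into the vector store configuration alongside `workspace_url` and the other required fields.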
Embedding Options
Self-Managed Embeddings (Default)
Use your own embedding model and provide vectors directly.
Databricks-Computed Embeddings
Let Databricks compute embeddings from text using a serving endpoint.
Important Notes
- Index Types: This implementation supports both `DELTA_SYNC` (auto-syncs with the source Delta table) and `DIRECT_ACCESS` (manage vectors directly) index types.
- Unity Catalog: The source table and index are created under the specified `catalog.schema` namespace.
- Endpoint Auto-Creation: If the specified endpoint doesn’t exist, it will be created automatically.
- Index Auto-Creation: If the specified index doesn’t exist, it will be created automatically with the provided configuration.
- Filter Support: Supports filtering by metadata fields, with different syntax for STANDARD vs STORAGE_OPTIMIZED endpoints.
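
Returning to the two embedding options described under Embedding Options, the difference between them comes down to which field is set in the configuration. The sketch below assumes a hypothetical helper named `embedding_params` and a placeholder serving endpoint name; the two keys it returns are taken from the parameter table.

```python
from typing import Any, Optional


def embedding_params(
    serving_endpoint: Optional[str] = None, dimension: int = 1536
) -> dict[str, Any]:
    """Return the embedding-related config fields for one of the two modes."""
    if serving_endpoint:
        # Databricks-computed: Databricks embeds text via the serving endpoint.
        return {"embedding_model_endpoint_name": serving_endpoint}
    # Self-managed (default): the caller supplies vectors of this dimension.
    if dimension <= 0:
        raise ValueError("embedding_dimension must be a positive integer")
    return {"embedding_dimension": dimension}


# Self-managed vectors from a 768-dimensional model.
self_managed = embedding_params(dimension=768)

# Databricks-computed embeddings; the endpoint name is a placeholder.
databricks_computed = embedding_params(serving_endpoint="my-embedding-endpoint")
```

With self-managed embeddings, `embedding_dimension` should match the output size of the embedding model in use, since the table's default of 1536 only fits models that produce 1536-dimensional vectors.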