Databricks Vector Search is a serverless similarity search engine that allows you to store a vector representation of your data, including metadata, in a vector database. With Vector Search, you can create auto-updating vector search indexes from Delta tables managed by Unity Catalog and query them with a simple API to return the most similar vectors.

Usage

from mem0 import Memory

config = {
    "vector_store": {
        "provider": "databricks",
        "config": {
            "workspace_url": "https://your-workspace.databricks.com",
            "access_token": "your-access-token",
            "endpoint_name": "your-vector-search-endpoint",
            "index_name": "catalog.schema.index_name",
            "source_table_name": "catalog.schema.source_table",
            "embedding_dimension": 1536
        }
    }
}

m = Memory.from_config(config)
messages = [
    {"role": "user", "content": "I'm planning to watch a movie tonight. Any recommendations?"},
    {"role": "assistant", "content": "How about thriller movies? They can be quite engaging."},
    {"role": "user", "content": "I'm not a big fan of thriller movies but I love sci-fi movies."},
    {"role": "assistant", "content": "Got it! I'll avoid thriller recommendations and suggest sci-fi movies in the future."}
]
m.add(messages, user_id="alice", metadata={"category": "movies"})

Config

Here are the parameters available for configuring Databricks Vector Search:
| Parameter | Description | Default Value |
| --- | --- | --- |
| workspace_url | The URL of your Databricks workspace | Required |
| access_token | Personal Access Token for authentication | None |
| service_principal_client_id | Service principal client ID (alternative to access_token) | None |
| service_principal_client_secret | Service principal client secret (required with client_id) | None |
| endpoint_name | Name of the Vector Search endpoint | Required |
| index_name | Name of the vector index (Unity Catalog format: catalog.schema.index) | Required |
| source_table_name | Name of the source Delta table (Unity Catalog format: catalog.schema.table) | Required |
| embedding_dimension | Dimension of self-managed embeddings | 1536 |
| embedding_source_column | Column name for text when using Databricks-computed embeddings | None |
| embedding_model_endpoint_name | Databricks serving endpoint for embeddings | None |
| embedding_vector_column | Column name for self-managed embedding vectors | embedding |
| endpoint_type | Type of endpoint (STANDARD or STORAGE_OPTIMIZED) | STANDARD |
| sync_computed_embeddings | Whether to sync computed embeddings automatically | True |
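
Four of the parameters listed here are marked Required. As a minimal sketch, the check below reports which of them are missing from a config dict before it is handed to Memory.from_config. The names REQUIRED_KEYS and missing_required are hypothetical helpers for illustration and are not part of mem0.

```python
# Hypothetical helper, not part of mem0: lists the parameters the table marks as Required.
REQUIRED_KEYS = ("workspace_url", "endpoint_name", "index_name", "source_table_name")


def missing_required(config: dict) -> list:
    """Return the required vector store keys that are absent or empty."""
    settings = config.get("vector_store", {}).get("config", {})
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


if __name__ == "__main__":
    example = {
        "vector_store": {
            "provider": "databricks",
            "config": {
                "workspace_url": "https://your-workspace.databricks.com",
                "index_name": "catalog.schema.index_name",
            },
        }
    }
    print(missing_required(example))
```

Running the check first gives a readable list of gaps instead of a failure later during client setup.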

Authentication

Databricks Vector Search supports two authentication methods: a service principal or a personal access token.

Service Principal

config = {
    "vector_store": {
        "provider": "databricks",
        "config": {
            "workspace_url": "https://your-workspace.databricks.com",
            "service_principal_client_id": "your-service-principal-id",
            "service_principal_client_secret": "your-service-principal-secret",
            "endpoint_name": "your-endpoint",
            "index_name": "catalog.schema.index_name",
            "source_table_name": "catalog.schema.source_table"
        }
    }
}

Personal Access Token (for Development)

config = {
    "vector_store": {
        "provider": "databricks",
        "config": {
            "workspace_url": "https://your-workspace.databricks.com",
            "access_token": "your-personal-access-token",
            "endpoint_name": "your-endpoint",
            "index_name": "catalog.schema.index_name",
            "source_table_name": "catalog.schema.source_table"
        }
    }
}
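
To keep credentials out of source files, the config can be assembled from environment variables. The sketch below is one possible approach under stated assumptions: build_vector_store_config is a hypothetical helper, the variable names DATABRICKS_HOST, DATABRICKS_CLIENT_ID, DATABRICKS_CLIENT_SECRET, and DATABRICKS_TOKEN are chosen for illustration, and nothing here implies that mem0 reads them on its own. It prefers the service principal when both of its values are present and falls back to the access token.

```python
import os
from typing import Mapping


def build_vector_store_config(env: Mapping[str, str] = os.environ) -> dict:
    """Build a mem0 config dict, preferring a service principal over a token."""
    settings = {
        "workspace_url": env["DATABRICKS_HOST"],  # assumed variable name
        "endpoint_name": "your-endpoint",
        "index_name": "catalog.schema.index_name",
        "source_table_name": "catalog.schema.source_table",
    }
    client_id = env.get("DATABRICKS_CLIENT_ID")
    client_secret = env.get("DATABRICKS_CLIENT_SECRET")
    token = env.get("DATABRICKS_TOKEN")

    if client_id and client_secret:
        # Service principal: both values must be supplied together.
        settings["service_principal_client_id"] = client_id
        settings["service_principal_client_secret"] = client_secret
    elif token:
        # Personal access token.
        settings["access_token"] = token
    else:
        raise ValueError("Set a client ID and secret, or an access token")

    return {"vector_store": {"provider": "databricks", "config": settings}}
```

The returned dict has the same shape as the examples above and can be passed to Memory.from_config.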

Embedding Options

Self-Managed Embeddings (Default)

Use your own embedding model and provide vectors directly:
config = {
    "vector_store": {
        "provider": "databricks",
        "config": {
            # ... authentication config ...
            "embedding_dimension": 768,  # Match your embedding model
            "embedding_vector_column": "embedding"
        }
    }
}

Databricks-Computed Embeddings

Let Databricks compute embeddings from text using a serving endpoint:
config = {
    "vector_store": {
        "provider": "databricks",
        "config": {
            # ... authentication config ...
            "embedding_source_column": "text",
            "embedding_model_endpoint_name": "e5-small-v2"
        }
    }
}
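
With self-managed embeddings, the configured embedding_dimension has to match the length of the vectors your model produces. A small sketch of that check follows; check_embedding_dimension is a hypothetical helper rather than a mem0 function, and the fallback of 1536 is the default listed in the parameter table.

```python
def check_embedding_dimension(vector, config: dict) -> int:
    """Raise ValueError if the vector length differs from the configured dimension."""
    settings = config["vector_store"]["config"]
    expected = settings.get("embedding_dimension", 1536)  # documented default
    if len(vector) != expected:
        raise ValueError(
            f"Embedding has {len(vector)} values, but embedding_dimension is {expected}"
        )
    return expected


if __name__ == "__main__":
    config = {"vector_store": {"config": {"embedding_dimension": 768}}}
    sample = [0.0] * 768  # stand-in for a real model output
    print(check_embedding_dimension(sample, config))
```

Running this once against a sample vector from your embedding model surfaces a mismatch before any index is created.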

Important Notes

  • Delta Sync Index: This implementation uses a Delta Sync Index, which automatically syncs with your source Delta table. Direct vector insert, update, and delete operations log warnings because Delta Sync does not support them.
  • Unity Catalog: Both the source table name and the index name must use the three-level Unity Catalog format (catalog.schema.name).
  • Endpoint Auto-Creation: If the specified endpoint doesn’t exist, it will be created automatically.
  • Index Auto-Creation: If the specified index doesn’t exist, it will be created automatically with the provided configuration.
  • Filter Support: Supports filtering by metadata fields, with different syntax for STANDARD vs STORAGE_OPTIMIZED endpoints.
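
The Unity Catalog naming requirement in the notes above can be checked before any index is created. The sketch below is a simplified illustration: split_uc_name is a hypothetical helper, and it assumes plain identifiers without backtick quoting, so names that contain dots inside quoted parts are not handled.

```python
from typing import Tuple


def split_uc_name(name: str) -> Tuple[str, str, str]:
    """Split a catalog.schema.name string into its three parts, or raise ValueError."""
    parts = name.split(".")
    # Exactly three non-empty parts are required.
    if len(parts) != 3 or not all(part.strip() for part in parts):
        raise ValueError(f"Expected catalog.schema.name, got {name!r}")
    return parts[0], parts[1], parts[2]


if __name__ == "__main__":
    print(split_uc_name("catalog.schema.index_name"))
```

Applying the same check to both index_name and source_table_name catches a two-part name such as schema.table early.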