Databricks Vector Search is a serverless similarity search engine that allows you to store a vector representation of your data, including metadata, in a vector database. With Vector Search, you can create auto-updating vector search indexes from Delta tables managed by Unity Catalog and query them with a simple API to return the most similar vectors.

Usage

from mem0 import Memory

config = {
    "vector_store": {
        "provider": "databricks",
        "config": {
            "workspace_url": "https://your-workspace.databricks.com",
            "access_token": "your-access-token",
            "endpoint_name": "your-vector-search-endpoint",
            "catalog": "your_catalog",
            "schema": "your_schema",
            "table_name": "your_table",
            "collection_name": "your_index_name",
            "embedding_dimension": 1536
        }
    }
}

m = Memory.from_config(config)
messages = [
    {"role": "user", "content": "I'm planning to watch a movie tonight. Any recommendations?"},
    {"role": "assistant", "content": "How about thriller movies? They can be quite engaging."},
    {"role": "user", "content": "I'm not a big fan of thriller movies but I love sci-fi movies."},
    {"role": "assistant", "content": "Got it! I'll avoid thriller recommendations and suggest sci-fi movies in the future."}
]
m.add(messages, user_id="alice", metadata={"category": "movies"})
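
Stored memories can be read back with a similarity query. The sketch below wraps `Memory.search` in a small helper. It assumes a mem0 version whose `search` method accepts `user_id` and `limit` keyword arguments. The helper name `recall` is hypothetical, and handling both return shapes (a bare list, or a dict with a `results` key) is a defensive assumption rather than documented behavior.

```python
from typing import Any, List


def recall(memory: Any, query: str, user_id: str, limit: int = 3) -> List[str]:
    """Return the text of the memories most similar to `query` for one user."""
    response = memory.search(query, user_id=user_id, limit=limit)
    # Assumption: depending on the mem0 version, the response is either
    # {"results": [...]} or a bare list of hits.
    hits = response.get("results", []) if isinstance(response, dict) else response
    # Each hit is expected to carry the stored text under the "memory" key.
    return [hit["memory"] for hit in hits]
```

With the `m` instance from the example above, a call would look like `recall(m, "What kind of movies does the user like?", user_id="alice")`.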

Config

Here are the parameters available for configuring Databricks Vector Search:
| Parameter | Description | Default Value |
| --- | --- | --- |
| workspace_url | The URL of your Databricks workspace | Required |
| access_token | Personal Access Token for authentication | None |
| client_id | Service principal client ID (alternative to access_token) | None |
| client_secret | Service principal client secret (required with client_id) | None |
| azure_client_id | Azure AD application client ID (for Azure Databricks) | None |
| azure_client_secret | Azure AD application client secret (for Azure Databricks) | None |
| endpoint_name | Name of the Vector Search endpoint | Required |
| catalog | Unity Catalog catalog name | Required |
| schema | Unity Catalog schema name | Required |
| table_name | Source Delta table name | Required |
| collection_name | Vector search index name | mem0 |
| index_type | Index type: DELTA_SYNC or DIRECT_ACCESS | DELTA_SYNC |
| embedding_model_endpoint_name | Databricks serving endpoint for embeddings | None |
| embedding_dimension | Dimension of self-managed embeddings | 1536 |
| endpoint_type | Type of endpoint (STANDARD or STORAGE_OPTIMIZED) | STANDARD |
| pipeline_type | Sync pipeline type: TRIGGERED or CONTINUOUS | TRIGGERED |
| warehouse_name | Databricks SQL warehouse name (if using SQL warehouse) | None |
| query_type | Query type: ANN or HYBRID | ANN |
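
Because several parameters are required and others accept only a fixed set of values, it can help to validate a config before passing it to `Memory.from_config`. The following is a minimal sketch, not part of mem0: the helper `resolve_config` and the constants `REQUIRED`, `DEFAULTS`, and `ALLOWED` are hypothetical names, with values copied from the parameter list above.

```python
from typing import Any, Dict

# Parameters listed as "Required" above.
REQUIRED = ("workspace_url", "endpoint_name", "catalog", "schema", "table_name")

# Documented defaults for the optional parameters that have one.
DEFAULTS: Dict[str, Any] = {
    "collection_name": "mem0",
    "index_type": "DELTA_SYNC",
    "embedding_dimension": 1536,
    "endpoint_type": "STANDARD",
    "pipeline_type": "TRIGGERED",
    "query_type": "ANN",
}

# Documented choices for the enumerated parameters.
ALLOWED = {
    "index_type": {"DELTA_SYNC", "DIRECT_ACCESS"},
    "endpoint_type": {"STANDARD", "STORAGE_OPTIMIZED"},
    "pipeline_type": {"TRIGGERED", "CONTINUOUS"},
    "query_type": {"ANN", "HYBRID"},
}


def resolve_config(user_config: Dict[str, Any]) -> Dict[str, Any]:
    """Check required keys, apply documented defaults, validate enum values."""
    missing = [key for key in REQUIRED if not user_config.get(key)]
    if missing:
        raise ValueError(f"Missing required parameters: {', '.join(missing)}")
    resolved = {**DEFAULTS, **user_config}  # user values override defaults
    for key, allowed in ALLOWED.items():
        if resolved[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    return resolved
```

The resolved dictionary can then be placed under `config["vector_store"]["config"]`.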

Authentication

Databricks Vector Search supports two authentication methods: a service principal or a personal access token.

Service Principal

config = {
    "vector_store": {
        "provider": "databricks",
        "config": {
            "workspace_url": "https://your-workspace.databricks.com",
            "client_id": "your-service-principal-id",
            "client_secret": "your-service-principal-secret",
            "endpoint_name": "your-endpoint",
            "catalog": "your_catalog",
            "schema": "your_schema",
            "table_name": "your_table",
            "collection_name": "your_index_name",
        }
    }
}

Personal Access Token (for Development)

config = {
    "vector_store": {
        "provider": "databricks",
        "config": {
            "workspace_url": "https://your-workspace.databricks.com",
            "access_token": "your-personal-access-token",
            "endpoint_name": "your-endpoint",
            "catalog": "your_catalog",
            "schema": "your_schema",
            "table_name": "your_table",
            "collection_name": "your_index_name",
        }
    }
}
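
To keep secrets out of source code, the credential fields can be read from environment variables instead of being hard-coded. This is a sketch under stated assumptions: the helper `auth_from_env` is hypothetical, and the variable names `DATABRICKS_CLIENT_ID`, `DATABRICKS_CLIENT_SECRET`, and `DATABRICKS_TOKEN` follow the naming commonly used by Databricks tooling. Nothing here implies that mem0 reads these variables on its own.

```python
import os
from typing import Dict, Mapping, Optional


def auth_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the authentication fields of the vector store config.

    Prefers a service principal when both client ID and secret are set,
    otherwise falls back to a personal access token.
    """
    env = os.environ if env is None else env
    client_id = env.get("DATABRICKS_CLIENT_ID")          # assumed variable name
    client_secret = env.get("DATABRICKS_CLIENT_SECRET")  # assumed variable name
    token = env.get("DATABRICKS_TOKEN")                  # assumed variable name

    if client_id or client_secret:
        # client_secret is required whenever client_id is used.
        if not (client_id and client_secret):
            raise ValueError("client_id and client_secret must be set together")
        return {"client_id": client_id, "client_secret": client_secret}
    if token:
        return {"access_token": token}
    raise ValueError("No Databricks credentials found in the environment")
```

The result can be merged into an existing config with `config["vector_store"]["config"].update(auth_from_env())`.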

Embedding Options

Self-Managed Embeddings (Default)

Use your own embedding model and provide vectors directly:
config = {
    "vector_store": {
        "provider": "databricks",
        "config": {
            # ... authentication config ...
            "embedding_dimension": 768,  # Match your embedding model
        }
    }
}
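
A mismatch between `embedding_dimension` and the length of the vectors your model produces is an easy mistake to make with self-managed embeddings. The sketch below shows a simple guard. The helper name `check_dimension` is hypothetical, and it assumes the vector is available as a plain sequence of floats before it is written to the index.

```python
from typing import Any, Dict, Sequence


def check_dimension(vector: Sequence[float], store_config: Dict[str, Any]) -> None:
    """Raise if the vector length differs from the configured dimension."""
    # 1536 is the documented default for embedding_dimension.
    expected = store_config.get("embedding_dimension", 1536)
    if len(vector) != expected:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, "
            f"but embedding_dimension is {expected}"
        )
```

Calling `check_dimension(vector, config["vector_store"]["config"])` once at startup with a sample embedding surfaces the problem before any data is indexed.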

Databricks-Computed Embeddings

Let Databricks compute embeddings from text using a serving endpoint:
config = {
    "vector_store": {
        "provider": "databricks",
        "config": {
            # ... authentication config ...
            "embedding_model_endpoint_name": "e5-small-v2"
        }
    }
}

Important Notes

  • Index Types: This implementation supports both DELTA_SYNC (auto-syncs with source Delta table) and DIRECT_ACCESS (manage vectors directly) index types.
  • Unity Catalog: The source table and index are created under the specified catalog.schema namespace.
  • Endpoint Auto-Creation: If the specified endpoint doesn’t exist, it will be created automatically.
  • Index Auto-Creation: If the specified index doesn’t exist, it will be created automatically with the provided configuration.
  • Filter Support: Supports filtering by metadata fields, with different syntax for STANDARD vs STORAGE_OPTIMIZED endpoints.
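
The difference in filter syntax noted in the last point can be illustrated with a small conversion helper. This is a sketch based on an assumption that should be checked against the Databricks documentation: STANDARD endpoints accept filters as a dictionary, while STORAGE_OPTIMIZED endpoints accept a SQL-like string. The function names are hypothetical, and only equality filters are covered.

```python
from typing import Any, Dict, Union


def _sql_literal(value: Any) -> str:
    """Render a Python value as a SQL-style literal."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Quote strings and escape embedded single quotes by doubling them.
    return "'" + str(value).replace("'", "''") + "'"


def format_filters(
    filters: Dict[str, Any], endpoint_type: str = "STANDARD"
) -> Union[Dict[str, Any], str]:
    """Convert equality filters to the form assumed for each endpoint type."""
    if endpoint_type == "STANDARD":
        # Assumption: dictionary filters are passed through unchanged.
        return dict(filters)
    if endpoint_type == "STORAGE_OPTIMIZED":
        # Assumption: conditions are joined into one SQL-like string.
        return " AND ".join(
            f"{field} = {_sql_literal(value)}" for field, value in filters.items()
        )
    raise ValueError(f"Unknown endpoint_type: {endpoint_type}")
```

For example, `format_filters({"category": "movies"}, "STORAGE_OPTIMIZED")` yields the string `category = 'movies'`, while the same call with `"STANDARD"` returns the dictionary unchanged.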