Multimodal Support

Multimodal support lets Mem0 extract facts from images alongside regular text. Add screenshots, receipts, or product photos and Mem0 will store the insights as searchable memories so agents can recall them later.

You’ll use this when…

Users share screenshots, menus, or documents and you want the details to become memories.
You already collect text conversations but need visual context for better answers.
You want a single workflow that handles both URLs and local image files.

Images larger than 20 MB are rejected. Compress or resize files before sending them to avoid errors.

Feature anatomy

Vision processing: Mem0 runs the image through a vision model that extracts text and key details.
Memory creation: Extracted information is stored as standard memories so search, filters, and analytics continue to work.
Context linking: Visual and textual turns in the same conversation stay linked, giving agents richer context.
Flexible inputs: Both the Python and JavaScript SDKs accept publicly accessible URLs or base64-encoded local files.

Supported formats

Format	Used for	Notes
JPEG / JPG	Photos and screenshots	Default option for camera captures.
PNG	Images with transparency	Keeps text and UI elements sharp.
WebP	Web-optimized images	Smaller payloads for faster uploads.
GIF	Static or animated graphics	Works for simple graphics and short loops.
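
A quick pre-flight check against the formats in the table can catch unsupported files before any upload happens. The sketch below is illustrative only: `SUPPORTED_EXTENSIONS` and `is_supported_image` are hypothetical names, not part of the Mem0 SDK, and the check looks only at the file extension of a local path, not at the file contents.

```python
from pathlib import Path

# Extensions derived from the supported-formats table; the name is illustrative.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def is_supported_image(path: str) -> bool:
    """Return True when the file extension matches a supported format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_image("photos/menu.JPG"))
print(is_supported_image("scans/contract.tiff"))
```

An extension check is cheap but not authoritative; a renamed file will still pass it, so treat it as a first filter rather than validation.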

Configure it

Add image messages from URLs

from mem0 import Memory

client = Memory()

messages = [
    {"role": "user", "content": "Hi, my name is Alice."},
    {
        "role": "user",
        "content": {
            "type": "image_url",
            "image_url": {
                "url": "https://example.com/menu.jpg"
            }
        }
    }
]

client.add(messages, user_id="alice")

Inspect the response payload: the memories list should include entries extracted from the menu image as well as the text turns.

Upload local images as base64

import base64
from mem0 import Memory

def encode_image(image_path):
    """Read a local image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

client = Memory()
base64_image = encode_image("path/to/your/image.jpg")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{base64_image}"
                }
            }
        ]
    }
]

client.add(messages, user_id="alice")

Keep base64 payloads under 5 MB to speed up uploads and avoid hitting the 20 MB limit.
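
One way to enforce these limits is to check the file size before encoding and to derive the MIME type from the file name instead of hardcoding `image/jpeg`. The following is a minimal sketch under stated assumptions: `to_data_url` and `MAX_IMAGE_BYTES` are hypothetical names, the 20 MB limit is interpreted as 20 × 1024 × 1024 bytes, and the limit is assumed to apply to the raw file rather than the encoded string. Base64 encoding grows the payload by roughly one third, so a file close to the limit produces a noticeably larger string.

```python
import base64
import mimetypes
from pathlib import Path

# Assumption: the documented 20 MB limit, interpreted as 20 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 20 * 1024 * 1024


def to_data_url(image_path: str, max_bytes: int = MAX_IMAGE_BYTES) -> str:
    """Return a base64 data URL for a local image, rejecting oversized files."""
    path = Path(image_path)

    # Check the size on disk first so large files are never read or encoded.
    size = path.stat().st_size
    if size > max_bytes:
        raise ValueError(f"{path.name} is {size} bytes; the limit is {max_bytes}")

    # Guess the MIME type from the file name (for example, .png -> image/png).
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path.name}")

    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used as the `url` value in an `image_url` message, in place of the f-string in the example above.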

See it in action

Restaurant menu memory

from mem0 import Memory

client = Memory()

messages = [
    {
        "role": "user",
        "content": "Help me remember which dishes I liked."
    },
    {
        "role": "user",
        "content": {
            "type": "image_url",
            "image_url": {
                "url": "https://example.com/restaurant-menu.jpg"
            }
        }
    },
    {
        "role": "user",
        "content": "I’m allergic to peanuts and prefer vegetarian meals."
    }
]

result = client.add(messages, user_id="user123")
print(result)

The response should capture both the allergy note and menu items extracted from the photo so future searches can combine them.

Document capture

messages = [
    {
        "role": "user",
        "content": "Store this receipt information for expenses."
    },
    {
        "role": "user",
        "content": {
            "type": "image_url",
            "image_url": {
                "url": "https://example.com/receipt.jpg"
            }
        }
    }
]

client.add(messages, user_id="user123")

Combine the receipt upload with structured metadata (tags, categories) if you need to filter expenses later.
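
As a rough illustration of pairing a receipt image with metadata, the helper below builds the messages and a metadata dictionary together. `build_receipt_request` and the `category` and `tags` keys are hypothetical, chosen only for this sketch. Passing the dictionary through a `metadata` argument of `add` is an assumption to confirm against your SDK version, and whether list values such as `tags` can be filtered depends on the backing store.

```python
def build_receipt_request(image_url: str, category: str, tags: list[str]) -> dict:
    """Bundle a receipt image message with metadata for later filtering."""
    messages = [
        {"role": "user", "content": "Store this receipt information for expenses."},
        {
            "role": "user",
            "content": {"type": "image_url", "image_url": {"url": image_url}},
        },
    ]
    # Hypothetical metadata keys; pick names that match your own filters.
    metadata = {"category": category, "tags": tags}
    return {"messages": messages, "metadata": metadata}


request = build_receipt_request(
    "https://example.com/receipt.jpg", "expenses", ["travel"]
)

# Assumed call shape (not executed here):
# client.add(request["messages"], user_id="user123", metadata=request["metadata"])
print(request["metadata"])
```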

Error handling

from mem0 import Memory
from mem0.exceptions import InvalidImageError, FileSizeError

client = Memory()

try:
    messages = [{
        "role": "user",
        "content": {
            "type": "image_url",
            "image_url": {"url": "https://example.com/image.jpg"}
        }
    }]

    client.add(messages, user_id="user123")
    print("Image processed successfully")

except InvalidImageError:
    print("Invalid image format or corrupted file")
except FileSizeError:
    print("Image file too large")
except Exception as exc:
    print(f"Unexpected error: {exc}")

Fail fast on invalid formats so you can prompt users to re-upload before losing their context.

Verify the feature is working

After calling add, inspect the returned memories and confirm they include image-derived text (menu items, receipt totals, etc.).
Run a follow-up search for a detail from the image; the memory should surface alongside related text.
Monitor image upload latency; large files should still complete within your acceptable response time.
Log file size and URL sources to troubleshoot repeated failures.
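
The first two checks can be scripted with a small helper that scans a response for a detail you expect from the image. This is a sketch only: `memories_mentioning` is a hypothetical name, and the response shape (a dictionary with a `results` list whose items carry a `memory` string, or a bare list of such items) is an assumption that may differ between SDK versions.

```python
def memories_mentioning(response, needle: str) -> list[str]:
    """Return the memory texts that contain `needle`, ignoring case."""
    # Assumed shapes: {"results": [{"memory": "..."}]} or a bare list of such dicts.
    items = response.get("results", []) if isinstance(response, dict) else response
    needle = needle.lower()
    return [
        item["memory"]
        for item in items
        if needle in item.get("memory", "").lower()
    ]


# Stand-in data for illustration; in practice pass the value returned by
# client.add(...) or client.search(...).
sample_response = {
    "results": [
        {"memory": "Allergic to peanuts"},
        {"memory": "Menu includes Pad Thai and green curry"},
    ]
}
print(memories_mentioning(sample_response, "pad thai"))
```

An empty list for a detail that is clearly visible in the image is a signal to check image quality or the vision model configuration.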

Best practices

Ask for intent: Prompt users to explain why they sent an image so the memory includes the right context.
Keep images readable: Encourage clear photos without heavy filters or shadows for better extraction.
Split bulk uploads: Send multiple images as separate add calls to isolate failures and improve reliability.
Watch privacy: Avoid uploading sensitive documents unless your environment is secured for that data.
Validate file size early: Check file size before encoding to save bandwidth and time.
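
The "split bulk uploads" practice can be sketched as a loop that sends one image per call and records each outcome separately. `add_images_separately` is a hypothetical helper. It takes the add call as a function argument (`add_fn`), so the same loop works with any client, and it reuses the message shape from the examples above.

```python
from typing import Callable


def add_images_separately(
    image_urls: list[str], add_fn: Callable[[list[dict]], object]
) -> dict:
    """Send each image in its own call and record per-image outcomes."""
    outcome = {"succeeded": [], "failed": {}}
    for url in image_urls:
        messages = [
            {
                "role": "user",
                "content": {"type": "image_url", "image_url": {"url": url}},
            }
        ]
        try:
            add_fn(messages)
            outcome["succeeded"].append(url)
        except Exception as exc:  # broad on purpose: one bad image must not stop the rest
            outcome["failed"][url] = str(exc)
    return outcome
```

With a real client, the call might look like `add_images_separately(urls, lambda m: client.add(m, user_id="user123"))`. The `failed` mapping then tells you exactly which images to ask the user to re-upload.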

Troubleshooting

Issue	Cause	Fix
Upload rejected	File larger than 20 MB	Compress or resize before sending.
Memory missing image data	Low-quality or blurry image	Retake the photo with better lighting.
Invalid format error	Unsupported file type	Convert to JPEG or PNG first.
Slow processing	High-resolution images	Downscale or compress to under 5 MB.
Base64 errors	Incorrect prefix or encoding	Ensure `data:image/<type>;base64,` is present and the string is valid.
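
For the base64 row in particular, a local check of the prefix and the encoding can rule out the most common mistakes before the request is sent. The sketch below uses only the Python standard library. `is_valid_image_data_url` is a hypothetical name, and the accepted MIME subtypes are taken from the supported formats listed earlier.

```python
import base64
import binascii
import re

# Prefix must be data:image/<type>;base64, followed by a non-empty payload.
DATA_URL_PATTERN = re.compile(
    r"^data:image/(jpeg|png|webp|gif);base64,(.+)$", re.DOTALL
)


def is_valid_image_data_url(value: str) -> bool:
    """Check the data URL prefix and that the payload is strict base64."""
    match = DATA_URL_PATTERN.match(value)
    if match is None:
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(match.group(2), validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


print(is_valid_image_data_url("data:image/png;base64,YWJj"))
print(is_valid_image_data_url("image/png;base64,YWJj"))
```

This confirms only that the string is well formed; it does not verify that the decoded bytes are a readable image.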

Connect Vision Models

Review supported vision-capable models and configuration details.

Build Multimodal Retrieval

Follow an end-to-end workflow pairing text and image memories.
