Enable multimodal support in Mem0 to process images alongside text and extract visual information into memory.
Mem0 supports multimodal data, including images, in addition to text. You can include images in your interactions, and Mem0 extracts relevant information from them to enrich the memory system.
When you provide an image, Mem0 processes it to extract textual information and other relevant details, which are then added to your memory. This lets the system understand and remember details that come from visual inputs.
To enable multimodal support, set enable_vision to True in the llm section of your configuration. The vision_details parameter controls the level of detail in image processing and accepts "auto" (the default), "low", or "high".
from mem0 import Memory

# Enable vision on the LLM and request high-detail image processing
config = {
    "llm": {
        "provider": "openai",
        "config": {
            "enable_vision": True,
            "vision_details": "high"
        }
    }
}

# Passed positionally so the call does not depend on the parameter name
client = Memory.from_config(config)

messages = [
    {
        "role": "user",
        "content": "Hi, my name is Alice."
    },
    {
        "role": "assistant",
        "content": "Nice to meet you, Alice! What do you like to eat?"
    },
    {
        "role": "user",
        "content": {
            "type": "image_url",
            "image_url": {
                "url": "https://www.superhealthykids.com/wp-content/uploads/2021/10/best-veggie-pizza-featured-image-square-2.jpg"
            }
        }
    },
]

# Calling the add method to ingest messages into the memory system
client.add(messages, user_id="alice")
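The configuration above can also be built with a small helper that rejects unsupported vision_details values before they reach Mem0. This is a minimal sketch: build_vision_config and ALLOWED_VISION_DETAILS are hypothetical names introduced here for illustration, not part of the Mem0 API. Only the dictionary layout comes from the example above.

```python
# Values this page lists for vision_details
ALLOWED_VISION_DETAILS = ("auto", "low", "high")


def build_vision_config(vision_details: str = "auto", provider: str = "openai") -> dict:
    """Return a config dict with vision enabled (hypothetical helper, not a Mem0 function)."""
    if vision_details not in ALLOWED_VISION_DETAILS:
        raise ValueError(
            f"vision_details must be one of {ALLOWED_VISION_DETAILS}, got {vision_details!r}"
        )
    return {
        "llm": {
            "provider": provider,
            "config": {
                "enable_vision": True,
                "vision_details": vision_details,
            },
        }
    }


config = build_vision_config("high")
```

The resulting dictionary has the same shape as the hand-written one, so it can be passed to Memory.from_config in the same way.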
Mem0 accepts images in user interactions in two ways: as an image URL, as in the example above, or as a Base64-encoded image embedded in a data URL.
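The two approaches can be sketched as a pair of message builders. The helper names image_url_message and image_file_message are hypothetical and introduced only for illustration. The message layout mirrors the image message shown earlier, and the Base64 variant assumes the image is passed as a data URL in the same image_url field.

```python
import base64
import mimetypes


def image_url_message(url: str) -> dict:
    """Build a user message that points at a hosted image (hypothetical helper)."""
    return {
        "role": "user",
        "content": {"type": "image_url", "image_url": {"url": url}},
    }


def image_file_message(path: str) -> dict:
    """Build a user message that embeds a local image as a Base64 data URL (hypothetical helper)."""
    # Guess the MIME type from the file extension, e.g. image/png or image/jpeg
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognised image file: {path}")
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return image_url_message(f"data:{mime_type};base64,{encoded}")
```

Either helper returns a single message, so a call such as client.add([image_file_message("path/to/your/image.jpg")], user_id="alice") would ingest it in the same way as the earlier example.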
You can also use the OpenAI-compatible format to combine text and images in a single message:
import base64

# Path to the image file
image_path = "path/to/your/image.jpg"

# Encode the image in Base64
with open(image_path, "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

# Create the message using OpenAI-compatible format
message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "What is in this image?",
        },
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
        },
    ],
}

# Add the message to memory
client.add([message], user_id="alice")
This format lets you combine text and images in a single message, which makes it easier to provide context along with visual content.
Together, these methods let you incorporate images into user interactions with your Mem0 instance.