Mem0 with OpenAI Agents SDK for Voice
Integrate memory capabilities into your voice agents using Mem0 and OpenAI Agents SDK
Building Voice Agents with Memory using Mem0 and OpenAI Agents SDK
This guide demonstrates how to combine OpenAI’s Agents SDK for voice applications with Mem0’s memory capabilities to create a voice assistant that remembers user preferences and past interactions.
Prerequisites
Before you begin, make sure you have:
- Installed OpenAI Agents SDK with voice dependencies:
- Installed Mem0 SDK:
- Installed other required dependencies:
- Set up your API keys:
- OpenAI API key for the Agents SDK
- Mem0 API key from the Mem0 Platform
Code Breakdown
Let’s break down the key components of this implementation:
1. Setting Up Dependencies and Environment
This section handles:
- Importing required modules from OpenAI Agents SDK and Mem0
- Setting up environment variables for API keys
- Defining a simple user identification system (using a global variable)
- Initializing the Mem0 client that will handle memory operations
2. Memory Tools with Function Decorators
The @function_tool
decorator transforms Python functions into callable tools for the OpenAI agent. Here are the key memory tools:
Storing User Memories
This function:
- Takes a memory string
- Creates a formatted memory string
- Stores it in Mem0 using the
add()
method - Includes metadata to categorize the memory for easier retrieval
- Returns a confirmation message that the agent will speak
Finding Relevant Memories
This tool:
- Takes a search query string
- Passes it to Mem0’s semantic search to find related memories
- Sets a threshold for relevance to ensure quality results
- Returns a formatted list of relevant memories or a default message
3. Creating the Voice Agent
This function:
- Creates an OpenAI Agent with specific instructions
- Configures it to use gpt-4o (you can use other models)
- Registers the memory-related tools with the agent
- Uses
prompt_with_handoff_instructions
to include standard voice agent behaviors
4. Microphone Recording Functionality
This function:
- Creates a simple asynchronous microphone recording function
- Uses the sounddevice library to capture audio input
- Stores frames in a buffer during recording
- Combines frames into a single numpy array when complete
- Returns the audio data for processing
5. Main Loop and Voice Processing
This main function orchestrates the entire process:
- Creates the memory-enabled voice agent
- Sets up the voice pipeline with TTS settings
- Implements an interactive loop for recording and processing voice input
- Handles streaming of response events (both audio and text)
- Automatically saves the agent’s responses to memory
- Includes proper error handling and exit mechanisms
Create a Memory-Enabled Voice Agent
Now that we’ve explained each component, here’s the complete implementation that combines OpenAI Agents SDK for voice with Mem0’s memory capabilities:
Key Features of This Implementation
This implementation offers several key features:
-
Simplified User Management: Uses a global
USER_ID
variable for simplicity, but can be extended to manage multiple users. -
Real Microphone Input: Includes a
record_from_microphone()
function that captures actual voice input from your microphone. -
Interactive Voice Loop: Implements a continuous interaction loop, allowing for multiple back-and-forth exchanges.
-
Memory Management Tools:
save_memories
: Stores user memories in Mem0search_memories
: Searches for relevant past information
-
Voice Configuration: Demonstrates how to configure TTS settings for the voice response.
Running the Example
To run this example:
- Replace the placeholder API keys with your actual keys
- Make sure your microphone is properly connected
- Run the script with Python 3.8 or newer
- Press Enter to start recording, then speak your request
- Press ‘q’ to quit the application
The agent will listen to your request, process it through the OpenAI model, utilize Mem0 for memory operations as needed, and respond both through text output and voice speech.
Best Practices for Voice Agents with Memory
-
Optimizing Memory for Voice: Keep memories concise and relevant for voice responses.
-
Forgetting Mechanism: Implement a way to delete or expire memories that are no longer relevant.
-
Context Preservation: Store enough context with each memory to make retrieval effective.
-
Error Handling: Implement robust error handling for memory operations, as voice interactions should continue smoothly even if memory operations fail.
Conclusion
By combining OpenAI’s Agents SDK with Mem0’s memory capabilities, you can create voice agents that maintain persistent memory of user preferences and past interactions. This significantly enhances the user experience by making conversations more natural and personalized.
As you build your voice application, experiment with different memory strategies and filtering approaches to find the optimal balance between comprehensive memory and efficient retrieval for your specific use case.
Debugging Function Tools
When working with the OpenAI Agents SDK, you might notice that regular print()
statements inside @function_tool
decorated functions don’t appear in your console output. This is because the Agents SDK captures and redirects standard output when executing these functions.
To effectively debug your function tools, use Python’s logging
module instead: