AI assistants equipped with memory systems face a common problem: they often store everything. Not every conversation needs to be remembered, and not every detail belongs in the memory store. Without proper controls, memory systems accumulate unreliable data.
Mem0 lets you control your memory ingestion pipeline. In this cookbook, we’ll demonstrate these controls with a medical assistant example that shows how to filter unwanted data, enforce data formats, and implement confidence-based storage.
Overview
Without controls, everything gets stored: speculation, low-confidence data, and information that shouldn’t persist. This uncontrolled ingestion leads to cluttered memory and retrieval failures.
Mem0 provides three tools to control what gets stored:
- Custom instructions define what to remember and what to ignore.
- Confidence thresholds ensure only verified facts persist.
- Memory updates let you change information without creating duplicates.
In this tutorial, we will:
- Filter speculative statements with custom instructions
- Configure confidence thresholds for fact verification
- Update stored information without duplication
- Build a complete ingestion pipeline
Setup
from mem0 import MemoryClient
client = MemoryClient(api_key="your-api-key")
Replace your-api-key with your actual Mem0 API key from the dashboard. Without proper API authentication, memory operations will fail.
The Problem
Uncontrolled ingestion stores everything, including speculation:
# Patient mentions speculation
messages = [{"role": "user", "content": "I think I might be allergic to penicillin"}]
client.add(messages, user_id="patient_123")
# Check what got stored
results = client.search("patient allergies", filters={"user_id": "patient_123"})
print(results['results'][0]['memory'])
Output:
Patient is allergic to penicillin
Without custom instructions, the extraction step treats speculation as confirmed fact. “I think I might be allergic” becomes “Patient is allergic”, a dangerous transformation in sensitive domains such as healthcare, legal, or financial services.
Let’s add controls.
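Before relying on server-side controls, it can help to see which inputs are risky in the first place. The following is a minimal sketch of a client-side check, not part of Mem0: the `looks_speculative` helper and its list of hedging phrases are hypothetical, and plain keyword matching is far cruder than the instruction-based filtering described in the next section.

```python
import re

# Hypothetical hedging markers (an assumption, not a Mem0 feature);
# extend this list for your own domain.
HEDGE_PATTERN = re.compile(
    r"\b(i think|might|maybe|possibly|could be|not sure)\b",
    re.IGNORECASE,
)


def looks_speculative(text: str) -> bool:
    """Return True if the text contains one of the hedging phrases."""
    return HEDGE_PATTERN.search(text) is not None


print(looks_speculative("I think I might be allergic to penicillin"))
print(looks_speculative("Dr. Smith diagnosed me with hypertension on March 15th"))
```

A check like this could be used to log or flag messages for review; it is not a replacement for custom instructions.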
Custom Instructions
Custom instructions tell Mem0 what to store and what to ignore.
instructions = """
Only store CONFIRMED medical facts.
Store:
- Confirmed diagnoses from doctors
- Known allergies with documented reactions
- Current medications being taken
Ignore:
- Speculation (words like "might", "maybe", "I think")
- Unverified symptoms
- Casual mentions without confirmation
"""
client.project.update(custom_instructions=instructions)
# Same speculative statement
messages = [{"role": "user", "content": "I think I might be allergic to penicillin"}]
client.add(messages, user_id="patient_123")
# Check what got stored
results = client.get_all(filters={"user_id": "patient_123"})
print(f"Memories stored: {len(results['results'])}")
Output:
Memories stored: 0
No memories were stored: the speculative statement “I think I might be allergic” was filtered out before reaching storage. This count assumes patient_123 has no memories left over from the earlier uncontrolled run; delete those first or use a fresh user_id.
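If you maintain several rule sets, it may be convenient to assemble the instruction text from lists instead of editing one long string. This is a small sketch under that assumption; `build_instructions` is a hypothetical helper, and the only Mem0 call involved is the `client.project.update(custom_instructions=...)` call already shown above.

```python
def build_instructions(headline, store, ignore):
    """Assemble a custom-instructions string from a headline and two rule lists."""
    lines = [headline, "Store:"]
    lines += [f"- {rule}" for rule in store]
    lines.append("Ignore:")
    lines += [f"- {rule}" for rule in ignore]
    return "\n".join(lines)


instructions = build_instructions(
    "Only store CONFIRMED medical facts.",
    store=[
        "Confirmed diagnoses from doctors",
        "Known allergies with documented reactions",
        "Current medications being taken",
    ],
    ignore=[
        'Speculation (words like "might", "maybe", "I think")',
        "Unverified symptoms",
        "Casual mentions without confirmation",
    ],
)
print(instructions)
# The resulting string can then be passed as custom_instructions
# to client.project.update, exactly as in the example above.
```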
Designing Custom Instructions
When designing instructions, consider the trade-off between precision and recall:
Too restrictive: You’ll miss important information (false negatives)
# Too strict - filters out useful context
"""
Only store information if explicitly stated by a doctor with full name,
date, time, and medical license number.
"""
Too permissive: You’ll store unreliable data (false positives)
# Too loose - stores speculation as fact
"""
Store any health-related information mentioned.
"""
Balanced approach:
# Clear categories with examples
"""
Store CONFIRMED facts:
- Diagnoses: "Dr. Smith diagnosed hypertension on March 15th"
- Allergies: "Patient had hives reaction to penicillin"
- Medications: "Taking Lisinopril 10mg daily"
Ignore SPECULATION:
- "I think I might have..."
- "Maybe it's..."
- "Could be related to..."
"""
Start with strict instructions (only store confirmed facts) organized into clear categories, then relax them based on your use case and on retrieval quality. It’s easier to allow more data later than to clean up polluted memory. Test with sample conversations before deploying to production.
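The precision/recall trade-off above can be measured on a handful of labeled sample conversations. The sketch below is one way to do that; `score_filter` and the `too_strict` stand-in are hypothetical. In a real test, `was_stored` would wrap a `client.add` call and report whether `result['results']` came back non-empty.

```python
from typing import Callable, Iterable, Tuple


def score_filter(
    cases: Iterable[Tuple[str, bool]],
    was_stored: Callable[[str], bool],
) -> Tuple[float, float]:
    """Return (precision, recall) for a storage filter over labeled cases.

    Each case is (text, should_store); was_stored reports what the
    pipeline actually did with the text.
    """
    tp = fp = fn = 0
    for text, should_store in cases:
        stored = was_stored(text)
        if stored and should_store:
            tp += 1  # correctly stored
        elif stored and not should_store:
            fp += 1  # unreliable data got in (false positive)
        elif not stored and should_store:
            fn += 1  # useful fact was dropped (false negative)
    # With nothing stored (or nothing to store), report 1.0 by convention.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


cases = [
    ("Dr. Smith diagnosed hypertension on March 15th", True),
    ("Taking Lisinopril 10mg daily", True),
    ("I think I might have diabetes", False),
    ("Maybe it's an allergy", False),
]


def too_strict(text: str) -> bool:
    """Stand-in for an over-strict pipeline: keeps only text naming a doctor."""
    return "Dr." in text


precision, recall = score_filter(cases, too_strict)
print(precision, recall)  # 1.0 0.5
```

Here the over-strict filter stores nothing unreliable (precision 1.0) but misses the medication fact (recall 0.5), which is the "too restrictive" failure mode described above.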
Confidence Thresholds
Mem0 assigns confidence scores to extracted memories. Use these to filter low-quality data.
Setting Thresholds
Setting the right confidence threshold depends on your application:
- High-stakes domains (medical, legal): Require 0.8+ confidence
- General assistants: 0.6+ confidence is often sufficient
- Exploratory systems: Lower thresholds (0.4+) capture more data
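Test your pipeline with multiple input examples and threshold combinations to find what works for your use case. The example below expresses the confidence requirement through custom instructions rather than a numeric setting: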
# Configure stricter instructions
client.project.update(
custom_instructions="""
Only extract memories with HIGH confidence.
Require specific details (dates, dosages, doctor names) for medical facts.
Skip vague or uncertain statements.
"""
)
# Test with uncertain statement
messages = [{"role": "user", "content": "The doctor mentioned something about my blood pressure"}]
result1 = client.add(messages, user_id="patient_123")
# Test with confirmed fact
messages = [{"role": "user", "content": "Dr. Smith diagnosed me with hypertension on March 15th"}]
result2 = client.add(messages, user_id="patient_123")
print("Vague statement stored:", len(result1['results']) > 0)
print("Confirmed fact stored:", len(result2['results']) > 0)
Output:
Vague statement stored: False
Confirmed fact stored: True
The vague statement was filtered out for low confidence, while the confirmed fact with specific details (name, date) was stored. With these instructions, only statements backed by specifics such as names, dates, and dosages persist in memory.
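If your application has numeric scores available, the per-domain thresholds listed under Setting Thresholds could be applied as a simple gate. This is only a sketch: the `confidence` key, the candidate dict shape, the domain labels, and the example scores are all assumptions made for illustration, not Mem0's documented response format.

```python
# Thresholds from the guidance above; the domain labels are illustrative.
DOMAIN_THRESHOLDS = {"high_stakes": 0.8, "general": 0.6, "exploratory": 0.4}


def filter_by_confidence(candidates, domain):
    """Keep candidates whose confidence meets the domain's minimum.

    ASSUMPTION: each candidate is a dict with a numeric "confidence" key.
    This shape is hypothetical and not taken from Mem0's API.
    """
    minimum = DOMAIN_THRESHOLDS[domain]
    return [c for c in candidates if c["confidence"] >= minimum]


# Made-up scores, for illustration only.
candidates = [
    {"memory": "Diagnosed with hypertension by Dr. Smith on March 15th", "confidence": 0.92},
    {"memory": "Doctor mentioned something about blood pressure", "confidence": 0.45},
]

kept = filter_by_confidence(candidates, "high_stakes")
print([c["memory"] for c in kept])
```

With the same candidates, the `exploratory` threshold would keep both entries, which shows how the choice of domain changes what persists.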
Custom instructions can prevent storing personal identifiers:
client.project.update(
custom_instructions="""
Medical memory rules:
STORE:
- Confirmed diagnoses
- Verified allergies
- Current medications
NEVER STORE:
- Social Security Numbers
- Insurance policy numbers
- Credit card information
- Full addresses
- Phone numbers
Replace identifiers with generic references if mentioned.
"""
)
# Test with PII
messages = [
{"role": "user", "content": "My SSN is 123-45-6789 and I'm allergic to penicillin"}
]
client.add(messages, user_id="patient_123")
# Check what was stored
results = client.get_all(filters={"user_id": "patient_123"})
for result in results['results']:
    print(result['memory'])
Output:
Patient is allergic to penicillin
The SSN was filtered out, but the allergy was stored.
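For identifiers with a predictable format, you may also want to redact them before the text ever leaves your application, as a second layer alongside the instructions. A minimal sketch, assuming US Social Security Numbers written as NNN-NN-NNNN; `redact_ssn` and the placeholder text are hypothetical, and other identifier formats would need their own patterns.

```python
import re

# Matches only the common NNN-NN-NNNN layout.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssn(text: str, placeholder: str = "[REDACTED SSN]") -> str:
    """Replace SSN-shaped substrings with a generic placeholder."""
    return SSN_PATTERN.sub(placeholder, text)


message = "My SSN is 123-45-6789 and I'm allergic to penicillin"
print(redact_ssn(message))
# My SSN is [REDACTED SSN] and I'm allergic to penicillin
```

The redacted string can then be sent through `client.add` as usual, so the allergy is still available for extraction while the identifier is not.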
Updating Memories
When information changes, update existing memories instead of creating duplicates.
# Initial allergy stored
result = client.add(
[{"role": "user", "content": "Patient confirmed allergy to penicillin with documented hives reaction"}],
user_id="patient_123"
)
memory_id = result['results'][0]['id']
print(f"Stored memory: {memory_id}")
# Later, patient gets retested - allergy was false positive
client.update(
memory_id=memory_id,
text="Patient tested negative for penicillin allergy on April 2nd, 2025. Previous allergy was false positive.",
metadata={"verified": True, "updated_date": "2025-04-02"}
)
# Retrieve the updated memory
updated = client.get(memory_id)
print(f"\nUpdated memory: {updated['memory']}")
print(f"Metadata: {updated['metadata']}")
Output:
Stored memory: mem_abc123
Updated memory: Patient tested negative for penicillin allergy on April 2nd, 2025. Previous allergy was false positive.
Metadata: {'verified': True, 'updated_date': '2025-04-02'}
Benefits of Updating
Preserves history:
- created_at shows when the memory was first stored
- updated_at shows when it was modified
- Audit trail for compliance
Avoids conflicts:
- No duplicate or contradicting memories
- Single source of truth for each fact
Maintains relationships:
- If using graph memory, connections to other entities persist
Update vs Delete
When should you update vs delete?
Update when:
- Information changes but remains relevant
- You need audit history
- The memory has relationships to other data
# Medication dosage changed (med_id is the ID of the existing medication memory)
client.update(
memory_id=med_id,
text="Taking Lisinopril 20mg daily (increased from 10mg on March 1st)"
)
Delete when:
- Information was completely wrong
- Memory is no longer relevant
- Duplicate entry
# Duplicate entry
client.delete(memory_id)
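The criteria above can be written down as a small decision helper so the choice is applied consistently. This is one possible encoding, not a rule from Mem0: `choose_action` and its flags are hypothetical, and it deliberately prefers updating whenever audit history matters, as in the retested-allergy example.

```python
def choose_action(*, is_duplicate=False, still_relevant=True, needs_audit_history=False):
    """Return "update" or "delete" for an existing memory."""
    if is_duplicate:
        return "delete"  # a duplicate adds nothing, so remove it
    if needs_audit_history:
        return "update"  # keep the record even if the fact was superseded
    return "update" if still_relevant else "delete"


print(choose_action(still_relevant=True))                             # update
print(choose_action(is_duplicate=True))                               # delete
print(choose_action(still_relevant=False, needs_audit_history=True))  # update
```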
Putting It Together
Here’s a complete ingestion pipeline with all controls:
from mem0 import MemoryClient
import os
# Initialize client
client = MemoryClient(api_key=os.getenv("MEM0_API_KEY"))
# Configure custom instructions
client.project.update(
custom_instructions="""
Medical memory assistant rules:
STORE:
- Confirmed diagnoses (with doctor name and date)
- Verified allergies (with reaction details)
- Current medications (with dosage)
IGNORE:
- Speculation (might, maybe, possibly)
- Unverified symptoms
- Personal identifiers (SSN, insurance numbers)
CONFIDENCE:
Require high confidence. Reject vague or uncertain statements.
Require specific details: names, dates, dosages.
"""
)
# Helper function for safe ingestion
def add_medical_memory(content, user_id, metadata=None):
    """Add memory with automatic filtering."""
    result = client.add(
        [{"role": "user", "content": content}],
        user_id=user_id,
        metadata=metadata or {}
    )
    if result['results']:
        print(f"✓ Stored: {result['results'][0]['memory']}")
    else:
        print(f"✗ Filtered: {content}")
    return result
# Test cases
print("Testing ingestion pipeline:\n")
test_cases = [
"I think I might be allergic to penicillin",
"Dr. Johnson confirmed penicillin allergy on Jan 15th with hives reaction",
"Patient SSN is 123-45-6789",
"Currently taking Lisinopril 10mg daily for hypertension",
"Feeling tired lately",
"Dr. Martinez diagnosed Type 2 diabetes on February 3rd, 2025"
]
for content in test_cases:
    add_medical_memory(content, user_id="patient_123")
    print()
Output:
Testing ingestion pipeline:
✗ Filtered: I think I might be allergic to penicillin
✓ Stored: Patient has confirmed penicillin allergy diagnosed by Dr. Johnson on January 15th with hives reaction
✗ Filtered: Patient SSN is 123-45-6789
✓ Stored: Patient is currently taking Lisinopril 10mg daily for hypertension
✗ Filtered: Feeling tired lately
✓ Stored: Patient diagnosed with Type 2 diabetes by Dr. Martinez on February 3rd, 2025
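To exercise pipeline code like this in unit tests without network access or an API key, you could swap in a stand-in client. The sketch below is an assumption-laden illustration: `StubClient` and `summarize` are hypothetical, and the stub imitates filtering with a keyword rule, whereas the real service decides based on the configured custom instructions. The `{"results": [...]}` return shape mirrors how results are read in the pipeline above.

```python
class StubClient:
    """Offline stand-in for MemoryClient.add, for unit tests only."""

    # Crude keyword rule; the real filtering is done by Mem0, not by this list.
    BLOCKED = ("might", "maybe", "ssn")

    def add(self, messages, user_id, metadata=None):
        content = messages[0]["content"]
        if any(word in content.lower() for word in self.BLOCKED):
            return {"results": []}
        return {"results": [{"id": "stub-1", "memory": content}]}


def summarize(client, contents, user_id):
    """Return (stored, filtered) lists based on whether add() returned results."""
    stored, filtered = [], []
    for content in contents:
        result = client.add([{"role": "user", "content": content}], user_id=user_id)
        (stored if result["results"] else filtered).append(content)
    return stored, filtered


stored, filtered = summarize(
    StubClient(),
    [
        "I think I might be allergic to penicillin",
        "Currently taking Lisinopril 10mg daily for hypertension",
    ],
    user_id="patient_123",
)
print("stored:", stored)
print("filtered:", filtered)
```

Passing the real client to `summarize` instead of the stub would give the same stored/filtered split for live data, which makes it easy to compare runs as you adjust instructions.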
Per-Call Instructions
You can override project-level instructions for specific conversations:
# First, define the per-call custom instructions
custom_instructions = """
Emergency intake mode:
Store ALL symptoms and observations immediately.
Flag for later review and verification.
"""
# Emergency intake - store everything temporarily
emergency_messages = [
{"role": "user", "content": "Patient arrived with chest pain and shortness of breath"}
]
client.add(
emergency_messages,
user_id="patient_456",
custom_instructions=custom_instructions,
metadata={"type": "emergency", "review_required": True}
)
This is useful for:
- Different conversation types (emergency vs routine)
- Channel-specific rules (phone vs in-person)
- Temporary data collection that needs review
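One way to organize the cases above is a lookup from conversation type to an optional override. This is a sketch under assumed names: `INSTRUCTIONS_BY_TYPE` and `add_kwargs` are hypothetical, and the returned dictionary is meant to be unpacked into the same `client.add` call shown earlier (which accepts `custom_instructions` and `metadata`).

```python
# Hypothetical per-type overrides; None means "use the project-level instructions".
INSTRUCTIONS_BY_TYPE = {
    "emergency": (
        "Emergency intake mode:\n"
        "Store ALL symptoms and observations immediately.\n"
        "Flag for later review and verification."
    ),
    "routine": None,
}


def add_kwargs(conversation_type: str) -> dict:
    """Build extra keyword arguments for client.add from the conversation type."""
    kwargs = {"metadata": {"type": conversation_type}}
    override = INSTRUCTIONS_BY_TYPE.get(conversation_type)
    if override is not None:
        # Overridden intake is temporary data, so mark it for later review.
        kwargs["custom_instructions"] = override
        kwargs["metadata"]["review_required"] = True
    return kwargs


print(add_kwargs("routine"))
print(sorted(add_kwargs("emergency")))
# Usage with the real client would look like:
# client.add(messages, user_id="patient_456", **add_kwargs("emergency"))
```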
What You Built
You now have a medical assistant with production-grade memory controls:
- Custom instructions - Filter speculation and enforce confirmed facts only
- Confidence thresholds - Reject vague, low-confidence statements (0.8+ is the suggested bar for high-stakes domains such as medical)
- Memory updates - Modify stored information without creating duplicates
- Per-call instructions - Apply temporary rules for specific conversations
- PII filtering - Block sensitive data (SSNs, insurance numbers) automatically
These controls prevent retrieval failures and ensure your AI assistant works with reliable, verified information.
Summary
Start with conservative filters (only store confirmed facts) and iterate based on your application’s needs. Combine custom instructions with confidence thresholds for the most reliable memory ingestion pipeline.