Filter speculation, enforce formats, and gate low-confidence data before it persists.
AI assistants connected to memory systems face a common problem: they often store everything. Not every conversation needs to be remembered, and not every detail belongs in the memory store. Without proper controls, memory systems accumulate unreliable data.
Mem0 lets you control your memory ingestion pipeline. In this cookbook, we’ll demonstrate these controls using a medical assistant example, showing how to filter unwanted data, enforce data formats, and implement confidence-based storage.
Without controls, everything gets stored: speculation, low-confidence data, and information that shouldn’t persist. This uncontrolled ingestion leads to cluttered memory and retrieval failures.
Mem0 provides three tools to control what gets stored:
Custom instructions define what to remember and what to ignore.
Confidence thresholds ensure only verified facts persist.
Memory updates let you change information without creating duplicates.
In this tutorial, we will:
Filter speculative statements with custom instructions
Configure confidence thresholds for fact verification
Uncontrolled ingestion stores everything, including speculation:
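from mem0 import MemoryClient
import os

# Set up the client (expects MEM0_API_KEY in the environment)
client = MemoryClient(api_key=os.getenv("MEM0_API_KEY"))

# Patient mentions speculation
messages = [{"role": "user", "content": "I think I might be allergic to penicillin"}]
client.add(messages, user_id="patient_123")

# Check what got stored
results = client.search("patient allergies", filters={"user_id": "patient_123"})
print(results['results'][0]['memory'])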
Output:
Patient is allergic to penicillin
Without custom instructions, AI assistants treat speculation as confirmed facts. “I think I might be allergic” becomes “Patient is allergic”, a dangerous transformation in sensitive domains such as healthcare, legal, or financial services.
The speculation became a confirmed fact. Let’s add controls.
Custom instructions tell Mem0 what to store and what to ignore.
instructions = """
Only store CONFIRMED medical facts.

Store:
- Confirmed diagnoses from doctors
- Known allergies with documented reactions
- Current medications being taken

Ignore:
- Speculation (words like "might", "maybe", "I think")
- Unverified symptoms
- Casual mentions without confirmation
"""

client.project.update(custom_instructions=instructions)

# Same speculative statement
messages = [{"role": "user", "content": "I think I might be allergic to penicillin"}]
client.add(messages, user_id="patient_123")

# Check what got stored
results = client.get_all(filters={"user_id": "patient_123"})
print(f"Memories stored: {len(results['results'])}")
Output:
Memories stored: 0
Expected output: Zero memories stored. The speculative statement “I think I might be allergic” was filtered out before reaching storage. Custom instructions are actively blocking unreliable data.
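As an optional extra layer, you could also screen messages locally before they ever reach `client.add`. The sketch below is not part of Mem0: `is_speculative`, `filter_messages`, and the hedge-word list are hypothetical helpers, and a simple regular expression will miss many phrasings that the custom instructions would catch.

```python
import re

# Hypothetical hedge-word list (an assumption, not a Mem0 feature);
# tune it for your own domain.
HEDGE_PATTERN = re.compile(
    r"\b(i think|might|maybe|possibly|could be|not sure)\b",
    re.IGNORECASE,
)


def is_speculative(text):
    """Return True when the text contains a hedging phrase."""
    return HEDGE_PATTERN.search(text) is not None


def filter_messages(messages):
    """Keep only the messages whose content does not look speculative."""
    return [m for m in messages if not is_speculative(m["content"])]


messages = [
    {"role": "user", "content": "I think I might be allergic to penicillin"},
    {"role": "user", "content": "Dr. Smith diagnosed hypertension on March 15th"},
]

kept = filter_messages(messages)
print(f"Messages passed on for ingestion: {len(kept)}")
```

A pre-filter like this is cheap and deterministic, which makes it easy to test, but it should complement the custom instructions rather than replace them.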
When designing instructions, consider the trade-off between precision and recall:
Too restrictive: You’ll miss important information (false negatives)
# Too strict - filters out useful context
"""
Only store information if explicitly stated by a doctor with full name,
date, time, and medical license number.
"""
Too permissive: You’ll store unreliable data (false positives)
# Too loose - stores speculation as fact
"""
Store any health-related information mentioned.
"""
Balanced approach:
# Clear categories with examples
"""
Store CONFIRMED facts:
- Diagnoses: "Dr. Smith diagnosed hypertension on March 15th"
- Allergies: "Patient had hives reaction to penicillin"
- Medications: "Taking Lisinopril 10mg daily"

Ignore SPECULATION:
- "I think I might have..."
- "Maybe it's..."
- "Could be related to..."
"""
Start with strict instructions (only store confirmed facts), then relax them based on your use case. It’s easier to allow more data than to clean up polluted memory. Test with sample conversations before deploying to production.
Start with clear categories and iterate based on retrieval quality.
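To make the precision/recall trade-off concrete, you can score a store-or-ignore decision against a handful of labeled sample statements. This is a minimal sketch: `evaluate_filter`, the two toy decision functions, and the labels are all hypothetical stand-ins for your real pipeline, which would call Mem0 and check whether a memory was stored.

```python
def evaluate_filter(should_store, labeled_samples):
    """Return (precision, recall) for a store/ignore decision function.

    should_store: callable(text) -> bool, a stand-in for your pipeline.
    labeled_samples: list of (text, expected_store) pairs.
    """
    tp = fp = fn = 0
    for text, expected in labeled_samples:
        predicted = should_store(text)
        if predicted and expected:
            tp += 1  # stored, and should have been
        elif predicted and not expected:
            fp += 1  # stored unreliable data (false positive)
        elif not predicted and expected:
            fn += 1  # missed important information (false negative)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Hand-labeled samples (illustrative only)
samples = [
    ("Dr. Smith diagnosed hypertension on March 15th", True),
    ("Taking Lisinopril 10mg daily", True),
    ("I think I might have diabetes", False),
    ("Maybe it's the flu", False),
]


def too_permissive(text):
    """Store everything."""
    return True


def too_strict(text):
    """Store only statements that name a doctor."""
    return "Dr." in text


print("permissive:", evaluate_filter(too_permissive, samples))
print("strict:", evaluate_filter(too_strict, samples))
```

On these four samples the permissive rule keeps every confirmed fact but also stores both speculative statements, while the strict rule stores nothing unreliable but drops the medication entry.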
General assistants: 0.6+ confidence is often sufficient
Exploratory systems: Lower thresholds (0.4+) capture more data
Test your pipeline with multiple input examples and threshold combinations to find what works for your use case.
# Configure stricter instructions
client.project.update(
    custom_instructions="""
Only extract memories with HIGH confidence.
Require specific details (dates, dosages, doctor names) for medical facts.
Skip vague or uncertain statements.
"""
)

# Test with uncertain statement
messages = [{"role": "user", "content": "The doctor mentioned something about my blood pressure"}]
result1 = client.add(messages, user_id="patient_123")

# Test with confirmed fact
messages = [{"role": "user", "content": "Dr. Smith diagnosed me with hypertension on March 15th"}]
result2 = client.add(messages, user_id="patient_123")

print("Vague statement stored:", len(result1['results']) > 0)
print("Confirmed fact stored:", len(result2['results']) > 0)
Expected behavior: With the stricter instructions in place, low-confidence extractions are filtered out automatically. Only facts with specific details (names, dates, dosages) persist in memory.
The vague statement was filtered for low confidence. The confirmed fact with specific details was stored.
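If your own extraction or review step attaches a numeric confidence score to each candidate fact, a threshold sweep shows how much data each cutoff keeps. The sketch below is an assumption-heavy illustration: the `confidence` field, the scores, and the `gate` helper are hypothetical and are not returned or provided by Mem0.

```python
# Hypothetical scored candidates; the scores are made up for illustration.
candidates = [
    {"memory": "Diagnosed with hypertension by Dr. Smith on March 15th", "confidence": 0.92},
    {"memory": "Takes Lisinopril 10mg daily", "confidence": 0.81},
    {"memory": "Doctor mentioned something about blood pressure", "confidence": 0.55},
    {"memory": "May be allergic to penicillin", "confidence": 0.35},
]


def gate(items, threshold):
    """Return only the items at or above the confidence threshold."""
    return [item for item in items if item["confidence"] >= threshold]


# Compare an exploratory, a general-purpose, and a stricter cutoff.
for threshold in (0.4, 0.6, 0.8):
    kept = gate(candidates, threshold)
    print(f"threshold={threshold}: kept {len(kept)} of {len(candidates)}")
```

Running the same sample set at several cutoffs makes it easier to see which statements sit near the boundary before you commit to a value.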
Custom instructions can prevent storing personal identifiers:
client.project.update(
    custom_instructions="""
Medical memory rules:

STORE:
- Confirmed diagnoses
- Verified allergies
- Current medications

NEVER STORE:
- Social Security Numbers
- Insurance policy numbers
- Credit card information
- Full addresses
- Phone numbers

Replace identifiers with generic references if mentioned.
"""
)

# Test with PII
messages = [
    {"role": "user", "content": "My SSN is 123-45-6789 and I'm allergic to penicillin"}
]
client.add(messages, user_id="patient_123")

# Check what was stored
results = client.get_all(filters={"user_id": "patient_123"})
for result in results['results']:
    print(result['memory'])
Output:
Patient is allergic to penicillin
The SSN was filtered out, but the allergy was stored.
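For identifiers with a predictable shape, you may prefer to redact them locally so they never leave your application at all. The following is a rough sketch, not a Mem0 feature: `redact_pii` and both patterns are hypothetical, and real PII detection needs far broader coverage than two regular expressions.

```python
import re

# Illustrative patterns only (assumptions): US-style SSNs and 16-digit card numbers.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def redact_pii(text):
    """Replace every matched identifier with a generic placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text


clean = redact_pii("My SSN is 123-45-6789 and I'm allergic to penicillin")
print(clean)
```

The redacted string can then be passed to `client.add` in place of the raw message, with the custom instructions acting as a second line of defense.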
That “no duplicates” promise comes from the inference pipeline. Keep infer=True when you rely on automatic updates. Raw imports (infer=False) skip conflict checks, so mixing the two modes for the same fact will create duplicates.
Maintains relationships: If using graph memory, connections to other entities persist.
infer=True
Runs the LLM pipeline so Mem0 extracts structured facts and resolves conflicts automatically.
Best for: daily conversations, preference tracking, anything you want deduped.
Trade-off: slightly slower because inference runs on every write.
infer=False
Stores your payload exactly as-is, with no inference and no dedupe.
Best for: bulk imports, compliance snapshots, curated facts you already trust.
Trade-off: later infer=True calls for the same fact will create duplicates you must clean manually.
Stay consistent per data source. If you need both behaviors, keep them in separate scopes (e.g., different app_id or run_id) so you always know which memories are inferred vs direct imports.
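One way to stay consistent is to derive the write options from the data source in a single place. This sketch assumes that `client.add` accepts `infer` and `run_id` as keyword arguments, as the text above suggests; the source names, the `run_id` labels, and `build_add_kwargs` itself are hypothetical.

```python
def build_add_kwargs(source, user_id):
    """Return keyword arguments for client.add based on the data source.

    The source names and run_id labels are hypothetical; choose your own.
    """
    if source == "conversation":
        # Inferred memories: extraction and conflict resolution run on write.
        return {"user_id": user_id, "infer": True, "run_id": "inferred"}
    if source == "bulk_import":
        # Direct imports: stored as-is, kept in a separate scope.
        return {"user_id": user_id, "infer": False, "run_id": "direct_import"}
    raise ValueError(f"Unknown source: {source}")


kwargs = build_add_kwargs("bulk_import", "patient_123")
print(kwargs)

# Usage (requires a configured client):
# client.add(messages, **build_add_kwargs("conversation", "patient_123"))
```

Raising on unknown sources forces every new ingestion path to make an explicit choice between the two modes.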
Here’s a complete ingestion pipeline with all controls:
from mem0 import MemoryClient
import os

# Initialize client
client = MemoryClient(api_key=os.getenv("MEM0_API_KEY"))

# Configure custom instructions
client.project.update(
    custom_instructions="""
Medical memory assistant rules:

STORE:
- Confirmed diagnoses (with doctor name and date)
- Verified allergies (with reaction details)
- Current medications (with dosage)

IGNORE:
- Speculation (might, maybe, possibly)
- Unverified symptoms
- Personal identifiers (SSN, insurance numbers)

CONFIDENCE:
Require high confidence. Reject vague or uncertain statements.
Require specific details: names, dates, dosages.
"""
)

# Helper function for safe ingestion
def add_medical_memory(content, user_id, metadata=None):
    """Add memory with automatic filtering."""
    result = client.add(
        [{"role": "user", "content": content}],
        user_id=user_id,
        metadata=metadata or {}
    )

    # An empty result list means nothing was extracted from the message
    if result['results']:
        print(f"✓ Stored: {result['results'][0]['memory']}")
    else:
        print(f"✗ Filtered: {content}")

    return result

# Test cases
print("Testing ingestion pipeline:\n")

test_cases = [
    "I think I might be allergic to penicillin",
    "Dr. Johnson confirmed penicillin allergy on Jan 15th with hives reaction",
    "Patient SSN is 123-45-6789",
    "Currently taking Lisinopril 10mg daily for hypertension",
    "Feeling tired lately",
    "Dr. Martinez diagnosed Type 2 diabetes on February 3rd, 2025"
]

for content in test_cases:
    add_medical_memory(content, user_id="patient_123")
    print()
Output:
Testing ingestion pipeline:

✗ Filtered: I think I might be allergic to penicillin

✓ Stored: Patient has confirmed penicillin allergy diagnosed by Dr. Johnson on January 15th with hives reaction

✗ Filtered: Patient SSN is 123-45-6789

✓ Stored: Patient is currently taking Lisinopril 10mg daily for hypertension

✗ Filtered: Feeling tired lately

✓ Stored: Patient diagnosed with Type 2 diabetes by Dr. Martinez on February 3rd, 2025
Start with conservative filters (only store confirmed facts) and iterate based on your application’s needs. Combine custom instructions with confidence thresholds for the most reliable memory ingestion pipeline.