Control Memory Ingestion

AI assistants connected to memory systems often store everything by default. Not every conversation needs to be remembered, and not every detail belongs in the memory store. Without proper controls, memory systems accumulate unreliable data. Mem0 lets you control your memory ingestion pipeline. In this cookbook, we’ll demonstrate these controls using a medical assistant example, showing how to filter unwanted data, enforce data formats, and implement confidence-based storage.

Overview

Without controls, everything gets stored: speculation, low-confidence data, and information that shouldn’t persist. This uncontrolled ingestion leads to cluttered memory and retrieval failures. Mem0 provides three tools to control what gets stored:

Custom instructions define what to remember and what to ignore.
Confidence thresholds ensure only verified facts persist.
Memory updates let you change information without creating duplicates.

In this tutorial, we will:

Filter speculative statements with custom instructions
Configure confidence thresholds for fact verification
Update stored information without duplication
Build a complete ingestion pipeline

Setup

from mem0 import MemoryClient

client = MemoryClient(api_key="your-api-key")

Replace your-api-key with your actual Mem0 API key from the dashboard. Without proper API authentication, memory operations will fail.

The Problem

Uncontrolled ingestion stores everything, including speculation:

# Patient mentions speculation
messages = [{"role": "user", "content": "I think I might be allergic to penicillin"}]
client.add(messages, user_id="patient_123")

# Check what got stored
results = client.search("patient allergies", filters={"user_id": "patient_123"})
print(results['results'][0]['memory'])

Output:

Patient is allergic to penicillin

Without custom instructions, the assistant treats speculation as confirmed fact: “I think I might be allergic” becomes “Patient is allergic”. That is a dangerous transformation in sensitive domains like healthcare, legal, or financial services.

Let’s add controls.

Custom Instructions

Custom instructions tell Mem0 what to store and what to ignore.

instructions = """
Only store CONFIRMED medical facts.

Store:
- Confirmed diagnoses from doctors
- Known allergies with documented reactions
- Current medications being taken

Ignore:
- Speculation (words like "might", "maybe", "I think")
- Unverified symptoms
- Casual mentions without confirmation
"""

client.project.update(custom_instructions=instructions)

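# Same speculative statement
# (the count below assumes the memory stored in the previous step was removed,
# or that you are using a fresh user_id)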
messages = [{"role": "user", "content": "I think I might be allergic to penicillin"}]
client.add(messages, user_id="patient_123")

# Check what got stored
results = client.get_all(filters={"user_id": "patient_123"})
print(f"Memories stored: {len(results['results'])}")

Output:

Memories stored: 0

Expected output: zero memories stored. The speculative statement “I think I might be allergic” was filtered out before reaching storage, which shows the custom instructions are blocking unreliable data.

Designing Custom Instructions

When designing instructions, consider the trade-off between precision and recall.

Too restrictive: You’ll miss important information (false negatives)

# Too strict - filters out useful context
"""
Only store information if explicitly stated by a doctor with full name,
date, time, and medical license number.
"""

Too permissive: You’ll store unreliable data (false positives)

# Too loose - stores speculation as fact
"""
Store any health-related information mentioned.
"""

Balanced approach:

# Clear categories with examples
"""
Store CONFIRMED facts:
- Diagnoses: "Dr. Smith diagnosed hypertension on March 15th"
- Allergies: "Patient had hives reaction to penicillin"
- Medications: "Taking Lisinopril 10mg daily"

Ignore SPECULATION:
- "I think I might have..."
- "Maybe it's..."
- "Could be related to..."
"""

Start with strict instructions (only store confirmed facts) organized into clear categories, then relax them based on your use case. It’s easier to allow more data than to clean up polluted memory. Test with sample conversations before deploying to production, and iterate based on retrieval quality.
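One way to test instruction variants against sample conversations is to score them on a small labeled set. The sketch below is illustrative, not part of Mem0: `score_instructions`, `SAMPLES`, and the keyword-based `fake_was_stored` stand-in are hypothetical names. In practice you would pass your own callable, for example a wrapper that calls `client.add` and reports whether `result['results']` is non-empty.

```python
from typing import Callable, Iterable, Tuple


def score_instructions(
    samples: Iterable[Tuple[str, bool]],
    was_stored: Callable[[str], bool],
) -> Tuple[float, float]:
    """Return (precision, recall) over (message, should_store) samples."""
    tp = fp = fn = 0
    for message, should_store in samples:
        stored = was_stored(message)
        if stored and should_store:
            tp += 1          # correctly stored
        elif stored and not should_store:
            fp += 1          # stored unreliable data (too permissive)
        elif not stored and should_store:
            fn += 1          # missed important data (too restrictive)
    # Both metrics are defined as 1.0 when their denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Hypothetical labeled samples: (message, should this be stored?)
SAMPLES = [
    ("I think I might be allergic to penicillin", False),
    ("Dr. Smith diagnosed hypertension on March 15th", True),
    ("Taking Lisinopril 10mg daily", True),
    ("Maybe it's related to stress", False),
]


def fake_was_stored(message: str) -> bool:
    """Offline stand-in for the real pipeline: rejects hedged statements."""
    lowered = message.lower()
    return not any(hedge in lowered for hedge in ("i think", "might", "maybe"))


precision, recall = score_instructions(SAMPLES, fake_was_stored)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this sample set, a stand-in that stores everything drops precision to 0.5, and one that stores nothing drops recall to 0.0, which mirrors the too-permissive and too-restrictive cases above.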

Confidence Thresholds

Mem0 assigns confidence scores to extracted memories. Use these to filter low-quality data.

Setting Thresholds

Setting the right confidence threshold depends on your application:

High-stakes domains (medical, legal): Require 0.8+ confidence
General assistants: 0.6+ confidence is often sufficient
Exploratory systems: Lower thresholds (0.4+) capture more data

Test your pipeline with multiple input examples and threshold combinations to find what works for your use case.

# Configure stricter instructions
client.project.update(
    custom_instructions="""
Only extract memories with HIGH confidence.
Require specific details (dates, dosages, doctor names) for medical facts.
Skip vague or uncertain statements.
"""
)

# Test with uncertain statement
messages = [{"role": "user", "content": "The doctor mentioned something about my blood pressure"}]
result1 = client.add(messages, user_id="patient_123")

# Test with confirmed fact
messages = [{"role": "user", "content": "Dr. Smith diagnosed me with hypertension on March 15th"}]
result2 = client.add(messages, user_id="patient_123")

print("Vague statement stored:", len(result1['results']) > 0)
print("Confirmed fact stored:", len(result2['results']) > 0)

Output:

Vague statement stored: False
Confirmed fact stored: True

Expected behavior: the vague statement was filtered out for low confidence, while the confirmed fact with specific details was stored. Only verified facts with specific details (names, dates, dosages) persist in memory.
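The tiers listed under Setting Thresholds can also be sketched as a small gate applied to scored candidates. This rests on assumptions: the `confidence` values are hypothetical inputs produced by your own scoring step (the examples above do not show Mem0 returning such a field), and `DOMAIN_THRESHOLDS` and `gate_by_confidence` are illustrative names rather than Mem0 APIs.

```python
from typing import Dict, List

# Threshold tiers taken from the list above.
DOMAIN_THRESHOLDS: Dict[str, float] = {
    "high_stakes": 0.8,   # medical, legal
    "general": 0.6,       # general assistants
    "exploratory": 0.4,   # exploratory systems
}


def gate_by_confidence(candidates: List[dict], domain: str) -> List[dict]:
    """Keep only candidates whose confidence meets the domain threshold."""
    threshold = DOMAIN_THRESHOLDS[domain]
    return [c for c in candidates if c["confidence"] >= threshold]


# Hypothetical candidates with scores assigned by your own pipeline.
candidates = [
    {"text": "Dr. Smith diagnosed hypertension on March 15th", "confidence": 0.92},
    {"text": "Taking some blood pressure medication", "confidence": 0.65},
    {"text": "The doctor mentioned something about blood pressure", "confidence": 0.45},
]

# Sweep every tier to see how much each threshold lets through.
for domain in DOMAIN_THRESHOLDS:
    kept = gate_by_confidence(candidates, domain)
    print(f"{domain}: kept {len(kept)} of {len(candidates)}")
```

Sweeping the tiers over the same inputs makes the trade-off visible: the stricter the threshold, the fewer candidates survive.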

Filtering Sensitive Information

Custom instructions can prevent storing personal identifiers:

client.project.update(
    custom_instructions="""
Medical memory rules:

STORE:
- Confirmed diagnoses
- Verified allergies
- Current medications

NEVER STORE:
- Social Security Numbers
- Insurance policy numbers
- Credit card information
- Full addresses
- Phone numbers

Replace identifiers with generic references if mentioned.
"""
)

# Test with PII
messages = [
    {"role": "user", "content": "My SSN is 123-45-6789 and I'm allergic to penicillin"}
]
client.add(messages, user_id="patient_123")

# Check what was stored
results = client.get_all(filters={"user_id": "patient_123"})
for result in results['results']:
    print(result['memory'])

Output:

Patient is allergic to penicillin

The SSN was filtered out, but the allergy was stored.
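Instruction-based filtering depends on the model following the rules, so some teams also redact obvious identifiers before the text leaves their application. The sketch below is a complementary client-side step, not a Mem0 feature. The patterns cover only US-style SSNs and phone numbers, and `redact_identifiers` is a hypothetical helper name.

```python
import re

# Simple patterns for US-style identifiers; real deployments need broader coverage.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_PATTERN = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")


def redact_identifiers(text: str) -> str:
    """Replace SSNs and phone numbers with generic placeholders."""
    text = SSN_PATTERN.sub("[REDACTED_SSN]", text)
    text = PHONE_PATTERN.sub("[REDACTED_PHONE]", text)
    return text


message = "My SSN is 123-45-6789 and I'm allergic to penicillin"
print(redact_identifiers(message))
# My SSN is [REDACTED_SSN] and I'm allergic to penicillin
```

The redacted string can then be passed as the message content, so the identifier never reaches the ingestion pipeline even if an instruction is missed.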

Updating Memories

When information changes, update existing memories instead of creating duplicates.

# Initial allergy stored
result = client.add(
    [{"role": "user", "content": "Patient confirmed allergy to penicillin with documented hives reaction"}],
    user_id="patient_123"
)

memory_id = result['results'][0]['id']
print(f"Stored memory: {memory_id}")

# Later, patient gets retested - allergy was false positive
client.update(
    memory_id=memory_id,
    text="Patient tested negative for penicillin allergy on April 2nd, 2025. Previous allergy was false positive.",
    metadata={"verified": True, "updated_date": "2025-04-02"}
)

# Retrieve the updated memory
updated = client.get(memory_id)
print(f"\nUpdated memory: {updated['memory']}")
print(f"Metadata: {updated['metadata']}")

Output:

Stored memory: mem_abc123

Updated memory: Patient tested negative for penicillin allergy on April 2nd, 2025. Previous allergy was false positive.
Metadata: {'verified': True, 'updated_date': '2025-04-02'}

Benefits of Updating

Preserves history:

created_at shows when the memory was first stored
updated_at shows when it was modified
Audit trail for compliance

Avoids conflicts:

No duplicate or contradicting memories
Single source of truth for each fact

That “no duplicates” promise comes from the inference pipeline. Keep infer=True when you rely on automatic updates. Raw imports (infer=False) skip conflict checks, so mixing the two modes for the same fact will create duplicates.

Maintains relationships:

If using graph memory, connections to other entities persist

Pick the right inference mode

Mode	What it does	Best for	Watch out for
`infer=True` (default)	Runs the LLM pipeline so Mem0 extracts structured facts and resolves conflicts automatically.	Daily conversations, preference tracking, anything you want deduped.	Slightly slower because inference runs on every write.
`infer=False`	Stores your payload exactly as-is—no inference, no dedupe.	Bulk imports, compliance snapshots, curated facts you already trust.	Later `infer=True` calls for the same fact will create duplicates you must clean manually.

Stay consistent per data source. If you need both behaviors, keep them in separate scopes (e.g., different app_id or run_id) so you always know which memories are inferred vs direct imports.
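One way to stay consistent is to derive the write arguments from the data source in a single place. The sketch below assumes two hypothetical source labels (`"conversation"` and `"bulk_import"`) and hypothetical `run_id` values. The helper `add_kwargs_for_source` is illustrative and only builds a dictionary; it does not call Mem0.

```python
from typing import Any, Dict


def add_kwargs_for_source(source: str, user_id: str) -> Dict[str, Any]:
    """Build keyword arguments for an add call based on the data source."""
    if source == "conversation":
        # Inferred memories: extraction and conflict resolution are on.
        return {"user_id": user_id, "infer": True, "run_id": "inferred"}
    if source == "bulk_import":
        # Direct imports: stored as-is, kept in a separate scope.
        return {"user_id": user_id, "infer": False, "run_id": "direct-import"}
    raise ValueError(f"Unknown source: {source!r}")


kwargs = add_kwargs_for_source("bulk_import", "patient_123")
print(kwargs)

# Intended use (assumes a configured client, as in Setup):
# client.add(messages, **kwargs)
```

Because every write goes through one function, a given source can never switch inference modes by accident.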

Update vs Delete

When should you update vs delete?

Update when:

Information changes but remains relevant
You need audit history
The memory has relationships to other data

# Medication dosage changed (med_id is the ID of the stored medication memory)
client.update(
    memory_id=med_id,
    text="Taking Lisinopril 20mg daily (increased from 10mg on March 1st)"
)

Delete when:

Information was completely wrong
Memory is no longer relevant
Duplicate entry

# Duplicate entry
client.delete(memory_id)

Putting It Together

Here’s a complete ingestion pipeline with all controls:

from mem0 import MemoryClient
import os

# Initialize client
client = MemoryClient(api_key=os.getenv("MEM0_API_KEY"))

# Configure custom instructions
client.project.update(
    custom_instructions="""
Medical memory assistant rules:

STORE:
- Confirmed diagnoses (with doctor name and date)
- Verified allergies (with reaction details)
- Current medications (with dosage)

IGNORE:
- Speculation (might, maybe, possibly)
- Unverified symptoms
- Personal identifiers (SSN, insurance numbers)

CONFIDENCE:
Require high confidence. Reject vague or uncertain statements.
Require specific details: names, dates, dosages.
"""
)

# Helper function for safe ingestion
def add_medical_memory(content, user_id, metadata=None):
    """Add memory with automatic filtering."""
    result = client.add(
        [{"role": "user", "content": content}],
        user_id=user_id,
        metadata=metadata or {}
    )

    if result['results']:
        print(f"✓ Stored: {result['results'][0]['memory']}")
    else:
        print(f"✗ Filtered: {content}")

    return result

# Test cases
print("Testing ingestion pipeline:\n")

test_cases = [
    "I think I might be allergic to penicillin",
    "Dr. Johnson confirmed penicillin allergy on Jan 15th with hives reaction",
    "Patient SSN is 123-45-6789",
    "Currently taking Lisinopril 10mg daily for hypertension",
    "Feeling tired lately",
    "Dr. Martinez diagnosed Type 2 diabetes on February 3rd, 2025"
]

for content in test_cases:
    add_medical_memory(content, user_id="patient_123")
    print()

Output:

Testing ingestion pipeline:

✗ Filtered: I think I might be allergic to penicillin

✓ Stored: Patient has confirmed penicillin allergy diagnosed by Dr. Johnson on January 15th with hives reaction

✗ Filtered: Patient SSN is 123-45-6789

✓ Stored: Patient is currently taking Lisinopril 10mg daily for hypertension

✗ Filtered: Feeling tired lately

✓ Stored: Patient diagnosed with Type 2 diabetes by Dr. Martinez on February 3rd, 2025

Per-Call Instructions

You can override project-level instructions for specific conversations. First, define the custom instructions:

custom_instructions = """
Emergency intake mode:
Store ALL symptoms and observations immediately.
Flag for later review and verification.
"""

# Emergency intake - store everything temporarily
emergency_messages = [
    {"role": "user", "content": "Patient arrived with chest pain and shortness of breath"}
]

client.add(
    emergency_messages,
    user_id="patient_456",
    custom_instructions=custom_instructions,
    metadata={"type": "emergency", "review_required": True}
)

This is useful for:

Different conversation types (emergency vs routine)
Channel-specific rules (phone vs in-person)
Temporary data collection that needs review
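The cases above can be handled by looking up the override from the conversation type. This is a minimal sketch under assumptions: the mode names, the `INSTRUCTIONS_BY_MODE` table, and the `per_call_kwargs` helper are hypothetical, and the function only assembles arguments in the same shape as the `client.add` call shown earlier.

```python
from typing import Any, Dict

# Per-mode overrides; modes without an entry use the project-level instructions.
INSTRUCTIONS_BY_MODE: Dict[str, str] = {
    "emergency": (
        "Emergency intake mode:\n"
        "Store ALL symptoms and observations immediately.\n"
        "Flag for later review and verification."
    ),
}


def per_call_kwargs(mode: str, user_id: str) -> Dict[str, Any]:
    """Build add-call arguments, attaching an override only when one exists."""
    kwargs: Dict[str, Any] = {"user_id": user_id, "metadata": {"type": mode}}
    override = INSTRUCTIONS_BY_MODE.get(mode)
    if override is not None:
        kwargs["custom_instructions"] = override
        # Data captured under a temporary rule is marked for review.
        kwargs["metadata"]["review_required"] = True
    return kwargs


print(per_call_kwargs("emergency", "patient_456")["metadata"])
print("custom_instructions" in per_call_kwargs("routine", "patient_456"))

# Intended use (assumes a configured client, as in Setup):
# client.add(emergency_messages, **per_call_kwargs("emergency", "patient_456"))
```

Routine conversations get no override and fall back to the project-level rules, so the stricter defaults remain the norm.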

What You Built

You now have a medical assistant with production-grade memory controls:

Custom instructions - Filter speculation and enforce confirmed facts only
Confidence thresholds - Filter out vague, low-confidence extractions (0.8+ is the suggested bar for high-stakes domains like this one)
Memory updates - Modify stored information without creating duplicates
Per-call instructions - Apply temporary rules for specific conversations
PII filtering - Block sensitive data (SSNs, insurance numbers) automatically

These controls prevent retrieval failures and ensure your AI assistant works with reliable, verified information.

Summary

Start with conservative filters (only store confirmed facts) and iterate based on your application’s needs. Combine custom instructions with confidence thresholds for the most reliable memory ingestion pipeline.

