Episodic Memory

Episodic memory automatically manages conversation history to stay within LLM token limits while preserving important information through intelligent summarization. This system handles context compaction transparently, ensuring conversations remain coherent without manual intervention.

Overview

Large Language Models have fixed context windows (e.g., 8K, 32K, 128K tokens). Long conversations can exceed these limits, causing truncation or errors. Episodic memory solves this by:

  1. Monitoring conversation length
  2. Summarizing older exchanges when thresholds are reached
  3. Keeping recent messages in full detail
  4. Storing summaries as “episodic memory” for continuity

Configuration

Episodic memory is controlled by parameters in config.csv:

name,value
episodic-memory-enabled,true
episodic-memory-threshold,4
episodic-memory-history,2
episodic-memory-model,fast
episodic-memory-max-episodes,100
episodic-memory-retention-days,365
episodic-memory-auto-summarize,true
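
For quick sanity-checking, here is one way these rows could be parsed into typed settings. This sketch is illustrative only: the EpisodicMemoryConfig class and load_config helper are hypothetical names, not part of the actual system.

import csv
from dataclasses import dataclass

@dataclass
class EpisodicMemoryConfig:
    # Fields mirror the episodic-memory-* parameters in config.csv.
    enabled: bool = True
    threshold: int = 4
    history: int = 2
    model: str = "fast"
    max_episodes: int = 100
    retention_days: int = 365
    auto_summarize: bool = True

def load_config(path: str = "config.csv") -> EpisodicMemoryConfig:
    """Read name,value rows and coerce each value to its expected type."""
    cfg = EpisodicMemoryConfig()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = row["name"].removeprefix("episodic-memory-").replace("-", "_")
            if not hasattr(cfg, key):
                continue  # ignore parameters for other subsystems
            default = getattr(cfg, key)
            if isinstance(default, bool):  # check bool before int: bool is an int subtype
                setattr(cfg, key, row["value"].strip().lower() == "true")
            elif isinstance(default, int):
                setattr(cfg, key, int(row["value"]))
            else:
                setattr(cfg, key, row["value"].strip())
    return cfg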

Parameter Reference

Parameter | Default | Type | Description
episodic-memory-enabled | true | Boolean | Enable/disable the episodic memory system
episodic-memory-threshold | 4 | Integer | Number of exchanges before compaction triggers
episodic-memory-history | 2 | Integer | Recent exchanges to keep in full detail
episodic-memory-model | fast | String | Model for generating summaries (fast, quality, or a specific model name)
episodic-memory-max-episodes | 100 | Integer | Maximum episode summaries per user
episodic-memory-retention-days | 365 | Integer | Days to retain episode summaries
episodic-memory-auto-summarize | true | Boolean | Automatically summarize when the threshold is reached

How It Works

Context Compaction Process

  1. Monitor: The system tracks the message count since the last summary
  2. Trigger: When the count reaches episodic-memory-threshold, compaction starts
  3. Summarize: Older messages are summarized using the configured LLM
  4. Preserve: The last episodic-memory-history exchanges remain in full
  5. Store: The summary is saved with role “episodic” for future context
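
A minimal sketch of this loop, assuming compaction fires once the number of unsummarized exchanges exceeds the threshold; the Session shape and the summarize stub are assumptions for illustration, with the stub standing in for a call to the configured model:

from dataclasses import dataclass, field

@dataclass
class Session:
    messages: list = field(default_factory=list)  # unsummarized exchanges, oldest first
    episodic_summary: str = ""                    # rolling summary of compacted exchanges

def summarize(text: str, model: str = "fast") -> str:
    # Stand-in for a call to the configured summarization model.
    return "Summary: " + text[:200]

def maybe_compact(session: Session, threshold: int = 4, history: int = 2) -> None:
    # 1-2. Monitor and trigger: compact once unsummarized exchanges exceed the threshold.
    #      A threshold of 0 disables compaction entirely, matching the tuning section below.
    if threshold <= 0 or len(session.messages) <= threshold:
        return
    # 3. Summarize older exchanges, folding any previous summary into the new one.
    cut = len(session.messages) - max(history, 0)
    older, recent = session.messages[:cut], session.messages[cut:]
    session.episodic_summary = summarize(session.episodic_summary + "\n" + "\n".join(older))
    # 4. Preserve: only the last `history` exchanges stay in full detail.
    session.messages = recent
    # 5. Store: the summary is attached to the context under the "episodic" role.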

Example Timeline

With defaults (episodic-memory-threshold=4, episodic-memory-history=2):

Exchange | Action | Context State
1-2 | Normal | Messages 1-2 in full
3-4 | Normal | Messages 1-4 in full
5 | Compaction | Summary of 1-2 + messages 3-5 in full
6-7 | Normal | Summary + messages 3-7 in full
8 | Compaction | Summary of 1-5 + messages 6-8 in full

Automatic Behavior

The system automatically:

  1. Tracks conversation length
  2. Triggers compaction when exchanges exceed episodic-memory-threshold
  3. Summarizes older messages using the configured LLM
  4. Keeps only the last episodic-memory-history exchanges in full
  5. Stores the summary as an “episodic memory” for future context

A background scheduler runs every 60 seconds, checking all active sessions and compacting those that exceed the threshold.
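
That periodic sweep could look roughly like this, reusing EpisodicMemoryConfig and maybe_compact from the sketches above; the in-memory sessions dict is a hypothetical stand-in for the real session store:

import time

def compaction_scheduler(sessions: dict, cfg: EpisodicMemoryConfig) -> None:
    """Every 60 seconds, compact any active session past the threshold."""
    while True:
        if cfg.enabled and cfg.auto_summarize:
            for session in sessions.values():
                maybe_compact(session, cfg.threshold, cfg.history)
        time.sleep(60)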

Tuning Guidelines

High-Context Conversations

For complex discussions requiring more history:

name,value
episodic-memory-history,5
episodic-memory-threshold,10

Token-Constrained Environments

For smaller context windows or cost optimization:

name,value
episodic-memory-history,1
episodic-memory-threshold,2

Disable Compaction

Set threshold to 0 to disable automatic compaction:

name,value
episodic-memory-threshold,0

Extended Retention

For long-term memory across sessions:

name,value
episodic-memory-max-episodes,500
episodic-memory-retention-days,730

Use Case Recommendations

Use Case | History | Threshold | Rationale
FAQ Bot | 1 | 2 | Questions are independent
Customer Support | 2 | 4 | Some context needed
Technical Discussion | 4 | 8 | Complex topics require history
Therapy/Coaching | 5 | 10 | Continuity is critical
Long-term Assistant | 3 | 6 | Balance memory and context

Token Savings

Compaction significantly reduces token usage:

Scenario | Without Compaction | With Compaction | Savings
10 exchanges | ~5,000 tokens | ~2,000 tokens | 60%
20 exchanges | ~10,000 tokens | ~3,000 tokens | 70%
50 exchanges | ~25,000 tokens | ~5,000 tokens | 80%

Actual savings depend on message length and summary quality.
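
As a back-of-the-envelope model, with compaction the active context is roughly one summary plus the retained full exchanges. The per-exchange and summary sizes below are assumptions chosen to match the table's first row:

def estimated_context_tokens(exchanges: int, tokens_per_exchange: int = 500,
                             kept_in_full: int = 3, summary_tokens: int = 500) -> int:
    # Rough model: a fixed-size summary plus the most recent full exchanges.
    if exchanges <= kept_in_full:
        return exchanges * tokens_per_exchange
    return summary_tokens + kept_in_full * tokens_per_exchange

# 10 exchanges: without compaction ~ 10 * 500 = 5,000 tokens;
# with compaction ~ 500 + 3 * 500 = 2,000 tokens (60% savings).
# In practice the summary itself grows slowly with conversation length,
# which is why the table's with-compaction figures rise for longer chats.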

Summary Storage

Summaries are stored with special role identifiers:

  • Role episodic or compact marks summary messages
  • Summaries include key points from compacted exchanges
  • Original messages are not deleted, just excluded from active context
  • Episodes are searchable for context retrieval across sessions
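
Putting this together, a stored episode might look like the following record; every field name here is an assumption inferred from the behavior described above, not a documented schema:

# Illustrative shape of a stored episode record; the actual schema may differ.
episode = {
    "role": "episodic",                    # or "compact"; marks a summary message
    "user_id": "user-123",                 # hypothetical: episodes are capped per user
    "summary": "Key points from exchanges 1-5: ...",
    "covers_exchanges": [1, 5],            # hypothetical: range of compacted exchanges
    "created_at": "2025-01-01T00:00:00Z",  # hypothetical: used for retention-days expiry
}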

Benefits

  • Automatic management - No manual intervention needed
  • Token efficiency - Stay within model context limits
  • Context preservation - Important information kept via summaries
  • Relevant context - Recent exchanges kept in full detail
  • Cost savings - Fewer tokens = lower API costs
  • Long-term memory - Episode storage enables recall across sessions

Interaction with Caching

Episodic memory works alongside semantic caching:

  • Caching: Reuses responses for similar queries (see Semantic Caching)
  • Episodic Memory: Manages conversation length over time

Both features reduce costs and improve performance independently.

Best Practices

  1. Start with defaults - Work well for most use cases
  2. Monitor token usage - Adjust if hitting context limits
  3. Consider conversation type - Support vs complex discussion
  4. Test different values - Find optimal balance for your users
  5. Set retention appropriately - Balance memory vs privacy requirements

Troubleshooting

Issue | Cause | Solution
Context too long | Threshold too high | Lower episodic-memory-threshold
Lost context | History too low | Increase episodic-memory-history
Summaries missing info | Model limitations | Use quality instead of fast
No compaction occurring | Threshold is 0 or feature disabled | Set a positive threshold and enable the feature
Old episodes not deleted | Retention too long | Lower episodic-memory-retention-days

See Also