# Episodic Memory
Episodic memory automatically manages conversation history to stay within LLM token limits while preserving important information through intelligent summarization. This system handles context compaction transparently, ensuring conversations remain coherent without manual intervention.
## Overview
Large Language Models have fixed context windows (e.g., 8K, 32K, 128K tokens). Long conversations can exceed these limits, causing truncation or errors. Episodic memory solves this by:
- Monitoring conversation length
- Summarizing older exchanges when thresholds are reached
- Keeping recent messages in full detail
- Storing summaries as “episodic memory” for continuity
## Configuration

Episodic memory is controlled by parameters in `config.csv`:

```csv
name,value
episodic-memory-enabled,true
episodic-memory-threshold,4
episodic-memory-history,2
episodic-memory-model,fast
episodic-memory-max-episodes,100
episodic-memory-retention-days,365
episodic-memory-auto-summarize,true
```
### Parameter Reference

| Parameter | Default | Type | Description |
|---|---|---|---|
| `episodic-memory-enabled` | `true` | Boolean | Enable/disable episodic memory system |
| `episodic-memory-threshold` | `4` | Integer | Number of exchanges before compaction triggers |
| `episodic-memory-history` | `2` | Integer | Recent exchanges to keep in full detail |
| `episodic-memory-model` | `fast` | String | Model for generating summaries (`fast`, `quality`, or a model name) |
| `episodic-memory-max-episodes` | `100` | Integer | Maximum episode summaries per user |
| `episodic-memory-retention-days` | `365` | Integer | Days to retain episode summaries |
| `episodic-memory-auto-summarize` | `true` | Boolean | Automatically summarize when threshold reached |
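To make the file format and the types above concrete, a loader might look like the sketch below. `EpisodicConfig` and `load_episodic_config` are illustrative names, not part of the product; only the parameter names, types, and defaults come from the table.

```python
import csv
from dataclasses import dataclass

@dataclass
class EpisodicConfig:
    # Defaults mirror the parameter reference table above.
    enabled: bool = True
    threshold: int = 4
    history: int = 2
    model: str = "fast"
    max_episodes: int = 100
    retention_days: int = 365
    auto_summarize: bool = True

# Maps config.csv keys to (attribute, type-coercion) pairs.
_FIELDS = {
    "episodic-memory-enabled": ("enabled", lambda v: v.lower() == "true"),
    "episodic-memory-threshold": ("threshold", int),
    "episodic-memory-history": ("history", int),
    "episodic-memory-model": ("model", str),
    "episodic-memory-max-episodes": ("max_episodes", int),
    "episodic-memory-retention-days": ("retention_days", int),
    "episodic-memory-auto-summarize": ("auto_summarize", lambda v: v.lower() == "true"),
}

def load_episodic_config(path="config.csv"):
    """Read name,value rows and coerce episodic-memory-* values to their types."""
    cfg = EpisodicConfig()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):  # expects a name,value header row
            if row["name"] in _FIELDS:
                attr, cast = _FIELDS[row["name"]]
                setattr(cfg, attr, cast(row["value"]))
    return cfg
```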
## How It Works

### Context Compaction Process

- **Monitor**: The system tracks the message count since the last summary
- **Trigger**: When the count reaches `episodic-memory-threshold`, compaction starts
- **Summarize**: Older messages are summarized using the configured LLM
- **Preserve**: The last `episodic-memory-history` exchanges remain in full
- **Store**: The summary is saved with the role `episodic` for future context
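The flow can be sketched in a few lines of Python. This is illustrative only: the message shape, the `summarize()` callable, and the exact trigger boundary (for example, whether the triggering exchange itself is kept in full) are assumptions, not the product's actual API.

```python
def compact(messages, threshold=4, history=2, summarize=None):
    """Fold older exchanges into a rolling summary once `threshold` exchanges
    have accumulated, keeping the last `history` exchanges verbatim.
    Assumes history >= 1 and messages as dicts with "role" and "content"."""
    # Find any existing summary so we only count exchanges after it.
    last_summary = max((i for i, m in enumerate(messages)
                        if m["role"] == "episodic"), default=-1)
    recent = messages[last_summary + 1:]

    if len(recent) < threshold:
        return messages  # below threshold: nothing to do

    prior = messages[:last_summary + 1]   # earlier summary, if any
    to_summarize = recent[:-history]      # older exchanges to fold in
    kept = recent[-history:]              # kept in full detail

    # The previous summary is folded into the new one, so the summary is
    # rolling ("Summary of 1-5" in the timeline below, not just 3-5).
    summary_text = summarize(prior + to_summarize)
    return [{"role": "episodic", "content": summary_text}] + kept
```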
### Example Timeline

With defaults (`episodic-memory-threshold=4`, `episodic-memory-history=2`):
| Exchange | Action | Context State |
|---|---|---|
| 1-2 | Normal | Messages 1-2 in full |
| 3-4 | Normal | Messages 1-4 in full |
| 5 | Compaction | Summary of 1-2 + Messages 3-5 in full |
| 6-7 | Normal | Summary + Messages 3-7 in full |
| 8 | Compaction | Summary of 1-5 + Messages 6-8 in full |
### Automatic Behavior

The system automatically:

- Tracks conversation length
- Triggers compaction when exchanges exceed `episodic-memory-threshold`
- Summarizes older messages using the configured LLM
- Keeps only the last `episodic-memory-history` exchanges in full
- Stores the summary as an “episodic memory” for future context
The scheduler runs every 60 seconds, checking all active sessions and processing those that exceed the threshold.
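That periodic check could be wired up as in the sketch below, reusing the hypothetical `compact()` from earlier. Only the 60-second cadence and the exceeds-threshold condition come from the description above; the session object and all names here are assumptions.

```python
import threading

COMPACTION_INTERVAL_SECONDS = 60  # cadence stated above

def run_compaction_cycle(sessions, cfg, summarize):
    """One scheduler pass: compact each active session over the threshold."""
    for session in sessions.values():
        if session.exchanges_since_summary() > cfg.threshold:  # assumed helper
            session.messages = compact(session.messages,
                                       threshold=cfg.threshold,
                                       history=cfg.history,
                                       summarize=summarize)

def start_scheduler(sessions, cfg, summarize):
    # Re-arms itself after every pass; a real implementation would also
    # handle shutdown, errors, and concurrent session updates.
    def tick():
        run_compaction_cycle(sessions, cfg, summarize)
        threading.Timer(COMPACTION_INTERVAL_SECONDS, tick).start()
    tick()
```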
## Tuning Guidelines

### High-Context Conversations

For complex discussions requiring more history:

```csv
name,value
episodic-memory-history,5
episodic-memory-threshold,10
```
### Token-Constrained Environments

For smaller context windows or cost optimization:

```csv
name,value
episodic-memory-history,1
episodic-memory-threshold,2
```
### Disable Compaction

Set the threshold to 0 to disable automatic compaction:

```csv
name,value
episodic-memory-threshold,0
```
### Extended Retention

For long-term memory across sessions:

```csv
name,value
episodic-memory-max-episodes,500
episodic-memory-retention-days,730
```
### Use Case Recommendations
| Use Case | History | Threshold | Rationale |
|---|---|---|---|
| FAQ Bot | 1 | 2 | Questions are independent |
| Customer Support | 2 | 4 | Some context needed |
| Technical Discussion | 4 | 8 | Complex topics require history |
| Therapy/Coaching | 5 | 10 | Continuity is critical |
| Long-term Assistant | 3 | 6 | Balance memory and context |
## Token Savings
Compaction significantly reduces token usage:
| Scenario | Without Compaction | With Compaction | Savings |
|---|---|---|---|
| 10 exchanges | ~5,000 tokens | ~2,000 tokens | 60% |
| 20 exchanges | ~10,000 tokens | ~3,000 tokens | 70% |
| 50 exchanges | ~25,000 tokens | ~5,000 tokens | 80% |
Actual savings depend on message length and summary quality.
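As a back-of-the-envelope check of the first row, assume ~500 tokens per full exchange and a ~500-token rolling summary (illustrative figures, not measurements), with the kept window matching the timeline above (`history` exchanges plus the current one):

```python
FULL_EXCHANGE_TOKENS = 500  # assumed average, not a measured value
SUMMARY_TOKENS = 500        # assumed rolling-summary size

def context_tokens(exchanges, history=2, compacted=True):
    """Rough active-context size with and without compaction."""
    if not compacted:
        return exchanges * FULL_EXCHANGE_TOKENS
    kept = min(exchanges, history + 1)  # recent exchanges kept in full
    return SUMMARY_TOKENS + kept * FULL_EXCHANGE_TOKENS

print(context_tokens(10, compacted=False))  # 5000
print(context_tokens(10))                   # 2000 -> 60% savings
```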
## Summary Storage

Summaries are stored with special role identifiers:

- The role `episodic` or `compact` marks summary messages
- Summaries include key points from the compacted exchanges
- Original messages are not deleted, just excluded from the active context
- Episodes are searchable for context retrieval across sessions
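Conceptually, a stored episode might look like the record below. Only the `episodic` and `compact` role values come from this documentation; every other field name is a hypothetical illustration of what per-user, retention-aware storage could carry.

```python
# Hypothetical episode record; only the role value is documented.
episode = {
    "role": "episodic",                    # or "compact"
    "content": "User debugged a config issue; resolved by lowering threshold.",
    "user_id": "u-123",                    # assumed: episodes are per user
    "created_at": "2025-01-15T10:30:00Z",  # assumed: used for retention-days
    "source_exchanges": [1, 5],            # assumed: range that was compacted
}
```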
## Benefits
- Automatic management - No manual intervention needed
- Token efficiency - Stay within model context limits
- Context preservation - Important information kept via summaries
- Relevant context - Recent exchanges kept in full detail
- Cost savings - Fewer tokens = lower API costs
- Long-term memory - Episode storage enables recall across sessions
## Interaction with Caching
Episodic memory works alongside semantic caching:
- Caching: Reuses responses for similar queries (see Semantic Caching)
- Episodic Memory: Manages conversation length over time
Both features reduce costs and improve performance independently.
## Best Practices
- Start with defaults - Work well for most use cases
- Monitor token usage - Adjust if hitting context limits
- Consider conversation type - Support vs complex discussion
- Test different values - Find optimal balance for your users
- Set retention appropriately - Balance memory vs privacy requirements
## Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Context too long | Threshold too high | Lower `episodic-memory-threshold` |
| Lost context | History too low | Increase `episodic-memory-history` |
| Summaries missing info | Model limitations | Use `quality` instead of `fast` |
| No compaction occurring | Threshold is 0 or disabled | Set positive threshold, enable feature |
| Old episodes not deleted | Retention too long | Lower `episodic-memory-retention-days` |
## See Also
- Semantic Caching - Response caching system
- Configuration Parameters - Full parameter reference
- LLM Configuration - Model settings