# Episodic Memory
Episodic memory automatically manages conversation history to stay within LLM token limits while preserving important information through intelligent summarization. This system handles context compaction transparently, ensuring conversations remain coherent without manual intervention.
## Overview
Large Language Models have fixed context windows (e.g., 8K, 32K, 128K tokens). Long conversations can exceed these limits, causing truncation or errors. Episodic memory solves this by:
- Monitoring conversation length
- Summarizing older exchanges when thresholds are reached
- Keeping recent messages in full detail
- Storing summaries as “episodic memory” for continuity
## Configuration

Episodic memory is controlled by parameters in `config.csv`:

```csv
name,value
episodic-memory-enabled,true
episodic-memory-threshold,4
episodic-memory-history,2
episodic-memory-model,fast
episodic-memory-max-episodes,100
episodic-memory-retention-days,365
episodic-memory-auto-summarize,true
```
### Parameter Reference

| Parameter | Default | Type | Description |
|---|---|---|---|
| `episodic-memory-enabled` | `true` | Boolean | Enable/disable episodic memory system |
| `episodic-memory-threshold` | `4` | Integer | Number of exchanges before compaction triggers |
| `episodic-memory-history` | `2` | Integer | Recent exchanges to keep in full detail |
| `episodic-memory-model` | `fast` | String | Model for generating summaries (`fast`, `quality`, or a model name) |
| `episodic-memory-max-episodes` | `100` | Integer | Maximum episode summaries per user |
| `episodic-memory-retention-days` | `365` | Integer | Days to retain episode summaries |
| `episodic-memory-auto-summarize` | `true` | Boolean | Automatically summarize when threshold reached |
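To make the file format and the types above concrete, a loader might look like the sketch below. `EpisodicConfig` and `load_episodic_config` are illustrative names, not part of the product; only the parameter names, types, and defaults come from the table.

```python
import csv
from dataclasses import dataclass

@dataclass
class EpisodicConfig:
    # Defaults mirror the parameter reference table above.
    enabled: bool = True
    threshold: int = 4
    history: int = 2
    model: str = "fast"
    max_episodes: int = 100
    retention_days: int = 365
    auto_summarize: bool = True

# Maps config.csv keys to (attribute, type-coercion) pairs.
_FIELDS = {
    "episodic-memory-enabled": ("enabled", lambda v: v.lower() == "true"),
    "episodic-memory-threshold": ("threshold", int),
    "episodic-memory-history": ("history", int),
    "episodic-memory-model": ("model", str),
    "episodic-memory-max-episodes": ("max_episodes", int),
    "episodic-memory-retention-days": ("retention_days", int),
    "episodic-memory-auto-summarize": ("auto_summarize", lambda v: v.lower() == "true"),
}

def load_episodic_config(path="config.csv"):
    """Read name,value rows and coerce episodic-memory-* values to their types."""
    cfg = EpisodicConfig()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):  # expects a name,value header row
            if row["name"] in _FIELDS:
                attr, cast = _FIELDS[row["name"]]
                setattr(cfg, attr, cast(row["value"]))
    return cfg
```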
## How It Works

### Context Compaction Process

- **Monitor**: The system tracks the message count since the last summary
- **Trigger**: When the count reaches `episodic-memory-threshold`, compaction starts
- **Summarize**: Older messages are summarized using the configured LLM
- **Preserve**: The last `episodic-memory-history` exchanges remain in full
- **Store**: The summary is saved with the role `episodic` for future context
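The flow can be sketched in a few lines of Python. This is illustrative only: the message shape, the `summarize()` callable, and the exact trigger boundary (for example, whether the triggering exchange itself is kept in full) are assumptions, not the product's actual API.

```python
def compact(messages, threshold=4, history=2, summarize=None):
    """Fold older exchanges into a rolling summary once `threshold` exchanges
    have accumulated, keeping the last `history` exchanges verbatim.
    Assumes history >= 1 and messages as dicts with "role" and "content"."""
    # Find any existing summary so we only count exchanges after it.
    last_summary = max((i for i, m in enumerate(messages)
                        if m["role"] == "episodic"), default=-1)
    recent = messages[last_summary + 1:]

    if len(recent) < threshold:
        return messages  # below threshold: nothing to do

    prior = messages[:last_summary + 1]   # earlier summary, if any
    to_summarize = recent[:-history]      # older exchanges to fold in
    kept = recent[-history:]              # kept in full detail

    # The previous summary is folded into the new one, so the summary is
    # rolling ("Summary of 1-5" in the timeline below, not just 3-5).
    summary_text = summarize(prior + to_summarize)
    return [{"role": "episodic", "content": summary_text}] + kept
```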
### Example Timeline

With defaults (`episodic-memory-threshold=4`, `episodic-memory-history=2`):
| Exchange | Action | Context State |
|---|---|---|
| 1-2 | Normal | Messages 1-2 in full |
| 3-4 | Normal | Messages 1-4 in full |
| 5 | Compaction | Summary of 1-2 + Messages 3-5 in full |
| 6-7 | Normal | Summary + Messages 3-7 in full |
| 8 | Compaction | Summary of 1-5 + Messages 6-8 in full |
### Automatic Behavior

The system automatically:

- Tracks conversation length
- Triggers compaction when exchanges exceed `episodic-memory-threshold`
- Summarizes older messages using the configured LLM
- Keeps only the last `episodic-memory-history` exchanges in full
- Stores the summary as an “episodic memory” for future context
The scheduler runs every 60 seconds, checking all active sessions and processing those that exceed the threshold.
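That periodic check could be wired up as in the sketch below, reusing the hypothetical `compact()` from earlier. Only the 60-second cadence and the exceeds-threshold condition come from the description above; the session object and all names here are assumptions.

```python
import threading

COMPACTION_INTERVAL_SECONDS = 60  # cadence stated above

def run_compaction_cycle(sessions, cfg, summarize):
    """One scheduler pass: compact each active session over the threshold."""
    for session in sessions.values():
        if session.exchanges_since_summary() > cfg.threshold:  # assumed helper
            session.messages = compact(session.messages,
                                       threshold=cfg.threshold,
                                       history=cfg.history,
                                       summarize=summarize)

def start_scheduler(sessions, cfg, summarize):
    # Re-arms itself after every pass; a real implementation would also
    # handle shutdown, errors, and concurrent session updates.
    def tick():
        run_compaction_cycle(sessions, cfg, summarize)
        threading.Timer(COMPACTION_INTERVAL_SECONDS, tick).start()
    tick()
```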
## Tuning Guidelines

### High-Context Conversations

For complex discussions requiring more history:

```csv
name,value
episodic-memory-history,5
episodic-memory-threshold,10
```
### Token-Constrained Environments

For smaller context windows or cost optimization:

```csv
name,value
episodic-memory-history,1
episodic-memory-threshold,2
```
### Disable Compaction

Set the threshold to 0 to disable automatic compaction:

```csv
name,value
episodic-memory-threshold,0
```
### Extended Retention

For long-term memory across sessions:

```csv
name,value
episodic-memory-max-episodes,500
episodic-memory-retention-days,730
```
### Use Case Recommendations
| Use Case | History | Threshold | Rationale |
|---|---|---|---|
| FAQ Bot | 1 | 2 | Questions are independent |
| Customer Support | 2 | 4 | Some context needed |
| Technical Discussion | 4 | 8 | Complex topics require history |
| Therapy/Coaching | 5 | 10 | Continuity is critical |
| Long-term Assistant | 3 | 6 | Balance memory and context |
## Token Savings
Compaction significantly reduces token usage:
| Scenario | Without Compaction | With Compaction | Savings |
|---|---|---|---|
| 10 exchanges | ~5,000 tokens | ~2,000 tokens | 60% |
| 20 exchanges | ~10,000 tokens | ~3,000 tokens | 70% |
| 50 exchanges | ~25,000 tokens | ~5,000 tokens | 80% |
Actual savings depend on message length and summary quality.
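As a back-of-the-envelope check of the first row, assume ~500 tokens per full exchange and a ~500-token rolling summary (illustrative figures, not measurements), with the kept window matching the timeline above (`history` exchanges plus the current one):

```python
FULL_EXCHANGE_TOKENS = 500  # assumed average, not a measured value
SUMMARY_TOKENS = 500        # assumed rolling-summary size

def context_tokens(exchanges, history=2, compacted=True):
    """Rough active-context size with and without compaction."""
    if not compacted:
        return exchanges * FULL_EXCHANGE_TOKENS
    kept = min(exchanges, history + 1)  # recent exchanges kept in full
    return SUMMARY_TOKENS + kept * FULL_EXCHANGE_TOKENS

print(context_tokens(10, compacted=False))  # 5000
print(context_tokens(10))                   # 2000 -> 60% savings
```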
## Summary Storage

Summaries are stored with special role identifiers:

- The role `episodic` or `compact` marks summary messages
- Summaries include key points from the compacted exchanges
- Original messages are not deleted, just excluded from the active context
- Episodes are searchable for context retrieval across sessions
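Conceptually, a stored episode might look like the record below. Only the `episodic` and `compact` role values come from this documentation; every other field name is a hypothetical illustration of what per-user, retention-aware storage could carry.

```python
# Hypothetical episode record; only the role value is documented.
episode = {
    "role": "episodic",                    # or "compact"
    "content": "User debugged a config issue; resolved by lowering threshold.",
    "user_id": "u-123",                    # assumed: episodes are per user
    "created_at": "2025-01-15T10:30:00Z",  # assumed: used for retention-days
    "source_exchanges": [1, 5],            # assumed: range that was compacted
}
```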
## Benefits
- Automatic management - No manual intervention needed
- Token efficiency - Stay within model context limits
- Context preservation - Important information kept via summaries
- Relevant context - Recent exchanges kept in full detail
- Cost savings - Fewer tokens = lower API costs
- Long-term memory - Episode storage enables recall across sessions
## Interaction with Caching
Episodic memory works alongside semantic caching:
- Caching: Reuses responses for similar queries (see Semantic Caching)
- Episodic Memory: Manages conversation length over time
Both features reduce costs and improve performance independently.
## Best Practices
- Start with defaults - Work well for most use cases
- Monitor token usage - Adjust if hitting context limits
- Consider conversation type - Support vs complex discussion
- Test different values - Find optimal balance for your users
- Set retention appropriately - Balance memory vs privacy requirements
## Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Context too long | Threshold too high | Lower `episodic-memory-threshold` |
| Lost context | History too low | Increase `episodic-memory-history` |
| Summaries missing info | Model limitations | Use `quality` instead of `fast` |
| No compaction occurring | Threshold is 0 or disabled | Set positive threshold, enable feature |
| Old episodes not deleted | Retention too long | Lower `episodic-memory-retention-days` |
## See Also
- Semantic Caching - Response caching system
- Configuration Parameters - Full parameter reference
- LLM Configuration - Model settings