Configuration Parameters

Complete reference of all available parameters in config.csv.
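
Each line of config.csv is a parameter name and its value, separated by a comma. For example (values here are illustrative, using parameters documented below):

server-host,0.0.0.0
server-port,8080
llm-cache,true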

Server Parameters

Web Server

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| server-host | Server bind address | 0.0.0.0 | IP address |
| server-port | Server listen port | 8080 | Number (1-65535) |
| sites-root | Generated sites directory | /tmp | Path |

MCP Server

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| mcp-server | Enable MCP protocol server | false | Boolean |

LLM Parameters

Core LLM Settings

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| llm-key | API key for LLM service | none | String |
| llm-url | LLM service endpoint | http://localhost:8081 | URL |
| llm-model | Model path or identifier | Required | Path/String |
| llm-models | Available model aliases for routing | default | Semicolon-separated |
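
Since llm-model is the only required parameter here, a minimal local LLM configuration might look like this (the model path is illustrative):

llm-model,/opt/models/model.gguf
llm-url,http://localhost:8081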

LLM Cache

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| llm-cache | Enable response caching | false | Boolean |
| llm-cache-ttl | Cache time-to-live | 3600 | Seconds |
| llm-cache-semantic | Semantic similarity cache | true | Boolean |
| llm-cache-threshold | Similarity threshold | 0.95 | Float (0-1) |
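
Semantic caching reuses a cached response when a new prompt is sufficiently similar to a previous one, with llm-cache-threshold setting how close the match must be (1.0 requires near-identical prompts). A sketch that enables the full cache, with the threshold relaxed slightly from the default to also catch close paraphrases:

llm-cache,true
llm-cache-semantic,true
llm-cache-threshold,0.92
llm-cache-ttl,3600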

Embedded LLM Server

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| llm-server | Run embedded server | false | Boolean |
| llm-server-path | Server binary path | botserver-stack/bin/llm/build/bin | Path |
| llm-server-host | Server bind address | 0.0.0.0 | IP address |
| llm-server-port | Server port | 8081 | Number |
| llm-server-gpu-layers | GPU offload layers | 0 | Number |
| llm-server-n-moe | MoE experts count | 0 | Number |
| llm-server-ctx-size | Context size | 4096 | Tokens |
| llm-server-n-predict | Max predictions | 1024 | Tokens |
| llm-server-parallel | Parallel requests | 6 | Number |
| llm-server-cont-batching | Continuous batching | true | Boolean |
| llm-server-mlock | Lock in memory | false | Boolean |
| llm-server-no-mmap | Disable mmap | false | Boolean |
| llm-server-reasoning-format | Reasoning output format for llama.cpp | none | String |
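
To run the embedded llama.cpp server rather than pointing llm-url at an external service, a minimal sketch looks like this (the GPU layer count is hardware-dependent; see the tuning notes below):

llm-server,true
llm-server-host,0.0.0.0
llm-server-port,8081
llm-server-gpu-layers,35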

Hardware-Specific LLM Tuning

For RTX 3090 (24GB VRAM)

You can run impressive models with proper configuration:

  • DeepSeek-R1-Distill-Qwen-7B: Set llm-server-gpu-layers to 35-40
  • Qwen2.5-32B-Instruct (Q4_K_M): Fits with llm-server-gpu-layers set to 40-45
  • DeepSeek-V3 (with MoE): Set llm-server-n-moe to 2-4 to run even 120B models - MoE loads only the active experts
  • Optimization: Use an llm-server-ctx-size of 8192 for longer contexts (a combined sketch follows this list)
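
Combining those recommendations, a 3090 sketch (values taken from the ranges above):

llm-server-gpu-layers,40
llm-server-ctx-size,8192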

For RTX 4070/4070Ti (12-16GB VRAM)

Mid-range cards work great with quantized models:

  • Qwen2.5-14B (Q4_K_M): Set llm-server-gpu-layers to 25-30
  • DeepSeek-R1-Distill-Llama-8B: Fully fits with llm-server-gpu-layers at 32
  • Tips: Keep llm-server-ctx-size at 4096 to save VRAM

For CPU-Only (No GPU)

Modern CPUs can still run capable models:

  • DeepSeek-R1-Distill-Qwen-1.5B: Fast on CPU, great for testing
  • Phi-3-mini (3.8B): Excellent CPU performance
  • Settings: Set llm-server-mlock to true to prevent swapping
  • Parallel: Increase llm-server-parallel to the number of CPU cores minus 2 (a combined sketch follows this list)
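
A CPU-only sketch based on these tips (assuming an 8-core machine, hence parallel set to 6):

llm-server-gpu-layers,0
llm-server-mlock,true
llm-server-parallel,6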

Recommended Models

  • Best Overall: DeepSeek-R1-Distill series (1.5B to 70B)
  • Best Small: Qwen2.5-3B-Instruct-Q5_K_M
  • Best Medium: DeepSeek-R1-Distill-Qwen-14B-Q4_K_M
  • Best Large: DeepSeek-V3, Qwen2.5-32B, or GPT2-120B-GGUF (with MoE enabled)

Pro Tip: The llm-server-n-moe parameter is magic for large models - it enables Mixture of Experts, letting you run 120B+ models on consumer hardware by only loading the experts needed for each token!
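
As an illustration only (exact values depend heavily on the model and hardware), a large MoE model on a 24GB card might combine expert limiting with partial GPU offload:

llm-server-n-moe,4
llm-server-gpu-layers,20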

Local vs Cloud: A Practical Note

General Bots excels at local deployment - you own your hardware, your data stays private, and there are no recurring costs. However, if you need cloud inference:

Groq is the speed champion - they use custom LPU (Language Processing Unit) chips instead of GPUs, delivering inference up to 10x faster than traditional GPU-based cloud providers. Their hardware is purpose-built for transformers, avoiding the general-purpose overhead of NVIDIA GPUs.

This isn’t about market competition - it’s about architecture. NVIDIA GPUs are designed for many tasks, while Groq’s chips do one thing incredibly well: transformer inference. If speed matters and you’re using cloud, Groq is currently the fastest option available.

For local deployment, stick with General Bots and the configurations above. For cloud bursts or when you need extreme speed, consider Groq’s API with these settings:

llm-url,https://api.groq.com/openai/v1
llm-key,your-groq-api-key
llm-model,mixtral-8x7b-32768

Embedding Parameters

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| embedding-url | Embedding service endpoint | http://localhost:8082 | URL |
| embedding-model | Embedding model path | Required for KB | Path |

Email Parameters

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| email-from | Sender address | Required for email | Email |
| email-server | SMTP hostname | Required for email | Hostname |
| email-port | SMTP port | 587 | Number |
| email-user | SMTP username | Required for email | String |
| email-pass | SMTP password | Required for email | String |
| email-read-pixel | Enable read tracking pixel in HTML emails | false | Boolean |

Email Read Tracking

When email-read-pixel is enabled, a 1x1 transparent tracking pixel is automatically injected into HTML emails sent via the API. This allows you to:

  • Track when emails are opened
  • See how many times an email was opened
  • Get the approximate location (IP) and device (user agent) of the reader

API Endpoints for tracking:

| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/email/tracking/pixel/{tracking_id} | GET | Serves the tracking pixel (called by the email client) |
| /api/email/tracking/status/{tracking_id} | GET | Get read status for a specific email |
| /api/email/tracking/list | GET | List all sent emails with tracking status |
| /api/email/tracking/stats | GET | Get overall tracking statistics |

Example configuration:

email-read-pixel,true
server-url,https://yourdomain.com

Note: The server-url parameter is used to generate the tracking pixel URL. Make sure it’s accessible from the recipient’s email client.

Privacy considerations: Email tracking should be used responsibly. Consider disclosing tracking in your email footer for transparency.

Theme Parameters

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| theme-color1 | Primary color | Not set | Hex color |
| theme-color2 | Secondary color | Not set | Hex color |
| theme-logo | Logo URL | Not set | URL |
| theme-title | Bot display title | Not set | String |
| bot-name | Bot display name | Not set | String |
| welcome-message | Initial greeting message | Not set | String |

Custom Database Parameters

These parameters configure external database connections used by BASIC keywords, for example MariaDB/MySQL connections.

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| custom-server | Database server hostname | localhost | Hostname |
| custom-port | Database port | 5432 | Number |
| custom-database | Database name | Not set | String |
| custom-username | Database user | Not set | String |
| custom-password | Database password | Not set | String |

Example: MariaDB Connection

custom-server,db.example.com
custom-port,3306
custom-database,myapp
custom-username,botuser
custom-password,secretpass

Website Crawling Parameters

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| website-expires | Cache expiration for crawled content | 1d | Duration |
| website-max-depth | Maximum crawl depth | 3 | Number |
| website-max-pages | Maximum pages to crawl | 100 | Number |
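
For example, a conservative crawl that refreshes daily (the 1d value uses the same duration format as the default above):

website-expires,1d
website-max-depth,2
website-max-pages,50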

Image Generator Parameters

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| image-generator-model | Diffusion model path | Not set | Path |
| image-generator-steps | Inference steps | 4 | Number |
| image-generator-width | Output width | 512 | Pixels |
| image-generator-height | Output height | 512 | Pixels |
| image-generator-gpu-layers | GPU offload layers | 20 | Number |
| image-generator-batch-size | Batch size | 1 | Number |

Video Generator Parameters

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| video-generator-model | Video model path | Not set | Path |
| video-generator-frames | Frames to generate | 24 | Number |
| video-generator-fps | Frames per second | 8 | Number |
| video-generator-width | Output width | 320 | Pixels |
| video-generator-height | Output height | 576 | Pixels |
| video-generator-gpu-layers | GPU offload layers | 15 | Number |
| video-generator-batch-size | Batch size | 1 | Number |

BotModels Service Parameters

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| botmodels-enabled | Enable BotModels service | true | Boolean |
| botmodels-host | BotModels bind address | 0.0.0.0 | IP address |
| botmodels-port | BotModels port | 8085 | Number |

Generator Parameters

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| default-generator | Default content generator | all | String |

Teams Channel Parameters

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| teams-app-id | Microsoft Teams App ID | Not set | String |
| teams-app-password | Microsoft Teams App Password | Not set | String |
| teams-tenant-id | Microsoft Teams Tenant ID | Not set | String |
| teams-bot-id | Microsoft Teams Bot ID | Not set | String |

SMS Parameters

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| sms-provider | SMS provider name | Not set | String |

Multi-Agent Parameters

Agent-to-Agent (A2A) Communication

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| a2a-enabled | Enable agent-to-agent communication | true | Boolean |
| a2a-timeout | Default delegation timeout | 30 | Seconds |
| a2a-max-hops | Maximum delegation chain depth | 5 | Number |
| a2a-retry-count | Retry attempts on failure | 3 | Number |
| a2a-queue-size | Maximum pending messages | 100 | Number |
| a2a-protocol-version | A2A protocol version | 1.0 | String |
| a2a-persist-messages | Persist A2A messages to database | false | Boolean |

Bot Reflection

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| bot-reflection-enabled | Enable bot self-analysis | true | Boolean |
| bot-reflection-interval | Messages between reflections | 10 | Number |
| bot-reflection-prompt | Custom reflection prompt | (none) | String |
| bot-reflection-types | Reflection types to perform | ConversationQuality | Semicolon-separated |
| bot-improvement-auto-apply | Auto-apply suggested improvements | false | Boolean |
| bot-improvement-threshold | Score threshold for improvements (0-10) | 6.0 | Float |

Reflection Types

Available values for bot-reflection-types:

  • ConversationQuality - Analyze conversation quality and user satisfaction
  • ResponseAccuracy - Analyze response accuracy and relevance
  • ToolUsage - Analyze tool usage effectiveness
  • KnowledgeRetrieval - Analyze knowledge retrieval performance
  • Performance - Analyze overall bot performance

Example:

bot-reflection-enabled,true
bot-reflection-interval,10
bot-reflection-types,ConversationQuality;ResponseAccuracy;ToolUsage
bot-improvement-auto-apply,false
bot-improvement-threshold,7.0

Memory Parameters

User Memory (Cross-Bot)

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| user-memory-enabled | Enable user-level memory | true | Boolean |
| user-memory-max-keys | Maximum keys per user | 1000 | Number |
| user-memory-default-ttl | Default time-to-live (0=no expiry) | 0 | Seconds |

Episodic Memory (Context Compaction)

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| episodic-memory-enabled | Enable episodic memory system | true | Boolean |
| episodic-memory-threshold | Exchanges before compaction triggers | 4 | Number |
| episodic-memory-history | Recent exchanges to keep in full | 2 | Number |
| episodic-memory-model | Model for summarization | fast | String |
| episodic-memory-max-episodes | Maximum episodes per user | 100 | Number |
| episodic-memory-retention-days | Days to retain episodes | 365 | Number |
| episodic-memory-auto-summarize | Enable automatic summarization | true | Boolean |

Episodic memory automatically manages conversation context to stay within LLM token limits. When conversation exchanges exceed episodic-memory-threshold, older messages are summarized and only the last episodic-memory-history exchanges are kept in full. See Chapter 03 - Episodic Memory for details.
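
With the defaults above, compaction works like this: once a conversation exceeds 4 exchanges, everything but the 2 most recent exchanges is summarized by the model aliased as fast:

episodic-memory-enabled,true
episodic-memory-threshold,4
episodic-memory-history,2
episodic-memory-model,fast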

Model Routing Parameters

These parameters configure multi-model routing for different task types. Requires multiple llama.cpp server instances.

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| llm-models | Available model aliases | default | Semicolon-separated |
| model-routing-strategy | Routing strategy (manual/auto/load-balanced/fallback) | auto | String |
| model-default | Default model alias | default | String |
| model-fast | Model for fast/simple tasks | (configured) | Path/String |
| model-quality | Model for quality/complex tasks | (configured) | Path/String |
| model-code | Model for code generation | (configured) | Path/String |
| model-fallback-enabled | Enable automatic fallback | true | Boolean |
| model-fallback-order | Order to try on failure | quality,fast,local | Comma-separated |

Multi-Model Example

llm-models,default;fast;quality;code
llm-url,http://localhost:8081
model-routing-strategy,auto
model-default,fast
model-fallback-enabled,true
model-fallback-order,quality,fast

Hybrid RAG Search Parameters

General Bots uses hybrid search combining dense (embedding) and sparse (BM25 keyword) search for optimal retrieval. The BM25 implementation is powered by Tantivy, a full-text search engine library similar to Apache Lucene.
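
Assuming the standard Reciprocal Rank Fusion formulation, each document's fused score is

\[ \mathrm{RRF}(d) = \sum_{i} \frac{1}{k + \mathrm{rank}_i(d)} \]

where i ranges over the dense and sparse result lists, rank_i(d) is the document's position in list i, and k is the rag-rrf-k smoothing constant below (default 60). Larger k values flatten the advantage of top-ranked results.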

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| rag-hybrid-enabled | Enable hybrid dense+sparse search | true | Boolean |
| rag-dense-weight | Weight for semantic results | 0.7 | Float (0-1) |
| rag-sparse-weight | Weight for keyword results | 0.3 | Float (0-1) |
| rag-reranker-enabled | Enable LLM reranking | false | Boolean |
| rag-reranker-model | Model for reranking | cross-encoder/ms-marco-MiniLM-L-6-v2 | String |
| rag-reranker-top-n | Candidates for reranking | 20 | Number |
| rag-max-results | Maximum results to return | 10 | Number |
| rag-min-score | Minimum relevance score threshold | 0.0 | Float (0-1) |
| rag-rrf-k | RRF smoothing constant | 60 | Number |
| rag-cache-enabled | Enable search result caching | true | Boolean |
| rag-cache-ttl | Cache time-to-live | 3600 | Seconds |

BM25 Sparse Search (Tantivy)

BM25 is a keyword-based ranking algorithm that excels at finding exact term matches. It’s powered by Tantivy when the vectordb feature is enabled.

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| bm25-enabled | Enable/disable BM25 sparse search | true | Boolean |
| bm25-k1 | Term frequency saturation (0.5-3.0 typical) | 1.2 | Float |
| bm25-b | Document length normalization (0.0-1.0) | 0.75 | Float |
| bm25-stemming | Apply word stemming (running→run) | true | Boolean |
| bm25-stopwords | Filter common words (the, a, is) | true | Boolean |

Switching Search Modes

Hybrid Search (Default - Best for most use cases)

bm25-enabled,true
rag-dense-weight,0.7
rag-sparse-weight,0.3

Uses both semantic understanding AND keyword matching. Best for general queries.

Dense Only (Semantic Search)

bm25-enabled,false
rag-dense-weight,1.0
rag-sparse-weight,0.0

Uses only embedding-based search. Faster, good for conceptual/semantic queries where exact words don’t matter.

Sparse Only (Keyword Search)

bm25-enabled,true
rag-dense-weight,0.0
rag-sparse-weight,1.0

Uses only BM25 keyword matching. Good for exact term searches, technical documentation, or when embeddings aren’t available.

BM25 Parameter Tuning

The k1 and b parameters control BM25 behavior (the full scoring formula appears after this list):

  • bm25-k1 (Term Saturation): Controls how much additional term occurrences contribute to the score

    • Lower values (0.5-1.0): Diminishing returns for repeated terms
    • Higher values (1.5-2.0): More weight to documents with many term occurrences
    • Default 1.2 works well for most content
  • bm25-b (Length Normalization): Controls document length penalty

    • 0.0: No length penalty (long documents scored equally)
    • 1.0: Full length normalization (strongly penalizes long documents)
    • Default 0.75 balances length fairness
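
For reference, these two knobs appear in the standard BM25 scoring function (textbook form; Tantivy's implementation may differ in detail):

\[ \mathrm{score}(D, Q) = \sum_{q \in Q} \mathrm{IDF}(q) \cdot \frac{f(q, D)\,(k_1 + 1)}{f(q, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)} \]

where f(q, D) is the frequency of term q in document D, |D| is the document's length, and avgdl is the average document length in the index.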

Tuning for specific content:

# For short documents (tweets, titles)
bm25-b,0.3

# For long documents (articles, manuals)
bm25-b,0.9

# For code search (exact matches important)
bm25-k1,1.5
bm25-stemming,false

Code Sandbox Parameters

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| sandbox-enabled | Enable code sandbox | true | Boolean |
| sandbox-runtime | Isolation backend (lxc/docker/firecracker/process) | lxc | String |
| sandbox-timeout | Maximum execution time | 30 | Seconds |
| sandbox-memory-mb | Memory limit in megabytes | 256 | MB |
| sandbox-cpu-percent | CPU usage limit | 50 | Percent |
| sandbox-network | Allow network access | false | Boolean |
| sandbox-python-packages | Pre-installed Python packages | (none) | Comma-separated |
| sandbox-allowed-paths | Accessible filesystem paths | /data,/tmp | Comma-separated |

Example: Python Sandbox

sandbox-enabled,true
sandbox-runtime,lxc
sandbox-timeout,60
sandbox-memory-mb,512
sandbox-cpu-percent,75
sandbox-network,false
sandbox-python-packages,numpy,pandas,requests,matplotlib
sandbox-allowed-paths,/data,/tmp,/uploads

SSE Streaming Parameters

| Parameter | Description | Default | Type |
|-----------|-------------|---------|------|
| sse-enabled | Enable Server-Sent Events | true | Boolean |
| sse-heartbeat | Heartbeat interval | 30 | Seconds |
| sse-max-connections | Maximum concurrent connections | 1000 | Number |
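
The heartbeat exists to keep idle connections from being dropped by proxies and load balancers; set sse-heartbeat below any intermediary's idle timeout. A sketch for an assumed 60-second proxy timeout:

sse-enabled,true
sse-heartbeat,25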

Parameter Types

Boolean

Values: true or false (case-sensitive)

Number

Integer values, must be within valid ranges:

  • Ports: 1-65535
  • Tokens: Positive integers
  • Percentages: 0-100

Float

Decimal values:

  • Thresholds: 0.0 to 1.0
  • Weights: 0.0 to 1.0

Path

File system paths:

  • Relative: ../../../../data/model.gguf
  • Absolute: /opt/models/model.gguf

URL

Valid URLs:

  • HTTP: http://localhost:8081
  • HTTPS: https://api.example.com

String

Any text value (no quotes needed in CSV)

Email

Valid email format: user@domain.com

Hex Color

HTML color codes: #RRGGBB format

Semicolon-separated

Multiple values separated by semicolons: value1;value2;value3

Comma-separated

Multiple values separated by commas: value1,value2,value3

Required vs Optional

Always Required

  • None - all parameters have defaults or are optional

Required for Features

  • LLM: llm-model must be set
  • Email: email-from, email-server, email-user, email-pass
  • Embeddings: embedding-model for knowledge base
  • Custom DB: custom-database if using external database

Configuration Precedence

  1. Built-in defaults (hardcoded)
  2. config.csv values (override defaults)
  3. Environment variables (if implemented, override config)
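
For example: the built-in default for server-port is 8080; a config.csv line such as server-port,9090 overrides it; and, if supported by your build, an environment variable for the same parameter would override both.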

Special Values

  • none - Explicitly no value (for llm-key)
  • Empty string - Unset/use default
  • false - Feature disabled
  • true - Feature enabled

Performance Tuning

For Local Models

llm-server-ctx-size,8192
llm-server-n-predict,2048
llm-server-parallel,4
llm-cache,true
llm-cache-ttl,7200

For Production

llm-server-cont-batching,true
llm-cache-semantic,true
llm-cache-threshold,0.90
llm-server-parallel,8
sse-max-connections,5000

For Low Memory

llm-server-ctx-size,2048
llm-server-n-predict,512
llm-server-mlock,false
llm-server-no-mmap,false
llm-cache,false
sandbox-memory-mb,128

For Multi-Agent Systems

a2a-enabled,true
a2a-timeout,30
a2a-max-hops,5
a2a-retry-count,3
a2a-persist-messages,true
bot-reflection-enabled,true
bot-reflection-interval,10
user-memory-enabled,true

For Hybrid RAG

rag-hybrid-enabled,true
rag-dense-weight,0.7
rag-sparse-weight,0.3
rag-reranker-enabled,true
rag-max-results,10
rag-min-score,0.3
rag-cache-enabled,true
bm25-enabled,true
bm25-k1,1.2
bm25-b,0.75

For Dense-Only Search (Faster)

bm25-enabled,false
rag-dense-weight,1.0
rag-sparse-weight,0.0
rag-max-results,10

For Code Execution

sandbox-enabled,true
sandbox-runtime,lxc
sandbox-timeout,30
sandbox-memory-mb,512
sandbox-network,false
sandbox-python-packages,numpy,pandas,requests

Validation Rules

  1. Paths: Model files must exist
  2. URLs: Must be valid format
  3. Ports: Must be 1-65535
  4. Emails: Must contain @ and domain
  5. Colors: Must be valid hex format
  6. Booleans: Exactly true or false
  7. Weights: Must sum to 1.0 (e.g., rag-dense-weight + rag-sparse-weight)