
LLM Providers

General Bots supports multiple Large Language Model (LLM) providers, both cloud-based services and local deployments. This guide helps you choose the right provider for your use case.

Overview

LLMs are the intelligence behind General Bots’ conversational capabilities. You can configure:

  • Cloud Providers — External APIs (OpenAI, Anthropic, Google, etc.)
  • Local Models — Self-hosted models via llama.cpp
  • Hybrid — Use local for simple tasks, cloud for complex reasoning

Cloud Providers

OpenAI (GPT Series)

The most widely known LLM provider, offering the GPT-5 flagship model.

| Model | Context | Best For | Speed |
|---|---|---|---|
| GPT-5 | 1M | All-in-one advanced reasoning | Medium |
| GPT-oss 120B | 128K | Open-weight, agent workflows | Medium |
| GPT-oss 20B | 128K | Cost-effective open-weight | Fast |

Configuration (config.csv):

name,value
llm-provider,openai
llm-model,gpt-5
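Every configuration block in this guide follows the same two-column name,value CSV layout. As an illustrative sketch only (not the actual General Bots loader), such a file can be read into a lookup table like this:

```python
import csv
import io

# Hypothetical helper: parse a name,value config.csv into a dict.
# The "name,value" header row is skipped; later duplicates win.
def parse_config(text):
    rows = list(csv.reader(io.StringIO(text)))
    return {name: value for name, value in rows[1:]}

config = parse_config("name,value\nllm-provider,openai\nllm-model,gpt-5\n")
print(config["llm-provider"])  # openai
print(config["llm-model"])     # gpt-5
```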

Strengths:

  • Most advanced all-in-one model
  • Excellent general knowledge
  • Strong code generation
  • Good instruction following

Considerations:

  • API costs can add up
  • Data sent to external servers
  • Rate limits apply

Anthropic (Claude Series)

Known for safety, helpfulness, and extended thinking capabilities.

| Model | Context | Best For | Speed |
|---|---|---|---|
| Claude Opus 4.5 | 200K | Most capable, complex reasoning | Slow |
| Claude Sonnet 4.5 | 200K | Best balance of capability/speed | Fast |

Configuration (config.csv):

name,value
llm-provider,anthropic
llm-model,claude-sonnet-4.5

Strengths:

  • Extended thinking mode for multi-step tasks
  • Excellent at following complex instructions
  • Strong coding abilities
  • Better at refusing harmful requests

Considerations:

  • Premium pricing
  • Newer provider, smaller ecosystem

Google (Gemini Series)

Google’s multimodal AI models with strong reasoning capabilities.

| Model | Context | Best For | Speed |
|---|---|---|---|
| Gemini Pro | 2M | Complex reasoning, benchmarks | Medium |
| Gemini Flash | 1M | Fast multimodal tasks | Fast |

Configuration (config.csv):

name,value
llm-provider,google
llm-model,gemini-pro

Strengths:

  • Largest context window (2M tokens)
  • Native multimodal (text, image, video, audio)
  • Strong at structured data
  • Good coding abilities

Considerations:

  • Some features region-limited
  • API changes more frequently

xAI (Grok Series)

xAI's models integrate real-time data from the X platform.

| Model | Context | Best For | Speed |
|---|---|---|---|
| Grok 4 | 128K | Real-time research, analysis | Fast |

Configuration (config.csv):

name,value
llm-provider,xai
llm-model,grok-4

Strengths:

  • Real-time data access from X
  • Strong research and analysis
  • Good for trend analysis

Considerations:

  • Newer provider
  • X platform integration focus

Groq

Ultra-fast inference using custom LPU hardware. Offers open-source models at high speed.

| Model | Context | Best For | Speed |
|---|---|---|---|
| Llama 4 Scout | 10M | Long context, multimodal | Very Fast |
| Llama 4 Maverick | 1M | Complex tasks | Very Fast |
| Qwen3 | 128K | Efficient MoE architecture | Extremely Fast |

Configuration (config.csv):

name,value
llm-provider,groq
llm-model,llama-4-scout

Strengths:

  • Fastest inference speeds (500+ tokens/sec)
  • Competitive pricing
  • Open-source models
  • Great for real-time applications

Considerations:

  • Rate limits on free tier
  • Models may be less capable than GPT-5/Claude

Mistral AI

European AI company offering efficient, open-weight models.

| Model | Context | Best For | Speed |
|---|---|---|---|
| Mixtral-8x22B | 64K | Multi-language, coding | Fast |

Configuration (config.csv):

name,value
llm-provider,mistral
llm-model,mixtral-8x22b

Strengths:

  • European data sovereignty (GDPR)
  • Excellent code generation
  • Open-weight models available
  • Competitive pricing
  • Proficient in multiple languages

Considerations:

  • Smaller context than competitors
  • Less brand recognition

DeepSeek

Known for efficient, capable models with exceptional reasoning.

| Model | Context | Best For | Speed |
|---|---|---|---|
| DeepSeek-V3.1 | 128K | General purpose, optimized cost | Fast |
| DeepSeek-R3 | 128K | Reasoning, math, science | Medium |

Configuration (config.csv):

name,value
llm-provider,deepseek
llm-model,deepseek-r3
llm-server-url,https://api.deepseek.com

Strengths:

  • Extremely cost-effective
  • Strong reasoning (R3 model)
  • Rivals proprietary leaders in performance
  • Open-weight versions available (MIT/Apache 2.0)

Considerations:

  • Data processed in China
  • Newer provider

Local Models

Run models on your own hardware for privacy, cost control, and offline operation.

Setting Up Local LLM

General Bots uses a llama.cpp server for local inference:

name,value
llm-provider,local
llm-server-url,http://localhost:8081
llm-model,DeepSeek-R3-Distill-Qwen-1.5B
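Once the server is running, the bot talks to it over plain HTTP. A minimal sketch of building such a request, assuming llama.cpp's OpenAI-compatible /v1/chat/completions endpoint:

```python
import json
import urllib.request

# Build (but do not send) a chat request for a local llama.cpp server.
# The endpoint path assumes the server's OpenAI-compatible API.
def build_chat_request(server_url, model, prompt):
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{server_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "http://localhost:8081", "DeepSeek-R3-Distill-Qwen-1.5B", "Hello"
)
print(req.full_url)  # http://localhost:8081/v1/chat/completions
# With the server running: urllib.request.urlopen(req) returns the completion.
```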

For High-End GPU (24GB+ VRAM)

| Model | Size | VRAM | Quality |
|---|---|---|---|
| Llama 4 Scout 17B Q8 | 18GB | 24GB | Excellent |
| Qwen3 72B Q4 | 42GB | 48GB+ | Excellent |
| DeepSeek-R3 32B Q4 | 20GB | 24GB | Very Good |

For Mid-Range GPU (12-16GB VRAM)

| Model | Size | VRAM | Quality |
|---|---|---|---|
| Qwen3 14B Q8 | 15GB | 16GB | Very Good |
| GPT-oss 20B Q4 | 12GB | 16GB | Very Good |
| DeepSeek-R3-Distill 14B Q4 | 8GB | 12GB | Good |
| Gemma 3 27B Q4 | 16GB | 16GB | Good |

For Small GPU or CPU (8GB VRAM or less)

| Model | Size | VRAM | Quality |
|---|---|---|---|
| DeepSeek-R3-Distill 1.5B Q4 | 1GB | 4GB | Basic |
| Gemma 2 9B Q4 | 5GB | 8GB | Acceptable |
| Gemma 3 27B Q2 | 10GB | 8GB | Acceptable |

Model Download URLs

Add models to installer.rs data_download_list:

// In installer.rs, extend data_download_list with the model URLs to fetch:
let data_download_list = [
    // Qwen3 14B - recommended for mid-range GPU
    "https://huggingface.co/Qwen/Qwen3-14B-GGUF/resolve/main/qwen3-14b-q4_k_m.gguf",
    // DeepSeek-R3 Distill - for CPU or minimal GPU
    "https://huggingface.co/unsloth/DeepSeek-R3-Distill-Qwen-1.5B-GGUF/resolve/main/DeepSeek-R3-Distill-Qwen-1.5B-Q4_K_M.gguf",
    // GPT-oss 20B - good balance for agents
    "https://huggingface.co/openai/gpt-oss-20b-GGUF/resolve/main/gpt-oss-20b-q4_k_m.gguf",
    // Gemma 3 27B - for quality local inference
    "https://huggingface.co/google/gemma-3-27b-it-GGUF/resolve/main/gemma-3-27b-it-q4_k_m.gguf",
];

Embedding Models

For vector search, you need an embedding model:

name,value
embedding-provider,local
embedding-server-url,http://localhost:8082
embedding-model,bge-small-en-v1.5

Recommended embedding models:

| Model | Dimensions | Size | Quality |
|---|---|---|---|
| bge-small-en-v1.5 | 384 | 130MB | Good |
| bge-base-en-v1.5 | 768 | 440MB | Better |
| bge-large-en-v1.5 | 1024 | 1.3GB | Best |
| nomic-embed-text | 768 | 550MB | Good |

Hybrid Configuration

Use different models for different tasks:

name,value
llm-provider,anthropic
llm-model,claude-sonnet-4.5
llm-fast-provider,groq
llm-fast-model,llama-3.3-70b
llm-fallback-provider,local
llm-fallback-model,DeepSeek-R3-Distill-Qwen-1.5B
embedding-provider,local
embedding-model,bge-small-en-v1.5
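The routing behavior behind such a setup can be sketched as follows; the tier names and the word-count complexity heuristic are illustrative assumptions, not the engine's actual rules:

```python
# Hypothetical router: pick a (provider, model) tier per request.
TIERS = {
    "fast": ("groq", "llama-3.3-70b"),
    "main": ("anthropic", "claude-sonnet-4.5"),
    "fallback": ("local", "DeepSeek-R3-Distill-Qwen-1.5B"),
}

def pick_tier(prompt, cloud_available=True):
    if not cloud_available:
        return TIERS["fallback"]   # offline or provider outage: local model
    if len(prompt.split()) < 20:
        return TIERS["fast"]       # short/simple prompt: cheap and quick
    return TIERS["main"]           # complex prompt: strongest model

print(pick_tier("What time is it?"))                      # fast tier
print(pick_tier("anything", cloud_available=False))       # local fallback
```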

Model Selection Guide

By Use Case

| Use Case | Recommended | Why |
|---|---|---|
| Customer support | Claude Sonnet 4.5 | Best at following guidelines |
| Code generation | DeepSeek-R3, Claude Sonnet 4.5 | Specialized for code |
| Document analysis | Gemini Pro | 2M context window |
| Real-time chat | Groq Llama 3.3 | Fastest responses |
| Privacy-sensitive | Local DeepSeek-R3 | No external data transfer |
| Cost-sensitive | DeepSeek, Local models | Lowest cost per token |
| Complex reasoning | Claude Opus, Gemini Pro | Best reasoning ability |
| Real-time research | Grok | Live data access |
| Long context | Gemini Pro, Claude | Largest context windows |

By Budget

| Budget | Recommended Setup |
|---|---|
| Free | Local models only |
| Low ($10-50/mo) | Groq + Local fallback |
| Medium ($50-200/mo) | DeepSeek-V3.1 + Claude Sonnet 4.5 |
| High ($200+/mo) | GPT-5 + Claude Opus 4.5 |
| Enterprise | Private deployment + premium APIs |

Configuration Reference

config.csv Parameters

All LLM configuration belongs in config.csv, not environment variables:

| Parameter | Description | Example |
|---|---|---|
| llm-provider | Provider name | openai, anthropic, local |
| llm-model | Model identifier | gpt-5 |
| llm-server-url | API endpoint (local only) | http://localhost:8081 |
| llm-server-ctx-size | Context window size | 128000 |
| llm-temperature | Response randomness (0-2) | 0.7 |
| llm-max-tokens | Maximum response length | 4096 |
| llm-cache-enabled | Enable semantic caching | true |
| llm-cache-ttl | Cache time-to-live (seconds) | 3600 |

API Keys

API keys are stored in Vault, not in config files or environment variables:

# Store API key in Vault
vault kv put gbo/llm/openai api_key="sk-..."
vault kv put gbo/llm/anthropic api_key="sk-ant-..."
vault kv put gbo/llm/google api_key="AIza..."

Reference in config.csv:

name,value
llm-provider,openai
llm-model,gpt-5
llm-api-key,vault:gbo/llm/openai/api_key
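A sketch of how a `vault:` reference could be split into a KV path and field name. The helper is hypothetical; the actual resolver lives inside General Bots:

```python
# Hypothetical parser for values like "vault:gbo/llm/openai/api_key":
# everything up to the last "/" is the KV path, the final segment the field.
def parse_vault_ref(value):
    if not value.startswith("vault:"):
        return None  # plain literal value, not a Vault reference
    path, _, field = value[len("vault:"):].rpartition("/")
    return path, field

print(parse_vault_ref("vault:gbo/llm/openai/api_key"))
# ('gbo/llm/openai', 'api_key')
```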

Security Considerations

Cloud Providers

  • API keys stored in Vault, never in config files
  • Consider data residency requirements (EU: Mistral)
  • Review provider data retention policies
  • Use separate keys for production/development

Local Models

  • All data stays on your infrastructure
  • No internet required after model download
  • Full control over model versions
  • Consider GPU security for sensitive deployments

Performance Optimization

Caching

Enable semantic caching to reduce API calls:

name,value
llm-cache-enabled,true
llm-cache-ttl,3600
llm-cache-similarity-threshold,0.92
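Conceptually, a semantic cache reuses a stored answer when a new prompt's embedding is close enough to a cached one. A toy sketch of the threshold check (real embeddings come from the configured embedding model; the short vectors here are stand-ins):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

THRESHOLD = 0.92  # llm-cache-similarity-threshold

def lookup(cache, query_vec):
    """Return the cached answer most similar to query_vec, if above threshold."""
    best = max(cache, key=lambda e: cosine(e[0], query_vec), default=None)
    if best and cosine(best[0], query_vec) >= THRESHOLD:
        return best[1]
    return None  # cache miss: call the LLM, then store (query_vec, answer)

cache = [([1.0, 0.0, 0.0], "Our store opens at 9am.")]
print(lookup(cache, [0.99, 0.05, 0.0]))  # near-duplicate question: cache hit
print(lookup(cache, [0.0, 1.0, 0.0]))    # unrelated question: None
```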

Batching

For bulk operations, use batch APIs when available:

name,value
llm-batch-enabled,true
llm-batch-size,10
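The effect of llm-batch-size is simply to group requests before dispatch, sketched here as an illustrative helper:

```python
# Group prompts into chunks of llm-batch-size before dispatch.
def batches(items, size):
    return [items[i:i + size] for i in range(0, len(items), size)]

prompts = [f"prompt-{n}" for n in range(23)]
groups = batches(prompts, 10)
print([len(g) for g in groups])  # [10, 10, 3]
```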

Context Management

Optimize context window usage with episodic memory:

name,value
episodic-memory-enabled,true
episodic-memory-threshold,4
episodic-memory-history,2
episodic-memory-auto-summarize,true

See Episodic Memory for details.

Troubleshooting

Common Issues

API Key Invalid

  • Verify key is stored correctly in Vault
  • Check if key has required permissions
  • Ensure billing is active on provider account

Model Not Found

  • Check model name spelling
  • Verify model is available in your region
  • Some models require waitlist access

Rate Limits

  • Implement exponential backoff
  • Use caching to reduce calls
  • Consider upgrading API tier
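A minimal backoff sketch: the delay window doubles per retry up to a cap, with full jitter to avoid synchronized retries. The base and cap values are illustrative:

```python
import random

def backoff_delays(retries, base=1.0, cap=30.0, seed=None):
    """Exponential backoff with full jitter: delay in [0, min(cap, base*2^n)]."""
    rng = random.Random(seed)
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(retries)]

for delay in backoff_delays(5, seed=0):
    print(round(delay, 2))  # sleep this many seconds before the nth retry
```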

Local Model Slow

  • Check GPU memory usage
  • Reduce context size
  • Use quantized models (Q4 instead of F16)

Logging

Enable LLM logging for debugging:

name,value
llm-log-requests,true
llm-log-responses,false
llm-log-timing,true

2025 Model Comparison

| Model | Creator | Type | Strengths |
|---|---|---|---|
| GPT-5 | OpenAI | Proprietary | Most advanced all-in-one |
| Claude Opus/Sonnet 4.5 | Anthropic | Proprietary | Extended thinking, complex reasoning |
| Gemini 3 Pro | Google | Proprietary | Benchmarks, reasoning |
| Grok 4 | xAI | Proprietary | Real-time X data |
| DeepSeek-V3.1/R1 | DeepSeek | Open (MIT/Apache) | Cost-optimized, reasoning |
| Llama 4 | Meta | Open-weight | 10M context, multimodal |
| Qwen3 | Alibaba | Open (Apache) | Efficient MoE |
| Mixtral-8x22B | Mistral | Open (Apache) | Multi-language, coding |
| GPT-oss | OpenAI | Open (Apache) | Agent workflows |
| Gemma 2/3 | Google | Open-weight | Lightweight, efficient |

Next Steps