Infrastructure Design

This chapter covers the complete infrastructure design for General Bots, including scaling, security, secrets management, observability, and high availability.

Architecture Overview

General Bots uses a modular architecture where each component runs in an isolated LXC container. This provides:

- Isolation: each service has its own filesystem and process space
- Scalability: add more containers to handle increased load
- Security: a compromised component cannot affect the others
- Portability: containers move easily between hosts

(Diagrams: component layout, high-availability architecture, and the overall infrastructure, showing a production-ready design with automatic scaling, load balancing, and multi-tenant isolation.)

Encryption at Rest

All data stored by General Bots is encrypted at rest using AES-256-GCM.

Database Encryption

PostgreSQL data is encrypted at the column level using the pgcrypto extension (core PostgreSQL does not ship Transparent Data Encryption):

# config.csv
encryption-at-rest,true
encryption-algorithm,aes-256-gcm
encryption-key-source,vault

Enable in PostgreSQL:

-- Enable pgcrypto extension
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Encrypted columns use pgp_sym_encrypt
ALTER TABLE bot_memories 
ADD COLUMN value_encrypted bytea;

UPDATE bot_memories 
SET value_encrypted = pgp_sym_encrypt(value, current_setting('app.encryption_key'));
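
Reading the value back uses pgp_sym_decrypt. A minimal sketch with the tokio-postgres crate, assuming an open client whose session has app.encryption_key set; the id column is illustrative:

// Decrypt a pgcrypto-encrypted column on read.
// Assumes `client` is an open tokio_postgres::Client.
async fn load_memory(client: &tokio_postgres::Client, id: i64) -> Result<String, tokio_postgres::Error> {
    let row = client
        .query_one(
            "SELECT pgp_sym_decrypt(value_encrypted, current_setting('app.encryption_key')) \
             FROM bot_memories WHERE id = $1",
            &[&id],
        )
        .await?;
    Ok(row.get(0))
}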

File Storage Encryption

MinIO server-side encryption uses SSE-S3 for automatic bucket-level encryption; SSE-C (customer-managed keys) is supplied by clients per request:

# Enable SSE-S3 encryption
mc encrypt set sse-s3 local/gbo-bucket

# SSE-C (customer-managed keys) is supplied by clients per request
# rather than set as a bucket default

Configuration:

# config.csv
drive-encryption,true
drive-encryption-type,sse-s3
drive-encryption-key,vault:gbo/encryption/drive_key

Redis Encryption

Redis secures data in transit with TLS; snapshots on disk are protected by the filesystem-level encryption described below:

# redis.conf
tls-port 6379
port 0
tls-cert-file /opt/gbo/conf/certificates/redis/server.crt
tls-key-file /opt/gbo/conf/certificates/redis/server.key
tls-ca-cert-file /opt/gbo/conf/certificates/ca.crt

# Incrementally fsync the RDB file while saving (Redis has no native
# RDB encryption; use filesystem-level encryption for data at rest)
rdb-save-incremental-fsync yes
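
With TLS enabled, clients connect using the rediss:// scheme. A minimal sketch with the redis crate, which needs one of its TLS features (tls-native-tls or tls-rustls); host and port match the redis.conf above:

// Open a TLS connection to Redis; the rediss:// scheme requests TLS.
fn open_secure_cache() -> redis::RedisResult<redis::Connection> {
    let client = redis::Client::open("rediss://127.0.0.1:6379/")?;
    client.get_connection()
}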

Vector Database Encryption

Qdrant with encrypted storage uses TLS for transport and filesystem-level encryption for data at rest:

# qdrant/config.yaml
storage:
  storage_path: /opt/gbo/data/qdrant
  on_disk_payload: true
  
service:
  enable_tls: true
  
# Disk encryption handled at filesystem level

Filesystem-Level Encryption

For comprehensive encryption, use LUKS on the data partition:

# Create encrypted partition for /opt/gbo/data
cryptsetup luksFormat /dev/sdb1
cryptsetup open /dev/sdb1 gbo-data
mkfs.ext4 /dev/mapper/gbo-data
mount /dev/mapper/gbo-data /opt/gbo/data

Media Processing: LiveKit

LiveKit handles all media processing for General Bots: WebRTC support is native, recording and transcoding are built in via the Egress service, and streaming and AI integration are part of the platform.

LiveKit’s Egress service handles room recording, participant recording, livestreaming to YouTube and Twitch, and track composition.

LiveKit Configuration

# config.csv
meet-provider,livekit
meet-server-url,wss://localhost:7880
meet-api-key,vault:gbo/meet/api_key
meet-api-secret,vault:gbo/meet/api_secret
meet-recording-enabled,true
meet-transcription-enabled,true
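
Server-side, room access is granted by minting a short-lived JWT signed with the API secret from the config above. A hedged sketch using the jsonwebtoken and serde crates; the claim layout follows LiveKit's documented access-token format, and the identity and room name are illustrative:

use jsonwebtoken::{encode, Algorithm, EncodingKey, Header};
use serde::Serialize;

#[derive(Serialize)]
struct VideoGrant {
    #[serde(rename = "roomJoin")]
    room_join: bool,
    room: String,
}

#[derive(Serialize)]
struct Claims {
    iss: String, // LiveKit API key
    sub: String, // participant identity
    exp: u64,    // expiry, unix seconds
    video: VideoGrant,
}

fn mint_token(api_key: &str, api_secret: &str, exp: u64) -> Result<String, jsonwebtoken::errors::Error> {
    let claims = Claims {
        iss: api_key.to_string(),
        sub: "user-123".to_string(), // illustrative identity
        exp,
        video: VideoGrant { room_join: true, room: "meeting-1".to_string() },
    };
    encode(
        &Header::new(Algorithm::HS256),
        &claims,
        &EncodingKey::from_secret(api_secret.as_bytes()),
    )
}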

Messaging: Redis

General Bots uses Redis for all messaging needs including session state, PubSub for real-time communication, and Streams for persistence:

// Assumes `con` is an open redis::Connection and payloads were serialized upstream.
fn record_session(con: &mut redis::Connection, state_json: &str, message: &str, data: &str) -> redis::RedisResult<()> {
    // Session state
    redis::cmd("SET").arg("session:123").arg(state_json).query::<()>(con)?;

    // PubSub for real-time
    redis::cmd("PUBLISH").arg("channel:bot-1").arg(message).query::<()>(con)?;

    // Streams for persistence (optional)
    redis::cmd("XADD").arg("stream:events").arg("*").arg("event").arg(data).query::<()>(con)?;
    Ok(())
}

Configuration:

# config.csv
messaging-provider,redis
messaging-persistence,streams
messaging-retention-hours,24

Sharding Strategies

Option 1: Tenant-Based Sharding

Each tenant or organization gets isolated databases.

Multi-Tenant Architecture

Each tenant gets isolated resources with dedicated database schemas, cache namespaces, and vector collections. The router maps tenant IDs to their respective data stores automatically.

Key isolation features:

- Database-per-tenant or schema-per-tenant options
- Namespace isolation in the Redis cache
- Collection isolation in Qdrant
- Bucket isolation in MinIO storage

Configuration:

# config.csv
shard-strategy,tenant
shard-auto-provision,true
shard-isolation-level,database

Advantages: complete data isolation (compliance-friendly), easy per-tenant backup and restore, simplicity, and no cross-tenant queries. Disadvantages: more resources per tenant, complex tenant migration, and connection-pool overhead.

Option 2: Hash-Based Sharding

Distribute data by a hash of the user or session ID. For example, if the hash of user_id 12345 modulo num_shards equals 2, the request routes to shard-2, as in the sketch below.
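
A minimal sketch of that routing with Rust's standard hasher (the consistent-hash algorithm named in the config below redistributes fewer keys on resharding; this only illustrates the modulo mapping):

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Map a user ID onto one of `num_shards` shards, e.g. shard_for(12345, 4).
fn shard_for(user_id: u64, num_shards: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    user_id.hash(&mut hasher);
    hasher.finish() % num_shards
}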

Configuration:

# config.csv
shard-strategy,hash
shard-count,4
shard-key,user_id
shard-algorithm,consistent-hash

Advantages include even distribution, predictable routing, and good performance for high-volume single-tenant deployments. Disadvantages include complex resharding, difficult cross-shard queries, and no tenant isolation.

Option 3: Time-Based Sharding

For time-series data like logs and analytics:

# config.csv
shard-strategy,time
shard-interval,monthly
shard-retention-months,12
shard-auto-archive,true

This automatically creates partitions named messages_2024_01, messages_2024_02, messages_2024_03, and so on.
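
A small sketch of that naming convention using the chrono crate; the messages_ prefix comes from the partition names above:

use chrono::{DateTime, Datelike, Utc};

// messages_2024_01, messages_2024_02, ...
fn partition_name(ts: DateTime<Utc>) -> String {
    format!("messages_{:04}_{:02}", ts.year(), ts.month())
}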

Option 4: Geographic Sharding

Route by user location:

# config.csv
shard-strategy,geo
shard-regions,us-east,eu-west,ap-south
shard-default,us-east
shard-detection,ip

Geographic Distribution

The global router uses GeoIP to direct users to the nearest regional cluster. US-East in Virginia runs a full cluster, EU-West in Frankfurt runs a full cluster, and AP-South in Singapore runs a full cluster. Each regional cluster runs independently with data replication between regions for disaster recovery.
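
A hedged sketch of the region routing: map the GeoIP-detected region onto its cluster, falling back to shard-default; the cluster URLs are hypothetical:

// Regions mirror the shard-regions / shard-default settings above.
fn cluster_for(region: &str) -> &'static str {
    match region {
        "us-east" => "https://us-east.bots.example.com",
        "eu-west" => "https://eu-west.bots.example.com",
        "ap-south" => "https://ap-south.bots.example.com",
        _ => "https://us-east.bots.example.com", // shard-default
    }
}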

Auto-Scaling with LXC

Configuration

# config.csv - Auto-scaling settings
scale-enabled,true
scale-min-instances,1
scale-max-instances,10
scale-cpu-threshold,70
scale-memory-threshold,80
scale-request-threshold,1000
scale-cooldown-seconds,300
scale-check-interval,30

Scaling Rules

Metric           Scale Up            Scale Down
CPU              > 70% for 2 min     < 30% for 5 min
Memory           > 80% for 2 min     < 40% for 5 min
Requests/sec     > 1000              < 200
Response time    > 2000 ms           < 500 ms
Queue depth      > 100               < 10
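
A sketch of how the scale-up side of these rules could be evaluated; the struct and field names are illustrative, not the actual BotServer types:

struct Metrics {
    cpu_pct: f64,
    mem_pct: f64,
    req_per_sec: f64,
    sustained_secs: u64, // how long the current reading has held
}

// Scale up when any metric breaches its threshold for the required window.
fn should_scale_up(m: &Metrics) -> bool {
    (m.cpu_pct > 70.0 && m.sustained_secs >= 120)
        || (m.mem_pct > 80.0 && m.sustained_secs >= 120)
        || m.req_per_sec > 1000.0
}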

Auto-Scale Service

The auto-scaler runs as a systemd service:

# /etc/systemd/system/gbo-autoscale.service
[Unit]
Description=General Bots Auto-Scaler
After=network.target

[Service]
Type=simple
ExecStart=/opt/gbo/scripts/autoscale.sh
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Container Lifecycle

The startup flow:

1. Create the LXC container from a template
2. Configure resources (CPU, memory, storage)
3. Start the BotServer binary
4. Mark the container ready and add it to the load-balancer pool

The shutdown flow:

1. Active: the container is serving requests
2. Drain: stop accepting new connections
3. Stop: shut BotServer down gracefully
4. Delete the container or return it to the pool

Load Balancing

Caddy Configuration

{
    admin off
    # Automatic HTTPS is enabled by default
}

bot.example.com {
    # Rate limiting (requires the caddy-ratelimit plugin)
    rate_limit {
        zone api {
            key {remote_host}
            events 100
            window 1m
        }
    }
    
    # WebSocket (sticky sessions)
    handle /ws* {
        reverse_proxy botserver-1:8080 botserver-2:8080 {
            lb_policy cookie
            health_uri /api/health
            health_interval 10s
        }
    }
    
    # API (round robin)
    handle /api/* {
        reverse_proxy botserver-1:8080 botserver-2:8080 {
            lb_policy round_robin
            fail_duration 30s
        }
    }
}

Rate Limiting Configuration

# config.csv - Rate limiting
rate-limit-enabled,true
rate-limit-requests,100
rate-limit-window,60
rate-limit-burst,20
rate-limit-by,ip

# Per-endpoint limits
rate-limit-api-chat,30
rate-limit-api-files,50
rate-limit-api-auth,10
rate-limit-api-llm,20
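
Conceptually these settings describe a token bucket: rate-limit-requests tokens refill per rate-limit-window, with rate-limit-burst extra headroom. A minimal sketch, not the actual BotServer implementation:

use std::time::Instant;

struct Bucket {
    tokens: f64,
    last: Instant,
    rate: f64,     // refill rate: 100 requests / 60 s
    capacity: f64, // window allowance plus burst: 100 + 20
}

impl Bucket {
    // Refill by elapsed time, then try to spend one token.
    fn allow(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last).as_secs_f64();
        self.last = now;
        self.tokens = (self.tokens + elapsed * self.rate).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}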

Failover Systems

Health Checks

Every service exposes /health:

{
  "status": "healthy",
  "version": "6.1.0",
  "checks": {
    "database": {"status": "ok", "latency_ms": 5},
    "cache": {"status": "ok", "latency_ms": 2},
    "vectordb": {"status": "ok", "latency_ms": 10},
    "llm": {"status": "ok", "latency_ms": 50}
  }
}

Circuit Breaker

# config.csv
circuit-breaker-enabled,true
circuit-breaker-threshold,5
circuit-breaker-timeout,30
circuit-breaker-half-open-requests,3

The circuit breaker has three states. Closed represents normal operation while counting failures. Open means failing fast and returning errors immediately. Half-Open tests with limited requests before deciding to close or reopen.
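
A condensed sketch of that state machine, wired to the threshold and timeout values from the config above:

use std::time::{Duration, Instant};

#[derive(Clone, Copy)]
enum State {
    Closed { failures: u32 },
    Open { since: Instant },
    HalfOpen { probes: u32 },
}

struct Breaker {
    state: State,
}

impl Breaker {
    fn allow_request(&mut self) -> bool {
        match self.state {
            State::Closed { .. } => true,
            // After the timeout, let probes through (half-open).
            State::Open { since } if since.elapsed() > Duration::from_secs(30) => {
                self.state = State::HalfOpen { probes: 0 };
                true
            }
            State::Open { .. } => false,
            // circuit-breaker-half-open-requests,3
            State::HalfOpen { probes } => {
                self.state = State::HalfOpen { probes: probes + 1 };
                probes < 3
            }
        }
    }

    fn on_success(&mut self) {
        self.state = State::Closed { failures: 0 };
    }

    fn on_failure(&mut self) {
        self.state = match self.state {
            // circuit-breaker-threshold,5
            State::Closed { failures } if failures + 1 >= 5 => State::Open { since: Instant::now() },
            State::Closed { failures } => State::Closed { failures: failures + 1 },
            _ => State::Open { since: Instant::now() },
        };
    }
}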

Database Failover

PostgreSQL with streaming replication provides high availability.

Database Replication

PostgreSQL replication is managed by Patroni for automatic failover. The Primary serves as the write leader handling all write operations. The Replica provides synchronous replication from the primary for read scaling. Patroni acts as the failover manager performing automatic leader election on failure.

Failover happens automatically within seconds, with clients redirected via the connection pooler.

Graceful Degradation

# config.csv - Fallbacks
fallback-llm-enabled,true
fallback-llm-provider,local
fallback-llm-model,DeepSeek-R1-Distill-Qwen-1.5B

fallback-cache-enabled,true
fallback-cache-mode,memory

fallback-vectordb-enabled,true
fallback-vectordb-mode,keyword-search
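
The LLM fallback amounts to trying the primary provider and dropping to the local model on error. A hedged sketch; the Llm trait is a hypothetical stand-in for the real provider interface:

// Hypothetical provider trait; the actual BotServer interface may differ.
trait Llm {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

// fallback-llm-enabled: on primary failure, answer with the local model.
fn complete_with_fallback(primary: &dyn Llm, local: &dyn Llm, prompt: &str) -> Result<String, String> {
    primary.complete(prompt).or_else(|_| local.complete(prompt))
}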

Secrets Management (Vault)

Architecture

The minimal .env file contains only Vault connection details; all other secrets are stored in Vault and fetched at runtime. The Vault server organizes secrets by path:

- gbo/drive: access keys
- gbo/tables: database credentials
- gbo/cache: passwords
- gbo/directory: client credentials
- gbo/email: mail credentials
- gbo/llm: provider API keys
- gbo/encryption: master and data keys
- gbo/meet: API credentials

Zitadel vs Vault

Zitadel handles user authentication, OAuth/OIDC, and MFA. Vault handles service credentials, API keys, and encryption keys. Use both together where Zitadel manages user identity and SSO while Vault manages service secrets and encryption keys.

Minimal .env with Vault

# .env - Only Vault and Directory needed
VAULT_ADDR=https://localhost:8200
VAULT_TOKEN=hvs.your-token-here

# Directory for user auth (Zitadel)
DIRECTORY_URL=https://localhost:8080
DIRECTORY_CLIENT_ID=your-client-id
DIRECTORY_CLIENT_SECRET=your-client-secret

# All other secrets fetched from Vault at runtime
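
Fetching happens over Vault's HTTP API. A hedged sketch using reqwest's blocking client against the KV v2 endpoint (/v1/secret/data/<path>); the gbo/llm path comes from the layout above, and the secret mount name is an assumption:

use serde_json::Value;

// Read a secret from Vault KV v2; the payload is nested under data.data.
fn vault_get(addr: &str, token: &str, path: &str) -> Result<Value, reqwest::Error> {
    let url = format!("{addr}/v1/secret/data/{path}"); // e.g. path = "gbo/llm"
    let resp: Value = reqwest::blocking::Client::new()
        .get(url)
        .header("X-Vault-Token", token)
        .send()?
        .error_for_status()?
        .json()?;
    Ok(resp["data"]["data"].clone())
}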

Observability

Option 1: InfluxDB + Grafana (Current)

For time-series metrics:

# config.csv
observability-provider,influxdb
observability-url,http://localhost:8086
observability-org,pragmatismo
observability-bucket,metrics

Vector serves as a log and metric aggregator. BotServer logs flow to Vector which pipelines them to InfluxDB for metrics storage and Grafana for dashboards.

Vector configuration:

# vector.toml
[sources.botserver_logs]
type = "file"
include = ["/opt/gbo/logs/*.log"]

[transforms.parse_logs]
type = "remap"
inputs = ["botserver_logs"]
source = '''
. = parse_json!(.message)
'''

[sinks.influxdb]
# parse_logs emits log events, so use the logs sink
# (influxdb_metrics expects metric events)
type = "influxdb_logs"
inputs = ["parse_logs"]
endpoint = "http://localhost:8086"
org = "pragmatismo"
bucket = "metrics"
token = "${INFLUX_TOKEN}"

Replacing log.* Calls with Vector

Instead of replacing all log calls, configure Vector to collect logs from files, parse and enrich them, and route to appropriate sinks:

# Route errors to alerts
[transforms.filter_errors]
type = "filter"
inputs = ["parse_logs"]
condition = '.level == "error"'

[sinks.alertmanager]
type = "http"
inputs = ["filter_errors"]
uri = "http://alertmanager:9093/api/v1/alerts"
encoding.codec = "json"

Search: Qdrant

Qdrant handles all search needs in General Bots, providing both vector similarity search for semantic queries and payload filtering for keyword-like queries.

Hybrid Search with Qdrant

Qdrant supports hybrid search combining vector similarity with keyword filters:

use qdrant_client::qdrant::{Condition, Filter, SearchPoints};

// Combine vector similarity with a keyword-style payload filter.
fn build_search(query_embedding: Vec<f32>) -> SearchPoints {
    SearchPoints {
        collection_name: "kb".to_string(),
        vector: query_embedding,
        limit: 10,
        // Exact keyword match on the payload field
        filter: Some(Filter::must([Condition::matches(
            "content",
            "keyword".to_string(),
        )])),
        ..Default::default()
    }
}

Workflow Scheduling: SET SCHEDULE

General Bots uses the SET SCHEDULE keyword for all scheduling needs:

REM Run every day at 9 AM
SET SCHEDULE "daily-report" TO "0 9 * * *"
    TALK "Running daily report..."
    result = GET "/api/reports/daily"
    SEND MAIL "admin@example.com", "Daily Report", result
END SCHEDULE

MFA with Zitadel

Configuration

MFA is handled transparently by Zitadel:

# config.csv
auth-mfa-enabled,true
auth-mfa-methods,totp,sms,email,whatsapp
auth-mfa-required-for,admin,sensitive-operations
auth-mfa-grace-period-days,7

Zitadel MFA Settings

In the Zitadel console, navigate to Settings then Login Behavior. Enable Multi-Factor Authentication and select allowed methods including TOTP for authenticator apps, SMS, Email, and WebAuthn/FIDO2.

WhatsApp MFA Channel

# config.csv
auth-mfa-whatsapp-enabled,true
auth-mfa-whatsapp-provider,twilio
auth-mfa-whatsapp-template,mfa_code

The flow:

1. The user logs in with a password
2. Zitadel triggers MFA
3. A code is sent via WhatsApp
4. The user enters the code
5. The session is established

Summary: What You Need

PostgreSQL, Redis, Qdrant, MinIO, and Zitadel are required components. Vault, InfluxDB, and LiveKit are recommended for production deployments. Vector is optional for log aggregation.

Next Steps

The Scaling and Load Balancing chapter provides a detailed scaling guide. The Container Deployment chapter covers LXC setup. The Security Features chapter offers a security deep dive. The LLM Providers appendix helps with model selection.