USE MODEL
Dynamically switches the language model used for AI operations within a script, enabling model routing based on task requirements, cost, or performance needs.
Syntax
USE MODEL "modelname"
USE MODEL "auto"
Parameters
| Parameter | Type | Description |
|---|---|---|
| modelname | String | Name of the model to use, or "auto" for automatic routing |
Description
USE MODEL allows scripts to dynamically select which language model to use for subsequent AI operations. This is essential for:
- Cost optimization - Use smaller/cheaper models for simple tasks
- Quality control - Use powerful models for complex reasoning
- Speed optimization - Use fast models for real-time responses
- Specialized tasks - Use code-specific models for programming
When set to "auto", the system automatically routes queries to the most appropriate model based on task complexity, latency requirements, and cost considerations.
Examples
Basic Model Selection
' Use a fast model for simple queries
USE MODEL "fast"
response = LLM "What time is it in New York?"
TALK response
' Switch to quality model for complex analysis
USE MODEL "quality"
analysis = LLM "Analyze the market trends for Q4 and provide recommendations"
TALK analysis
Automatic Model Routing
' Let the system choose the best model
USE MODEL "auto"
' Simple query -> routes to fast model
greeting = LLM "Say hello"
' Complex query -> routes to quality model
report = LLM "Generate a detailed financial analysis with projections"
Code Generation
' Use code-specialized model
USE MODEL "code"
code = LLM "Write a Python function to calculate fibonacci numbers"
TALK code
Cost-Aware Processing
' Process bulk items with the cheap model, collecting summaries as we go
USE MODEL "fast"
summaries = ""
FOR EACH item IN items
    item.summary = LLM "Summarize in one sentence: " + item.text
    summaries = summaries + item.summary + " "
NEXT item
' Final review with the quality model
USE MODEL "quality"
review = LLM "Review these summaries for accuracy: " + summaries
Model Fallback Pattern
' Try the preferred model first
prompt = "Summarize the latest support tickets"
USE MODEL "claude-sonnet-4.5"
ON ERROR GOTO fallback
response = LLM prompt
GOTO done
fallback:
' Fall back to a local model if the API call fails
USE MODEL "local"
response = LLM prompt
done:
TALK response
Model Routing Strategies
The system supports several routing strategies configured in config.csv:
| Strategy | Description |
|---|---|
| manual | Explicit model selection only |
| auto | Automatic routing based on query analysis |
| load-balanced | Distribute across models for throughput |
| fallback | Try models in order until one succeeds |
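For example, to opt into the fallback strategy, set the option in config.csv (a minimal sketch built from the options documented below):

name,value
model-routing-strategy,fallback
model-fallback-order,quality,fast,local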
Built-in Model Aliases
| Alias | Description | Use Case |
|---|---|---|
| fast | Optimized for speed | Simple queries, real-time chat |
| quality | Optimized for accuracy | Complex reasoning, analysis |
| code | Code-specialized model | Programming tasks |
| local | Local GGUF model | Offline/private operation |
| auto | System-selected | Let routing decide |
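The local alias is useful when prompts must not leave the machine. A short sketch (memo here is a placeholder variable):

' Keep sensitive text on-device with the local GGUF model
USE MODEL "local"
classification = LLM "Classify the sensitivity of this memo: " + memo
TALK classification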
Config.csv Options
name,value
model-routing-strategy,auto
model-default,fast
model-fast,DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_M.gguf
model-quality,claude-sonnet-4.5
model-code,codellama-7b.gguf
model-fallback-enabled,true
model-fallback-order,quality,fast,local
| Option | Default | Description |
|---|---|---|
| model-routing-strategy | auto | Routing strategy to use |
| model-default | fast | Default model when not specified |
| model-fast | (configured) | Model for fast/simple tasks |
| model-quality | (configured) | Model for quality/complex tasks |
| model-code | (configured) | Model for code generation |
| model-fallback-enabled | true | Enable automatic fallback |
| model-fallback-order | quality,fast,local | Order to try on failure |
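Note that model-default applies whenever a script has not yet issued USE MODEL. For instance, with the configuration above:

' No USE MODEL issued yet, so model-default ("fast") handles this call
greeting = LLM "Good morning!"
TALK greeting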
Auto-Routing Criteria
When USE MODEL "auto" is active, the system considers the following (see the sketch after this list):
- Query complexity - Token count, reasoning required
- Task type - Code, analysis, chat, translation
- Latency requirements - Real-time vs batch
- Cost budget - Per-query and daily limits
- Model availability - Health checks, rate limits
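A mixed workload shows how these criteria interact; the routing comments are illustrative, since actual choices depend on your config.csv and model health:

USE MODEL "auto"
' Short chat, low latency required -> likely the fast model
ack = LLM "Thanks, got it!"
' Programming task type -> likely the code model
helper = LLM "Write a SQL query to list overdue invoices"
' Long, reasoning-heavy request -> likely the quality model
brief = LLM "Compare our Q3 and Q4 performance and draft an executive brief"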
Related Keywords
| Keyword | Description |
|---|---|
| LLM | Query the language model |
| SET CONTEXT | Add context for LLM |
| BEGIN SYSTEM PROMPT | Define AI persona |
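These keywords compose naturally. A hedged sketch (the exact SET CONTEXT and system-prompt syntax may differ; check their reference pages):

' Pick a model, add context, and set a persona before querying
USE MODEL "quality"
SET CONTEXT "Account: ACME Corp, plan: enterprise"
BEGIN SYSTEM PROMPT
You are a concise support analyst.
END SYSTEM PROMPT
answer = LLM "Draft a renewal recommendation"
TALK answer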
Performance Considerations
- Model switching has minimal overhead
- Auto-routing adds ~10ms for classification
- Consider batching similar queries under one model (see the sketch after this list)
- Local models avoid network latency
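Batching in practice means grouping same-model tasks under a single USE MODEL call instead of switching inside the loop (codeTasks and task.name are placeholder names):

' One switch for the whole batch, not one per item
USE MODEL "code"
FOR EACH task IN codeTasks
    task.test = LLM "Write a unit test for: " + task.name
NEXT task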
Best Practices
- Start with auto - Let the system optimize, then tune
- Batch by model - Group similar tasks to reduce switching
- Monitor costs - Track per-model usage in analytics
- Test fallbacks - Ensure graceful degradation
- Profile your queries - Understand which need quality vs speed
See Also
- LLM Configuration - Model setup
- Multi-Agent Orchestration - Model routing in multi-agent systems
- Cost Tracking - Monitor model costs