USE MODEL

Dynamically switches the language model used for AI operations within a script. Enables model routing based on task requirements, cost, or performance needs.

Syntax

USE MODEL "modelname"
USE MODEL "auto"

Parameters

Parameter    Type      Description
modelname    String    Name of the model to use, or "auto" for automatic routing

Description

USE MODEL allows scripts to dynamically select which language model to use for subsequent AI operations. This is essential for:

  • Cost optimization - Use smaller/cheaper models for simple tasks
  • Quality control - Use powerful models for complex reasoning
  • Speed optimization - Use fast models for real-time responses
  • Specialized tasks - Use code-specific models for programming

When set to "auto", the system automatically routes queries to the most appropriate model based on task complexity, latency requirements, and cost considerations.

Examples

Basic Model Selection

' Use a fast model for simple queries
USE MODEL "fast"
response = LLM "What time is it in New York?"
TALK response

' Switch to quality model for complex analysis
USE MODEL "quality"
analysis = LLM "Analyze the market trends for Q4 and provide recommendations"
TALK analysis

Automatic Model Routing

' Let the system choose the best model
USE MODEL "auto"

' Simple query -> routes to fast model
greeting = LLM "Say hello"

' Complex query -> routes to quality model  
report = LLM "Generate a detailed financial analysis with projections"

Code Generation

' Use code-specialized model
USE MODEL "code"

code = LLM "Write a Python function to calculate fibonacci numbers"
TALK code

Cost-Aware Processing

' Process bulk items with cheap model
USE MODEL "fast"
FOR EACH item IN items
    summary = LLM "Summarize in one sentence: " + item.text
    item.summary = summary
NEXT item

' Final review with quality model
USE MODEL "quality"
review = LLM "Review these summaries for accuracy: " + summaries

Model Fallback Pattern

' Try preferred model first
USE MODEL "claude-sonnet-4.5"
ON ERROR GOTO fallback
response = LLM prompt
GOTO done

fallback:
' Fall back to local model if API fails
USE MODEL "local"
response = LLM prompt

done:
TALK response

Model Routing Strategies

The system supports several routing strategies configured in config.csv:

Strategy        Description
manual          Explicit model selection only
auto            Automatic routing based on query analysis
load-balanced   Distribute across models for throughput
fallback        Try models in order until one succeeds
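For example, a deployment that prefers resilience can combine the fallback strategy with an explicit try-order. A minimal config.csv sketch, assuming the same keys documented under Config.csv Options below (the values are illustrative, not defaults):

name,value
model-routing-strategy,fallback
model-fallback-enabled,true
model-fallback-order,quality,fast,local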

Built-in Model Aliases

Alias     Description              Use Case
fast      Optimized for speed      Simple queries, real-time chat
quality   Optimized for accuracy   Complex reasoning, analysis
code      Code-specialized model   Programming tasks
local     Local GGUF model         Offline/private operation
auto      System-selected          Let routing decide
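The local alias is useful when data must not leave the machine. A brief sketch, assuming a local GGUF model is configured (the variable memo.text is illustrative):

' Keep sensitive text on the local GGUF model
USE MODEL "local"
summary = LLM "Summarize this internal memo: " + memo.text
TALK summary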

Config.csv Options

name,value
model-routing-strategy,auto
model-default,fast
model-fast,DeepSeek-R3-Distill-Qwen-1.5B-Q3_K_M.gguf
model-quality,claude-sonnet-4.5
model-code,codellama-7b.gguf
model-fallback-enabled,true
model-fallback-order,quality,fast,local

Option                   Default              Description
model-routing-strategy   auto                 Routing strategy to use
model-default            fast                 Default model when not specified
model-fast               (configured)         Model for fast/simple tasks
model-quality            (configured)         Model for quality/complex tasks
model-code               (configured)         Model for code generation
model-fallback-enabled   true                 Enable automatic fallback
model-fallback-order     quality,fast,local   Order to try on failure

Auto-Routing Criteria

When USE MODEL "auto" is active, the system considers:

  1. Query complexity - Token count, reasoning required
  2. Task type - Code, analysis, chat, translation
  3. Latency requirements - Real-time vs batch
  4. Cost budget - Per-query and daily limits
  5. Model availability - Health checks, rate limits
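
Because USE MODEL applies to subsequent operations, a script can stay on auto for most queries and pin a specific alias only where one of the criteria above is a hard requirement, then hand control back to the router. A brief sketch (the prompts are illustrative):

' Let routing decide for routine queries
USE MODEL "auto"
summary = LLM "Summarize today's ticket queue"

' Pin the code model for a step that must produce code
USE MODEL "code"
query = LLM "Write a SQL query to list overdue invoices"

' Return to automatic routing for the rest of the script
USE MODEL "auto"
reply = LLM "Draft a short status update"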

Related Keywords

Keyword               Description
LLM                   Query the language model
SET CONTEXT           Add context for LLM
BEGIN SYSTEM PROMPT   Define AI persona
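
These keywords combine naturally with USE MODEL. The sketch below assumes SET CONTEXT takes a string and that BEGIN SYSTEM PROMPT is closed by END SYSTEM PROMPT; see their own reference pages for exact forms:

' Pick the quality model for an analyst persona
USE MODEL "quality"

BEGIN SYSTEM PROMPT
You are a careful financial analyst. Cite figures where possible.
END SYSTEM PROMPT

SET CONTEXT "Fiscal year 2024 regional sales data"

answer = LLM "Which region grew fastest, and why?"
TALK answer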

Performance Considerations

  • Model switching has minimal overhead
  • Auto-routing adds ~10ms for classification
  • Consider batching similar queries under one model
  • Local models avoid network latency

Best Practices

  1. Start with auto - Let the system optimize, then tune
  2. Batch by model - Group similar tasks to reduce switching
  3. Monitor costs - Track per-model usage in analytics
  4. Test fallbacks - Ensure graceful degradation
  5. Profile your queries - Understand which need quality vs speed

See Also