Multimodal Module

Image, video, and audio generation with vision/captioning capabilities.

Overview

The multimodal module connects to the BotModels server (a separate service) for AI-powered media generation and analysis.

BASIC Keywords

Keyword   Purpose
IMAGE     Generate image from text prompt
VIDEO     Generate video from text prompt
AUDIO     Generate speech audio from text
SEE       Describe/caption an image or video

IMAGE

Generate an image from a text prompt:

url = IMAGE "A sunset over mountains with a lake"
TALK "Here's your image: " + url

Timeout: 300 seconds (5 minutes)

VIDEO

Generate a video from a text prompt:

url = VIDEO "A cat playing with a ball of yarn"
TALK "Here's your video: " + url

Timeout: 600 seconds (10 minutes)
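IMAGE and VIDEO are documented with fixed timeouts. If you call BotModels from your own Rust code rather than through BASIC, you may want a similar bound on how long you wait. The sketch below is a minimal std-only illustration, assuming the generation call is blocking; `run_with_timeout` and `keyword_timeout` are hypothetical helpers, not part of the BotModels client, and AUDIO and SEE map to `None` because their timeouts are not stated here.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `job` on a worker thread and waits at most `timeout` for its result.
/// Returns `None` if the deadline passes (or the job panics) first.
/// Hypothetical helper, not part of the BotModels client.
fn run_with_timeout<T, F>(job: F, timeout: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The send fails only if the caller already gave up; ignore that case.
        let _ = tx.send(job());
    });
    rx.recv_timeout(timeout).ok()
}

/// Timeouts documented for the BASIC generation keywords (hypothetical helper).
fn keyword_timeout(keyword: &str) -> Option<Duration> {
    match keyword {
        "IMAGE" => Some(Duration::from_secs(300)), // 5 minutes
        "VIDEO" => Some(Duration::from_secs(600)), // 10 minutes
        _ => None, // AUDIO and SEE timeouts are not documented here
    }
}
```

The worker thread is not cancelled when the deadline passes; its result is simply discarded. With an async client you would more likely use your runtime's timeout utility, such as `tokio::time::timeout` if you use Tokio.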

AUDIO

Generate speech audio from text:

url = AUDIO "Welcome to our service. How can I help you today?"
PLAY url

SEE

Get a description of an image or video:

description = SEE "path/to/image.jpg"
TALK "I see: " + description

Configuration

Add the following entries to config.csv:

botmodels-enabled,true
botmodels-host,localhost
botmodels-port,5000
botmodels-api-key,your-api-key
botmodels-use-https,false
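The connection settings presumably combine into a single server address. A minimal sketch, assuming config.csv holds one `key,value` pair per line and that the client builds an `http://` or `https://` URL from host and port; `parse_config` and `botmodels_base_url` are hypothetical names, and the real client's URL construction is not documented here. The API key is left out because how it is sent to the server is not described.

```rust
use std::collections::HashMap;

/// Parses `key,value` lines (as in config.csv) into a map.
/// Splits on the first comma only; lines without a comma are skipped.
fn parse_config(text: &str) -> HashMap<String, String> {
    text.lines()
        .filter_map(|line| line.split_once(','))
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .collect()
}

/// Builds a BotModels base URL from the connection settings (hypothetical).
/// Returns None when BotModels is disabled or host/port are missing.
fn botmodels_base_url(cfg: &HashMap<String, String>) -> Option<String> {
    if cfg.get("botmodels-enabled").map(String::as_str) != Some("true") {
        return None;
    }
    let host = cfg.get("botmodels-host")?;
    let port = cfg.get("botmodels-port")?;
    // botmodels-use-https selects the scheme; anything but "true" means plain HTTP.
    let scheme = if cfg.get("botmodels-use-https").map(String::as_str) == Some("true") {
        "https"
    } else {
        "http"
    };
    Some(format!("{scheme}://{host}:{port}"))
}
```

With the example values above, this yields `http://localhost:5000`.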

Image Generation Config

botmodels-image-model,stable-diffusion
botmodels-image-steps,20
botmodels-image-width,512
botmodels-image-height,512
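Because config.csv stores every value as text, the numeric image settings have to be parsed before use. A sketch of reading them into a typed struct, assuming steps, width, and height are non-negative integers; `ImageSettings`, `read_u32`, and `image_settings` are hypothetical and only illustrate the idea.

```rust
use std::collections::HashMap;

/// Typed view of the botmodels-image-* settings (hypothetical struct).
#[derive(Debug, PartialEq)]
struct ImageSettings {
    model: String,
    steps: u32,
    width: u32,
    height: u32,
}

/// Looks up `key` and parses it as an unsigned integer, reporting which key failed.
fn read_u32(cfg: &HashMap<&str, &str>, key: &str) -> Result<u32, String> {
    let raw = cfg.get(key).ok_or_else(|| format!("missing {key}"))?;
    raw.trim().parse::<u32>().map_err(|e| format!("{key}: {e}"))
}

/// Builds ImageSettings from config entries, failing on missing or non-numeric values.
fn image_settings(cfg: &HashMap<&str, &str>) -> Result<ImageSettings, String> {
    let model = cfg
        .get("botmodels-image-model")
        .ok_or_else(|| "missing botmodels-image-model".to_string())?;
    Ok(ImageSettings {
        model: model.to_string(),
        steps: read_u32(cfg, "botmodels-image-steps")?,
        width: read_u32(cfg, "botmodels-image-width")?,
        height: read_u32(cfg, "botmodels-image-height")?,
    })
}
```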

Video Generation Config

botmodels-video-model,text2video
botmodels-video-frames,16
botmodels-video-fps,8
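Assuming botmodels-video-frames is the total number of frames and botmodels-video-fps the playback rate, a clip lasts frames ÷ fps seconds: with the values above, 16 ÷ 8 = 2 seconds. The helper names below are hypothetical.

```rust
/// Playback length in seconds of `frames` frames shown at `fps` frames per second.
/// Returns None for fps == 0, where the length is undefined.
fn clip_seconds(frames: u32, fps: u32) -> Option<f64> {
    if fps == 0 {
        return None;
    }
    Some(f64::from(frames) / f64::from(fps))
}

/// Frames needed for a clip of at least `seconds` at `fps`, rounded up.
fn frames_for(seconds: f64, fps: u32) -> u32 {
    (seconds * f64::from(fps)).ceil() as u32
}
```

Under the same assumption, a 4-second clip at 8 fps would need botmodels-video-frames set to 32.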

BotModels Client

For direct integration from Rust code, use the BotModels client:

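// Must run inside an async function that returns a Result,
// so that `.await?` can propagate errors.
let client = BotModelsClient::from_state(&state, &bot_id);

if client.is_enabled() {
    // Generate an image from a text prompt
    let image_url = client.generate_image("A beautiful garden").await?;
    // Caption an existing image
    let description = client.describe_image("path/to/photo.jpg").await?;
}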

Available Methods

Method                        Description
generate_image(prompt)        Create image from text
generate_video(prompt)        Create video from text
generate_audio(text)          Create speech audio
describe_image(path)          Get image caption
describe_video(path)          Get video description
speech_to_text(audio_path)    Transcribe audio
health_check()                Check BotModels server status

Response Structures

GenerationResponse

{
    "status": "success",
    "file_path": "/path/to/generated/file.png",
    "generation_time": 12.5,
    "error": null
}
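A caller typically needs to turn this response into either a file path or an error. A sketch in Rust, with field names taken from the JSON above and two assumptions: that a successful call reports `status` as `"success"`, as in the example, and that `file_path` may be missing when generation fails. `GenerationResponse` here is an illustrative struct, not necessarily the type the BotModels client uses; JSON deserialization is omitted to keep the example std-only.

```rust
/// Illustrative mirror of the GenerationResponse JSON shown above.
#[derive(Debug, Clone, PartialEq)]
struct GenerationResponse {
    status: String,
    file_path: Option<String>, // assumed to be absent when generation fails
    generation_time: f64,
    error: Option<String>,
}

impl GenerationResponse {
    /// Returns the generated file's path on success, or an error message otherwise.
    fn into_result(self) -> Result<String, String> {
        let GenerationResponse { status, file_path, error, .. } = self;
        match (status.as_str(), file_path, error) {
            // A non-null error wins regardless of the status string.
            (_, _, Some(err)) => Err(err),
            ("success", Some(path), None) => Ok(path),
            ("success", None, None) => {
                Err("status is success but file_path is missing".to_string())
            }
            (other, _, None) => Err(format!("generation failed with status {other:?}")),
        }
    }
}
```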

DescribeResponse

{
    "description": "A golden retriever playing fetch in a park",
    "confidence": 0.92
}
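The `confidence` field lets a caller discard weak captions. A minimal sketch, assuming `confidence` is a score between 0 and 1 as the example value suggests; the threshold is the caller's choice, since none is documented, and `DescribeResponse` and `accept_description` are illustrative names.

```rust
/// Illustrative mirror of the DescribeResponse JSON shown above.
#[derive(Debug, Clone, PartialEq)]
struct DescribeResponse {
    description: String,
    confidence: f64, // assumed to be a score in [0, 1]
}

/// Returns the caption if its confidence meets `min_confidence`, otherwise None.
fn accept_description(resp: &DescribeResponse, min_confidence: f64) -> Option<&str> {
    if resp.confidence >= min_confidence {
        Some(resp.description.as_str())
    } else {
        None
    }
}
```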

Requirements

  • BotModels server running (separate service)
  • GPU recommended for generation tasks
  • Sufficient disk space for generated media

See Also