
Multimodal Overview

Lunos supports multimodal requests for models that can process non-text inputs. You can combine text with images, PDFs, audio, or video in one request and send it through the same chat-style API flow.

This page covers multimodal inputs. To have a model generate a new image as output, use the dedicated image generation endpoint instead.

When to use multimodal

  • Analyze product photos and screenshots
  • Read long documents and scanned files
  • Transcribe or analyze speech recordings
  • Understand video scenes, events, and timelines

Main endpoint pattern

Most multimodal requests use:

POST /v1/chat/completions

The request body contains a messages array, and each message's content can itself be an array of content blocks, letting you mix text with files and media in a single turn.

Typical content block types

  • text
  • image_url
  • file (for PDFs)
  • input_audio
  • video_url

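The audio and video blocks are not shown in the example below, so here is a sketch of their shape. The field names (`input_audio.data`, `input_audio.format`, `video_url.url`) follow the common OpenAI-style convention and are assumptions to verify against the Lunos API reference:

```python
# Hypothetical content-block shapes for audio and video input.
# Field names follow the common OpenAI-style convention; confirm
# the exact schema against the Lunos API reference.

audio_block = {
    "type": "input_audio",
    "input_audio": {
        "data": "<base64-encoded audio bytes>",  # raw file contents, base64-encoded
        "format": "mp3",                         # e.g. "mp3" or "wav"
    },
}

video_block = {
    "type": "video_url",
    "video_url": {"url": "https://example.com/clip.mp4"},
}
```

Either block can be placed alongside a text block in the same content array.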
Input multimodal vs generation APIs

  • Use POST /v1/chat/completions for understanding existing files (image/PDF/audio/video input).
  • Use POST /v1/images/generations when you want the model to create a new image.

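A generation request might look like the sketch below. The model ID and the body fields (`prompt`, `n`, `size`) are assumptions borrowed from common image APIs, not confirmed Lunos parameters:

```python
import requests

def generate_image(prompt, api_key):
    """Request a new image from the generations endpoint.

    Sketch only: the model ID and body fields ("prompt", "n", "size")
    follow common image-API conventions; check the Lunos API reference
    for the exact schema.
    """
    response = requests.post(
        "https://api.lunos.tech/v1/images/generations",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json={
            "model": "openai/dall-e-3",  # hypothetical model ID
            "prompt": prompt,
            "n": 1,
            "size": "1024x1024",
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```
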
Basic mixed-input example

cURL

curl -X POST "https://api.lunos.tech/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Summarize the key information from this file and image." },
          { "type": "file", "file": { "url": "https://example.com/report.pdf" } },
          { "type": "image_url", "image_url": { "url": "https://example.com/diagram.png" } }
        ]
      }
    ]
  }'

Python

import requests

url = "https://api.lunos.tech/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_SECRET_KEY",
    "Content-Type": "application/json",
}
payload = {
    "model": "google/gemini-2.5-flash",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the key information from this file and image."},
                {"type": "file", "file": {"url": "https://example.com/report.pdf"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png"}},
            ],
        }
    ],
}
response = requests.post(url, headers=headers, json=payload)
print(response.json())

JavaScript

const response = await fetch("https://api.lunos.tech/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_SECRET_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "google/gemini-2.5-flash",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Summarize the key information from this file and image." },
          { type: "file", file: { url: "https://example.com/report.pdf" } },
          { type: "image_url", image_url: { url: "https://example.com/diagram.png" } },
        ],
      },
    ],
  }),
});
const data = await response.json();
console.log(data);

Model compatibility

Not every model supports every modality. Before sending multimodal data:

  1. Call GET /v1/models
  2. Check inputModalities on your selected model
  3. Send only supported content types

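The steps above can be sketched in Python. The response shape (a data array of models, each carrying an inputModalities list) is an assumption to confirm against the actual /v1/models response:

```python
import requests

def pick_model(required, api_key):
    """Return the ID of the first model whose inputModalities cover `required`.

    Assumes GET /v1/models returns {"data": [{"id": ..., "inputModalities": [...]}]};
    confirm the exact response shape against the Lunos models endpoint.
    """
    resp = requests.get(
        "https://api.lunos.tech/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    for model in resp.json().get("data", []):
        if required.issubset(set(model.get("inputModalities", []))):
            return model["id"]
    return None  # no model supports all required modalities
```

For example, `pick_model({"text", "image", "audio"}, api_key)` would return a model able to accept all three, or None so your code can fail fast instead of sending an unsupported request.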
URL vs base64

  • Prefer URL inputs for public files and large assets
  • Use base64 when files are local or private
  • Keep payloads small when using base64 to avoid request failures

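For the base64 path, a small helper can inline a local file as a data URL, a common convention for image_url blocks (confirm that Lunos accepts data URLs before relying on it):

```python
import base64
from pathlib import Path

def to_data_url(path, mime):
    """Encode a local file as a base64 data URL for inline transfer.

    Remember that the encoded bytes travel inside the JSON payload,
    so large files can exceed request size limits.
    """
    b64 = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{b64}"

# Usage: {"type": "image_url", "image_url": {"url": to_data_url("diagram.png", "image/png")}}
```
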
Lunos best practices

  • Validate file type and size in your backend before forwarding to Lunos
  • Route dynamically by capability (inputModalities) instead of hardcoding one model
  • Add retries and fallbacks for provider-specific limits
  • Always include a text instruction in the same content array so the model knows what to do with the attached media

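A minimal pre-flight check in your backend might look like this sketch; the allowed types and the 10 MB cap are illustrative values, not Lunos limits:

```python
# Illustrative pre-flight validation before forwarding a user upload.
ALLOWED_TYPES = {"image/png", "image/jpeg", "application/pdf"}
MAX_BYTES = 10 * 1024 * 1024  # 10 MB, an assumed cap

def validate_upload(content_type, size_bytes):
    """Reject uploads that would fail or waste a round trip downstream."""
    if content_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported content type: {content_type}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"file too large: {size_bytes} bytes")
```
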