
Multimodal Overview

Lunos supports multimodal requests for models that can process non-text inputs. You can combine text with images, PDFs, audio, or video in one request and send it through the same chat-style API flow.

This page covers multimodal inputs. To have a model generate a new image as output, use the dedicated image generation endpoint instead.

When to use multimodal

  • Analyze product photos and screenshots
  • Read long documents and scanned files
  • Transcribe or analyze speech recordings
  • Understand video scenes, events, and timelines

Main endpoint pattern

Most multimodal requests use:

POST /v1/chat/completions

The request body contains a messages array, and each message's content can itself be an array of content blocks, letting you mix text with files and media in a single turn.

Typical content block types

  • text
  • image_url
  • file (for PDFs)
  • input_audio
  • video_url

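The audio and video blocks are not shown in the example below, so here is a sketch of their shape. The field names (`input_audio.data`, `input_audio.format`, `video_url.url`) follow the common OpenAI-style convention and are assumptions to verify against the Lunos API reference:

```python
# Hypothetical content-block shapes for audio and video input.
# Field names follow the common OpenAI-style convention; confirm
# the exact schema against the Lunos API reference.

audio_block = {
    "type": "input_audio",
    "input_audio": {
        "data": "<base64-encoded audio bytes>",  # raw file contents, base64-encoded
        "format": "mp3",                         # e.g. "mp3" or "wav"
    },
}

video_block = {
    "type": "video_url",
    "video_url": {"url": "https://example.com/clip.mp4"},
}
```

Either block can be placed alongside a text block in the same content array.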
Input multimodal vs generation APIs

  • Use POST /v1/chat/completions for understanding existing files (image/PDF/audio/video input).
  • Use POST /v1/images/generations when you want the model to create a new image.

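A generation request might look like the sketch below. The model ID and the body fields (`prompt`, `n`, `size`) are assumptions borrowed from common image APIs, not confirmed Lunos parameters:

```python
import requests

def generate_image(prompt, api_key):
    """Request a new image from the generations endpoint.

    Sketch only: the model ID and body fields ("prompt", "n", "size")
    follow common image-API conventions; check the Lunos API reference
    for the exact schema.
    """
    response = requests.post(
        "https://api.lunos.tech/v1/images/generations",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json={
            "model": "openai/dall-e-3",  # hypothetical model ID
            "prompt": prompt,
            "n": 1,
            "size": "1024x1024",
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```
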
Basic mixed-input example

cURL

curl -X POST "https://api.lunos.tech/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Summarize the key information from this file and image." },
          { "type": "file", "file": { "url": "https://example.com/report.pdf" } },
          { "type": "image_url", "image_url": { "url": "https://example.com/diagram.png" } }
        ]
      }
    ]
  }'

Python

import requests

url = "https://api.lunos.tech/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_SECRET_KEY",
    "Content-Type": "application/json",
}
payload = {
    "model": "google/gemini-2.5-flash",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the key information from this file and image."},
                {"type": "file", "file": {"url": "https://example.com/report.pdf"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png"}},
            ],
        }
    ],
}
response = requests.post(url, headers=headers, json=payload)
print(response.json())

JavaScript

const response = await fetch("https://api.lunos.tech/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_SECRET_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "google/gemini-2.5-flash",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Summarize the key information from this file and image." },
          { type: "file", file: { url: "https://example.com/report.pdf" } },
          { type: "image_url", image_url: { url: "https://example.com/diagram.png" } },
        ],
      },
    ],
  }),
});
const data = await response.json();
console.log(data);

Model compatibility

Not every model supports every modality. Before sending multimodal data:

  1. Call GET /v1/models
  2. Check inputModalities on your selected model
  3. Send only supported content types

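The steps above can be sketched in Python. The response shape (a data array of models, each carrying an inputModalities list) is an assumption to confirm against the actual /v1/models response:

```python
import requests

def pick_model(required, api_key):
    """Return the ID of the first model whose inputModalities cover `required`.

    Assumes GET /v1/models returns {"data": [{"id": ..., "inputModalities": [...]}]};
    confirm the exact response shape against the Lunos models endpoint.
    """
    resp = requests.get(
        "https://api.lunos.tech/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    for model in resp.json().get("data", []):
        if required.issubset(set(model.get("inputModalities", []))):
            return model["id"]
    return None  # no model supports all required modalities
```

For example, `pick_model({"text", "image", "audio"}, api_key)` would return a model able to accept all three, or None so your code can fail fast instead of sending an unsupported request.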
URL vs base64

  • Prefer URL inputs for public files and large assets
  • Use base64 when files are local or private
  • Keep payloads small when using base64 to avoid request failures

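For the base64 path, a small helper can inline a local file as a data URL, a common convention for image_url blocks (confirm that Lunos accepts data URLs before relying on it):

```python
import base64
from pathlib import Path

def to_data_url(path, mime):
    """Encode a local file as a base64 data URL for inline transfer.

    Remember that the encoded bytes travel inside the JSON payload,
    so large files can exceed request size limits.
    """
    b64 = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{b64}"

# Usage: {"type": "image_url", "image_url": {"url": to_data_url("diagram.png", "image/png")}}
```
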
Lunos best practices

  • Validate file type and size in your backend before forwarding to Lunos
  • Route dynamically by capability (inputModalities) instead of hardcoding one model
  • Add retries and fallbacks for provider-specific limits
  • Always include a text instruction in the same content array so the model knows what to do with the attached media

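A minimal pre-flight check in your backend might look like this sketch; the allowed types and the 10 MB cap are illustrative values, not Lunos limits:

```python
# Illustrative pre-flight validation before forwarding a user upload.
ALLOWED_TYPES = {"image/png", "image/jpeg", "application/pdf"}
MAX_BYTES = 10 * 1024 * 1024  # 10 MB, an assumed cap

def validate_upload(content_type, size_bytes):
    """Reject uploads that would fail or waste a round trip downstream."""
    if content_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported content type: {content_type}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"file too large: {size_bytes} bytes")
```
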