Web Search

Web Search (server tool) lets a model access real-time information from the web. When the model decides it needs current data, it requests a web search and then uses the returned results to craft a grounded answer (with citations when available).

Beta: Server tools and their parameters may change.

How it works

  1. Your request includes a tool entry:
    • tools: [{ "type": "web_search" }]
  2. The model decides whether it needs a web search, and generates a search query.
  3. Lunos forwards the request to the router, which executes the search using the configured engine.
  4. The search results (URLs/titles/snippets) are returned to the model.
  5. The model synthesizes the results into the final response. It may repeat searches within the same request when needed.

Quick start

cURL:

curl -X POST "https://api.lunos.tech/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      { "role": "user", "content": "What were the major AI announcements this week?" }
    ],
    "tools": [
      { "type": "web_search" }
    ]
  }'

Python:

import requests

url = "https://api.lunos.tech/v1/chat/completions"
headers = {
  "Authorization": "Bearer YOUR_API_KEY",
  "Content-Type": "application/json",
}
payload = {
  "model": "openai/gpt-4o",
  "messages": [
    { "role": "user", "content": "What were the major AI announcements this week?" }
  ],
  "tools": [
    { "type": "web_search" }
  ],
}
response = requests.post(url, headers=headers, json=payload)
print(response.json())

JavaScript:

const response = await fetch("https://api.lunos.tech/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "openai/gpt-4o",
    messages: [
      { role: "user", content: "What were the major AI announcements this week?" },
    ],
    tools: [{ type: "web_search" }],
  }),
});

const data = await response.json();
console.log(data);
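
The quick-start requests print the whole response body. To pull out only the answer text, something like the following Python sketch may help. It assumes the response follows the OpenAI-style chat completions shape (choices, then message, then content), which this page does not confirm; extract_answer is a hypothetical helper name.

```python
from typing import Any, Optional


def extract_answer(data: dict[str, Any]) -> Optional[str]:
    """Return the first choice's message text, or None if the shape differs.

    Assumption: the body looks like {"choices": [{"message": {"content": ...}}]}.
    """
    choices = data.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    return message.get("content")
```

Returning None instead of raising keeps the caller in control when the body is an error object or has an unexpected shape.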

Configuration

You can customize web search behavior by adding an optional parameters object to the tool entry, alongside its type field.

Example:

{
  "tools": [
    {
      "type": "web_search",
      "parameters": {
        "engine": "exa",
        "max_results": 5,
        "max_total_results": 20,
        "search_context_size": "medium",
        "allowed_domains": ["example.com"],
        "excluded_domains": ["reddit.com"]
      }
    }
  ]
}

Common parameters:

  • engine (default auto): one of auto, native, exa, firecrawl, or parallel
  • max_results (default 5): maximum results per search call (the accepted range is engine-dependent)
  • max_total_results: caps the total results across all searches in one request
  • search_context_size (default medium): controls how much snippet context is retrieved (engine-dependent)
  • allowed_domains: restrict results to specific domains (engine-dependent)
  • excluded_domains: exclude results from specific domains (engine-dependent)
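
When the tool entry is built in code, it can be convenient to include only the options that were actually set, so engine defaults apply to everything else. The sketch below is one way to do that in Python. build_web_search_tool and KNOWN_ENGINES are hypothetical names introduced here; the parameter names and engine values are taken from the list above, and the check on engine is a client-side convenience, not a description of how Lunos validates requests.

```python
from typing import Any, Optional

# Engine names as listed in this section.
KNOWN_ENGINES = {"auto", "native", "exa", "firecrawl", "parallel"}


def build_web_search_tool(
    engine: Optional[str] = None,
    max_results: Optional[int] = None,
    max_total_results: Optional[int] = None,
    search_context_size: Optional[str] = None,
    allowed_domains: Optional[list[str]] = None,
    excluded_domains: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build a web_search tool entry, omitting unset options."""
    if engine is not None and engine not in KNOWN_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")

    candidates = {
        "engine": engine,
        "max_results": max_results,
        "max_total_results": max_total_results,
        "search_context_size": search_context_size,
        "allowed_domains": allowed_domains,
        "excluded_domains": excluded_domains,
    }
    # Keep only the options the caller set explicitly.
    parameters = {key: value for key, value in candidates.items() if value is not None}

    tool: dict[str, Any] = {"type": "web_search"}
    if parameters:
        tool["parameters"] = parameters
    return tool
```

The result can be placed directly in the tools array of a request, for example "tools": [build_web_search_tool(engine="exa", max_results=5)].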

User location (optional)

If supported by your engine, you can pass an approximate user location to bias results geographically:

{
  "tools": [
    {
      "type": "web_search",
      "parameters": {
        "user_location": {
          "type": "approximate",
          "city": "San Francisco",
          "region": "California",
          "country": "US",
          "timezone": "America/Los_Angeles"
        }
      }
    }
  ]
}

Engine selection

Different engines may support different capabilities (like domain filtering or context sizing). Set engine to one of:

  • auto (default): pick a native provider when available, otherwise fall back to an Exa-style engine
  • native: force provider-native web search where available
  • exa: use Exa search (keyword + embeddings-style)
  • firecrawl: use Firecrawl search (BYOK — bring your own key)
  • parallel: use Parallel search

Domain filtering

To restrict which domains search results may come from, set allowed_domains and/or excluded_domains on the tool:

{
  "type": "web_search",
  "parameters": {
    "allowed_domains": ["arxiv.org", "nature.com"],
    "excluded_domains": ["reddit.com"]
  }
}

Domain filtering behavior depends on the engine. If an engine cannot apply these filters, Lunos or the router may either reject the request or ignore the unsupported fields.
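
Because an engine may silently ignore filters it cannot apply, you may want a client-side check on any URLs you extract from the response. The Python sketch below is one possible check, not how Lunos or any engine applies filters. url_permitted is a hypothetical helper, and treating subdomains (such as www.reddit.com) as matching their parent domain is an assumption made here.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse


def _matches(host: str, domain: str) -> bool:
    """True if host equals domain or is a subdomain of it (assumed semantics)."""
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def url_permitted(
    url: str,
    allowed: Optional[Iterable[str]] = None,
    excluded: Optional[Iterable[str]] = None,
) -> bool:
    """Hypothetical client-side check mirroring allowed_domains/excluded_domains."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False  # not an absolute URL with a host
    if excluded and any(_matches(host, d) for d in excluded):
        return False
    if allowed:
        return any(_matches(host, d) for d in allowed)
    return True
```

Exclusions are checked first, so a URL on an excluded domain is rejected even if it would also match an allowed entry.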

Controlling total results

When the model performs multiple searches inside one request, use max_total_results to limit the cumulative number of results:

{
  "type": "web_search",
  "parameters": {
    "max_results": 5,
    "max_total_results": 15
  }
}

Once the cap is reached, further searches in the same request should stop, and the model is informed that the limit has been reached.
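
To see how the two limits interact, the arithmetic can be simulated locally. The Python sketch below assumes every search returns a full max_results batch and that the last batch is trimmed to fit under the cap; this page does not say whether the router trims a batch or simply stops, so treat the trimming as an assumption. simulate_result_cap is a hypothetical name.

```python
def simulate_result_cap(
    max_results: int, max_total_results: int, searches_requested: int
) -> list[int]:
    """Return how many results each search would contribute under the cap.

    Assumptions: each search yields max_results items, and the final batch
    is trimmed so the running total never exceeds max_total_results.
    """
    per_search: list[int] = []
    remaining = max_total_results
    for _ in range(searches_requested):
        if remaining <= 0:
            break  # cap reached: no further searches run
        take = min(max_results, remaining)
        per_search.append(take)
        remaining -= take
    return per_search
```

With the values from the example above (max_results 5, max_total_results 15), a model that tries five searches would get results from only the first three under these assumptions.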

Error handling

You may see errors in these cases:

  1. The selected model doesn’t allow tool-based web search (request rejected by Lunos)
  2. The tool type or parameters are invalid (request rejected by the router)
  3. Your selected engine rejects filters it cannot apply

If that happens, switch to a compatible model and/or adjust the tool parameters.
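
One way to "adjust the tool parameters" automatically is to retry once without domain filters when a request is rejected. The Python sketch below illustrates the idea under several assumptions: that rejections surface as HTTP 4xx status codes (this page does not specify status codes or error bodies), and that dropping the filters is acceptable for your use case. without_domain_filters and send_with_fallback are hypothetical helpers, and send is any callable you supply that posts a payload and returns (status_code, body).

```python
import copy
from typing import Any, Callable

FILTER_KEYS = ("allowed_domains", "excluded_domains")


def without_domain_filters(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a deep copy of payload with domain filters removed from web_search tools."""
    cleaned = copy.deepcopy(payload)
    for tool in cleaned.get("tools", []):
        if tool.get("type") == "web_search":
            params = tool.get("parameters", {})
            for key in FILTER_KEYS:
                params.pop(key, None)
    return cleaned


def send_with_fallback(
    send: Callable[[dict[str, Any]], tuple[int, dict[str, Any]]],
    payload: dict[str, Any],
) -> tuple[int, dict[str, Any]]:
    """Send payload; on a 4xx (assumed rejection), retry once without domain filters."""
    status, body = send(payload)
    if 400 <= status < 500:
        retry_payload = without_domain_filters(payload)
        # Retry only if removing filters actually changed the request.
        if retry_payload != payload:
            status, body = send(retry_payload)
    return status, body
```

Note that the retry returns unfiltered results, so a caller that depends on the filters should check which payload succeeded or filter the results itself.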

Usage tracking (best-effort)

If the router returns them, the usage object in the response may include extra fields for server tool calls (for example, the number of web search requests). Treat these fields as optional.
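
Since the extra fields are best-effort and their names are not listed here, a defensive way to inspect them is to report whatever appears in usage beyond the usual token counts. The Python sketch below assumes the standard OpenAI-style keys prompt_tokens, completion_tokens, and total_tokens; extra_usage_fields is a hypothetical helper and no specific extra field name is assumed.

```python
from typing import Any

# Token-count keys assumed to follow the OpenAI-style usage object.
STANDARD_USAGE_KEYS = {"prompt_tokens", "completion_tokens", "total_tokens"}


def extra_usage_fields(response_body: dict[str, Any]) -> dict[str, Any]:
    """Return usage entries other than the standard token counts (may be empty)."""
    usage = response_body.get("usage") or {}
    return {key: value for key, value in usage.items() if key not in STANDARD_USAGE_KEYS}
```

An empty result means only that nothing extra was reported, not that no searches ran.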

Notes for Lunos

  • In Lunos requests, use type: "web_search".
  • Lunos translates that tool type to the equivalent router tool internally.
  • Tool usage can add additional cost beyond standard LLM tokens. Validate pricing for your chosen engine in your account/dashboard.
