
Free daily requests

Select models include a daily bucket of free tokens that you can spend before normal metered billing applies. The allowance resets on a fixed cadence (typically midnight UTC; check in-product in case the schedule has changed).

Note: Only models that show a Free label in the Models table participate. Limits and eligibility can change.
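To plan work around the reset, a client can estimate how long remains until the next cycle. The helper below is a minimal sketch that assumes the reset happens at midnight UTC, which the text above describes only as typical. The function name `msUntilNextUtcMidnight` is hypothetical and not part of any Lunos SDK.

```typescript
// Sketch only: assumes the free bucket resets at 00:00 UTC.
// Confirm the actual cadence in-product before relying on this.
function msUntilNextUtcMidnight(now: Date): number {
  // Date.UTC normalizes day overflow, so "day + 1" rolls over months and years.
  const nextMidnight = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1,
  );
  return nextMidnight - now.getTime();
}

// Usage: hours left in the current cycle, under the midnight-UTC assumption.
const hoursLeft = msUntilNextUtcMidnight(new Date()) / 3_600_000;
console.log(`About ${hoursLeft.toFixed(1)} h until the assumed reset`);
```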

How it works

  • Daily reset: Unused gifted tokens generally do not roll over; a fresh budget appears each cycle.
  • Metering: Both prompt and completion tokens usually count toward the gift bucket.
  • Automatic fallback: When the gift bucket is exhausted, the same requests keep working but are billed at the standard token rate (assuming your account can pay).
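The metering rules above can be pictured with a small local model. This is an illustrative sketch, not Lunos billing code: the type and function names are hypothetical, and it assumes that a request straddling the end of the bucket is split between gifted and billed tokens, which the documentation does not specify.

```typescript
// Hypothetical model of the daily gift bucket, for reasoning about costs only.
interface RequestUsage {
  promptTokens: number;
  completionTokens: number;
}

interface Charge {
  giftedTokens: number;  // tokens drawn from the daily gift bucket
  billedTokens: number;  // tokens billed at the standard rate
  remainingGift: number; // gift tokens left after this request
}

function applyRequest(remainingGift: number, usage: RequestUsage): Charge {
  // Both prompt and completion tokens count toward the bucket.
  const total = usage.promptTokens + usage.completionTokens;
  // Assumption: a request that crosses the boundary is split, not billed whole.
  const giftedTokens = Math.min(remainingGift, total);
  return {
    giftedTokens,
    billedTokens: total - giftedTokens,
    remainingGift: remainingGift - giftedTokens,
  };
}

// Usage: a 500-token request against a bucket with only 100 tokens left.
console.log(applyRequest(100, { promptTokens: 300, completionTokens: 200 }));
```

Under these assumptions the example request draws 100 tokens from the bucket and bills the other 400.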

Plan access: Eligibility depends on model. Some free-daily models are Premium only, while others are available to all users.

What is a Premium user?

A Premium user is one whose workspace is marked as Premium by Lunos billing logic (not by a separate subscription object). In practice, top-ups start at a minimum of $5, and Premium-gated free-daily models are available to requests made with API keys from a Premium workspace.

Eligible models

These models participate in free daily requests. Max tokens is the maximum context window supported for each model on Lunos (prompt + completion within that limit).

Model id                            Max tokens   Plan
nvidia/nemotron-3-super-120b-a12b   54,000       Premium only
openai/gpt-oss-120b                 54,000       Premium only
z-ai/glm-4.5-air                    32,000       Premium only
openai/gpt-oss-20b                  54,000       All users
google/gemma-4-31b-it               64,000       All users
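Because prompt and completion share the same window, it can help to check a request against the table before sending it. The sketch below copies the limits and plans listed above into a local lookup. The names `FREE_DAILY_MODELS` and `completionBudget` are hypothetical, and the hard-coded values would need updating whenever the table changes.

```typescript
type Plan = "premium" | "all";

interface FreeDailyModel {
  maxTokens: number; // context window: prompt + completion
  plan: Plan;
}

// Snapshot of the table above; not fetched from Lunos.
const FREE_DAILY_MODELS: Record<string, FreeDailyModel> = {
  "nvidia/nemotron-3-super-120b-a12b": { maxTokens: 54000, plan: "premium" },
  "openai/gpt-oss-120b": { maxTokens: 54000, plan: "premium" },
  "z-ai/glm-4.5-air": { maxTokens: 32000, plan: "premium" },
  "openai/gpt-oss-20b": { maxTokens: 54000, plan: "all" },
  "google/gemma-4-31b-it": { maxTokens: 64000, plan: "all" },
};

// Returns how many completion tokens still fit after the prompt.
function completionBudget(
  modelId: string,
  promptTokens: number,
  isPremium: boolean,
): number {
  const model = FREE_DAILY_MODELS[modelId];
  if (model === undefined) {
    throw new Error(`Not a free-daily model: ${modelId}`);
  }
  if (model.plan === "premium" && !isPremium) {
    throw new Error(`${modelId} requires a Premium workspace`);
  }
  return Math.max(0, model.maxTokens - promptTokens);
}

// Usage: room left for the completion after a 2,000-token prompt.
console.log(completionBudget("z-ai/glm-4.5-air", 2000, true));
```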

Daily gift amounts and any finer limits are shown on /models.

Making requests

Use the model id exactly as documented:

// Assumes `client` is the SDK client already configured in the Quickstart.
const completion = await client.chat.completions.create({
  model: "openai/gpt-oss-120b",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain free daily requests on Lunos." },
  ],
});

Track usage

Open Usage statistics to see gifted vs paid consumption per key.
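Alongside the dashboard, a client can keep its own running tally. The sketch below assumes responses carry an OpenAI-style `usage` object (`prompt_tokens`, `completion_tokens`, `total_tokens`), which the `client.chat.completions.create` call shape suggests but this page does not confirm. The response does not say whether tokens were gifted or paid, so the tally is only an estimate to compare against Usage statistics. `sumUsage` is a hypothetical helper.

```typescript
// Assumed OpenAI-compatible usage shape; verify against real responses.
interface TokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

interface CompletionLike {
  usage?: TokenUsage | null;
}

// Adds up token counts across responses, skipping any without usage data.
function sumUsage(responses: CompletionLike[]): TokenUsage {
  const totals: TokenUsage = {
    prompt_tokens: 0,
    completion_tokens: 0,
    total_tokens: 0,
  };
  for (const response of responses) {
    if (!response.usage) continue;
    totals.prompt_tokens += response.usage.prompt_tokens;
    totals.completion_tokens += response.usage.completion_tokens;
    totals.total_tokens += response.usage.total_tokens;
  }
  return totals;
}

// Usage: pass the completions collected during a session.
console.log(
  sumUsage([
    { usage: { prompt_tokens: 20, completion_tokens: 30, total_tokens: 50 } },
    { usage: null },
  ]),
);
```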

Best practices

  • Set explicit max_tokens caps in dev loops.
  • Batch related questions into fewer turns when possible.
  • Prototype on gifted models, then load-test with paid quotas before launch.
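Batching works because the system prompt and other per-request overhead are sent once rather than once per question. A minimal sketch of the idea follows. The helper name `batchQuestions` and the instruction wording are illustrative assumptions, not a Lunos convention.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Folds several related questions into one user turn.
function batchQuestions(systemPrompt: string, questions: string[]): ChatMessage[] {
  const numbered = questions
    .map((question, index) => `${index + 1}. ${question}`)
    .join("\n");
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: `Answer each question separately:\n${numbered}` },
  ];
}

// Usage: the result can be passed as `messages` to chat.completions.create.
const messages = batchQuestions("You are a helpful assistant.", [
  "When does the free bucket reset?",
  "Do prompt tokens count toward it?",
]);
console.log(messages);
```

Pair this with an explicit max_tokens cap, since a batched answer is longer than a single one.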

Example: cap output size

await client.chat.completions.create({
  model: "z-ai/glm-4.5-air",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Summarize free daily tokens in two sentences." },
  ],
  max_tokens: 100, // hard cap on completion tokens for this call
});

FAQ

Do unused tokens roll over?
Generally no. Assume the budget resets each cycle.

Can I ship production traffic on gifted tokens only?
You can, but add a payment method so that traffic continues at the paid rate instead of stopping if the bucket runs out or a model’s policy changes.

How do I know I left the free bucket?
Check the usage dashboards. Requests made after the bucket is exhausted still succeed without error, but they appear as paid usage.

Next: Quickstart to wire your first call.