Back to Chronicle
🤖AI10 April 2026

Gemini 1.5 Flash on the Free Tier: What You Get

We integrated Google's Gemini 1.5 Flash into Labs. Here's a breakdown of the free tier constraints and how we designed around them.

When cost is a constraint, engineering creativity becomes the skill.

Free Tier Limits (as of April 2026)

  • RPM: 15 requests per minute
  • RPD: 1,500 requests per day
  • Context window: 1M tokens
  • Output tokens: 8,192 per request
  • Our Design Choices

    Structured output with strict JSON schema — We instruct Gemini to return only a flat JSON object rather than free text. This avoids markdown wrapping and reduces token usage per call.

    System instructions, not prompt repetition — The mode-specific behavior (refine / expand / simplify) is encoded into the system instruction, not repeated in every user prompt. This cuts ~30% of input tokens.

    Client-side debouncing — The UI uses useTransition to prevent multiple rapid requests, protecting our RPM budget.

    The Upgrade Path

    When budget allows, we'll move to gemini-2.0-flash for lower latency. Eventually, when effectiveness beats cost, gemini-2.5-pro for complex reasoning tasks.

    #gemini#AI#cost-strategy