When cost is a constraint, engineering creativity becomes the skill.

Free Tier Limits (as of April 2026)

RPM: 15 requests per minute

RPD: 1,500 requests per day

Context window: 1M tokens

Output tokens: 8,192 per request

Our Design Choices

Structured output with strict JSON schema — We instruct Gemini to return only a flat JSON object rather than free text. This avoids markdown wrapping and reduces token usage per call.

System instructions, not prompt repetition — The mode-specific behavior (refine / expand / simplify) is encoded into the system instruction, not repeated in every user prompt. This cuts ~30% of input tokens.

Client-side debouncing — The UI uses useTransition to prevent multiple rapid requests, protecting our RPM budget.

The Upgrade Path

When budget allows, we'll move to gemini-2.0-flash for lower latency. Eventually, when effectiveness beats cost, gemini-2.5-pro for complex reasoning tasks.

Gemini 1.5 Flash on the Free Tier: What You Get

Free Tier Limits (as of April 2026)

Our Design Choices

The Upgrade Path