When cost is a constraint, engineering creativity becomes the skill.
Free Tier Limits (as of April 2026)
Our Design Choices
Structured output with a strict JSON schema: We instruct Gemini to return only a flat JSON object rather than free text. This avoids markdown wrapping and reduces token usage per call.
System instructions, not prompt repetition: The mode-specific behavior (refine / expand / simplify) is encoded in the system instruction rather than repeated in every user prompt. This cuts input tokens by roughly 30%.
Client-side debouncing: The UI uses React's useTransition to prevent multiple rapid requests, protecting our requests-per-minute (RPM) budget.
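The three choices above can be sketched in TypeScript as follows. This is a minimal illustration, not the project's actual code. The names `Mode`, `SYSTEM_INSTRUCTIONS`, `buildRequest`, and `RpmLimiter`, the instruction wording, and the single `result` field in the schema are all hypothetical. The request body follows the shape of the Gemini REST `generateContent` call as we understand it (`systemInstruction`, `generationConfig.responseMimeType`, `generationConfig.responseSchema`), so check the field names against the current API reference. The limiter is a plain sliding-window guard, a different technique from the useTransition approach, shown only to make the RPM budget concrete; its limit is a parameter because the real quota depends on the tier.

```typescript
// Sketch only: every name and string below is a hypothetical placeholder.
type Mode = "refine" | "expand" | "simplify";

// Mode-specific behavior is stated once here, not repeated in each user prompt.
const SYSTEM_INSTRUCTIONS: Record<Mode, string> = {
  refine: "Improve clarity and grammar without changing the meaning.",
  expand: "Add detail and supporting context to the text.",
  simplify: "Rewrite the text in plain, simple language.",
};

// Builds a request body that pairs the system instruction with a flat JSON schema,
// so the model is asked for one JSON object instead of free text.
function buildRequest(mode: Mode, text: string) {
  return {
    systemInstruction: { parts: [{ text: SYSTEM_INSTRUCTIONS[mode] }] },
    contents: [{ role: "user", parts: [{ text }] }],
    generationConfig: {
      responseMimeType: "application/json",
      responseSchema: {
        type: "OBJECT",
        properties: { result: { type: "STRING" } }, // assumed single output field
        required: ["result"],
      },
    },
  };
}

// Sliding-window guard: allows at most maxPerMinute calls in any 60-second window.
class RpmLimiter {
  private sent: number[] = [];
  private readonly maxPerMinute: number;

  constructor(maxPerMinute: number) {
    this.maxPerMinute = maxPerMinute;
  }

  // Returns true and records the call if budget remains, otherwise returns false.
  tryAcquire(now: number = Date.now()): boolean {
    this.sent = this.sent.filter((t) => now - t < 60000);
    if (this.sent.length >= this.maxPerMinute) return false;
    this.sent.push(now);
    return true;
  }
}
```

A caller would check `limiter.tryAcquire()` before sending `buildRequest(mode, text)` and skip or queue the call when it returns false.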
The Upgrade Path
When budget allows, we'll move to gemini-2.0-flash for lower latency. Eventually, when effectiveness outweighs cost, we'll adopt gemini-2.5-pro for complex reasoning tasks.