Gemini Image Generation Quota Exhaustion
Summary
The AI image generation feature (POST /ai/generate-image-gemini) fails with HTTP 500 when the Google Gemini API returns 429 Too Many Requests. The web-app Next.js route catches the Gemini SDK error but returns it to users as a 500 instead of a meaningful status code and message. The logs show 119 errors in the last 2 days (91 rate-limit, 28 empty-response), with bursts of 130–190 failures during early morning hours (05:00–07:00 UTC).
Log Evidence
All 361 error logs originate from the web-app pods. Only 4 ingress-level logs were captured for this endpoint.
429 Quota Exhaustion — Full Error
* Quota exceeded for metric:
generativelanguage.googleapis.com/generate_requests_per_model_per_day,
limit: 0
[{
"@type": "type.googleapis.com/google.rpc.QuotaFailure",
"violations": [{
"quotaMetric": "generativelanguage.googleapis.com/generate_requests_per_model_per_day",
"quotaId": "GenerateRequestsPerDayPerProjectPerModel"
}]
}]
The quota ID GenerateRequestsPerDayPerProjectPerModel is reported with limit: 0, meaning zero remaining requests for gemini-2.5-flash-image on this day. The daily per-model quota resets overnight, and morning traffic exhausts it within 1–2 hours.
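To tell a daily-quota failure apart from other 429s, the route could inspect the QuotaFailure detail shown above. The following is a minimal sketch, not the route's actual code: the helper names `extractQuotaViolations` and `isDailyQuotaExhausted` are hypothetical, and how the details array is obtained from the SDK error is left as an assumption. Only the field names (`@type`, `violations`, `quotaMetric`, `quotaId`) come from the logged payload.

```typescript
// One entry of the "violations" array, as seen in the logged payload.
interface QuotaViolation {
  quotaMetric: string;
  quotaId: string;
}

const QUOTA_FAILURE_TYPE = "type.googleapis.com/google.rpc.QuotaFailure";

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Hypothetical helper: collect every quota violation from an error-details array.
// Anything that does not match the logged shape is ignored rather than thrown on.
function extractQuotaViolations(details: unknown): QuotaViolation[] {
  if (!Array.isArray(details)) return [];
  const found: QuotaViolation[] = [];
  for (const detail of details) {
    if (!isRecord(detail) || detail["@type"] !== QUOTA_FAILURE_TYPE) continue;
    const violations = detail["violations"];
    if (!Array.isArray(violations)) continue;
    for (const violation of violations) {
      if (!isRecord(violation)) continue;
      const quotaMetric = violation["quotaMetric"];
      const quotaId = violation["quotaId"];
      if (typeof quotaMetric === "string" && typeof quotaId === "string") {
        found.push({ quotaMetric, quotaId });
      }
    }
  }
  return found;
}

// Hypothetical helper: true when any violation refers to a per-day quota.
// Matching on the "PerDay" substring is an assumption based on the one quotaId in the logs.
function isDailyQuotaExhausted(details: unknown): boolean {
  return extractQuotaViolations(details).some((v) => v.quotaId.includes("PerDay"));
}
```

A per-day violation is worth distinguishing because retrying it within the same day cannot succeed, whereas a shorter-window limit might clear on its own.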
429 Quota Exhaustion — Wrapped Error
Error generating image with Gemini: Error: [GoogleGenerativeAI Error]:
Error fetching from https://generativelanguage.googleapis.com/v1beta/
models/gemini-2.5-flash-image:generateContent: [429 Too Many Requests]
You exceeded your current quota, please check your plan and billing details.
Empty Response Error
Error generating image with Gemini: Error: No image generated
    at eo (.next/server/app/ai/generate-image-gemini/route.js:1:21934)
Five occurrences were logged within 25 seconds, which suggests a single user retrying rapidly after each failure.
Traffic Pattern
Hourly Error Breakdown
| Hour (UTC) | 429 Rate Limit | Empty Response | Total |
|---|---|---|---|
| Feb 17 00:00 | 0 | 5 | 5 |
| Feb 17 02:00 | 37 | 0 | 37 |
| Feb 17 04:00 | 52 | 0 | 52 |
| Feb 17 06:00 | 48 | 0 | 48 |
| Feb 17 08:00 | 0 | 2 | 2 |
| Feb 17 20:00 | 0 | 6 | 6 |
| Feb 18 06:00 | 25 | 0 | 25 |
| Feb 18 07:00 | 66 | 0 | 66 |
| Feb 18 12:00 | 0 | 8 | 8 |
| Feb 18 15:00 | 0 | 6 | 6 |
Rate-limit (429) errors cluster between 02:00–08:00 UTC when the daily quota is consumed. Empty-response errors occur throughout the day independently of quota state. The pattern repeats daily: Google resets the quota overnight, and early-morning traffic exhausts it within 1–2 hours.
Root Cause
The Next.js API route calls gemini-2.5-flash-image:generateContent via the @google/generative-ai SDK. There are two failure modes:
- Quota exhaustion (429): The Gemini API key is on a plan with a low request-per-day limit. The SDK throws an error with the 429 status. The route catches this and logs it but returns HTTP 500 to the caller.
- Empty response: The Gemini API accepts the request but returns no image data. The route throws Error: No image generated and returns HTTP 500.
Code Analysis
// web-app/app/ai/generate-image-gemini/route.ts
} catch (error) {
  console.error("Error generating image with Gemini:", error);
  return NextResponse.json(
    { error: error instanceof Error ? error.message : "Failed to generate image" },
    { status: 500 } // ← always 500, even for 429
  );
}
The route creates a new GoogleGenerativeAI client on every request (no singleton, no connection pooling). The catch-all error handler returns HTTP 500 for all failures, including 429 rate limits. There is no retry logic, no rate limiting, no error classification, and no circuit breaker.
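One way to avoid constructing a client on every request is a lazily initialized module-level instance. The sketch below shows the pattern only: `lazySingleton`, `FakeGeminiClient`, and `getClient` are hypothetical names, and the fake class stands in for the SDK client so the example runs without dependencies. In the real route the factory would presumably wrap `new GoogleGenerativeAI(apiKey)` instead.

```typescript
// Generic lazy singleton: the factory runs once, on first use, and the
// same instance is returned on every later call.
function lazySingleton<T>(factory: () => T): () => T {
  let instance: T | undefined;
  let created = false;
  return () => {
    if (!created) {
      instance = factory();
      created = true;
    }
    return instance as T;
  };
}

// Stand-in for the SDK client (assumption: the real client takes an API key).
class FakeGeminiClient {
  static constructed = 0;
  readonly apiKey: string;

  constructor(apiKey: string) {
    this.apiKey = apiKey;
    FakeGeminiClient.constructed += 1;
  }
}

// Declared at module scope so every request handled by this process shares it.
const getClient = lazySingleton(() => new FakeGeminiClient("placeholder-key"));
```

Each request handler would then call `getClient()` rather than constructing its own client. This only removes per-request construction cost; it does not change quota consumption.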
Impact
- User-facing: Users attempting to generate AI images get a 500 error. ~690 failed HTTP requests observed in traces over 2 days.
- No data loss or cascading failures. The feature is self-contained.
- Revenue impact: If image generation is part of a paid feature flow, users may abandon the flow.
Recommended Remediation
Immediate
- Return HTTP 429 (not 500) when the Gemini API returns a rate-limit error — parse the error message for "429" and pass retry-after information to the caller
- Return HTTP 503 with a user-friendly message when the API returns an empty response
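The two immediate fixes amount to classifying the caught error before choosing a status code. A minimal sketch follows, under stated assumptions: `upstreamStatus` and `classifyGeminiError` are hypothetical helpers, the numeric `status` property is checked defensively in case the SDK error carries one, and the fallback regex relies on the `[429 Too Many Requests]` fragment seen in the logged message. The user-facing messages are placeholders.

```typescript
interface ClassifiedError {
  status: 429 | 503 | 500;
  message: string;
}

// Best-effort extraction of the upstream HTTP status from a caught error.
function upstreamStatus(error: unknown): number | undefined {
  // Prefer a numeric `status` property if the error object exposes one.
  if (typeof error === "object" && error !== null && "status" in error) {
    const status = (error as { status: unknown }).status;
    if (typeof status === "number") return status;
  }
  // Fall back to the "[429 Too Many Requests]" pattern from the logged message.
  const message = error instanceof Error ? error.message : String(error);
  const match = /\[(\d{3}) [^\]]*\]/.exec(message);
  return match ? Number(match[1]) : undefined;
}

// Map a caught error to the status and message the route should return.
function classifyGeminiError(error: unknown): ClassifiedError {
  if (upstreamStatus(error) === 429) {
    return {
      status: 429,
      message: "Image generation is over its quota right now. Please try again later.",
    };
  }
  const message = error instanceof Error ? error.message : "";
  if (message.includes("No image generated")) {
    return {
      status: 503,
      message: "The image service returned no image. Please try again.",
    };
  }
  return { status: 500, message: "Failed to generate image" };
}
```

In the route's catch block this could be used as `const { status, message } = classifyGeminiError(error);` followed by `NextResponse.json({ error: message }, { status })`. A Retry-After header could be added for the 429 case if the upstream error provides a usable delay.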
Short-term
- Review the Gemini API plan — upgrade quota or switch to a higher-tier plan if the current limits are insufficient for production traffic
- Add client-side rate limiting — queue or throttle image generation requests to stay within API quota
- Add a retry with exponential backoff for transient 429 errors before failing to the user
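The retry item above can be sketched as a small wrapper around the Gemini call. This is an illustration under assumptions, not a tuned implementation: `withRetry`, `backoffDelayMs`, and the option names are hypothetical, the delay values are arbitrary, and jitter is omitted for brevity. Retrying only helps with short-lived limits, so the `isRetryable` callback lets the caller exclude errors such as an exhausted daily quota.

```typescript
interface RetryOptions {
  maxAttempts: number;   // total tries, including the first
  baseDelayMs: number;   // delay before the second try
  maxDelayMs: number;    // upper bound on any single delay
  isRetryable: (error: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

// Exponential backoff: base, 2x base, 4x base, ... capped at maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * Math.pow(2, attempt));
}

// Run `operation`, retrying retryable failures with exponential backoff.
// The last error is rethrown once attempts run out or the error is not retryable.
async function withRetry<T>(operation: () => Promise<T>, options: RetryOptions): Promise<T> {
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let lastError: unknown;
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      const isLastAttempt = attempt === options.maxAttempts - 1;
      if (isLastAttempt || !options.isRetryable(error)) break;
      await sleep(backoffDelayMs(attempt, options.baseDelayMs, options.maxDelayMs));
    }
  }
  throw lastError;
}
```

With `maxAttempts: 3` and `baseDelayMs: 500`, a request that keeps failing waits 500 ms and then 1000 ms before the error reaches the user. The total wait should stay well below the route's own timeout.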
Medium-term
- Add a fallback image generation provider (e.g., OpenAI DALL-E, Stability AI) for when Gemini quota is exhausted
- Add quota monitoring — alert when Gemini API usage reaches 80% of the daily/minute limit so the team can act before users are affected
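The 80% alert can be approximated by counting outgoing Gemini requests in the application. The sketch below is one possible shape, with several assumptions: `DailyQuotaTracker` is a hypothetical class, the daily limit must be supplied from the plan's actual quota, the counter resets on the UTC calendar date (which may not match when Google resets the quota), and the count is per process. With multiple web-app pods, a shared counter or the provider's own usage metrics would be needed.

```typescript
type OnAlert = (used: number, limit: number) => void;

class DailyQuotaTracker {
  private used = 0;
  private dayKey = "";
  private alerted = false;
  private readonly dailyLimit: number;
  private readonly threshold: number;
  private readonly onAlert: OnAlert;

  constructor(dailyLimit: number, onAlert: OnAlert, threshold = 0.8) {
    this.dailyLimit = dailyLimit;
    this.onAlert = onAlert;
    this.threshold = threshold;
  }

  // Call once per outgoing Gemini request. Fires onAlert at most once per day,
  // the first time usage reaches threshold * dailyLimit.
  record(now: Date = new Date()): void {
    const key = now.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC (assumed reset boundary)
    if (key !== this.dayKey) {
      this.dayKey = key;
      this.used = 0;
      this.alerted = false;
    }
    this.used += 1;
    if (!this.alerted && this.used >= this.dailyLimit * this.threshold) {
      this.alerted = true;
      this.onAlert(this.used, this.dailyLimit);
    }
  }

  get usedToday(): number {
    return this.used;
  }
}
```

The `onAlert` callback is where a page or chat notification would be sent, giving the team time to act before the quota is fully consumed.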