INC-2026-02-18

Gemini Image Generation Quota Exhaustion

Date: 2026-02-18
Severity: Medium
Status: Resolved
Affected Service: web-app
Detected By: Audit validation (log and trace analysis)
Client: ACME Corp (EdTech platform)

Summary

The AI image generation feature (POST /ai/generate-image-gemini) fails with HTTP 500 when the Google Gemini API returns 429 Too Many Requests. The web-app Next.js route catches the Gemini SDK error but surfaces every failure to users as a generic 500 rather than a status that reflects the cause. Logs show 119 errors in the last 2 days (91 rate-limit, 28 empty-response), with bursts of 130–190 failures during early morning hours (05:00–07:00 UTC).

Log Evidence

All 361 error logs originate from the web-app pods. Only 4 ingress-level logs were captured for this endpoint.

429 Quota Exhaustion — Full Error

* Quota exceeded for metric:
    generativelanguage.googleapis.com/generate_requests_per_model_per_day,
    limit: 0
  [{
    "@type": "type.googleapis.com/google.rpc.QuotaFailure",
    "violations": [{
      "quotaMetric": "generativelanguage.googleapis.com/generate_requests_per_model_per_day",
      "quotaId": "GenerateRequestsPerDayPerProjectPerModel"
    }]
  }]

The quota ID GenerateRequestsPerDayPerProjectPerModel with limit: 0 means zero remaining requests for gemini-2.5-flash-image on this day. The daily per-model quota resets overnight, and morning traffic exhausts it within 1–2 hours.
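The QuotaFailure detail above is structured, so a handler could read the exhausted quota from it rather than matching on message text. A minimal sketch, assuming the details array has already been obtained as parsed JSON (how it is extracted from the SDK error or the log line is not shown here); the type names and the exhaustedQuotaIds helper are hypothetical:

```typescript
// Shapes mirror the google.rpc.QuotaFailure detail shown in the log above.
interface QuotaViolation {
  quotaMetric?: string;
  quotaId?: string;
}

interface ErrorDetail {
  "@type"?: string;
  violations?: QuotaViolation[];
}

const QUOTA_FAILURE_TYPE = "type.googleapis.com/google.rpc.QuotaFailure";

// Hypothetical helper: returns the quota IDs named in any QuotaFailure detail,
// or an empty array when no such detail is present.
function exhaustedQuotaIds(details: ErrorDetail[]): string[] {
  const ids: string[] = [];
  for (const detail of details) {
    if (detail["@type"] !== QUOTA_FAILURE_TYPE) continue;
    for (const violation of detail.violations ?? []) {
      if (violation.quotaId) ids.push(violation.quotaId);
    }
  }
  return ids;
}

// The detail payload from the log excerpt above.
const sample: ErrorDetail[] = [
  {
    "@type": QUOTA_FAILURE_TYPE,
    violations: [
      {
        quotaMetric:
          "generativelanguage.googleapis.com/generate_requests_per_model_per_day",
        quotaId: "GenerateRequestsPerDayPerProjectPerModel",
      },
    ],
  },
];

console.log(exhaustedQuotaIds(sample));
```

Knowing which quota was hit lets the route distinguish a per-day exhaustion, where retrying within seconds cannot help, from a shorter-window limit.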

429 Quota Exhaustion — Wrapped Error

Error generating image with Gemini: Error: [GoogleGenerativeAI Error]:
Error fetching from https://generativelanguage.googleapis.com/v1beta/
models/gemini-2.5-flash-image:generateContent: [429 Too Many Requests]
You exceeded your current quota, please check your plan and billing details.

Empty Response Error

Error generating image with Gemini: Error: No image generated
    at eo (.next/server/app/ai/generate-image-gemini/route.js:1:21934)

Five occurrences in 25 seconds, suggesting a single user retrying rapidly after each failure.

Traffic Pattern

Hourly Error Breakdown

Hour (UTC)      429 Rate Limit   Empty Response   Total
Feb 17 00:00    0                5                5
Feb 17 02:00    37               0                37
Feb 17 04:00    52               0                52
Feb 17 06:00    48               0                48
Feb 17 08:00    0                2                2
Feb 17 20:00    0                6                6
Feb 18 06:00    25               0                25
Feb 18 07:00    66               0                66
Feb 18 12:00    0                8                8
Feb 18 15:00    0                6                6

Rate-limit (429) errors cluster between 02:00–08:00 UTC when the daily quota is consumed. Empty-response errors occur throughout the day independently of quota state. The pattern repeats daily: Google resets the quota overnight, and early-morning traffic exhausts it within 1–2 hours.

Root Cause

The Next.js API route calls gemini-2.5-flash-image:generateContent via the @google/generative-ai SDK. Two failure modes:

  • Quota exhaustion (429): The Gemini API key is on a plan with a low request-per-day limit. The SDK throws an error with the 429 status. The route catches this and logs it but returns HTTP 500 to the caller.
  • Empty response: The Gemini API accepts the request but returns no image data. The route throws Error: No image generated and returns HTTP 500.

Code Analysis

// web-app/app/ai/generate-image-gemini/route.ts
} catch (error) {
    console.error("Error generating image with Gemini:", error);
    return NextResponse.json(
        { error: error instanceof Error ? error.message : "Failed to generate image" },
        { status: 500 }  // ← always 500, even for 429
    );
}

The route creates a new GoogleGenerativeAI client on every request (no singleton, no connection pooling). The catch-all error handler returns HTTP 500 for all failures, including 429 rate limits. The route has no retry logic, rate limiting, error classification, or circuit breaker.
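One way to avoid constructing a client per request is a module-level lazy singleton. The sketch below is generic and uses a stand-in object in place of the real client; lazySingleton and getClient are hypothetical names, and the factory would need to be swapped for the actual GoogleGenerativeAI construction in the route:

```typescript
// Hypothetical helper: creates the value on first call and reuses it afterwards.
function lazySingleton<T>(create: () => T): () => T {
  let instance: T | undefined;
  let created = false;
  return () => {
    if (!created) {
      instance = create();
      created = true;
    }
    return instance as T;
  };
}

// Stand-in for the real client construction; counts how often it runs.
let constructions = 0;
const getClient = lazySingleton(() => {
  constructions += 1;
  return { id: constructions };
});

getClient();
getClient();
console.log(constructions); // 1: the factory ran only once
```

Each request handler would then call getClient() instead of constructing a new client.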

Impact

  • User-facing: Users attempting to generate AI images get a 500 error. ~690 failed HTTP requests observed in traces over 2 days.
  • No data loss or cascading failures. The feature is self-contained.
  • Revenue impact: If image generation is part of a paid feature flow, users may abandon the flow.

Recommended Remediation

Immediate

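  • Return HTTP 429 (not 500) when the Gemini API returns a rate-limit error: detect the upstream 429 (from a status field on the thrown error where one is available, otherwise from the "[429" marker in the error message) and pass any retry-after information to the caller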
  • Return HTTP 503 with a user-friendly message when the API returns an empty response
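The two immediate fixes amount to classifying the caught error before choosing a status code. A minimal sketch, assuming the thrown error either carries a numeric status property or includes the "[429 ..." marker seen in the wrapped log line; classifyGeminiError, the ErrorResponse shape, and the user-facing messages are hypothetical:

```typescript
interface ErrorResponse {
  status: number;
  message: string;
}

// Hypothetical helper: maps an error caught in the route to an HTTP status
// and a user-facing message.
function classifyGeminiError(error: unknown): ErrorResponse {
  const message = error instanceof Error ? error.message : String(error);
  // Assumption: the error may expose the upstream HTTP status as `status`.
  const upstreamStatus = (error as { status?: unknown } | null | undefined)?.status;

  // Rate limit or quota exhaustion reported by the upstream API.
  if (upstreamStatus === 429 || /\[429\b/.test(message)) {
    return {
      status: 429,
      message: "Image generation is temporarily over quota. Please try again later.",
    };
  }
  // The route's own "No image generated" error for an empty upstream response.
  if (message.includes("No image generated")) {
    return {
      status: 503,
      message: "The image service returned no image. Please try again.",
    };
  }
  // Anything unrecognized stays a generic 500.
  return { status: 500, message: "Failed to generate image" };
}

const wrapped = new Error(
  "[GoogleGenerativeAI Error]: Error fetching from " +
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    "gemini-2.5-flash-image:generateContent: [429 Too Many Requests] " +
    "You exceeded your current quota, please check your plan and billing details.",
);
console.log(classifyGeminiError(wrapped).status); // 429
console.log(classifyGeminiError(new Error("No image generated")).status); // 503
```

In the existing catch block, the result would replace the hard-coded values: the message goes into the JSON body and the status into the NextResponse.json options. If the upstream error supplies a retry delay, it could additionally be forwarded in a Retry-After header.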

Short-term

  • Review the Gemini API plan — upgrade quota or switch to a higher-tier plan if the current limits are insufficient for production traffic
  • Add client-side rate limiting — queue or throttle image generation requests to stay within API quota
  • Add a retry with exponential backoff for transient 429 errors before failing to the user
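The retry item above can be sketched as follows. The delays, attempt count, and the names backoffDelayMs, sleep, and withRetry are assumptions for illustration. The errors logged in this incident are per-day quota exhaustion, which will not recover within seconds, so the isRetryable predicate should accept only shorter-window rate limits:

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs `fn`, retrying while `isRetryable(error)` is true,
// for at most `maxAttempts` calls in total.
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up on the last attempt or on errors that retrying cannot fix.
      if (attempt + 1 >= maxAttempts || !isRetryable(error)) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}

console.log([0, 1, 2, 3, 4, 5].map((n) => backoffDelayMs(n)));
// [ 500, 1000, 2000, 4000, 8000, 8000 ]
```

The delays here are deterministic to keep the sketch easy to follow; adding random jitter to each delay is a common refinement so that many clients do not retry in lockstep.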

Medium-term

  • Add a fallback image generation provider (e.g., OpenAI DALL-E, Stability AI) for when Gemini quota is exhausted
  • Add quota monitoring — alert when Gemini API usage reaches 80% of the daily/minute limit so the team can act before users are affected
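The 80% alert threshold can be sketched with a simple request counter. This is an in-memory illustration only: DailyQuotaTracker and QuotaState are hypothetical names, the daily limit must come from the actual plan, the counter is reset on the UTC date change as a simplifying assumption (the provider's real reset time should be confirmed), and a deployment with several pods would need a shared store rather than per-process state:

```typescript
type QuotaState = "ok" | "warn" | "exhausted";

// Hypothetical in-memory tracker for daily request usage.
class DailyQuotaTracker {
  private used = 0;
  private day = "";
  private readonly dailyLimit: number;
  private readonly warnRatio: number;

  constructor(dailyLimit: number, warnRatio = 0.8) {
    this.dailyLimit = dailyLimit;
    this.warnRatio = warnRatio;
  }

  // Records one request and returns the resulting state.
  // Assumption: the counter resets when the UTC date changes.
  record(now: Date = new Date()): QuotaState {
    const today = now.toISOString().slice(0, 10);
    if (today !== this.day) {
      this.day = today;
      this.used = 0;
    }
    this.used += 1;
    if (this.used >= this.dailyLimit) return "exhausted";
    if (this.used / this.dailyLimit >= this.warnRatio) return "warn";
    return "ok";
  }
}

// Example with a made-up limit of 10 requests per day.
const tracker = new DailyQuotaTracker(10);
const morning = new Date("2026-02-18T05:00:00Z");
const states: QuotaState[] = [];
for (let i = 0; i < 10; i++) states.push(tracker.record(morning));
console.log(states.join(","));
// ok,ok,ok,ok,ok,ok,ok,warn,warn,exhausted
```

A "warn" result is the point at which an alert would fire, and an "exhausted" result is where a fallback provider could take over.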
