INC-2026-02-18

Gemini Image Generation Quota Exhaustion

While reviewing logs on the customer’s infrastructure, ApexData spotted a pattern: the AI image generation feature was failing in daily waves, with hundreds of errors clustered in the morning hours and then silence until the next day. The root cause: a third-party API quota was being exhausted within one to two hours of peak traffic. Every request after that point failed, and users saw only a generic error message with no indication that the feature was effectively down for the rest of the day.

Date: 2026-02-18

Severity: Medium

Status: Resolved

Affected Service: web-app

Detected By: Audit validation (log and trace analysis)

Client: ACME Corp (EdTech platform)

Log Evidence

All 361 error logs originate from the web-app pods. Only 4 ingress-level logs were captured for this endpoint.

429 Quota Exhaustion — Full Error

* Quota exceeded for metric:
    generativelanguage.googleapis.com/generate_requests_per_model_per_day,
    limit: 0
  [{
    "@type": "type.googleapis.com/google.rpc.QuotaFailure",
    "violations": [{
      "quotaMetric": "generativelanguage.googleapis.com/generate_requests_per_model_per_day",
      "quotaId": "GenerateRequestsPerDayPerProjectPerModel"
    }]
  }]

The quota ID GenerateRequestsPerDayPerProjectPerModel identifies a daily request quota scoped per project and per model. Combined with limit: 0, it means zero remaining requests for gemini-2.5-flash-image on this day. The daily per-model quota resets overnight, and morning traffic exhausts it within 1–2 hours.
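To act on this error programmatically, the quota ID and limit can be pulled out of the message text. The following is a minimal sketch, assuming the message keeps the shape shown in the log excerpt above; `parseQuotaError` and `isDailyQuota` are hypothetical helper names, and the message format is not a documented contract.

```typescript
// Fields recovered from the 429 error text. All are optional because the
// message layout is only what was observed in the logs, not a stable API.
interface QuotaViolation {
  quotaMetric?: string;
  quotaId?: string;
  limit?: number;
}

// Hypothetical helper: extracts quota details from the full error message.
function parseQuotaError(message: string): QuotaViolation {
  const result: QuotaViolation = {};

  // "limit: 0" appears in the human-readable prefix of the message.
  const limitMatch = message.match(/limit:\s*(\d+)/);
  if (limitMatch) result.limit = Number(limitMatch[1]);

  // The structured part repeats the metric and adds the quota ID.
  const metricMatch = message.match(/"quotaMetric":\s*"([^"]+)"/);
  if (metricMatch) result.quotaMetric = metricMatch[1];

  const idMatch = message.match(/"quotaId":\s*"([^"]+)"/);
  if (idMatch) result.quotaId = idMatch[1];

  return result;
}

// Hypothetical helper: a daily quota is recognised by its ID or metric name.
function isDailyQuota(violation: QuotaViolation): boolean {
  return /PerDay/.test(violation.quotaId ?? "") ||
    /per_day/.test(violation.quotaMetric ?? "");
}
```

Distinguishing a daily quota from a per-minute one matters because only the latter is worth retrying within the same request.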

429 Quota Exhaustion — Wrapped Error

Error generating image with Gemini: Error: [GoogleGenerativeAI Error]:
Error fetching from https://generativelanguage.googleapis.com/v1beta/
models/gemini-2.5-flash-image:generateContent: [429 Too Many Requests]
You exceeded your current quota, please check your plan and billing details.

Empty Response Error

Error generating image with Gemini: Error: No image generated
    at eo (.next/server/app/ai/generate-image-gemini/route.js:1:21934)

Five occurrences in 25 seconds, suggesting a single user retrying rapidly after each failure.

Traffic Pattern

Hourly Error Breakdown

Hour (UTC)	429 Rate Limit	Empty Response	Total
Feb 17 00:00	0	5	5
Feb 17 02:00	37	0	37
Feb 17 04:00	52	0	52
Feb 17 06:00	48	0	48
Feb 17 08:00	0	2	2
Feb 17 20:00	0	6	6
Feb 18 06:00	25	0	25
Feb 18 07:00	66	0	66
Feb 18 12:00	0	8	8
Feb 18 15:00	0	6	6

Rate-limit (429) errors cluster between 02:00 and 08:00 UTC, when the daily quota is consumed. Empty-response errors occur throughout the day, independently of quota state. The pattern repeats daily: Google resets the quota overnight, and early-morning traffic exhausts it within 1–2 hours.

Root Cause

The Next.js API route calls gemini-2.5-flash-image:generateContent via the @google/generative-ai SDK. Two failure modes:

Quota exhaustion (429): The Gemini API key is on a plan with a low request-per-day limit. The SDK throws an error with the 429 status. The route catches this and logs it but returns HTTP 500 to the caller.
Empty response: The Gemini API accepts the request but returns no image data. The route throws Error: No image generated and returns HTTP 500.

Code Analysis

// web-app/app/ai/generate-image-gemini/route.ts
} catch (error) {
    console.error("Error generating image with Gemini:", error);
    return NextResponse.json(
        { error: error instanceof Error ? error.message : "Failed to generate image" },
        { status: 500 }  // ← always 500, even for 429
    );
}

The route creates a new GoogleGenerativeAI client on every request (no singleton, no connection pooling). The catch-all error handler returns HTTP 500 for all failures, including 429 rate limits. There is no retry logic, rate limiting, error classification, or circuit breaker.
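One way to avoid constructing a client per request is a module-level lazy singleton. This is a minimal sketch with the SDK import left out so it stays self-contained; `lazySingleton` is a hypothetical helper, and the environment variable name in the usage comment is an assumption.

```typescript
// Returns a getter that runs `factory` on first use and reuses the result
// for every later call within the same process.
function lazySingleton<T>(factory: () => T): () => T {
  let instance: T | undefined;
  let created = false;
  return () => {
    if (!created) {
      instance = factory();
      created = true;
    }
    return instance as T;
  };
}

// Hypothetical usage at the top of the route module (GEMINI_API_KEY is an
// assumed variable name, not taken from the customer's code):
//   const getGenAI = lazySingleton(
//     () => new GoogleGenerativeAI(process.env.GEMINI_API_KEY!)
//   );
//   ...inside the handler: const genAI = getGenAI();
```

Each pod still holds its own instance, so this only removes per-request construction; it does not coordinate quota usage across pods.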

Impact

User-facing: Users attempting to generate AI images get a 500 error. ~690 failed HTTP requests observed in traces over 2 days.
No data loss or cascading failures. The feature is self-contained.
Revenue impact: If image generation is part of a paid feature flow, users may abandon the flow.

Recommended Remediation

Immediate

Return HTTP 429 (not 500) when the Gemini API returns a rate-limit error: detect the 429 from the error's status or message text, and pass retry-after information to the caller
Return HTTP 503 with a user-friendly message when the API returns an empty response
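The two immediate fixes amount to classifying the caught error before choosing a status code. Below is a minimal sketch under stated assumptions: `classifyGeminiError` is a hypothetical helper, the numeric `status` field on the error object is assumed rather than confirmed for this SDK version, the message checks are based only on the log excerpts in this report, and the user-facing wording is illustrative.

```typescript
interface RouteError {
  status: 429 | 503 | 500;
  body: { error: string };
  headers: Record<string, string>;
}

// Hypothetical helper: maps an upstream failure to the response the route
// should send. `retryAfterSeconds` is supplied by the caller when known.
function classifyGeminiError(error: unknown, retryAfterSeconds?: number): RouteError {
  const message = error instanceof Error ? error.message : String(error);

  // Assumption: the thrown error may expose a numeric `status` field.
  const upstreamStatus =
    typeof error === "object" && error !== null && "status" in error
      ? (error as { status?: unknown }).status
      : undefined;

  // Fall back to the message text observed in the logs.
  const isRateLimited =
    upstreamStatus === 429 ||
    message.includes("[429 Too Many Requests]") ||
    message.includes("Quota exceeded");

  if (isRateLimited) {
    const headers: Record<string, string> = {};
    if (retryAfterSeconds !== undefined) {
      headers["Retry-After"] = String(retryAfterSeconds);
    }
    return {
      status: 429,
      body: { error: "Image generation has reached its usage limit. Please try again later." },
      headers,
    };
  }

  // The route throws exactly this message when no image data comes back.
  if (message === "No image generated") {
    return {
      status: 503,
      body: { error: "The image service returned no result. Please try again." },
      headers: {},
    };
  }

  return { status: 500, body: { error: "Failed to generate image" }, headers: {} };
}
```

In the route's catch block this could replace the fixed 500, for example `const r = classifyGeminiError(error); return NextResponse.json(r.body, { status: r.status, headers: r.headers });`.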

Short-term

Review the Gemini API plan: upgrade quota or switch to a higher-tier plan if the current limits are insufficient for production traffic
Add client-side rate limiting: queue or throttle image generation requests to stay within API quota
Add a retry with exponential backoff for transient 429 errors (for example, per-minute limits) before failing to the user; retrying cannot help once the daily quota is exhausted
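The retry recommendation can be sketched as follows. This is an illustration under assumptions, not the customer's code: `withBackoff`, `backoffDelayMs`, and `isDailyQuotaError` are hypothetical names, the delay values are arbitrary defaults, and daily-quota detection only works when the full error text naming the quota metric is available (the wrapped message seen in the logs does not include it).

```typescript
// True when the error text names a per-day quota, which retrying cannot fix.
function isDailyQuotaError(message: string): boolean {
  return /per_day|PerDay/.test(message);
}

// Exponential delay: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retries `call` on 429 errors that are not daily-quota errors, waiting
// longer before each attempt. Any other error is rethrown immediately.
async function withBackoff<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      const retryable = message.includes("429") && !isDailyQuotaError(message);
      if (!retryable || attempt >= maxAttempts - 1) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

With the defaults, a request is attempted at most three times, with waits of 500 ms and 1000 ms in between. Adding random jitter to the delay would help avoid synchronized retries across pods.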

Medium-term

Add a fallback image generation provider (e.g., OpenAI DALL-E, Stability AI) for when Gemini quota is exhausted
Add quota monitoring: alert when Gemini API usage reaches 80% of the daily/minute limit so the team can act before users are affected
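The 80% alert can be illustrated with a simple in-process counter. This is a minimal sketch: `DailyQuotaTracker` is a hypothetical class, the daily limit must be supplied from the actual plan, and resetting on the UTC calendar day is an assumption, since the provider's real reset time may differ.

```typescript
// Counts requests per day and reports when usage first crosses a threshold.
class DailyQuotaTracker {
  private readonly dailyLimit: number;
  private readonly alertRatio: number;
  private used = 0;
  private alerted = false;
  private dayKey: string;

  constructor(dailyLimit: number, alertRatio = 0.8, now: Date = new Date()) {
    this.dailyLimit = dailyLimit;
    this.alertRatio = alertRatio;
    this.dayKey = DailyQuotaTracker.keyFor(now);
  }

  // Assumption: the counter resets on the UTC calendar day (YYYY-MM-DD).
  private static keyFor(date: Date): string {
    return date.toISOString().slice(0, 10);
  }

  // Records one request. Returns true exactly once per day, at the moment
  // usage first reaches the alert threshold.
  record(now: Date = new Date()): boolean {
    const key = DailyQuotaTracker.keyFor(now);
    if (key !== this.dayKey) {
      this.dayKey = key;
      this.used = 0;
      this.alerted = false;
    }
    this.used += 1;
    if (!this.alerted && this.used >= this.dailyLimit * this.alertRatio) {
      this.alerted = true;
      return true;
    }
    return false;
  }

  get remaining(): number {
    return Math.max(0, this.dailyLimit - this.used);
  }
}
```

Because web-app runs as multiple pods, a production version would need a shared counter (for example, in a cache or database) rather than per-process memory. The same `remaining` value could also back the client-side throttling recommended under Short-term.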
