INC-2026-02-11

Production 500s and Staging Webhook Panics

Date: 2026-02-11
Severity: Medium (prod), High (staging)
Status: Resolved
Affected Service: api-server, web-app
Detected By: Observability system (error rate alert)
Client: ACME Corp (EdTech platform)

Summary

Production has two distinct 500 errors: the Apple IAP plans endpoint fails because the subscription_platform table lacks an "apple" record, and the AI assistant token route throws an unhandled error for unauthenticated requests. Staging has a separate, higher-volume issue: nil pointer panics in the payment gateway webhook handler cause 502 Bad Gateway responses.

Affected Domains

Environment | 5xx Type | Count (24h) | Endpoint | Root Cause
Production | 500 | 2 | GET /api/billing/apple/plans | Missing apple platform in DB
Production | 500 | 1 | POST /ai/workspace-ui-token | Unhandled auth error in Next.js route
Staging | 502 | ~19 | POST /api/billing/gateway/card/webhook | Nil pointer panic in Go handler
Staging | 500 | ~17 | POST /api/billing/alt-gateway/webhook | Alt payment provider handler error (not investigated)

Production Issue 1: Apple Plans Endpoint — 500

Failed Requests

Field | Request A | Request B
Time (UTC) | 13:16:09 | 13:18:54
Status | 500 | 500
Response | {"error":"Failed to fetch plans"} | {"error":"Failed to fetch plans"}
Latency | 5ms | 6ms
Device | iPhone, iOS 18.7, Safari WebView | iPhone, iOS 18.5, Safari WebView
Referer | /onboarding/trial_apple/12?platform=ios | /onboarding/trial_apple/12?platform=ios

Both requests come from real iOS users on the Apple onboarding funnel (step 15 — the payment screen). Different session IDs and Cloudflare IPs confirm these are two separate users.

Root Cause (Source Code)

// Handler: internal/modules/billing/handlers.go:318-352
func (p PaymentsModule) ReadApplePlans(c echo.Context) error {
    plans, err := GetAllPlans(p.db, "apple")
    if err != nil {
        log.Println(err.Error())
        return c.JSON(http.StatusInternalServerError,
            map[string]string{"error": "Failed to fetch plans"})
    }
    // ... build DTOs and return 200
}

// Repository: internal/modules/billing/repository.go:370-380
func GetPlatformIDByCode(db *sqlx.DB, code string) (int, error) {
    var id int
    err := db.Get(&id,
        `SELECT id FROM subscription_platform WHERE code = $1`, code)
    if err != nil {
        return 0, fmt.Errorf("failed to get platform ID: %w", err)
    }
    return id, nil
}

Failure Chain

ReadApplePlans (handlers.go:318)
  -> GetAllPlans(db, "apple") (repository.go:278)
    -> GetPlatformIDByCode(db, "apple") (repository.go:370)
      -> SELECT id FROM subscription_platform WHERE code = 'apple'
      -> sql.ErrNoRows (no "apple" row exists)
    -> returns: fmt.Errorf("failed to get platform ID: %w", err)
-> HTTP 500 {"error": "Failed to fetch plans"}

User impact: Every iOS user reaching step 15 of the Apple onboarding funnel gets a 500 when the frontend calls this endpoint to load available plans. The payment screen fails to render plan options.

Production Issue 2: Assistant Token — 500

Field | Value
Time | 13:15:58 UTC
URL | /ai/workspace-ui-token
Status | 500
Response Size | 5 bytes
User-Agent | Googlebot/2.1
Referer | /workspace

Root Cause

// web-app/app/ai/workspace-ui-token/route.ts
export const POST = async (req: Request) => {
    const { userId } = await getUserId(req) // returns null for bots
    if (!userId) throw new Error("User not authenticated"); // <- unhandled
    // ...
};

The handler throws a bare Error instead of returning a proper HTTP response. Next.js catches the unhandled exception and returns 500. Fix: return a 401 response instead of throwing.

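User impact: None observed. The only failing request came from a search engine crawler (Googlebot); any other unauthenticated caller would receive the same 500.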

Staging Issue: Payment Gateway Webhook Panic (502)

Field | Value
Pod | api-server (staging)
Running Since | 26 days
Restart Count | 0
CPU | 1m
Memory | 44Mi

Traffic Source

Field | Value
User-Agent | Go-http-client/2.0
Origin | Payment gateway processor
Event Types | card_gate.order.updated, subscription.updated.v2

Stack Trace

echo: http: panic serving 10.244.x.x:59132:
  runtime error: invalid memory address or nil pointer dereference
goroutine 578919 [running]:
  payments.(*PaymentsModule).getPaymentBaseType(0x0?, 0x0?)
    gateway_handlers.go:3179 +0x1c
  payments.(*PaymentsModule).getTransactionTypes(0x199?, 0xc001167520)
    gateway_handlers.go:3197 +0x1c
  payments.(*PaymentsModule).handleCardWebhook(...)
    gateway_handlers.go:972 +0xb0

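getPaymentBaseType receives a nil *Order pointer and dereferences it. Go's net/http server runs each connection in its own goroutine and recovers a panic raised by a handler: it logs the stack trace (the "http: panic serving" line above) and closes the connection without writing a response. Nginx sees the upstream connection close before any response headers arrive and returns 502. The pod stays Running with zero restarts because the server process itself never crashes.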

Impact

Environment | Severity | Impact
Production | Medium | Apple IAP plans endpoint broken: iOS users on the Apple onboarding funnel cannot see payment plans. 2 confirmed failures from real users.
Production | Low | AI assistant token returns 500 for bots. No real user impact.
Staging | High | ~25% of payment gateway webhooks fail with 502. Payment state may drift as webhooks go unprocessed. Gateway retries partially mitigate the impact.
Staging | Medium | Alt payment provider webhooks returning 500. Not yet investigated.

Recommended Remediation

Immediate (Production)

  • Insert apple platform record in the production subscription_platform table and add active plans for the Apple IAP funnel
  • Return 401 instead of throwing an Error in the workspace-ui-token route

Immediate (Staging)

  • Add nil pointer checks in getPaymentBaseType and getTransactionTypes before any pointer dereference
  • Add panic recovery middleware to the Echo HTTP server to return 500 JSON responses instead of dropping connections

Medium-term

  • Investigate Alt payment provider webhook 500s (separate incident)
  • Validate all incoming payment gateway webhook payloads before processing (null-check required fields)
  • Add error-rate alerting per endpoint and per domain to detect 5xx spikes earlier
