Production 500s and Staging Webhook Panics
Summary
Production has two distinct 500 errors: the Apple IAP plans endpoint fails because the subscription_platform table lacks an "apple" record, and the AI assistant token route throws an unhandled error for unauthenticated requests. Staging has a separate, higher-volume issue: nil pointer panics in the payment gateway webhook handler cause 502 Bad Gateway responses.
Affected Domains
| Environment | 5xx Type | Count (24h) | Endpoint | Root Cause |
|---|---|---|---|---|
| Production | 500 | 2 | GET /api/billing/apple/plans | Missing apple platform in DB |
| Production | 500 | 1 | POST /ai/workspace-ui-token | Unhandled auth error in Next.js route |
| Staging | 502 | ~19 | POST /api/billing/gateway/card/webhook | Nil pointer panic in Go handler |
| Staging | 500 | ~17 | POST /api/billing/alt-gateway/webhook | Alt payment provider handler error (not investigated) |
Production Issue 1: Apple Plans Endpoint — 500
Failed Requests
| Field | Request A | Request B |
|---|---|---|
| Time (UTC) | 13:16:09 | 13:18:54 |
| Status | 500 | 500 |
| Response | {"error":"Failed to fetch plans"} | {"error":"Failed to fetch plans"} |
| Latency | 5ms | 6ms |
| Device | iPhone, iOS 18.7, Safari WebView | iPhone, iOS 18.5, Safari WebView |
| Referer | /onboarding/trial_apple/12?platform=ios | /onboarding/trial_apple/12?platform=ios |
Both requests come from real iOS users on the Apple onboarding funnel (step 15 — the payment screen). Different session IDs and Cloudflare IPs confirm these are two separate users.
Root Cause (Source Code)
    // Handler: internal/modules/billing/handlers.go:318-352
    func (p PaymentsModule) ReadApplePlans(c echo.Context) error {
        plans, err := GetAllPlans(p.db, "apple")
        if err != nil {
            log.Println(err.Error())
            return c.JSON(http.StatusInternalServerError,
                map[string]string{"error": "Failed to fetch plans"})
        }
        // ... build DTOs and return 200
    }

    // Repository: internal/modules/billing/repository.go:370-380
    func GetPlatformIDByCode(db *sqlx.DB, code string) (int, error) {
        var id int
        err := db.Get(&id,
            `SELECT id FROM subscription_platform WHERE code = $1`, code)
        if err != nil {
            return 0, fmt.Errorf("failed to get platform ID: %w", err)
        }
        return id, nil
    }

Failure Chain
    ReadApplePlans (handlers.go:318)
      -> GetAllPlans(db, "apple") (repository.go:278)
      -> GetPlatformIDByCode(db, "apple") (repository.go:370)
      -> SELECT id FROM subscription_platform WHERE code = 'apple'
      -> sql.ErrNoRows (no "apple" row exists)
      -> returns: fmt.Errorf("failed to get platform ID: %w", err)
      -> HTTP 500 {"error": "Failed to fetch plans"}

User impact: Every iOS user reaching step 15 of the Apple onboarding funnel gets a 500 when the frontend calls this endpoint to load available plans. The payment screen fails to render plan options.
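The failure chain collapses "row missing" and "database unavailable" into the same generic 500, which makes a configuration gap look like an outage. The sketch below shows one way to keep the two apart with `errors.Is`. It is a minimal illustration, not the service's code: `ErrPlatformNotFound`, `wrapPlatformLookupErr`, `lookupPlatformID`, and `statusForPlansError` are hypothetical names, the map stands in for the `subscription_platform` table, and mapping a missing platform to 404 is an assumption about the desired behavior.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
	"net/http"
)

// ErrPlatformNotFound is a hypothetical sentinel; it does not exist in the
// original repository code.
var ErrPlatformNotFound = errors.New("platform not found")

// wrapPlatformLookupErr translates sql.ErrNoRows into the sentinel so callers
// can tell a missing row apart from a real database failure.
func wrapPlatformLookupErr(code string, err error) error {
	if err == nil {
		return nil
	}
	if errors.Is(err, sql.ErrNoRows) {
		return fmt.Errorf("platform %q: %w", code, ErrPlatformNotFound)
	}
	return fmt.Errorf("failed to get platform ID: %w", err)
}

// lookupPlatformID simulates the repository call. The map replaces the
// subscription_platform table so the example runs without a database.
func lookupPlatformID(rows map[string]int, code string) (int, error) {
	id, ok := rows[code]
	if !ok {
		// A single-row query with no match reports sql.ErrNoRows.
		return 0, wrapPlatformLookupErr(code, sql.ErrNoRows)
	}
	return id, nil
}

// statusForPlansError picks an HTTP status for the lookup result.
// Returning 404 for a missing platform is an assumption, not the
// service's documented contract.
func statusForPlansError(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrPlatformNotFound):
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	// Production state described in the report: no "apple" row.
	_, err := lookupPlatformID(map[string]int{"web": 1}, "apple")
	fmt.Println(statusForPlansError(err), err)

	// After the row is inserted, the lookup succeeds.
	id, err := lookupPlatformID(map[string]int{"web": 1, "apple": 2}, "apple")
	fmt.Println(statusForPlansError(err), id)
}
```

Whichever status is chosen, logging the sentinel separately would let alerting distinguish missing reference data from infrastructure errors.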
Production Issue 2: Assistant Token — 500
| Field | Value |
|---|---|
| Time | 13:15:58 UTC |
| URL | /ai/workspace-ui-token |
| Status | 500 |
| Response Size | 5 bytes |
| User-Agent | Googlebot/2.1 |
| Referer | /workspace |
Root Cause
    // web-app/app/ai/workspace-ui-token/route.ts
    export const POST = async (req: Request) => {
      const { userId } = await getUserId(req) // returns null for bots
      if (!userId) throw new Error("User not authenticated"); // <- unhandled
      // ...
    };

The handler throws a bare Error instead of returning a proper HTTP response. Next.js catches the unhandled exception and returns 500. Fix: return a 401 response instead of throwing.
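User impact: None observed. The only failing request came from a search engine crawler (Googlebot), although any unauthenticated request would take the same code path.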
Staging Issue: Payment Gateway Webhook Panic (502)
| Field | Value |
|---|---|
| Pod | api-server (staging) |
| Running Since | 26 days |
| Restart Count | 0 |
| CPU | 1m |
| Memory | 44Mi |
Traffic Source
| Field | Value |
|---|---|
| User-Agent | Go-http-client/2.0 |
| Origin | Payment gateway processor |
| Event Types | card_gate.order.updated, subscription.updated.v2 |
Stack Trace
    echo: http: panic serving 10.244.x.x:59132:
    runtime error: invalid memory address or nil pointer dereference
    goroutine 578919 [running]:
    payments.(*PaymentsModule).getPaymentBaseType(0x0?, 0x0?)
        gateway_handlers.go:3179 +0x1c
    payments.(*PaymentsModule).getTransactionTypes(0x199?, 0xc001167520)
        gateway_handlers.go:3197 +0x1c
    payments.(*PaymentsModule).handleCardWebhook(...)
        gateway_handlers.go:972 +0xb0

getPaymentBaseType receives a nil *Order pointer and dereferences it. A panic unwinds only the goroutine serving that request: Go's HTTP server recovers it and logs the stack trace, but closes the connection without writing a response. Nginx sees no response headers and returns 502. The pod stays Running with zero restarts because the server process itself never crashes.
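The nil dereference can be turned into an ordinary error by guarding the pointer before use and propagating the failure up the call chain. The following is a minimal sketch under stated assumptions: the real functions are methods on `*PaymentsModule` and their signatures are not shown in the report, so the `Order` struct, its `PaymentType` field, `ErrNilOrder`, and the return types here are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Order is a hypothetical stand-in; the real struct's fields are not shown
// in the report.
type Order struct {
	PaymentType string
}

// ErrNilOrder is a hypothetical sentinel for webhook events without an order.
var ErrNilOrder = errors.New("order is nil")

// getPaymentBaseType guards the pointer before touching any field, so a
// missing order becomes an error instead of a panic.
func getPaymentBaseType(o *Order) (string, error) {
	if o == nil {
		return "", ErrNilOrder
	}
	return o.PaymentType, nil
}

// getTransactionTypes propagates the error so the webhook handler can
// answer with a 4xx/5xx response rather than dropping the connection.
func getTransactionTypes(o *Order) ([]string, error) {
	base, err := getPaymentBaseType(o)
	if err != nil {
		return nil, fmt.Errorf("transaction types: %w", err)
	}
	return []string{base}, nil
}

func main() {
	if _, err := getTransactionTypes(nil); err != nil {
		fmt.Println("rejected:", err)
	}
	types, err := getTransactionTypes(&Order{PaymentType: "card"})
	fmt.Println(types, err)
}
```

Returning an error rather than a zero value keeps the decision about the HTTP status in the handler, where the request context is available.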
Impact
| Environment | Severity | Impact |
|---|---|---|
| Production | Medium | Apple IAP plans endpoint broken — iOS users on the Apple onboarding funnel cannot see payment plans. 2 confirmed failures from real users. |
| Production | Low | AI assistant token returns 500 for bots. No real user impact. |
| Staging | High | ~25% of payment gateway webhooks fail with 502. Payment state may drift as webhooks go unprocessed. Gateway retries partially mitigate the impact. |
| Staging | Medium | Alt payment provider webhooks returning 500. Not yet investigated. |
Recommended Remediation
Immediate (Production)
- Insert an apple platform record into the production subscription_platform table and add active plans for the Apple IAP funnel
- Return 401 instead of throwing an Error in the workspace-ui-token route
Immediate (Staging)
- Add nil pointer checks in getPaymentBaseType and getTransactionTypes before any pointer dereference
- Add panic recovery middleware to the Echo HTTP server to return 500 JSON responses instead of dropping connections
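Echo ships a `middleware.Recover()` middleware for this purpose. To show the mechanism without third-party imports, the sketch below implements the same idea with only the standard library: a deferred `recover` converts a handler panic into a JSON 500 so the proxy receives a real response. The names `recoverJSON`, `panickingHandler`, and `probe` are hypothetical, and the JSON body is an assumed format.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// recoverJSON wraps a handler and turns any panic into a JSON 500 response.
// It assumes the handler has not written headers before panicking.
func recoverJSON(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				log.Printf("panic serving %s %s: %v", r.Method, r.URL.Path, rec)
				w.Header().Set("Content-Type", "application/json")
				w.WriteHeader(http.StatusInternalServerError)
				fmt.Fprint(w, `{"error":"internal server error"}`)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

// panickingHandler simulates the webhook handler dereferencing a nil order.
var panickingHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	var order *struct{ ID string } // nil, like the *Order in the stack trace
	fmt.Fprint(w, order.ID)        // evaluating order.ID panics before any write
})

// probe sends one in-memory request and returns the status code and body.
func probe(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/api/billing/gateway/card/webhook", nil)
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := probe(recoverJSON(panickingHandler))
	fmt.Println(code, body)
}
```

Recovery middleware only changes the symptom from 502 to 500; the nil checks are still needed to process the webhook correctly.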
Medium-term
- Investigate Alt payment provider webhook 500s (separate incident)
- Validate all incoming payment gateway webhook payloads before processing (null-check required fields)
- Add per-endpoint 5xx error-rate alerting for each domain to detect spikes earlier
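Payload validation at the handler boundary could look roughly like the sketch below, which decodes the webhook body and rejects events that lack required fields before any business logic runs. The schema is an assumption: `webhookEvent`, `orderPayload`, the JSON field names, and `errInvalidWebhook` are hypothetical, and the report does not say whether subscription events legitimately omit the order, so the required-field rules would need to be set per event type.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// orderPayload and webhookEvent are hypothetical shapes; the real gateway
// schema is not shown in the report.
type orderPayload struct {
	ID          string `json:"id"`
	PaymentType string `json:"payment_type"`
}

type webhookEvent struct {
	Type  string        `json:"type"`
	Order *orderPayload `json:"order"`
}

var errInvalidWebhook = errors.New("invalid webhook payload")

// parseWebhook decodes the body and checks required fields, so downstream
// code can rely on ev.Order being non-nil.
func parseWebhook(body []byte) (*webhookEvent, error) {
	var ev webhookEvent
	if err := json.Unmarshal(body, &ev); err != nil {
		return nil, fmt.Errorf("%w: %v", errInvalidWebhook, err)
	}
	switch {
	case ev.Type == "":
		return nil, fmt.Errorf("%w: missing type", errInvalidWebhook)
	case ev.Order == nil:
		return nil, fmt.Errorf("%w: missing order", errInvalidWebhook)
	case ev.Order.ID == "":
		return nil, fmt.Errorf("%w: missing order.id", errInvalidWebhook)
	}
	return &ev, nil
}

func main() {
	bodies := []string{
		`{"type":"card_gate.order.updated","order":{"id":"o_1","payment_type":"card"}}`,
		`{"type":"subscription.updated.v2"}`,
		`{"type":"card_gate.order.updated","order":null}`,
	}
	for _, body := range bodies {
		ev, err := parseWebhook([]byte(body))
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", ev.Type, ev.Order.ID)
	}
}
```

A handler using this would typically answer rejected payloads with a 4xx status, which also tells the gateway that retrying the same body will not help.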