Engineering Notes

June 4th, 2026 • Evgeny Potapov, CEO, co-founder

The failure that felt normal

An investigation story: a client's AI image feature failed for most of every day, on the same daily schedule, since the day it launched — a Gemini per-day quota that drained early each day. Because it had always behaved that way, the team called it normal and quietly lost the users who hit it. How an ApexData audit found the pattern, and why a defect present from the first deploy is the hardest kind to see.

May 28th, 2026 • Andrey Shamakhov, CTO, co-founder

What counts as evidence: grading the output of a tool-using agent

A pattern for tool-using AI agents — make findings the unit of output, gate the top-tier severity claim on objective evidence at record time, and require cross-source corroboration for any elevated claim at synthesis time. The agent's report becomes something a reviewer can rank.

EXPLAIN, offline — Reconstructing a query plan from collected statistics, with no connection to the database

May 27th, 2026 • Evgeny Potapov, CEO, co-founder

What Postgres knows about your tables

How to predict what PostgreSQL would do with a query without running it — the statistics the planner reads, where they live, and how an offline analyzer reconstructs the plan from collected pg_class, pg_indexes, and pg_stats data, plus the honest boundary between structural prediction and the live cost model.

Bending without breaking — May 15 release: OpenSSL durability, kernel fallback, PHP-FPM self-heal

May 15th, 2026 • Artur Asadullin, Lead Infrastructure Engineer

Release: 2026-05-15 - Bending without breaking: notes from a mid-May agent release

Notes from a mid-May 2026 release of our observability agent. A configurable trace sampling rate and per-process opt-out for high-volume tiers, an OpenSSL interception path that no longer tracks libssl internal layouts, a graceful-degradation mode for older kernels, PHP-FPM monitoring that self-heals across worker recycles, and a cleaner startup log.

Closer to the question — April 3 release: logs, dashboards, and kernel-level signals

April 3rd, 2026 • Elena Kuznetsova, Engineering Lead, co-founder

Release: 2026-04-03 - Closer to the question: notes from the April 3 release

Notes from an April 2026 release: the logs page redrawn with a collapsible filter sidebar and resizable columns, Lighthouse split onto its own tab in URL monitoring, an AI dashboard agent that reads its own rendered output and revises itself, and an observability agent that now records packet drops, DNS query shape, TCP round-trip time, and syslog alongside the application-protocol view.

Around a slow query — March 17 release: concurrent requests, offline EXPLAIN, deduplicated logs

March 17th, 2026 • Elena Kuznetsova, Engineering Lead, co-founder

Release: 2026-03-17 - Around a slow query: notes from the March 17 release

Notes from a March 2026 release: more context around a slow query in the platform — concurrent HTTP requests matched by interval overlap, an offline SQL Explain analyzer, and a cross-pod comparison toggle — plus log deduplication, request profiling on the OTLP pipeline, and resilient handling of malformed spans in the agent.

Two ways to draw a dashboard — March 3 release: generated dashboards, model fallback, hybrid log search

March 3rd, 2026 • Elena Kuznetsova, Engineering Lead, co-founder

Release: 2026-03-03 - Two ways to draw a dashboard: notes from a March release

Notes from a March 2026 release: AI-generated dashboards with a visualization linter and cross-provider model fallback, three hand-built dashboards for APM, pod restarts, and deployments, and a hybrid full-text log search powered by Manticore.

Three more places to look — Feb 9 release: tree-search investigations, ingress dashboards, URL monitoring

February 9th, 2026 • Elena Kuznetsova, Engineering Lead, co-founder

Release: 2026-02-09 - Three more places to look: notes from a February release

Notes from a February 2026 release: a tree-search investigation mode, dedicated ingress-controller dashboards with throttling and scheduling delay, and synthetic URL monitoring from outside the cluster.

The cost of watching — Grogh 0.17782: halved GC overhead, shared OTLP connection, leaner eBPF probes

January 30th, 2026 • Artur Asadullin, Lead Infrastructure Engineer

Grogh 0.17782: The cost of watching — notes from an observability agent release

Notes from a recent release of our observability agent. Lower GC, network, eBPF, and memory cost on the host, plus visibility into Unix sockets, PostgreSQL schemas, Redis over TLS, and same-node container connections.

January 15th, 2026 • Evgeny Potapov, CEO, co-founder

AI-First Coding: Closing the Gap Between Skeptics and Practitioners in Dev Teams

Notes from a Tel Aviv meetup talk on AI-first coding in 2026: why developer skepticism is mostly outdated, where the real concerns sit (trust, security, fun), and how to share adoption inside a team without mandating it.

Surviving a wrong first guess — design notes: tree of hypotheses, specialist subagents, scored evidence

December 15th, 2025 • Andrey Shamakhov, CTO, co-founder

Tree-search agents: building an AI agent that survives a wrong first guess

Design notes from building an investigation agent for production incidents: tree of hypotheses, specialised subagents, evidence-scored evaluation, bounded search, and configurable models.

November 27th, 2025 • Evgeny Potapov, CEO, co-founder

Claude Code Workshop & Best Practices

Speaker: Evgeny Potapov, ApexData co-founder & CEO

Building observability strategy — Part 3: runbooks, reversible deploys, recovery time

October 3rd, 2025 • Evgeny Potapov, CEO, co-founder

Building an effective observability strategy - Part 3

Once the layers from Part 1 are in place and the code from Part 2 has been written, the on-call experience changes. Part 3 of a series on observability strategy: the practices that reduce the rate of escalations from on-call rotations to the developers who built the system.

Building observability strategy — Part 2: 30% of the codebase, 28% more development time

September 18th, 2025 • Evgeny Potapov, CEO, co-founder

Building an effective observability strategy - Part 2

Observability is not a tool you buy; it is code you write, and a meaningful fraction of the code in a working production system. Part 2 of a series on observability strategy: how much code is instrumentation, what that means for how a team works, and what it costs in development time.

Building observability strategy — Part 1: an observability checklist covering user experience, tracing, and infrastructure

September 3rd, 2025 • Evgeny Potapov, CEO, co-founder

Building an effective observability strategy - Part 1

An observability strategy designed from the people the system serves, not the boxes it runs on. A top-down tour of the layers — user experience, business signals, tracing, service monitoring, infrastructure, user feedback — and what each one answers.