methodology

How the numbers are made.

Every number on the dashboard is produced by probatorium and shown as the average of all runs for a release. Here's exactly how it works — so you can reproduce it.

3-host cluster
x86_64 architectures
52 servers compared
01 · the cluster

One generator, two servers, a fat pipe.

A dedicated three-host cluster on Ubuntu 26.04: a load generator plus two servers under test — one x86_64, one arm64. The x86 server is an AMD Ryzen 7 7745HX with 64 GB of DDR5 in dual channel (2×32 GB); the arm64 server an Arm Cortex-A520 with 64 GB; the load generator an AMD Ryzen 9 9955HX with a brand-new 32 GB Crucial DDR5-5600 SODIMM kit (2×16 GB), dual-channel. Each is a separate machine on a high-bandwidth fabric, so the network is never the bottleneck; kernel tunables are recorded with each run.

02 · run shape

Two modes per cell.

A cell is one (server, scenario) pair, run for a fixed window after warm-up.

Saturation

Blast as hard as possible to find peak throughput — the capability ceiling.

Rated

A closed-loop, paced sweep that records tail latency at a target rate — the SLO story.

Each release seeds several back-to-back runs, and those runs are repeated on many dates, so a version accumulates many runs over time. The dashboard averages across all of them.

03 · what we measure

Eight signals per cell.

Saturation RPS: peak requests/sec; higher is better.
RPS @ SLO: max RPS sustained while p99 stays under 10 / 50 / 100 / 500 / 1000 ms.
Rated p99: tail latency at the rated rate; lower is better.
Peak / steady RSS: resident memory under load.
Mean CPU: server CPU utilisation across the window.
GC pause p99: garbage-collector pause; Go runtimes only.
Goroutine / FD HWM: concurrency and file-descriptor high-water marks.
Error %: sent-vs-handled delta; connect errors tracked separately.

cell status
ok: served cleanly
n/a: can't serve that route/protocol
dnf: crash, timeout or dial failure
suspect: completed but over error budget
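
As an illustration of the RPS @ SLO signal, here is a minimal sketch of how a per-bound value could be derived from a rated sweep. It is not probatorium's actual code: the function name `rps_at_slo`, the `(rate, p99)` tuple shape and the sample numbers are all assumptions made for the example. It also assumes "sustained" means the highest swept rate whose p99 is strictly under the bound.

```python
from typing import Dict, List, Optional, Tuple

# Latency bounds in milliseconds, as listed for the RPS @ SLO signal.
SLO_BOUNDS_MS = (10, 50, 100, 500, 1000)


def rps_at_slo(
    sweep: List[Tuple[float, float]],
    bounds_ms: Tuple[int, ...] = SLO_BOUNDS_MS,
) -> Dict[int, Optional[float]]:
    """Return, per bound, the highest swept rate whose p99 stays under it.

    `sweep` is a list of (rate_rps, p99_ms) points (hypothetical shape).
    A bound that no point satisfies maps to None rather than 0, so a
    missing measurement is never mistaken for a real zero.
    """
    result: Dict[int, Optional[float]] = {}
    for bound in bounds_ms:
        passing = [rate for rate, p99_ms in sweep if p99_ms < bound]
        result[bound] = max(passing) if passing else None
    return result


# Made-up sweep points, for illustration only.
sweep = [(1000, 4.0), (5000, 8.5), (10000, 42.0), (20000, 180.0), (40000, 1500.0)]
print(rps_at_slo(sweep))
```

With these sample points, the 10 ms bound resolves to 5000 RPS, and the 500 ms and 1000 ms bounds both resolve to 20000 RPS, because the 40000 RPS point exceeds every bound.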
04 · scenario families

No single workload tells the whole story.

Static

GET and POST across small-to-1 MB bodies, JSON encoding, connection churn and h2c.

Concurrency

1 → 1024 concurrent connections on the hottest GET paths.

Driver

Postgres (read · range · write · tx), Redis (get · set · pipeline), memcached and session read/write — all on the event loop.

Streaming

WebSocket echo, large-frame echo and hub broadcast (128 / 1024), plus SSE fan-out.

05 · how averaging works

Mean, with confidence attached.

For each (server, scenario) we take the arithmetic mean across every included run, and carry the min, max, standard deviation and run count so the dashboard can show a “±X% over N runs” chip. RPS @ SLO is averaged per latency bound independently; missing measurements are never imputed to zero.
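
The aggregation described above can be sketched as follows. This is a hedged illustration, not the dashboard's implementation: the function name `summarize` is hypothetical, and the sketch assumes the chip's X is the sample standard deviation expressed as a percentage of the mean, which the text does not specify.

```python
import statistics
from typing import Dict, Optional, Sequence, Union


def summarize(values: Sequence[Optional[float]]) -> Optional[Dict[str, Union[float, int, str]]]:
    """Aggregate one signal for one (server, scenario) across runs.

    None marks a missing measurement. Missing values are dropped,
    never treated as zero, so they cannot drag the mean down.
    """
    present = [v for v in values if v is not None]
    if not present:
        return None  # nothing measured: report no value at all

    mean = statistics.fmean(present)
    # Sample standard deviation needs at least two points.
    stdev = statistics.stdev(present) if len(present) > 1 else 0.0
    # Assumption: X in the chip is the relative standard deviation.
    spread_pct = 100.0 * stdev / mean if mean else 0.0

    return {
        "mean": mean,
        "min": min(present),
        "max": max(present),
        "stdev": stdev,
        "runs": len(present),
        "chip": f"±{spread_pct:.1f}% over {len(present)} runs",
    }


# Made-up values: one of four runs has no measurement.
summary = summarize([100.0, None, 110.0, 90.0])
print(summary["mean"], summary["chip"])
```

For a signal with several bounds, such as RPS @ SLO, the same function would be called once per bound, so a bound missing from one run does not affect the others.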

Short pre-flight smoke runs (under 30 seconds) are excluded by default — they're warm-up-dominated and would bias the mean. The provenance of every run (included or excluded, and why) is retained.
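
A minimal sketch of that exclusion rule, assuming each run records its duration. The `Run` type, the `classify_runs` function and the reason strings are hypothetical names chosen for the example; only the 30-second threshold comes from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

# From the text: runs under 30 seconds count as pre-flight smoke runs.
MIN_DURATION_S = 30.0


@dataclass
class Run:
    run_id: str
    duration_s: float


def classify_runs(runs: List[Run], include_smoke: bool = False) -> List[Tuple[str, bool, str]]:
    """Return (run_id, included, reason) for every run.

    Excluded runs stay in the output so their provenance is retained;
    they are flagged rather than dropped.
    """
    provenance = []
    for run in runs:
        if run.duration_s < MIN_DURATION_S and not include_smoke:
            provenance.append((run.run_id, False, "smoke run under 30 s"))
        else:
            provenance.append((run.run_id, True, "full run"))
    return provenance


# Made-up runs, for illustration only.
runs = [Run("r1", 12.0), Run("r2", 300.0)]
print(classify_runs(runs))
```

Passing `include_smoke=True` models the "by default" wording: the exclusion is a default that can be switched off, and every run keeps a recorded reason either way.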

06 · fairness & honesty

No thumb on the scale.

See it for yourself.

52 servers across Go, Rust, C/C++, Java, C#, Python, Node and Bun — every result, averaged across runs.

Open the dashboard →