Stack Axis Benchmarks

Representative stack reductions.

Run locally with:

```shell
uv run --extra bench python benchmarks/benchmark_axis.py
uv run --extra bench python benchmarks/benchmark_axis.py --ops average
```

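np, bn, and rd are trimmed-median elapsed times in milliseconds for NumPy, Bottleneck, and reducers. The np/rd and bn/rd columns are speedups: a value above 1 means reducers is faster. Ratios are bolded when the reducers ratio is above 0.95. A `-` marks a function with no Bottleneck counterpart in this benchmark.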
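The timing and ratio convention can be sketched in Python. This is hedged: `median_ms` and `ratio` are hypothetical helpers, and running warm-up calls before taking the median is only one plausible reading of "trimmed median"; the benchmark script's exact trimming rule is not described here.

```python
import statistics
import time


def median_ms(fn, repeats=15, warmup=3):
    """Median wall-clock time of fn() in milliseconds.

    Hypothetical helper: warm-up calls are run and discarded first, which is
    one plausible reading of "trimmed"; the real benchmark may differ.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def ratio(baseline_ms, reducers_ms):
    """np/rd or bn/rd: a value above 1 means reducers was faster."""
    return baseline_ms / reducers_ms
```

For example, `ratio(9.35, 0.59)` gives about 15.85, close to the 15.96x listed for `median` in the first cube table; the small gap is likely because the ms columns are rounded to two decimals.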

std and nanstd use ddof=1.
average and nanaverage use 1-D weights whose length matches the reduced axis (100 for the cube; 11 or 31 for the short stacks). percentile uses q = [16, 50, 84], and quantile uses the equivalent [0.16, 0.50, 0.84].
nanaverage compares against masked np.average; Bottleneck has no weighted-average reducer.
nanpercentile/nanquantile are omitted because NumPy’s axis path is extremely slow here.
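The NumPy reference calls these notes describe can be written as below. This is a sketch: the shape, random data, and weights are illustrative, and reading "masked np.average" as `np.ma.average` over `np.ma.masked_invalid` is an assumption about how the benchmark builds its baseline.

```python
import numpy as np

rng = np.random.default_rng(0)
stack = rng.standard_normal((11, 100, 100))  # short stack, reduced along axis=0

# 1-D weights: one weight per stack level, so len(w) == stack.shape[0].
w = rng.random(stack.shape[0])
avg = np.average(stack, axis=0, weights=w)

# ddof=1 (sample) standard deviation, as used for std/nanstd.
sd = np.std(stack, axis=0, ddof=1)

# percentile takes q in [0, 100]; quantile takes the same points in [0, 1].
pct = np.percentile(stack, [16, 50, 84], axis=0)
qnt = np.quantile(stack, [0.16, 0.50, 0.84], axis=0)

# NaN-aware weighted reference (assumed): mask NaNs, then use the masked average,
# which drops masked entries from both the weighted sum and the weight total.
noisy = stack.copy()
noisy[0, 0, 0] = np.nan
ref = np.ma.average(np.ma.masked_invalid(noisy), axis=0, weights=w)
```

The percentile output has a leading axis of length 3, one slice per requested point.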

Plain Results

General cube `(100, 100, 100)`

At this size, reducers wins most cases.

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`mean`	0.18	-	0.10	1.81x	-
`average`	0.32	-	0.44	0.72x	-
`median`	9.35	8.22	0.59	15.96x	14.02x
`var`	0.65	-	0.17	3.77x	-
`std`	0.70	-	0.37	1.89x	-
`min`	0.18	-	0.12	1.48x	-
`max`	0.18	-	0.12	1.45x	-
`minmax`	0.35	-	0.42	0.84x	-
`sum`	0.17	-	0.11	1.61x	-
`percentile`	15.90	-	1.68	9.48x	-
`quantile`	15.84	-	1.53	10.38x	-

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`mean`	0.18	-	0.07	2.60x	-
`average`	0.47	-	0.14	3.47x	-
`median`	9.03	7.92	0.46	19.44x	17.06x
`var`	0.72	-	0.12	5.84x	-
`std`	0.71	-	0.13	5.54x	-
`min`	0.13	-	0.10	1.26x	-
`max`	0.12	-	0.10	1.20x	-
`minmax`	0.28	-	0.21	1.34x	-
`sum`	0.18	-	0.07	2.63x	-
`percentile`	14.80	-	1.42	10.40x	-
`quantile`	15.11	-	1.44	10.50x	-

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`mean`	0.07	-	0.09	0.81x	-
`average`	0.13	-	0.37	0.36x	-
`median`	9.54	8.44	0.57	16.65x	14.75x
`var`	0.27	-	0.14	1.90x	-
`std`	0.28	-	0.12	2.32x	-
`min`	0.07	-	0.08	0.91x	-
`max`	0.07	-	0.10	0.67x	-
`minmax`	0.13	-	0.18	0.72x	-
`sum`	0.07	-	0.09	0.83x	-
`percentile`	15.79	-	1.77	8.94x	-
`quantile`	16.59	-	1.74	9.54x	-

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`mean`	0.17	-	0.12	1.50x	-
`average`	0.31	-	0.20	1.59x	-
`median`	9.19	8.45	0.65	14.22x	13.07x
`var`	0.52	-	0.18	2.84x	-
`std`	0.53	-	0.21	2.52x	-
`min`	0.13	-	0.11	1.16x	-
`max`	0.10	-	0.11	0.91x	-
`minmax`	0.20	-	0.23	0.87x	-
`sum`	0.17	-	0.11	1.56x	-
`percentile`	15.21	-	2.23	6.81x	-
`quantile`	15.29	-	1.80	8.47x	-

Short `axis=0` stacks

When axis=0 and shape[0] is small, reducers may lose to NumPy/Bottleneck on cheap reductions such as min, max, mean, and sum.

A deeper stack (larger shape[0]) or a larger output (prod(shape[1:]) elements) tends to make reducers faster than NumPy/Bottleneck.
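Output size here means the number of output elements, `math.prod(shape[1:])`. The sketch below computes it and, assuming work is divided across output positions (the axis-0 notes further down describe reducers as parallelizing that way), splits it into contiguous per-worker ranges. The helper names, worker count, and even static split are illustrative assumptions, not reducers' actual scheduler.

```python
import math


def output_size(shape):
    """Number of output elements when reducing along axis 0."""
    return math.prod(shape[1:])


def split_outputs(n_outputs, n_workers):
    """Contiguous [start, stop) ranges of output positions, one per worker.

    Illustrative only: a generic static partition, not reducers' scheduler.
    """
    base, extra = divmod(n_outputs, n_workers)
    ranges, start = [], 0
    for i in range(n_workers):
        # The first `extra` workers take one additional position each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


small = output_size((11, 100, 100))   # 10_000 outputs
large = output_size((11, 512, 512))   # 262_144 outputs
print(large / small)                  # about 26.2
print(split_outputs(small, 8)[0])     # (0, 1250)
```

With more output positions, each worker gets a larger slice, so fixed per-call and per-thread costs are spread over more work.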

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`mean`	0.02	-	0.02	1.13x	-
`average`	0.04	-	0.06	0.76x	-
`median`	0.89	0.68	0.25	3.52x	2.69x
`var`	0.08	-	0.05	1.64x	-
`std`	0.07	-	0.05	1.57x	-
`min`	0.02	-	0.03	0.73x	-
`max`	0.02	-	0.03	0.76x	-
`minmax`	0.04	-	0.05	0.78x	-
`sum`	0.02	-	0.03	0.84x	-
`percentile`	1.27	-	0.76	1.68x	-
`quantile`	1.29	-	0.72	1.80x	-

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`mean`	0.06	-	0.05	1.13x	-
`average`	0.11	-	0.14	0.77x	-
`median`	3.19	2.65	0.30	10.82x	8.98x
`var`	0.20	-	0.05	3.83x	-
`std`	0.21	-	0.05	3.94x	-
`min`	0.06	-	0.07	0.79x	-
`max`	0.06	-	0.06	0.89x	-
`minmax`	0.11	-	0.13	0.88x	-
`sum`	0.06	-	0.06	0.93x	-
`percentile`	5.39	-	0.81	6.67x	-
`quantile`	5.15	-	0.80	6.48x	-

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`mean`	0.53	-	0.24	2.20x	-
`average`	0.92	-	1.03	0.90x	-
`median`	30.48	24.01	4.80	6.35x	5.01x
`var`	2.01	-	0.49	4.07x	-
`std`	2.04	-	0.47	4.30x	-
`min`	0.46	-	0.30	1.56x	-
`max`	0.49	-	0.31	1.57x	-
`minmax`	0.93	-	0.52	1.78x	-
`sum`	0.51	-	0.26	1.98x	-
`percentile`	37.15	-	16.88	2.20x	-
`quantile`	38.25	-	16.09	2.38x	-

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`mean`	0.01	-	0.02	0.83x	-
`average`	0.03	-	0.06	0.45x	-
`median`	1.04	0.83	0.22	4.68x	3.74x
`var`	0.04	-	0.04	1.07x	-
`std`	0.05	-	0.04	1.12x	-
`min`	0.01	-	0.01	0.73x	-
`max`	0.01	-	0.01	0.69x	-
`minmax`	0.02	-	0.03	0.69x	-
`sum`	0.01	-	0.02	0.60x	-
`percentile`	1.51	-	0.59	2.57x	-
`quantile`	1.37	-	0.60	2.27x	-

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`mean`	0.03	-	0.04	0.73x	-
`average`	0.05	-	0.13	0.40x	-
`median`	3.41	2.90	0.33	10.17x	8.67x
`var`	0.10	-	0.06	1.58x	-
`std`	0.10	-	0.05	1.88x	-
`min`	0.02	-	0.03	0.73x	-
`max`	0.03	-	0.03	0.82x	-
`minmax`	0.05	-	0.06	0.85x	-
`sum`	0.02	-	0.04	0.61x	-
`percentile`	5.63	-	0.71	7.89x	-
`quantile`	5.49	-	0.66	8.26x	-

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`mean`	0.38	-	0.19	1.96x	-
`average`	0.46	-	1.04	0.44x	-
`median`	31.85	25.16	3.24	9.84x	7.77x
`var`	1.20	-	0.37	3.27x	-
`std`	1.23	-	0.40	3.09x	-
`min`	0.24	-	0.19	1.29x	-
`max`	0.25	-	0.19	1.28x	-
`minmax`	0.49	-	0.36	1.36x	-
`sum`	0.25	-	0.20	1.27x	-
`percentile`	39.39	-	12.00	3.28x	-
`quantile`	38.40	-	11.47	3.35x	-

NaN-aware Results

Same shapes with about 1% NaNs.
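One way to produce such inputs, as a sketch: the seed, the standard-normal data, and choosing NaN positions independently per element are assumptions, since the benchmark's exact NaN placement is not described here.

```python
import numpy as np

rng = np.random.default_rng(42)
stack = rng.standard_normal((100, 100, 100))

# Mark roughly 1% of positions, chosen independently at random, as NaN.
nan_mask = rng.random(stack.shape) < 0.01
stack[nan_mask] = np.nan

print(np.isnan(stack).mean())  # close to 0.01
```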

np.nanpercentile / np.nanquantile are omitted: locally they were extremely slow (often more than 100x slower than rd) and dominated the benchmark runtime.

General cube `(100, 100, 100)`

At this size, reducers wins most cases.

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`nanmean`	0.99	0.74	0.18	5.39x	4.02x
`nanaverage`	3.89	-	0.47	8.24x	-
`nanmedian`	30.50	8.36	0.47	65.20x	17.87x
`nanvar`	2.23	1.93	0.20	10.98x	9.49x
`nanstd`	2.25	1.92	0.24	9.40x	8.02x
`nanmin`	0.18	0.71	0.10	1.78x	7.14x
`nanmax`	0.17	0.72	0.11	1.63x	6.76x
`nanminmax`	0.34	1.45	0.22	1.58x	6.68x
`nansum`	0.53	0.72	0.17	3.02x	4.14x

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`nanmean`	0.75	0.72	0.15	5.05x	4.86x
`nanaverage`	4.83	-	0.20	24.72x	-
`nanmedian`	27.40	8.37	0.54	50.57x	15.45x
`nanvar`	2.06	1.80	0.17	12.17x	10.59x
`nanstd`	2.05	1.90	0.18	11.58x	10.71x
`nanmin`	0.13	0.72	0.15	0.86x	4.94x
`nanmax`	0.13	0.70	0.10	1.22x	6.84x
`nanminmax`	0.25	1.41	0.19	1.33x	7.41x
`nansum`	0.52	0.71	0.17	3.12x	4.28x

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`nanmean`	0.75	0.74	0.20	3.80x	3.73x
`nanaverage`	3.26	-	0.49	6.68x	-
`nanmedian`	28.53	8.30	0.51	55.97x	16.28x
`nanvar`	1.69	1.62	0.23	7.31x	6.99x
`nanstd`	1.68	1.88	0.21	7.96x	8.90x
`nanmin`	0.07	0.68	0.09	0.76x	7.72x
`nanmax`	0.07	0.67	0.08	0.82x	8.22x
`nanminmax`	0.14	1.41	0.17	0.81x	8.27x
`nansum`	0.31	0.73	0.21	1.52x	3.54x

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`nanmean`	0.65	0.72	0.16	4.08x	4.48x
`nanaverage`	3.57	-	0.24	15.16x	-
`nanmedian`	26.70	8.36	0.64	41.86x	13.10x
`nanvar`	1.75	1.81	0.21	8.40x	8.68x
`nanstd`	1.70	1.84	0.21	8.31x	8.98x
`nanmin`	0.11	0.70	0.10	1.10x	7.25x
`nanmax`	0.10	0.68	0.09	1.10x	7.22x
`nanminmax`	0.21	1.44	0.21	1.03x	6.96x
`nansum`	0.78	0.71	0.16	4.84x	4.42x

Short `axis=0` stacks

When axis=0 and shape[0] is small, reducers may lose to NumPy/Bottleneck on cheap reductions such as min, max, mean, and sum.

A deeper stack (larger shape[0]) or a larger output (prod(shape[1:]) elements) tends to make reducers faster than NumPy/Bottleneck.

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`nanmean`	0.12	0.03	0.05	2.19x	0.64x
`nanaverage`	0.50	-	0.08	6.32x	-
`nanmedian`	1.90	0.75	0.26	7.25x	2.86x
`nanvar`	0.26	0.10	0.08	3.31x	1.29x
`nanstd`	0.27	0.10	0.05	5.36x	2.06x
`nanmin`	0.02	0.04	0.02	1.05x	2.07x
`nanmax`	0.02	0.04	0.02	1.04x	1.95x
`nanminmax`	0.04	0.09	0.04	1.06x	2.12x
`nansum`	0.06	0.04	0.05	1.08x	0.71x

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`nanmean`	0.31	0.14	0.14	2.25x	1.00x
`nanaverage`	1.20	-	0.20	6.02x	-
`nanmedian`	7.54	2.85	0.64	11.73x	4.44x
`nanvar`	0.69	0.40	0.09	7.54x	4.42x
`nanstd`	0.67	0.41	0.08	8.54x	5.25x
`nanmin`	0.06	0.14	0.05	1.03x	2.48x
`nanmax`	0.06	0.14	0.06	1.02x	2.44x
`nanminmax`	0.11	0.27	0.11	1.03x	2.48x
`nansum`	0.17	0.14	0.14	1.20x	0.98x

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`nanmean`	3.09	1.48	0.40	7.64x	3.66x
`nanaverage`	12.13	-	1.18	10.29x	-
`nanmedian`	58.18	26.32	5.77	10.09x	4.56x
`nanvar`	6.67	4.92	0.57	11.68x	8.63x
`nanstd`	6.68	4.71	0.54	12.32x	8.69x
`nanmin`	0.48	1.37	0.21	2.32x	6.61x
`nanmax`	0.49	1.41	0.23	2.12x	6.13x
`nanminmax`	0.99	2.81	0.49	2.03x	5.73x
`nansum`	1.63	1.30	0.39	4.16x	3.34x

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`nanmean`	0.10	0.04	0.06	1.63x	0.69x
`nanaverage`	0.46	-	0.08	5.70x	-
`nanmedian`	1.85	0.75	0.25	7.42x	3.02x
`nanvar`	0.21	0.10	0.06	3.36x	1.51x
`nanstd`	0.21	0.10	0.05	4.31x	2.08x
`nanmin`	0.01	0.04	0.01	1.15x	4.77x
`nanmax`	0.01	0.04	0.01	1.14x	4.79x
`nanminmax`	0.02	0.09	0.02	1.16x	5.02x
`nansum`	0.04	0.04	0.06	0.65x	0.66x

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`nanmean`	0.25	0.13	0.16	1.58x	0.85x
`nanaverage`	1.10	-	0.20	5.40x	-
`nanmedian`	6.87	2.92	0.29	24.04x	10.22x
`nanvar`	0.54	0.39	0.08	6.37x	4.62x
`nanstd`	0.55	0.40	0.09	5.88x	4.21x
`nanmin`	0.02	0.13	0.02	1.07x	5.95x
`nanmax`	0.02	0.13	0.02	1.03x	6.02x
`nanminmax`	0.05	0.27	0.04	1.07x	6.13x
`nansum`	0.10	0.13	0.16	0.65x	0.84x

function	np (ms)	bn (ms)	rd (ms)	np/rd	bn/rd
`nanmean`	2.53	1.49	0.41	6.21x	3.66x
`nanaverage`	10.74	-	1.16	9.27x	-
`nanmedian`	53.31	26.11	3.37	15.81x	7.74x
`nanvar`	5.51	4.99	0.55	9.96x	9.02x
`nanstd`	5.40	4.98	0.59	9.21x	8.49x
`nanmin`	0.25	1.39	0.14	1.73x	9.76x
`nanmax`	0.25	1.38	0.16	1.57x	8.76x
`nanminmax`	0.51	2.75	0.29	1.74x	9.43x
`nansum`	1.01	1.33	0.41	2.45x	3.23x

Reading The Axis-0 Rows

Three factors explain most axis=0 rows:

shape[0] drives the order-statistic win. NumPy's median is built on a partition, but it still pays extra overhead for handling the reduction axis and the output. The gap grows with stack depth and is largest for NaN-aware quantiles.
Output size (prod(arr.shape[1:]) elements) matters. reducers parallelizes across output elements. NumPy's cheap strided loops can win on small outputs, but (11, 512, 512) has about 26x more outputs than (11, 100, 100), and there reducers wins most cases.
Some reductions stream along the non-stacking axes. For var, std, and count_finite, reducers walks each stack level as contiguous memory over the non-stacking axes and updates many output positions at once. In image-stack terms, it scans each stack level across its output positions instead of repeatedly gathering one output position from every stack level into a tiny temporary list. For finite-only medians, which do need values gathered along the stacking axis, it skips non-finite values while reading, so the scratch buffer holds only useful values.
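The access patterns above can be sketched with NumPy, alongside a partition-based median for the first point. These are illustrations under stated assumptions: Welford's update stands in for whatever accumulation reducers actually uses, the partition helper handles only odd stack depths, and all function names are hypothetical.

```python
import math
import statistics

import numpy as np


def median_by_partition(stack):
    """Axis-0 median via np.partition (odd stack depth only, for brevity)."""
    n = stack.shape[0]
    if n % 2 == 0:
        raise ValueError("this sketch handles odd stack depths only")
    mid = n // 2
    return np.partition(stack, mid, axis=0)[mid]


def streaming_var(stack, ddof=1):
    """Per-position variance, visiting one stack level at a time.

    Each step reads a whole level stack[k] (contiguous in C order) and updates
    every output position at once with Welford's update (an assumption, not
    necessarily reducers' formula).
    """
    mean = np.zeros(stack.shape[1:])
    m2 = np.zeros(stack.shape[1:])
    for k, level in enumerate(stack, start=1):
        delta = level - mean
        mean += delta / k
        m2 += delta * (level - mean)
    return m2 / (stack.shape[0] - ddof)


def finite_median_at(stack, index):
    """Median of one output position, skipping non-finite values while gathering."""
    column = stack[(slice(None),) + index]
    values = [float(v) for v in column if math.isfinite(v)]
    return statistics.median(values) if values else math.nan


rng = np.random.default_rng(0)
stack = rng.standard_normal((11, 64, 64))
print(np.allclose(median_by_partition(stack), np.median(stack, axis=0)))  # True
print(np.allclose(streaming_var(stack), np.var(stack, axis=0, ddof=1)))   # True

stack[3, 5, 7] = np.nan
stack[4, 5, 7] = np.inf
print(finite_median_at(stack, (5, 7)))  # median of the 9 finite values
```

The variance helper never builds a per-position list; the median helper has to, which is why skipping non-finite values during the gather keeps its scratch list small.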

Practical rule: for tiny, shallow stacks and cheap scans, NumPy/Bottleneck may be faster. For larger outputs or expensive reductions (median, percentile, var/std), reducers is usually the better choice.
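The rule can be encoded as a small hypothetical helper. It is a sketch only: `prefer_reducers` is not part of reducers, the set of expensive operations follows the rule and the tables above, and the `small_output` and `shallow_depth` cutoffs are deliberately left as parameters to calibrate locally with the benchmark script rather than values taken from these results.

```python
import math

# Reductions that the rule and the tables above flag as expensive.
EXPENSIVE = {"median", "percentile", "quantile", "var", "std",
             "nanmedian", "nanvar", "nanstd"}


def prefer_reducers(op, shape, small_output, shallow_depth):
    """Return True if reducers is the suggested path for an axis-0 reduction.

    Hypothetical helper: `small_output` and `shallow_depth` are user-chosen
    cutoffs, not thresholds published by the benchmark.
    """
    if op in EXPENSIVE:
        return True
    n_outputs = math.prod(shape[1:])
    tiny_and_shallow = shape[0] <= shallow_depth and n_outputs < small_output
    # Cheap scans on tiny, shallow stacks are where NumPy/Bottleneck may win.
    return not tiny_and_shallow
```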
