Stack Axis Benchmarks

Representative stack reductions.

Run locally with:

uv run --extra bench python benchmarks/benchmark_axis.py
uv run --extra bench python benchmarks/benchmark_axis.py --ops average

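np, bn, and rd are trimmed median elapsed times in milliseconds for NumPy, Bottleneck, and reducers; a dash marks a function that was not measured for that library. Ratios np/rd and bn/rd above 1 mean reducers is faster; ratios above 0.95 are bolded.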
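As a rough illustration of how a timing and a ratio of this kind can be produced, the sketch below times a callable with `timeit.repeat`, takes the median per-call time in milliseconds, and divides a baseline time by a reducers time. The helper names `median_ms` and `speedup`, the repeat counts, and the choice of `axis=0` are assumptions made for this example; the trimming rule used by benchmarks/benchmark_axis.py is not described here, so a plain median stands in for it.

```python
import statistics
import timeit

import numpy as np


def median_ms(fn, repeats=15, number=5):
    """Median per-call time of fn() in milliseconds (hypothetical helper)."""
    # timeit.repeat returns one total (in seconds) per repeat of `number` calls.
    totals = timeit.repeat(fn, repeat=repeats, number=number)
    per_call_ms = [total / number * 1e3 for total in totals]
    return statistics.median(per_call_ms)


def speedup(baseline_ms, reducers_ms):
    """Ratio in the style of the np/rd and bn/rd columns; > 1 means rd is faster."""
    return baseline_ms / reducers_ms


# Time NumPy's median on a cube of the size used below (axis=0 is an assumption).
cube = np.random.default_rng(0).random((100, 100, 100))
np_ms = median_ms(lambda: np.median(cube, axis=0))
print(f"np.median: {np_ms:.2f} ms")

# Ratio from the rounded `median` timings of the first table (9.35 ms vs 0.59 ms).
print(f"{speedup(9.35, 0.59):.2f}x")
```

The ratio computed from the rounded timings (about 15.85x) is slightly below the 15.96x in the table, presumably because the table ratios were computed from unrounded measurements.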

Plain Results

General cube (100, 100, 100)

At this size, reducers wins most cases.

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
mean            0.18        -     0.10    1.81x        -
average         0.32        -     0.44    0.72x        -
median          9.35     8.22     0.59   15.96x   14.02x
var             0.65        -     0.17    3.77x        -
std             0.70        -     0.37    1.89x        -
min             0.18        -     0.12    1.48x        -
max             0.18        -     0.12    1.45x        -
minmax          0.35        -     0.42    0.84x        -
sum             0.17        -     0.11    1.61x        -
percentile     15.90        -     1.68    9.48x        -
quantile       15.84        -     1.53   10.38x        -

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
mean            0.18        -     0.07    2.60x        -
average         0.47        -     0.14    3.47x        -
median          9.03     7.92     0.46   19.44x   17.06x
var             0.72        -     0.12    5.84x        -
std             0.71        -     0.13    5.54x        -
min             0.13        -     0.10    1.26x        -
max             0.12        -     0.10    1.20x        -
minmax          0.28        -     0.21    1.34x        -
sum             0.18        -     0.07    2.63x        -
percentile     14.80        -     1.42   10.40x        -
quantile       15.11        -     1.44   10.50x        -

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
mean            0.07        -     0.09    0.81x        -
average         0.13        -     0.37    0.36x        -
median          9.54     8.44     0.57   16.65x   14.75x
var             0.27        -     0.14    1.90x        -
std             0.28        -     0.12    2.32x        -
min             0.07        -     0.08    0.91x        -
max             0.07        -     0.10    0.67x        -
minmax          0.13        -     0.18    0.72x        -
sum             0.07        -     0.09    0.83x        -
percentile     15.79        -     1.77    8.94x        -
quantile       16.59        -     1.74    9.54x        -

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
mean            0.17        -     0.12    1.50x        -
average         0.31        -     0.20    1.59x        -
median          9.19     8.45     0.65   14.22x   13.07x
var             0.52        -     0.18    2.84x        -
std             0.53        -     0.21    2.52x        -
min             0.13        -     0.11    1.16x        -
max             0.10        -     0.11    0.91x        -
minmax          0.20        -     0.23    0.87x        -
sum             0.17        -     0.11    1.56x        -
percentile     15.21        -     2.23    6.81x        -
quantile       15.29        -     1.80    8.47x        -

Short axis=0 stacks

When shape[0] is small and axis=0, reducers may lose to NumPy/Bottleneck on cheap reductions (such as min, max, mean, and sum).

A larger shape[0] or a larger output size (prod(shape[1:])) tips the result toward reducers.
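The two quantities in that rule can be read directly off an array shape. The sketch below computes them for three shapes this document names; the helper name `stack_profile` is hypothetical and exists only for this example.

```python
import math


def stack_profile(shape):
    """Return (stack depth, number of output elements) for an axis=0 reduction."""
    depth = shape[0]                 # values reduced per output element
    outputs = math.prod(shape[1:])   # output elements produced by the reduction
    return depth, outputs


for shape in [(11, 100, 100), (11, 512, 512), (100, 100, 100)]:
    depth, outputs = stack_profile(shape)
    print(f"{shape}: depth={depth}, outputs={outputs}")

# How many more outputs the larger shallow stack has than the smaller one.
small = stack_profile((11, 100, 100))[1]
large = stack_profile((11, 512, 512))[1]
print(f"output ratio: {large / small:.1f}x")
```

Both shallow stacks have the same depth of 11, so any difference between them comes from the output size alone: 262144 versus 10000 output elements.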

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
mean            0.02        -     0.02    1.13x        -
average         0.04        -     0.06    0.76x        -
median          0.89     0.68     0.25    3.52x    2.69x
var             0.08        -     0.05    1.64x        -
std             0.07        -     0.05    1.57x        -
min             0.02        -     0.03    0.73x        -
max             0.02        -     0.03    0.76x        -
minmax          0.04        -     0.05    0.78x        -
sum             0.02        -     0.03    0.84x        -
percentile      1.27        -     0.76    1.68x        -
quantile        1.29        -     0.72    1.80x        -

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
mean            0.06        -     0.05    1.13x        -
average         0.11        -     0.14    0.77x        -
median          3.19     2.65     0.30   10.82x    8.98x
var             0.20        -     0.05    3.83x        -
std             0.21        -     0.05    3.94x        -
min             0.06        -     0.07    0.79x        -
max             0.06        -     0.06    0.89x        -
minmax          0.11        -     0.13    0.88x        -
sum             0.06        -     0.06    0.93x        -
percentile      5.39        -     0.81    6.67x        -
quantile        5.15        -     0.80    6.48x        -

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
mean            0.53        -     0.24    2.20x        -
average         0.92        -     1.03    0.90x        -
median         30.48    24.01     4.80    6.35x    5.01x
var             2.01        -     0.49    4.07x        -
std             2.04        -     0.47    4.30x        -
min             0.46        -     0.30    1.56x        -
max             0.49        -     0.31    1.57x        -
minmax          0.93        -     0.52    1.78x        -
sum             0.51        -     0.26    1.98x        -
percentile     37.15        -    16.88    2.20x        -
quantile       38.25        -    16.09    2.38x        -

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
mean            0.01        -     0.02    0.83x        -
average         0.03        -     0.06    0.45x        -
median          1.04     0.83     0.22    4.68x    3.74x
var             0.04        -     0.04    1.07x        -
std             0.05        -     0.04    1.12x        -
min             0.01        -     0.01    0.73x        -
max             0.01        -     0.01    0.69x        -
minmax          0.02        -     0.03    0.69x        -
sum             0.01        -     0.02    0.60x        -
percentile      1.51        -     0.59    2.57x        -
quantile        1.37        -     0.60    2.27x        -

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
mean            0.03        -     0.04    0.73x        -
average         0.05        -     0.13    0.40x        -
median          3.41     2.90     0.33   10.17x    8.67x
var             0.10        -     0.06    1.58x        -
std             0.10        -     0.05    1.88x        -
min             0.02        -     0.03    0.73x        -
max             0.03        -     0.03    0.82x        -
minmax          0.05        -     0.06    0.85x        -
sum             0.02        -     0.04    0.61x        -
percentile      5.63        -     0.71    7.89x        -
quantile        5.49        -     0.66    8.26x        -

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
mean            0.38        -     0.19    1.96x        -
average         0.46        -     1.04    0.44x        -
median         31.85    25.16     3.24    9.84x    7.77x
var             1.20        -     0.37    3.27x        -
std             1.23        -     0.40    3.09x        -
min             0.24        -     0.19    1.29x        -
max             0.25        -     0.19    1.28x        -
minmax          0.49        -     0.36    1.36x        -
sum             0.25        -     0.20    1.27x        -
percentile     39.39        -    12.00    3.28x        -
quantile       38.40        -    11.47    3.35x        -

NaN-aware Results

Same shapes with about 1% NaNs.
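For readers who want comparable inputs, the sketch below builds an array in which roughly 1% of the entries are NaN. The helper name `with_nans`, the uniform random data, and the independent per-element masking are assumptions for illustration; the benchmark script may place its NaNs differently.

```python
import numpy as np


def with_nans(shape, nan_fraction=0.01, seed=0):
    """Random float64 array with about `nan_fraction` of entries set to NaN."""
    rng = np.random.default_rng(seed)
    data = rng.random(shape)
    # Each element is replaced independently, so the realized fraction is
    # close to, but not exactly, nan_fraction.
    mask = rng.random(shape) < nan_fraction
    data[mask] = np.nan
    return data


cube = with_nans((100, 100, 100))
print(f"NaN fraction: {np.isnan(cube).mean():.4f}")

# A NaN-aware reduction ignores the missing values; the plain one propagates them.
print(np.isnan(np.nanmean(cube, axis=0)).any())
print(np.isnan(np.mean(cube, axis=0)).any())
```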

np.nanpercentile / np.nanquantile are omitted: locally they were extremely slow (often >100x slower than rd) and dominated benchmark runtime.

General cube (100, 100, 100)

At this size, reducers wins most cases.

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
nanmean         0.99     0.74     0.18    5.39x    4.02x
nanaverage      3.89        -     0.47    8.24x        -
nanmedian      30.50     8.36     0.47   65.20x   17.87x
nanvar          2.23     1.93     0.20   10.98x    9.49x
nanstd          2.25     1.92     0.24    9.40x    8.02x
nanmin          0.18     0.71     0.10    1.78x    7.14x
nanmax          0.17     0.72     0.11    1.63x    6.76x
nanminmax       0.34     1.45     0.22    1.58x    6.68x
nansum          0.53     0.72     0.17    3.02x    4.14x

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
nanmean         0.75     0.72     0.15    5.05x    4.86x
nanaverage      4.83        -     0.20   24.72x        -
nanmedian      27.40     8.37     0.54   50.57x   15.45x
nanvar          2.06     1.80     0.17   12.17x   10.59x
nanstd          2.05     1.90     0.18   11.58x   10.71x
nanmin          0.13     0.72     0.15    0.86x    4.94x
nanmax          0.13     0.70     0.10    1.22x    6.84x
nanminmax       0.25     1.41     0.19    1.33x    7.41x
nansum          0.52     0.71     0.17    3.12x    4.28x

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
nanmean         0.75     0.74     0.20    3.80x    3.73x
nanaverage      3.26        -     0.49    6.68x        -
nanmedian      28.53     8.30     0.51   55.97x   16.28x
nanvar          1.69     1.62     0.23    7.31x    6.99x
nanstd          1.68     1.88     0.21    7.96x    8.90x
nanmin          0.07     0.68     0.09    0.76x    7.72x
nanmax          0.07     0.67     0.08    0.82x    8.22x
nanminmax       0.14     1.41     0.17    0.81x    8.27x
nansum          0.31     0.73     0.21    1.52x    3.54x

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
nanmean         0.65     0.72     0.16    4.08x    4.48x
nanaverage      3.57        -     0.24   15.16x        -
nanmedian      26.70     8.36     0.64   41.86x   13.10x
nanvar          1.75     1.81     0.21    8.40x    8.68x
nanstd          1.70     1.84     0.21    8.31x    8.98x
nanmin          0.11     0.70     0.10    1.10x    7.25x
nanmax          0.10     0.68     0.09    1.10x    7.22x
nanminmax       0.21     1.44     0.21    1.03x    6.96x
nansum          0.78     0.71     0.16    4.84x    4.42x

Short axis=0 stacks

When shape[0] is small and axis=0, reducers may lose to NumPy/Bottleneck on cheap reductions (such as min, max, mean, and sum).

A larger shape[0] or a larger output size (prod(shape[1:])) tips the result toward reducers.

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
nanmean         0.12     0.03     0.05    2.19x    0.64x
nanaverage      0.50        -     0.08    6.32x        -
nanmedian       1.90     0.75     0.26    7.25x    2.86x
nanvar          0.26     0.10     0.08    3.31x    1.29x
nanstd          0.27     0.10     0.05    5.36x    2.06x
nanmin          0.02     0.04     0.02    1.05x    2.07x
nanmax          0.02     0.04     0.02    1.04x    1.95x
nanminmax       0.04     0.09     0.04    1.06x    2.12x
nansum          0.06     0.04     0.05    1.08x    0.71x

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
nanmean         0.31     0.14     0.14    2.25x    1.00x
nanaverage      1.20        -     0.20    6.02x        -
nanmedian       7.54     2.85     0.64   11.73x    4.44x
nanvar          0.69     0.40     0.09    7.54x    4.42x
nanstd          0.67     0.41     0.08    8.54x    5.25x
nanmin          0.06     0.14     0.05    1.03x    2.48x
nanmax          0.06     0.14     0.06    1.02x    2.44x
nanminmax       0.11     0.27     0.11    1.03x    2.48x
nansum          0.17     0.14     0.14    1.20x    0.98x

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
nanmean         3.09     1.48     0.40    7.64x    3.66x
nanaverage     12.13        -     1.18   10.29x        -
nanmedian      58.18    26.32     5.77   10.09x    4.56x
nanvar          6.67     4.92     0.57   11.68x    8.63x
nanstd          6.68     4.71     0.54   12.32x    8.69x
nanmin          0.48     1.37     0.21    2.32x    6.61x
nanmax          0.49     1.41     0.23    2.12x    6.13x
nanminmax       0.99     2.81     0.49    2.03x    5.73x
nansum          1.63     1.30     0.39    4.16x    3.34x

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
nanmean         0.10     0.04     0.06    1.63x    0.69x
nanaverage      0.46        -     0.08    5.70x        -
nanmedian       1.85     0.75     0.25    7.42x    3.02x
nanvar          0.21     0.10     0.06    3.36x    1.51x
nanstd          0.21     0.10     0.05    4.31x    2.08x
nanmin          0.01     0.04     0.01    1.15x    4.77x
nanmax          0.01     0.04     0.01    1.14x    4.79x
nanminmax       0.02     0.09     0.02    1.16x    5.02x
nansum          0.04     0.04     0.06    0.65x    0.66x

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
nanmean         0.25     0.13     0.16    1.58x    0.85x
nanaverage      1.10        -     0.20    5.40x        -
nanmedian       6.87     2.92     0.29   24.04x   10.22x
nanvar          0.54     0.39     0.08    6.37x    4.62x
nanstd          0.55     0.40     0.09    5.88x    4.21x
nanmin          0.02     0.13     0.02    1.07x    5.95x
nanmax          0.02     0.13     0.02    1.03x    6.02x
nanminmax       0.05     0.27     0.04    1.07x    6.13x
nansum          0.10     0.13     0.16    0.65x    0.84x

function     np (ms)  bn (ms)  rd (ms)    np/rd    bn/rd
nanmean         2.53     1.49     0.41    6.21x    3.66x
nanaverage     10.74        -     1.16    9.27x        -
nanmedian      53.31    26.11     3.37   15.81x    7.74x
nanvar          5.51     4.99     0.55    9.96x    9.02x
nanstd          5.40     4.98     0.59    9.21x    8.49x
nanmin          0.25     1.39     0.14    1.73x    9.76x
nanmax          0.25     1.38     0.16    1.57x    8.76x
nanminmax       0.51     2.75     0.29    1.74x    9.43x
nansum          1.01     1.33     0.41    2.45x    3.23x

Reading The Axis-0 Rows

Three factors explain most axis=0 rows:

  • Stack depth (arr.shape[0]) drives the order-statistic win. NumPy’s median uses partition, but still pays broader axis/output overhead. The gap grows with stack depth and is largest for NaN-aware quantiles.
  • Output size (prod(arr.shape[1:])) matters. reducers parallelizes across output elements. NumPy’s cheap strided loops can win on small outputs, but (11, 512, 512) has ~26x more outputs than (11, 100, 100), and reducers wins most cases there.
  • Some reducers stream along the non-stacking axes. For var, std, and count_finite, reducers walks each stack level as contiguous memory over the non-stacking axes and updates many output positions at once. For finite-only medians, it skips non-finite values while reading along the stacking axis, so the scratch buffer contains only useful values. In image-stack terms: it scans each stack level over its output positions, rather than repeatedly gathering one output position from every stack level into a tiny temporary list.
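The streaming access pattern in the last point can be sketched in NumPy. The example below computes a variance over axis 0 by visiting one stack level at a time and updating every output position in that pass, using Welford's online update. It illustrates the access pattern only and is not taken from the reducers source: the function name `streaming_var_axis0` is hypothetical, Welford's method is a stand-in choice, and NaN handling and parallelism are left out.

```python
import numpy as np


def streaming_var_axis0(stack):
    """Population variance over axis 0, one stack level at a time.

    Assumes stack.shape[0] >= 1 and no NaNs (hypothetical helper).
    """
    out_shape = stack.shape[1:]
    mean = np.zeros(out_shape, dtype=np.float64)
    m2 = np.zeros(out_shape, dtype=np.float64)  # running sum of squared deviations

    # Each `level` is one slice stack[i], contiguous for a C-ordered array.
    # One pass over it updates all output positions at once.
    for count, level in enumerate(stack, start=1):
        delta = level - mean
        mean += delta / count
        m2 += delta * (level - mean)

    return m2 / stack.shape[0]


rng = np.random.default_rng(0)
stack = rng.random((11, 64, 64))
result = streaming_var_axis0(stack)

# Compare against NumPy's reference result.
print(np.allclose(result, np.var(stack, axis=0)))
```

The alternative the text contrasts this with would loop over the 64 x 64 output positions and gather the 11 values for each position into a temporary buffer, which touches memory with a large stride on every read.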

Practical rule: for tiny shallow stacks and cheap scans, NumPy/Bottleneck may be better. For larger outputs or expensive reducers (median, percentile, var/std), reducers is usually the better path.