reducers

The name combines reduction functions with Rust (rs); the short name is rd.

Rust-backed reduction functions for NumPy arrays, in plain (NumPy-like) and NaN-aware variants. The functions I implemented are those listed in the numba documentation.

The goals are to be

  1. much faster than numpy in many use cases,
  2. much faster than bottleneck in many use cases, and
  3. as fast as possible for median and variance calculations, which are often bottlenecks in data processing pipelines.

reducers is ≳2x slower than bottleneck on a 1-D array with n=30, where Rust overhead dominates, but it becomes ≳10x faster than bottleneck when combining larger n-D arrays.

After Installation

Run the autotuner once on the machine where reducers will run:

python -m reducers.autotuner

It saves parallel-grain settings for that CPU and workload profile, and every later import of reducers applies them automatically. The built-in defaults remain valid without tuning; use python -m reducers.autotuner --reset to remove the saved tuning file and return to them.

First-Look API

import numpy as np
import reducers as rd

a = np.array([1.0, 2.0, np.nan, np.inf])  # example input

# Plain: mean, min, max, ...
rd.mean(a)                     # include all values; NaN/inf propagate (np.mean)

# NaN-aware:
rd.nanmean(a)                  # skip NaN, keep inf (== np.nanmean behavior)
rd.nanmean(a, ignore_inf=True) # also drop +/-inf (finite-only)

The same plain vs. nan* pattern applies to sum, min, max, minmax, median, std, var, percentile, quantile, and weighted average, plus the extras lmedian (lower value-selecting median) and count_finite.
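
To make the two extras concrete, here is a plain-Python sketch that uses only the standard library. It assumes that lmedian follows the usual "low median" definition (for an even number of values, return the lower of the two middle values rather than their average) and that count_finite counts values that are neither NaN nor +/-inf. The helper names lower_median_ref and count_finite_ref are illustrative and are not part of the reducers API.

```python
import math
import statistics


def lower_median_ref(values):
    """Low median: always returns an element of the input, never an average."""
    return statistics.median_low(values)


def count_finite_ref(values):
    """Count entries that are neither NaN nor +/-inf."""
    return sum(1 for v in values if math.isfinite(v))


data = [7, 1, 5, 3]
averaged = statistics.median(data)   # averages the two middle values (3 and 5)
selected = lower_median_ref(data)    # selects the lower middle value
finite = count_finite_ref([1.0, math.nan, math.inf, 2.0])
print(averaged, selected, finite)
```

Because the low median selects an existing element, an integer input yields an integer result, which is consistent with the note that lmedian preserves dtype.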

Axis reductions cover the two layouts optimized by the Rust kernels:

rd.nanmedian(stack, axis=0)      # reduce a stack shaped (N, H, W)
rd.nanmean(values, axis=-1)      # reduce contiguous trailing-axis slices
rd.nanpercentile(stack, [16, 50, 84], axis=0)

Maximum-performance Python API

The high-level API above is the default interface. For fixed hot loops where the caller already controls layout, import the low-level Python API as rdl:

import numpy as np
import reducers.lowlevel as rdl

rdl calls the same Rust kernels while skipping the high-level Python normalization layer. With the default copy=False, arrays are passed directly to the extension; they must already have the dimensionality, C-contiguity, and supported dtype expected by the called kernel. Use copy=True only when you explicitly want np.ascontiguousarray(...) at that call site.

buf = np.ascontiguousarray(a, dtype=np.float64)

rdl.mean_valid(buf)                  # trusted values; no NaN/inf filtering
rdl.mean_skip_nonfinite(buf)         # skip NaN and +/-inf
rdl.var_mean_valid(buf, ddof=1)      # paired result from one Rust reducer

Weighted 1-D loops can call the fused weighted kernels without high-level return formatting. Choose the narrowest primitive that returns the terms you need:

weighted_sum = rdl.weighted_sum_only_skip_nonfinite(buf, w)
weighted_sum, sum_weights = rdl.weighted_sum_and_weights_skip_nonfinite(buf, w)
weighted_sum, sum_weights, unweighted_sum = rdl.weighted_sum_skip_nonfinite(buf, w)
average = rdl.weighted_average_skip_nonfinite(buf, w)

The skip policy applies to values in buf; weights attached to retained values are used as-is. The bare weighted kernels expect contiguous same-length 1-D buffers; make any required copies before the call.
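
The skip policy described above can be sketched in plain Python. This is a reference for the semantics only, under the assumption that "skip non-finite" means dropping a (value, weight) pair whenever the value is NaN or +/-inf; it says nothing about how the Rust kernels are written. The function name weighted_scan_ref is hypothetical.

```python
import math


def weighted_scan_ref(values, weights):
    """One pass over (value, weight) pairs, skipping non-finite values."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    weighted_sum = 0.0
    sum_weights = 0.0
    unweighted_sum = 0.0
    for v, w in zip(values, weights):
        if not math.isfinite(v):
            continue  # the skip decision looks at the value only
        weighted_sum += v * w
        sum_weights += w  # the weight of a retained value is used as-is
        unweighted_sum += v
    return weighted_sum, sum_weights, unweighted_sum


values = [1.0, math.nan, 3.0, math.inf]
weights = [2.0, 5.0, 4.0, 1.0]
ws, sw, us = weighted_scan_ref(values, weights)
average = ws / sw  # only the weights of retained values enter the denominator
print(ws, sw, us, average)
```

All three accumulators come out of the same loop, which is why a caller that needs only one of them can pick a narrower primitive.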

For stack-style axis reductions, normalize to the 2-D layout the Rust axis kernel expects:

stack2 = np.ascontiguousarray(stack.reshape(stack.shape[0], -1))
median_image = rdl.reduce_axis0_valid(stack2, "median").reshape(stack.shape[1:])
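
The reshape round trip above can be illustrated without NumPy. The sketch below uses nested lists in place of arrays and statistics.median in place of the Rust kernel, so it shows only the layout idea: flatten each image to one row, reduce down the columns, then restore the image shape. The function name median_over_stack is hypothetical.

```python
import statistics


def median_over_stack(stack):
    """Median along axis 0 of a nested-list stack shaped (N, H, W)."""
    height, width = len(stack[0]), len(stack[0][0])
    # Flatten each image to one row: (N, H, W) -> (N, H*W).
    rows = [[v for line in image for v in line] for image in stack]
    # Reduce down the columns of the 2-D layout (axis 0).
    flat = [statistics.median(column) for column in zip(*rows)]
    # Restore the (H, W) image shape.
    return [flat[r * width:(r + 1) * width] for r in range(height)]


stack = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 0.0], [3.0, 8.0]],
    [[9.0, 1.0], [0.0, 6.0]],
]
median_image = median_over_stack(stack)
print(median_image)
```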

For reusable per-output scratch buffers, in-place order statistics avoid an extra copy but may reorder the buffer, so pass only data whose order you no longer need:

scratch = np.ascontiguousarray(values, dtype=np.float64)
median = rdl.median_valid_in_place(scratch)

Some high-level reducers can also return intermediate quantities that the same Rust scan has already computed:

std, mean = rd.nanstd(a, ddof=1, return_mean=True)

weighted_sum, sum_of_weights = rd.nansum(
    a, weights=w, return_sum_weights=True
)
weighted_sum, unweighted_sum = rd.nansum(
    a, weights=w, return_unweighted_sum=True
)
weighted_sum, unweighted_sum, sum_of_weights = rd.nansum(
    a, weights=w, return_unweighted_sum=True, return_sum_weights=True
)
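
To see why the mean comes for free with the standard deviation, consider a single-pass scan. The sketch below uses Welford's algorithm, one standard way to accumulate mean and variance together; this README does not state which algorithm the Rust kernels use, so treat it as an illustration only. The function name nanstd_with_mean_ref and its handling of too-few values (returning NaN) are assumptions.

```python
import math


def nanstd_with_mean_ref(values, ddof=0):
    """Return (std, mean) over the non-NaN values from a single pass."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for v in values:
        if math.isnan(v):
            continue  # NaN-aware: skip NaN, keep everything else
        count += 1
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)
    if count - ddof <= 0:
        # Assumption of this sketch: not enough values yields NaN.
        return math.nan, (mean if count else math.nan)
    return math.sqrt(m2 / (count - ddof)), mean


std, mean = nanstd_with_mean_ref([2.0, math.nan, 4.0, 4.0, 6.0], ddof=1)
print(std, mean)
```

The mean is a by-product of the loop that produces the variance, so returning it costs no extra pass over the data.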

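To use reducers when it is installed and fall back to NumPy otherwise, import it conditionally:

try:
    import reducers as rd
    mean = rd.mean
except ImportError:
    import numpy as np
    mean = np.mean

This swaps numpy/bottleneck reductions for reducers wherever reducers provides the function. It is not a literal drop-in, so check the notes under API shape for unsupported arguments.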
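
When more than one fallback is possible, the same pattern can be written as a small helper. This is a sketch using only importlib from the standard library; the helper name first_importable is hypothetical, and the module names in the comment are the ones this README mentions.

```python
import importlib


def first_importable(module_names):
    """Return the first module in module_names that imports successfully."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {list(module_names)} could be imported")


# Typical use: prefer reducers, then bottleneck, then NumPy.
#   backend = first_importable(["reducers", "bottleneck", "numpy"])
#   nanmean = backend.nanmean
#   nanmedian = backend.nanmedian
```

Only call signatures shared by every backend are safe behind such an alias; reducers-specific keywords such as ignore_inf would fail on the fallbacks.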

Semantics

The nan* functions take one additional parameter, ignore_inf:

function                      NaN                      +/-inf
mean / median / … (plain)     propagate                propagate (IEEE)
nanmean / nanmedian / …       skip (np.nan* parity)    keep
nan*(..., ignore_inf=True)    skip                     skip (finite-only)
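
The three rows can be reproduced with a plain-Python reference for the mean. This is a sketch of the semantics, not of the reducers implementation; the function name mean_with_policy and its two flags are hypothetical, and returning NaN when nothing is retained is an assumption of the sketch.

```python
import math


def mean_with_policy(values, skip_nan=False, skip_inf=False):
    """Mean of values after applying the requested skip policy."""
    kept = [
        v for v in values
        if not (skip_nan and math.isnan(v))
        and not (skip_inf and math.isinf(v))
    ]
    if not kept:
        return math.nan  # assumption: nothing retained yields NaN
    return sum(kept) / len(kept)


data = [1.0, 3.0, math.nan, math.inf]
plain = mean_with_policy(data)                                      # row 1
nan_aware = mean_with_policy(data, skip_nan=True)                   # row 2
finite_only = mean_with_policy(data, skip_nan=True, skip_inf=True)  # row 3
print(plain, nan_aware, finite_only)
```

With this input the plain mean is NaN, the NaN-aware mean is +inf because the infinity is kept, and the finite-only mean is the average of 1.0 and 3.0.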

API shape (numpy-like subset)

mean(a, axis=None, *, validate=True)
nanmean(a, axis=None, *, ignore_inf=False, validate=True)
average(a, weights=None, axis=None, *, validate=True)
nanaverage(a, weights=None, axis=None, *, ignore_inf=False, validate=True)
sum(a, axis=None, *, weights=None, return_sum_weights=False,
    return_unweighted_sum=False, validate=True)
nansum(a, axis=None, *, weights=None, return_sum_weights=False,
       return_unweighted_sum=False, ignore_inf=False, validate=True)
var(a, axis=None, ddof=0, *, return_mean=False, validate=True)
std(a, axis=None, ddof=0, *, return_mean=False, validate=True)
minmax(a, axis=None, *, validate=True)
nanminmax(a, axis=None, *, ignore_inf=False, validate=True)
percentile(a, q, axis=None, *, validate=True)   # q in [0, 100], linear interp
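
For reference, "linear interp" is taken here to mean the same rule as NumPy's default percentile method: locate the fractional rank q/100 * (n - 1) in the sorted data and interpolate between its two neighbours. The plain-Python sketch below shows that rule for finite inputs; the function name percentile_linear_ref is hypothetical, and raising on empty input is a choice of the sketch.

```python
import math


def percentile_linear_ref(values, q):
    """Percentile with linear interpolation between order statistics."""
    if not 0 <= q <= 100:
        raise ValueError("q must be in [0, 100]")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("empty input")  # choice made for this sketch
    rank = (q / 100) * (len(ordered) - 1)  # fractional index into ordered
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


p25 = percentile_linear_ref([10, 20, 30], 25)
p50 = percentile_linear_ref([4, 1, 3, 2], 50)
print(p25, p50)
```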

Important notes:

  • axis may be None (default, whole-array), 0, or -1 (equivalent to a.ndim - 1); other axes raise NotImplementedError. This keeps hidden transpose/copy costs out of the API and lets the Rust kernels specialize for the supported layouts.
  • validate=False skips input prep for trusted hot loops where the caller already has a contiguous supported kernel dtype (float32, float64, bool, or a NumPy integer dtype). Integer and bool arrays are reduced directly without an up-front float copy; complex and object arrays are unsupported.
  • Integer and bool min, nanmin, max, nanmax, and lmedian preserve dtype (not converted to float). mean, sum, var/std, median, and percentiles still return floating results.
  • minmax is the fused plain endpoint reducer for axis=None; nanminmax is the fused NaN-skipping endpoint reducer for axis=None. Axis calls currently return two separate axis reductions.
  • For [nan]var and [nan]std, return_mean=True returns the already-computed mean alongside the variance or standard deviation.
  • For weighted [nan]sum, return_sum_weights=True and return_unweighted_sum=True expose quantities already available during the fused weighted scan; they require weights=....
  • Not a literal drop-in: no out, keepdims, where, dtype, or percentile method (linear only).
  • Weighted averages support weights=None, weights with the same shape as a, and 1-D weights along supported axes. A zero sum of retained weights raises ZeroDivisionError.
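
The axis rule in the first note can be sketched as a small validation helper. The function name normalize_axis is hypothetical, and accepting an explicit a.ndim - 1 in addition to -1 is an assumption drawn from the "identical to a.ndim - 1" wording; the library may behave differently.

```python
def normalize_axis(axis, ndim):
    """Map a requested axis to None, 0, or ndim - 1; reject anything else."""
    if axis is None:
        return None  # whole-array reduction
    if axis == 0:
        return 0  # leading axis, e.g. a stack shaped (N, H, W)
    if axis in (-1, ndim - 1):
        return ndim - 1  # contiguous trailing axis
    raise NotImplementedError(
        f"axis={axis} is not supported for ndim={ndim}; use None, 0, or -1"
    )


print(normalize_axis(None, 3), normalize_axis(0, 3), normalize_axis(-1, 3))
```

A caller that needs a middle axis would have to move it to the front or back explicitly, which keeps the transpose or copy cost visible at the call site.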

See Performance for the kernel techniques and benchmarks.

Rust Crate Use

reducers is also a Rust crate. The kernel modules do not depend on PyO3 or NumPy unless the Python extension feature is enabled.

Add the crate to Cargo.toml:

[dependencies]
reducers = "<version>"

Then call the kernels from Rust:

use reducers::{reducers_1d, ScanPolicy};

let values = [1.0_f64, 2.0, f64::NAN, 4.0];

assert!(reducers_1d::mean(&values, ScanPolicy::AllValues).is_nan());
assert_eq!(reducers_1d::mean(&values, ScanPolicy::SkipNan), 7.0 / 3.0);

The main Rust entry points are:

module               use case
reducers_1d          Whole-slice reducers such as mean, nanmean-style scans, order statistics, weighted averages, and integer/bool kernels.
axis                 Normalized 2-D axis kernels used by the Python layer; useful when the caller already controls layout.
finite::ScanPolicy   Shared scan policy: plain all-values, trusted finite, skip-NaN, or finite-only.
parallel             Runtime thread and grain controls.

Build local Rust API docs with:

cargo doc --no-default-features --open

The published Rust API reference is also available on docs.rs.