2-D Image Benchmarks

This page records representative 2-D image-stack combine/rejection benchmarks. Timings are machine-specific; see How To Get Maximum Performance for tuning guidance, optimized-workspace notes, and benchmark interpretation.

1. IRAF Benchmark: compact ndcombine() wrapper

TL;DR

In the local 512x512 output-only parity run below (N=7), imc is about 1.2-2.4x faster than IRAF IMCOMBINE across the detailed operation/dtype rows. For 1k x 1k frames, the speedup grows to ≳4x.

In extensive testing, imc outputs match IRAF’s IMCOMBINE for the supported compatibility cases.

FITS I/O times are included. For IRAF setup, ecl.e discovery, parity cases, and details, see benchmarks/IRAF/README.md.

The IRAF comparison is a task-level benchmark for supported output-only compatibility cases. It validates imc output-only results against IRAF, then times FITS input reads, stacking, ndcombine(..., diagnostics=None), and combined FITS output writes. It does not time diagnostics, spatial grow, or a pure Rust kernel microbenchmark. The same fused path is available from Standard Combiner as cmb.combine(method, rejectors=..., diagnostics=None) after cmb = Combiner(arr). The compact ndcombine() wrapper is used here because this benchmark exercises the IRAF-style compatibility/CLI-facing path. See How To Get Maximum Performance for what the fast path skips and why it is faster than diagnostic routes.

uv run --extra bench python benchmarks/IRAF/scripts/benchmark_iraf.py --width 512 --height 512
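The difference between a task-level timing and a kernel microbenchmark can be sketched with the standard library alone. This is not the code in benchmark_iraf.py: `time_task_ms`, `make_task`, and the stand-in stages are hypothetical names, in-memory lists stand in for FITS files, and a per-pixel median stands in for the real combine call.

```python
import time
from statistics import median


def time_task_ms(task, repeats=5):
    """Run task() several times and return the median wall time in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        samples.append((time.perf_counter() - start) * 1e3)
    return median(samples)


def make_task(read_inputs, combine, write_output):
    """Bundle read -> combine -> write so I/O sits inside the timed region."""
    def task():
        write_output(combine(read_inputs()))
    return task


# Stand-in stages (assumptions): in-memory "frames" instead of FITS files.
frames = [[1, 2, 3], [2, 3, 4], [9, 9, 9]]
written = []


def read_inputs():
    return [list(frame) for frame in frames]


def combine(stack):
    # Per-pixel median across frames.
    return [median(column) for column in zip(*stack)]


# Task-level timing includes the read and write stages.
task_ms = time_task_ms(make_task(read_inputs, combine, written.append))
# A kernel-style timing would cover the combine step only.
kernel_ms = time_task_ms(lambda: combine(frames))
```

The numbers reported on this page correspond to the first kind of measurement, so they should not be read as kernel throughput.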

On the local 512x512 parity matrix after the fused-kernel work, one-case throughput by operation/dtype was:

op	dtype	N	imc ms	IRAF ms	IRAF/imc (speedup)
sigclip_mean	uint16	7	8.247	14.136	1.7x
ccdclip_mean	uint16	7	7.767	12.177	1.6x
pclip_mean	uint16	7	7.891	15.759	2.0x
minmax_mean	uint16	7	7.834	11.066	1.4x
sigclip_median	uint16	7	7.810	11.567	1.5x
ccdclip_median	uint16	7	7.348	12.325	1.7x
pclip_median	uint16	7	7.709	11.588	1.5x
minmax_median	uint16	7	7.750	10.320	1.3x
sigclip_mean	int16	7	5.805	14.045	2.4x
ccdclip_mean	int16	7	6.767	11.238	1.7x
pclip_mean	int16	7	5.727	12.106	2.1x
minmax_mean	int16	7	6.055	7.539	1.2x
sigclip_median	int16	7	6.105	11.115	1.8x
ccdclip_median	int16	7	6.429	10.931	1.7x
pclip_median	int16	7	5.803	12.272	2.1x
minmax_median	int16	7	6.136	9.430	1.5x
sigclip_mean	float32	7	7.942	15.148	1.9x
ccdclip_mean	float32	7	8.595	14.650	1.7x
pclip_mean	float32	7	8.012	14.110	1.8x
minmax_mean	float32	7	8.084	9.635	1.2x
sigclip_median	float32	7	9.350	14.458	1.5x
ccdclip_median	float32	7	8.183	14.416	1.8x
pclip_median	float32	7	8.017	14.121	1.8x
minmax_median	float32	7	7.898	12.640	1.6x

Rows are medians across the baseline, threshold, and input-BPM IRAF parity variants for each operation/dtype pair. This result is specifically for output-only mean/median combinations with sigclip, ccdclip, pclip, and minmax on the measured machine.
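As a rough illustration of that aggregation, the sketch below takes the median per tool across three parity variants and then forms the IRAF/imc ratio. The per-variant timings are made up for illustration and are not measurements, `summarize` and `speedup` are hypothetical helpers, and whether the harness takes a ratio of medians or a median of ratios is an assumption here.

```python
from statistics import median

# Hypothetical per-variant timings in ms for one operation/dtype pair.
# These values are illustrative only; they are NOT measured results.
variant_ms = {
    "baseline": {"imc": 8.0, "iraf": 14.0},
    "threshold": {"imc": 8.5, "iraf": 13.0},
    "input_bpm": {"imc": 7.5, "iraf": 15.0},
}


def summarize(variants):
    """Return (imc median ms, IRAF median ms, IRAF/imc ratio)."""
    imc_ms = median(v["imc"] for v in variants.values())
    iraf_ms = median(v["iraf"] for v in variants.values())
    return imc_ms, iraf_ms, iraf_ms / imc_ms


def speedup(imc_ms, iraf_ms):
    """IRAF/imc ratio rounded to one decimal, as in the table above."""
    return round(iraf_ms / imc_ms, 1)


summary = summarize(variant_ms)

# Recomputing three published rows from their rounded timings:
published = {
    "sigclip_mean uint16": speedup(8.247, 14.136),
    "sigclip_mean int16": speedup(5.805, 14.045),
    "minmax_mean int16": speedup(6.055, 7.539),
}
```

The last two published rows are the extremes of the table, which is where the 1.2-2.4x range in the TL;DR comes from.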

benchmark_iraf.py prints a full Markdown report, including package versions and OS/kernel details, to stdout. Use --output /tmp/imc-iraf-benchmark.md for a local artifact; the source tree tracks only this curated summary. The detailed benchmark harness notes live in benchmarks/IRAF/README.md.

2. Benchmark Table

Benchmarks were run on a 14-inch MacBook Pro [2024, macOS 26.4.1, M4 Pro (8P+4E/G20c/N16c/48G)]. New runs of benchmarks/benchmark_combine.py print an environment table with Python, NumPy, bottleneck, and OS/kernel details before the timing tables.

TL;DR

imc (Standard Combiner, diagnostics=None) is usually faster than the Astropy/NumPy and ccdproc comparison paths. The strongest rows are the frequently used median and sigma-clipped median combinations, especially for float32 images.

imc is the safe public Standard Combiner route. imc_chain is the Chained Combiner route that keeps rejection diagnostics. imc_opt is the prepared direct path with validation disabled: simple generic mean/median rows use reducers, while rejection rows use fused imcombiners.kernels calls where available.

On the sigclip rows below, imc_opt is roughly 2.3-4.1x faster than Chained Combiner and about 1.0-1.8x faster than Standard Combiner, because it skips object construction, validation, and default workspace copies. When imc_opt only near-ties Standard Combiner on rejection rows, that is expected: both paths call the same fused Rust kernel, so only Python wrapper/preparation overhead remains.

Columns:

imc	Standard Combiner on the original input dtype, ms. For supported single-rejector mean/median sigclip rows with `diagnostics=None`, this dispatches through the fused kernel.
imc_chain	Chained Combiner path, ms. For rejection rows this retains diagnostics and pipeline state, so it intentionally does more work than fast output-only routes. For simple rows it is the same basic `cmb.combine(...)` shape without rejection diagnostics.
imc_opt	optimized direct path on a prepared contiguous float workspace with `validate=False`, ms. For simple mean/median rows this uses the optimized imcombiners stack dispatcher. For sigclip rows this calls the fused `imck.sigclip_combine(...)` kernel and passes `mask=` directly.
ap_bn	NumPy `nanmean`/`nanmedian` for simple rows; Astropy `sigma_clip` plus NumPy `nan*` reduction for sigclip rows, ms
ccdproc	`ccdproc.Combiner`, ms
imc/opt	imc ÷ imc_opt - values above 1.0 mean the optimized direct path is faster than Standard Combiner
chain/opt	imc_chain ÷ imc_opt - Chained Combiner cost relative to the optimized direct path
ap/opt	ap_bn ÷ imc_opt - Astropy/NumPy comparison path vs the optimized direct path
ccd/opt	ccdproc ÷ imc_opt - ccdproc vs the optimized direct path
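The four ratio columns are plain divisions by the imc_opt timing. A minimal sketch, using the published `mean float32 N=5` row of the No Mask simple table as input (`ratio_columns` is a hypothetical helper, not part of the benchmark script):

```python
def ratio_columns(imc, imc_chain, imc_opt, ap_bn, ccdproc):
    """Derive the ratio columns from the five timing columns (all in ms)."""
    return {
        "imc/opt": round(imc / imc_opt, 1),
        "chain/opt": round(imc_chain / imc_opt, 1),
        "ap/opt": round(ap_bn / imc_opt, 1),
        "ccd/opt": round(ccdproc / imc_opt, 1),
    }


# Timings copied from the "mean float32 N=5" row (No Mask, Simple).
row = ratio_columns(imc=0.39, imc_chain=0.42, imc_opt=0.37,
                    ap_bn=1.14, ccdproc=3.08)
```

Some rows in the tables do not reproduce exactly from the rounded timings shown, presumably because the script divides the unrounded measurements before rounding.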

The benchmark validates imc, imc_chain, and imc_opt against the Astropy/NumPy reference before timing. ccdproc differences are informational only and are marked † when they exceed the tolerance; for sigclip rows, the script intentionally leaves ccdproc.Combiner.sigma_clipping() at its public default iteration behavior while the reference and imcombiners paths use maxiters=5.
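To see why the iteration count alone can change a sigma-clipped result, here is a simplified single-pixel sketch using only the standard library. It is not the imc, Astropy, or ccdproc implementation: it assumes a median center and a population standard deviation (ddof=0), does not model `clip_cen`, and the pixel values are made up for illustration.

```python
import statistics


def sigclip_mean(values, sigma=3.0, maxiters=5):
    """Iteratively reject values beyond sigma * std from the median,
    then return the mean of the survivors."""
    kept = list(values)
    for _ in range(maxiters):
        center = statistics.median(kept)
        spread = statistics.pstdev(kept)  # population std, i.e. ddof=0
        low, high = center - sigma * spread, center + sigma * spread
        survivors = [v for v in kept if low <= v <= high]
        if len(survivors) == len(kept):
            break  # converged: nothing was rejected in this pass
        kept = survivors
    return statistics.fmean(kept)


# One pixel across 12 frames: ten good values, a mild and a strong outlier.
pixel_stack = [10] * 10 + [60, 1000]

one_pass = sigclip_mean(pixel_stack, maxiters=1)   # only 1000 is rejected
five_pass = sigclip_mean(pixel_stack, maxiters=5)  # 60 is rejected as well
```

In this example the first pass removes only the strong outlier, because it inflates the standard deviation enough to hide the mild one. A second pass is needed to reject the value of 60, so a single-pass run and a multi-pass run return different means.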

Below are results from this command, regenerated after the rejection-kernel scratch-reuse and variance-domain sigma-clipping updates:

uv run --extra bench python benchmarks/benchmark_combine.py

3. No Mask

Simple

op	dtype	N	imc	imc_chain	imc_opt	ap_bn	ccdproc	imc/opt	chain/opt	ap/opt	ccd/opt
mean	uint8	5	0.81	0.83	0.35	1.39	3.27	2.3	2.3	3.9	9.2
median	uint8	5	1.37	1.44	0.86	20.94	50.07	1.6	1.7	24.3	58.0
mean	uint8	31	3.50	3.37	1.14	7.68	28.65	3.1	3.0	6.7	25.2
median	uint8	31	6.40	6.54	4.10	212.58	497.49	1.6	1.6	51.8	121.2
mean	uint16	5	0.72	0.72	0.29	1.38	3.25	2.5	2.5	4.8	11.3
median	uint16	5	1.41	1.50	0.91	20.82	50.42	1.5	1.6	22.8	55.3
mean	uint16	31	3.52	3.34	1.15	7.69	28.92	3.0	2.9	6.7	25.0
median	uint16	31	6.50	6.45	4.38	217.63	509.91	1.5	1.5	49.6	116.3
mean	int16	5	0.67	0.70	0.33	1.43	3.33	2.0	2.1	4.3	10.0
median	int16	5	1.32	1.31	0.90	20.35	50.83	1.5	1.5	22.6	56.5
mean	int16	31	3.47	3.36	1.07	7.74	29.11	3.2	3.1	7.2	27.2
median	int16	31	6.61	7.11	4.89	217.11	515.67	1.4	1.5	44.4	105.4
mean	int32	5	0.79	0.77	0.40	1.42	3.34	2.0	1.9	3.5	8.3
median	int32	5	1.43	1.44	0.93	21.45	51.29	1.5	1.5	23.0	55.1
mean	int32	31	4.30	4.27	1.40	8.05	29.47	3.1	3.0	5.7	21.0
median	int32	31	7.56	7.52	4.83	215.86	510.40	1.6	1.6	44.7	105.7
mean	float32	5	0.39	0.42	0.37	1.14	3.08	1.1	1.1	3.1	8.3
median	float32	5	1.00	1.00	0.91	21.05	51.27	1.1	1.1	23.2	56.5
mean	float32	31	1.54	1.49	1.05	5.94	27.57	1.5	1.4	5.7	26.2
median	float32	31	5.18	5.30	4.26	212.87	515.36	1.2	1.2	49.9	120.9

Sigclip

sigma=(3, 3), maxiters=5, ddof=0, cenfunc="median", clip_cen="mean". ccdproc uses maxiters=1; ccdproc values marked † differ from the reference by more than the tolerance.

op	dtype	N	imc	imc_chain	imc_opt	ap_bn	ccdproc	imc/opt	chain/opt	ap/opt	ccd/opt
sigclip_mean	uint8	5	1.92	5.37	1.57	11.64	15.35	1.23	3.42	7.42	9.79
sigclip_median	uint8	5	2.17	6.06	1.69	30.90	62.57	1.28	3.58	18.23	36.93
sigclip_mean	uint8	31	9.28	21.15	6.87	108.48	137.58†	1.35	3.08	15.79	20.02
sigclip_median	uint8	31	11.18	23.68	9.69	314.21	611.53†	1.15	2.44	32.44	63.13
sigclip_mean	uint16	5	2.01	5.53	1.65	11.75	15.51	1.22	3.35	7.13	9.41
sigclip_median	uint16	5	2.10	5.92	1.76	30.98	63.11	1.19	3.35	17.56	35.78
sigclip_mean	uint16	31	9.20	20.22	7.03	110.74	142.30†	1.31	2.88	15.75	20.24
sigclip_median	uint16	31	11.34	24.12	9.56	323.22	629.36†	1.19	2.52	33.82	65.85
sigclip_mean	int16	5	1.92	5.38	1.61	11.69	15.63	1.19	3.34	7.25	9.68
sigclip_median	int16	5	2.11	5.96	1.85	30.98	63.09	1.14	3.21	16.71	34.02
sigclip_mean	int16	31	9.24	20.60	6.93	110.92	142.66†	1.33	2.97	15.99	20.57
sigclip_median	int16	31	11.75	23.84	9.15	320.43	628.88†	1.28	2.61	35.03	68.75
sigclip_mean	int32	5	2.16	6.23	1.63	12.06	16.00	1.32	3.81	7.38	9.79
sigclip_median	int32	5	2.18	6.53	1.82	31.51	64.32	1.19	3.58	17.27	35.25
sigclip_mean	int32	31	10.69	24.54	7.37	112.88	144.04†	1.45	3.33	15.32	19.55
sigclip_median	int32	31	13.43	27.97	10.09	322.34	634.16†	1.33	2.77	31.95	62.86
sigclip_mean	float32	5	1.61	5.07	1.60	11.65	15.80	1.00	3.17	7.28	9.87
sigclip_median	float32	5	1.87	5.81	1.78	31.76	63.60	1.05	3.26	17.81	35.66
sigclip_mean	float32	31	7.39	19.41	6.87	110.03	142.60†	1.08	2.82	16.01	20.75
sigclip_median	float32	31	9.54	22.54	9.08	322.20	630.67†	1.05	2.48	35.48	69.44

4. With Mask (~2% pixels masked per frame)

Simple

op	dtype	N	imc	imc_chain	imc_opt	ap_bn	ccdproc	imc/opt	chain/opt	ap/opt	ccd/opt
mean	uint8	5	1.19	1.26	0.68	2.16	3.42	1.8	1.9	3.2	5.1
median	uint8	5	1.94	1.83	1.35	22.93	54.37	1.4	1.4	17.0	40.4
mean	uint8	31	7.10	6.81	4.18	12.02	29.90	1.7	1.6	2.9	7.1
median	uint8	31	10.35	10.01	6.64	221.46	513.60	1.6	1.5	33.3	77.3
mean	uint16	5	1.22	1.19	0.69	2.09	3.50	1.8	1.7	3.0	5.1
median	uint16	5	1.90	1.90	1.38	22.64	53.66	1.4	1.4	16.4	38.9
mean	uint16	31	7.41	7.14	3.47	12.17	30.06	2.1	2.1	3.5	8.7
median	uint16	31	10.41	10.18	7.22	225.31	524.08	1.4	1.4	31.2	72.6
mean	int16	5	1.19	1.23	0.68	2.13	3.50	1.7	1.8	3.1	5.1
median	int16	5	1.85	1.96	1.31	22.84	53.64	1.4	1.5	17.4	40.9
mean	int16	31	7.73	6.89	4.04	12.07	29.76	1.9	1.7	3.0	7.4
median	int16	31	10.13	10.22	7.23	221.46	524.65	1.4	1.4	30.6	72.6
mean	int32	5	1.42	1.45	0.76	2.15	3.51	1.9	1.9	2.8	4.6
median	int32	5	2.11	2.06	1.43	22.90	55.28	1.5	1.4	16.0	38.7
mean	int32	31	8.52	8.49	4.84	12.24	30.05	1.8	1.8	2.5	6.2
median	int32	31	11.40	11.71	7.11	220.92	525.46	1.6	1.6	31.1	73.9
mean	float32	5	0.97	0.93	0.76	1.87	3.22	1.3	1.2	2.5	4.2
median	float32	5	1.68	1.64	1.36	22.74	53.66	1.2	1.2	16.7	39.4
mean	float32	31	4.80	5.64	4.18	10.79	28.70	1.1	1.3	2.6	6.9
median	float32	31	8.16	8.26	7.55	222.21	520.42	1.1	1.1	29.4	68.9

Sigclip

sigma=(3, 3), maxiters=5, ddof=0, cenfunc="median", clip_cen="mean". ccdproc uses maxiters=1; ccdproc values marked † differ from the reference by more than the tolerance.

op	dtype	N	imc	imc_chain	imc_opt	ap_bn	ccdproc	imc/opt	chain/opt	ap/opt	ccd/opt
sigclip_mean	uint8	5	2.69	5.86	1.71	13.40	16.27	1.57	3.42	7.83	9.50
sigclip_median	uint8	5	2.79	6.62	1.77	33.20	65.79	1.57	3.74	18.72	37.10
sigclip_mean	uint8	31	13.65	23.84	7.92	121.33	139.87†	1.72	3.01	15.32	17.66
sigclip_median	uint8	31	16.47	27.94	11.39	329.44	623.07†	1.45	2.45	28.91	54.69
sigclip_mean	uint16	5	2.54	5.85	1.61	12.96	15.97	1.57	3.63	8.03	9.89
sigclip_median	uint16	5	2.74	6.76	1.92	33.74	65.54	1.43	3.52	17.57	34.13
sigclip_mean	uint16	31	13.85	23.96	8.09	120.70	142.25†	1.71	2.96	14.91	17.58
sigclip_median	uint16	31	17.47	28.22	10.83	330.36	635.34†	1.61	2.61	30.51	58.67
sigclip_mean	int16	5	2.48	5.90	1.60	13.00	15.97	1.55	3.69	8.14	9.99
sigclip_median	int16	5	2.72	6.77	1.80	33.36	66.94	1.51	3.77	18.56	37.25
sigclip_mean	int16	31	13.93	24.16	8.38	121.12	142.77†	1.66	2.88	14.45	17.03
sigclip_median	int16	31	16.67	27.76	11.25	336.98	642.56†	1.48	2.47	29.95	57.12
sigclip_mean	int32	5	2.67	6.68	1.61	13.30	16.16	1.66	4.14	8.24	10.01
sigclip_median	int32	5	2.86	7.17	1.77	34.00	66.45	1.61	4.05	19.18	37.49
sigclip_mean	int32	31	15.66	28.34	8.94	121.58	142.82†	1.75	3.17	13.61	15.98
sigclip_median	int32	31	18.30	31.20	11.41	332.24	643.49†	1.60	2.74	29.13	56.42
sigclip_mean	float32	5	2.28	5.69	1.52	12.79	15.67	1.50	3.75	8.43	10.33
sigclip_median	float32	5	2.48	6.38	1.92	33.86	66.55	1.30	3.33	17.67	34.73
sigclip_mean	float32	31	11.81	22.68	8.01	121.10	141.90†	1.47	2.83	15.11	17.71
sigclip_median	float32	31	14.70	25.96	11.46	332.39	635.18†	1.28	2.26	29.00	55.41