TODO

Project-wide checklist derived from PLAN.md. Items are grouped by milestone; all start unchecked.

Implementation Milestones

  • M0: Scaffolding

    • Repository layout: crates, Python package skeleton, CI with maturin

    • Basic CSR struct in Rust and initial PyO3 binding stub

    • Docs site scaffold (Sphinx + MyST)

  • M1: CSR Core

    • Kernels: SpMV, SpMM, reductions (sum, row/col sums), transpose

    • Indexing/slicing: rows/cols (read-only)

    • Cleanup ops: prune(eps), eliminate_zeros

    • Python OOP façade; NumPy interop; release GIL for compute

    • Parallel + SIMD implementations (Rayon + std::simd) for all v0.1 kernels

    • Implement and optimize the basic arithmetic kernels

  • M2: Conversions and Formats

    • Public COO and CSC types

    • Conversions: CSR <-> COO <-> CSC

    • Arithmetic: A + B, Hadamard A.multiply(B)

    • IO: Matrix Market (.mtx) and NPZ save/load

  • M3: Performance and Stability

    • Blocked/tiling strategies; cache/NUMA tuning

    • Benchmark suite and performance regression gates

    • API polish and error taxonomy

    • Documentation/tutorials pass

  • M4: Advanced

    • BSR format

    • Iterative solvers (CG/GMRES) and basic preconditioners

    • Plug-in kernel strategy

    • Optional f32 kernels and dtype growth
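As a reference point for the M1 items above (SpMV and `prune(eps)` / `eliminate_zeros`), here is a minimal pure-Python sketch of the two operations on raw CSR arrays. It is not the optimized Rust kernel. The function names `csr_spmv` and `csr_prune` and the rule "drop entries with |value| <= eps" are assumptions made for illustration; PLAN.md may define the threshold differently.

```python
def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored as CSR arrays."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Stored entries of row i live in the half-open range indptr[i]:indptr[i+1].
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y


def csr_prune(indptr, indices, data, eps):
    """Return new CSR arrays without entries whose magnitude is <= eps.

    Assumed convention: with eps = 0.0 this removes explicit zeros only.
    """
    new_indptr = [0]
    new_indices = []
    new_data = []
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            if abs(data[k]) > eps:
                new_indices.append(indices[k])
                new_data.append(data[k])
        # Row i ends where the output currently ends.
        new_indptr.append(len(new_data))
    return new_indptr, new_indices, new_data


# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 2, 0, 1]
data = [1.0, 2.0, 3.0, 4.0, 5.0]

y = csr_spmv(indptr, indices, data, [1.0, 1.0, 1.0])
pruned = csr_prune(indptr, indices, data, 2.5)
```

A sketch like this can serve as an oracle in parity tests: the parallel and SIMD kernels should agree with it up to floating-point summation order.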

Cross-cutting

  • Formats & Data Model

    • CSR default: values f64, indices i64; plan for f32/i32 as feature flags

    • ND baseline: COO-ND representation and invariants (v0.2)

      • COO-ND storage and invariants

      • Axis reductions: sum over axes

      • Axis permutation

      • ND→2D conversions: mode/axes unfold to CSR/CSC

      • Broadcasting elementwise ops (Hadamard)

      • mean and reshape

    • Future CSF for ND advanced ops (v0.4+)

  • Python API

    • lacuna.sparse classes: SparseArray/SparseMatrix, CSR/CSC/COO/COOND

    • Construction and conversion APIs; SciPy/NumPy bridges

    • Ops surface: matmul, add, multiply, transpose, sum; slicing semantics

  • Rust Design

    • Core traits: SparseFormat, SparseND, and op traits (SpMV/SpMM/Add/MulElem/Transpose/…)

    • Deterministic reductions where required; careful unsafe only in hot paths
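To make the COO-ND items above more concrete (axis reductions and ND→2D mode unfolding), the following is a small pure-Python sketch. The names `coond_sum_axis` and `coond_mode_unfold`, the list-of-tuples coordinate layout, and the row-major linearization of the remaining axes are all assumptions chosen for illustration; the actual storage layout and column ordering in lacuna may differ.

```python
def coond_sum_axis(shape, coords, values, axis):
    """Sum a COO-ND tensor over one axis.

    coords is a list of index tuples, values the matching list of floats.
    Output coordinates are sorted so the result is deterministic.
    """
    acc = {}
    for idx, v in zip(coords, values):
        key = idx[:axis] + idx[axis + 1:]  # drop the reduced axis
        acc[key] = acc.get(key, 0.0) + v
    new_shape = shape[:axis] + shape[axis + 1:]
    keys = sorted(acc)
    return new_shape, keys, [acc[k] for k in keys]


def coond_mode_unfold(shape, coords, values, mode):
    """Unfold a COO-ND tensor into 2D COO triplets.

    Rows index the `mode` axis; columns linearize the remaining axes in
    row-major order (an assumed convention).
    """
    rest = [a for a in range(len(shape)) if a != mode]
    n_cols = 1
    for a in rest:
        n_cols *= shape[a]
    rows, cols = [], []
    for idx in coords:
        col = 0
        for a in rest:
            col = col * shape[a] + idx[a]
        rows.append(idx[mode])
        cols.append(col)
    return (shape[mode], n_cols), rows, cols, list(values)


shape = (2, 3, 4)
coords = [(0, 1, 2), (1, 1, 2), (1, 2, 3)]
values = [1.0, 2.0, 3.0]

summed = coond_sum_axis(shape, coords, values, axis=0)
unfolded = coond_mode_unfold(shape, coords, values, mode=1)
```

The 2D triplets returned by the unfold step can then be fed to an ordinary COO→CSR or COO→CSC conversion.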

Kernel Implementation Checklist (by crate)

  • crates/lacuna-kernels (Rust optimized kernels, f64/i64, parallel+SIMD)

    • SpMV

      • Feature done

      • Tests done

    • SpMM

      • Feature done

      • Tests done

    • Reductions: sum, row_sums, col_sums

      • Feature done

      • Tests done

    • Transpose (CSR -> CSR)

      • Feature done

      • Tests done

    • Cleanup: eliminate_zeros, prune(eps)

      • Feature done

      • Tests done

    • Arithmetic: add_csr (A+B), mul_scalar (alpha*A)

      • Feature done

      • Tests done

    • Utilities / Refactors

      • Centralize reusable kernel utilities in util.rs (constants, helpers, UsizeF64Map)

      • Replace HashMap-based sparse accumulators with UsizeF64Map in reduce.rs and spmv.rs

      • Improve reduction paths: parallel small-dimension branches; SIMD stripe merge for column sums

    • ND COO

      • Sum / Permute axes / Reduce sum over axes (COO-ND kernels)

      • SpMV/SpMM along mode axis

      • ND→2D conversions (mode/axes unfold) in convert.rs

  • crates/lacuna-py (PyO3 bindings)

    • SpMV / SpMM

      • Feature done (Csr64.spmv/spmm and *_from_parts)

      • Tests done (indirectly covered via Python tests)

    • Reductions / Transpose / Cleanup / Arithmetic

      • Feature done (sum/row_sums/col_sums, transpose, prune/eliminate_zeros, add/mul_scalar bindings)

      • Tests done (indirectly covered via Python tests)

    • Bindings structure

      • Split monolithic src/lib.rs into modules: csr.rs, csc.rs, coo.rs, functions.rs; keep lib.rs as aggregator (no Python API changes)

    • ND bindings

      • Export ND wrappers: coond_sum_from_parts, coond_mean_from_parts, coond_reduce_sum_axes_from_parts, coond_reduce_mean_axes_from_parts, coond_permute_axes_from_parts, coond_reshape_from_parts, coond_hadamard_broadcast_from_parts, coond_mode_to_{csr,csc}_from_parts, coond_axes_to_{csr,csc}_from_parts

      • Registered in lib.rs

  • python/lacuna (High-level Python API: CSR facade)

    • SpMV

      • Feature done (__matmul__ 1D)

      • Tests done (python/tests/test_ops.py)

      • Benchmarks done (python/benchmarks/benchmark_spmv.py)

      • Docs done

    • SpMM

      • Feature done (__matmul__ 2D)

      • Tests done (python/tests/test_ops.py)

      • Benchmarks done (python/benchmarks/benchmark_spmm.py)

      • Docs done

    • Reductions: sum, row_sums, col_sums

      • Feature done (CSR.sum supports None/0/1)

      • Tests done (python/tests/test_ops.py)

      • Benchmarks done

      • Docs done

    • Transpose

      • Feature done (CSR.T)

      • Tests done (python/tests/test_ops.py, python/tests/test_more_ops.py)

      • Benchmarks done

      • Docs done

    • Cleanup: prune, eliminate_zeros

      • Feature done

      • Tests done (python/tests/test_ops.py)

      • Benchmarks done

      • Docs done

    • Arithmetic: add, mul_scalar, sub, hadamard

      • Feature done (__add__, __mul__/__rmul__)

      • Tests done (python/tests/test_ops.py, python/tests/test_more_ops.py)

      • Benchmarks done

      • Docs done

  • python/lacuna (High-level Python API: ND COO facade)

    • COOND

      • Feature done (sum, mean, reduce_sum_axes, reduce_mean_axes, permute_axes, reshape, hadamard_broadcast, mode_unfold_to_{csr,csc}, axes_unfold_to_{csr,csc})

      • Tests done (python/tests/test_nd.py)

  • Planned (per PLAN.md milestones)

    • Arithmetic: subtraction (A - B)

    • Arithmetic: Hadamard elementwise multiply A.multiply(B)

    • Format conversions: CSR <-> CSC, CSR <-> COO

    • ND baseline (COO-ND): elementwise ops with broadcasting (Hadamard), sum/mean over axes, transpose/permute, reshape

    • Reordering: CSR reorder (cache locality)

    • Cache-aware/blocked SpMM improvements

    • Block formats: BSR kernels

    • Dtype/index variants: f32 values, i32 indices (feature-gated)

    • ND advanced (CSF): kernels for tensordot/mode-n product; masked ops

  • Packaging & CI

    • Build wheels with maturin for Windows/macOS/Linux and Python 3.10–3.13

    • GitHub Actions matrix for wheels + sdist

    • Versioning (SemVer) and licensing check (Apache-2.0)

  • Testing & Benchmarks

    • Rust unit/property tests

    • Python pytest parity tests vs NumPy/SciPy; randomized matrices

    • pytest-benchmark scenarios; SuiteSparse/synthetic datasets

    • Benchmarks import lacuna only from installed environment (no local path injection)

  • Documentation

    • User guides (MyST), API docs (autodoc/napoleon), and design notes

    • Tutorials: build CSR from COO; SpMV at scale; convert to SciPy
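For the "build CSR from COO" tutorial listed above, a minimal pure-Python sketch of the conversion could look like the following. The function name `coo_to_csr` is hypothetical, and summing duplicate coordinates is an assumption borrowed from SciPy's convention; the lacuna conversion may handle duplicates differently.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets to CSR arrays (indptr, indices, data).

    Entries are sorted by (row, col); duplicate coordinates are summed.
    """
    order = sorted(range(len(vals)), key=lambda k: (rows[k], cols[k]))
    indptr = [0] * (n_rows + 1)
    indices, data = [], []
    prev = None
    for k in order:
        key = (rows[k], cols[k])
        if key == prev:
            # Same coordinate as the previous entry: accumulate.
            data[-1] += vals[k]
        else:
            indices.append(cols[k])
            data.append(vals[k])
            indptr[rows[k] + 1] += 1  # count one stored entry in this row
            prev = key
    # Prefix-sum the per-row counts into row pointers.
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]
    return indptr, indices, data


rows = [2, 0, 1, 0, 2, 2]
cols = [1, 2, 2, 0, 0, 1]
vals = [5.0, 2.0, 3.0, 1.0, 4.0, 0.5]

csr = coo_to_csr(3, rows, cols, vals)
```

Sorting by (row, col) keeps column indices ordered within each row, which simplifies later operations such as A + B that merge two rows in a single pass.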