BatchLAS is a SYCL-first batched linear algebra library with optional vendor backends for CUDA, ROCm, netlib BLAS/LAPACK, and oneMKL. The repository currently contains the C++ library, an optional pybind11-based Python package, a broad unit-test suite, benchmark executables, tuning scripts, and research notebooks used to validate newer eigensolver and factorization work.
- SYCL is mandatory for building the library.
- The project builds as C++20 and defaults to RelWithDebInfo.
- The installed CMake package exports BatchLAS::batchlas plus component libraries.
- The repository includes active work on dense factorizations, spectral routines, orthogonalization, sparse eigensolvers, and performance benchmarking.
- Recommended development entry points are the CMake presets in CMakePresets.json.
The public C++ headers under include/ currently expose these main groups of functionality.
- gemm, gemv, symm, syrk, syr2k, trmm, trsm
- potrf, getrf, getrs, getri
- geqrf, orgqr, ormqr
- syev, gesvd
- spmm
- syevx for partial symmetric eigensolves
- lanczos
- steqr, stedc, and related tridiagonal helpers
- ritz_values
- iluk preconditioning support
- ortho with multiple orthogonalization algorithms
- matrix generators and structured constructors
- norms, condition numbers, transpose, and related helpers
When BATCHLAS_BUILD_PYTHON=ON, the repository builds a batchlas Python package with NumPy dense-array support and SciPy sparse wrappers for the supported public APIs. The Python facade also exposes convenience helpers such as available_backends(), available_devices(), and compiled_features().
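As a minimal sketch of how the three convenience helpers might be used together, the snippet below collects their results into one report. The helper names come from the description above; their return types are not documented here, so the values are stored and printed as-is. The function name describe_batchlas is hypothetical and not part of the package.

```python
import importlib


def describe_batchlas(module=None):
    """Collect build information from the batchlas facade, if it is importable.

    describe_batchlas is a hypothetical helper for illustration only.
    """
    if module is None:
        try:
            module = importlib.import_module("batchlas")
        except ImportError:
            return None  # package not built or not on PYTHONPATH
    report = {}
    for name in ("available_backends", "available_devices", "compiled_features"):
        helper = getattr(module, name, None)
        # Return types are an unknown here, so each value is kept unmodified.
        report[name] = helper() if callable(helper) else None
    return report


if __name__ == "__main__":
    info = describe_batchlas()
    if info is None:
        print("batchlas is not importable; build with BATCHLAS_BUILD_PYTHON=ON")
    else:
        for name, value in info.items():
            print(f"{name}: {value!r}")
```

Passing a module explicitly makes the report easy to exercise in tests without a built extension.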
- include/: public C++ headers
- src/: library implementation and backend/component targets
- tests/: GoogleTest-based unit tests and smoke-test subset
- benchmarks/: performance and accuracy benchmark executables
- python/: pybind11 bindings, Python facade, and Python tests
- scripts/: benchmark campaign helpers and result-processing scripts
- playground/: notebooks and exploratory scripts for algorithm work
- docs/: architecture notes and design documentation
Minimum build requirements:
- CMake 3.14+
- A C++20 compiler with SYCL support
- A SYCL runtime/toolchain discoverable by CMake
Common optional dependencies:
- CUDA Toolkit for NVIDIA backends
- ROCm for AMD backends
- LAPACKE and CBLAS for the netlib host backend
- oneMKL for the optional MKL backend
- Python 3, pybind11, NumPy, and SciPy for Python bindings
Notes:
- SYCL support is not optional in the current build system.
- The CMake logic is primarily written around IntelLLVM/Clang-style SYCL compilers.
- The default build type is RelWithDebInfo, not Debug.
For a Linux-oriented environment setup with package suggestions and oneAPI notes, see AGENTS.md.
Configure and build using the checked-in presets:
cmake --preset dev
cmake --build --preset dev
Useful presets currently provided:
- dev: default RelWithDebInfo library build
- dev-tests: library build with the full test suite enabled
- fast-dev: library build plus the smoke-test subset
- benchmarks: benchmark build with tuning support enabled
- cuda: optional CUDA-enabled build when the environment supports it
A manual configuration without presets looks like this:
cmake -S . -B build \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DBATCHLAS_BUILD_TESTS=ON \
-DBATCHLAS_BUILD_BENCHMARKS=OFF \
-DBATCHLAS_BUILD_PYTHON=OFF
cmake --build build -j"$(nproc)"
Common CMake options:
- BATCHLAS_BUILD_TESTS: build unit tests
- BATCHLAS_BUILD_BENCHMARKS: build benchmark executables
- BATCHLAS_BUILD_PYTHON: build the Python package
- BATCHLAS_ENABLE_CUDA: enable CUDA backend support
- BATCHLAS_ENABLE_ROCM: enable ROCm backend support even if no AMD GPU is auto-detected
- BATCHLAS_ENABLE_NETLIB: enable the host netlib backend
- BATCHLAS_ENABLE_MKL: enable the oneMKL backend
- BATCHLAS_ENABLE_TUNING: enable tuning targets; intended for benchmark builds
- BATCHLAS_CPU_TARGET: override SYCL CPU target selection (auto, native_cpu, spir64_x86_64, none)
- BATCHLAS_TEST_TARGET_SET: choose all or smoke
- BATCHLAS_AMD_ARCH: override ROCm target architecture
- BATCHLAS_NVIDIA_ARCH: override CUDA target architecture
Build tests and run them with either the preset or a manual build:
cmake --preset dev-tests
cmake --build --preset dev-tests
ctest --test-dir build/presets/dev-tests --output-on-failure
For a faster edit-build-test loop, the fast-dev preset builds only the smoke subset:
- util_span_tests
- util_vector_tests
- matrix_tests
The repository contains a large benchmark suite under benchmarks/, including BLAS kernels, QR/SVD paths, eigensolvers, band reduction, and sparse workflows. A typical benchmark build looks like this:
cmake --preset benchmarks
cmake --build --preset benchmarks
The scripts/ directory contains campaign helpers and archived CSV outputs from prior runs. Tuning support is wired through BATCHLAS_ENABLE_TUNING and the optional BATCHLAS_TUNING_PROFILE cache entry.
Enable the Python package like this:
cmake -S . -B build \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DBATCHLAS_BUILD_PYTHON=ON \
-DBATCHLAS_BUILD_TESTS=ON
cmake --build build -j"$(nproc)"
The build places the importable package under build/python, so a build-tree import looks like:
PYTHONPATH="$PWD/build/python" python3 -c "import batchlas; print(batchlas.available_backends())"
The extension module is built with pybind11 and linked against the installed or in-tree BatchLAS::batchlas target.
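When working from a script or notebook rather than the shell, the same build-tree import can be arranged by extending sys.path. The following is a sketch under the assumption that the package lives in a python subdirectory of the build directory, as described above; add_build_tree is a hypothetical helper name, not something the repository provides.

```python
import sys
from pathlib import Path


def add_build_tree(build_dir="build"):
    """Prepend <build_dir>/python to sys.path.

    Returns the added path as a string, or None when the directory is missing.
    add_build_tree is a hypothetical helper for illustration only.
    """
    package_dir = Path(build_dir).resolve() / "python"
    if not package_dir.is_dir():
        return None
    entry = str(package_dir)
    if entry not in sys.path:
        sys.path.insert(0, entry)  # take precedence over any installed copy
    return entry


if __name__ == "__main__":
    if add_build_tree("build") is None:
        print("build/python not found; configure with BATCHLAS_BUILD_PYTHON=ON first")
    else:
        import batchlas  # resolved from the build tree

        print(batchlas.available_backends())
```

Inserting at the front of sys.path means the build-tree package shadows an installed one, which is usually what is wanted during development.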
After installation, the project exports a standard CMake package. A consuming project can use:
find_package(BatchLAS CONFIG REQUIRED)
target_link_libraries(my_target PRIVATE BatchLAS::batchlas)
The install tree also exports the generated configuration headers needed by the public interface.
- The top-level batchlas target is an interface facade over split component libraries.
- The repository includes implementation notes for ongoing work in the root markdown files and under docs/.
- playground/ contains exploratory notebooks and scripts used during algorithm development.
BatchLAS is licensed under the MIT License. See LICENSE for the full text.
