Software

Algebraic Programming (ALP)

IRs, communication layers, domain-specific languages, libraries, and everything in between for realising Algebraic Programming (ALP). Free and open source under the Apache 2.0 license, available at:

GitHub: github.com/Algebraic-Programming
Gitee: gitee.com/CSL-ALP

A separate webpage describes the overall vision that drives Algebraic Programming.

Notable sub-projects:

ALP/GraphBLAS [GitHub, Gitee, pre-print]
Lightweight Parallel Foundations (LPF) [GitHub, Gitee, white paper]
MLIR Linnea [GitHub]

This is open research: feel free to use, comment, discuss, create issues, or submit pull requests!

High-performance BSP

A package of high-performance BSP proto-apps distributed under GPL. The first release contains

a 2D sparse matrix--vector (SpMV) multiplication kernel which is competitive with other state-of-the-art methods for shared-memory SpMV multiplication; expect superior performance on large, well-partitionable matrices and on highly NUMA systems.
a 1D complex-valued Fast Fourier Transform (FFT) which relies on sequential unordered FFT kernels (UFFTs), or on unordered generalised (shifted) FFT kernels (UGFFTs).
Four versions are included.
1. The first relies on unoptimised UFFTs only, using recursive applications of butterfly matrices.
2. The second uses the optimised Spiral DFT kernel instead. This requires a partial bit-reversal of the vector to transform the UFFT into a regular FFT. The freely available Spiral-generated code used supports FFTs of complex vectors of length up to 8192. Whenever local subvectors become larger, the BSP run aborts.
3. The third is like the second, but uses FFTW3 instead of Spiral. This has no size limitations, but does require an initial auto-tuning step.
4. The fourth combines Spiral and FFTW; a local UFFT is transformed into a regular FFT as before. If the vector is small enough, Spiral is used; otherwise, FFTW is called instead.
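
The relation between an unordered FFT and a regular FFT mentioned for the second version can be illustrated with a minimal C++ sketch. This is not the package's code: it assumes a textbook radix-2 decimation-in-frequency kernel in the role of the UFFT and a full (rather than partial) bit-reversal, and the names unordered_fft and bit_reverse are hypothetical.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// Unordered radix-2 FFT (decimation in frequency), in place.
// Assumption: x.size() is a power of two.
// On return, position p holds the Fourier coefficient whose index is the
// bit-reversal of p; the output is "unordered".
void unordered_fft(std::vector<cplx>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    for (std::size_t len = n; len >= 2; len /= 2) {
        const std::size_t half = len / 2;
        for (std::size_t start = 0; start < n; start += len) {
            for (std::size_t k = 0; k < half; ++k) {
                // Twiddle factor exp(-2*pi*i*k/len).
                const cplx w = std::polar(
                    1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(len));
                const cplx a = x[start + k];
                const cplx b = x[start + k + half];
                x[start + k] = a + b;               // butterfly, upper output
                x[start + k + half] = (a - b) * w;  // butterfly, lower output
            }
        }
    }
}

// Full bit-reversal permutation: swaps x[i] with x[rev(i)], which turns the
// unordered output above into a regular, naturally ordered FFT.
void bit_reverse(std::vector<cplx>& x) {
    const std::size_t n = x.size();
    std::size_t bits = 0;
    while ((std::size_t(1) << bits) < n) ++bits;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t r = 0;
        for (std::size_t b = 0; b < bits; ++b) {
            if (i & (std::size_t(1) << b)) r |= std::size_t(1) << (bits - 1 - b);
        }
        if (i < r) std::swap(x[i], x[r]);  // swap each pair only once
    }
}
```

Calling unordered_fft(x) followed by bit_reverse(x) yields the regular forward transform of x under these assumptions.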

See the corresponding paper for further details. Additions and further improvements may be forthcoming.

Software:
- v1.1: ANSI C99 and C++98 source code (adds FFTW support)
- v1.0: ANSI C99 and C++98 source code
Corresponding paper: MulticoreBSP for C: a high-performance library for shared-memory parallel programming

Note that the code depends on the MulticoreBSP for C and the Sparse Library software.

MulticoreBSP for C is a Bulk Synchronous Parallel (BSP) library specifically targeting shared-memory computing. Its interface updates that of the BSPlib standard, and adds two new high-performance primitives. For more information and access to the freely available software, see the project homepage:

http://www.multicorebsp.com

Sparse Library:

Updated to v1.6.0 on November 19th, 2014

Changelog
License: GPL v3
v1.6.0: Library (linux x86_64), Source (ANSI C++ 1998 and ANSI C++ 2011)
v1.5.2: Library (linux x86_64), Source (ANSI C++ '98)
v1.5.1: Library (linux x86_64), Source (ANSI C++ '98)
v1.4.0: Library (linux x86_64), Source (ANSI C++ '98)
v1.3.1: Library (linux x86_64), Source (ANSI C++ '98)
v1.3.0: Library (linux x86_64), Source (ANSI C++ '98)
v1.2.0: Library (linux x86_64), Source (ANSI C++ '98)
v1.1.0: Library (linux x86_64), Source (ANSI C++ '98)
Code documentation (html)
Code documentation (pdf)

This Sparse Library aims to provide simple means to perform basic sparse matrix computations using a wide range of storage formats for sparse matrices, in a research-oriented setting. Currently it supports the following storage and multiplication schemes:

Triplet Scheme (TS, also known as the Coordinate scheme COO),
Compressed Row Storage (CRS, also known as compressed sparse row; CSR),
Incremental CRS (ICRS, see Koster, 2002),
Zig-Zag CRS (ZZ-CRS, see Yzelman & Bisseling, 2009),
ZZ-ICRS (see Yzelman & Bisseling, 2009),
Sparse vector matrix (SVM),
Hilbert-curve ordered TS (HTS, by Haase, Liebmann & Plank),
Bi-directional ICRS (BICRS, see Yzelman & Bisseling, 2011),
Hilbert-curve ordered BICRS (see Yzelman & Bisseling, 2012),
Hierarchical BICRS (HBICRS),
Block Hilbert (hard-coded sparse blocking with Hilbert-curve ordering on blocks and HBICRS),
Bisection Hilbert (as above, but with adaptive sparse blocking),
Compressed BICRS (CBICRS, see Yzelman & Roose, 2014),
Vectorised BICRS (vecBICRS, includes compression, see Yzelman, Roose, & Meerbergen, 2014),
and the Dense diagonal scheme (DD Matrix).
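
To make the first two schemes concrete, the following is a minimal C++ sketch of converting triplets (TS/COO) to CRS and multiplying with the result. It is not the Sparse Library's API: the types Triplet and CrsMatrix and the functions from_triplets and crs_multiply are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// One nonzero in triplet (coordinate) form. Hypothetical type.
struct Triplet {
    std::size_t row, col;
    double value;
};

// Compressed Row Storage: the nonzeroes of row i occupy positions
// row_start[i] .. row_start[i + 1] - 1 of col_index and values.
struct CrsMatrix {
    std::size_t rows = 0, cols = 0;
    std::vector<std::size_t> row_start;  // rows + 1 entries
    std::vector<std::size_t> col_index;  // one entry per nonzero
    std::vector<double> values;          // one entry per nonzero
};

// Builds a CRS matrix from triplets given in any order (counting sort on rows).
CrsMatrix from_triplets(std::size_t rows, std::size_t cols,
                        const std::vector<Triplet>& triplets) {
    CrsMatrix A;
    A.rows = rows;
    A.cols = cols;
    A.row_start.assign(rows + 1, 0);
    for (const Triplet& t : triplets) ++A.row_start[t.row + 1];  // count per row
    for (std::size_t i = 0; i < rows; ++i) A.row_start[i + 1] += A.row_start[i];
    A.col_index.resize(triplets.size());
    A.values.resize(triplets.size());
    // next[i] is the next free position within row i.
    std::vector<std::size_t> next(A.row_start.begin(), A.row_start.end() - 1);
    for (const Triplet& t : triplets) {
        const std::size_t pos = next[t.row]++;
        A.col_index[pos] = t.col;
        A.values[pos] = t.value;
    }
    return A;
}

// Computes y = A x. Assumes x.size() == A.cols.
std::vector<double> crs_multiply(const CrsMatrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.rows, 0.0);
    for (std::size_t i = 0; i < A.rows; ++i) {
        for (std::size_t k = A.row_start[i]; k < A.row_start[i + 1]; ++k) {
            y[i] += A.values[k] * x[A.col_index[k]];
        }
    }
    return y;
}
```

CRS replaces the per-nonzero row index of the triplet scheme with one offset per row, which is why it is usually the baseline against which the other schemes in the list are compared.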

Since version 1.5 the Sparse Library also supports parallel schemes. The currently supported schemes are:

Block CO-H+ (similar to Block Hilbert, but parallelised; see Yzelman & Roose, 2012),
Row-distributed block CO-H (similar to Block Hilbert, but with explicit 1D partitioning; see Yzelman & Roose, 2012),
Row-distributed Hilbert (as the above scheme, but without sparse blocking),
OpenMP CRS (implicit 1D fine-grained parallelisation using OpenMP),
Row-distributed Hilbert-compressed block CO-H (only stores a delta array based on 1D Hilbert coordinates, which are unpacked during SpMV multiplication).

Matrix-market input format is supported. Binary load/save operations are supported via the TS scheme. Methods in the library support the following operations:

Ax=y: Regular and in-place sparse matrix dense vector multiplication.
xA=y: Left-sided dense vector sparse matrix multiplication.
AX=Y (and XA=Y): Multiple right-hand side sparse matrix dense vector multiplication (and its transpose operation). Here, X and Y are tall-skinny matrices, the width of which must be known at compile time.
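
The operations above, and in particular the compile-time width of the tall-skinny matrices, can be sketched in C++ with a template parameter. This is an illustration under assumptions, not the Sparse Library's interface: the matrix is held as plain triplets, and the names Entry, TallSkinny, multiply_multi and left_multiply are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// One nonzero of A in triplet form. Hypothetical type.
struct Entry {
    std::size_t row, col;
    double value;
};

// Tall-skinny matrix with K columns, stored row by row. K is fixed at
// compile time, so the inner loop below has a constant trip count.
template <std::size_t K>
using TallSkinny = std::vector<std::array<double, K>>;

// Computes Y = A X, where A has `rows` rows and X has one row per column of A.
template <std::size_t K>
TallSkinny<K> multiply_multi(const std::vector<Entry>& A, std::size_t rows,
                             const TallSkinny<K>& X) {
    TallSkinny<K> Y(rows, std::array<double, K>{});  // zero-initialised
    for (const Entry& e : A) {
        for (std::size_t j = 0; j < K; ++j) {
            Y[e.row][j] += e.value * X[e.col][j];
        }
    }
    return Y;
}

// Computes y = x A for a single dense row vector x; A has `cols` columns.
std::vector<double> left_multiply(const std::vector<double>& x,
                                  const std::vector<Entry>& A, std::size_t cols) {
    std::vector<double> y(cols, 0.0);
    for (const Entry& e : A) {
        y[e.col] += x[e.row] * e.value;  // roles of row and column are swapped
    }
    return y;
}
```

Fixing K at compile time lets the compiler unroll or vectorise the loop over the K right-hand sides, so each nonzero of A is loaded once and reused K times.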

Some strategies are implemented as utilities on top of the Sparse Library and make use of its sequential schemes. These are:

A benchmarking application (micbench) written explicitly for the Intel Xeon Phi: employs the parallel Row-distributed block CO-H scheme, but uses the Vectorised BICRS scheme instead of Compressed BICRS as its block-level data storage. See Yzelman, Roose, & Meerbergen, 2014 for an overview of the employed techniques.
Sequential cache-oblivious (CO) Separated Block Diagonal (SBD) sparse matrix--vector (SpMV) multiplication (sbdmv, works only with matrices in SBD form, see Yzelman & Bisseling, 2011).
Parallel shared-memory SpMV multiplication using globally shared vectors, applying the CO-SBD method locally (McShared, implicit 2D distributed SBD-based scheme; see Yzelman, PhD dissertation, 2011).
An explicitly 2D distributed CO-SBD scheme (McDMV; see Yzelman & Roose, 2014; supersedes the older McDirect utility).

Warning when using the parallel schemes:

Most strategies read the hardware.info file in its current directory to get the number of threads to use. If this file does not exist, this number will be one by default. E.g., to use four threads, execute echo 4 > hardware.info prior to starting your experiment.
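
The lookup described above can be sketched in C++ as follows. This is not the library's code: the function name read_thread_count is hypothetical, and falling back to one thread when the file content is not a positive integer is an assumption (the text only specifies the default for a missing file).

```cpp
#include <fstream>
#include <string>

// Returns the thread count stored at the start of the given file.
// Falls back to 1 when the file does not exist. Assumption: also falls back
// to 1 when the file does not start with a positive integer.
long read_thread_count(const std::string& path = "hardware.info") {
    std::ifstream in(path);
    long threads = 0;
    if (in && (in >> threads) && threads > 0) {
        return threads;
    }
    return 1;
}
```

After running echo 4 > hardware.info in the working directory, read_thread_count() in this sketch returns 4.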

R-tree library:

The R-tree family of tree-like data structures is particularly suited to handling spatial data. The R-tree library aims to implement many R-tree variants in a generic fashion. See the SourceForge homepage for more details.

Cache Simulator:

Changelog
V1.1.0: Library (linux), Source (ANSI C++), License: GPL v3
Code documentation (html)
Code documentation (pdf)

The Cache Simulator is currently used as a tool for researching the effect of given sparse matrix reorderings on cache efficiency. The library supplies the tools necessary for run-time cache simulation, according to an idealised cache model detailed in the documentation. Thus, no exact cache simulation is implied (to that end, see for example the valgrind tool cachegrind, or have a look at PAPI).
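
As a rough illustration of what run-time simulation against an idealised cache can look like, the C++ sketch below counts misses in a fully associative cache with least-recently-used (LRU) replacement. The model the Cache Simulator actually uses is the one in its documentation; the LRU model, the class name LruCache and its interface are assumptions made for this example only.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Fully associative cache with LRU replacement (an assumed, idealised model).
// Requires line_size >= 1.
class LruCache {
public:
    LruCache(std::size_t lines, std::size_t line_size)
        : capacity_(lines), line_size_(line_size) {}

    // Simulates one memory access; returns true on a hit, false on a miss.
    bool access(std::uintptr_t address) {
        const std::uintptr_t line = address / line_size_;
        auto it = where_.find(line);
        if (it != where_.end()) {
            // Hit: move the line to the most-recently-used position.
            order_.splice(order_.begin(), order_, it->second);
            return true;
        }
        ++misses_;
        if (capacity_ == 0) return false;  // nothing can be cached
        if (order_.size() == capacity_) {
            // Evict the least-recently-used line.
            where_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(line);
        where_[line] = order_.begin();
        return false;
    }

    std::size_t misses() const { return misses_; }

private:
    std::size_t capacity_;
    std::size_t line_size_;
    std::size_t misses_ = 0;
    std::list<std::uintptr_t> order_;  // front is most recently used
    std::unordered_map<std::uintptr_t, std::list<std::uintptr_t>::iterator> where_;
};
```

Feeding such a simulator the addresses touched by an SpMV multiplication before and after reordering the matrix gives two miss counts that can be compared directly.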

Audio and picture compression in MATLAB:

MATLAB code
Manual (in Dutch)

This software illustrates the use of the Fast Fourier Transform (FFT) in audio (for example MP3) and picture (JPG) compression using the MATLAB software package. The audio part was originally used in a lecture, part of a short 2-day course for high-school students. The picture part was used in a hands-on computer session by the same high-school students. A manual (in Dutch) directed at those students is also available.
The main functions of the package are sound_demo, basic_compression and jpg. See their respective .m files for details on their usage. Any comments or additions are most welcome.

jpg.m was coded by Arno Swart.

EMM 2 DMM:

Source (C++)
Executable (Linux x86_64)

Utility to convert a single Extended Matrix-Market file containing a distributed matrix to the three-file distributed Matrix-Market format. Example use: ./emm2dmm test.emm, which produces test.emm.mtx, test.emm.v and test.emm.u.

text2bin:

Source (C++)
Executable (Linux x86_64)

Utility to convert a text file containing an array of line-separated integers to a binary file. Integers are assumed to be unsigned and representable in at most 32 bits (larger values overflow) and are written in binary as fixed-size 32-bit unsigned integers. Both decimal and hexadecimal text representations are readable.

tabularise:

Source (C++)
Executable (Linux x86_64)

Utility to convert lines read from stdin to LaTeX tabular lines; that is, runs of intermediate spaces are collapsed and replaced with a single & character. Parameters to the executable can be used to specify prefixes and postfixes for each line, as well as a character to wrap around each column value (for example to transform the numerical input '4' to '$4$'). The program exits when sent an interrupt (Ctrl+C).
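
The per-line transformation can be sketched in C++ as below. This is not the tabularise source: the function name tabularise_line and its parameters are hypothetical, and the & is padded with spaces here purely for readability of the generated LaTeX.

```cpp
#include <sstream>
#include <string>

// Converts one whitespace-separated input line to a LaTeX tabular line.
// prefix and postfix are put around the whole line; wrap is put on both
// sides of every column value (e.g. "$" turns 4 into $4$).
std::string tabularise_line(const std::string& line,
                            const std::string& prefix = "",
                            const std::string& postfix = "",
                            const std::string& wrap = "") {
    std::istringstream fields(line);
    std::string field;
    std::string out = prefix;
    bool first = true;
    while (fields >> field) {  // operator>> skips any run of whitespace
        if (!first) out += " & ";
        out += wrap + field + wrap;
        first = false;
    }
    return out + postfix;
}
```

For example, tabularise_line("4   5 6", "", " \\\\", "$") produces the line $4$ & $5$ & $6$ followed by a space and a LaTeX row terminator (two backslashes).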
