# Software

### High-performance BSP

A package of high-performance BSP proto-apps distributed under the GPL. The first release contains:

- a 2D sparse matrix--vector (SpMV) multiplication kernel which is competitive with other state-of-the-art methods for shared-memory SpMV multiplication; expect superior performance on large, well-partitionable matrices on highly NUMA systems.
- a 1D complex-valued Fast Fourier Transform (FFT) which relies on sequential unordered FFT kernels (UFFTs), or on unordered generalised (shifted) FFT kernels (UGFFTs).

Four versions are included.

- The first relies on unoptimised UFFTs only, using recursive applications of butterfly matrices.
- The second uses the optimised Spiral DFT kernel instead. This requires a partial bit-reversal of the vector to transform the UFFT into a regular FFT. The freely available Spiral-generated code used supports FFTs of complex vectors of length up to 8192. Whenever local subvectors become larger, the BSP run aborts.
- The third is like the above, but using FFTW3 instead of Spiral. This has no size limitations, but does require an initial auto-tuning step.
- The fourth combines Spiral and FFTW; a local UFFT is transformed into a regular FFT as before. If the vector is small enough, Spiral is used; otherwise, FFTW is called instead.
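
The choice made by the fourth version can be sketched as below. This is a minimal illustration only: the names `LocalKernel`, `SPIRAL_MAX_LENGTH` and `select_local_kernel` are hypothetical and do not come from the package; only the 8192 limit is taken from the description above.

```cpp
#include <cstddef>

// Hypothetical names for illustration; not identifiers from the package.
enum LocalKernel { USE_SPIRAL, USE_FFTW };

// Largest complex vector length supported by the freely available
// Spiral-generated code, as stated in the text above.
const std::size_t SPIRAL_MAX_LENGTH = 8192;

// Picks the sequential kernel for a local subvector of the given length:
// Spiral when the subvector fits, FFTW otherwise.
inline LocalKernel select_local_kernel(const std::size_t local_length) {
    return local_length <= SPIRAL_MAX_LENGTH ? USE_SPIRAL : USE_FFTW;
}
```

Under the same assumptions, the second version would abort where this sketch falls back to FFTW.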

- Software:
- v1.1: ANSI C99 and C++98 source code (adds FFTW support)
- v1.0: ANSI C99 and C++98 source code

- Corresponding paper: MulticoreBSP for C: a high-performance library for shared-memory parallel programming

### MulticoreBSP:

A Bulk Synchronous Parallel (BSP) library specifically targeting shared-memory computing. Its interface updates that of the BSPlib standard, and adds two new high-performance primitives. For more information and access to the freely available software, see the project homepage.

### Sparse Library:

##### Updated to v1.6.0 on November 19th, 2014

- Changelog
- License: GPL v3
- v1.6.0: Library (linux x86_64), Source (ANSI C++ 1998 and ANSI C++ 2011)
- v1.5.2: Library (linux x86_64), Source (ANSI C++ '98)
- v1.5.1: Library (linux x86_64), Source (ANSI C++ '98)
- v1.4.0: Library (linux x86_64), Source (ANSI C++ '98)
- v1.3.1: Library (linux x86_64), Source (ANSI C++ '98)
- v1.3.0: Library (linux x86_64), Source (ANSI C++ '98)
- v1.2.0: Library (linux x86_64), Source (ANSI C++ '98)
- v1.1.0: Library (linux x86_64), Source (ANSI C++ '98)
- Code documentation (html)
- Code documentation (pdf)

- Triplet Scheme (TS, also known as the Coordinate scheme COO),
- Compressed Row Storage (CRS, also known as compressed sparse row; CSR),
- Incremental CRS (ICRS, see Koster, 2002),
- Zig-Zag CRS (ZZ-CRS, see Yzelman & Bisseling, 2009),
- ZZ-ICRS (see Yzelman & Bisseling, 2009),
- Sparse vector matrix (SVM),
- Hilbert-curve ordered TS (HTS, by Haase, Liebmann & Plank),
- Bi-directional ICRS (BICRS, see Yzelman & Bisseling, 2011),
- Hilbert-curve ordered BICRS (see Yzelman & Bisseling, 2012),
- Hierarchical BICRS (HBICRS),
- Block Hilbert (hard-coded sparse blocking with Hilbert-curve ordering on blocks and HBICRS),
- Bisection Hilbert (as above, but with adaptive sparse blocking),
- Compressed BICRS (CBICRS, see Yzelman & Roose, 2014),
- Vectorised BICRS (vecBICRS, includes compression, see Yzelman, Roose, & Meerbergen, 2014),
- and the Dense diagonal scheme (DD Matrix).
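
To make the first two schemes in this list concrete, the following is a minimal sketch of converting Triplet (COO) data to CRS and computing y = Ax with it. The types `Triplet` and `CRSMatrix` and both function names are illustrative assumptions; they are not the library's classes or API.

```cpp
#include <cstddef>
#include <vector>

// One nonzero in Triplet (COO) form. Illustrative type, not the library's.
struct Triplet {
    std::size_t row;
    std::size_t col;
    double value;
};

// A matrix in Compressed Row Storage. Illustrative type, not the library's.
struct CRSMatrix {
    std::size_t rows;
    std::vector<std::size_t> row_start; // rows + 1 offsets into the arrays below
    std::vector<std::size_t> col_index; // column of each nonzero
    std::vector<double> values;         // value of each nonzero
};

// Converts triplets (given in any order) to CRS using a counting sort on rows.
CRSMatrix triplets_to_crs(const std::vector<Triplet> &nonzeros, const std::size_t rows) {
    CRSMatrix A;
    A.rows = rows;
    A.row_start.assign(rows + 1, 0);
    A.col_index.resize(nonzeros.size());
    A.values.resize(nonzeros.size());
    // Count the nonzeros per row, stored one position to the right.
    for (std::size_t k = 0; k < nonzeros.size(); ++k) {
        ++A.row_start[nonzeros[k].row + 1];
    }
    // A prefix sum turns the counts into row start offsets.
    for (std::size_t i = 0; i < rows; ++i) {
        A.row_start[i + 1] += A.row_start[i];
    }
    // Scatter each nonzero into the next free slot of its row.
    std::vector<std::size_t> next(A.row_start.begin(), A.row_start.end() - 1);
    for (std::size_t k = 0; k < nonzeros.size(); ++k) {
        const std::size_t pos = next[nonzeros[k].row]++;
        A.col_index[pos] = nonzeros[k].col;
        A.values[pos] = nonzeros[k].value;
    }
    return A;
}

// Computes y = A x, one row at a time.
std::vector<double> crs_multiply(const CRSMatrix &A, const std::vector<double> &x) {
    std::vector<double> y(A.rows, 0.0);
    for (std::size_t i = 0; i < A.rows; ++i) {
        for (std::size_t k = A.row_start[i]; k < A.row_start[i + 1]; ++k) {
            y[i] += A.values[k] * x[A.col_index[k]];
        }
    }
    return y;
}
```

The other schemes in the list change the order in which nonzeros are stored and how their indices are encoded; the arithmetic per nonzero stays the same.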

- Block CO-H+ (similar to Block Hilbert, but parallelised; see Yzelman & Roose, 2012),
- Row-distributed block CO-H (similar to Block Hilbert, but with explicit 1D partitioning; see Yzelman & Roose, 2012),
- Row-distributed Hilbert (as the above scheme, but without sparse blocking),
- OpenMP CRS (implicit 1D fine-grained parallelisation using OpenMP),
- Row-distributed Hilbert-compressed block CO-H (only stores a delta array based on 1D Hilbert coordinates, which are unpacked during SpMV multiplication).
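
Several of the schemes above order nonzeros or blocks along a Hilbert curve. As a rough illustration of what a 1D Hilbert coordinate is, the sketch below uses the well-known iterative conversion from a 2D grid position to its position along the curve. The function name `hilbert_index` is hypothetical, and this is not claimed to be how the library computes or stores its coordinates.

```cpp
#include <cstddef>
#include <utility>

// Maps position (x, y) in a side-by-side grid to its index along a Hilbert
// curve. Assumes side is a power of two and x, y < side.
std::size_t hilbert_index(const std::size_t side, std::size_t x, std::size_t y) {
    std::size_t d = 0;
    for (std::size_t s = side / 2; s > 0; s /= 2) {
        // Determine which quadrant (at scale s) the point lies in.
        const std::size_t rx = (x & s) ? 1 : 0;
        const std::size_t ry = (y & s) ? 1 : 0;
        // Each quadrant visited earlier contributes s * s curve positions.
        d += s * s * ((3 * rx) ^ ry);
        // Rotate or flip so the sub-curve has the standard orientation.
        if (ry == 0) {
            if (rx == 1) {
                x = side - 1 - x;
                y = side - 1 - y;
            }
            std::swap(x, y);
        }
    }
    return d;
}
```

Sorting nonzeros by such an index keeps entries that are close in the matrix close in memory; a delta array, as in the last scheme, would then store differences between consecutive indices rather than the indices themselves.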

- Ax=y: Regular and in-place sparse matrix--dense vector multiplication.
- xA=y: Left-sided dense vector--sparse matrix multiplication.
- AX=Y (and XA=Y): Multiple right-hand side sparse matrix--dense vector multiplication (and its transpose operation). Here, X and Y are tall-skinny matrices, the width of which must be known at compile time.
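
A width that is known at compile time maps naturally onto a template parameter. The sketch below shows AX=Y for a CRS matrix under that assumption; `CrsData` and `multiply_multiple_rhs` are hypothetical names, the row-major layout of X and Y is a choice made for this example, and none of it reflects the library's actual interface.

```cpp
#include <cstddef>
#include <vector>

// Minimal CRS container for this sketch. Illustrative, not the library's type.
struct CrsData {
    std::size_t rows;
    std::vector<std::size_t> row_start; // rows + 1 offsets
    std::vector<std::size_t> col_index; // column of each nonzero
    std::vector<double> values;         // value of each nonzero
};

// Computes Y = A X, where X and Y are tall-skinny matrices stored row-major
// with WIDTH columns. WIDTH is a template parameter, so the innermost loop
// has a trip count the compiler knows and can unroll or vectorise.
template <std::size_t WIDTH>
void multiply_multiple_rhs(const CrsData &A,
                           const std::vector<double> &X,
                           std::vector<double> &Y) {
    Y.assign(A.rows * WIDTH, 0.0);
    for (std::size_t i = 0; i < A.rows; ++i) {
        for (std::size_t k = A.row_start[i]; k < A.row_start[i + 1]; ++k) {
            const double a = A.values[k];
            const std::size_t j = A.col_index[k];
            // Each nonzero is loaded once and reused for all WIDTH columns.
            for (std::size_t c = 0; c < WIDTH; ++c) {
                Y[i * WIDTH + c] += a * X[j * WIDTH + c];
            }
        }
    }
}
```

A call such as `multiply_multiple_rhs<2>(A, X, Y)` fixes the width at two right-hand sides.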

- A benchmarking application (micbench) written explicitly for the Intel Xeon Phi: employs the parallel Row-distributed block CO-H scheme, but uses the Vectorised BICRS scheme instead of Compressed BICRS as its block-level data storage. See Yzelman, Roose, & Meerbergen, 2014 for an overview of the employed techniques.
- Sequential cache-oblivious (CO) Separated Block Diagonal (SBD) sparse matrix--vector (SpMV) multiplication (sbdmv, works only with matrices in SBD form, see Yzelman & Bisseling, 2011).
- Parallel shared-memory SpMV multiplication using globally shared vectors, applying the CO-SBD method locally (McShared, implicit 2D distributed SBD-based scheme; see Yzelman, PhD dissertation, 2011).
- An explicitly 2D distributed CO-SBD scheme (McDMV; see Yzelman & Roose, 2014). Supersedes the older McDirect utility.

##### Warning when using the parallel schemes:

Most strategies read the `hardware.info` file in the current directory to get the number of threads to use. If this file does not exist, this number will be one by default. E.g., to use four threads, execute

`echo 4 > hardware.info` prior to starting your experiment.
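
The behaviour described above can be mimicked with a few lines of C++. This is a sketch only: `read_thread_count` is a hypothetical helper, not a function of the library, and falling back to one thread for unreadable or non-positive content is an assumption of this example (the text above only specifies the default for a missing file).

```cpp
#include <cstddef>
#include <fstream>

// Returns the thread count stored at the start of the given file.
// Falls back to 1 if the file is missing (as described above) or, as an
// assumption of this sketch, if it does not start with a positive integer.
std::size_t read_thread_count(const char *path) {
    std::ifstream in(path);
    long threads = 0;
    if (in >> threads && threads > 0) {
        return static_cast<std::size_t>(threads);
    }
    return 1;
}
```

After `echo 4 > hardware.info`, calling `read_thread_count("hardware.info")` from the same directory would return 4.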

### R-tree library:

The R-tree family of tree-like data structures is particularly suited to handling spatial data. The R-tree library aims to implement many R-tree variants in a generic fashion. See the SourceForge homepage for more details.

### Cache Simulator:

- Changelog
- v1.1.0: Library (linux), Source (ANSI C++), License: GPL v3
- Code documentation (html)
- Code documentation (pdf)

*cachegrind*, or have a look at PAPI).

### Audio and picture compression in MATLAB:

- MATLAB code
- Manual (in Dutch)

The main functions of the package are `sound_demo`, `basic_compression` and `jpg`. See their respective `.m` files for detailed usage. Any comments or additions are most welcome.

`jpg.m` was coded by Arno Swart.

### EMM 2 DMM:

- Source (C++)
- Executable (Linux x86_64)

Running `./emm2dmm test.emm` will produce `test.emm.mtx`, `test.emm.v` and `test.emm.u`.

### text2bin:

- Source (C++)
- Executable (Linux x86_64)

### tabularise:

- Source (C++)
- Executable (Linux x86_64)