For help with compiling the library or attached tools, type
'make help'. Changelog follows:

1.6.0
    -Added a vectorised BICRS scheme which generalises the earlier
     (sliced) ELLPACK, segmented-reduce based SpMVs, and Blocked CRS
     formats.
    -Added a wrapper for CUDA CuSparse Hybrid format (for NVIDIA GPUs
     only).
    -Added support for the Z=AX and Z=XA operations, with X, Z
     (tall/skinny) matrices. High-performance implementations are
     currently only available for the TS, CRS, ICRS, and FBICRS
     matrix classes.
    -The driver application now also reports compute speeds (flops
     and bandwidth).
    -Updated testing mechanism for all driver schemes; see
     `make test'.
    -Added an additional test utility for two-dimensional coordinate
     to one-dimensional Hilbert coordinate transformation; see
     `make test'.
    -Now properly allows for 32-bit index values.
    -Hilbert coordinate conversion now is independent of the machine
     native word length.
    -Fixed some GCC compiler warnings.
    -Updated documentation and references.
    -Tested on x86 (Intel Atom) and on x86_64 (Intel Sandy Bridge and
     Intel Westmere) using GCC 4.8.3.
    -Tested on x86_64 (Intel Westmere) using ICC 14.0.1.
    -Tested on Intel Xeon Phi using ICC 14.0.1.
    -Added the clang compiler to Makefile options, but this compiler
     is not (yet) fully supported; the driver application requires
     access to the OpenMP API which, as of yet, is not in the
     standard clang distribution.

1.5.2
    -Cleanly compiles with GCC versions 4.4.3 & 4.4.6.
    -Cleanly compiles with ICC version 12.1.3; see Makefile.
    -Added interface to Intel MKL; see Makefile.
    -Can now read simple text files containing CRS-encoded matrices.
    -Improved performance of McCRS on NUMA systems.
    -Bugfix in McCRS.
    -Added option to remove libNUMA dependencies.
    -RDBHilbert now can handle overpartitioning correctly.
    -Destructor of FBICRS is cleaner.
    -McDMV now uses spinlocks for barrier synchronisation.
    -`make all' now prints its active target to stdout.
    -Fixed some errors and omissions in documentation.

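Illustration for the 1.6.0 entries on Hilbert coordinate conversion:
the sketch below maps a two-dimensional coordinate to its
one-dimensional Hilbert index using fixed-width integer types, so the
result does not depend on the machine's native word length. It is the
well-known textbook bit-manipulation algorithm and is not taken from
this library; the name hilbert_index and the restriction to square
power-of-two grids are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper, not the library's own routine: maps (x, y) in an
// n-by-n grid to its position along the Hilbert curve.
// Assumes n is a power of two and at most 2^32, so s*s fits in 64 bits.
std::uint64_t hilbert_index(std::uint64_t n, std::uint64_t x, std::uint64_t y) {
    assert(n > 0 && (n & (n - 1)) == 0);
    assert(n <= (std::uint64_t(1) << 32));
    assert(x < n && y < n);
    std::uint64_t d = 0;
    // Walk from the coarsest quadrant level down to single cells.
    for (std::uint64_t s = n / 2; s > 0; s /= 2) {
        const std::uint64_t rx = (x & s) ? 1 : 0;
        const std::uint64_t ry = (y & s) ? 1 : 0;
        // Quadrant order along the curve: (0,0), (0,1), (1,1), (1,0).
        d += s * s * ((3 * rx) ^ ry);
        // Rotate/reflect so the sub-quadrant is traversed in canonical
        // orientation at the next level.
        if (ry == 0) {
            if (rx == 1) {
                x = n - 1 - x;
                y = n - 1 - y;
            }
            std::swap(x, y);
        }
    }
    return d;
}
```

Sorting nonzeroes by such an index is one way to obtain a Hilbert-curve
ordering; whether the library's utility uses the same orientation of
the curve is not stated here.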
1.5.1
    -Updated `make help' info.
    -Cleaned up McShared code, and removed experimental parts.
     McShared is now a default target.
    -Cleaned up McDMV code.
    -Fixed GCC 4.7 compatibility.
    -Cleaned up and updated Makefile.
    -Minor changes to mmio.cpp and mmio.h for better C++
     compatibility.
    -Added missing doxygen documentation.

1.5.0
    -Added McDMV: does a fully multi-threaded two-dimensional
     cache-oblivious SpMV. Requires Extended Matrix-Market (EMM)
     formatted pre-partitioned matrices, as for instance returned by
     the Mondriaan partitioner. It makes use of the CBICRS and HBICRS
     schemes of this library. HBICRS functions essentially the same
     way as it does for the sbdmv application, with the difference
     that it is applied only on a subset of the partitioned matrix.
     CBICRS is used for the parts of the matrix where inter-thread
     communication is required. The SpMV algorithm is executed in
     the Bulk Synchronous Parallel (BSP) style (fan-in, local SpMV,
     fan-out).
    -Added RDCSB; same as RDBHilbert, but instead of the Hilbert
     curve, CSB's implementation of the Morton curve is used.
     Requires CSB source code to be available in ./csb/; enable via
     the appropriate flag in driver.cpp and compile as usual.
    -Added RDBHilbert; a cache-oblivious parallel Hilbert-curve based
     SpMV scheme. Similar to BetaHilbert, but employs a global
     row-distribution before blocking or the Hilbert curve are used.
    -Added BetaHilbert; a fully cache-oblivious parallel
     Hilbert-curve based SpMV scheme.
    -Bugfix in ZZ-CRS.
    -Speedup for zax and zxa for ZZ-CRS.
    -Driver now uses CLOCK_MONOTONIC to obtain running times.
    -Driver now outputs max absolute error as well.
    -Driver now outputs MSE compared to SpMV with TS.
    -Driver now can take the number of repeats of SpMVs and complete
     experiment repetition separately, instead of it being fixed as
     the square root of a user-defined value.
    -BICRS now handles data more efficiently, can store up to two
     times larger matrices with the same index data types.
    -Bugfix in BlockOrderer: it now detects errors due to empty
     separator blocks. Previously the sbdmv and sbd2trp apps might
     result in erroneous nonzero removal.
    -Bugfix in driver: when requesting column-major CBICCS, it now
     actually returns just that (and not CBICRS).

1.4.0
    -Supports a new operation: z=xA, with z the output vector.
    -Added a CCS wrapper to easily transform CRS-classes into a CCS
     counterpart. Takes O(nnz) more construction time than a clean
     CCS implementation, but reduces maintenance (for me). Wrapper is
     accessible via the driver tool; scheme 1 is CRS, -1 is CCS;
     scheme 2 is ICRS, scheme -2 is ICCS, et cetera. Reading in is
     done on a transposed matrix and the zax operation translates to
     the zxa operation on that transposed matrix, and vice versa.
     Therefore, it does have some effect in case of `CCS' Hilbert TS
     (-6) but other than changing the build-up order, no performance
     improvement would be expected.
    -Added the sbd2trp utility, converting a matrix in Separated
     Block Diagonal (SBD) form into cache-obliviously ordered
     triplets. In the singly (1D) SBD case this corresponds to a CRS
     order. The doubly (2D) SBD form yields (in general) a non-CRS
     order, depending on the block order chosen. Cache-oblivious SpMV
     can then be executed by loading in the triplets using *plain*
     BICRS.
    -Added the Hilbert scheme, now efficiently using the BICRS
     storage as backing instead of the TS. Should improve over HTS
     significantly.
    -Added the sbdmv utility, which loads a 2D SBD matrix into HBICRS
     and benchmarks SpMV speed.
    -Fixed some HTS errors (regarding explicit zero removals).
    -Added tool to reorder matrix files to CRS order.
    -Added a hierarchical scheme (HBICRS), allowing entire data
     schemes, instead of nonzeroes, to be stored in a BICRS
     structure. When using the automatic build constructor, data
     structures are not offset to save construction time,
     effectively not using BICRS (all starting points of the
     sublevel structures are set to 0,0).
    -Resolved some compilation warnings.
    -Makefile fix.

1.3.1
    Documentation fixes.
    Fixed bug in driver timing code.

1.3.0
    The driver application now has an option to calculate an average
    running time. It now can also read in from binary triplet files.
    Added new format: Dense Diagonal matrix. Uses template parameters
    to specify the number of dense diagonals in the sparse matrix, as
    well as the offset of each such diagonal. This is done to ensure
    efficient code. This does mean the format is not dynamic and
    needs to be tailored for use on specific matrices; the DD_MATRIX
    format thus is not usable from the driver application.

1.2.0
    Makefile builds both a static & shared driver application.
    Added a command-line driver application for starting benchmarks.
    Debugged CRS scheme.
    Made sparse storage schemes more uniform by letting them be
    derived from a superclass SparseMatrix. Common functions:
        -m() gets number of matrix rows
        -n() gets number of matrix columns
        -mv(x) allocates new z, zeroes it, and calls zax(x,z)
        -zax(x,z) calculates z=Ax.
        -load(file) loads from a matrix market file (also in
         constructor).
    Library compilation now uses -DNDEBUG flags.
    Constructors now accept a matrix-market file as argument.

1.1.0
    Added new formats: ZZ-CRS, Bi-ICRS, Hilbert TS. See documentation.
    Various bugfixes.
    Now uses more standard interface calls for matrix-vector
    multiplication:
        double *y = A.mv(x); //with A a sparse matrix stored in one of
                             //the implemented schemes, x of
                             //appropriate size initialised, y
                             //uninitialised.
    or
        A.mv(x,y); //with x of appropriate size initialised, y of
                   //appropriate size initialised and its elements
                   //set to zero.
    NOTE: stepped down from this in V1.2.0 since method overloading
          within templates caused SparseMatrix::mv(x) to become
          invisible.

1.0.0
    Initial release of the Sparse Library. Supports the simple
    Triplet Scheme (TS), Compressed Row Storage (CRS), Incremental
    CRS, and Zig-Zag ICRS.
    The Triplet class contains functionality to load and save a
    vector of 'Triplet' objects (std::vector< Triplet >) in binary
    format.
    Also contains a utility (mm2cotrp) able to transform
    matrix-market files (.mtx) to binary format, with the order of
    nonzeros determined by using the Hilbert curve.
    Currently only supports matrix-vector multiplication
    y=Ax, y unallocated, A (in TS, CRS, ICRS, ZZ-ICRS format) and x
    given by:
        double *y = A.MV(x);
    or z=Ax, z pre-allocated, and pre-initialised to zero:
        A.zax(x,z);
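
To make the zax convention above concrete (z pre-allocated and
pre-initialised to zero), here is a minimal sketch of z=Ax for a matrix
in Compressed Row Storage. The struct CrsSketch and its field names are
assumptions made for illustration; they do not mirror the library's own
CRS class, which loads its data from matrix-market files.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative CRS container (hypothetical, not the library's class).
struct CrsSketch {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> row_start; // rows + 1 entries
    std::vector<std::size_t> col_ind;   // column index of each nonzero
    std::vector<double> values;         // value of each nonzero

    // Computes z = Ax by accumulating into z; the caller is assumed to
    // have set every element of z to zero beforehand.
    void zax(const std::vector<double>& x, std::vector<double>& z) const {
        assert(x.size() == cols && z.size() == rows);
        for (std::size_t i = 0; i < rows; ++i) {
            for (std::size_t k = row_start[i]; k < row_start[i + 1]; ++k) {
                z[i] += values[k] * x[col_ind[k]];
            }
        }
    }
};
```

Because the routine only accumulates, calling it with a z that is not
zeroed yields z + Ax rather than Ax, which is why the pre-initialisation
is part of the calling convention.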