For help with compiling the library or attached tools,
type 'make help'. Changelog follows:

1.6.0
	-Added a vectorised BICRS scheme which generalises
	 the earlier (sliced) ELLPACK, segmented-reduce based
	 SpMVs, and Blocked CRS formats.
	-Added a wrapper for CUDA CuSparse Hybrid format (for
	 NVIDIA GPUs only).
	-Added support for the Z=AX and Z=XA operations, with
	 X, Z (tall/skinny) matrices. High-performance
	 implementations are currently only available for the
	 TS, CRS, ICRS, and FBICRS matrix classes.
	-The driver application now also reports compute 
	 speeds (flops and bandwidth).
	-Updated testing mechanism for all driver schemes;
	 see `make test'.
	-Added an additional test utility for two-dimensional
	 coordinate to one-dimensional Hilbert coordinate
	 transformation; see `make test'.
	-Now properly allows for 32-bit index values.
	-Hilbert coordinate conversion now is independent of
	 the machine native word length.
	-Fixed some GCC compiler warnings.
	-Updated documentation and references.
	-Tested on x86 (Intel Atom) and on x86_64 (Intel
	 Sandy Bridge and Intel Westmere) using GCC 4.8.3.
	-Tested on x86_64 (Intel Westmere) using ICC 14.0.1.
	-Tested on Intel Xeon Phi using ICC 14.0.1.
	-Added the clang compiler to Makefile options, but
	 this compiler is not (yet) fully supported; the
	 driver application requires access to the OpenMP API
	 which, as of yet, is not in the standard clang
	 distribution.
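	As an illustration of the word-length independence noted
	above, the following is a minimal sketch of a 2D-to-1D
	Hilbert coordinate conversion that uses fixed-width 64-bit
	integers only. It is the standard iterative bit-manipulation
	algorithm, not the library's own implementation; the name
	hilbert_index_sketch and its signature are assumptions made
	for this example.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch only (hypothetical name, not the library's API).
// Maps (x, y) on a 2^order-by-2^order grid to its one-dimensional Hilbert
// index. All arithmetic uses std::uint64_t, so the result does not depend
// on the width of int or long on the host machine.
inline std::uint64_t hilbert_index_sketch(unsigned order,
                                          std::uint64_t x,
                                          std::uint64_t y) {
    assert(order >= 1 && order <= 32);
    const std::uint64_t n = std::uint64_t(1) << order; // grid side length
    assert(x < n && y < n);
    std::uint64_t d = 0;
    for (std::uint64_t s = n / 2; s > 0; s /= 2) {
        // Select the quadrant at the current level.
        const std::uint64_t rx = (x & s) ? 1 : 0;
        const std::uint64_t ry = (y & s) ? 1 : 0;
        d += s * s * ((3 * rx) ^ ry);
        // Rotate/flip so the sub-curve has the canonical orientation.
        if (ry == 0) {
            if (rx == 1) {
                x = n - 1 - x;
                y = n - 1 - y;
            }
            const std::uint64_t t = x;
            x = y;
            y = t;
        }
    }
    return d;
}
```

	With order = 2 (a 4x4 grid) the curve in this sketch starts
	at (0,0) and ends at (3,0).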

1.5.2
	-Cleanly compiles with GCC versions 4.4.3 & 4.4.6.
	-Cleanly compiles with ICC version 12.1.3; see Makefile.
	-Added interface to Intel MKL; see Makefile.
	-Can now read simple text files containing CRS-
	 encoded matrices.
	-Improved performance of McCRS on NUMA systems.
	-Bugfix in McCRS.
	-Added option to remove libNUMA dependencies.
	-RDBHilbert now can handle overpartitioning correctly.
	-Destructor of FBICRS is now cleaner.
	-McDMV now uses spinlocks for barrier synchronisation.
	-`make all' now prints its active target to stdout.
	-Fixed some errors and omissions in documentation.

1.5.1
	-Updated `make help' info.
	-Cleaned up McShared code, and removed experimental
	 parts. McShared is now a default target.
	-Cleaned up McDMV code.
	-Fixed GCC 4.7 compatibility.
	-Cleaned up and updated Makefile.
	-Minor changes to mmio.cpp and mmio.h for better C++
	 compatibility.
	-Added missing doxygen documentation.

1.5.0
	-Added McDMV: does a fully multi-threaded two-
	 dimensional cache-oblivious SpMV. Requires Extended
	 Matrix-Market (EMM) formatted pre-partitioned matrices,
	 as for instance returned by the Mondriaan partitioner.
	 It makes use of CBICRS and HBICRS schemes of this
	 library. HBICRS functions essentially the same way as
	 it does for the sbdmv application, with the difference
	 that it is applied only on a subset of the partitioned
	 matrix. CBICRS is used for the parts of the matrix
	 where inter-thread communication is required. The
	 SpMV algorithm is executed in the Bulk Synchronous
	 Parallel (BSP) style (fan-in, local SpMV, fan-out).
	-Added RDCSB; same as RDBHilbert, but uses CSB's
	 implementation of the Morton curve instead of the
	 Hilbert curve. Requires CSB source code to be
	 available in ./csb/; enable via the appropriate flag
	 in driver.cpp and compile as usual.
	-Added RDBHilbert; a cache-oblivious parallel
	 Hilbert-curve based SpMV scheme. Similar to
	 BetaHilbert, but employs a global row-distribution
	 before blocking or the Hilbert curve is applied.
	-Added BetaHilbert; a fully cache-oblivious parallel
	 Hilbert-curve based SpMV scheme.
	-Bugfix in ZZ-CRS.
	-Speedup for zax and zxa for ZZ-CRS.
	-Driver now uses CLOCK_MONOTONIC to obtain running times.
	-Driver now outputs max absolute error as well.
	-Driver now outputs MSE compared to SpMV with TS.
	-Driver now can take the number of repeats of SpMVs and
	 complete experiment repetitions separately, instead of
	 it being fixed as the square root of a user-defined value.
	-BICRS now handles data more efficiently and can store
	 matrices up to twice as large with the same index data types.
	-Bugfix in BlockOrderer: it now detects errors due to empty
	 separator blocks. Previously the sbdmv and sbd2trp apps 
	 might result in erroneous nonzero removal.
	-Bugfix in driver: when requesting column-major CBICCS, it
	 now actually returns just that (and not CBICRS).
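	The CLOCK_MONOTONIC timing mentioned above can be sketched as
	follows. This is a minimal example using the POSIX
	clock_gettime call; the helper names elapsed_seconds and
	time_average_sketch are assumptions for illustration and do
	not come from the driver source.

```cpp
#include <cassert>
#include <time.h> // POSIX: clock_gettime, CLOCK_MONOTONIC, timespec

// Hypothetical helper: seconds elapsed between two timespec samples.
inline double elapsed_seconds(const timespec &start, const timespec &stop) {
    return static_cast<double>(stop.tv_sec - start.tv_sec) +
           static_cast<double>(stop.tv_nsec - start.tv_nsec) * 1e-9;
}

// Hypothetical helper: runs `kernel` `repeats` times and returns the
// average wall-clock time per run. CLOCK_MONOTONIC is not affected by
// adjustments to the system date, which makes it suitable for benchmarks.
template <typename Kernel>
double time_average_sketch(Kernel kernel, unsigned repeats) {
    assert(repeats > 0);
    timespec start, stop;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned r = 0; r < repeats; ++r) {
        kernel();
    }
    clock_gettime(CLOCK_MONOTONIC, &stop);
    return elapsed_seconds(start, stop) / static_cast<double>(repeats);
}
```

	On older glibc versions, linking against librt (-lrt) may be
	required for clock_gettime.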

1.4.0
	-Supports a new operation: z=xA, with z the output vector.
	-Added a CCS wrapper to easily transform CRS-classes into a
	 CCS counterpart. Takes O(nnz) more construction time than
	 a clean CCS implementation, but reduces maintenance (for me).

	 Wrapper is accessible via the driver tool; scheme 1 is CRS,
	 -1 is CCS; scheme 2 is ICRS, scheme -2 is ICCS, et cetera.
	 Reading in is done on a transposed matrix and the zax
	 operation translates to the zxa operation on that transposed
	 matrix, and vice versa.
	 Therefore, it does have some effect in case of `CCS' Hilbert
	 TS (-6) but other than changing the build-up order, no 
	 performance improvement would be expected.
	-Added the sbd2trp utility, which converts a matrix in
	 Separated Block Diagonal (SBD) form into cache-obliviously ordered
	 triplets. In the singly (1D) SBD case this corresponds to a
	 CRS order. The doubly (2D) SBD form yields (in general) a 
	 non-CRS order, depending on the block order chosen. 
	 Cache-oblivious SpMV can then be executed by loading in the
	 triplets using *plain* BICRS.
	-Added the Hilbert scheme, now efficiently using the BICRS
	 storage as backing instead of the TS. Should improve over
	 HTS significantly.
	-Added the sbdmv utility, which loads a 2D SBD matrix into
	 HBICRS and benchmarks SpMV speed.
	-Fixed some HTS errors (regarding explicit zero removals).
	-Added tool to reorder matrix files to CRS order.
	-Added a hierarchical scheme (HBICRS), which allows storing
	 entire data schemes, instead of nonzeroes, in a BICRS
	 structure. When using the automatic build constructor,
	 data structures are not offset to save construction time,
	 effectively not using BICRS (all starting points of the
	 sublevel structures are set to 0,0).
	-Resolved some compilation warnings.
	-Makefile fix.
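	The CCS wrapper idea described above (read in the transposed
	matrix, then swap zax and zxa) can be sketched as follows.
	All type names here (TripletSketch, RowSchemeSketch,
	ColumnWrapperSketch) are hypothetical stand-ins and are not
	the library's classes; the row scheme simply iterates over
	triplets to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical triplet type for this sketch: (row, column, value).
struct TripletSketch { std::size_t i, j; double v; };

// Hypothetical row-oriented stand-in for a CRS-like class.
struct RowSchemeSketch {
    std::size_t rows, cols;
    std::vector<TripletSketch> nz;
    RowSchemeSketch(std::size_t m, std::size_t n,
                    const std::vector<TripletSketch> &t)
        : rows(m), cols(n), nz(t) {
        for (std::size_t k = 0; k < nz.size(); ++k) {
            assert(nz[k].i < rows && nz[k].j < cols);
        }
    }
    // z = Ax; z must hold `rows` entries, pre-initialised to zero.
    void zax(const double *x, double *z) const {
        for (std::size_t k = 0; k < nz.size(); ++k)
            z[nz[k].i] += nz[k].v * x[nz[k].j];
    }
    // z = xA; z must hold `cols` entries, pre-initialised to zero.
    void zxa(const double *x, double *z) const {
        for (std::size_t k = 0; k < nz.size(); ++k)
            z[nz[k].j] += nz[k].v * x[nz[k].i];
    }
};

// Wrapper sketch: stores the transpose of A in a row scheme and swaps the
// two operations, since Ax = x(A^T) and xA = (A^T)x.
template <typename RowScheme>
class ColumnWrapperSketch {
    RowScheme transposed;
    static std::vector<TripletSketch> transpose(std::vector<TripletSketch> t) {
        for (std::size_t k = 0; k < t.size(); ++k) {
            const std::size_t tmp = t[k].i;
            t[k].i = t[k].j;
            t[k].j = tmp;
        }
        return t;
    }
public:
    ColumnWrapperSketch(std::size_t m, std::size_t n,
                        const std::vector<TripletSketch> &t)
        : transposed(n, m, transpose(t)) {}
    void zax(const double *x, double *z) const { transposed.zxa(x, z); }
    void zxa(const double *x, double *z) const { transposed.zax(x, z); }
};
```

	The O(nnz) extra construction cost noted above corresponds to
	the transpose pass in the wrapper's constructor.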

1.3.1
	Documentation fixes.
	Fixed bug in driver timing code.

1.3.0
	The driver application now has an option to calculate an average
	running time. It now can also read in from binary triplet files.

	Added new format: Dense Diagonal matrix. Uses template parameters
	to specify the number of dense diagonals in the sparse matrix, as
	well as the offset of each such diagonal. This is done to ensure 
	efficient code. This does mean the format is not dynamic and
	needs to be tailored for use on specific matrices; the DD_MATRIX
	format thus is not usable from the driver application.
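	To illustrate the idea of fixing the diagonals at compile
	time, here is a minimal sketch that takes the diagonal
	offsets as template parameters. It is not the DD_MATRIX
	implementation; the class name DenseDiagonalSketch, its
	storage layout, and the use of a C++11 parameter pack are all
	assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: a square matrix consisting of dense diagonals whose
// offsets are compile-time constants (0 = main diagonal, +k = k-th
// superdiagonal, -k = k-th subdiagonal).
template <long... Offsets>
struct DenseDiagonalSketch {
    static_assert(sizeof...(Offsets) > 0, "at least one diagonal is required");

    std::size_t n;                          // matrix is n-by-n
    std::vector<std::vector<double> > diag; // diag[d][i] holds A(i, i + offset_d)

    explicit DenseDiagonalSketch(std::size_t size)
        : n(size), diag(sizeof...(Offsets), std::vector<double>(size, 0.0)) {
        assert(size > 0);
    }

    // z = Ax; z must hold n entries, pre-initialised to zero.
    void zax(const double *x, double *z) const {
        const long offsets[] = {Offsets...};
        for (std::size_t d = 0; d < sizeof...(Offsets); ++d) {
            for (std::size_t i = 0; i < n; ++i) {
                const long j = static_cast<long>(i) + offsets[d];
                // Entries that fall outside the matrix are skipped.
                if (j >= 0 && j < static_cast<long>(n)) {
                    z[i] += diag[d][i] * x[j];
                }
            }
        }
    }
};
```

	Because the offsets are part of the type, a tridiagonal
	matrix and a pentadiagonal matrix are different types in this
	sketch, which is the kind of restriction that prevents
	selecting such a format at run time.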

1.2.0
	Makefile builds both a static & shared driver application.
	Added a command-line driver application for starting benchmarks.
	Debugged CRS scheme.
	Made sparse storage schemes more uniform by letting them be derived
	from a superclass SparseMatrix. Common functions:
		-m()		gets number of matrix rows
		-n()		gets number of matrix columns
		-mv(x)  	allocates new z, zeroes it, and calls zax(x,z)
		-zax(x,z)	calculates z=Ax.
		-load(file)	loads from a matrix market file
				(also in constructor).
	Library compilation now uses -DNDEBUG flags.
	Constructors now accept a matrix-market file as argument.

1.1.0
	Added new formats: ZZ-CRS, Bi-ICRS, Hilbert TS. See documentation.
	Various bugfixes.
	Now uses more standard interface calls for matrix-vector multiplication:
		double *y = A.mv(x); //with A a sparse matrix stored in one of the
		                     //implemented schemes, x of appropriate size
		                     //initialised, y uninitialised.
	or
		A.mv(x,y); //with x of appropriate size initialised, y of appropriate
		           //size initialised and its elements set to zero.
			   
	NOTE: 	stepped down from this in V1.2.0 since method overloading within
		templates caused SparseMatrix::mv(x) to become invisible.
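	One plausible reading of the note above is ordinary C++ name
	hiding: declaring any mv in a derived class hides all mv
	overloads of the base class. The sketch below reproduces that
	situation and shows one common remedy, a using-declaration.
	Whether this matches the exact cause in the library is an
	assumption; BaseSketch and ScaledIdentitySketch are
	hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical base class mirroring the mv(x) / zax(x,z) split.
struct BaseSketch {
    virtual ~BaseSketch() {}
    virtual std::size_t m() const = 0;
    virtual void zax(const double *x, double *z) const = 0;
    // Allocates a zeroed z of length m() and computes z = Ax.
    // The caller owns the returned array and must delete[] it.
    double *mv(const double *x) const {
        double *z = new double[m()]();
        zax(x, z);
        return z;
    }
};

// Hypothetical derived scheme: alpha times the identity matrix.
struct ScaledIdentitySketch : BaseSketch {
    std::size_t size;
    double alpha;
    ScaledIdentitySketch(std::size_t n, double a) : size(n), alpha(a) {
        assert(n > 0);
    }
    std::size_t m() const { return size; }
    void zax(const double *x, double *z) const {
        for (std::size_t i = 0; i < size; ++i) z[i] += alpha * x[i];
    }
    // This two-argument overload hides BaseSketch::mv(x) ...
    void mv(const double *x, double *z) const { zax(x, z); }
    // ... unless the base overloads are re-exposed explicitly.
    using BaseSketch::mv;
};
```

	Removing the using-declaration from this sketch makes a call
	such as A.mv(x) on a ScaledIdentitySketch fail to compile.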

1.0.0
	Initial release of the Sparse Library. Supports the simple Triplet Scheme (TS), 
	Compressed Row Storage (CRS), Incremental CRS, and Zig-Zag ICRS.
	The Triplet class contains functionality to load and save a vector of 'Triplet' 
	objects (std::vector< Triplet >) in binary format. Also contains a utility 
	(mm2cotrp) able to transform matrix-market files (.mtx) to binary format, 
	with the order of nonzeros determined by using the Hilbert curve.
	Currently only supports matrix-vector multiplication y=Ax, y unallocated, 
	A (in TS, CRS, ICRS, ZZ-ICRS format) and x given by:
		double *y = A.MV(x);
	or z=Ax, z pre-allocated, and pre-initialised to zero:
		A.zax(x,z);
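	For readers unfamiliar with the calling convention above, the
	following is a minimal sketch of a z=Ax kernel on a CRS-style
	structure that accumulates into a pre-allocated, zeroed z. It
	is the textbook CRS loop, not the library's CRS class; the
	name CrsSketch and its member names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CRS container (not the library's CRS class).
struct CrsSketch {
    std::size_t rows, cols;
    std::vector<std::size_t> row_start; // rows + 1 entries
    std::vector<std::size_t> col_ind;   // one entry per nonzero
    std::vector<double> val;            // one entry per nonzero

    // z = Ax; z must hold `rows` entries, pre-initialised to zero,
    // because the result is accumulated into z.
    void zax(const double *x, double *z) const {
        assert(row_start.size() == rows + 1);
        for (std::size_t i = 0; i < rows; ++i) {
            double sum = 0.0;
            for (std::size_t k = row_start[i]; k < row_start[i + 1]; ++k) {
                sum += val[k] * x[col_ind[k]];
            }
            z[i] += sum;
        }
    }
};
```

	Since the sketch accumulates into z, calling it twice without
	re-zeroing z yields 2Ax rather than Ax.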