SparseLibrary
Version 1.6.0
Fully parallel row-distributed SpMV, based on CSB (Morton curve + Cilk) and PThreads.
#include <RDScheme.hpp>
Public Member Functions | |
RDScheme (const std::string file, T zero) | |
Base constructor. | |
RDScheme (std::vector< Triplet< T > > &input, ULI m, ULI n, T zero) | |
Base constructor. Reads input from a set of triplets. | |
virtual | ~RDScheme () |
Base destructor. | |
void | wait () |
Lets the calling thread wait for the end of the SpMV multiply. | |
virtual void | load (std::vector< Triplet< T > > &input, const ULI m, const ULI n, const T zero) |
Loads a sparse matrix from an input set of triplets. | |
virtual T * | mv (const T *x) |
Overloaded mv call; allocates output vector using numa_interleaved. | |
virtual void | zxa (const T *x, T *z) |
virtual void | zxa (const T *x, T *z, const unsigned long int repeat) |
virtual void | zax (const T *x, T *z) |
See SparseMatrix::zax. | |
virtual void | zax (const T *x, T *z, const unsigned long int repeat, const clockid_t clock_id, double *elapsed_time) |
See SparseMatrix::zax. | |
virtual size_t | bytesUsed () |
virtual void | getFirstIndexPair (ULI &i, ULI &j) |
Function disabled for parallel schemes! | |
Public Member Functions inherited from SparseMatrix< T, ULI > | |
SparseMatrix () | |
Base constructor. | |
SparseMatrix (const ULI nzs, const ULI nr, const ULI nc, const T zero) | |
Base constructor. | |
virtual | ~SparseMatrix () |
Base destructor. | |
void | loadFromFile (const std::string file, const T zero=0) |
Loads a matrix from a Matrix Market file. | |
virtual unsigned long int | m () |
Queries the number of rows this matrix contains. | |
virtual unsigned long int | n () |
Queries the number of columns this matrix contains. | |
virtual unsigned long int | nzs () |
Queries the number of nonzeroes stored in this matrix. | |
virtual void | zax (const T *__restrict__ x, T *__restrict__ z)=0 |
In-place z=Ax function. | |
virtual void | zxa (const T *__restrict__ x, T *__restrict__ z)=0 |
In-place z=xA function. | |
Public Member Functions inherited from Matrix< T > | |
Matrix () | |
Base constructor. | |
virtual | ~Matrix () |
Base destructor. | |
virtual void | zax (const T *__restrict__ x, T *__restrict__ z, const size_t k, const clockid_t clock_id=0, double *elapsed_time=NULL) |
Wrapper function to call the zax kernel multiple times successively, while timing the duration of the operation. | |
template<size_t k> | |
void | ZaX (const T *__restrict__ const *__restrict__ const X, T *__restrict__ const *__restrict__ const Z) |
In-place Z=AX function, where A is m x n, Z is m x k, and X is n x k. | |
template<size_t k> | |
void | ZXa (const T *__restrict__ const *__restrict__ const X, T *__restrict__ const *__restrict__ const Z) |
In-place Z=XA function, where A is m x n, Z is k x n, and X is k x m. | |
virtual void | zxa (const T *__restrict__ x, T *__restrict__ z, const unsigned long int repeat, const clockid_t clock_id=0, double *elapsed_time=NULL) |
Wrapper function to call the zxa kernel multiple times successively, while timing the duration of the operation. | |
Static Public Member Functions | |
static void | end (pthread_mutex_t *mutex, pthread_cond_t *cond, size_t *sync, const size_t P) |
End synchronisation code. | |
static void | synchronise (pthread_mutex_t *mutex, pthread_cond_t *cond, size_t *sync, const size_t P) |
Synchronises all threads. | |
static void * | thread (void *data) |
SPMD code for each thread involved with parallel SpMV multiplication. | |
static void | collectY (RDScheme_shared_data< T > *shared) |
Reduces a distributed output vector set into a single contiguous output vector at process 0. | |
Protected Attributes | |
pthread_t * | threads |
pthreads associated with this data structure | |
RDScheme_shared_data< T > * | thread_data |
array of initial thread data | |
pthread_mutex_t | mutex |
Stop/continue mechanism: mutex. | |
pthread_cond_t | cond |
Stop/continue mechanism: condition. | |
pthread_mutex_t | end_mutex |
Wait for end mechanism: mutex. | |
pthread_cond_t | end_cond |
Wait for end mechanism: condition. | |
size_t | sync |
Used for synchronising threads. | |
size_t | end_sync |
Used for construction end signal. | |
Protected Attributes inherited from SparseMatrix< T, ULI > | |
ULI | nor |
Number of rows. | |
ULI | noc |
Number of columns. | |
ULI | nnz |
Number of non-zeros. | |
Static Protected Attributes | |
static size_t | P = 0 |
Number of threads to fire up. | |
static const T * | input = NULL |
Input vector. | |
static T * | output = NULL |
Output vector. | |
static clockid_t | global_clock_id = 0 |
Clock type used for thread-local timing. | |
Additional Inherited Members | |
Public Attributes inherited from SparseMatrix< T, ULI > | |
T | zero_element |
The element considered to be zero. | |
Fully parallel row-distributed SpMV, based on CSB (Morton curve + Cilk) and PThreads.
Inspired by Aydin & Gilbert's CSB, and comments by Patrick Amestoy on the BICRS Hilbert scheme.
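The row-distributed idea itself can be illustrated without CSB blocking or Morton ordering. The following is a minimal sketch, assuming contiguous row blocks per thread and a plain triplet list; the names Entry, Worker, spmv_rows and row_distributed_spmv are hypothetical and not part of SparseLibrary.

```cpp
#include <pthread.h>
#include <cstddef>
#include <vector>

// Hypothetical triplet type for this sketch (not the library's Triplet<T>).
struct Entry { std::size_t i, j; double v; };

// Per-thread work description: a contiguous block of rows [row_begin, row_end).
struct Worker {
    const std::vector<Entry> *A;  // all nonzeroes (shared, read-only)
    const double *x;              // input vector (shared, read-only)
    double *z;                    // output vector; each thread writes only its own rows
    std::size_t row_begin, row_end;
};

// SPMD body: a thread accumulates only the rows it owns, so no locking is needed.
// For brevity every thread scans all nonzeroes; a real scheme would pre-partition them.
static void *spmv_rows(void *arg) {
    Worker *w = static_cast<Worker *>(arg);
    for (const Entry &e : *w->A)
        if (e.i >= w->row_begin && e.i < w->row_end)
            w->z[e.i] += e.v * w->x[e.j];
    return nullptr;
}

// Computes z = Ax for a matrix with m rows, using P threads.
std::vector<double> row_distributed_spmv(const std::vector<Entry> &A,
                                         const std::vector<double> &x,
                                         std::size_t m, std::size_t P) {
    std::vector<double> z(m, 0.0);
    std::vector<pthread_t> threads(P);
    std::vector<Worker> work(P);
    for (std::size_t s = 0; s < P; ++s) {
        // Block distribution: thread s owns rows [s*m/P, (s+1)*m/P).
        work[s] = { &A, x.data(), z.data(), s * m / P, (s + 1) * m / P };
        pthread_create(&threads[s], nullptr, spmv_rows, &work[s]);
    }
    for (std::size_t s = 0; s < P; ++s)
        pthread_join(threads[s], nullptr);
    return z;
}
```

Because every output row has exactly one owner, the threads never write to the same element of z.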
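RDScheme< T, DS >::RDScheme (std::vector< Triplet< T > > &input, ULI m, ULI n, T zero) | inline |
Base constructor.
Reads input from a set of triplets.
References RDScheme< T, DS >::input, and RDScheme< T, DS >::load().
virtual RDScheme< T, DS >::~RDScheme () | inline virtual |
Base destructor.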
References RDScheme< T, DS >::cond, RDScheme< T, DS >::mutex, RDScheme< T, DS >::P, RDScheme< T, DS >::thread_data, and RDScheme< T, DS >::threads.
virtual size_t RDScheme< T, DS >::bytesUsed () | inline virtual |
Implements Matrix< T >.
References RDScheme< T, DS >::P, and RDScheme< T, DS >::thread_data.
static void RDScheme< T, DS >::collectY (RDScheme_shared_data< T > *shared) | inline static |
Reduces a distributed output vector set into a single contiguous output vector at process 0.
shared | Which set of output vectors to reduce. |
References RDScheme_shared_data< T >::id, RDScheme_shared_data< T >::local_y, RDScheme_shared_data< T >::output_vector_offset, and RDScheme_shared_data< T >::output_vector_size.
Referenced by RDScheme< T, DS >::thread().
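The fields referenced above suggest how such a reduction could look. This is a minimal sketch, assuming each thread owns a disjoint, contiguous block of output rows so that collecting amounts to copying each block to its offset; LocalOutput and collect_y are hypothetical stand-ins, not the library's RDScheme_shared_data< T > or collectY.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the referenced fields of RDScheme_shared_data<T>.
struct LocalOutput {
    std::size_t id;                    // thread ID
    std::size_t output_vector_offset;  // first global row owned by this thread
    std::size_t output_vector_size;    // number of rows owned by this thread
    const double *local_y;             // this thread's part of the output
};

// Copies every thread-local block into its slot of the contiguous vector y.
// Assumes the blocks are disjoint; overlapping blocks would need a sum instead.
void collect_y(const std::vector<LocalOutput> &parts, double *y) {
    for (const LocalOutput &p : parts)
        std::copy(p.local_y, p.local_y + p.output_vector_size,
                  y + p.output_vector_offset);
}
```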
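static void RDScheme< T, DS >::end (pthread_mutex_t *mutex, pthread_cond_t *cond, size_t *sync, const size_t P) | inline static |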
End synchronisation code.
Referenced by RDScheme< T, DS >::thread().
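One common way to build such an end signal from a mutex, a condition variable and a counter is sketched below. It assumes that each worker reports completion once and that a controlling thread blocks until all P reports have arrived; EndSignal, signal_end and wait_for_end are hypothetical names, and the library's own end() and wait() may differ.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical end-signal state, mirroring end_mutex, end_cond and end_sync.
struct EndSignal {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    std::size_t sync;  // number of workers that have reported completion
    EndSignal() : sync(0) {
        pthread_mutex_init(&mutex, nullptr);
        pthread_cond_init(&cond, nullptr);
    }
    ~EndSignal() {
        pthread_cond_destroy(&cond);
        pthread_mutex_destroy(&mutex);
    }
};

// Called by each worker when its part is done; the last arrival wakes the waiter.
void signal_end(EndSignal &e, std::size_t P) {
    pthread_mutex_lock(&e.mutex);
    if (++e.sync == P)
        pthread_cond_signal(&e.cond);
    pthread_mutex_unlock(&e.mutex);
}

// Called by the controlling thread; returns once all P workers have reported.
void wait_for_end(EndSignal &e, std::size_t P) {
    pthread_mutex_lock(&e.mutex);
    while (e.sync < P)  // the loop also absorbs spurious wake-ups
        pthread_cond_wait(&e.cond, &e.mutex);
    e.sync = 0;         // reset for the next multiplication
    pthread_mutex_unlock(&e.mutex);
}
```

Checking the counter in a loop means the waiter does not block if all workers finished before it started waiting.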
virtual void RDScheme< T, DS >::getFirstIndexPair (ULI &i, ULI &j) | inline virtual |
Function disabled for parallel schemes!
Implements SparseMatrix< T, ULI >.
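virtual void RDScheme< T, DS >::load (std::vector< Triplet< T > > &input, const ULI m, const ULI n, const T zero) | inline virtual |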
Loads a sparse matrix from an input set of triplets.
input | The input set of triplets. |
m | The number of rows. |
n | The number of columns. |
zero | What constitutes a zero in this sparse matrix instance. |
Implements SparseMatrix< T, ULI >.
References RDScheme< T, DS >::cond, MachineInfo::cores(), RDScheme< T, DS >::end_cond, RDScheme< T, DS >::end_mutex, RDScheme< T, DS >::end_sync, MachineInfo::getInstance(), RDScheme< T, DS >::input, SparseMatrix< T, ULI >::m(), RDScheme< T, DS >::mutex, SparseMatrix< T, ULI >::n(), SparseMatrix< T, ULI >::nnz, SparseMatrix< T, ULI >::noc, SparseMatrix< T, ULI >::nor, RDScheme< T, DS >::P, RDScheme< T, DS >::sync, RDScheme< T, DS >::thread(), RDScheme< T, DS >::thread_data, RDScheme< T, DS >::threads, RDScheme< T, DS >::wait(), and SparseMatrix< T, ULI >::zero_element.
Referenced by RDScheme< T, DS >::RDScheme().
virtual T * RDScheme< T, DS >::mv (const T *x) | inline virtual |
Overloaded mv call; allocates output vector using numa_interleaved.
Reimplemented from SparseMatrix< T, ULI >.
References SparseMatrix< T, ULI >::nor, RDScheme< T, DS >::zax(), and SparseMatrix< T, ULI >::zero_element.
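The references to nor, zero_element and zax() suggest the usual allocate, initialise and multiply pattern. A minimal sketch follows, assuming plain new[] in place of NUMA-interleaved allocation and an arbitrary callable as the in-place kernel; mv_sketch is a hypothetical helper, not the library's mv.

```cpp
#include <algorithm>
#include <cstddef>

// Allocates an output vector of `rows` entries, fills it with `zero`, and then
// runs the in-place kernel zax(x, z). The caller owns the returned array and
// must release it with delete[].
template <typename T, typename Kernel>
T *mv_sketch(const T *x, std::size_t rows, T zero, Kernel zax) {
    T *z = new T[rows];            // assumption: plain heap allocation, not NUMA-aware
    std::fill(z, z + rows, zero);  // in-place kernels accumulate into z, so clear it first
    zax(x, z);
    return z;
}
```

Initialising z matters because the in-place z=Ax kernels add to the existing contents of the output vector.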
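static void RDScheme< T, DS >::synchronise (pthread_mutex_t *mutex, pthread_cond_t *cond, size_t *sync, const size_t P) | inline static |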
Synchronises all threads.
Referenced by RDScheme< T, DS >::thread().
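A barrier over a mutex, a condition variable and an arrival counter can be sketched as follows. This is not the library's implementation: the generation field is an addition of this sketch to make the barrier reusable and robust against spurious wake-ups, and Barrier and barrier_wait are hypothetical names.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical barrier state; `generation` does not appear in the documented signature.
struct Barrier {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    std::size_t sync;        // threads that have arrived in the current round
    std::size_t generation;  // incremented each time a round completes
    Barrier() : sync(0), generation(0) {
        pthread_mutex_init(&mutex, nullptr);
        pthread_cond_init(&cond, nullptr);
    }
    ~Barrier() {
        pthread_cond_destroy(&cond);
        pthread_mutex_destroy(&mutex);
    }
};

// Blocks the caller until P threads have called barrier_wait in this round.
void barrier_wait(Barrier &b, std::size_t P) {
    pthread_mutex_lock(&b.mutex);
    const std::size_t my_generation = b.generation;
    if (++b.sync == P) {
        // Last arrival: reset the counter, start a new round, release everyone.
        b.sync = 0;
        ++b.generation;
        pthread_cond_broadcast(&b.cond);
    } else {
        // Wait until the round this thread arrived in has completed.
        while (b.generation == my_generation)
            pthread_cond_wait(&b.cond, &b.mutex);
    }
    pthread_mutex_unlock(&b.mutex);
}
```

Waiting on the generation rather than on the counter avoids a deadlock when a fast thread re-enters the barrier before a slow one has woken up.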
static void * RDScheme< T, DS >::thread (void *data) | inline static |
SPMD code for each thread involved with parallel SpMV multiplication.
References RDScheme_shared_data< T >::bytes, RDScheme< T, DS >::collectY(), RDScheme_shared_data< T >::cond, RDScheme< T, DS >::cond, RDScheme< T, DS >::end(), RDScheme_shared_data< T >::end_cond, RDScheme_shared_data< T >::end_mutex, RDScheme_shared_data< T >::end_sync, RDScheme< T, DS >::global_clock_id, RDScheme_shared_data< T >::id, RDScheme_shared_data< T >::local_y, SparseMatrix< T, ULI >::m(), RDScheme_shared_data< T >::mode, RDScheme_shared_data< T >::mutex, RDScheme< T, DS >::mutex, SparseMatrix< T, ULI >::n(), SparseMatrix< T, ULI >::nnz, RDScheme_shared_data< T >::nzb, RDScheme_shared_data< T >::original, RDScheme< T, DS >::output, RDScheme_shared_data< T >::output_vector_offset, RDScheme_shared_data< T >::output_vector_size, RDScheme_shared_data< T >::P, RDScheme< T, DS >::P, RDScheme_shared_data< T >::repeat, RDScheme_shared_data< T >::sync, RDScheme< T, DS >::synchronise(), and RDScheme_shared_data< T >::time.
Referenced by RDScheme< T, DS >::load().