ALP User Documentation 0.7.0
Algebraic Programming User Documentation
collectives< implementation > Class Template Reference

A static class defining various collective operations on scalars.

#include <collectives.hpp>

Static Public Member Functions

template<Descriptor descr = descriptors::no_operation, typename Operator , typename IOType >
static RC allreduce (IOType &inout, const Operator op=Operator())
 Schedules an allreduce operation of a single object of type IOType per process.
 
template<typename IOType >
static RC broadcast (IOType &inout, const size_t root=0)
 Schedules a broadcast operation of a single object of type IOType per process.
 
template<Descriptor descr = descriptors::no_operation, typename IOType >
static RC broadcast (IOType *inout, const size_t size, const size_t root=0)
 Broadcast on an array of IOType.
 
template<Descriptor descr = descriptors::no_operation, typename Operator , typename IOType >
static RC reduce (IOType &inout, const size_t root=0, const Operator op=Operator())
 Schedules a reduce operation of a single object of type IOType per process.
 

Detailed Description

template<enum Backend implementation>
class grb::collectives< implementation >

A static class defining various collective operations on scalars.

This class is templated in terms of the implemented backends; each implementation provides its own mechanisms to handle collective communications. These collectives are required for users employing grb::eWiseLambda, or for users who perform explicit SPMD programming.

Member Function Documentation

◆ allreduce()

static RC allreduce ( IOType &  inout,
const Operator  op = Operator() 
)
inline static

Schedules an allreduce operation of a single object of type IOType per process.

The allreduce shall be complete by the end of the call. This is a collective GraphBLAS operation. After the collective call finishes, each user process will have the allreduced value available locally.

Since this is a collective call, there are P values inout spread over all user processes. Let these values be denoted by \( x_s \), with \( s \in \{ 0, 1, \ldots, P-1 \}, \) such that \( x_s \) equals the argument inout on input at the user process with ID s. Let \( \pi:\ \{ 0, 1, \ldots, P-1 \} \to \{ 0, 1, \ldots, P-1 \} \) be a bijection, some unknown permutation of the process IDs. This permutation must be fixed for any given combination of GraphBLAS implementation and value P. Let the binary operator op be denoted by \( \odot \).

This function computes \( \odot_{i=0}^{P-1} x_{\pi(i)} \) and writes the exact same result to inout at each of the P user processes.

In summary, this means 1) this operation is coherent across all processes and produces bit-wise equivalent output on all user processes, and 2) the result is reproducible across different runs using the same input and P. Yet it does not mean that the order of reduction is prescribed.

Since each user process supplies but one value, there is no difference between a reduce-to-the-left versus a reduce-to-the-right (see grb::reducel and grb::reducer).
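
The definition above can be illustrated with a small stand-alone model. The sketch below is not ALP code and makes no calls into the library: the P user processes are simulated by the entries of a std::vector, and the names model_allreduce and example_allreduce, as well as the chosen permutation, are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of the allreduce contract (NOT the ALP implementation).
// values[ s ] plays the role of inout at process s; pi is the fixed permutation.
template< typename T, typename Op >
void model_allreduce(
    std::vector< T > &values,
    const std::vector< std::size_t > &pi,
    Op op
) {
    assert( !values.empty() && values.size() == pi.size() );
    // fold x_{pi(0)}, x_{pi(1)}, ..., x_{pi(P-1)} using op
    T result = values[ pi[ 0 ] ];
    for( std::size_t i = 1; i < pi.size(); ++i ) {
        result = op( result, values[ pi[ i ] ] );
    }
    // every simulated process receives the exact same result
    for( T &x : values ) {
        x = result;
    }
}

// Example with P = 4 and string concatenation, which is associative but not
// commutative, so that the effect of the permutation is visible.
inline std::vector< std::string > example_allreduce() {
    std::vector< std::string > values = { "a", "b", "c", "d" };
    const std::vector< std::size_t > pi = { 2, 0, 3, 1 }; // assumed permutation
    model_allreduce( values, pi, std::plus< std::string >() );
    return values;
}
```

With the assumed permutation (2, 0, 3, 1), every entry returned by example_allreduce holds "cadb". A different but equally fixed permutation would yield a different string, yet still the same one at every simulated process.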

Template Parameters
descr     The GraphBLAS descriptor. Default is grb::descriptors::no_operation.
Operator  Which operator to use for reduction.
IOType    The type of the to-be-reduced value.
Parameters
[in,out]  inout  On input: the value at the calling process to be reduced. On output: the reduced value.
[in]      op     The associative operator to reduce by.
Note
If op is commutative, the implementation is free to employ a different allreduce algorithm, as long as it is documented well enough so that its cost can be quantified.
Returns
grb::SUCCESS When the operation succeeds as planned.
grb::PANIC When the communication layer unexpectedly fails. When this error code is returned, the library enters an undefined state.
Valid descriptors:
  1. grb::descriptors::no_operation
  2. grb::descriptors::no_casting
Any other descriptors will be ignored.
Performance semantics:
  1. Problem size N: \( P * \mathit{sizeof}(\mathit{IOType}) \)
  2. local work: \( N*Operator \) ;
  3. transferred bytes: \( N \) ;
  4. BSP cost: \( Ng + N*Operator + l \);
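
To make the cost expression concrete, the following sketch evaluates it for given machine parameters. It assumes the usual BSP reading in which g is the cost per transferred byte and l is the latency of one synchronisation, and it treats the operator cost as a per-byte constant. The struct BspMachine and the function allreduce_bsp_cost are hypothetical helpers and are not part of ALP.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical BSP machine parameters (illustrative names, not an ALP type).
struct BspMachine {
    double g; // assumed: cost per transferred byte
    double l; // assumed: latency cost of one synchronisation
};

// Evaluates N*g + N*Operator + l, with N = P * sizeof( IOType ).
// op_cost is the assumed cost of applying the operator, per byte of N.
template< typename IOType >
double allreduce_bsp_cost(
    const std::size_t P,
    const BspMachine &machine,
    const double op_cost
) {
    const double N = static_cast< double >( P * sizeof( IOType ) );
    return N * machine.g + N * op_cost + machine.l;
}
```

For example, with P = 4 processes, an 8-byte IOType, g = 2, l = 100, and an operator cost of 0.5, the problem size is N = 32 and the modelled cost is 64 + 16 + 100 = 180.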

◆ broadcast() [1/2]

static RC broadcast ( IOType &  inout,
const size_t  root = 0 
)
inline static

Schedules a broadcast operation of a single object of type IOType per process.

The broadcast shall be complete by the end of the call. This is a collective GraphBLAS operation. The BSP costs are as for the PlatformBSP broadcast.

Template Parameters
IOType    The type of the to-be-broadcast value.
Parameters
[in,out]  inout  On input at process root: the value to be broadcast. On input at non-root processes: initial values are ignored. On output at process root: the input value remains unchanged. On output at non-root processes: the same value held at process ID root.
[in]      root   The user process which is to send out the given input value inout so that it becomes available at all P user processes. This value must be larger than or equal to zero and must be smaller than the total number of user processes P.
Returns
SUCCESS  On the successful completion of this function.
ILLEGAL  When root is larger than or equal to P. If this code is returned, it shall be as though the call to this function had never occurred.
PANIC    When the function fails and the library enters an undefined state.
Performance semantics
Backends should define performance semantics in terms of work and data movement, the latter both within and between user processes. Also the number of synchronisations between user processes must be quantified.
Backends furthermore must indicate whether system calls may occur during a call to this primitive, indicate whether additional dynamic memory may be allocated (and if so, when it is freed), and quantify the required work space.
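
The input/output contract and the ILLEGAL case documented for this function can be sketched with a stand-alone model. The code below is not ALP code: the P user processes are simulated by the entries of a std::vector, and BroadcastModelRC and model_broadcast are hypothetical names that only mirror the documented return codes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical mirror of the documented return codes (NOT the ALP RC type).
enum class BroadcastModelRC { SUCCESS, ILLEGAL };

// inout[ s ] plays the role of the argument inout at simulated process s.
template< typename IOType >
BroadcastModelRC model_broadcast(
    std::vector< IOType > &inout,
    const std::size_t root = 0
) {
    const std::size_t P = inout.size();
    if( root >= P ) {
        // as though the call had never occurred: no value is touched
        return BroadcastModelRC::ILLEGAL;
    }
    const IOType value = inout[ root ];
    for( std::size_t s = 0; s < P; ++s ) {
        inout[ s ] = value; // the root entry keeps its input value
    }
    return BroadcastModelRC::SUCCESS;
}
```

Calling model_broadcast on the values { 7, -1, -1 } with root 0 leaves { 7, 7, 7 }, while a root of 3 with only three simulated processes returns ILLEGAL and leaves all values as they were.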

◆ broadcast() [2/2]

static RC broadcast ( IOType *  inout,
const size_t  size,
const size_t  root = 0 
)
inline static

Broadcast on an array of IOType.

The documentation of the single-object broadcast above applies, with a payload of size times sizeof(IOType) bytes in place of a single object.
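
As a rough illustration of the array variant, the model below copies size elements from the root buffer into every other buffer and reports the payload in bytes. It is a sketch under stated assumptions rather than ALP code: each simulated process is represented by one raw pointer, and payload_bytes and model_array_broadcast are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Payload of one array broadcast: size times sizeof( IOType ) bytes.
template< typename IOType >
constexpr std::size_t payload_bytes( const std::size_t size ) {
    return size * sizeof( IOType );
}

// Hypothetical model (NOT ALP code): buffers[ s ] is the array passed as
// inout at simulated process s; each buffer must hold at least size elements.
// Returns false, without modifying any buffer, if root is out of range.
template< typename IOType >
bool model_array_broadcast(
    const std::vector< IOType * > &buffers,
    const std::size_t size,
    const std::size_t root = 0
) {
    if( root >= buffers.size() ) {
        return false;
    }
    for( std::size_t s = 0; s < buffers.size(); ++s ) {
        if( s != root ) {
            std::copy( buffers[ root ], buffers[ root ] + size, buffers[ s ] );
        }
    }
    return true;
}
```

Broadcasting three doubles, for instance, gives a payload of 3 * sizeof(double) bytes, and this quantity takes the place of the single-object size in the cost analysis.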

◆ reduce()

static RC reduce ( IOType &  inout,
const size_t  root = 0,
const Operator  op = Operator() 
)
inline static

Schedules a reduce operation of a single object of type IOType per process.

The reduce shall be complete by the end of the call. This is a collective GraphBLAS operation. The BSP costs are as for the PlatformBSP reduce.

Since this is a collective call, there are P values inout spread over all user processes. Let these values be denoted by \( x_s \), with \( s \in \{ 0, 1, \ldots, P-1 \}, \) such that \( x_s \) equals the argument inout on input at the user process with ID s. Let \( \pi:\ \{ 0, 1, \ldots, P-1 \} \to \{ 0, 1, \ldots, P-1 \} \) be a bijection, some unknown permutation of the process IDs. This permutation must be fixed for any given combination of GraphBLAS implementation and value P. Let the binary operator op be denoted by \( \odot \).

This function computes \( \odot_{i=0}^{P-1} x_{\pi(i)} \) and writes the result to inout at the user process with ID root.

In summary, this means the result is reproducible across different runs using the same input and P. Yet it does not mean that the order of reduction is prescribed.

Since each user process supplies but one value, there is no difference between a reduce-to-the-left versus a reduce-to-the-right (see grb::reducel and grb::reducer).
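
The difference with allreduce, namely that only the root receives the result, can be sketched with a stand-alone model. The code below is not ALP code: the P user processes are simulated by the entries of a std::vector, the identity permutation is assumed for \( \pi \), and ReduceModelRC and model_reduce are hypothetical names that only mirror the documented signature and return codes.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical mirror of the documented return codes (NOT the ALP RC type).
enum class ReduceModelRC { SUCCESS, ILLEGAL };

// inout[ s ] plays the role of the argument inout at simulated process s.
template< typename Operator, typename IOType >
ReduceModelRC model_reduce(
    std::vector< IOType > &inout,
    const std::size_t root = 0,
    const Operator op = Operator()
) {
    const std::size_t P = inout.size();
    if( root >= P ) {
        // as though the call was never made: no value is touched
        return ReduceModelRC::ILLEGAL;
    }
    // assumed permutation: the identity, i.e., pi( i ) = i
    IOType result = inout[ 0 ];
    for( std::size_t i = 1; i < P; ++i ) {
        result = op( result, inout[ i ] );
    }
    // only the root receives the reduced value; others keep their input
    inout[ root ] = result;
    return ReduceModelRC::SUCCESS;
}
```

Reducing the values { 1, 2, 3, 4 } under addition with root 2 leaves { 1, 2, 10, 4 }: the root holds the sum while all other simulated processes retain their inputs. A root of 4 with four simulated processes returns ILLEGAL and changes nothing.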

Template Parameters
descr     The GraphBLAS descriptor. Default is grb::descriptors::no_operation.
Operator  Which operator to use for reduction.
IOType    The type of the to-be-reduced value.
Parameters
[in,out]  inout  On input: the value at the calling process to be reduced. On output at process root: the reduced value. On output at non-root processes: same value as on input.
[in]      op     The associative operator to reduce by.
[in]      root   Which process should hold the reduced value. This number must be larger than or equal to zero, and must be strictly smaller than the number of user processes P.
Returns
SUCCESS  When the function completes successfully.
ILLEGAL  When root is larger than or equal to P. When this code is returned, the state of the GraphBLAS shall be as though this call was never made.
PANIC    When an unmitigable error within the GraphBLAS occurs. Upon returning this error, the GraphBLAS enters an undefined state.
Note
If op is commutative, the implementation is free to employ a different reduce algorithm, as long as the performance semantics are documented so that its cost can be quantified.
Performance semantics:
  1. Problem size N: \( P * \mathit{sizeof}(\mathit{IOType}) \)
  2. local work: \( N*Operator \) ;
  3. transferred bytes: \( N \) ;
  4. BSP cost: \( Ng + N*Operator + l \);

The documentation for this class was generated from the following file: