ALP User Documentation
0.8.preview
Algebraic Programming User Documentation
A static class defining various collective operations on scalars.
Static Public Member Functions
template<Descriptor descr = descriptors::no_operation, typename Operator, typename IOType>
static RC allreduce (IOType &inout, const Operator op=Operator())
Schedules an allreduce operation of a single object of type IOType per process.
template<typename IOType>
static RC broadcast (IOType &inout, const size_t root=0)
Schedules a broadcast operation of a single object of type IOType per process.
template<Descriptor descr = descriptors::no_operation, typename IOType>
static RC broadcast (IOType *inout, const size_t size, const size_t root=0)
Broadcast on an array of IOType.
template<Descriptor descr = descriptors::no_operation, typename Operator, typename IOType>
static RC reduce (IOType &inout, const size_t root=0, const Operator op=Operator())
Schedules a reduce operation of a single object of type IOType per process.
This class is templated in terms of the backends that are implemented; each implementation provides its own mechanisms to handle collective communications. These collectives are required for users who employ grb::eWiseLambda, or for users who perform explicit SPMD programming.
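template<Descriptor descr = descriptors::no_operation, typename Operator, typename IOType>
static RC allreduce (IOType &inout, const Operator op=Operator()) [inline, static]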
Schedules an allreduce operation of a single object of type IOType per process.
The allreduce shall be complete by the end of the call. This is a collective GraphBLAS operation. After the collective call finishes, the allreduced value is locally available at each user process.
Since this is a collective call, there are P values inout spread over all user processes. Let these values be denoted by \( x_s \), with \( s \in \{ 0, 1, \ldots, P-1 \}, \) such that \( x_s \) equals the argument inout on input at the user process with ID s. Let \( \pi:\ \{ 0, 1, \ldots, P-1 \} \to \{ 0, 1, \ldots, P-1 \} \) be a bijection, some unknown permutation of the process IDs. This permutation must be fixed for any given combination of GraphBLAS implementation and value P. Let the binary operator op be denoted by \( \odot \).
This function computes \( \odot_{i=0}^{P-1} x_{\pi(i)} \) and writes the exact same result to inout at each of the P user processes.
In summary, this means that 1) the operation is coherent across all processes and produces bit-wise equivalent output on all user processes, and 2) the result is reproducible across different runs that use the same input and the same P. It does not mean that the order of reduction is known to the user, nor that it is the same for other implementations or other values of P.
Since each user process supplies but one value, there is no difference between a reduce-to-the-left versus a reduce-to-the-right (see grb::reducel and grb::reducer).
descr | The GraphBLAS descriptor. Default is grb::descriptors::no_operation. |
Operator | Which operator to use for reduction. |
IOType | The type of the to-be reduced value. |
[in,out] | inout | On input: the value at the calling process to be reduced. On output: the reduced value. |
[in] | op | The associative operator to reduce by. |
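The semantics described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `model_allreduce` and the vector `pi` are hypothetical names introduced here, the P user processes are simulated by the entries of one `std::vector`, and no ALP header or backend is involved. A real implementation chooses \( \pi \) internally and does not expose it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone model of the allreduce semantics; NOT the ALP API.
// values[ s ] plays the role of the argument inout at the user process with ID s.
// pi models the fixed (but, in a real backend, unknown) permutation of process IDs.
template< typename IOType, typename Operator >
void model_allreduce(
	std::vector< IOType > &values,
	const std::vector< std::size_t > &pi,
	const Operator op
) {
	const std::size_t P = values.size();
	assert( P > 0 && pi.size() == P );
	// fold the inputs in the order x_{pi(0)}, x_{pi(1)}, ..., x_{pi(P-1)}
	IOType result = values[ pi[ 0 ] ];
	for( std::size_t i = 1; i < P; ++i ) {
		result = op( result, values[ pi[ i ] ] );
	}
	// every simulated process receives the exact same result
	for( std::size_t s = 0; s < P; ++s ) {
		values[ s ] = result;
	}
}
```

Because the model computes a single folded value and copies it to every entry, all simulated processes trivially hold bit-wise equal output. With an operator that is only approximately associative, such as addition over `double`, two different choices of `pi` can yield different results, which is why the text requires the permutation to be fixed per implementation and P.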
template<typename IOType>
static RC broadcast (IOType &inout, const size_t root=0) [inline, static]
Schedules a broadcast operation of a single object of type IOType per process.
The broadcast shall be complete by the end of the call. This is a collective GraphBLAS operation. The BSP costs are as for the PlatformBSP broadcast.
IOType | The type of the to-be broadcast value. |
[in,out] | inout | On input at process root: the value to be broadcast. On input at non-root processes: initial values are ignored. On output at process root: the input value remains unchanged. On output at non-root processes: the same value held at process ID root. |
[in] | root | The user process that sends out the given input value inout so that it becomes available at all P user processes. This value must be larger than or equal to zero and must be strictly smaller than the total number of user processes P. |
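template<Descriptor descr = descriptors::no_operation, typename IOType>
static RC broadcast (IOType *inout, const size_t size, const size_t root=0) [inline, static]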
Broadcast on an array of IOType.
The documentation of the scalar broadcast above applies, with the payload size sizeof(IOType) replaced by size times sizeof(IOType).
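The broadcast semantics for an array can be sketched with a stand-alone model. The names `model_broadcast` and `model_payload_bytes` are hypothetical and introduced only for illustration; the P user processes are simulated by the rows of a nested `std::vector`, and nothing here calls into ALP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone model of the array broadcast; NOT the ALP API.
// buffers[ s ] plays the role of the array inout at the user process with ID s.
template< typename IOType >
void model_broadcast(
	std::vector< std::vector< IOType > > &buffers,
	const std::size_t size,
	const std::size_t root = 0
) {
	const std::size_t P = buffers.size();
	assert( root < P ); // root must be a valid process ID
	assert( buffers[ root ].size() >= size );
	for( std::size_t s = 0; s < P; ++s ) {
		if( s == root ) {
			continue; // the input at root remains unchanged
		}
		assert( buffers[ s ].size() >= size );
		// initial contents at non-root processes are ignored and overwritten
		for( std::size_t i = 0; i < size; ++i ) {
			buffers[ s ][ i ] = buffers[ root ][ i ];
		}
	}
}

// Payload of one broadcast in bytes: size times sizeof( IOType ).
template< typename IOType >
constexpr std::size_t model_payload_bytes( const std::size_t size ) {
	return size * sizeof( IOType );
}
```

In this model the scalar broadcast corresponds to the special case `size == 1`.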
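template<Descriptor descr = descriptors::no_operation, typename Operator, typename IOType>
static RC reduce (IOType &inout, const size_t root=0, const Operator op=Operator()) [inline, static]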
Schedules a reduce operation of a single object of type IOType per process.
The reduce shall be complete by the end of the call. This is a collective GraphBLAS operation. The BSP costs are as for the PlatformBSP reduce.
Since this is a collective call, there are P values inout spread over all user processes. Let these values be denoted by \( x_s \), with \( s \in \{ 0, 1, \ldots, P-1 \}, \) such that \( x_s \) equals the argument inout on input at the user process with ID s. Let \( \pi:\ \{ 0, 1, \ldots, P-1 \} \to \{ 0, 1, \ldots, P-1 \} \) be a bijection, some unknown permutation of the process IDs. This permutation must be fixed for any given combination of GraphBLAS implementation and value P. Let the binary operator op be denoted by \( \odot \).
This function computes \( \odot_{i=0}^{P-1} x_{\pi(i)} \) and writes the result to inout at the user process with ID root.
In summary, the result is reproducible across different runs that use the same input and the same P. It does not mean that the order of reduction is known to the user, nor that it is the same for other implementations or other values of P.
Since each user process supplies but one value, there is no difference between a reduce-to-the-left versus a reduce-to-the-right (see grb::reducel and grb::reducer).
descr | The GraphBLAS descriptor. Default is grb::descriptors::no_operation. |
Operator | Which operator to use for reduction. |
IOType | The type of the to-be reduced value. |
[in,out] | inout | On input: the value at the calling process to be reduced. On output at process root: the reduced value. On output at non-root processes: same value as on input. |
[in] | op | The associative operator to reduce by. |
[in] | root | Which process should hold the reduced value. This number must be larger than or equal to zero, and must be strictly smaller than the number of user processes P. |
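The difference from an allreduce, namely that only the process with ID root receives the result, can be illustrated with a stand-alone model. As a sketch under stated assumptions: `model_reduce` and `pi` are hypothetical names, the P user processes are simulated by the entries of one `std::vector`, and no ALP backend is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone model of the reduce semantics; NOT the ALP API.
// values[ s ] plays the role of the argument inout at the user process with ID s.
// pi models the fixed (but, in a real backend, unknown) permutation of process IDs.
template< typename IOType, typename Operator >
void model_reduce(
	std::vector< IOType > &values,
	const std::vector< std::size_t > &pi,
	const std::size_t root,
	const Operator op
) {
	const std::size_t P = values.size();
	assert( P > 0 && pi.size() == P );
	assert( root < P ); // root must be a valid process ID
	// fold the inputs in the order x_{pi(0)}, x_{pi(1)}, ..., x_{pi(P-1)}
	IOType result = values[ pi[ 0 ] ];
	for( std::size_t i = 1; i < P; ++i ) {
		result = op( result, values[ pi[ i ] ] );
	}
	// only the root receives the result; all other entries keep their input
	values[ root ] = result;
}
```

After the call, only `values[ root ]` has changed, which mirrors the description of inout for non-root processes in the parameter table.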