Lightweight Parallel Foundations 1.0.1-alpha 2023-06-26T11:02:34Z
A high performance and model-compliant communication layer
Classes | Macros | Typedefs | Functions | Variables

Classes

struct  lpf_coll_t
 

Macros

#define _LPF_COLLECTIVES_VERSION   201500L
 

Typedefs

typedef void(* lpf_reducer_t) (size_t n, const void *array, void *value)
 
typedef void(* lpf_combiner_t) (size_t n, const void *combine, void *into)
 

Functions

lpf_err_t lpf_collectives_init (lpf_t ctx, lpf_pid_t s, lpf_pid_t p, size_t max_calls, size_t max_elem_size, size_t max_byte_size, lpf_coll_t *coll)
 
lpf_t lpf_collectives_get_context (lpf_coll_t coll)
 
lpf_err_t lpf_collectives_init_strided (lpf_t ctx, lpf_pid_t s, lpf_pid_t p, lpf_pid_t lo, lpf_pid_t hi, lpf_pid_t str, size_t max_calls, size_t max_elem_size, size_t max_byte_size, lpf_coll_t *coll)
 
lpf_err_t lpf_collectives_destroy (lpf_coll_t coll)
 
lpf_err_t lpf_broadcast (lpf_coll_t coll, lpf_memslot_t src, lpf_memslot_t dst, size_t size, lpf_pid_t root)
 
lpf_err_t lpf_gather (lpf_coll_t coll, lpf_memslot_t src, lpf_memslot_t dst, size_t size, lpf_pid_t root)
 
lpf_err_t lpf_scatter (lpf_coll_t coll, lpf_memslot_t src, lpf_memslot_t dst, size_t size, lpf_pid_t root)
 
lpf_err_t lpf_allgather (lpf_coll_t coll, lpf_memslot_t src, lpf_memslot_t dst, size_t size, bool exclude_myself)
 
lpf_err_t lpf_alltoall (lpf_coll_t coll, lpf_memslot_t src, lpf_memslot_t dst, size_t size)
 
lpf_err_t lpf_reduce (lpf_coll_t coll, void *restrict element, lpf_memslot_t element_slot, size_t size, lpf_reducer_t reducer, lpf_pid_t root)
 
lpf_err_t lpf_allreduce (lpf_coll_t coll, void *restrict element, lpf_memslot_t element_slot, size_t size, lpf_reducer_t reducer)
 
lpf_err_t lpf_combine (lpf_coll_t coll, void *restrict array, lpf_memslot_t slot, size_t num, size_t size, lpf_combiner_t combiner, lpf_pid_t root)
 
lpf_err_t lpf_allcombine (lpf_coll_t coll, void *restrict array, lpf_memslot_t slot, size_t num, size_t size, lpf_combiner_t combiner)
 

Variables

const lpf_coll_t LPF_INVALID_COLL
 

Detailed Description

Macro Definition Documentation

◆ _LPF_COLLECTIVES_VERSION

#define _LPF_COLLECTIVES_VERSION   201500L

The version of this collectives specification.

Typedef Documentation

◆ lpf_reducer_t

typedef void(* lpf_reducer_t) (size_t n, const void *array, void *value)

A type of a reducer used with lpf_reduce() and lpf_allreduce(). This function defines how an array of n elements should be reduced into a single value. The value is assumed to be initialised at function entry.

Parameters
[in] n  The number of elements in the array.
[in] array  The array to reduce.
[in,out] value  An initial reduction value into which all values in the array are reduced.

This is a user-defined function that the LPF collectives library makes callbacks to. There are no hard guarantees on the runtime incurred during the induced computation phases, nor are there any guarantees on correctness and failure mitigation.
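For illustration, a minimal sketch of such a reducer follows; it sums an array of doubles into the running value. The element type (double) and the function name sum_doubles are illustrative assumptions, not part of the API.

    #include <stddef.h>

    /* Illustrative reducer: sums n doubles from array into *value.
     * The value is assumed to be initialised at function entry. */
    static void sum_doubles( size_t n, const void * array, void * value )
    {
        const double * const in = (const double *) array;
        double * const out = (double *) value;
        for( size_t i = 0; i < n; ++i )
            *out += in[ i ];
    }

A pointer to such a function may then be passed as the reducer argument of lpf_reduce() or lpf_allreduce().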

◆ lpf_combiner_t

typedef void(* lpf_combiner_t) (size_t n, const void *combine, void *into)

A type of combiner used with lpf_combine() and lpf_allcombine(). This function defines how an array of n elements should be combined with an existing array of the same size. All arrays should be assumed to be fully initialised at function entry.

A lpf_combiner_t function need not be applied to the full input array at once; both arrays may be cut into smaller pieces by the LPF collectives library, which then calls the lpf_combiner_t function repeatedly on each of the corresponding pieces of both arrays.

For consistent results, the lpf_combiner_t should be associative and commutative. Other combiners should only be used with greater thought and a full understanding of the underlying collective algorithms, which, of course, are implementation dependent.

Warning
The parameter n is not a byte size unless the elements of the arrays happen to be a single byte large.
Parameters
[in] n  The number of elements in both the combine and into arrays.
[in] combine  The data to be combined into the destination array.
[in,out] into  An array with existing data into which the array combine is combined.
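For illustration, a minimal sketch of such a combiner follows; it adds two double arrays element by element, which is both associative and commutative. The element type and the name add_doubles are illustrative assumptions, not part of the API.

    #include <stddef.h>

    /* Illustrative combiner: element-wise addition of two double arrays.
     * Both arrays are fully initialised at entry; into is updated in place. */
    static void add_doubles( size_t n, const void * combine, void * into )
    {
        const double * const in = (const double *) combine;
        double * const out = (double *) into;
        for( size_t i = 0; i < n; ++i )
            out[ i ] += in[ i ];
    }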

Function Documentation

◆ lpf_collectives_init()

lpf_err_t lpf_collectives_init ( lpf_t  ctx,
lpf_pid_t  s,
lpf_pid_t  p,
size_t  max_calls,
size_t  max_elem_size,
size_t  max_byte_size,
lpf_coll_t * coll
)

Initialises a collectives struct, which allows the scheduling of collective calls. The initialised struct is only valid after a next call to lpf_sync().

All operations using the output argument coll need to be made collectively; i.e., all processes must make the same function call with possibly different arguments, as prescribed by the collective in question.

Parameters
[in,out] ctx  The LPF runtime state as provided by lpf_exec() or lpf_hook() via lpf_spmd_t.
[in] s  The unique ID of this process in this LPF SPMD run. Must be greater than or equal to zero and may not be greater than or equal to p.
[in] p  The number of processes involved with this LPF SPMD run. Must be greater than or equal to zero and may not be larger than the value provided by lpf_exec() or lpf_hook() via lpf_spmd_t.
[in] max_calls  The number of collective calls a single LPF process may make within a single superstep. Must be larger than zero.
[in] max_elem_size  The maximum number of bytes of a single element to be reduced through lpf_reduce or lpf_allreduce. Must be greater than or equal to zero.
[in] max_byte_size  The maximum number of bytes given as a size parameter to lpf_broadcast, lpf_scatter, lpf_gather, lpf_allgather, or lpf_alltoall. Must be greater than or equal to zero. For the lpf_combine and lpf_allcombine collectives, the effective byte size is \( \mathit{num} \cdot \mathit{size} \).
[out] coll  The collectives struct needed to perform collective operations using this library. After a successful call to this function, the returned lpf_coll_t value may be used immediately.

This implementation will register one memory slot on every call to this function.

Returns
LPF_ERR_OUT_OF_MEMORY When the requested buffers would cause the system to go out of memory. After returning with this error code, it shall be as though the call to this function had not occurred.
LPF_SUCCESS When the function executed successfully.
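The following hedged sketch shows a typical initialisation inside an LPF SPMD function. The core-API calls (lpf_resize_memory_register, lpf_resize_message_queue, lpf_sync with LPF_SYNC_DEFAULT), the SPMD function signature, and the header paths are assumptions about the LPF core API and may differ per LPF version; the buffer limits are arbitrary example values.

    #include <lpf/core.h>          /* assumed core-API header    */
    #include <lpf/collectives.h>   /* assumed collectives header */

    void spmd( lpf_t ctx, lpf_pid_t s, lpf_pid_t p, lpf_args_t args )
    {
        (void) args;
        lpf_coll_t coll = LPF_INVALID_COLL;

        /* Reserve registrations and messages up front (assumed core-API calls);
         * the collectives library itself registers one extra slot. The amounts
         * are example values chosen against the bounds quoted in this document. */
        lpf_resize_memory_register( ctx, 4 );
        lpf_resize_message_queue( ctx, 2 * p );
        lpf_sync( ctx, LPF_SYNC_DEFAULT );

        /* At most one collective call per superstep, reduction elements of at
         * most 8 bytes, payloads of at most 1024 bytes. */
        lpf_err_t rc = lpf_collectives_init( ctx, s, p, 1, 8, 1024, &coll );
        if( rc != LPF_SUCCESS )
            return;                /* e.g. LPF_ERR_OUT_OF_MEMORY */

        /* ... schedule collectives via coll, each followed by lpf_sync ... */

        lpf_collectives_destroy( coll );
    }

The usage sketches later in this document continue this skeleton and assume that ctx, coll, s, and p are in scope.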

◆ lpf_collectives_get_context()

lpf_t lpf_collectives_get_context ( lpf_coll_t  coll)

Returns the context embedded within a collectives instance.

Parameters
[in] coll  A valid collectives instance.
Returns
The LPF core API context with which coll was constructed.
See also
lpf_collectives_init
Warning
Calling this function using an invalid coll will result in undefined behaviour.

◆ lpf_collectives_init_strided()

lpf_err_t lpf_collectives_init_strided ( lpf_t  ctx,
lpf_pid_t  s,
lpf_pid_t  p,
lpf_pid_t  lo,
lpf_pid_t  hi,
lpf_pid_t  str,
size_t  max_calls,
size_t  max_elem_size,
size_t  max_byte_size,
lpf_coll_t * coll
)

Initialises a collectives struct, which allows the scheduling of collective calls. The initialised struct is only valid after a next call to lpf_sync(). This variant selects only a subset of processes based on a lower PID bound (inclusive), and upper bound (exclusive), and a stride.

All operations using the output argument coll need to be made collectively; i.e., all processes that are involved with coll must make the same function call with possibly different arguments, as prescribed by the collective in question.

Note
Thus not all processes need to make the same collective call. If, e.g., a range from 10 to 20 with stride 2 is used to construct coll then every collective call at PIDs 10, 12, 14, 16, 18 must be matched with a similar call at all other processes with PID in that same set.
Parameters
[in,out] ctx  The LPF runtime state as provided by lpf_exec() or lpf_hook().
[in] s  The unique ID of this process in this LPF SPMD run. Must be greater than or equal to zero and may not be greater than or equal to p. May not be smaller than lo.
[in] p  The number of processes involved with this LPF SPMD run. Must be greater than or equal to zero and may not be larger than the value provided by lpf_exec() or lpf_hook() via lpf_spmd_t.
[in] lo  The lowest process ID (inclusive) participating in these collectives. Must be greater than or equal to zero. May not be larger than p. May not be larger than hi.
[in] hi  The upper bound (exclusive) on the process IDs participating in these collectives. Must be greater than or equal to zero. May not be larger than p. May not be less than lo.
[in] str  The stride between participating process IDs. Must be greater than or equal to one.
[in] max_calls  The number of collective calls a single LPF process may make within a single superstep. Must be larger than zero.
[in] max_elem_size  The maximum number of bytes of a single element to be reduced through lpf_reduce or lpf_allreduce. May be equal to zero.
[in] max_byte_size  The maximum number of bytes given as a size parameter to lpf_broadcast, lpf_scatter, lpf_gather, lpf_allgather, or lpf_alltoall. May be equal to zero.
[out] coll  The collectives struct needed to perform collective operations using this library. After a successful call to this function, the returned lpf_coll_t value may be used immediately.

This implementation will register one memory slot on every call to this function.

Returns
LPF_ERR_OUT_OF_MEMORY When the requested buffers would cause the system to go out of memory. After returning with this error code, it shall be as though the call to this function had not occurred.
LPF_SUCCESS When the function executed successfully.
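A brief hedged sketch of selecting a subset of processes, here the even PIDs below p; it continues the skeleton above, and the buffer limits are arbitrary example values.

    /* Select PIDs 0, 2, 4, ... (< p): lo = 0, hi = p, stride = 2. */
    lpf_coll_t even_coll = LPF_INVALID_COLL;
    lpf_err_t rc = lpf_collectives_init_strided(
        ctx, s, p,
        0,            /* lo: lowest participating PID (inclusive) */
        p,            /* hi: upper bound on participating PIDs    */
        2,            /* str: every second PID                    */
        1, 8, 1024,   /* max_calls, max_elem_size, max_byte_size  */
        &even_coll );
    /* Only the selected PIDs may subsequently schedule collectives on
     * even_coll; they must do so collectively amongst themselves. */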

◆ lpf_collectives_destroy()

lpf_err_t lpf_collectives_destroy ( lpf_coll_t  coll)

Destroys a lpf_coll_t object created via a call to lpf_collectives_init() or lpf_collectives_init_strided().

This function may only be called on a successfully initialised parameter coll. The lpf_coll_t instance shall then become invalid immediately.

This is a collective call, conforming to the descriptions of lpf_coll_t, lpf_collectives_init, and lpf_collectives_init_strided.

Parameters
[in] coll  The collectives system to invalidate.

◆ lpf_broadcast()

lpf_err_t lpf_broadcast ( lpf_coll_t  coll,
lpf_memslot_t  src,
lpf_memslot_t  dst,
size_t  size,
lpf_pid_t  root 
)

Schedules a broadcast of a vector of size bytes. The broadcast shall be complete by the end of a next call to lpf_sync(). This is a collective call, meaning that if one LPF process calls this function, all LPF processes in the same SPMD section must make a matching call to this function.

The root process has size bytes to transmit, as uniquely identified by src and size.

For all \( i \in \{ 0, 1, \ldots, \mathit{size} - 1 \} \), after a next call to lpf_sync() all processes with a non-root ID will locally have that \( \mathit{dst}_{ i } \) equals \( \mathit{src}_{ i } \) at PID root.

Note
No more than \(\max( P+1, 2P - 3)\) messages have to be reserved in advance with lpf_resize_message_queue()
No memory areas will be written to on the process with PID root.
No supplied memory areas will be read from on processes with PID not equal to root. Internally, the collective state memory slot may be read from on all processes if a two-stage implementation is used.
Parameters
[in,out] coll  The LPF collectives state. This argument must match across processes in the same collective call to this function and must provide the same sized memory area of size bytes.
[in] src  At PID root, the memory slot of the source memory area to read from. This argument is only read by the process with PID equal to root; there, the memory slot must correspond to a valid memory area of at least size bytes. At processes with PID not equal to root, the memory area corresponding to this slot will not be touched and may thus have any size, including zero. The memory slot must be global; this argument must match across processes in the same collective call to this function.
[in] dst  At PIDs not equal to root, the destination memory area. The memory slot must correspond to a valid memory area of at least size bytes. At the process with PID root, the memory area corresponding to this slot will not be touched and may be of any size, including zero. The memory slot may be local or global. Arguments do not have to match across processes in the same collective call to this function.
[in] size  The number of bytes to broadcast. This argument must match across processes in the same collective call to this function.
[in] root  The PID of the root process of this operation. This argument must match across processes in the same collective call to this function.
Note
It is legal to have that src equals dst.
At PID root, the dst memory slot may equal LPF_INVALID_MEMSLOT.
Performance guarantees: serial
  1. Problem size N: \( size \)
  2. local work: \( 0 \) ;
  3. transferred bytes: \( NP \) ;
  4. BSP cost: \( NPg + l \);
Performance guarantees: two phase
  1. Problem size N: \( size \)
  2. local work: \( 0 \) ;
  3. transferred bytes: \( 2N \) ;
  4. BSP cost: \( 2(Ng + l) \);
Performance guarantees: two level tree
  1. Problem size N: \( size \)
  2. local work: \( 0 \) ;
  3. transferred bytes: \( 2\sqrt{P}N \) ;
  4. BSP cost: \( 2(\sqrt{P}Ng + l) \);
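A hedged usage sketch, continuing the skeleton above: the registration call lpf_register_global and the lpf_sync signature are assumptions about the LPF core API, and the buffer is an arbitrary example.

    double data[ 16 ];
    lpf_memslot_t slot = LPF_INVALID_MEMSLOT;
    const lpf_pid_t root = 0;

    if( s == root ) {
        /* ... fill data[] with the values to broadcast ... */
    }

    /* Assumed core-API registration; the broadcast source slot must be global. */
    lpf_register_global( ctx, data, sizeof( data ), &slot );
    lpf_sync( ctx, LPF_SYNC_DEFAULT );

    /* src may equal dst: the root's area is only read, the others' only written. */
    lpf_broadcast( coll, slot, slot, sizeof( data ), root );
    lpf_sync( ctx, LPF_SYNC_DEFAULT );
    /* data[] now matches the root's data[] on every process. */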

◆ lpf_gather()

lpf_err_t lpf_gather ( lpf_coll_t  coll,
lpf_memslot_t  src,
lpf_memslot_t  dst,
size_t  size,
lpf_pid_t  root 
)

Schedules a gather of a vector of size bytes. The gather shall be complete by the end of a next call to lpf_sync(). This is a collective operation, meaning that if one LPF process calls this function, all LPF processes in the same SPMD section must make a matching call to this function.

The root process will retrieve size bytes from each other process. The source memory areas are identified by src and size.

For all \( i \in \{ 0, 1, \ldots, \mathit{size} - 1 \} \) and for all \( k \in \{ 0, 1, \ldots, p-1 \},\ k \neq \mathit{root} \), after a next call to lpf_sync() the process with ID root will have that \( \mathit{dst}_{ k \cdot \mathit{size} + i } \) equals \( \mathit{src}_{ i } \) at PID \( k \), with \( p \) the number of processes registered in coll. The memory area of size bytes starting at dst plus root * size will not be touched at PID root.

Note
No more than \(P-1\) messages have to be reserved in advance with lpf_resize_message_queue()
No memory areas will be written to at process with PIDs not equal to root.
Recommended usage is that, at PID root, the local source data is already in place in dst at offset root * size bytes. Otherwise, the corresponding source data at PID root must be copied into dst manually so that the memory area at dst holds the fully gathered data.
Parameters
[in,out] coll  The LPF collectives state. This argument must match across processes in the same collective call to this function.
[in] src  The memory slot of the source memory area to read from. When the PID is not root, the memory area corresponding to src must be valid to read for at least size bytes. When the PID is root, the corresponding memory area will not be touched and can be of any size, including zero. The memory slot can be local or global. This argument may differ across processes in the same collective call to this function.
[in] dst  The memory slot of the destination memory area. At process ID root, the corresponding memory area must be valid and of size at least p * size bytes. At other processes, the corresponding memory area will not be touched and may be of any size. The memory slot must be global; this argument must match across processes in the same collective call to this function.
[in] size  The number of bytes at each source array. This argument must match across processes in the same collective call to this function.
[in] root  Which process is the root of this operation. This argument must match across processes in the same collective call to this function.
Note
It is legal to have that src equals dst.
At PID root, the src memory slot may equal LPF_INVALID_MEMSLOT.
Performance guarantees:
  1. Problem size N: \( P * size \)
  2. local work: \( 0 \) ;
  3. transferred bytes: \( N \) ;
  4. BSP cost: \( Ng + l \);
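A hedged usage sketch, continuing the skeleton above; each process contributes one double and the root collects them. lpf_register_global and lpf_sync are assumed core-API calls, and the root's own contribution is copied in manually as recommended above.

    double mine = (double) s;
    double * all = malloc( p * sizeof( double ) );  /* <stdlib.h>; only used at the root */
    lpf_memslot_t src_slot, dst_slot;
    const lpf_pid_t root = 0;

    lpf_register_global( ctx, &mine, sizeof( mine ), &src_slot );
    lpf_register_global( ctx, all, p * sizeof( double ), &dst_slot );
    lpf_sync( ctx, LPF_SYNC_DEFAULT );

    lpf_gather( coll, src_slot, dst_slot, sizeof( double ), root );
    if( s == root )
        all[ root ] = mine;   /* the root's own block is not written by the collective */
    lpf_sync( ctx, LPF_SYNC_DEFAULT );
    /* at the root, all[ k ] now holds PID k's value */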

◆ lpf_scatter()

lpf_err_t lpf_scatter ( lpf_coll_t  coll,
lpf_memslot_t  src,
lpf_memslot_t  dst,
size_t  size,
lpf_pid_t  root 
)

Schedules a scatter of a vector of size bytes. The operation shall be complete by the end of a next call to lpf_sync(). This is a collective operation, meaning that if one LPF process calls this function, all LPF processes in the same SPMD section must make a matching call to this function.

The root process will split a source memory area into segments of size bytes each, expecting \( p \) segments. The \( k \)-th segment is sent to process \( k \).

The \( k \)-th process, \( k \neq \mathit{root} \), shall, after the next call to lpf_sync, have that for all \( 0 \leq i < \mathit{size} \), \( \mathit{dst}_i \) equals \( \mathit{src}_{ k \cdot \mathit{size} + i } \) at PID root.

Note
No more than \(P-1\) messages have to be reserved in advance with lpf_resize_message_queue()
No memory shall be written to at the process with PID root.
No memory shall be read from at processes with PID other than root.
Parameters
[in,out] coll  The LPF collectives state. This argument must match across processes in the same collective call to this function.
[in] src  The memory slot of the source memory area to read from. When the PID equals root, this must point to a valid memory area of size at least \( p \cdot \mathit{size} \) bytes. For all other processes, the corresponding memory area will not be touched and may be of any size, including zero. The slot must be global; this argument must match across processes in the same collective call to this function.
[in] dst  The memory slot of the destination memory area. At processes with PID not equal to root, this must point to a valid memory area of at least size bytes. At PID root, the memory area is not touched and may be of any size, including zero. This argument may differ across processes in the same collective call to this function.
[in] size  The number of bytes that need to be scattered to a single process. The total number of bytes scattered, i.e., the size of the src memory area, equals \( p \cdot \mathit{size} \). This argument must match across processes in the same collective call to this function.
[in] root  Which process is the root of this operation. This argument must match across processes in the same collective call to this function.
Note
It is legal to have that src equals dst.
At PID root, the dst memory slot may equal LPF_INVALID_MEMSLOT.
Performance guarantees:
  1. Problem size N: \( size \)
  2. local work: \( 0 \) ;
  3. transferred bytes: \( N \) ;
  4. BSP cost: \( Ng + l \);

◆ lpf_allgather()

lpf_err_t lpf_allgather ( lpf_coll_t  coll,
lpf_memslot_t  src,
lpf_memslot_t  dst,
size_t  size,
bool  exclude_myself 
)

Schedules an allgather of a vector of size bytes. The operation shall be complete by the end of a next call to lpf_sync(). This is a collective operation, meaning that if one LPF process calls this function, all LPF processes in the same SPMD section must make a matching call to this function.

All processes locally have two memory areas; one of size bytes and another of p * size bytes. After the next call to lpf_sync, each process will have at the latter memory area a concatenation of all of the former memory areas from all processes. More formally:

At the end of the next call to lpf_sync, each process with its PID \( s \in \{ 0, 1, \ldots, p-1 \} \) has \( \forall i \in \{ 0, 1, \ldots, \mathit{size}-1 \} \) and \( \forall k \in \{ 0, 1, \ldots, p-1 \},\ k \neq s \) that \( \mathit{dst}_{ k \cdot \mathit{size} + i } \) local to PID \( s \) equals \( \mathit{src}_{ i } \) local to PID \( k \).

Note
No more than \(2*P\) messages have to be reserved in advance with lpf_resize_message_queue()
A process will not communicate data to itself unless exclude_myself is false (see below).

The induced communication pattern as defined above must never cause reads and writes to occur at the same memory location, or undefined behaviour will result.

Parameters
[in,out] coll  The LPF collectives state. This argument must match across processes in the same collective call to this function.
[in] src  The memory slot of the source memory area. This can be a local or global slot. The memory area must be at least size bytes large. This argument may differ across processes in the same collective call to this function. This argument must not equal dst.
[in] dst  The memory slot of the destination memory area. On all processes, this must correspond to a valid memory area of size at least p * size bytes. This must be a global slot; this argument must match across processes in the same collective call to this function.
[in] size  The number of bytes in a single source memory area. This argument must match across processes in the same collective call to this function.
[in] exclude_myself  Whether to skip this process itself in the collective communication.
Note
The memory area corresponding to src may overlap with that of dst within any single process. This is valid exactly when src points into dst at an offset of exactly s * size bytes, with s that process' PID, and src was registered with a length of exactly size bytes.
Warning
An implementation does not have to check for erroneously overlapping calls; any such call may be mitigated by an implementation, but will in general lead to undefined behaviour.
Performance guarantees:
  1. Problem size N: \( P * size \)
  2. local work: \( 0 \) ;
  3. transferred bytes: \( N \) ;
  4. BSP cost: \( Ng + l \);
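A hedged usage sketch, continuing the skeleton above; with exclude_myself set to false the collective also fills this process' own block. lpf_register_global and lpf_sync are assumed core-API calls.

    double mine[ 4 ];
    double * all = malloc( p * sizeof( mine ) );   /* <stdlib.h> */
    lpf_memslot_t src_slot, dst_slot;

    /* ... fill mine[] locally ... */
    lpf_register_global( ctx, mine, sizeof( mine ), &src_slot );
    lpf_register_global( ctx, all, p * sizeof( mine ), &dst_slot );
    lpf_sync( ctx, LPF_SYNC_DEFAULT );

    /* false: do not exclude myself (<stdbool.h>) */
    lpf_allgather( coll, src_slot, dst_slot, sizeof( mine ), false );
    lpf_sync( ctx, LPF_SYNC_DEFAULT );
    /* all[ 4*k .. 4*k+3 ] now holds PID k's mine[], for every k */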

◆ lpf_alltoall()

lpf_err_t lpf_alltoall ( lpf_coll_t  coll,
lpf_memslot_t  src,
lpf_memslot_t  dst,
size_t  size 
)

Schedules an all-to-all of a vector of size bytes. The operation shall be complete by the end of a next call to lpf_sync(). This is a collective operation, meaning that if one LPF process calls this function, all LPF processes in the same SPMD section must make a matching call to this function.

All processes locally have \( p \) elements of \( \mathit{size} \) bytes each. The elements will be transposed amongst all participating processes.

At the end of this operation, each process with its unique ID \( s \in \{ 0, 1, \ldots, p-1 \} \) has \( \forall i \in \{ 0, 1, \ldots, \mathit{size} - 1 \} \) and \( \forall k \in \{ 0, 1, \ldots, p - 1 \},\ k \neq s \) that \( \mathit{dst}_{ k \cdot \mathit{size} + i } \) local to PID \( s \) equals \( \mathit{src}_{ s \cdot \mathit{size} + i } \) local to PID \( k \).

It is illegal to have src equal to dst.

Note
The src memory area shall never be overwritten.
No more than \(2*P-2\) messages have to be reserved in advance with lpf_resize_message_queue()
A process shall never locally copy data from src to dst. To make dst a full transpose of the src rows of all processes, the diagonal has to be copied manually.

All arguments to this function must match across all processes in the collective call to this function.

Parameters
[in,out] coll  The LPF collectives state.
[in] src  The memory slot of the source memory area. This must correspond to a valid memory area of size at least \( p \cdot \mathit{size} \) bytes. This memory area must not overlap with that of dst. This must be a global memory slot.
[in] dst  The memory slot of the destination memory area to write to. On all processes, this must point to a valid memory area of size at least \( p \cdot \mathit{size} \) bytes. This memory area must not overlap with that of src. This must be a global memory slot.
[in] size  The number of bytes that each process sends to each other process. In total, each process sends \( (p-1) \cdot \mathit{size} \) bytes, and receives the same amount.
Warning
An implementation does not have to check for the use of overlapping memory areas, although it may use src and dst to do so. The user must make sure never to supply aliased memory regions. If src does equal dst, an implementation may have a mechanism to mitigate that error, but in general this will lead to undefined behaviour.
Performance guarantees:
  1. Problem size N: \( P * size \)
  2. local work: \( 0 \) ;
  3. transferred bytes: \( N \) ;
  4. BSP cost: \( Ng + l \);
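A hedged usage sketch, continuing the skeleton above; note the manual copy of the diagonal block, which the collective never communicates. lpf_register_global and lpf_sync are assumed core-API calls.

    const size_t blk = 4 * sizeof( double );   /* bytes sent to each peer */
    char * sendbuf = malloc( p * blk );        /* <stdlib.h>              */
    char * recvbuf = malloc( p * blk );
    lpf_memslot_t src_slot, dst_slot;

    /* ... fill sendbuf: block k holds the data destined for PID k ... */
    lpf_register_global( ctx, sendbuf, p * blk, &src_slot );
    lpf_register_global( ctx, recvbuf, p * blk, &dst_slot );
    lpf_sync( ctx, LPF_SYNC_DEFAULT );

    lpf_alltoall( coll, src_slot, dst_slot, blk );
    /* the diagonal is never communicated: copy it manually (<string.h>) */
    memcpy( recvbuf + s * blk, sendbuf + s * blk, blk );
    lpf_sync( ctx, LPF_SYNC_DEFAULT );
    /* block k of recvbuf now holds the block that PID k destined for this process */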

◆ lpf_reduce()

lpf_err_t lpf_reduce ( lpf_coll_t  coll,
void *restrict  element,
lpf_memslot_t  element_slot,
size_t  size,
lpf_reducer_t  reducer,
lpf_pid_t  root 
)

Schedules a reduce operation of one array per process. The reduce shall be completed by the end of a next call to lpf_sync(). This is a collective operation, meaning that if one LPF process calls this function, all LPF processes in the same SPMD section must make a matching call to this function.

At the end of the next lpf_sync, the memory area element points to shall equal the reduced value of all the element memory areas passed at function entry. This output value shall only be set at the root process. The reduction operator is user-defined through a lpf_reducer_t. Even if the same reducer function is used, different pointers may be passed across the various processes involved in the collective call; hence the reducer argument cannot be enforced to be the same everywhere.

Note
No more than \(P-1\) messages have to be reserved in advance with lpf_resize_message_queue()
Logically, reducer should point to the same function on all processes, or non-deterministic behaviour may result. Only advanced programmers and applications will be able to exploit deviations from this meaningfully.
Parameters
[in,out] coll  The LPF collectives state. This argument must match across processes in the same collective call to this function.
[in,out] element  At PID root, a pointer to a memory area of at least size bytes. At function entry, the memory area contains the element to be reduced; after a next call to lpf_sync, it will hold the globally reduced value. At PIDs not equal to root, the memory area at entry likewise contains that process' to-be-reduced value; at exit it will not have changed.
[in] element_slot  The lpf_memslot_t corresponding to element. The memory slot must be global, and must have registered size bytes starting from element.
[in] size  The size of a single element of the type to be reduced, in bytes. This argument must match across processes in the same collective call to this function.
[in] reducer  A function that defines the reduction operator. This argument may differ across processes in the same collective call to this function.
[in] root  The process ID of the root process in this collective. This argument must match across processes in the same collective call to this function.
Performance guarantees: allgather (N < P*P)
  1. Problem size N: \( P * size \)
  2. local work: \( P*reducer \) ;
  3. transferred bytes: \( N \) ;
  4. BSP cost: \( Ng + N*reducer + l \);
Performance guarantees: transpose, reduce and allgather (N >= P*P)
  1. Problem size N: \( P * size \)
  2. local work: \( (N/P)*reducer \) ;
  3. transferred bytes: \( 2(N/P) \) ;
  4. BSP cost: \( 2(N/P)g + (N/P)*reducer + 2l \);
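A hedged usage sketch, continuing the skeleton above and using the sum_doubles reducer sketched earlier; lpf_register_global and lpf_sync are assumed core-API calls.

    double x = 1.0;                 /* this process' contribution */
    lpf_memslot_t x_slot;
    const lpf_pid_t root = 0;

    lpf_register_global( ctx, &x, sizeof( x ), &x_slot );  /* must be a global slot */
    lpf_sync( ctx, LPF_SYNC_DEFAULT );

    lpf_reduce( coll, &x, x_slot, sizeof( x ), &sum_doubles, root );
    lpf_sync( ctx, LPF_SYNC_DEFAULT );
    /* at PID root, x now equals the sum over all processes; elsewhere x is unchanged */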

◆ lpf_allreduce()

lpf_err_t lpf_allreduce ( lpf_coll_t  coll,
void *restrict  element,
lpf_memslot_t  element_slot,
size_t  size,
lpf_reducer_t  reducer 
)

Schedules an allreduce operation of a single object per process. The allreduce shall be complete by the end of a next call to lpf_sync. This is a collective operation, meaning that if one LPF process calls this function, all LPF processes in the same SPMD section must make a matching call to this function.

At the end of the next lpf_sync, the memory area element points to shall equal the reduced value of all elements at all processes. The reduction operator is user-defined through a lpf_reducer_t. Even if the same reducer function is used, different pointers may be passed across the various processes involved in the collective call; hence the reducer argument cannot be enforced to be the same everywhere.

Note
No more than \(2*P-2\) messages have to be reserved in advance with lpf_resize_message_queue()
Logically, reducer should point to the same function on all processes, or non-deterministic behaviour may result. Only advanced programmers and applications will be able to exploit deviations from this meaningfully.
Parameters
[in,out] coll  The LPF collectives state. This argument must match across processes in the same collective call to this function.
[in,out] element  A pointer to a memory area of at least size bytes. At function entry, this memory area contains the local to-be-reduced value. After a next call to lpf_sync, the value of the memory area will equal the globally reduced value.
[in] element_slot  The lpf_memslot_t corresponding to element. The memory slot must be global, and must have registered size bytes starting from element.
[in] size  The size of a single element of the type to be reduced, in bytes. This argument must match across processes in the same collective call to this function.
[in] reducer  A function that defines the reduction operator. This argument may differ across processes in the same collective call to this function.
Performance guarantees: allgather (N < P*P)
  1. Problem size N: \( P * size \)
  2. local work: \( P*reducer \) ;
  3. transferred bytes: \( N \) ;
  4. BSP cost: \( Ng + N*reducer + l \);
Performance guarantees: transpose, reduce and allgather (N >= P*P)
  1. Problem size N: \( P * size \)
  2. local work: \( (N/P)*reducer \) ;
  3. transferred bytes: \( 2(N/P) \) ;
  4. BSP cost: \( 2(N/P)g + (N/P)*reducer + 2l \);

◆ lpf_combine()

lpf_err_t lpf_combine ( lpf_coll_t  coll,
void *restrict  array,
lpf_memslot_t  slot,
size_t  num,
size_t  size,
lpf_combiner_t  combiner,
lpf_pid_t  root 
)

Combines an array at all non-root processes into that of the root process.

The operation is guaranteed to be complete after a next call to lpf_sync.

On input, all processes must supply a valid data array. On output, the root process will have its output array equal to \( \mathit{array}^{(\mathit{root})} = \bigoplus_{k=0}^{p-1} \mathit{array}^{(k)} \), where \( \oplus \) is prescribed by combiner. The order in which the operator \( \oplus \) is applied is undefined. The operator \( \oplus \) may furthermore be applied in stages, when applied to a single set of input and output arrays. See lpf_combiner_t for more details.

All parameters must match across all processes involved with the same collective call to this function.

This implementation synchronises once before exiting.

Note
No more than \(2*P\) messages have to be reserved in advance with lpf_resize_message_queue()
Parameters
[in,out] coll  The LPF collectives state.
[in,out] array  The array to be combined. The array must point to a valid memory area of size \( \mathit{num} \cdot \mathit{size} \) bytes. The num elements in this array must be initialised and will be taken as input for the combiner. After completion of the collective, this array's contents at non-root processes will be undefined; on the root process, this array's contents will be the combination of all arrays, as prescribed by the combiner.
[in] slot  The memory slot corresponding to array. This must be a globally registered slot.
[in] num  The number of elements in the array.
[in] size  The size, in bytes, of a single element of the array.
[in] combiner  A function which may combine one or more elements of the appropriate array types. The combining happens element by element.
[in] root  Which process is the root of this collective operation.
Returns
LPF_SUCCESS When the collective communication request was recorded successfully.
Performance guarantees: allgather (N < P*P)
  1. Problem size N: \( P * num * size \)
  2. local work: \( N*Operator \) ;
  3. transferred bytes: \( N \) ;
  4. BSP cost: \( Ng + N*Operator + l \);
Performance guarantees: transpose, reduce and allgather (N >= P*P)
  1. Problem size N: \( P * num * size \)
  2. local work: \( (N/P)*Operator \) ;
  3. transferred bytes: \( 2(N/P) \) ;
  4. BSP cost: \( 2(N/P)g + (N/P)*Operator + 2l \);
Performance guarantees: two level tree
  1. Problem size N: \( P * num * size \)
  2. local work: \( 2(N/\sqrt{P})*Operator \) ;
  3. transferred bytes: \( 2(N/\sqrt{P}) \) ;
  4. BSP cost: \( 2(N/\sqrt{P})g + (N/\sqrt{P})*Operator + 2l \);
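A hedged usage sketch, continuing the skeleton above and using the add_doubles combiner sketched earlier; lpf_register_global and lpf_sync are assumed core-API calls.

    double v[ 8 ];
    lpf_memslot_t v_slot;
    const lpf_pid_t root = 0;

    /* ... fill v[] locally ... */
    lpf_register_global( ctx, v, sizeof( v ), &v_slot );  /* must be a global slot */
    lpf_sync( ctx, LPF_SYNC_DEFAULT );

    lpf_combine( coll, v, v_slot, 8, sizeof( double ), &add_doubles, root );
    lpf_sync( ctx, LPF_SYNC_DEFAULT );
    /* at PID root, v[ i ] now equals the element-wise sum over all processes */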

◆ lpf_allcombine()

lpf_err_t lpf_allcombine ( lpf_coll_t  coll,
void *restrict  array,
lpf_memslot_t  slot,
size_t  num,
size_t  size,
lpf_combiner_t  combiner 
)

Combines an array at all processes into one array that is broadcast to all processes.

The operation is guaranteed to be complete after a next call to lpf_sync.

On input, all processes must supply a valid data array. On output, all processes will have their output array equal to \( \bigoplus_{k=0}^{p-1} \mathit{array}^{(k)} \), where \( \oplus \) is prescribed by combiner. The order in which the operator \( \oplus \) is applied is undefined. The operator \( \oplus \) may furthermore be applied in stages, when applied to a single set of input and output arrays. See lpf_combiner_t for more details.

All parameters must match across all processes involved with the same collective call to this function.

Note
No more than \(2*P\) messages have to be reserved in advance with lpf_resize_message_queue()
Parameters
[in,out] coll  The LPF collectives state.
[in,out] array  The array to be combined. The array must point to a valid memory area of size \( \mathit{num} \cdot \mathit{size} \) bytes. The num elements in this array must be initialised and will be taken as input for the combiner. After completion of the collective, this array's contents on every process will be the combination of all arrays, as prescribed by the combiner.
[in] slot  The memory slot corresponding to array. This must be a globally registered slot.
[in] num  The number of elements in the array.
[in] size  The size, in bytes, of a single element of the array.
[in] combiner  A function which may combine one or more elements of the appropriate array types. The combining happens element by element.
Returns
LPF_SUCCESS When the collective communication request was recorded successfully.
Performance guarantees: allgather (N < P*P)
  1. Problem size N: \( P * num * size \)
  2. local work: \( N*Operator \) ;
  3. transferred bytes: \( N \) ;
  4. BSP cost: \( Ng + N*Operator + l \);
Performance guarantees: transpose, reduce and allgather (N >= P*P)
  1. Problem size N: \( P * num * size \)
  2. local work: \( (N/P)*Operator \) ;
  3. transferred bytes: \( 2(N/P) \) ;
  4. BSP cost: \( 2(N/P)g + (N/P)*Operator + 2l \);
Performance guarantees: two level tree
  1. Problem size N: \( P * num * size \)
  2. local work: \( 2(N/\sqrt{P})*Operator \) ;
  3. transferred bytes: \( 2(N/\sqrt{P}) \) ;
  4. BSP cost: \( 2(N/\sqrt{P})g + (N/\sqrt{P})*Operator + 2l \);

Variable Documentation

◆ LPF_INVALID_COLL

const lpf_coll_t LPF_INVALID_COLL
extern

An invalid lpf_coll_t. Can be used as a static initialiser, but can never be used as input to any LPF collective as its contents are invalid.