LPF Collectives API

Classes
struct	lpf_coll_t

Macros
#define	_LPF_COLLECTIVES_VERSION 201500L

Typedefs
typedef void(*	lpf_reducer_t) (size_t n, const void *array, void *value)

typedef void(*	lpf_combiner_t) (size_t n, const void *combine, void *into)

Functions
lpf_err_t	lpf_collectives_init (lpf_t ctx, lpf_pid_t s, lpf_pid_t p, size_t max_calls, size_t max_elem_size, size_t max_byte_size, lpf_coll_t *coll)

lpf_t	lpf_collectives_get_context (lpf_coll_t coll)

lpf_err_t	lpf_collectives_init_strided (lpf_t ctx, lpf_pid_t s, lpf_pid_t p, lpf_pid_t lo, lpf_pid_t hi, lpf_pid_t str, size_t max_calls, size_t max_elem_size, size_t max_byte_size, lpf_coll_t *coll)

lpf_err_t	lpf_collectives_destroy (lpf_coll_t coll)

lpf_err_t	lpf_broadcast (lpf_coll_t coll, lpf_memslot_t src, lpf_memslot_t dst, size_t size, lpf_pid_t root)

lpf_err_t	lpf_gather (lpf_coll_t coll, lpf_memslot_t src, lpf_memslot_t dst, size_t size, lpf_pid_t root)

lpf_err_t	lpf_scatter (lpf_coll_t coll, lpf_memslot_t src, lpf_memslot_t dst, size_t size, lpf_pid_t root)

lpf_err_t	lpf_allgather (lpf_coll_t coll, lpf_memslot_t src, lpf_memslot_t dst, size_t size, bool exclude_myself)

lpf_err_t	lpf_alltoall (lpf_coll_t coll, lpf_memslot_t src, lpf_memslot_t dst, size_t size)

lpf_err_t	lpf_reduce (lpf_coll_t coll, void *restrict element, lpf_memslot_t element_slot, size_t size, lpf_reducer_t reducer, lpf_pid_t root)

lpf_err_t	lpf_allreduce (lpf_coll_t coll, void *restrict element, lpf_memslot_t element_slot, size_t size, lpf_reducer_t reducer)

lpf_err_t	lpf_combine (lpf_coll_t coll, void *restrict array, lpf_memslot_t slot, size_t num, size_t size, lpf_combiner_t combiner, lpf_pid_t root)

lpf_err_t	lpf_allcombine (lpf_coll_t coll, void *restrict array, lpf_memslot_t slot, size_t num, size_t size, lpf_combiner_t combiner)

Variables
const lpf_coll_t	LPF_INVALID_COLL

Detailed Description

Macro Definition Documentation

◆ _LPF_COLLECTIVES_VERSION

#define _LPF_COLLECTIVES_VERSION 201500L

The version of this collectives specification.

Typedef Documentation

◆ lpf_reducer_t

typedef void(* lpf_reducer_t) (size_t n, const void *array, void *value)

A type of a reducer used with lpf_reduce() and lpf_allreduce(). This function defines how an array of n elements should be reduced into a single value. The value is assumed to be initialised at function entry.

Parameters

[in]	n	The number of elements in the array.
[in]	array	The array to reduce.
[in,out]	value	An initial reduction value into which all values in the array are reduced.

This is a user-defined function that the LPF collectives library makes callbacks to. There are no hard guarantees on the runtime incurred during the induced computation phases, nor are there any guarantees on correctness and failure mitigation.
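As an illustration of the callback contract described above, here is a minimal sketch of a reducer that sums doubles. The names reducer_fn and sum_doubles are hypothetical; reducer_fn is a local mirror of the documented lpf_reducer_t signature so that the sketch compiles without the LPF headers. Because value is already initialised at entry, the reducer accumulates into it rather than resetting it.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the documented lpf_reducer_t signature (hypothetical name),
 * so that this sketch compiles without the LPF headers. */
typedef void (*reducer_fn)(size_t n, const void *array, void *value);

/* Sum reducer for doubles: folds the n elements of array into *value.
 * value is initialised at entry, so it is accumulated into, never reset. */
static void sum_doubles(size_t n, const void *array, void *value)
{
    const double *in = (const double *)array;
    double *acc = (double *)value;
    assert(acc != NULL && (n == 0 || in != NULL));
    for (size_t i = 0; i < n; ++i) {
        *acc += in[i];
    }
}
```

With the real headers, a function of this shape would be passed as the reducer argument of lpf_reduce() or lpf_allreduce(), with size set to sizeof(double).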

◆ lpf_combiner_t

typedef void(* lpf_combiner_t) (size_t n, const void *combine, void *into)

A type of combiner used with lpf_combine() and lpf_allcombine(). This function defines how an array of n elements should be combined with an existing array of the same size. All arrays should be assumed to be fully initialised at function entry.

A lpf_combiner_t function may not be applied on the full input array; both arrays of size N may be cut into smaller pieces by the LPF collectives library, which then calls this lpf_combiner_t function repeatedly on each of the smaller pieces of both arrays.

For consistent results, the lpf_combiner_t should be associative and commutative. Other combiners should only be used with careful thought and a full understanding of the underlying collective algorithms, which are implementation dependent.

Warning: The parameter n is not a byte size unless the elements of the arrays happen to be a single byte large.

Parameters

[in]	n	The number of elements in both the combine and into arrays.
[in]	combine	The data to be combined into the destination array.
[in,out]	into	An array with existing data into which to combine the array combine.
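The piecewise application described above can be sketched as follows. This is an illustration only: combiner_fn is a local mirror of the documented lpf_combiner_t signature, and add_ints and combine_in_chunks are hypothetical helpers, not part of the library. The driver cuts both arrays into pieces of at most chunk elements and calls the combiner once per piece, which is one way a library might behave. Note that n counts elements, so byte offsets are obtained by multiplying with the element size.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the documented lpf_combiner_t signature (hypothetical name). */
typedef void (*combiner_fn)(size_t n, const void *combine, void *into);

/* Element-wise integer addition; associative and commutative. */
static void add_ints(size_t n, const void *combine, void *into)
{
    const int *src = (const int *)combine;
    int *dst = (int *)into;
    assert(n == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < n; ++i) {
        dst[i] += src[i];
    }
}

/* Hypothetical driver: applies the combiner on pieces of at most chunk
 * elements each. n and chunk count elements; elem_size is in bytes. */
static void combine_in_chunks(combiner_fn combiner, size_t n, size_t elem_size,
                              size_t chunk, const void *combine, void *into)
{
    const char *src = (const char *)combine;
    char *dst = (char *)into;
    assert(chunk > 0);
    for (size_t start = 0; start < n; start += chunk) {
        size_t len = (n - start < chunk) ? (n - start) : chunk;
        combiner(len, src + start * elem_size, dst + start * elem_size);
    }
}
```

A combiner that works element by element, as add_ints does, gives the same result whether it is called once on the full arrays or repeatedly on pieces.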

Function Documentation

◆ lpf_collectives_init()

lpf_err_t lpf_collectives_init	(	lpf_t	ctx,
		lpf_pid_t	s,
		lpf_pid_t	p,
		size_t	max_calls,
		size_t	max_elem_size,
		size_t	max_byte_size,
		lpf_coll_t *	coll
	)

Initialises a collectives struct, which allows the scheduling of collective calls. The initialised struct is only valid after a next call to lpf_sync().

All operations using the output argument coll need to be made collectively; i.e., all processes must make the same function call with possibly different arguments, as prescribed by the collective in question.

Parameters

[in,out]	ctx	The LPF runtime state as provided by lpf_exec() or lpf_hook() via lpf_spmd_t.
[in]	s	The unique ID number of this process in this LPF SPMD run. Must be larger than or equal to zero; must be smaller than p.
[in]	p	The number of processes involved with this LPF SPMD run. Must be larger than or equal to zero. May not be larger than the value provided by lpf_exec() or lpf_hook() via lpf_spmd_t.
[in]	max_calls	The number of collective calls a single LPF process may make within a single superstep. Must be larger than zero.
[in]	max_elem_size	The maximum number of bytes of a single element to be reduced through lpf_reduce or lpf_allreduce. Must be larger than or equal to zero.
[in]	max_byte_size	The maximum number of bytes given as a size parameter to lpf_broadcast, lpf_scatter, lpf_gather, lpf_allgather, or lpf_alltoall. Must be larger than or equal to zero. For the lpf_combine and lpf_allcombine collectives, the effective byte size is \( \mathit{num} \cdot \mathit{size} \).
[out]	coll	The collective struct needed to perform collective operations using this library. After a successful call to this function, this return lpf_coll_t value may be used immediately.

This implementation will register one memory slot on every call to this function.

Returns: LPF_ERR_OUT_OF_MEMORY when the requested buffers would cause the system to go out of memory, in which case it shall be as though the call to this function had not occurred; LPF_SUCCESS when the function executed successfully.

◆ lpf_collectives_get_context()

lpf_t lpf_collectives_get_context ( lpf_coll_t coll )

Returns the context embedded within a collectives instance.

Parameters

[in] coll A valid collectives instance.

Returns: The LPF core API context with which coll was constructed.

See also: lpf_collectives_init

Warning: Calling this function using an invalid coll will result in undefined behaviour.

◆ lpf_collectives_init_strided()

lpf_err_t lpf_collectives_init_strided	(	lpf_t	ctx,
		lpf_pid_t	s,
		lpf_pid_t	p,
		lpf_pid_t	lo,
		lpf_pid_t	hi,
		lpf_pid_t	str,
		size_t	max_calls,
		size_t	max_elem_size,
		size_t	max_byte_size,
		lpf_coll_t *	coll
	)

Initialises a collectives struct, which allows the scheduling of collective calls. The initialised struct is only valid after a next call to lpf_sync(). This variant selects only a subset of processes based on a lower PID bound (inclusive), and upper bound (exclusive), and a stride.

All operations using the output argument coll need to be made collectively; i.e., all processes that are involved with coll must make the same function call with possibly different arguments, as prescribed by the collective in question.

Note: Thus not all processes need to make the same collective call. If, e.g., a range from 10 to 20 with stride 2 is used to construct coll then every collective call at PIDs 10, 12, 14, 16, 18 must be matched with a similar call at all other processes with PID in that same set.
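The selection rule in the note above (lower bound inclusive, upper bound exclusive, fixed stride) can be sketched with two small helpers. Both pid_participates and group_size are hypothetical names introduced for illustration, and PIDs are modelled as unsigned integers here because the underlying type of lpf_pid_t is not stated in this section. The helpers only require a nonzero stride.

```c
#include <assert.h>
#include <stdbool.h>

/* True when pid lies in [lo, hi) and is a whole number of strides from lo.
 * PIDs are modelled as unsigned; this is an assumption of the sketch. */
static bool pid_participates(unsigned pid, unsigned lo, unsigned hi, unsigned str)
{
    assert(str > 0 && lo <= hi);
    return pid >= lo && pid < hi && (pid - lo) % str == 0;
}

/* Number of PIDs selected by the range [lo, hi) with stride str. */
static unsigned group_size(unsigned lo, unsigned hi, unsigned str)
{
    assert(str > 0 && lo <= hi);
    return (hi - lo + str - 1) / str;
}
```

For the range from 10 to 20 with stride 2, these helpers select exactly the PIDs 10, 12, 14, 16, and 18, matching the example in the note.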

Parameters

[in,out]	ctx	The LPF runtime state as provided by lpf_exec() or lpf_hook().
[in]	s	The unique ID number of this process in this LPF SPMD run. Must be larger than or equal to zero; must be smaller than p. May not be smaller than lo.
[in]	p	The number of processes involved with this LPF SPMD run. Must be larger than or equal to zero. May not be larger than the value provided by lpf_exec() or lpf_hook() via lpf_spmd_t.
[in]	lo	The lowest process ID participating in these collectives. Must be larger than or equal to zero. May not be larger than p. May not be larger than hi.
[in]	hi	The upper bound (exclusive) on the process IDs participating in these collectives. Must be larger than or equal to zero. May not be larger than p. May not be less than lo.
[in]	str	The stride of process IDs. Must be larger than or equal to one.
[in]	max_calls	The number of collective calls a single LPF process may make within a single superstep. Must be larger than zero.
[in]	max_elem_size	The maximum number of bytes of a single element to be reduced through lpf_reduce or lpf_allreduce. May be equal to zero.
[in]	max_byte_size	The maximum number of bytes given as a size parameter to lpf_broadcast, lpf_scatter, lpf_gather, lpf_allgather, or lpf_alltoall. May be equal to zero.
[out]	coll	The collective struct needed to perform collective operations using this library. After a successful call to this function, this return lpf_coll_t value may be used immediately.

This implementation will register one memory slot on every call to this function.

Returns: LPF_ERR_OUT_OF_MEMORY when the requested buffers would cause the system to go out of memory, in which case it shall be as though the call to this function had not occurred; LPF_SUCCESS when the function executed successfully.

◆ lpf_collectives_destroy()

lpf_err_t lpf_collectives_destroy ( lpf_coll_t coll )

Destroys a lpf_coll_t object created via a call to lpf_collectives_init() or lpf_collectives_init_strided().

This function may only be called on a successfully initialised parameter coll. The lpf_coll_t instance shall then become invalid immediately.

This is a collective call, conforming to the descriptions of lpf_coll_t, lpf_collectives_init, and lpf_collectives_init_strided.

Parameters

[in] coll The collectives system to invalidate.

◆ lpf_broadcast()

lpf_err_t lpf_broadcast	(	lpf_coll_t	coll,
		lpf_memslot_t	src,
		lpf_memslot_t	dst,
		size_t	size,
		lpf_pid_t	root
	)

Schedules a broadcast of a vector of size bytes. The broadcast shall be complete by the end of a next call to lpf_sync(). This is a collective call, meaning that if one LPF process calls this function, all LPF processes in the same SPMD section must make a matching call to this function.

The root process has size bytes to transmit, as uniquely identified by src and size.

For all \( i \in \{ 0, 1, \ldots, \mathit{size} - 1 \} \), after a next call to lpf_sync() all processes with non-root ID will locally have that \( \mathit{dst}_{ i } \) equals \( \mathit{src}_{ i } \) at PID root.

Note: No more than \(\max( P+1, 2P - 3)\) messages have to be reserved in advance with lpf_resize_message_queue(); No memory areas will be written to on the process with PID root.; No supplied memory areas will be read from on processes with PID not equal to root. Internally, the collective state memory slot may be read from on all processes if a two-stage implementation is used.
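The message bound in the note above, \(\max( P+1, 2P - 3)\), can be computed as in the following sketch before calling lpf_resize_message_queue(). The helper name broadcast_max_messages is hypothetical. The only subtlety is that \(2P - 3\) would wrap around in unsigned arithmetic when \(P = 1\), so that case is handled explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on the messages a broadcast may need: max(P + 1, 2P - 3).
 * Hypothetical helper; P is the number of processes in the collective. */
static size_t broadcast_max_messages(size_t P)
{
    size_t a, b;
    assert(P >= 1);
    a = P + 1;
    /* 2P - 3 would underflow for P = 1 in unsigned arithmetic. */
    b = (P >= 2) ? (2 * P - 3) : 0;
    return (a > b) ? a : b;
}
```

The first term dominates up to \(P = 4\), where both terms equal 5; from \(P = 5\) onwards the second term is the larger one.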

Parameters

[in,out]	coll	The LPF collectives state. This argument must match across processes in the same collective call to this function and must provide the same sized memory area of size bytes.
[in]	src	At PID root, the memory slot of the source memory area to read from. This argument is only read from by the process with PID equal to root; there, the memory slot must correspond to a valid memory area of size at least size bytes. At processes with PID not equal to root, the memory area corresponding to this slot will not be touched and may thus have any size, including zero. The memory slot must be global; this argument must match across processes in the same collective call to this function.
[in]	dst	At PID not equal to root, the destination memory area. The memory slot must correspond to a valid memory area of size at least size bytes. At the process with PID root the memory area corresponding to this slot will not be touched and may be of any size, including zero. The memory slot may be local or global. Arguments do not have to match across processes in the same collective call to this function.
[in]	size	The number of bytes to broadcast. This argument must match across processes in the same collective call to this function.
[in]	root	The PID of the root process of this operation. This argument must match across processes in the same collective call to this function.

Note: It is legal to have that src equals dst.; At PID root, the dst memory slot may equal LPF_INVALID_MEMSLOT.

Performance guarantees: serial

Problem size N: \( size \)
local work: \( 0 \) ;
transferred bytes: \( NP \) ;
BSP cost: \( NPg + l \);

Performance guarantees: two phase

Problem size N: \( size \)
local work: \( 0 \) ;
transferred bytes: \( 2N \) ;
BSP cost: \( 2(Ng + l) \);

Performance guarantees: two level tree

Problem size N: \( size \)
local work: \( 0 \) ;
transferred bytes: \( 2\sqrt{P}N \) ;
BSP cost: \( 2(\sqrt{P}Ng + l) \);
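To compare the serial and two-phase guarantees listed above, the two BSP cost formulas can be evaluated directly, as in this sketch. The function names are hypothetical, and g and l are the usual BSP parameters (per-byte communication cost and synchronisation latency), which the caller is assumed to know for the target machine. The two-level tree variant is left out here to keep the sketch free of a square-root dependency.

```c
#include <assert.h>

/* BSP cost of the serial broadcast: N*P*g + l. */
static double broadcast_cost_serial(double N, double P, double g, double l)
{
    assert(N >= 0.0 && P >= 1.0);
    return N * P * g + l;
}

/* BSP cost of the two-phase broadcast: 2*(N*g + l). */
static double broadcast_cost_two_phase(double N, double g, double l)
{
    assert(N >= 0.0);
    return 2.0 * (N * g + l);
}

/* Nonzero when the serial variant has the strictly lower BSP cost. */
static int serial_is_cheaper(double N, double P, double g, double l)
{
    return broadcast_cost_serial(N, P, g, l) < broadcast_cost_two_phase(N, g, l);
}
```

Rearranging the two formulas shows that the serial variant wins exactly when \( N g (P - 2) < l \), that is, for small payloads on machines with a high synchronisation latency.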

◆ lpf_gather()

lpf_err_t lpf_gather	(	lpf_coll_t	coll,
		lpf_memslot_t	src,
		lpf_memslot_t	dst,
		size_t	size,
		lpf_pid_t	root
	)

Schedules a gather of a vector of size bytes. The gather shall be complete by the end of a next call to lpf_sync(). This is a collective operation, meaning that if one LPF process calls this function, all LPF processes in the same SPMD section must make a matching call to this function.

The root process will retrieve size bytes from each other process. The source memory areas are identified by src and size.

For all \( i \in \{ 0, 1, \ldots, \mathit{size} - 1 \} \) and for all \( k \in \{ 0, 1, \ldots, p-1 \},\ k \neq \mathit{root} \), after a next call to lpf_sync() the process with ID root will have that \( \mathit{dst}_{ k \cdot \mathit{size} + i } \) equals \( \mathit{src}_{ i } \) at PID \( k \), with \( p \) the number of processes registered in coll. The memory area of size bytes starting from dst plus root * size will not be touched at PID root.

Note: No more than \(P-1\) messages have to be reserved in advance with lpf_resize_message_queue(). No memory areas will be written to at processes with PID not equal to root. Recommended usage has at PID root that the local source data is already in place at dst with offset root times size bytes. Otherwise, a manual copy of the corresponding source data at PID root into dst is required to make the memory area at dst correspond to the globally gathered data.
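The resulting memory layout at the root can be illustrated with a single-address-space simulation. This sketch does not call the library; simulate_gather is a hypothetical helper in which src[k] stands for the source buffer of PID k. It assumes that the segment of PID k lands at byte offset k * size in dst, which is what the untouched region at dst plus root * size implies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simulates the effect of a gather at the root in one address space.
 * src[k] models the source buffer of PID k; dst models the root's
 * destination area of p * size bytes. */
static void simulate_gather(size_t p, size_t root, size_t size,
                            const unsigned char *const *src, unsigned char *dst)
{
    assert(root < p);
    for (size_t k = 0; k < p; ++k) {
        if (k == root) {
            continue; /* the root's own segment in dst is not touched */
        }
        memcpy(dst + k * size, src[k], size);
    }
}
```

Whatever the root had stored in its own segment of dst before the call is still there afterwards, which is why placing the local data there in advance avoids a manual copy.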

Parameters

[in,out]	coll	The LPF collectives state. This argument must match across processes in the same collective call to this function.
[in]	src	The memory slot of the source memory area to read from. When PID is not root, the corresponding memory area to src should be valid to read for at least size bytes. When PID is root, the corresponding memory area will not be touched and can be of any size, including zero. The memory slot can be local or global. This argument may differ across processes in the same collective call to this function.
[in]	dst	The memory slot of the destination memory area. At process ID root, the corresponding memory area must be valid and of size at least `p * size` bytes. At other processes, the corresponding memory area will not be touched and may be of any size. The memory slot must be global; this argument must match across processes in the same collective call to this function.
[in]	size	The number of bytes at each source array. This argument must match across processes in the same collective call to this function.
[in]	root	Which process is the root of this operation. This argument must match across processes in the same collective call to this function.

Note: It is legal to have that src equals dst.; At PID root, the src memory slot may equal LPF_INVALID_MEMSLOT.

Performance guarantees:

Problem size N: \( P * size \)
local work: \( 0 \) ;
transferred bytes: \( N \) ;
BSP cost: \( Ng + l \);

◆ lpf_scatter()

lpf_err_t lpf_scatter	(	lpf_coll_t	coll,
		lpf_memslot_t	src,
		lpf_memslot_t	dst,
		size_t	size,
		lpf_pid_t	root
	)

Schedules a scatter of a vector of size bytes. The operation shall be complete by the end of a next call to lpf_sync(). This is a collective operation, meaning that if one LPF process calls this function, all LPF processes in the same SPMD section must make a matching call to this function.

The root process will split a source memory area in segments of size bytes each, while expecting \( p \) segments. Each \( k \)th segment is sent to process \( k \).

The \( k \)-th process, \( k \neq \mathit{root} \), shall, after the next call to lpf_sync, have that for all \( 0 \leq i < \mathit{size} \), \( \mathit{dst}_i \) equals \( \mathit{src}_{ k \mathit{size} + i } \) at PID root.

Note: No more than \(P-1\) messages have to be reserved in advance with lpf_resize_message_queue(); No memory shall be written to at the process with PID root.; No memory shall be read from at processes with PID other than root.

Parameters

[in,out]	coll	The LPF collectives state. This argument must match across processes in the same collective call to this function.
[in]	src	The memory slot of the source memory area to read from. When the PID equals root, this must point to a valid memory area of size at least \( p \mathit{size} \) bytes. For all other processes, the corresponding memory area will not be touched and may be of any size, including zero. The slot must be global; this argument must match across processes in the same collective call to this function.
[in]	dst	The memory slot of the destination memory area. At processes with PID not equal to root, this must point to a valid memory area of at least size bytes. At PID root, the memory area is not touched and may be of any size, including zero. This argument may differ across processes in the same collective call to this function.
[in]	size	The number of bytes that need to be scattered to a single process. The total number of bytes scattered, i.e., the size of the src memory area, equals \( p \mathit{size} \). This argument must match across processes in the same collective call to this function.
[in]	root	Which process is the root of this operation. This argument must match across processes in the same collective call to this function.

Note: It is legal to have that src equals dst.; At PID root, the dst memory slot may equal LPF_INVALID_MEMSLOT.

Performance guarantees:

Problem size N: \( size \)
local work: \( 0 \) ;
transferred bytes: \( N \) ;
BSP cost: \( Ng + l \);

◆ lpf_allgather()

lpf_err_t lpf_allgather	(	lpf_coll_t	coll,
		lpf_memslot_t	src,
		lpf_memslot_t	dst,
		size_t	size,
		bool	exclude_myself
	)

Schedules an allgather of a vector of size bytes. The operation shall be complete by the end of a next call to lpf_sync(). This is a collective operation, meaning that if one LPF process calls this function, all LPF processes in the same SPMD section must make a matching call to this function.

All processes locally have two memory areas; one of size bytes and another of p * size bytes. After the next call to lpf_sync, each process will have at the latter memory area a concatenation of all of the former memory areas from all processes. More formally:

At the end of the next call to lpf_sync, each process with its PID \( s \in \{ 0, 1, \ldots, p-1 \} \) has \( \forall i \in \{ 0, 1, \ldots, \mathit{size}-1 \} \) and \( \forall k \in \{ 0, 1, \ldots, p-1 \},\ k \neq s \) that \( \mathit{dst}_{ k \cdot \mathit{size} + i } \) local to PID \( s \) equals \( \mathit{src}_{ i } \) local to PID \( k \).

Note: No more than \(2*P\) messages have to be reserved in advance with lpf_resize_message_queue(); There will be no communication outgoing from a process that is incident to that same process, unless exclude_myself is false (see below).

The induced communication pattern as defined above must never cause read and writes to occur at the same memory location, or undefined behaviour will occur.

Parameters

[in,out]	coll	The LPF collectives state. This argument must match across processes in the same collective call to this function.
[in]	src	The memory slot of the source memory area. This can be a local or global slot. The memory area must be at least size bytes large. This argument may differ across processes in the same collective call to this function. This argument must not equal dst.
[in]	dst	The memory slot of the destination memory area. On all processes, this must correspond to a valid memory area of size at least `p * size` bytes. This must be a global slot; this argument must match across processes in the same collective call to this function.
[in]	size	The number of bytes in a single source memory area. This argument must match across processes in the same collective call to this function.
[in]	exclude_myself	Skip myself in the collective communication.

Note: The memory area corresponding to src may overlap with the memory pointed to by dst, within any single process. This happens and is valid exactly when the memory area pointed to by src points to that of dst with an offset of exactly s * size, with s that process' PID, and src was registered with length exactly size bytes.

Warning: An implementation does not have to check for erroneously overlapping calls; any such call may be mitigated by an implementation, but will in general lead to undefined behaviour.

Performance guarantees:

Problem size N: \( P * size \)
local work: \( 0 \) ;
transferred bytes: \( N \) ;
BSP cost: \( Ng + l \);

◆ lpf_alltoall()

lpf_err_t lpf_alltoall	(	lpf_coll_t	coll,
		lpf_memslot_t	src,
		lpf_memslot_t	dst,
		size_t	size
	)

Schedules an all-to-all of a vector of size bytes. The operation shall be complete by the end of a next call to lpf_sync(). This is a collective operation, meaning that if one LPF process calls this function, all LPF processes in the same SPMD section must make a matching call to this function.

All processes locally have \( p \) elements of \( \mathit{size} \) bytes. The elements will be transposed amongst all participating processes.

At the end of this operation, each process with its unique ID \( s \in \{ 0, 1, \ldots, p-1 \} \) has \( \forall i \in \{ 0, 1, \ldots, \mathit{size} - 1 \} \) and \( \forall k \in \{ 0, 1, \ldots, p - 1 \},\ k \neq s \) that \( \mathit{dst}_{ k\mathit{size} + i } \) local to PID \( s \) equals \( \mathit{src}_{ s\mathit{size} + i } \) local to PID \( k \).

It is illegal to have src equal to dst.

Note: The src memory area shall never be overwritten. No more than \(2*P-2\) messages have to be reserved in advance with lpf_resize_message_queue(). A process shall never locally copy data from src to dst. To make dst a full transpose of its src row at all processes, the diagonal has to be copied manually.

All arguments to this function must match across all processes in the collective call to this function.
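The transpose and the manual diagonal copy described above can be illustrated with a single-address-space simulation. This sketch does not call the library; simulate_alltoall and copy_diagonal are hypothetical helpers, with src[k] and dst[k] standing for the buffers of PID k.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simulates the off-diagonal data movement of an all-to-all:
 * segment k of dst at PID s receives segment s of src at PID k, for k != s. */
static void simulate_alltoall(size_t p, size_t size,
                              const unsigned char *const *src,
                              unsigned char *const *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t s = 0; s < p; ++s) {
        for (size_t k = 0; k < p; ++k) {
            if (k == s) {
                continue; /* no local copy is ever made */
            }
            memcpy(dst[s] + k * size, src[k] + s * size, size);
        }
    }
}

/* The manual step: copies the local (diagonal) segment of PID s. */
static void copy_diagonal(size_t s, size_t size,
                          const unsigned char *src, unsigned char *dst)
{
    assert(src != NULL && dst != NULL);
    memcpy(dst + s * size, src + s * size, size);
}
```

After simulate_alltoall, segment s of dst at PID s still holds its old contents; only the call to copy_diagonal completes the transpose.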

Parameters

[in,out]	coll	The LPF collectives state.
[in]	src	The memory slot of the source memory area. This must correspond to a valid memory area of size at least \( p \mathit{size} \). This memory area must not overlap with that of dst. This must be a global memory slot.
[in]	dst	The memory slot of the destination memory area to write to. On all processes, this must point to a valid memory area of size at least \( p \mathit{size} \). This parameter must not overlap with that of src. This must be a global memory slot.
[in]	size	The number of bytes that each process sends to another. In total, each process sends \( (p-1)\mathit{size} \) bytes, and receives the same amount.

Warning: An implementation does not have to check for use of overlapping memory areas, although it may use src and dst to do so. The user must make sure to never supply aliased memory regions. If src does equal dst, an implementation thus may have a mechanism to mitigate that error, but in general this will lead to undefined behaviour.

Performance guarantees:

Problem size N: \( P * size \)
local work: \( 0 \) ;
transferred bytes: \( N \) ;
BSP cost: \( Ng + l \);

◆ lpf_reduce()

lpf_err_t lpf_reduce	(	lpf_coll_t	coll,
		void *restrict	element,
		lpf_memslot_t	element_slot,
		size_t	size,
		lpf_reducer_t	reducer,
		lpf_pid_t	root
	)

Schedules a reduce operation of one array per process. The reduce shall be completed by the end of a next call to lpf_sync(). This is a collective operation, meaning that if one LPF process calls this function, all LPF processes in the same SPMD section must make a matching call to this function.

At the end of the next lpf_sync, the memory area element points to shall equal the reduced value of all the element memory areas passed at function entry. This output value shall only be set at the root process. The reduction operator is user-defined through a lpf_reducer_t. Even if the same reducer function is used, this may result in different pointers being passed across the various processes involved in the collective call; hence the reducer argument cannot be enforced to be the same everywhere.

Note: No more than \(P-1\) messages have to be reserved in advance with lpf_resize_message_queue(); Logically, reducer should point to the same function or nondeterministic behaviour may result. Only advanced programmers and applications will be able to exploit this meaningfully.

Parameters

[in,out]	coll	The LPF collectives state. This argument must match across processes in the same collective call to this function.
[in,out]	element	At PID root, a pointer to a memory area of at least size bytes. At function entry, the memory area will contain the element to be reduced. After a next call to lpf_sync, the value of the memory area at PID root will equal the globally reduced value. At PID not equal to root, this memory area at entry will not point to the to be reduced value. At exit the memory area will not have changed.
[in]	element_slot	The lpf_memslot_t corresponding to element. The memory slot must be global, and must have registered size bytes starting from element.
[in]	size	The size of a single element of the type to be reduced, in bytes. This argument must match across processes in the same collective call to this function.
[in]	reducer	A function that defines the reduction operator. This argument may differ across processes in the same collective call to this function.
[in]	root	The process ID of the root process in this collective. This argument must match across processes in the same collective call to this function.

Performance guarantees: allgather (N < P*P)

Problem size N: \( P * size \)
local work: \( P*reducer \) ;
transferred bytes: \( N \) ;
BSP cost: \( Ng + N*reducer + l \);

Performance guarantees: transpose, reduce and allgather (N >= P*P)

Problem size N: \( P * size \)
local work: \( (N/P)*reducer \) ;
transferred bytes: \( 2(N/P) \) ;
BSP cost: \( 2(N/P)g + (N/P)*reducer + 2l \);
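The two guarantees above can be combined into one cost estimate that switches on the stated condition \( N < P \cdot P \). The following is a sketch that takes the BSP cost lines at face value; reduce_bsp_cost is a hypothetical helper, and r stands for the cost the formulas call reducer, which the caller is assumed to have measured.

```c
#include <assert.h>

/* BSP cost estimate for a reduce with P processes and size bytes per element.
 * g and l are the BSP machine parameters; r is the reducer cost term. */
static double reduce_bsp_cost(double P, double size, double g, double l, double r)
{
    double N, per_process;
    assert(P >= 1.0 && size >= 0.0);
    N = P * size;
    if (N < P * P) {
        /* allgather variant: Ng + N*reducer + l */
        return N * g + N * r + l;
    }
    /* transpose, reduce and allgather variant:
     * 2(N/P)g + (N/P)*reducer + 2l */
    per_process = N / P;
    return 2.0 * per_process * g + per_process * r + 2.0 * l;
}
```

Since \( N = P \cdot \mathit{size} \), the condition \( N < P \cdot P \) is the same as \( \mathit{size} < P \): elements smaller than the process count in bytes take the first variant.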

◆ lpf_allreduce()

lpf_err_t lpf_allreduce	(	lpf_coll_t	coll,
		void *restrict	element,
		lpf_memslot_t	element_slot,
		size_t	size,
		lpf_reducer_t	reducer
	)

Schedules an allreduce operation of a single object per process. The allreduce shall be complete by the end of a next call to lpf_sync. This is a collective operation, meaning that if one LPF process calls this function, all LPF processes in the same SPMD section must make a matching call to this function.

At the end of the next lpf_sync, the memory area element points to shall equal the reduced value of all elements at all processes. The reduction operator is user-defined through a lpf_reducer_t. Even if the same reducer function is used, this may result in different pointers being passed across the various processes involved in the collective call; hence the reducer argument cannot be enforced to be the same everywhere.

Note: No more than \(2*P-2\) messages have to be reserved in advance with lpf_resize_message_queue(); Logically, reducer should point to the same function or nondeterministic behaviour may result. Only advanced programmers and applications will be able to exploit this meaningfully.
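The outcome of an allreduce can be illustrated with a single-address-space simulation in the spirit of the allgather-based variant. This sketch does not call the library and is not meant to reflect its internals; reducer_fn mirrors the documented lpf_reducer_t signature, and sum_ints, simulate_allreduce, and MAX_SIM_PROCS are hypothetical. Each simulated process starts from its own element and reduces the elements of all other processes into it, so its own contribution is counted exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the documented lpf_reducer_t signature (hypothetical name). */
typedef void (*reducer_fn)(size_t n, const void *array, void *value);

/* Sum reducer for ints: accumulates the n elements of array into *value. */
static void sum_ints(size_t n, const void *array, void *value)
{
    const int *in = (const int *)array;
    int *acc = (int *)value;
    for (size_t i = 0; i < n; ++i) {
        *acc += in[i];
    }
}

enum { MAX_SIM_PROCS = 64 }; /* arbitrary cap for this simulation */

/* elements[s] is the input of PID s and, on return, its reduced output. */
static void simulate_allreduce(size_t p, int *elements, reducer_fn reducer)
{
    int gathered[MAX_SIM_PROCS];
    assert(p >= 1 && p <= (size_t)MAX_SIM_PROCS);
    for (size_t k = 0; k < p; ++k) {
        gathered[k] = elements[k]; /* models the allgather step */
    }
    for (size_t s = 0; s < p; ++s) {
        /* Reduce everything except the own element, which is the initial value. */
        reducer(s, gathered, &elements[s]);
        reducer(p - s - 1, gathered + s + 1, &elements[s]);
    }
}
```

The two reducer calls per process skip the process's own entry; reducing the full gathered array into an already initialised value would count that entry twice for a sum.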

Parameters

[in,out]	coll	The LPF collectives state. This argument must match across processes in the same collective call to this function.
[in,out]	element	A pointer to a memory area of at least size bytes. At function entry, this holds the local to-be-reduced value. After a next call to lpf_sync, the value of the memory area will equal the globally reduced value.
[in]	element_slot	The lpf_memslot_t corresponding to element. The memory slot must be global, and must have registered size bytes starting from element.
[in]	size	The size of a single element of the type to be reduced, in bytes. This argument must match across processes in the same collective call to this function.
[in]	reducer	A function that defines the reduction operator. This argument may differ across processes in the same collective call to this function.

Performance guarantees: allgather (N < P*P)

Problem size N: \( P * size \)
local work: \( P*reducer \) ;
transferred bytes: \( N \) ;
BSP cost: \( Ng + N*reducer + l \);

Performance guarantees: transpose, reduce and allgather (N >= P*P)

Problem size N: \( P * size \)
local work: \( (N/P)*reducer \) ;
transferred bytes: \( 2(N/P) \) ;
BSP cost: \( 2(N/P)g + (N/P)*reducer + 2l \);

◆ lpf_combine()

lpf_err_t lpf_combine	(	lpf_coll_t	coll,
		void *restrict	array,
		lpf_memslot_t	slot,
		size_t	num,
		size_t	size,
		lpf_combiner_t	combiner,
		lpf_pid_t	root
	)

Combines an array at all non-root processes into that of the root process.

The operation is guaranteed to be complete after a next call to lpf_sync.

On input, all processes must supply a valid data array. On output, the root process will have its output array equal to \( array^{(\mathit{root})} = \oplus_{k=0}^{p-1} array^{(k)}, \) where \( \oplus \) is prescribed by combiner. The order in which the operator \( \oplus \) is applied is undefined. The operator \( \oplus \) may furthermore be applied in stages, when applied to a single set of input and output arrays. See lpf_combiner_t for more details.

All parameters must match across all processes involved with the same collective call to this function.

This implementation synchronises once before exiting.

Note: No more than \(2*P\) messages have to be reserved in advance with lpf_resize_message_queue()
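The result at the root can be illustrated with a single-address-space simulation. This sketch does not call the library; combiner_fn mirrors the documented lpf_combiner_t signature, and add_ints and simulate_combine are hypothetical. The simulation folds the arrays in ascending PID order, which is only one of the orders an implementation may choose, so the combiner used here is both associative and commutative.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the documented lpf_combiner_t signature (hypothetical name). */
typedef void (*combiner_fn)(size_t n, const void *combine, void *into);

/* Element-wise integer addition; associative and commutative. */
static void add_ints(size_t n, const void *combine, void *into)
{
    const int *src = (const int *)combine;
    int *dst = (int *)into;
    for (size_t i = 0; i < n; ++i) {
        dst[i] += src[i];
    }
}

/* arrays[k] models the num-element array of PID k. On return, arrays[root]
 * holds the combination of all p arrays. */
static void simulate_combine(size_t p, size_t root, size_t num,
                             int *const *arrays, combiner_fn combiner)
{
    assert(p >= 1 && root < p);
    for (size_t k = 0; k < p; ++k) {
        if (k == root) {
            continue; /* the root's array is the accumulator */
        }
        combiner(num, arrays[k], arrays[root]);
    }
}
```

In the simulation the non-root arrays are left unchanged, but the documentation states that their contents are undefined after the real call, so callers should not rely on them.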

Parameters

[in,out]	coll	The LPF collectives state.
[in,out]	array	The array to be combined. The array must point to a valid memory area of size \( \mathit{num}\mathit{size} \). The num elements in this array must be initialised and will be taken as input for the combiner. After a call to this function, this array's contents will be undefined. On the root process, this array's contents will be the combination of all arrays, as prescribed by the combiner.
[in]	slot	The memory slot corresponding to array. This must be a globally registered slot.
[in]	num	The number of elements in the array.
[in]	size	The size, in bytes, of a single element of the array.
[in]	combiner	A function which may combine one or more elements of the appropriate array types. The combining happens element-by-element.
[in]	root	Which process is the root of this collective operation.

Returns: LPF_SUCCESS When the collective communication request was recorded successfully.

Performance guarantees: allgather (N < P*P)

Problem size N: \( P * num * size \)
local work: \( N*Operator \) ;
transferred bytes: \( N \) ;
BSP cost: \( Ng + N*Operator + l \);

Performance guarantees: transpose, reduce and allgather (N >= P*P)

Problem size N: \( P * num * size \)
local work: \( (N/P)*Operator \) ;
transferred bytes: \( 2(N/P) \) ;
BSP cost: \( 2(N/P)g + (N/P)*Operator + 2l \);

Performance guarantees: two level tree

Problem size N: \( P * num * size \)
local work: \( 2(N/\sqrt{P})*Operator \) ;
transferred bytes: \( 2(N/\sqrt{P}) \) ;
BSP cost: \( 2(N/\sqrt{P})g + (N/\sqrt{P})*Operator + 2l \);

◆ lpf_allcombine()

lpf_err_t lpf_allcombine	(	lpf_coll_t	coll,
		void *restrict	array,
		lpf_memslot_t	slot,
		size_t	num,
		size_t	size,
		lpf_combiner_t	combiner
	)

Combines an array at all processes into one array that is broadcast over all processes.

The operation is guaranteed to be complete after a next call to lpf_sync.

On input, all processes must supply a valid data array. On output, all processes will have their output array equal to \( \oplus_{k=0}^{p-1} array^{(k)}, \) where \( \oplus \) is prescribed by combiner. The order in which the operator \( \oplus \) is applied is undefined. The operator \( \oplus \) may furthermore be applied in stages, when applied to a single set of input and output arrays. See lpf_combiner_t for more details.

All parameters must match across all processes involved with the same collective call to this function.

Note: No more than \(2*P\) messages have to be reserved in advance with lpf_resize_message_queue()

Parameters

[in,out]	coll	The LPF collectives state.
[in]	size	The size, in bytes, of a single element of the array.
[in]	num	The number of elements in the array.
[in]	combiner	A function which may combine one or more elements of the appropriate array types. The combining happens element-by-element.
[in,out]	array	The array to be combined. The array must point to a valid memory area of size \( \mathit{num}\mathit{size} \). The num elements in this array must be initialised and will be taken as input for the combiner. After a call to this function, this array's contents will be undefined until the operation completes; on all processes, the array's contents will then be the combination of all arrays, as prescribed by the combiner.
[in]	slot	The memory slot corresponding to array. This must be a globally registered slot.

Returns: LPF_SUCCESS When the collective communication request was recorded successfully.

Performance guarantees: allgather (N < P*P)

Problem size N: \( P * num * size \)
local work: \( N*Operator \) ;
transferred bytes: \( N \) ;
BSP cost: \( Ng + N*Operator + l \);

Performance guarantees: transpose, reduce and allgather (N >= P*P)

Problem size N: \( P * num * size \)
local work: \( (N/P)*Operator \) ;
transferred bytes: \( 2(N/P) \) ;
BSP cost: \( 2(N/P)g + (N/P)*Operator + 2l \);

Performance guarantees: two level tree

Problem size N: \( P * num * size \)
local work: \( 2(N/\sqrt{P})*Operator \) ;
transferred bytes: \( 2(N/\sqrt{P}) \) ;
BSP cost: \( 2(N/\sqrt{P})g + (N/\sqrt{P})*Operator + 2l \);

Variable Documentation

◆ LPF_INVALID_COLL

const lpf_coll_t LPF_INVALID_COLL

extern

An invalid lpf_coll_t. Can be used as a static initialiser, but can never be used as input to any LPF collective as its contents are invalid.
