Albert-Jan N. Yzelman
What's new
- Our paper The Impact of Partial Computations on the Red-Blue Pebble Game has been accepted at this year's SPAA! The Red-Blue Pebble Game, introduced by Hong and Kung in 1981, is the canonical model for determining an algorithm's I/O complexity: how many times data must be read or written to complete its computations. The model stipulates that a computation may only proceed if all its inputs reside in a fast memory, which additionally must have free space to store the computation's output. Fast memory has limited capacity, and data stored in fast memory is marked by red pebbles. The model also assumes the presence of a slow memory of unlimited capacity; data residing there is marked by blue pebbles. A blue pebble may be replaced with a red one, signifying a read from slow to fast memory; the reverse, replacing a red pebble with a blue one, signifies writing data back from fast to slow memory. Initially, the algorithm's input data carry blue pebbles, and the game is to pebble all of the algorithm's outputs by moving pebbles around while respecting data dependences and the limited number of red pebbles, using the fewest moves possible. The number of moves corresponds to the I/O cost, and the limited number of red pebbles corresponds to the limited cache size.
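To make these rules concrete, here is a small illustrative sketch (written for this post, not taken from the paper) that replays a given move sequence under the Red-Blue Pebbling rules and counts its I/O cost:

```python
def play(dag, inputs, outputs, capacity, moves):
    """Replay a Red-Blue Pebbling move sequence and return its I/O cost.

    `dag` maps each computed vertex to its list of predecessors; vertices
    in `inputs` start with blue pebbles. A "read" places a red pebble on a
    blue-pebbled vertex, a "write" places a blue pebble on a red-pebbled
    vertex, "delete" frees a red pebble, and "compute" places a red pebble
    on a vertex whose predecessors all hold red pebbles, provided a red
    pebble is free to store the output. I/O cost = #reads + #writes.
    (Hypothetical helper for illustration; the paper defines the game
    itself, not this API.)
    """
    red, blue = set(), set(inputs)
    io = 0
    for op, v in moves:
        if op == "read":
            assert v in blue and len(red) < capacity
            red.add(v)
            io += 1
        elif op == "write":
            assert v in red
            blue.add(v)
            io += 1
        elif op == "delete":
            red.discard(v)
        elif op == "compute":
            assert all(p in red for p in dag[v])
            assert v in red or len(red) < capacity
            red.add(v)
    assert all(v in blue for v in outputs)  # outputs written back to slow memory
    return io
```

For instance, computing c = f(a, b) with three red pebbles available costs three I/O moves: two reads and one write-back of the output.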
In our new paper, we study an extension of Red-Blue Pebbling to partial computations, originally envisioned by Sobczyk. This extension revolves around the following observation: most computations, for example computing the sum of an array of n numbers, do not require all inputs to reside in fast memory simultaneously; one may accumulate partial output from partial inputs at different points in time. Indeed, for some integer k smaller than n, we may compute the sum of n numbers in n/k (rounded up) rounds, each adding up to k numbers. This transformation is valid for sequences of any repeated associative computation, and hence applies well beyond the simple addition of arrays.
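The observation can be sketched in a few lines (an illustrative sketch, not the paper's formal model): because addition is associative, the sum accumulates from partial inputs over time, and at most k inputs are touched per round.

```python
def partial_sums(xs, k):
    """Sum `xs` in ceil(n/k) rounds, each adding at most k numbers.

    Because addition is associative, the output can be accumulated from
    partial inputs at different points in time, so all n inputs never
    need to reside in fast memory simultaneously.
    (Illustrative sketch, not the paper's formal model.)
    """
    total, rounds = 0, 0
    for i in range(0, len(xs), k):
        total += sum(xs[i:i + k])  # one round touches at most k inputs
        rounds += 1
    return total, rounds
```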
We rephrase the modified pebbling game as a new set of rules, introducing light-red and dark-red pebbles in place of the canonical red ones. A dark-red pebble indicates data that has been updated in fast memory but not yet written back to slow memory, while a light-red pebble indicates data that is present in both fast and slow memory. Data carrying a light-red pebble must therefore also carry a blue pebble, while data carrying a dark-red pebble must not. The completion of a computation is tracked via the edges of the dependency graph: once an input has been aggregated into the output of a computation, the corresponding edge is marked, and a computation is complete only once all of its incoming edges are marked. Data may only be used as input once its computation is complete; incomplete outputs may not be prematurely consumed by subsequent computations. We then address the question of whether this model refinement can lead to significantly different I/O bounds and, if so, whether this is the case for a small set of practical, often-used algorithms.
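The edge-marking bookkeeping might be sketched as follows (an illustrative sketch of the completion tracking only, not the paper's full pebbling rules; the class and method names are invented for this post):

```python
class PartialState:
    """Track completion of partial computations via marked edges.

    Each vertex's incoming edges are marked one by one as inputs are
    aggregated into its (partial) output; the vertex may serve as an
    input to later computations only once all its incoming edges are
    marked. (Illustrative sketch, not the paper's full model.)
    """

    def __init__(self, dag):
        self.dag = dag  # vertex -> list of predecessor vertices
        self.marked = {v: set() for v in dag}  # marked incoming edges

    def aggregate(self, v, pred):
        """Aggregate input `pred` into the partial output at `v`."""
        assert pred in self.dag[v], "edge must exist in the dependency graph"
        self.marked[v].add(pred)

    def complete(self, v):
        """`v` may be consumed as an input only once this returns True."""
        return self.marked[v] == set(self.dag[v])
```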
We first show that there exists an algorithm consisting of n computations for which the optimal I/O costs under regular Red-Blue Pebbling (RBP) and Partial-computation Red-Blue Pebbling (PRBP) differ by a factor of n. However, we also prove that, for any given algorithm, it is NP-hard to decide whether the optimal I/O cost in PRBP is lower than that in RBP. When proving lower bounds on I/O for specific algorithms in RBP, the so-called S-partitions introduced by Hong and Kung have proven a useful construct. We found that, without adaptation, S-partitions cannot guarantee lower bounds in PRBP. We instead propose two adaptations: S-edge partitions and S-dominator partitions. We use these to prove lower bounds in PRBP for the fast Fourier transform (Ω((n log n)/log r)), matrix multiplication (Ω(n³/√r)), and FlashAttention (Ω(n²d/√r)), with r the number of red pebbles; these bounds match their counterparts in RBP, confirming that the adapted concepts retrieve useful lower bounds. For some specific computations we show that the optimal I/O cost is lower in PRBP than in RBP; this is, e.g., the case for matrix-vector multiplication (for large enough vectors). Finally, we prove that for PRBP there exists no polynomial-time algorithm that approximates the optimal solution to within any useful factor, as we recently also proved for RBP in our work on multi-processor pebbling.
- We release our pre-print on high-performance shared-memory parallel sparse triangular solves, using our OneStopParallel toolkit for partitioning and scheduling. Our barrier-list-scheduling approach achieves geometric-mean speedups of more than 3.2x and 1.4x over the state-of-the-art HDagg and SpMP, respectively. Read all about it here!
- My text on Humble Heroes (citation) has appeared in print. Humble here refers to the notion of the humble programmer, a concept coined by Edsger Dijkstra, who urged us to approach programming with respect for the intrinsic limitations of the human mind. Today, few programming models are humble. Even fewer are both humble and scalable. Almost none are humble, scalable, and high performance.
The ever-increasing scale and complexity of computing systems – driven by ever-increasing scales of problems and data – nevertheless require ever more programmers, each with ever-deeper expertise, to write programs that demand ever-higher scalability and performance.
In a bid to resolve this tension, Humble Heroes argues that many humble programming models can be built on top of a few foundational ones while retaining high performance and optimal scalability, and demonstrates this is feasible by introducing and evaluating ALP/Pregel, freely available on GitHub.
Contact info
Department: Computing Systems Laboratory
Postal: Huawei Zürich Research Center, Leonardo, Thurgauerstrasse 80, 8050 Zürich, Switzerland
E-mail: albertjan.<last name>@huawei.com
        albert-jan@<last name>.net
Telephone: +41 76 556 06 76
ORCID: 0000-0001-8842-3689
Overview
- Publications
- Presentations
- ALP/Pregel & ALP/GraphBLAS (gitee), MulticoreBSP, and other software
- A short biography, and an even shorter one