Parallel computing hands-on session 1
Be sure to have worked through your homework assignments first! If anything is unclear, ask your practical session leader for help. If you run out of personal credits, you can use the following queue: lp_alg_parallel_comp.
Exercise 1
Let us try to benchmark one of the VIC-3 nodes. We use the benchmarking principle described in Chapter 1 of the book. The software is found on Rob Bisseling's web page:
- Go to your base course directory: cd ~/parco;
- Get the BSPedupack: wget http://www.staff.science.uu.nl/~bisse101/Book/Edupack/BSPedupack1.02.tar
- Untar the BSPedupack: tar xvf BSPedupack1.02.tar
- Get a replacement makefile:
cd BSPedupack1.02;
rm -f Makefile;
wget http://people.cs.kuleuven.be/~albert-jan.yzelman/education/parco14/practical1/Makefile
- Compile the BSPedupack programs: make
- Now run the benchmarking program: ./bench
Try to get good readings for r, l, and g for the node type(s) of your choice. Remember to use the batch system; otherwise your benchmarks will be disrupted by other users!
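As a reminder of what the three measured parameters mean, the BSP cost of a single superstep can be sketched as below. This is a minimal illustration, not part of BSPedupack: the function name bsp_superstep_seconds is hypothetical, and it assumes r is given in flop/s while g and l are expressed in flop units.

```c
/* Hypothetical helper (not part of BSPedupack).
 * Estimates the time of one BSP superstep from the machine parameters.
 * Assumptions: r is the computing rate in flop/s; g and l are in flop units.
 *   w : local computation work in flops
 *   h : size of the h-relation (max words sent or received by any process)
 */
double bsp_superstep_seconds(double w, double h, double r, double g, double l)
{
    /* Cost in flops is w + h*g + l; dividing by r converts it to seconds. */
    return (w + h * g + l) / r;
}
```

For example, with r = 1e6 flop/s, g = 50 and l = 500, a superstep with w = 1000 flops and h = 10 is estimated at (1000 + 500 + 500) / 1e6 = 0.002 seconds.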
Exercise 2
So far we exploited parallelism within a single node (intra-node). We now use something other than BSP, namely MPI, to work at the inter-node level. This time we are going to measure bandwidth and latency.
- Go back to your project directory (cd ~/parco;)
- Get latency.c (wget http://people.cs.kuleuven.be/~albert-jan.yzelman/education/parco14/practical1/latency.c)
- Load the OpenMPI module: module load OpenMPI
- Compile latency.c: mpicc latency.c -o latency
- Submit a job to run latency. Note that latency only works for two processes! Use "mpirun -n 2 latency" to get it executed.
- Be sure that latency was executed on two different physical nodes (and not on two cores of one physical node!). Hint: check the manual page for mpirun for helpful options (man mpirun).
- latency generates the file "latency.dat", which contains the results. Download latency.gnuplot and bandwidth.gnuplot. You can use gnuplot (first you have to load the gnuplot module!) to visualize the data. Interpret the results for both latency and bandwidth.
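When interpreting the plots, a common first-order model may help: the time to send a message of n bytes is roughly a fixed startup cost plus n divided by the asymptotic bandwidth. The sketch below illustrates that model only; it is not taken from latency.c, and the names transfer_seconds, effective_bandwidth, alpha, and beta are hypothetical.

```c
/* Hypothetical first-order communication model (an assumption, not the
 * method used by latency.c):
 *   alpha : startup latency in seconds
 *   beta  : asymptotic bandwidth in bytes per second
 */
double transfer_seconds(double n_bytes, double alpha, double beta)
{
    return alpha + n_bytes / beta;
}

/* Bandwidth actually observed for a message of n_bytes under this model.
 * For small messages it is dominated by alpha; for large messages it
 * approaches beta. */
double effective_bandwidth(double n_bytes, double alpha, double beta)
{
    return n_bytes / transfer_seconds(n_bytes, alpha, beta);
}
```

Under this model, a message of n = alpha * beta bytes reaches exactly half of the asymptotic bandwidth, which gives a rough idea of where the measured curve should bend.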
Exercise 3
Write your own version of the BSP inner-product code. You can start from the Hello World example.
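Before writing the BSP version, it can help to see how the work splits across processes. The sketch below is plain sequential C with no BSP calls, and it assumes a cyclic distribution in which process s owns global indices s, s+p, s+2p, and so on. The function names are hypothetical, and the loop over s merely stands in for the communication step that a real BSP program would perform.

```c
#include <stddef.h>

/* Partial inner product that process s (0 <= s < p) would compute under
 * an assumed cyclic distribution. For simplicity this sketch reads from the
 * full global arrays; a real BSP program would store only the local part. */
double local_inner_product(const double *x, const double *y,
                           size_t n, size_t p, size_t s)
{
    double sum = 0.0;
    for (size_t i = s; i < n; i += p) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Sequential stand-in for the parallel program: each "process" computes
 * its partial sum, and the partial sums are then added together. */
double simulated_inner_product(const double *x, const double *y,
                               size_t n, size_t p)
{
    double total = 0.0;
    for (size_t s = 0; s < p; s++) {
        total += local_inner_product(x, y, n, p, s);
    }
    return total;
}
```

In the BSP program, the second function would be replaced by a superstep in which the partial sums are exchanged, followed by a local summation.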
Exercise 4
Adapt the MPI inter-node benchmarking code from exercise 2 so that it measures h-relations like the BSP benchmarking tool does. Then try to get valid inter-node values for r, l, and g. Note that C code for linear regression can be found in the bench.c file of BSPedupack; you do not have to write that part yourselves.
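To make the regression step concrete: given measured times t for a range of h values, g and l are the slope and intercept of the best-fitting line t = g*h + l. The sketch below is an independent ordinary least-squares fit, not the routine from bench.c; the name fit_g_l and its signature are hypothetical.

```c
#include <stddef.h>

/* Hypothetical least-squares fit of t[i] ~ g*h[i] + l (not the bench.c code).
 * On success stores the slope in *g and the intercept in *l and returns 0.
 * Returns -1 if fewer than two points are given or all h values coincide. */
int fit_g_l(const double *h, const double *t, size_t n, double *g, double *l)
{
    if (n < 2) {
        return -1;
    }
    double sh = 0.0, st = 0.0, shh = 0.0, sht = 0.0;
    for (size_t i = 0; i < n; i++) {
        sh  += h[i];
        st  += t[i];
        shh += h[i] * h[i];
        sht += h[i] * t[i];
    }
    double denom = (double)n * shh - sh * sh;
    if (denom == 0.0) {
        return -1; /* the slope is undefined when all h values are equal */
    }
    *g = ((double)n * sht - sh * st) / denom;
    *l = (st - *g * sh) / (double)n;
    return 0;
}
```

The fitted g and l come out in the same units as t. To express them in flop units, as in the BSP model, multiply by the measured rate r.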