Parallel computing hands-on session 1

Log on to the VIC-3 supercomputer by opening a terminal and typing ssh guest<your login number>@login2.vic3.cc.kuleuven.be. If asked, accept the remote fingerprint (f0:32:b5:1a:b8:fa:0d:67:a9:da:5f:36:d8:f4:f1:2d), and enter your assigned password when prompted. If all went well, you are now logged on to a head node of the VIC-3 supercomputer!

Exercise 1

We will run a Hello World example program using batch processing on a supercomputer. Follow the steps below:

  1. Create a clean directory for this course, e.g., mkdir ParCo (suggested), and change your working directory to it (cd ParCo).
  2. Get the MulticoreBSP for C library: wget http://www.multicorebsp.com/downloads/c/1.1.0/MulticoreBSP-for-C.tar.gz
  3. Let VIC-3 know you would like to use the GNU compiler collection: module load gcc/4.4.5 (try module avail to see what other packages VIC-3 has available for you).
  4. Untar the MulticoreBSP for C package: tar xvfz MulticoreBSP-for-C.tar.gz
  5. Compile the library: cd MulticoreBSP-for-C; make
  6. Get the Hello World example: wget http://people.cs.kuleuven.be/~albert-jan.yzelman/education/parco13/practical1/hello.c
  7. Compile the Hello World example:
    gcc -Iinclude/ -pthread -c -o hello.o hello.c
    gcc -o hello hello.o lib/libmcbsp1.1.0.a -lpthread -lrt
  8. Run the example: ./hello

If all is well, you just ran your first parallel application on VIC-3. Note, however, that you executed the program on the login node, which we share with many other supercomputer users (like your classmates). Try the command who to see exactly with whom we are sharing the machine we are currently logged on to. When we run proper parallel codes, however, we

  1. do not want to be disrupted by other users, and
  2. want to use more than a single node.
To get what we want, we have to use the batch system.

Exercise 2

Read the documentation on VIC-3's batch system, and submit the Hello World job to run on a single node of VIC-3. Remember that MulticoreBSP is shared-memory only, so using more than one node is not useful. Also note that VIC-3 has various types of nodes, and that you can use the batch system to choose which type of node you want to run on.

Note: our guest project account is named guest_project.

Exercise 3

Let us try to benchmark one of the VIC-3 nodes. We use the benchmarking principle described in Chapter 1 of the book. The software is found on Rob Bisseling's web page:

  1. Go up one directory, back to the course directory: cd ~/ParCo
  2. Get the BSPedupack: wget http://www.staff.science.uu.nl/~bisse101/Book/Edupack/BSPedupack1.02.tar
  3. Untar the BSPedupack: tar xvf BSPedupack1.02.tar
  4. Get a replacement makefile:
    cd BSPedupack1.02;
    rm -f Makefile;
    wget http://people.cs.kuleuven.be/~albert-jan.yzelman/education/parco13/practical1/Makefile
  5. Compile the BSPedupack programs: make
  6. Now run the benchmarking program: ./bench

Try to get good readings for r, l, and g for the node type(s) of your choice. Remember to use the batch system, as otherwise your benchmarks will be disrupted by other users.

Exercise 4

So far we have exploited parallelism within a single node (intra-node). We now move to the inter-node level, using MPI instead of BSP. This time we are going to measure bandwidth and latency.

  1. Get latency.c (wget http://people.cs.kuleuven.be/~albert-jan.yzelman/education/parco13/practical1/latency.c)
  2. Load the openmpi module: module load openmpi
  3. Compile latency.c: mpicc latency.c -o latency
  4. Submit a job to run latency. Note that latency only works for two processes! Use "mpirun latency" to get it executed.
  5. Make sure that latency was executed on two different physical nodes (and not on two cores of one physical node!). You will need a trick for this. Hint: use the man command to read the mpirun manual, and check the -n and -pernode options.
  6. latency generates the file latency.dat, which contains the results. Download latency.gnuplot and bandwidth.gnuplot. You can use gnuplot (first you have to load the gnuplot module!) to visualize the data. Interpret both the latency and the bandwidth results.

Exercise 5

Write your own version of the BSP inner-product code. You can start from the Hello World example.

Exercise 6 (optional)

Adapt the MPI inter-node benchmarking code from Exercise 4 so that it measures h-relations like the BSP benchmarking tool does. Then try to get valid inter-node values for r, l, and g. Note that C code for linear regression can be found in the bench.c file of BSPedupack; you do not have to write that part yourselves.