VT FDI 2009
MPI Exercises


During the FDI sessions, MPI is available only on a remote machine, System X. We have logged into System X before, during the MATLAB exercises, because the matlab1 cluster uses the System X file system.

The System X "head nodes" are sysx1, sysx2 and sysx3, but sysx2 is temporarily unavailable.

The System X cluster has 1100 nodes, each with two G5 processors; users run parallel jobs on System X using MPI.


Exercise 1: Compile and run HELLO_MPI on System X

In the directory hello_mpi, we have MPI versions of the "hello" program in C, C++, FORTRAN77 and FORTRAN90.

First, we must copy the program from the desktop to sysx1 or sysx3 using sftp:

  1. Start a terminal program on your desktop.
  2. Use the cd command to move to the hello_mpi directory.
  3. Connect with sftp and transfer two files from this desktop directory:
                sftp USERNAME@sysx1.arc.vt.edu
                put hello_mpi.cc    <-- choose your language here (.c, .cc, .f or .f90)
                put hello_mpi.sh

  4. Leave the sftp terminal window available for further file transfers!

Second, we log in to sysx1 or sysx3 and enter the following commands:

  1. Start a second terminal program on your desktop and connect to System X:
    ssh USERNAME@sysx1.arc.vt.edu
  2. ls (... You should see the copies of your "hello_mpi" files.)
  3. Compile, using the MPI compiler for your language:
    mpicc hello_mpi.c
    mpiCC hello_mpi.cc
    mpif77 hello_mpi.f
    mpif90 hello_mpi.f90
  4. Rename:
    mv a.out hello_mpi
  5. Submit the job:
    qsub hello_mpi.sh

Your job script is set up for 1 node and 1 processor per node (ppn). Modify your job script to request 2 nodes and 2 processors per node, then rerun your job.

Did your choice of 2 nodes and 2 processors per node result in 2 MPI processes or 4?

(Because each System X node has just two processors, the maximum value of ppn you can request is 2.)
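The question above comes down to simple arithmetic. Here is a minimal sketch, assuming the job script starts one MPI process for every processor slot that PBS allocates (check how hello_mpi.sh actually launches the program; a script that passes a fixed process count to mpirun or mpiexec would behave differently). The names mpi_process_count, nodes_needed and SYSX_MAX_PPN are hypothetical:

```c
#include <assert.h>

/* Hypothetical constant: each System X node has two processors, so ppn <= 2. */
#define SYSX_MAX_PPN 2

/* Number of MPI processes obtained from a PBS request "nodes=NODES:ppn=PPN",
   ASSUMING one MPI process is started per allocated processor slot.
   Returns -1 for a request System X cannot satisfy. */
int mpi_process_count(int nodes, int ppn)
{
    if (nodes < 1 || ppn < 1 || ppn > SYSX_MAX_PPN)
        return -1;
    return nodes * ppn;
}

/* Smallest number of nodes giving at least `procs` processes at full ppn. */
int nodes_needed(int procs)
{
    assert(procs > 0);
    return (procs + SYSX_MAX_PPN - 1) / SYSX_MAX_PPN;   /* round up */
}
```

Under this assumption, nodes=2:ppn=2 gives 4 MPI processes, and a run with 8 MPI processes needs 4 nodes with ppn=2.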


Exercise 2: Compile and run PRIME1_MPI on System X

In the directory prime1_mpi, we have MPI versions of the "prime_number" program in C, C++, FORTRAN77 and FORTRAN90.

Copy the prime1_mpi program from the desktop to System X using sftp:

  1. If your sftp window is still open, simply change the local directory and transfer the files:
                lcd ../prime1_mpi
                put prime1_mpi.f90    <-- Choose your language here
                put prime1_mpi.sh
            

Now, in your interactive session:

  1. ls (... Now you should see the "prime1_mpi" files.)
  2. Compile, using the MPI compiler for your language:
    mpicc prime1_mpi.c
    mpiCC prime1_mpi.cc
    mpif77 prime1_mpi.f
    mpif90 prime1_mpi.f90
  3. Rename:
    mv a.out prime1_mpi
  4. Submit the job:
    qsub prime1_mpi.sh

Run this job with 8 MPI processes (4 nodes, ppn=2) and record the time reported for the final calculation, the one with the largest value of N.

(Something is inefficient about the way this program divides up the work among the MPI processes!)
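As a hint, here is a sketch that counts the primes in a range and also tallies the divisions it performs. It assumes the simple trial-division test that tries every divisor from 2 up to i-1; that is a guess at the method, not taken from the prime1_mpi source, and count_primes_in_range is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Count the primes in lo..hi by trial division, trying the divisors
   2, 3, ..., i-1 (an ASSUMED, deliberately simple test).
   If work is not NULL, add the number of divisions performed to *work. */
int count_primes_in_range(int lo, int hi, long *work)
{
    int total = 0;
    assert(lo <= hi);
    for (int i = lo; i <= hi; i++) {
        int prime = (i >= 2);            /* 0 and 1 are not prime */
        for (int j = 2; j < i; j++) {
            if (work != NULL)
                (*work)++;               /* one more division tried */
            if (i % j == 0) {            /* found a divisor: composite */
                prime = 0;
                break;
            }
        }
        total += prime;
    }
    return total;
}
```

The ranges 2..10 and 11..20 each contain four primes, but under this test the first costs 15 divisions and the second 59, so a process handed only the largest numbers does most of the work.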


Exercise 3: Compile and run PRIME2_MPI on System X

In the directory prime2_mpi, we have a different MPI version of the "prime_number" program in C, C++, FORTRAN77 and FORTRAN90.

Copy the prime2_mpi program from the desktop to System X using sftp:

  1. If your sftp window is still open, simply change the local directory and transfer the files:
                lcd ../prime2_mpi
                put prime2_mpi.c   <-- Choose your language here
                put prime2_mpi.sh
            

Now, in your interactive session:

  1. ls (... Now you should see the "prime2_mpi" files.)
  2. Compile, using the MPI compiler for your language:
    mpicc prime2_mpi.c
    mpiCC prime2_mpi.cc
    mpif77 prime2_mpi.f
    mpif90 prime2_mpi.f90
  3. Rename:
    mv a.out prime2_mpi
  4. Submit the job:
    qsub prime2_mpi.sh

Run this job with 8 MPI processes (4 nodes, ppn=2) and record the time reported for the final calculation, the one with the largest value of N.

Both prime1_mpi and prime2_mpi do the same work and get the same answers, but I believe you will find that prime2_mpi executes faster. The reason is that the two programs divide up the work differently.

The prime1_mpi program divides the N numbers into consecutive chunks, while prime2_mpi deals them out in interleaved (round-robin) fashion. For example, with 4 processes and N = 20, the first program divides the work as

        (1,2,3,4,5)  (6,7,8,9,10)  (11,12,13,14,15) (16,17,18,19,20)
      
while the second program divides them as
        (1,5,9,13,17)  (2,6,10,14,18) (3,7,11,15,19) (4,8,12,16,20)
      
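Since big numbers are harder to check than small ones, the first way gives the last process far more work than the first process. The second way divides the work more evenly, and since the program is only finished when its slowest process is finished, the whole program finishes faster.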
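The two divisions can be written as a pair of "owner" functions. The sketch below also applies a deliberately crude cost model, an assumption rather than a measurement: checking the number i costs i units of work, which captures only the statement that big numbers are harder to check. Ranks are numbered from 0, and block_owner, cyclic_owner and max_rank_cost are hypothetical names:

```c
#include <assert.h>

/* prime1_mpi style: consecutive chunks of n/p numbers (assumes p divides n).
   Returns the 0-based rank that owns the number i, 1 <= i <= n. */
int block_owner(int i, int n, int p)
{
    assert(n % p == 0 && i >= 1 && i <= n);
    return (i - 1) / (n / p);
}

/* prime2_mpi style: numbers dealt out round-robin, like cards. */
int cyclic_owner(int i, int p)
{
    assert(p > 0 && i >= 1);
    return (i - 1) % p;
}

/* ASSUMED cost model: checking number i costs i units.
   Returns the largest total cost given to any one rank; the slowest
   rank determines when the whole program finishes. */
long max_rank_cost(int n, int p, int cyclic)
{
    long cost[64] = {0};
    long worst = 0;
    assert(p > 0 && p <= 64);
    for (int i = 1; i <= n; i++) {
        int r = cyclic ? cyclic_owner(i, p) : block_owner(i, n, p);
        cost[r] += i;
    }
    for (int r = 0; r < p; r++)
        if (cost[r] > worst)
            worst = cost[r];
    return worst;
}
```

For N = 20 and P = 4, this model gives the four processes 15, 40, 65 and 90 units under the consecutive split (slowest: 90), but 45, 50, 55 and 60 units under the interleaved split (slowest: 60).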

(In some calculations, it is very hard to tell in advance how to divide up the work so that it is approximately equal!)


Exercise 4: Compile and run HEAT_MPI on System X

Copy the heat_mpi program from the desktop to System X using sftp:

  1. If your sftp window is still open, simply change the local directory and transfer the files:
                lcd ../heat_mpi
                put heat_mpi.cc   <-- Choose your language here
                put heat_mpi.sh
            

Now, in your interactive session:

  1. ls (... Now you should see the "heat_mpi" files.)
  2. Compile, using the MPI compiler for your language (mpicc, mpiCC, mpif77 or mpif90); for C++, for example:
    mpiCC heat_mpi.cc
  3. Rename:
    mv a.out heat_mpi
  4. Submit the job:
    qsub heat_mpi.sh

Run this job with 8 MPI processes (4 nodes, ppn=2) and record the time.

This program solves the heat equation over the spatial interval [0,1]. Instead of picking N points and then splitting them up over P processes, this program assigns N points to each process, so that there are a total of N*P spatial points.
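A minimal sketch of this layout, assuming each process owns N consecutive points and the N*P points are evenly spaced on [0,1] with both endpoints included. The actual heat_mpi program may place its points differently, and total_points, global_index and point_x are hypothetical names:

```c
#include <assert.h>

/* ASSUMED layout: process `rank` (0..p-1) owns local points i = 0..n-1,
   which are the global points rank*n + i. */
int total_points(int n, int p)
{
    assert(n > 0 && p > 0);
    return n * p;                 /* grows with both n and p */
}

int global_index(int rank, int i, int n)
{
    assert(rank >= 0 && i >= 0 && i < n);
    return rank * n + i;
}

/* x coordinate of a local point, spacing the n*p points evenly on [0,1]. */
double point_x(int rank, int i, int n, int p)
{
    int last = total_points(n, p) - 1;
    assert(last > 0);
    return (double) global_index(rank, i, n) / last;
}
```

With N = 12 and P = 8, total_points gives 96, and local point 11 on process 7 is global point 95, at x = 1.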

The program you have uses N=12, so we are using 12*8=96 spatial points. That's not very many. Let's keep P fixed at 8, but double the value of N to 24. (You must edit the program, change the value of N, and recompile before running it again.) That means we are roughly doubling the amount of computation. Each process only exchanges the values at the ends of its piece with its neighbors, so the amount of communication depends on P, not N; because P is still 8, it stays the same. Rerun this problem and note the time. Did it double? What do you think will happen if you double N again?

(An MPI program is doing both communication and computation. Since communication is slow compared to computation, it is often possible to increase the amount of computation without increasing the run time by very much.)
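To see why computation grows with N while communication does not, here is a sketch of one time step on one process. It assumes an explicit finite-difference scheme in which each process stores its N values plus one ghost value at each end (copies of the neighbors' boundary values). The MPI exchange that fills the ghost values is not shown, the function names are hypothetical, and the real heat_mpi source may be organized differently:

```c
#include <assert.h>

/* One explicit step of u_t = k u_xx on this process's piece.
   u[1..n] are the n points this process owns; u[0] and u[n+1] are ghost
   values received from the neighbors (MPI exchange not shown).
   r = k*dt/dx^2, which must be <= 0.5 for this explicit scheme to be stable.
   Returns the number of point updates: the computation grows with n. */
int heat_step(const double *u, double *u_new, int n, double r)
{
    assert(n > 0 && r > 0.0 && r <= 0.5);
    for (int i = 1; i <= n; i++)
        u_new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
    u_new[0] = u[0];              /* ghosts are refreshed by the exchange */
    u_new[n + 1] = u[n + 1];
    return n;
}

/* Ghost values a process receives per time step: one from each neighbor
   (the end processes have only one). Independent of n. */
int halo_values_received(int rank, int p)
{
    assert(p > 0 && rank >= 0 && rank < p);
    return (rank > 0) + (rank < p - 1);
}
```

The loop in heat_step runs N times per step, so doubling N doubles the arithmetic, while halo_values_received depends only on the rank and P.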




Last revised on 28 May 2009.