VT FDI 2009
OpenMP Exercises


During the FDI sessions, we should have access to OpenMP both locally on the desktop machines and remotely, by logging into the charon1 or charon2 nodes of the SGI cluster known as inferno2.

The local machines are only dual core; OpenMP will let you ask for any number of threads, but it's likely that 2 is the best you can do. To run an OpenMP program on the local machine, we will need to use the GNU compilers.

The inferno2 cluster has 128 cores; however, users are limited to using no more than 12. To run an OpenMP program on the cluster, we will need to compile using the Intel compilers, and create a job script.


Exercise 1: Run HELLO_OPEN_MP on the Desktop

In the directory hello_open_mp, we have OpenMP versions of the "hello" program in C, C++, FORTRAN77 and FORTRAN90.

Open a terminal window and use the "cd" command to move to this directory. When you type "ls" you should see four files listed, starting with hello_open_mp.c.

Compile one of the programs, using the appropriate GNU compiler, with the switches necessary to access OpenMP, and then rename the "a.out" file. Depending on your language, the compilation command is one of

        gcc -fopenmp hello_open_mp.c
        g++ -fopenmp hello_open_mp.cc
        gfortran -fopenmp hello_open_mp.f
        gfortran -fopenmp hello_open_mp.f90
      
If the compilation was successful, the executable program is called "a.out". Rename this:
        mv a.out hello_open_mp
      

We're almost ready to run. But now we need to indicate how many threads of execution we would like to use. Let's choose 2:

        export OMP_NUM_THREADS=2
        ./hello_open_mp
      
You should get "hello" messages from threads 0 and 1.

To convince yourself that the thread number makes a difference, try 4:

        export OMP_NUM_THREADS=4
        ./hello_open_mp
      

Did your choice of 4 threads mean that you see messages from 4 threads?
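
If you are curious what such a "hello" program does inside, here is a minimal sketch in C. It is not the course's hello_open_mp.c; the function name hello_threads is made up for illustration. Each thread in the parallel region prints its own thread number, and a reduction counts how many threads took part.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: prints one greeting per thread and returns
   how many threads took part. Built without -fopenmp, the pragma is
   ignored and exactly one thread runs. */
int hello_threads(void)
{
    int count = 0;

#pragma omp parallel reduction(+ : count)
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();   /* 0, 1, ..., (number of threads - 1) */
#endif
        printf("Hello from thread %d\n", id);
        count += 1;                  /* each thread adds 1 to its own copy */
    }
    return count;                    /* copies are summed at the end of the region */
}
```

Call hello_threads() from a main program, compile with "gcc -fopenmp", and the returned count should normally match OMP_NUM_THREADS. The order of the printed lines can change from run to run.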


Exercise 2: Run PRIME_NUMBER_OPEN_MP on the Desktop

Now move to the "prime_number_open_mp" directory.

        cd ../prime_number_open_mp
      
The "ls" command will show you four files, including "prime_number_open_mp.cc".

Compile the program and rename it "prime".

Since this program actually does some work, and it takes some time to execute, we can compare the program execution times for different numbers of threads. The program returns a time for many different sizes of N, and we might expect to see good results at least for large values of N.

Run "prime" using 1, 2 and 4 threads. For each run, record the time required for the biggest value of N.

Is there an improvement going from 1 thread to 2? From 2 to 4?
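
To see why a prime counting loop is a natural candidate for OpenMP, here is a rough sketch in C. It assumes simple trial division and is not taken from the course's prime_number_open_mp source; the name prime_count and the schedule clause are illustrative choices. Each value of i can be tested independently, so the outer loop is split among the threads and the per-thread counts are combined with a reduction.

```c
/* Hypothetical sketch: counts the primes in 2..n by trial division.
   Without -fopenmp the pragma is ignored and the loop runs serially. */
int prime_count(int n)
{
    int total = 0;
    int i, j;

    /* i is the parallel loop index (private automatically);
       j must be private so threads do not share the inner index. */
#pragma omp parallel for private(j) reduction(+ : total) schedule(dynamic, 64)
    for (i = 2; i <= n; i++) {
        int is_prime = 1;
        for (j = 2; j <= i / j; j++) {   /* same as j*j <= i, without overflow */
            if (i % j == 0) {
                is_prime = 0;
                break;
            }
        }
        total += is_prime;
    }
    return total;
}
```

For example, prime_count(10) is 4 and prime_count(100) is 25. Large values of i take longer to test than small ones, which is why the sketch asks for a dynamic schedule; whether that helps on your machine is something to measure, not assume.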


Exercise 3: Run MD_OPEN_MP on the Desktop

Move to the "md_open_mp" directory.

        cd ../md_open_mp
      
The "ls" command will show you four files, including "md_open_mp.f".

Compile the "md_open_mp" program with OpenMP. Run it with 1 thread and then with 2 threads. It should take about one minute to run.

Does the program run faster with 2 threads?


Exercise 4: Try to make an OpenMP version of QUAD.

QUAD is a program that estimates an integral by adding up values of a function at equally spaced points. We are going to try to make an OpenMP version of it.

Move to the "quad_test" directory.

        cd ../quad_test
      
There is a version of the "quad_test" program in each of the four language choices.

Change 1: the "include" file. The C and C++ programs require an include statement of the form

        # include <omp.h>
      
The FORTRAN programs can both use an include in the main program:
        include 'omp_lib.h'
      

Change 2: the timing calls. Replace the calls to cpu_time by calls to the function omp_get_wtime():

        wtime = omp_get_wtime ( );
        (stuff to time )
        wtime = omp_get_wtime ( ) - wtime;
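
The reason for this change is that CPU time is typically added up over all the threads, so it does not go down when more threads share the work, while omp_get_wtime() measures elapsed wall clock time. Here is a small sketch of the timing pattern in C. The names wall_seconds and timed_harmonic_sum are invented for this example, and the fallback to clock() is only there so the code still builds without OpenMP.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: wall clock seconds from OpenMP when available,
   otherwise CPU seconds from the standard clock() function. */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Sums 1/i for i = 1..n, stores the sum in *result,
   and returns the elapsed time in seconds. */
double timed_harmonic_sum(int n, double *result)
{
    double wtime = wall_seconds();        /* start the clock */
    double sum = 0.0;
    int i;

    for (i = 1; i <= n; i++) {            /* the "stuff to time" */
        sum += 1.0 / i;
    }
    *result = sum;
    return wall_seconds() - wtime;        /* stop the clock */
}
```

The same two-call pattern works around any block of code, including a parallel region.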
      

Change 3: the PARALLEL directive. Here's the hard part! You need to type in an OpenMP PARALLEL directive just before the loop. It should include information about the SHARED, PRIVATE and REDUCTION variables. Here is an example of the format for C and C++:

        # pragma omp parallel \
          shared ( a, b, c ) \
          private ( d ) \
          reduction ( + : e )
      
FORTRAN77 looks like this:
        c$omp parallel
        c$omp&  shared ( a, b, c )
        c$omp&  private ( d )
        c$omp&  reduction ( + : e )
        (parallel region)
        c$omp end parallel
      
and FORTRAN90 looks like this:
        !$omp parallel &
        !$omp shared ( a, b, c ) &
        !$omp private ( d ) &
        !$omp reduction (  + : e )
        (parallel region)
        !$omp end parallel
      

The names "a, b, c" and "d" and "e" are placeholders. You need to put the names of the appropriate variables in these lists. The variables you must classify are i, n, pi, total and x.

Change 4: mark the loop. For C and C++:

        # pragma omp for
      
and for FORTRAN77:
        c$omp do
        (loop)
        c$omp end do
      
and for FORTRAN90:
        !$omp do
        (loop)
        !$omp end do
      

Try to get the program to compile. If you get that far, then run it with 1 and 2 threads and confirm that the 2 thread version runs faster!

If you get stuck or confused, there are OpenMP versions of the programs in the directory quad_open_mp!


Exercise 5: Run HELLO_OPEN_MP on INFERNO2

First, we must copy the program from the desktop to charon1 or charon2 using sftp:

  1. Start a terminal program on your desktop.
  2. Use the cd command to move to the hello_open_mp directory.
  3. Transfer two files from this desktop directory:
                sftp USERNAME@charon1.arc.vt.edu
                put hello_open_mp.f90    <-- choose the language you want, here!
                put hello_open_mp.sh     
              
  4. Leave the sftp terminal window available for further file transfers!

Second, we log in to charon1 or charon2. These commands are entered through your terminal:

  1. Start a second terminal program on your desktop.
  2. ssh USERNAME@charon1.arc.vt.edu
  3. ls (... You should see the copies of your "hello_open_mp" files.)
  4. Compile, using one of the Intel compilers:
    icc -openmp -parallel hello_open_mp.c
    icpc -openmp -parallel hello_open_mp.cc
    ifort -openmp -parallel -fpp hello_open_mp.f
    ifort -openmp -parallel -fpp hello_open_mp.f90
  5. Rename:
    mv a.out hello_open_mp
  6. Submit the job:
    qsub hello_open_mp.sh

Your job script is set up for 2 CPUs and 2 threads. Modify your job script to run on 4 CPUs and 4 threads. Resubmit your job.

Does your 4 thread job print out messages from 4 threads, as it should?
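
If the number of messages is not what you expected, it can help to have the program report what it was actually given. The sketch below is an illustration, not part of the course files; the name report_thread_setup is hypothetical. It uses two standard OpenMP library routines, omp_get_max_threads() and omp_get_num_procs().

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: prints the processor count and the largest
   number of threads a parallel region may use, then returns the
   thread limit. Without OpenMP both values are reported as 1. */
int report_thread_setup(void)
{
    int max_threads = 1;
    int procs = 1;

#ifdef _OPENMP
    max_threads = omp_get_max_threads();  /* normally follows OMP_NUM_THREADS */
    procs = omp_get_num_procs();          /* processors visible to the program */
#endif
    printf("Processors available: %d\n", procs);
    printf("Maximum threads:      %d\n", max_threads);
    return max_threads;
}
```

Calling this at the start of a program makes it easy to check that the thread count seen by the program agrees with what the job script requested.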


Exercise 6: Run PRIME_NUMBER_OPEN_MP on INFERNO2

Copy the prime_number_open_mp program from the desktop to charon1 or charon2 using sftp:

  1. If your sftp window is still open, simply shift the local directory:
                lcd ../prime_number_open_mp
                put prime_number_open_mp.f    <-- Choose your language here
                put prime_number_open_mp.sh
            

Now, in your interactive session on charon1 or charon2:

  1. ls (... Now you should see the "prime_number_open_mp" files.)
  2. Compile, using one of the Intel compilers and the same switches as in Exercise 5.
  3. Rename:
    mv a.out prime_number_open_mp
  4. Submit the job:
    qsub prime_number_open_mp.sh

Your job script is set up for 1 CPU and 1 thread. Modify your job script to run on 2 CPUs and 2 threads, then on 4 CPUs and 4 threads.

Compare the times of your 1, 2 and 4 thread jobs for the biggest problem size N. Is it decreasing? Is it decreasing by a factor of 2?
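
One way to make this comparison concrete is to compute the speedup (the 1 thread time divided by the P thread time) and the efficiency (the speedup divided by P). A small sketch in C follows; the function names are made up here, and any numbers you plug in should be your own measured times.

```c
/* Speedup: how many times faster the run on p threads was
   than the run on 1 thread. */
double speedup(double t1, double tp)
{
    return t1 / tp;
}

/* Efficiency: speedup per thread. A value of 1.0 means perfect
   scaling; lower values mean some of the extra threads' time was lost. */
double efficiency(double t1, double tp, int p)
{
    return speedup(t1, tp) / p;
}
```

For instance, with purely hypothetical times of 8.0 seconds on 1 thread and 4.0 seconds on 2 threads, the speedup is 2.0 and the efficiency is 1.0. If 4 threads also took 4.0 seconds, the speedup would still be 2.0 but the efficiency would drop to 0.5.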


Exercise 7: Run MD_OPEN_MP on INFERNO2

Copy the md_open_mp program from the desktop to charon1 or charon2 using sftp:

  1. If your sftp window is still open, simply shift the local directory:
                lcd ../md_open_mp
                put md_open_mp.cc    <-- Choose your language here
                put md_open_mp.sh
            

Now, in your interactive session on charon1 or charon2:

  1. ls (... Now you should see the "md_open_mp" files.)
  2. Compile, using one of the Intel compilers and the same switches as in Exercise 5.
  3. Rename:
    mv a.out md_open_mp
  4. Submit the job:
    qsub md_open_mp.sh

Your job script is set up for 1 CPU and 1 thread. Modify your job script to run on 4 CPUs and 4 threads.

How does the time compare for the two runs?




Last revised on 28 May 2009.