During the FDI sessions, we should have access to OpenMP both locally on the desktop machines and remotely, by logging into the charon1 or charon2 nodes of the SGI cluster known as inferno2.
The local machines are only dual core; OpenMP will let you ask for any number of threads, but it's likely that 2 is the best you can do. To run an OpenMP program on the local machine, we will need to use the Gnu compilers.
The inferno2 cluster has 128 cores; however, users are limited to using no more than 12. To run an OpenMP program on the cluster, we will need to compile using the Intel compilers, and create a job script.
In the directory hello_open_mp, we have OpenMP versions of the "hello" program in C, C++, FORTRAN77 and FORTRAN90.
Open a terminal window and use the "cd" command to move to this directory. When you type "ls", you should see four files listed, starting with hello_open_mp.c.
Compile one of the programs, using the appropriate Gnu compiler, with the switches necessary to access OpenMP, and then rename the "a.out" file. Depending on your language, the compilation is
gcc -fopenmp hello_open_mp.c
g++ -fopenmp hello_open_mp.cc
gfortran -fopenmp hello_open_mp.f
gfortran -fopenmp hello_open_mp.f90

If the compilation was successful, the executable program is called "a.out". Rename this:
mv a.out hello_open_mp
We're almost ready to run. But now we need to indicate how many threads of execution we would like to use. Let's choose 2:
export OMP_NUM_THREADS=2
./hello_open_mp

You should get "hello" messages from threads 0 and 1.
To convince yourself that the thread number makes a difference, try 4:
export OMP_NUM_THREADS=4
./hello_open_mp
Did your request for 4 threads result in messages from 4 threads?
Now move to the "prime_number_open_mp" directory.
cd ../prime_number_open_mp

The "ls" command will show you four files, including "prime_number_open_mp.cc".
Compile the program and rename it "prime".
Since this program actually does some work, and it takes some time to execute, we can compare the program execution times for different numbers of threads. The program returns a time for many different sizes of N, and we might expect to see good results at least for large values of N.
Run "prime" using 1, 2 and 4 threads. For each run, record the time required for the biggest value of N.
Is there an improvement going from 1 thread to 2? From 2 to 4?
Move to the "md_open_mp" directory.
cd ../md_open_mp

The "ls" command will show you four files, including "md_open_mp.f".
Compile the "md_open_mp" program with OpenMP. Run it with 1 thread and then with 2 threads. It should take about one minute to run.
Does the program run faster with 2 threads?
QUAD is a program that estimates an integral by adding up values of a function at equally spaced points. We are going to try to make an OpenMP version of it.
Move to the "quad_test" directory.
cd ../quad_test

There is a version of the "quad_test" program in each of the four language choices.
Change 1: the "include" file. The C and C++ programs require an include statement of the form
# include <omp.h>

The FORTRAN programs can both use an include statement in the main program:

include 'omp_lib.h'
Change 2: the timing calls. Replace the calls to cpu_time by calls to the function omp_get_wtime():
wtime = omp_get_wtime ( );
(stuff to time)
wtime = omp_get_wtime ( ) - wtime;
Change 3: the PARALLEL directive. Here's the hard part! You need to type in an OpenMP PARALLEL directive just before the loop. It should include information about the SHARED, PRIVATE and REDUCTION variables. Here is an example of the format for C and C++
# pragma omp parallel \
  shared ( a, b, c ) \
  private ( d ) \
  reduction ( + : e )

FORTRAN77 looks like this:
c$omp parallel
c$omp& shared ( a, b, c )
c$omp& private ( d )
c$omp& reduction ( + : e )
  (parallel region)
c$omp end parallel

and FORTRAN90 looks like this:
!$omp parallel &
!$omp shared ( a, b, c ) &
!$omp private ( d ) &
!$omp reduction ( + : e )
  (parallel region)
!$omp end parallel
The names "a, b, c" and "d" and "e" are placeholders. You need to put the names of the appropriate variables in these lists. The variables you must classify are i, n, pi, total and x.
Change 4: mark the loop: For C and C++:
# pragma omp for

and for FORTRAN77:
c$omp do
  (loop)
c$omp end do

and for FORTRAN90:
!$omp do
  (loop)
!$omp end do
Try to get the program to compile. If you get that far, then run it with 1 and 2 threads and confirm that the 2 thread version runs faster!
If you get stuck or confused, there are OpenMP versions of the programs in the directory quad_open_mp!
First, we must copy the program from the desktop to charon1 or charon2 using sftp:
sftp USERNAME@charon1.arc.vt.edu
put hello_open_mp.f90    <-- choose the language you want, here!
put hello_open_mp.sh
Second, we log in to charon1 or charon2. This command is entered through your terminal:

ssh USERNAME@charon1.arc.vt.edu
Your job script is set up for 2 CPUs and 2 threads. Modify your job script to run on 4 CPUs and 4 threads. Resubmit your job.
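I don't have the actual charon job script in front of me, so the following is a hypothetical sketch assuming a PBS-style batch scheduler; the directive names and resource syntax on your cluster may differ. The point is that two lines must change together, the processor request and the thread count:

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=4            # request 4 CPUs (was ppn=2)
#PBS -l walltime=00:05:00

cd $PBS_O_WORKDIR                # run from the submission directory
export OMP_NUM_THREADS=4         # request 4 threads (was 2)
./hello_open_mp
```

Asking for 4 threads while only reserving 2 CPUs would make the threads share processors, so keep the two numbers in step.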
Does your 4 thread job print out messages from 4 threads, as it should?
Copy the prime_number_open_mp program from the desktop to charon1 or charon2 using sftp:
lcd ../prime_number_open_mp
put prime_number_open_mp.f    <-- Choose your language here
put prime_number_open_mp.sh
Now, in your interactive session on charon1 or charon2:
Your job script is set up for 1 CPU and 1 thread. Modify your job script to run on 2 CPUs and 2 threads, then on 4 CPUs and 4 threads.
Compare the times of your 1, 2 and 4 thread jobs for the biggest problem size N. Does the time decrease? Does it decrease by a factor of 2 each time you double the number of threads?
Copy the md_open_mp program from the desktop to charon1 or charon2 using sftp:
lcd ../md_open_mp
put md_open_mp.cc    <-- Choose your language here
put md_open_mp.sh
Now, in your interactive session on charon1 or charon2:
Your job script is set up for 1 CPU and 1 thread. Modify your job script to run on 4 CPUs and 4 threads.
How does the time compare for the two runs?
You can go back to the FDI 2009 page.