Using OpenMP on Virginia Tech's SGI Systems


Introduction

This document describes the mechanics of running an OpenMP program on one of Virginia Tech's SGI systems. It assumes that you have already written an OpenMP program and that you have an account on the SGI systems.

This document walks you through a typical series of steps: starting with an OpenMP program on your "home" machine, transferring it to an SGI system, compiling it, running it, and bringing the results back home.

At the end of this document are instructions on how to actually carry out these steps, using sample files available on the web.

Some Assumptions

For this simple introduction, we'll make a number of assumptions.

(One file, simple name): We'll assume you have already written a program that uses OpenMP, that the program consists of a single file written in Fortran90, and that this file is called prog.f90.

(No input/Only standard output): We'll also assume for now that the program needs no input, and that the output of the program is entirely directed to the standard output device. In other words, the executing program does not read from or write to any auxiliary files.

(Source code on home machine): We'll assume this source code file is sitting in the source_code subdirectory on your home machine home_mac.

Our goal then is to transfer the file to one of the SGI machines, compile it, run it, and retrieve the output.

Transferring the Source Code to an SGI System

There are currently three SGI systems available at Virginia Tech through the Advanced Research Computing Facility:

        Name      Address              Total Processors  User Limit
        inferno   inferno.arc.vt.edu   20 CPUs           2 CPUs
        inferno2  inferno2.arc.vt.edu  128 CPUs          10 CPUs
        cauldron  cauldron.arc.vt.edu  64 CPUs           6 CPUs

To compile your program, you will need to transfer the source code of your OpenMP program to one of these systems. This can be done with the secure FTP program sftp. Here is a typical session, which suggests how you might transfer the file to inferno2. We are assuming here that you have already set up a subdirectory on inferno2 called work_directory.


        home_mac: sftp inferno2.arc.vt.edu

        inferno2: Password for user: xxxxx
        inferno2: cd work_directory
        inferno2: lcd source_code
        inferno2: put prog.f90
        inferno2: ls
        inferno2: prog.f90
        inferno2: quit

        home_mac:
     

Note that the commands cd, pwd and ls are carried out on the remote machine (inferno2 in this case) while the corresponding commands lcd, lpwd and lls will be carried out on the local machine (home_mac in this example). The put command moves files from the local to the remote machine, while the get command brings files from the remote machine to the local one. If multiple files are to be transferred, the mget and mput commands can be used instead.
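
The same transfer can also be scripted rather than typed. The following is a minimal sketch, not part of the original procedure: the helper name upload_batch and the file name upload.batch are made up for illustration. It prints the sftp commands from the session above so they can be saved in a batch file and passed to sftp with its -b option. Batch mode cannot prompt for a password, so this assumes that non-interactive (key-based) login is available on the SGI systems, which this document does not confirm.

```shell
# Hypothetical helper: print an sftp batch file that uploads one file.
# Arguments: remote directory, local directory, file name.
upload_batch() {
    printf 'cd %s\n'  "$1"
    printf 'lcd %s\n' "$2"
    printf 'put %s\n' "$3"
    printf 'ls\n'
}

# Show the batch commands for the example used in this document.
upload_batch work_directory source_code prog.f90

# To use it for real (assumes key-based login is already set up):
#   upload_batch work_directory source_code prog.f90 > upload.batch
#   sftp -b upload.batch inferno2.arc.vt.edu
```

No quit command is needed in the batch file, because sftp exits when it reaches the end of the file.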

Compiling the Source Code into an Executable

Once the source code file has been transferred to the SGI system, you can log in and compile the program.

To log in interactively, we use the Secure Shell program, ssh.


        home_mac: ssh inferno2.arc.vt.edu

        inferno2: Password for user: xxxxx
        inferno2: cd work_directory
        inferno2: ifort -fpp -openmp -parallel prog.f90
        inferno2: mv a.out prog
     

The Fortran compiler being used is ifort, the Intel Fortran compiler. The compiler assumes the program is written in Fortran90 based on its file extension of .f90. Of the three switches, -openmp is the one that tells the compiler to process OpenMP directives; -fpp runs the Fortran preprocessor, and -parallel turns on the compiler's automatic parallelization. The examples in this document use all three.

If the compilation fails, you will need to revise your program. You can either edit the program on your home machine and transfer it again, or make the changes directly on the copy on the SGI system.

In our example, we assume the compilation was successful. We allowed the compiler to assign the default name of a.out to the executable program it created, and then we renamed it to prog. We're now ready to submit the program for execution, so we're staying logged in.

Compiling programs in other languages

If the program was written in Fortran77, then the file extension should be simply .f. When compiling an OpenMP program with the Intel Fortran compiler, the -fpp, -openmp and -parallel switches are needed:


        ifort -fpp -openmp -parallel prog.f
      

If the program was written in C, then the Intel C compiler would be used. This compiler is named icc. A C program has a file extension of .c. When compiling an OpenMP program, the -openmp and -parallel switches are needed:


        icc -openmp -parallel prog.c
      

If the program was written in C++, then the Intel C++ compiler would be used. This compiler is named icpc. A C++ program has a file extension of .cc, .cpp, .cxx or .C. When compiling an OpenMP program, the -openmp and -parallel switches are needed:


        icpc -openmp -parallel prog.C
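
The choice of compiler by file extension can be summarized in a few lines of shell. This is only a sketch of the rules stated above; the function name compile_command is hypothetical, and it merely prints the command rather than running it.

```shell
# Hypothetical helper: print the compile command this document gives
# for a source file, chosen by its file extension.
compile_command() {
    case "$1" in
        *.f90|*.f)            echo "ifort -fpp -openmp -parallel $1" ;;
        *.c)                  echo "icc -openmp -parallel $1" ;;
        *.cc|*.cpp|*.cxx|*.C) echo "icpc -openmp -parallel $1" ;;
        *)                    echo "unrecognized extension: $1" >&2
                              return 1 ;;
    esac
}

# Print the commands for a Fortran90 file and a C++ file.
compile_command prog.f90
compile_command prog.C
```

Note that the patterns are case sensitive, so prog.c is treated as C while prog.C is treated as C++.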
      

Submitting a Script to Run the Executable

Once the executable program has been created, you need a shell script to run the program in parallel. This script specifies the number of processors to be used, the time limit, and so on.

Here is a simplified shell script for our example, which we will call prog.sh.


#!/bin/bash
##
##  Account information:
##
#PBS -W group_list=YOURGROUPNAME

##
##  The queue you want.
##
#PBS -q inferno2_q

##
##  Maximum wallclock time (Hours:Minutes:Seconds)
##
#PBS -lwalltime=00:02:00

##
##  The number of processors requested.
##
#PBS -lncpus=4

##
##  Define the number of threads (should equal NCPUS above!)
##
export OMP_NUM_THREADS=4 

##
##  Move to the directory from which the script was submitted.
##

cd $PBS_O_WORKDIR 

##
##  The name of the program to run.
##
./prog > prog_output.txt

exit;
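
Because the script above states the processor count twice (once as ncpus and once as OMP_NUM_THREADS), it is easy to change one and forget the other. The following is a small sketch of a consistency check; the function name threads_match is hypothetical, and it only recognizes the exact "#PBS -lncpus=N" and "export OMP_NUM_THREADS=N" forms used in this document.

```shell
# Hypothetical helper: read a job script on standard input and succeed
# only if the ncpus request equals the OMP_NUM_THREADS setting.
threads_match() {
    local script ncpus threads
    script=$(cat)
    ncpus=$(printf '%s\n' "$script" |
        sed -n 's/^#PBS -lncpus=\([0-9][0-9]*\).*/\1/p' | head -n 1)
    threads=$(printf '%s\n' "$script" |
        sed -n 's/^export OMP_NUM_THREADS=\([0-9][0-9]*\).*/\1/p' | head -n 1)
    [ -n "$ncpus" ] && [ "$ncpus" = "$threads" ]
}

# Demonstration on the two relevant lines of the example script.
threads_match <<'EOF' && echo "ncpus and OMP_NUM_THREADS agree"
#PBS -lncpus=4
export OMP_NUM_THREADS=4
EOF

# With a real file:  threads_match < prog.sh
```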
     

Replace the "YOURGROUPNAME" field in prog.sh with your group name. To find your group, log in to one of the SGI systems and type


        groups
      
Ignore the "staff" group in the output; use the other group that is listed as the value of the "YOURGROUPNAME" field in the shell script.
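
Picking out the non-staff group can be automated. The sketch below assumes that groups prints a single line of space-separated names; the function name pick_group and the group name mygroup are made up for illustration.

```shell
# Hypothetical helper: given the output of "groups" on standard input,
# print the first group name that is not "staff".
pick_group() {
    tr ' ' '\n' | grep -v '^staff$' | grep -v '^$' | head -n 1
}

# On an SGI system this would be:  groups | pick_group
# Here a made-up example line is used instead.
echo "staff mygroup" | pick_group
```

If you belong to more than one group besides staff, this simply takes the first one listed, which may not be the one you want.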

We'll assume that the shell script prog.sh is stored in work_directory, the subdirectory which contains prog.f90 and the executable prog. To run the job, we must "submit" the shell script to the queueing system. To do this, we move to the subdirectory containing the job script and the executable (which we're assuming is work_directory), and issue the command:


        qsub prog.sh
     

The qsub command asks the queueing system to schedule your job to run. The immediate response from the queueing system is a message that assigns a job number. The job number can be used to check on the progress of your job, and it will also be used as part of the name of the log files created when your job is done.

For example, the response to your qsub command might be


        40316.queue.tcf-int.vt.edu
      
in which case your job number is 40316.
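
If you want to capture the job number in a script, it is the part of the response before the first period. A minimal sketch follows; the function name job_number is hypothetical.

```shell
# Hypothetical helper: strip everything from the first "." onward.
job_number() {
    echo "${1%%.*}"
}

# Apply it to the sample response shown above.
job_number "40316.queue.tcf-int.vt.edu"

# In practice the response would come straight from qsub, for example:
#   response=$(qsub prog.sh)
#   job_number "$response"
```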

Our example job is small (only 30 seconds on 4 processors) and should run quickly, but you can always check the status of all the jobs you have in the queue by issuing the command


        showq | grep YOUR_NAME
      
which might show you

        JOBID    USERNAME   STATE  PROCS  REMAINING            STARTTIME

        40316   your_name    Idle      4   00:00:30  Mon Oct 15 14:06:00
      

You can also use the convenient command


        qstat -u YOUR_NAME
      
whose output format is a little different

                                                                           Req'd  Req'd   Elap
        Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
        -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - ----
        40316.queue.tcf-int. YOUR_NAM producti program      --      2    1   --   00:00 Q --
      
This command gives you information about the number of nodes requested, the amount of time and memory requested and so on. The "S" (for "status") field lists a value of "Q", which means the job has been queued, but has not started to run. (Note that the output under each heading is truncated if it is long.)
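
The status letter can be pulled out of this listing with awk. The sketch below assumes the column layout shown above, where "S" is the tenth whitespace-separated field; the function name job_state is hypothetical, and a different qstat version may arrange its columns differently.

```shell
# Hypothetical helper: read "qstat -u" output on standard input and print
# the status letter of the job whose ID begins with the given job number.
job_state() {
    awk -v id="$1" 'index($1, id ".") == 1 { print $10 }'
}

# Demonstration on the sample line from this document.
sample='40316.queue.tcf-int. YOUR_NAM producti program      --      2    1   --   00:00 Q --'
printf '%s\n' "$sample" | job_state 40316

# With the real command:  qstat -u YOUR_NAME | job_state 40316
```

The header lines are skipped automatically, because their first field never begins with the job number followed by a period.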

Retrieving the Output File

Once the program has finished execution, we may want to retrieve the output file, and possibly other files that the program created.

Doing this is essentially the "inverse" of the process we went through in copying the source code program to inferno2. So one way to do this would begin by connecting to inferno2 using the sftp program:


        home_mac: sftp inferno2.arc.vt.edu

        inferno2: Password for user: xxxxx
        inferno2: cd work_directory
        inferno2: lcd source_code
        inferno2: get prog_output.txt
        inferno2: lls
        inferno2: prog.f90  prog_output.txt
        inferno2: quit

        home_mac:
     


Sample Files for Experimentation

Sample files are available, so that you can try out the procedures for file transfer, compilation, job submission, and output file recovery.

  1. Copy the appropriate source code file (choose your favorite language) to your home machine.
  2. Copy the BASH shell script prime_sum.sh to your home machine. The first step in this script is to compile the source code program. Change this line, depending on which language you want to use. The current line uses the C++ source code and compiler.
  3. Transfer the source code file and the shell script to your directory on inferno, inferno2 or cauldron.
  4. Submit the shell script by typing qsub prime_sum.sh.
  5. Retrieve the output file prime_sum_output.txt to your home machine and compare it to the sample results.



Last revised on 26 January 2009