This document describes the "mechanics" of running an OpenMP program on one of Virginia Tech's SGI systems. It assumes that you already have an OpenMP program written, and that you have an account on the SGI systems.
This document walks you through a typical series of steps: starting with an OpenMP program on your "home" machine, transferring it to an SGI system, compiling it, running it, and bringing the results back home.
At the end of this document are instructions on how to actually carry out these steps, using sample files available on the web.
For this simple introduction, we'll make a number of assumptions.
(One file, simple name): We'll assume you have a program, already written, which uses OpenMP, that the program consists of a single file, written in Fortran90, and that this file is called prog.f90.
(No input/Only standard output): We'll also assume for now that the program needs no input, and that the output of the program is entirely directed to the standard output device. In other words, the executing program does not read from or write to any auxiliary files.
(Source code on home machine): We'll assume this source code file is sitting in the source_code subdirectory on your home machine home_mac.
Our goal then is to transfer the file to one of the SGI machines, compile it, run it, and retrieve the output.
There are currently three SGI systems available at Virginia Tech through the Advanced Research Computing Facility:
| Name | Address | Total Processors | User Limit |
|---|---|---|---|
| inferno | inferno.arc.vt.edu | 20 CPUs | 2 CPUs |
| inferno2 | inferno2.arc.vt.edu | 128 CPUs | 10 CPUs |
| cauldron | cauldron.arc.vt.edu | 64 CPUs | 6 CPUs |
To compile your program, you will need to transfer the source code of your OpenMP program to one of these systems. This can be done with the secure FTP program sftp. Here is a typical session, which suggests how you might transfer the file to inferno2. We are assuming here that you have already set up a subdirectory on inferno2 called work_directory.
home_mac: sftp inferno2.arc.vt.edu
inferno2: Password for user: xxxxx
inferno2: cd work_directory
inferno2: lcd source_code
inferno2: put prog.f90
inferno2: ls
inferno2: prog.f90
inferno2: quit
home_mac:
Note that the commands cd, pwd and ls are carried out on the remote machine (inferno2 in this case) while the corresponding commands lcd, lpwd and lls will be carried out on the local machine (home_mac in this example). The put command moves files from the local to the remote machine, while the get command brings files from the remote machine to the local one. If multiple files are to be transferred, the mget and mput commands can be used instead.
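The same transfer can also be scripted rather than typed. The following is a minimal sketch, assuming the installed sftp is the OpenSSH version, which accepts a -b option naming a file of commands. The file name upload.batch is a hypothetical choice. Batch mode is not interactive, so it is normally paired with a login method that does not prompt for a password; for that reason the actual sftp call is left as a comment here.

```shell
# Write the sftp commands from the interactive session into a batch file.
# The name upload.batch is hypothetical; any file name would do.
cat > upload.batch <<'EOF'
cd work_directory
lcd source_code
put prog.f90
ls
quit
EOF

# Show the commands that would be sent to sftp.
cat upload.batch

# To carry out the transfer (assumes OpenSSH sftp and a login that
# does not need a typed password):
#   sftp -b upload.batch inferno2.arc.vt.edu
```

If the batch approach does not suit your setup, the interactive session shown above does the same job.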
Once the source code file has been transferred to the SGI system, you can log in and compile the program.
To log in interactively, we use the Secure Shell program, ssh.
home_mac: ssh inferno2.arc.vt.edu
inferno2: Password for user: xxxxx
inferno2: cd work_directory
inferno2: ifort -fpp -openmp -parallel prog.f90
inferno2: mv a.out prog
The Fortran compiler being used is ifort, the Intel Fortran compiler. The compiler assumes the program is written in Fortran90 based on its file extension of .f90. The switches -fpp, -openmp and -parallel are necessary in order that the OpenMP directives be processed correctly.
If the compilation fails, you will need to revise your program. You can either edit the program on your home machine and transfer it again, or make the changes directly on the copy on inferno2.
In our example, we assume the compilation was successful. We allowed the compiler to assign the default name of a.out to the executable program it created, and then we renamed it to prog. We're now ready to submit the program for execution, so we're staying logged in.
If the program was written in Fortran77, then the file extension should be simply .f. When compiling an OpenMP program with the Intel Fortran compiler, the -fpp, -openmp and -parallel switches are needed:
ifort -fpp -openmp -parallel prog.f
If the program was written in C, then the Intel C compiler would be used. This compiler is named icc. A C program has a file extension of .c. When compiling an OpenMP program, the -openmp and -parallel switches are needed:
icc -openmp -parallel prog.c
If the program was written in C++, then the Intel C++ compiler would be used. This compiler is named icpc. A C++ program has a file extension of .cc, .cpp, .cxx or .C. When compiling an OpenMP program, the -openmp and -parallel switches are needed:
icpc -openmp -parallel prog.C
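The choice of compiler by file extension can be summarized in a small shell function. This is only a sketch: compile_cmd is a hypothetical helper name, and it prints the compile command rather than running it, using just the compilers and switches listed above.

```shell
# Print the compile command this document gives for a source file.
# compile_cmd is a hypothetical helper, not a command on the SGI systems.
compile_cmd() {
    case "$1" in
        *.f90|*.f)
            echo "ifort -fpp -openmp -parallel $1" ;;
        *.c)
            echo "icc -openmp -parallel $1" ;;
        *.cc|*.cpp|*.cxx|*.C)
            echo "icpc -openmp -parallel $1" ;;
        *)
            echo "unrecognized extension: $1" >&2
            return 1 ;;
    esac
}

compile_cmd prog.f90
compile_cmd prog.c
compile_cmd prog.C
```

Note that the pattern match is case sensitive, so prog.c is treated as C and prog.C as C++.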
Once the executable program has been created, you need a shell script to run the program in parallel. This script specifies the number of processors to be used, the time limit, and so on.
Here is a simplified shell script for our example, which we will call prog.sh.
#!/bin/bash
##
## Account information:
##
#PBS -W group_list=YOURGROUPNAME
##
## The queue you want.
##
#PBS -q inferno2_q
##
## Maximum wallclock time (Hours:Minutes:Seconds)
##
#PBS -lwalltime=00:02:00
##
## The number of processors requested.
##
#PBS -lncpus=4
##
## Define the number of threads (should equal NCPUS above!)
##
export OMP_NUM_THREADS=4
##
## The job starts in the same directory from which the script was submitted.
##
cd $PBS_O_WORKDIR
##
## The name of the program to run.
##
./prog > prog_output.txt
exit;
Replace the "YOURGROUPNAME" field in this file with your group name. To find your group, log into one of the SGI compile nodes and type
groups
Ignore the "staff" group in the output; use the other group that is listed as the value of the "YOURGROUPNAME" field in the shell script.
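Picking the group name out of the list can be sketched as follows. The helper name pick_group and the group name mygroup are made up for illustration; the function assumes it is handed a space-separated list of names, such as the output of the groups command.

```shell
# Print the first group name in the list that is not "staff".
# pick_group is a hypothetical helper; $1 is a space-separated list.
pick_group() {
    for g in $1; do
        if [ "$g" != "staff" ]; then
            echo "$g"
            return 0
        fi
    done
    # No group other than "staff" was found.
    return 1
}

# Example with a made-up list. On an SGI compile node one would use:
#   pick_group "$(groups)"
pick_group "staff mygroup"
```

If you belong to several groups besides "staff", this sketch simply reports the first one, so check that it is the one you intend to use.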
We'll assume that the shell script prog.sh is stored in work_directory, the subdirectory which contains prog.f90 and the executable prog. To run the job, we must "submit" the shell script to the queueing system. To do this, we move to the subdirectory containing the job script and the executable (which we're assuming is work_directory), and issue the command:
qsub prog.sh
The qsub command asks the queueing system to schedule your job to run. The immediate response from the queueing system is a message that assigns a job number. The job number can be used to check on the progress of your job, and it will also be used as part of the name of the log files created when your job is done.
For example, the response to your qsub command might be
40316.queue.tcf-int.vt.edu
in which case your job number is 40316.
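If you want the job number in a shell variable, it can be cut out of the response. This sketch uses the sample response from the text; the variable names are our own choice, and a real session would capture the output of qsub instead.

```shell
# Sample response copied from the text. In a real session one would use:
#   response=$(qsub prog.sh)
response="40316.queue.tcf-int.vt.edu"

# Remove everything from the first period onward, leaving the job number.
job_number=${response%%.*}
echo "$job_number"
```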
Although our example job is small (only 30 seconds on 4 processors) and should run quickly, it is always possible to check on the status of all the jobs you have in the queue, by issuing the command
showq | grep YOUR_NAME
which might show you
JOBID USERNAME STATE PROCS REMAINING STARTTIME
40316 your_name Idle 4 00:00:30 Mon Oct 15 14:06:00
You can also use the convenient command
qstat -u YOUR_NAME
whose output format is a little different
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - ----
40316.queue.tcf-int. YOUR_NAM producti program -- 2 1 -- 00:00 Q --
This command gives you information about the number of nodes requested, the amount of time and memory requested, and so on. The "S" (for "status") field lists a value of "Q", which means the job has been queued but has not started to run. (Note that the output under each heading is truncated if it is long.)
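The status letter can be picked out of a qstat job line with awk. This is a sketch that assumes the column layout shown above, where the "S" field is the next-to-last column; job_state is a hypothetical helper name, and the line used is the sample line from the text.

```shell
# Print the "S" (status) column of one qstat job line.
# job_state is a hypothetical helper; it assumes the status is the
# next-to-last whitespace-separated field, as in the sample output.
job_state() {
    echo "$1" | awk '{ print $(NF-1) }'
}

# Sample job line copied from the text. In a real session the line
# would come from the output of: qstat -u YOUR_NAME
line="40316.queue.tcf-int. YOUR_NAM producti program -- 2 1 -- 00:00 Q --"
job_state "$line"
```

If the layout of qstat output on your system differs, the field position would need to be adjusted.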
Once the program has finished execution, we may want to retrieve the output file, and possibly other files that the program created.
Doing this is essentially the "inverse" of the process we went through in copying the source code to inferno2. One way to do this begins by connecting to inferno2 using the sftp program:
home_mac: sftp inferno2.arc.vt.edu
inferno2: Password for user: xxxxx
inferno2: cd work_directory
inferno2: lcd source_code
inferno2: get prog_output.txt
inferno2: lls
inferno2: prog.f90 prog_output.txt
inferno2: quit
home_mac:
Sample files are available, so that you can try out the procedures for file transfer, compilation, job submission, and output file recovery.