pbs_psc


pbs_psc, examples which illustrate the use of the Portable Batch System (PBS), a scheduler which controls the submission and execution of jobs on the Pittsburgh Supercomputing Center (PSC) computer clusters.

A user typically logs into a special login node of the cluster, which is intended only for editing, file management, job submission, and other small interactive tasks.

The user wishes to run a parallel program on several processors of the cluster. To do so, the user must create an executable version of the program, write a suitable PBS batch job script that describes the job limits and lists the commands to be executed, and then submit the script for processing using the qsub command.

The job script can be thought of as consisting of two parts: a set of directives addressed to the PBS scheduler, and the sequence of UNIX commands that actually carry out the job.

The user faces several separate issues when preparing a first job script: logging into the cluster, transferring files to and from it, writing the script itself, and submitting it to the batch queue.

To log into the system, use ssh and the address of the component to which you have been assigned. For instance, you might log into the PSC Greenfield system by inserting your username in the following command:

        ssh USERNAME@greenfield.psc.xsede.org
      

To transfer files, you need the sftp command. I do this in a second window, so that I have interactive access in my ssh window, while file transfers occur in the sftp window. The command that makes the connection for me is:

        sftp USERNAME@greenfield.psc.xsede.org
      
and I put files from my local system to the PSC by a command like
        put fred.txt
      
and get files from the PSC back to the local system by
        get jeff.txt
      

The main reason for using a cluster is to be able to compute in batch mode - one or many jobs, submitted to a queue, to run "eventually". You can log out after you submit jobs, and log in later at your convenience to collect the output from completed jobs. Parallel programs can be run on multiple processors this way. Matlab programs, whether parallel or sequential, can also be submitted to the batch queue.

The commands that make a job run in the batch queue form a PBS script. The first part of the script contains commands to the job scheduler. These commands begin with the string "#PBS" and specify the maximum time limit, the number of processes, the particular queue you will use, and so on. Most batch jobs should go into the "batch" queue.

The PBS command "#PBS -l nodes=1:ppn=15" requests just 1 node, and all 15 cores on that node. On the PSC Greenfield machine, the number of cores per node should always be specified as 15; in fact, your job won't run with any other choice.

After the PBS commands come a sequence of commands that you might imagine typing in interactively; that is, these might be the normal sequence of UNIX commands you would issue to run a particular job.
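As a sketch, a complete script of this form can be written out and inspected as follows. The nodes=1:ppn=15 line and the "batch" queue are as described above; the walltime value and the hello.c compile-and-run commands are illustrative assumptions, not requirements:

```shell
# Write a minimal PBS script, fred.sh, for the Greenfield "batch" queue.
# The scheduler directives come first; ordinary UNIX commands follow.
cat > fred.sh << 'EOF'
#!/bin/bash
#PBS -l nodes=1:ppn=15
#PBS -l walltime=00:10:00
#PBS -q batch
# Move to the directory from which the job was submitted.
cd $PBS_O_WORKDIR
# The ordinary commands for this (hypothetical) job:
gcc hello.c -o hello
./hello
EOF

# Display the finished script.
cat fred.sh
```

Once the file exists, it is submitted with qsub exactly as described below; the #PBS lines are comments to the shell, so they are ignored if the script is ever run interactively.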

Briefly, if you have a PBS script file called fred.sh you can submit it to the queue by a command like

        qsub fred.sh
      
You can check to see the status of your job by the command
        qstat
      
or
        qstat -u USERNAME
      
When the job is completed, you should find an output file in your directory, typically named after the script and the numeric job id, containing the output, or else the error messages that explain why you didn't actually get any output.

Licensing:

The information on this web page is distributed under the MIT license.

Related Data and Programs:

cplex_slurm_arc, examples which use the slurm() job scheduler to submit a cplex() job to Virginia Tech's Advanced Research Computing (ARC) computer cluster.

mpi, a C code which uses the MPI application program interface for carrying out parallel computations in a distributed memory environment.

openmp, a Fortran90 code which uses the OpenMP application program interface for carrying out parallel computations in a shared memory environment.

slurm_h2p, examples which demonstrate the use of the SLURM batch job scheduler for the h2p computer cluster, as administered by the Center for Research Computing (CRC) at the University of Pittsburgh.

slurm_rcc, examples which use slurm, which is a job scheduler for batch execution of jobs on the FSU Research Computing Center (RCC) computer cluster.

Source Code:

ENVIRON is a batch job script that simply queries the values of certain environment variables, in particular PBS_O_WORKDIR, which can be useful when trying to set up a program to run under the batch system.

FENICS_BVP_01 is a batch job script that invokes the FENICS program with the input file bvp_01.py, and an auxiliary program timestamp.py.

FENICS_POISSON is a batch job script that invokes the FENICS program with the input file poisson.py, and an auxiliary program timestamp.py.

FREEFEM_MPI is a batch job script that runs the parallel version of FreeFem++ with the input file schwarz_mpi.edp.

FREEFEM_SEQUENTIAL is a batch job script that runs the sequential version of FreeFem++ with the input file membrane.edp.

HELLO is a batch job script that compiles and runs a "hello" program.

HELLO_MPI illustrates the compilation and execution of a program that includes MPI directives.

HELLO_OPENMP illustrates the compilation and execution of a program that includes OpenMP directives.

MATLAB_PARALLEL runs MATLAB with a parfor command for parallel execution.

POWER_TABLE shows how a MATLAB program can be run through the batch system. We prepare a file of input commands, and invoke MATLAB on the command line.


Last revised on 30 September 2024.