MATLAB_CONDOR
Running MATLAB Under the CONDOR Batch Queueing System


MATLAB_CONDOR is a directory of examples which demonstrate how a MATLAB program can be submitted to the CONDOR batch queueing system.

CONDOR allows a user to submit jobs for batch execution on an informal cluster composed of various computers that often have idle time. Based on information from the user's submission file, CONDOR chooses one or more appropriate and available computers, transfers files to the target systems, executes the program, and returns data to the user.

CONDOR has many features, and its proper use varies from site to site. The information in this document was inspired by the CONDOR system supported by the FSU Research Computing Center (RCC). Some of the information therefore is peculiar to this local installation.

The first thing to note is that executing MATLAB is done indirectly. The user has a MATLAB program or script to run, of course. Let's say the main user script is called "program.m". In order for this script to be run through CONDOR, we need to write a BASH shell script that "knows" where MATLAB is stored, knows how to invoke MATLAB for a noninteractive job, and knows the name of the user script. Such a shell script might be called "program_run.sh", and look like this:

        #!/bin/bash
        /opt/matlab/current/bin/matlab -nosplash -nodesktop -nojvm -r "run('./program.m'); quit"
      

Finally, the user must write a CONDOR script that copies necessary files to an unknown machine, executes the shell script, which executes MATLAB, which executes the user's MATLAB commands, and then copies the output files back. The script might be called "program.condor".

The user must then log into the CONDOR submit node interactively:

        ssh condor-login.rcc.fsu.edu
      
and, if necessary, transfer the CONDOR script, the BASH script, and the MATLAB files to this node using SFTP, and then submit the CONDOR script with a command like:
        condor_submit program.condor
      
The user can check on the status of the job with the command
        condor_q
      
If all goes well, the job output will be returned to the CONDOR submit node. However, if things do not go well, or the job is taking too much time, user "username" can delete all of that user's jobs in the CONDOR queue with the command
        condor_rm username
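
The submit-and-check steps above can be collected into a small helper. This is only a sketch, meant to be run on the CONDOR submit node; the function names and the DRY_RUN switch are hypothetical conveniences, not part of CONDOR. With DRY_RUN set to 1 the commands are printed instead of executed, which is handy for checking the script on a machine without CONDOR.

```shell
#!/bin/bash
# Sketch: submit a CONDOR script and then show the queue.
# DRY_RUN is a hypothetical switch (not a CONDOR feature):
# when it is 1, commands are printed rather than run.

run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

submit_and_check() {
  condor_file="$1"
  # Refuse to submit a file that does not exist (skipped in a dry run).
  if [ "${DRY_RUN:-0}" != "1" ] && [ ! -f "$condor_file" ]; then
    echo "missing CONDOR script: $condor_file" >&2
    return 1
  fi
  run_cmd condor_submit "$condor_file" || return 1
  run_cmd condor_q
}

# Example: DRY_RUN=1 submit_and_check program.condor
```
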
      

Using Files:

On the FSU RCC CONDOR cluster, you must first copy your files to the CONDOR login machine. When you submit your job to the CONDOR queue, however, the program execution will take place on some unknown machine, which initially has none of your files, and may not even have the executable program you want to use, unless it is a program such as MATLAB that is already installed there. Therefore, an important part of using CONDOR is making sure that all the files needed for input are copied to the remote machine, that the remote machine either already has the executable or is sent a copy, and that all your output files are copied back.

Because the file system is not shared, the following commands should appear in your CONDOR script:

        should_transfer_files = yes
        when_to_transfer_output = on_exit
      

If your executable reads from "standard input", then your CONDOR job will need a file containing that information. CONDOR includes a command of the form

        input = filename
      
that allows you to specify the name of this file. Similarly, if your program writes to "standard output", CONDOR allows you to specify the name of a file where this information will go:
        output = filename
      
and if your program writes to the "standard error" device, you can specify this with
        error = filename
      
The input file must exist on your CONDOR login node before you submit the job. The output and error files are created during the run, and will automatically be copied back to your CONDOR login node when the job is completed.
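
Once the job has finished, a quick look at the returned files shows whether anything went to standard error. The following is a minimal sketch; the function name and its messages are hypothetical, and the file names are whatever was given on the "output" and "error" lines of the CONDOR script.

```shell
#!/bin/bash
# Sketch: inspect the standard output and standard error files
# that CONDOR copied back to the login node.
report_job_files() {
  out_file="$1"
  err_file="$2"
  if [ ! -f "$out_file" ]; then
    echo "no output file yet: $out_file"
    return 1
  fi
  if [ -s "$err_file" ]; then
    # -s is true when the file exists and is not empty.
    echo "job wrote to standard error; see $err_file"
    return 2
  fi
  echo "output is in $out_file; standard error is empty"
}

# Example: report_job_files output.txt error.txt
```
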

Your job may require many more files to run than simply the standard input file. In particular, a MATLAB job will usually need one or more M files. You need to tell CONDOR the names of these files, in a comma-separated list:

        transfer_input_files = file1, file2, ..., file99
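
When a job needs many M files, the list can be generated rather than typed. The sketch below prints a "transfer_input_files" line naming every M file in the current directory; the function name is hypothetical, and the approach assumes file names without commas or spaces.

```shell
#!/bin/bash
# Sketch: print a transfer_input_files line listing every .m file
# in the current directory, separated by commas.
transfer_line_for_m_files() {
  list=""
  for f in *.m; do
    # If no M files exist, the pattern stays unexpanded; skip it.
    [ -e "$f" ] || continue
    list="${list:+$list, }$f"
  done
  printf 'transfer_input_files = %s\n' "$list"
}

# Example: transfer_line_for_m_files >> my_prog.condor
```
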
      

Your job may create many files aside from simply standard output. Luckily, all files that the run creates in its working directory on the remote machine will be automatically copied back.

We happen to know that MATLAB is installed on certain CONDOR nodes. To guarantee that CONDOR sends our job to such a node, we use a command like the following:

        requirements = ( OpSys == "LINUX" && Arch == "X86_64" && Matlab == "true" )
      

To run a MATLAB job on the remote machine, we have to use a special form of the MATLAB command that specifies where the program is, how it is to be run, and what M file it is to execute. This is done by writing a short BASH shell script. If our M file is called "my_prog.m", then the script could be called "run_my_prog.sh", and could look like this:

        #!/bin/bash
        /opt/matlab/current/bin/matlab -nosplash -nodesktop -nojvm -r "run('./my_prog.m'); quit"
      
Essentially, CONDOR will treat this shell script as your "executable", so your CONDOR script must include the statement:
        executable = run_my_prog.sh
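
One thing the short script above does not handle is an error inside the M file: the trailing "quit" is then never reached, and depending on how standard input is set up MATLAB may not end cleanly or report a failure status. A possible variation, shown here only as a sketch, wraps the call in a MATLAB try/catch so that the error is printed and MATLAB exits with status 1. The function names are hypothetical; the MATLAB path is the one used above and is specific to the local installation.

```shell
#!/bin/bash
# Sketch: a run script that makes MATLAB exit even if the M file fails.

# MATLAB location from the example above (site-specific assumption).
MATLAB=/opt/matlab/current/bin/matlab

# Build the text passed to MATLAB's -r option for a given M file.
# On error: print the report and exit with status 1; otherwise exit 0.
matlab_r_arg() {
  printf "try, run('./%s'); catch err, disp(getReport(err)); exit(1); end; exit(0);" "$1"
}

run_matlab_script() {
  if [ ! -x "$MATLAB" ]; then
    echo "MATLAB not found at $MATLAB" >&2
    return 127
  fi
  "$MATLAB" -nosplash -nodesktop -nojvm -r "$(matlab_r_arg "$1")"
}

# In run_my_prog.sh the last line would be:
#   run_matlab_script my_prog.m
```
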
      

A Sample CONDOR Script for MATLAB

Here is a file called "my_prog.condor":

        universe = vanilla
        executable = run_my_prog.sh
        arguments =
        input =
        requirements = ( OpSys == "LINUX" && Arch == "X86_64" && Matlab == "true" )
        should_transfer_files = yes
        transfer_input_files = my_prog.m
        when_to_transfer_output = on_exit
        notification = never
        output = output.txt
        log = log.txt
        error = error.txt
        queue
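
Since the run script and the CONDOR script differ from job to job only in the name of the M file, both can be generated. The following sketch writes "run_NAME.sh" and "NAME.condor" for an M file "NAME.m"; the function name is hypothetical, the MATLAB path and the "Matlab" requirement are taken from this document and are specific to the local installation, and the requirements line is written with "==" comparisons and balanced quotes.

```shell
#!/bin/bash
# Sketch: generate the run script and CONDOR script for NAME.m.
make_condor_files() {
  name="$1"

  # The shell script that CONDOR treats as the executable.
  cat > "run_${name}.sh" <<EOF
#!/bin/bash
/opt/matlab/current/bin/matlab -nosplash -nodesktop -nojvm -r "run('./${name}.m'); quit"
EOF
  chmod +x "run_${name}.sh"

  # The CONDOR script, modeled on the sample above.
  cat > "${name}.condor" <<EOF
universe = vanilla
executable = run_${name}.sh
requirements = ( OpSys == "LINUX" && Arch == "X86_64" && Matlab == "true" )
should_transfer_files = yes
transfer_input_files = ${name}.m
when_to_transfer_output = on_exit
notification = never
output = output.txt
log = log.txt
error = error.txt
queue
EOF
}

# Example: make_condor_files my_prog
```
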
      

A few comments are in order.

Licensing:

The computer code and data files made available on this web page are distributed under the GNU LGPL license.

Languages:

MATLAB_CONDOR is available in a C version, a C++ version, a FORTRAN77 version, a FORTRAN90 version, and a MATLAB version.

Related Data and Programs:

C_CONDOR, C programs which illustrate how a C program can be run in batch mode using the condor queueing system.

C++_CONDOR, C++ programs which illustrate how a C++ program can be run in batch mode using the condor queueing system.

CONDOR, examples which demonstrate the use of the CONDOR queueing system to submit jobs that run on one or more remote machines.

F77_CONDOR, FORTRAN77 programs which illustrate how a FORTRAN77 program can be run in batch mode using the condor queueing system.

F90_CONDOR, FORTRAN90 programs which illustrate how a FORTRAN90 program can be run in batch mode using the condor queueing system.

MATLAB_COMMANDLINE, programs which illustrate how MATLAB can be run from the UNIX command line, that is, not with the usual MATLAB command window.

MATLAB_COMPILER, MATLAB programs which illustrate the use of the Matlab compiler, which allows you to run a Matlab application outside the Matlab environment.

Reference:

  1. condor.pdf,
    Condor Team,
    University of Wisconsin, Madison,
    Condor Version 8.0.2 Manual;
  2. http://www.cs.wisc.edu/htcondor/,
    The HTCondor home page;

Examples and Tests:

SIMPLE is a simple example, in which a MATLAB function is to be called with certain input.

PRIMES is an example which tries to count the prime numbers from 1 to some power of 10.



Last modified on 28 August 2013.