The BLACS, or "Basic Linear Algebra Communication Subprograms", form a linear algebra-oriented message passing interface that may be implemented efficiently and uniformly across a large range of distributed memory platforms.
The length of time required to implement efficient distributed memory algorithms makes it impractical to rewrite programs for every new parallel machine. The BLACS exist in order to make linear algebra applications both easier to program and more portable. It is for this reason that the BLACS are used as the communication layer of the distributed memory linear algebra package SCALAPACK, for instance.
MPI is one example of a distributed memory system. A program written at the BLACS level can run on under MPI. The same program should run correctly on systems that use other distributed memory systems. The key is that on each system, the installation of the BLACS library takes into account the interfact between the standard BLACS routines and the local distributed memory system.
BLACS is a FORTRAN example which demonstrates the use of the BLACS.
MPI is a FORTRAN90 example directory which demonstrates the use of MPI for distributed memory computing.
SCALAPACK is a FORTRAN90 example which demonstrates the use of SCALAPACK.
One of the main strengths of the BLACS is that code which uses the BLACS for its communication layer can run unchanged on any supported platform. There are various packages designed to provide a message passing interface that remains unchanged on several platforms, including PICL, and more recently, MPI. These packages are not available on all of the platforms that we wish to use. More importantly, they are attempts at general libraries, and are thus somewhat harder to use than a more restricted code.
The BLACS are written specifically for linear algebra programming. Since the audience of the BLACS is known, the interface and methods of using the routines can be simpler than for those of more general message passing layers.
The BLACS have been written on top of the following message passing layers:
- CMMD
- Allows the BLACS to run on Thinking Machine's CM-5.
- MPI
- Allows the BLACS to run across most parallel platforms.
- MPL
- Allows the BLACS to run on IBM's eServer pSeries 690 system.
- NX
- Allows the BLACS to run on Intel's supercomputer series (iPSC2, iPSC/860, DELTA and PARAGON).
- PVM
- Allows the BLACS to run anywhere PVM is supported, which includes most UNIX systems.
The processes of a parallel machine with P processes are often presented to the user as a linear array of process IDs, labeled 0 through (P - 1). For reasons described below, it is often more convenient to map this 1-D array of P processes into a logical two dimensional process mesh, or grid. This grid will have R process rows and C process columns, where R * C = G <= P. A process can now be referenced by its coordinates within the grid (indicated by the notation {i, j}, where 0 <= i < R, and 0 <= j < C), rather than a single number. An example of such a mapping is shown below.
A diagram of 8 processes mapped to a 2 by 4 process grid:
0 1 2 3 +----------- 0 | 0 1 2 3 1 | 4 5 6 7
An operation which involves more than just a sender and a receiver is called a scoped operation. All processes that participate in a scoped operation are said to be within the operation's scope.
On a system using a linear array of processes, the only natural scope is all processes. Using a 2-D grid, we have 3 natural scopes, as shown in the following table.
SCOPE MEANING ------ ---------------------------------------------- Row All processes in a process row participate. Column All processes in a process column participate. All All processes in the process grid participate.
These groupings of processes are of particular interest to the linear algebra programmer, since distributed data decompositions of a 2D array (a linear algebra matrix) tend to follow this process mapping. For instance, all of a distributed matrix row can be found on a process row, and so on.
Viewing the rows/columns of the process grid as essentially autonomous subsystems provides the programmer with additional levels of parallelism. Of course, how independent these rows and columns actually are will depend upon the underlying machine. For instance, if the grid's processors are connected via ethernet, we can see that the only gain will be in ease of programming. Speed is unlikely to increase, since if one processor is communicating, no others can. If this is the case, process rows or columns will not be able to perform different distributed tasks at the same time. Fortunately, most modern supercomputer interconnection networks are at least as rich as a 2D grid, so that these additional levels of parallelism can be exploited.
In the BLACS, each process grid is enclosed in a context. A context may be thought of as a message passing universe. This means that a grid can safely communicate even if other (possibly overlapping) grids are also communicating.
In most respects, we can use the terms "grid" and "context" interchangeably. Thus, we may say "perform operation in context X" or "in grid X". The slight difference here is that the user may define two exactly identical grids (say, two 1x3 process grids, both of which use processes 0, 1, and 2), but each will be wrapped in its own context, so that they are distinct in operation, even though they are indistinguishable from a process grid standpoint.
Contexts are used so that individual routines using the BLACS can, when required, safely operate without worrying if other distributed codes are being executed on the same machine.
Another example of the use of context might be to define a normal 2D process grid about which most computation takes place. However, at certain sections it may be more convenient to access the processes as a 1D grid, and at certain others we may wish, for instance, to share information among nearest neighbors. We will therefore want each process to have access to three contexts:
Therefore, we see that context allows us to:
In the BLACS, there are two grid creation routines, BLACS_GRIDINIT and BLACS_GRIDMAP, which create a process grid and its enclosing context. These routines return context handles, which are simple integers. Subsequent BLACS routines will be passed these handles, which allow the BLACS to determine what context or grid a routine is being called from. The user should never actually manipulate these handles: they are opaque data objects which are only meaningful for the BLACS routines.
Contexts consume resources. It is therefore advisable to release contexts when they are no longer needed. This is done via the routine BLACS_GRIDEXIT. When the entire BLACS system is shut down (via a call to BLACS_EXIT), all outstanding contexts are automatically freed.
Many communication packages can be classified as having operations based on one dimensional arrays, which are the machine representation for linear algebra's vector class. In programming linear algebra problems, however, it is more natural to express all operations in terms of matrices. Vectors and scalars are, of course, simply subclasses of matrices. On computers, a linear algebra matrix is represented by a two dimensional array (2D array), and therefore the BLACS operate on 2D arrays.
The BLACS recognize the two most common classes of matrices for dense linear algebra. The first of these classes consist of general rectangular matrices, which in machine storage are 2D arrays consisting of M rows and N columns, with a leading dimension, LDA, that determines the distance between successive columns in memory.
The general rectangular matrices therefore take the following parameters as input when determining what array to operate on:
- M
- (input) INTEGER
The number of matrix rows to be operated on.- N
- (input) INTEGER
The number of matrix columns to be operated on.- A
- (input/output) TYPE DEPENDS ON ROUTINE, array of dimension (LDA,N)
A pointer to the beginning of the (sub)array to be sent.- LDA
- (input) INTEGER
The distance between two elements in matrix row.
The second class of matrices recognized by the BLACS are trapezoidal matrices (triangular matrices are a sub-class of trapezoidal) Trapezoidal arrays are defined by M, N, and LDA, as above, but they have two additional parameters as well. These parameters are:
- UPLO
- (input) CHARACTER
Indicates whether the matrix is upper or lower trapezoidal, as discussed below.- DIAG
- (input) CHARACTER
Indicates whether the diagonal of the matrix is unit diagonal (will not be operated on) or otherwise (will be operated on).
The shape of a trapezoidal array is determined by these parameters as follows:
UPLO M < N M = N M > N -------------------------------------- | | 5 x 10 5 x 5 7 x 5 | | XXXXXXXXXX XXXXX XXXXX | XXXXXXXXX XXXX XXXXX 'U' | XXXXXXXX XXX XXXXX | XXXXXXX XX XXXX | XXXXXX X XXX | XX | X | |-------------------------------- | | XXXXX X X | XXXXXX XX XX 'L' | XXXXXXX XXX XXX | XXXXXXXX XXXX XXXX | XXXXXXXXX XXXXX XXXXX | XXXXX | XXXXX | | 5 x 10 5 x 5 7 x 5
The packing of arrays (if required) so that they may be sent efficiently is hidden, allowing the user to concentrate on the logical matrix, rather than how the data is organized in the machine's memory.
One of the things that sets the BLACS apart from other message passing layers is that the user does not need to specify message IDs, (abbreviated msgid). A msgid (also referred to as a message type or tag) is usually an integer which allows a receiving process to distinguish between incoming messages. The generation of these IDs can become problematic. Non-deterministic code may result if non-unique IDs are used in any two sections of code not separated by an explicit barrier.
Therefore, to add to the programmability of the BLACS, it was decided that the BLACS would generate the required msgids. These generated ID's had to have certain properties. First, it must never be the case that unrelated messages with the same destination would get the same ID. Second, in order to maintain performance, the ID generating algorithm had to use only local information: off-processor memory access could not be allowed. Further, it is necessary to allow for BLACS packages to be used alongside other communication platforms. An example that occurs regularly is linking a BLACS package with a machine specific package.
Therefore, the BLACS must allow the user to specify what range of IDs the BLACS can use. The user may do this by a call to the support routine SHIFT_RANGE. By placing two restrictions on communication, these goals were achieved. First, a receiver must know the coordinates of the sending processor. Second, communication between two processes is strictly ordered. This means that if process {0, 0} sends two messages to {0, 1}, then {0, 1} must receive them in the same order that they were sent.
You can return to the HTML web page.