ss_gg_align, a FORTRAN90 code which implements the sequence/sequence global gap alignment, one of the string matching algorithms described in the reference [Chao].
These algorithms carry out the computation in linear space, and compute not just the optimal alignment score, but also the corresponding optimal alignment.
It's important to be able to compute alignments using "linear space", that is, just a few vectors whose length N is equal to that of a typical string. A quadratic algorithm would require a two dimensional array of total dimension N*N. Realistic alignment problems can involve strings of N=100,000 elements or more, so a quadratic algorithm would be expensive or impossible to use.
The "matching" being considered does not actually require that every element of string A match an identical element of string B. Instead, the matching algorithm is essentially looking for the cheapest way of transforming one string into the other, allowing the operations of "mutation" (change one letter to another), "deletion" (drop a string of consecutive letters) and "insertion (insert a string of consecutive letters). Thus, at the same time, we are measuring a sort of evolutionary distance between two strings and a formal "editing distance" between them.
This set of routines assumes that an insertion or deletion of length K is penalized using an "affine gap penalty formula" of the form:
Penalty = Gap_Open + K * Gap_ExtendThis choice of penalty function has a major effect on the form of the matching algorithms, particularly in the linear space case. (Compare the simpler "global distance" routines in SS_GD_ALIGN.)
The score for the actual best matching is determined without explicitly constructing the best matching. It is a matter of some difficulty to recover the matching corresponding to the best score. This is particularly true if the algorithm is a linear space one, which discards a great deal of intermediate information. However, it is possible to set up a recursive algorithm which determines the best alignment, using only linear space.
Routines that use quadratic space are included as well, so the algorithms can be compared for storage, speed, and correctness.
The names of the scoring and path routines include information about whether they use a forward, backward, or recursive algorithm, whether they compute the score or the path, and whether they use linear or quadratic space. Thus, the routine SS_GG_FSQ uses the forward algorithm to compute the score, with quadratic space requirements.
The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.
ss_gg_align is available in a FORTRAN90 version.
PS_GG_ALIGN, a FORTRAN90 library which implements a profile/sequence global alignment using an affine gap penalty.
PS_LG_ALIGN, a FORTRAN90 library which implements a profile/sequence local alignment using an affine gap penalty.
PS_QG_ALIGN, a FORTRAN90 library which implements a profile/sequence quasiglobal alignment using an affine gap penalty.
SS_GD_ALIGN, a FORTRAN90 library which globally aligns two sequences using a distance matrix.
SS_LG_ALIGN, a FORTRAN90 library which locally aligns two sequences using an affine gap penalty.
SS_QG_ALIGN, a FORTRAN90 library which quasi-globally aligns two sequences using an affine gap penalty.