partial_digest


partial_digest, a MATLAB code which seeks solutions of the partial digest problem.

In the partial digest problem, we assume that there are N objects arranged along a line. We denote the position of object I by X(I). The positions of the objects are unknown. Instead, we have a list of the distances between every distinct pair of objects. Note that the distances are not "tagged"; that is, if there is a 175 on the list of distances, we don't know which two objects are separated by that distance. In the partial digest problem, we start with the (N*(N-1))/2 distances, and must come up with at least one linear arrangement of N objects that corresponds to the distances.

In the algorithm used here, we begin by arbitrarily setting X(1) to zero.

For our second step, we find the largest entry in the distance table, and set X(2) to that value.

On each recursive step thereafter, we find the largest unused distance, D, and note that this must represent the distance of the next object from either X(1) or X(2).

Starting with the first possibility, we consider placing this next object at X(K)=D. We now must search the distance table, and ensure that the distances |X(1)-X(K)| through |X(K-1)-X(K)| all show up. If so, then our tentative placement of the object is "plausible", and we can proceed to the next step of recursion, seeking the location of X(K+1).

The second possibility to check on this recursive step is that we should set X(K)=X(2)-D, since this would also explain the occurrence of the distance D. The analysis of this case is otherwise the same as for the first one.

This recursion is guaranteed to "encounter" every solution. (Of course, there might be no solutions whatever.)

This approach has the advantage that recursion is relatively clean and neat to program. Disadvantages include the fact that the amount of memory required to store partial results will grow explosively as the size of the problem increases. Also, it is difficult to intervene or interrupt the recursive process. For instance, the calling program never receives the computed solutions directly. Instead, the recursive routine "realizes" that it has computed a solution, and can print it out.

For these reasons, it would be worth developing an equivalent version of the routines that uses backtracking instead.

Note that this program used integers for the distances. While this is somewhat unnatural, it is convenient when programming, since we are searching the list of distances for values that we arrive at by subtraction, and the slightest roundoff would mean that the algorithm would fail. An alternative would be to allow floating point distances, but to allow a very slight margin of error when looking for a distance in the table that is equal to a difference calculated between two positions.

Licensing:

The computer code and data files made available on this web page are distributed under the MIT license

Languages:

partial_digest is available in a C version and a C++ version and a FORTRAN90 version and a MATLAB version and a Python version.

Related Data and Programs:

partial_digest_test

test_partial_digest, a MATLAB code which generates versions of the partial digest problem.

Reference:

  1. Pavel Pevzner,
    Computational Molecular Biology,
    MIT Press, 2000,
    ISBN: 0-262-16197-4,
    LC: QH506.P47.

Source Code:


Last revised on 19 February 2019.