CLEAN77 -- A FORTRAN77 Reformatter

Reference Manual

by G. D. Flint

ABSTRACT

There are many programs in existence that, through repeated modifications, have become almost unreadable. Many of these same programs are still in use and still being changed. With each successive program modification, it becomes harder for a programmer to follow the logic and to make the desired changes. Often, when asked to modify such a program or convert it for use on another machine, the programmer would prefer to rewrite it for clarity but a lack of time or money (or both) prohibits such an effort.

A reasonable compromise to this desire to rewrite the program is for the programmer to "clean up" the routines before proceeding with the changes. Usually this means retyping the routine using new statement labels and statement indentation. This is tedious work and prone to human error. CLEAN77 is designed to do these and other functions automatically for the programmer.

General Description

CLEAN77 is designed to reformat programs written in FORTRAN such that the cleaned programs adhere as closely as possible to FORTRAN 77 standards (ANSI X3.9-1978). This can be useful when attempting to convert programs that run on one machine for use on another (e.g. IBM 370 to CDC CYBER 205). CLEAN77 will do such things as resequencing statement numbers, converting Hollerith fields to quoted fields and indenting the contents of DO-loop, IF-THEN-ELSE and WHILE blocks. Some of these and other actions are user controllable via "commands."

In general, commands are directives that enable or disable a particular user-controllable option (e.g., indenting statements). Certain commands allow the user to establish numeric or character values for a specific option (e.g., statement number increment). Commands may appear in a "command" file (see Usage section) or embedded in the file containing the routines to be cleaned. Further information about commands is given in the "Commands" section.
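The statement-number resequencing mentioned above can be pictured with a short sketch. This is not CLEAN77's code; it is a minimal Python illustration under the assumption, inferred only from the worked examples in the Examples section, that referenced labels are renumbered in the order in which they are defined, starting at a base and stepping by a fixed increment. The function name `resequence` and its parameters are hypothetical.

```python
from typing import Dict, Iterable

MAX_LABEL = 99999  # largest statement label the manual allows


def resequence(defined_labels: Iterable[int], base: int = 10,
               increment: int = 10) -> Dict[int, int]:
    """Map each old label to a new one, in order of first definition.

    Assumption: `defined_labels` lists only labels that are actually
    referenced, in the order their statements appear in the routine.
    """
    mapping: Dict[int, int] = {}
    next_label = base
    for old in defined_labels:
        if old in mapping:
            continue  # a label is renumbered only once
        if next_label > MAX_LABEL:
            raise ValueError("generated label would exceed 99999")
        mapping[old] = next_label
        next_label += increment
    return mapping


if __name__ == "__main__":
    # Labels of PROGRAM X in the Examples section (non-FORMAT labels only).
    print(resequence([5, 2]))
    # Labels of SUBROUTINE Y, which is preceded by C+LABELB=100.
    print(resequence([1, 2], base=100))
```

FORMAT labels would be handled by a second call with their own base and increment when the FORMATB option is in force; the sketch does not check that the two ranges stay disjoint.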
Usage

After the user has invoked the program, the names of four files are to be typed in to the program. These files are:

1. An input file containing program units to be cleaned.
2. A command file containing directives to CLEAN77.
3. A result file that will contain the cleaned program units.
4. A listing file that will contain a description of the changes.

Please note that these file names will be used repeatedly in the remainder of this document.

Remarks/Restrictions

The program units to be cleaned should compile with no errors under a FORTRAN 77 compiler.

If the program units are not written in FORTRAN 77 but do compile under some FORTRAN compiler (e.g., a FORTRAN 66 compiler), they may be processed correctly by CLEAN77, but the results are unpredictable.

"Cleaned" program units should still be compilable under the original compiler if the correct CLEAN77 options are selected. (E.g., some compilers differentiate between Hollerith and quote-mark-delimited strings. Program units destined for such a compiler should have the Hollerith-to-quoted-string conversion disabled.)

The following old (non FORTRAN 77) features confuse CLEAN77 and cause it to generate an incorrect result file:

1. IF DIVIDE CHECK and similar "strange" IF statements.
2. End of file checks of the form

       IF (EOF,5) 10,20

3. Hollerith strings not in a FORMAT statement that are delimited by asterisks (*) or dollar signs ($).
4. Parenthesized DATA statements of the form:

       DATA (A=1.0),(B=2.0)

Errors/Messages

A normal CLEAN77 run will generate four message lines in the listing file as follows:

    1234 CARDS READ.
    1118 STATEMENTS PROCESSED.
      50 ROUTINES DETECTED.
    1238 CARDS WRITTEN.

The first line specifies the number of card images read from the input file. The second line records the number of apparent FORTRAN statements. Comments are excluded from this count as are lines from any copied routines. The third line lists the number of PROGRAM, SUBROUTINE, FUNCTION and BLOCKDATA statements detected.
Note that main programs without the "PROGRAM" card are included in this total. The fourth line tells the number of card images written to the result file.

Error and warning messages will appear in the listing file (unless the use of the listing file is disabled at the time of the error). A complete list of these messages and their explanations is given in Appendix A.

Commands

There are currently 34 user-controllable options available for CLEAN77. These options all have a default value that may be changed by "commands." These commands may appear in the command file or before/between routines in the original source (input) file. Note that the command file is processed before reading the input file. This allows the user to change the processing style of CLEAN77 without having to alter the input file.

Commands are of the form:

    C+COMMAND

or

    C-COMMAND

where the "C" must appear in column 1, the plus or minus in column 2 and the first character of the command in column 3. After column 3, blanks are ignored. A command of the form "C+" enables (selects) the specified option. A command of the form "C-" disables (deselects) the specified option.

The enabling or disabling of an option may cause other options to have no effect on reformatting. For example, disabling the issuance of comments (C-ISSUEC) would make the comment bracketing option (BRACKET) meaningless.

A list of commands and their effects is given below. A "+" in front of the command means that its default status is enabled; a "-", disabled. A summary appears in table form in Appendix F.

(+) ADDCONT
    If enabled, add CONTINUE statements to the ends of DO-loops that end with anything other than a CONTINUE statement.

(+) BRACKET
    If enabled, issue blank comments before and after each set of non-blank comments.

(-) COLLECT
    If enabled, collect all format statements encountered within a routine and issue them just before the END card of the routine.

(-) COPY
    If enabled, copy the following routine intact with no cleaning being performed. NOTE: This command is in effect for one routine.

(-) EXEMPTC
    If enabled, does not convert column 1 of a comment to a C and does not attempt to indent the contents of a non-blank comment (should INDENTC be enabled).

(-) EXEMPTF
    If enabled, FORMAT statements are not converted or indented. They may still be collected.

(-) EXEMPTN
    If enabled, non-executable statements are not converted or indented.

(-) EXEMPTS
    If enabled, all lines that contain X in column 1 (where X is any non-blank, non-alphanumeric, printable character) are exempted from conversion or indentation. The statement is of the form:

        C+EXEMPTS=X   (e.g., C+EXEMPTS=*)

(-) FORMATB
    If enabled, the label base (starting number) for FORMAT statement labels is set to N where N is an integer from 1 (one) through 99999 appearing in the command as follows:

        C+FORMATB=N

    If disabled, FORMAT statement labels are treated as any other label.

(-) FORMATI
    If enabled, the label increment for FORMAT statement labels is set to N where N is an integer from 1 (one) through 99999 appearing in the command as follows:

        C+FORMATI=N   (Default: 10)

    NOTE: This command is significant only when the FORMATB command is enabled.

(+) HFIELD
    If enabled, convert all Hollerith fields to quoted fields. NOTE: In formats, all Hollerith fields are converted to quoted fields, regardless of this directive unless EXEMPTF is enabled.

(-) INDENTC
    If enabled, attempt to align the first non-blank character of the comment with the starting column for the next non-comment. If the comment is too long for this, the comment will be right justified.

(+) INDENTI
    If enabled, the indentation increment for DO-loops and other block constructs is set to N where N is an integer from 0 (zero) through 5 (five) appearing in the command as follows:

        C+INDENTI=N   (Default: 3)

(+) INDENTS
    If enabled, attempt to indent the contents of DO-loops and blocks up to a maximum of 30 columns. If disabled, start all statements in column 7.

(+) ISSUEC
    If enabled, issue any detected comments to the result file. If disabled, issue no comments except within routines copied intact.

(+) LABELB
    If enabled, the label base (starting number for statement labels) is set to N where N is an integer from 1 (one) through 99999 appearing in the command as follows:

        C+LABELB=N   (Default: 10)

(+) LABELI
    If enabled, the label increment is set to N where N is an integer from 1 (one) through 99999 appearing in the command as follows:

        C+LABELI=N   (Default: 10)

    NOTE: If disabled or N=0, no label resequencing is performed.

(-) LEFTJ
    If enabled, the numbers generated by the statement label resequencing will appear left-justified in column 1.

(+) LINELEN
    If enabled, the maximum line length of a FORTRAN statement in a cleaned routine is set to N where N is an integer from 40 through 125 appearing in the command as follows:

        C+LINELEN=N   (Default: 72)

    NOTE: If the line length specified is greater than 72 columns, the FORTRAN generated will not be compilable on many machines. It may be useful, however, in making formats and long equations more readable.

(-) LISTIDS
    If enabled, any card identifiers appearing in columns 73-120 of the input will be listed on the listing file.

(+) LISTNEW
    If enabled, the cleaned or copied routines written to the result file will be listed on the listing file.

(+) LISTOLD
    If enabled, the old routines from the input file will be listed on the listing file.

(+) OLDLEN
    If enabled, the maximum line length of a FORTRAN statement on the input file is set to N where N is an integer from 40 through 125 appearing in the command as follows:

        C+OLDLEN=N   (Default: 72)

(+) PAGELEN
    If enabled, the number of lines per page (including the title and subsequent blank line) is N where N is an integer greater than or equal to 30 (thirty) appearing in the command as follows:

        C+PAGELEN=N   (Default: 60)

(-) PROP
    If enabled, any commands appearing in the input file will be propagated to the result file. If disabled, any commands appearing in the input file will be processed, but will not appear in the result file. NOTE: Commands appearing in the command file will never be propagated to the result file.

(+) QUOTECH
    If enabled, all quote characters will be converted to X where X is any character in the FORTRAN character set appearing in the command as follows:

        C+QUOTECH=X   (Default: ')

    NOTE: If disabled, no quote conversion is performed, but Hollerith field conversion (using ') will still occur if C+HFIELD.

(+) RETPRFX
    If enabled, non-standard return labels in the argument list of CALL statements will have their prefix character set to X where X is any character in the FORTRAN character set appearing in the command as follows:

        C+RETPRFX=X   (Default: *)

(+) RIGHTJ
    If enabled, the numbers generated by the statement label resequencing will appear right-justified in column 5.

(-) SIZEDEF
    If enabled, permit a size definition on specification statements. If disabled, remove any size definitions. For example, if disabled:

        REAL*8 POINTS(11)  ==>  REAL POINTS(11)

(-) SPLITNV
    If enabled, numbers and variables may be split over two lines. If disabled, numbers and variables will not be split over two lines unless this causes excessive (>19) continuation cards.

(-) SPLITST
    If enabled, strings may be split over two lines.
    If disabled, strings will not be split over two lines unless either this causes excessive (>19) continuation cards or the string is too long to fit on one line by itself.

(+) STATUS
    If enabled, a table containing the values of the various CLEAN77 options will be printed at the point where this directive is detected. Unless explicitly disabled, this status table will be printed when the processing of the command file is complete.

(-) UNCOND
    If enabled, any statement creating an unconditional program flow change (see Notes section for a list of these statements) will be followed by a card whose first character is X (where X is any character in the FORTRAN character set) and whose remaining columns are blanks. The command formats are:

        C+UNCOND     (creating an all blank card)
        C+UNCOND=X   (e.g., C+UNCOND=*)

(-) VARUECS
    If enabled, variables may use an extended character set consisting of the standard letters and numbers plus the characters "$" (dollar sign) and "_" (underscore or equivalence sign).

    NOTE 2: If this command is enabled, CLEAN77 is unable to process lines that have more than one statement per line separated by dollar signs (e.g., I=5$J=6). Such a line would normally be split into two or more lines, each containing a single statement.

Notes

The input file may contain statements that do not compile correctly if CLEAN77 is directed to exempt certain types of statements from processing. For example, if "EXEMPTC" is enabled (do not alter comments), comments that begin with an asterisk (*) in column one will not have that asterisk replaced with a "C" (necessary with some compilers). If "EXEMPTN" is enabled (do not clean nonexecutable statements), data statements that contain Hollerith fields would not have those fields converted to quoted strings; an erroneous condition with some compilers.

Although CLEAN77 does not completely parse FORMAT statements, any such statements missing commas before strings will have those commas added.
For example,

       10 FORMAT (1X3HTOP)

would be converted (depending on what commands are in force) to either

       10 FORMAT (1X,'TOP')

or

       10 FORMAT (1X,3HTOP)

Continuation lines are indented by one extra increment from the starting column of the first line of the statement. Exceptions to this are:

1. if not indenting.
2. if the indentation limit has been reached.
3. if a character string would split over two lines.

In exceptions 1 and 2 above, continuation lines would be indented to align with the first line of the statement. In exception 3, each line would start in column 7.

Statement labels that are never referenced are removed. If the statement label on a CONTINUE or FORMAT statement is never referenced, the CONTINUE or FORMAT statement is deleted.

When copying a routine, the line length is returned to its initial default of 72 for the duration of the copy.

When the new line length ("LINELEN") is set to less than 72, comments may be truncated to fit.

If the old line length ("OLDLEN") is set to 72 or less, at least 19 continuation cards (the standard limit) can be read and processed by CLEAN77. Because of limitations in its internal string buffer, if "OLDLEN" is set to a number greater than 72, CLEAN77 may not be able to process all the standard 19 continuation cards. The number of continuation cards that CLEAN77 is assured of being able to handle may be approximated by the following formula:

    TRUNCATE(1452/(OLDLEN-6)) - 1

If the statement that ends a DO-loop is a multiple assignment statement (a non-standard condition), CLEAN77 will correctly split the assignment statement into its component statements, but it will issue the loop-ending statement label on at least two cards. Such a set of statements obviously would not compile and acts as a notice to the user that a condition that CLEAN77 cannot handle has been detected. The set of statements would have to be re-coded manually. (See the Examples section.)
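
The continuation-card formula given in the note on "OLDLEN" above is easy to tabulate before choosing a value. The following Python sketch only evaluates that formula; the function name `assured_continuation_cards` is hypothetical, and the range check reflects the 40 through 125 limits stated for the OLDLEN command. Integer division stands in for TRUNCATE, which is equivalent here because every quantity is positive.

```python
def assured_continuation_cards(oldlen: int) -> int:
    """Evaluate TRUNCATE(1452/(OLDLEN-6)) - 1 from the manual's note."""
    if not 40 <= oldlen <= 125:
        raise ValueError("OLDLEN must be an integer from 40 through 125")
    # Floor division equals truncation for positive operands.
    return 1452 // (oldlen - 6) - 1


if __name__ == "__main__":
    for oldlen in (72, 80, 100, 125):
        print(oldlen, assured_continuation_cards(oldlen))
```

By this estimate an OLDLEN of 80 already drops the assured count to 18, one below the standard limit of 19, and the maximum OLDLEN of 125 leaves 11.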

On most mainframe systems, CLEAN77 will process an average of 50 lines from the input file per cpu second. This is a good starting point when estimating the total cpu time needed to run CLEAN77. Programs that contain numerous extremely long or short lines (i.e., card images, not FORTRAN statements), many statement labels or many character strings may differ significantly from this estimate.

Statements that cause an unconditional program flow change are:

    RETURN
    STOP
    PAUSE      (on some systems this equates to a "no-op"; others to a "STOP")
    GO TO 10   (unconditional GO TO)
    GO TO K    (assigned GO TO with no labels)

Examples

Using default values (i.e., no commands in the command file) plus the commands embedded in the program source, CLEAN77 would convert:

    C+FORMATB=1000
          PROGRAM X(OUTPUT,TAPE6=OUTPUT)
          CALL Y(I,J)
        5 IF(I.GT.7)GOTO 2
          I=I+1
          CALL Z(J)
          GOTO 5
    C
        2 WRITE(6,999)J
      999 FORMAT(1X,3HJ =,I10)
          STOP
          END
    C+LABELB=100
    C+INDENTI=4
    C+EXEMPTN
          SUBROUTINE Y (I,J)
          I=10.0*RANDOM(1.5)
          DO 1 K=1,I
          J=7.5*RANDOM(0.5)
          IF(J.GT.5)GOTO 2
          IF(J.LE.2)GOTO 1
          CALL Z(J)
        1 CONTINUE
        2 CALL Z(J)
          RETURN
          END

TO:

          PROGRAM X (OUTPUT,TAPE6=OUTPUT)
          CALL Y (I,J)
       10 IF (I.GT.7) GO TO 20
          I = I+1
          CALL Z (J)
          GO TO 10
    C
       20 WRITE (6,1000) J
     1000 FORMAT (1X,'J =',I10)
          STOP
          END
          SUBROUTINE Y (I,J)
          I = 10.0*RANDOM(1.5)
          DO 100 K = 1,I
              J = 7.5*RANDOM(0.5)
              IF (J.GT.5) GO TO 110
              IF (J.LE.2) GO TO 100
              CALL Z (J)
      100 CONTINUE
      110 CALL Z (J)
          RETURN
          END

Using default values (i.e., no commands in the command file) plus the commands embedded in the program source, CLEAN77 would convert:

    C+UNCOND=C
          SUBROUTINE SETC (A,B,C)
    C
          IF(A.GT.0.0)THEN
          IF(A.GT.B)THEN
          C=A
          ELSE
          C=B/A
          ENDIF
          ELSE
          C=-A/B
          ENDIF
          RETURN
          END

TO:

          SUBROUTINE SETC (A,B,C)
    C
          IF (A.GT.0.0) THEN
             IF (A.GT.B) THEN
                C = A
             ELSE
                C = B/A
             ENDIF
          ELSE
             C = -A/B
          ENDIF
          RETURN
    C
          END

The first two examples show no "unusual" processing by CLEAN77. They represent standard CLEAN77 runs. The next two examples show unusual conditions that CLEAN77 detects.
More information appears beneath each example.

Using default values (i.e., no commands in the command file) plus the commands embedded in the program source, CLEAN77 would convert:

          SUBROUTINE LOOPS
          COMMON/XYZ/A(30,50),B(30,50)
    C
          DO 1 I=1,30
          DO 1 J=1,50
          DO 2 K=1,3
        2 A(I,J)=A(I,J)*FUNCT(K)
        1 CONTINUE
    C
          DO 3 I=1,30
          DO 3 J=1,50
          IF (B(I,J).LT.0.0)GOTO3
          B(I,J)=A(I,J)
        3 B(I,J)=B(I,J)*FUNCT2(83.7)
    C
          RETURN
    C
          END

TO:

          SUBROUTINE LOOPS
          COMMON /XYZ/ A(30,50),B(30,50)
    C
          DO 20 I = 1,30
             DO 20 J = 1,50
                DO 10 K = 1,3
                   A(I,J) = A(I,J)*FUNCT(K)
       10       CONTINUE
       20 CONTINUE
    C
          DO 30 I = 1,30
             DO 30 J = 1,50
                IF (B(I,J).LT.0.0) GO TO 30
                B(I,J) = A(I,J)
       30 B(I,J) = B(I,J)*FUNCT2(83.7)
    C
          RETURN
    C
          END

No CONTINUE statement is added to the end of loop 30 because of the jump to the terminal statement of the loop from within the scope of the loop.

Using default values (i.e., no commands in the command file) plus the commands embedded in the program source, CLEAN77 would convert:

          SUBROUTINE DOBAD (X,Y)
          COMMON /XYZ/ A(10)
    C
          DO 10 I = 1, 10
          IF (A(I).GT.0.0) RETURN
          A(I) = I
       10 X = Y = A(I)
    C
          DO 20 I = 1, 10
          IF (A(I).GT.10.0) GO TO 20
          A(I) = 2*I
       20 X = Y = A(I)
    C
          RETURN
    C
          END

TO:

          SUBROUTINE DOBAD (X,Y)
          COMMON /XYZ/ A(10)
    C
          DO 10 I = 1, 10
             IF (A(I).GT.0.0) RETURN
             A(I) = I
             Y = A(I)
       10 CONTINUE
       10 X = Y
    C
          DO 20 I = 1, 10
             IF (A(I).GT.10.0) GO TO 20
             A(I) = 2*I
       20 Y = A(I)
       20 X = Y
    C
          RETURN
    C
          END

Two lines have label "10" and two lines have label "20." This is because CLEAN77 is unable to determine whether the labels should appear before the first or after the last assignment statement (or both, meaning that a new label would have to be generated).

Appendix A

Error and Warning Messages

"ABOVE COMMAND POSSIBLY MISSPELLED"
    The card just listed from the command file appears to be a command (the first two characters are "C+" or "C-"), but any remaining text does not match a valid command name.
"COMPASS ROUTINE - COPIED TO =END= CARD" A Control Data Corporation Cyber-170/180 assembler (COMPASS) routine has been detected. (An IDENT card was detected.) The check for COMPASS routines appears only in versions of CLEAN77 designed to be run on CDC operating systems. No processing is done on this routine. It is copied from the input file to the result file. "EXCESSIVE CONTINUATION CARDS (>19)" A count of continuation cards generated has exceeded the maximum defined in the ANSI standard. In versions of CLEAN77 designed to be run under CDC operating systems, the routine type and name appear on the next line of the dayfile. "=EXEMPTS= CANNOT BE ALPHANUMERIC/BLANK" There was either no non-blank character following the equal sign (=) on the "EXEMPTS" command or the character was a letter or number. "GENERATED LABELS OVERLAP OR > 99999" When attempting to generate new statement labels, either a label would be generated greater than the largest allowable label (99999) or both the "FORMATB" and "LABELB" commands were enabled and a statement label of one type would fall within the range of label numbers of the other type. Label processing for the routine in which this occurs is abandoned. In versions of CLEAN77 designed to be run on CDC operating systems, the routine type and name appear on the next line of the dayfile. "LABEL-NUMBER TABLE OVERFLOW" There were more than 509 labels in a given routine. Label processing for the routine in which this occurs is abandoned. In versions of CLEAN77 designed to be run on CDC operating systems, the routine type and name appear on the next line of the dayfile. "NEXT CARD IMPLIES BAD BLOCK STRUCTURE" The next line to be printed is a card that shows that an unbalanced block structure exists (e.g., an "IF-THEN" statement followed by more than one "ENDIF" statement). "NO =END= CARD. =END= SUPPLIED." 
    This warning appears in the listing when the last line of the source file is not an "END" card and something other than comments has appeared since the last "END" card. Processing continues as if an "END" card were appended to the source file.

"POSSIBLY MISSPELLED COMMAND(S)."
    In versions of CLEAN77 designed to be run on CDC operating systems, when an "ABOVE COMMAND POSSIBLY MISSPELLED" message would be issued to the listing file, this message is issued to the dayfile.

Appendix F

Table of Commands and Default Values

    COMMAND   DEFAULT    OPTION CONTROLLED
    ADDCONT   Enabled    Add CONTINUEs
    BRACKET   Enabled    Bracket comments
    COLLECT   Disabled   Collect FORMATs
    COPY      Disabled   Copy routine intact
    EXEMPTC   Disabled   Exempt comments
    EXEMPTF   Disabled   Exempt FORMATs
    EXEMPTN   Disabled   Exempt nonexecutable statements
    EXEMPTS   Disabled   Exempt statement character
    FORMATB   Disabled   FORMAT label base
    FORMATI   Disabled   FORMAT label increment
    HFIELD    Enabled    Hollerith field conversion
    INDENTC   Disabled   Indent comments
    INDENTI   3          Indentation increment
    INDENTS   Enabled    Indent statements
    ISSUEC    Enabled    Issue comments
    LABELB    10         Label base
    LABELI    10         Label increment
    LEFTJ     Disabled   Left justify labels
    LINELEN   72         Line length of result file
    LISTIDS   Disabled   List old card identifiers
    LISTNEW   Enabled    List new (result) file
    LISTOLD   Enabled    List old (input) file
    OLDLEN    72         Line length of input file
    PAGELEN   60         Page length of listing file
    PROP      Disabled   Propagate commands
    QUOTECH   '          Quote conversion character
    RETPRFX   *          Return prefix character
    RIGHTJ    Enabled    Right justify labels
    SIZEDEF   Disabled   Specification statement size definitions
    SPLITNV   Disabled   Split numbers and variables
    SPLITST   Disabled   Split strings
    STATUS    Enabled    Command status report
    UNCOND    Disabled   Unconditional program flow change
    VARUECS   Disabled   Variables may use an extended char set
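
The command syntax from the Commands section, together with the command names in the table above, can be sketched as a small recognizer. This is an illustrative Python sketch, not CLEAN77's own logic: the names `parse_command` and `Command` are hypothetical, values are returned as uninterpreted text, and raising an exception for an unknown name is only a stand-in for the "ABOVE COMMAND POSSIBLY MISSPELLED" message.

```python
from typing import NamedTuple, Optional

# Command names copied from the Appendix F table.
COMMANDS = frozenset("""
    ADDCONT BRACKET COLLECT COPY EXEMPTC EXEMPTF EXEMPTN EXEMPTS FORMATB
    FORMATI HFIELD INDENTC INDENTI INDENTS ISSUEC LABELB LABELI LEFTJ
    LINELEN LISTIDS LISTNEW LISTOLD OLDLEN PAGELEN PROP QUOTECH RETPRFX
    RIGHTJ SIZEDEF SPLITNV SPLITST STATUS UNCOND VARUECS
""".split())


class Command(NamedTuple):
    name: str             # e.g. "LABELB"
    enabled: bool         # True for "C+", False for "C-"
    value: Optional[str]  # text after "=", or None if there is no "="


def parse_command(card: str) -> Optional[Command]:
    """Return the command on a card, or None if the card is not a command."""
    card = card.rstrip()
    # "C" in column 1 and "+" or "-" in column 2 mark a command card.
    if len(card) < 2 or card[0] != "C" or card[1] not in ("+", "-"):
        return None
    # Column 3 is kept as is; blanks after column 3 are ignored.
    body = card[2:3] + card[3:].replace(" ", "")
    name, sep, value = body.partition("=")
    if name not in COMMANDS:
        raise ValueError("possibly misspelled command: " + card)
    return Command(name, card[1] == "+", value if sep else None)


if __name__ == "__main__":
    for card in ("C+FORMATB=1000", "C-ISSUEC", "C+UNCOND=*", "      I = I+1"):
        print(repr(card), "->", parse_command(card))
```

Range checks on values (for example, 40 through 125 for LINELEN) are deliberately left out; a fuller version would validate each value against the limits given in the command descriptions.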