1 ^PY^- ^IGE^- ^IS010^G^- ^SM10^- ^IOP^- ^IS532^G^- ^SM532^- ^IL0650^- ^ISTF00^- ^IC0000^- ^IMV0100011000^-^IMH0110008500^- ^IS532^G^- ^SM532^- ^IL0650^- ^-^PN^- 1 USER'S GUIDE TO STARPAC THE STANDARDS TIME SERIES AND REGRESSION PACKAGE ------------------------------------------------ (revised 10/5/87) Janet R. Donaldson Peter V. Tryon U.S. Department of Commerce Center for Applied Mathematics National Bureau of Standards Boulder, Colorado 80303 1 In memory of Peter V. Tryon 1941 - 1982 1 Disclaimer No warranties, express or implied, are made by the distributors or developers that STARPAC or its constituent parts are free of error. They should not be relied upon as the sole basis for solving a problem whose incorrect solution could result in injury to person or property. If the programs are employed in such a manner, it is at the user's own risk and the distributors and developers disclaim all liability for such misuse. Computers have been identified in this paper in order to adequately specify the sample programs and test results. Such identification does not imply recommendation or endorsement by the National Bureau of Standards, nor does it imply that the equipment identified is necessarily the best available for the purpose. 1 Preface STARPAC, the Standards Time Series and Regression Package, is a library of Fortran subroutines for statistical data analysis developed by the Statistical Engineering Division (SED) of the National Bureau of Standards (NBS), Boulder, Colorado. Earlier versions of this library were distributed by the SED under the name STATLIB [Tryon and Donaldson, 1978]. Chapter 1 and chapter 9 of this document were previously distributed as NBS Technical Notes 1068-1 and 1068-2, respectively [Donaldson and Tryon, 1983a and 1983b]. The whole of this document was also previously released as NBSIR 86-3448. STARPAC incorporates many changes to STATLIB, including additional statistical techniques, improved algorithms and enhanced portability. STARPAC consists of families of subroutines for nonlinear least squares regression, time series analysis (in both time and frequency domains), line printer graphics, basic statistical analysis, and linear least squares regression. These subroutines feature: * ease of use, alone and with other Fortran subroutine libraries; * extensive error handling facilities; * comprehensive printed reports; * no problem size restrictions other than effective machine size; and * portability. Notation, format and naming conventions are constant throughout the STARPAC documentation, allowing the documentation for each family of subroutines to be used alone or in conjunction with the documentation for another. The code for STARPAC was developed at the U.S. Department of Commerce Boulder Laboratories on a CDC CYBER 180/840 under NOS 2.3. All examples presented in this documentation were executed in this environment using the FTN 5.1 compiler with rounding. STARPAC is written in ANSI Fortran 77 [American National Standards Institute, 1977]. Workspace and machine-dependent constants are supplied using subroutines based on the Bell Laboratories "Framework for a Portable Library" [Fox et al., 1978a]. We have also used subroutines from LINPACK [Dongarra et al., 1979], from the "Basic Linear Algebra Subprograms for Fortran Usage" [Lawson et al., 1979], from DATAPAC [Filliben, 1977] and from the portable special function subroutines of Fullerton [1977]. The analyses generated by several of the subroutine families have been adapted from OMNITAB II [Hogben et al., 1971]; users are directed to Peavy et al. [1985] for information about OMNITAB 80, the current version of OMNITAB. Computer facilities for the STARPAC project have been provided in part by the National Oceanic and Atmospheric Administration Mountain Administrative Support Center Computer Division, Boulder, Colorado, and we gratefully acknowledge their support. The STARPAC subroutine library is the result of the programming efforts of Janet R. Donaldson and John E. Koontz, with assistance from Ginger A. Caldwell, Steven M. Keefer, and Linda L. Mitchell. Valuable contributions have also been made by each of the members of the Statistical Engineering Division in Boulder, and from many within the STARPAC 1user community. We are grateful for the many valuable comments that we have received on early drafts of the STARPAC documentation; we wish especially to thank Paul T. Boggs, Ginger A. Caldwell, Sally E. Howe, John E. Koontz, James T. Ringland, Ralph J. Slutz, and Dominic F. Vecchia. Finally, we wish to thank Lorna Buhse for excellent manuscript support. Janet R. Donaldson Peter V. Tryon (deceased) September 1987 1 Contents Preface 1. INTRODUCTION TO USING STARPAC A. Overview of STARPAC and Its Contents B. Documentation Conventions C. A Sample Program D. Using STARPAC D.1 The PROGRAM Statement D.2 The Dimension Statements D.3 The CALL Statements D.4 STARPAC Output D.5 STARPAC Error Handling D.6 Common Programming Errors When Using STARPAC 2. LINE PRINTER GRAPHICS A. Introduction B. Subroutine Descriptions C. Subroutine Declaration and CALL Statements D. Dictionary of Subroutine Arguments and COMMON Variables E. Computational Details F. Examples 3. NORMAL RANDOM NUMBER GENERATION A. Introduction B. Subroutine Descriptions C. Subroutine Declaration and CALL Statements D. Dictionary of Subroutine Arguments E. Computational Methods F. Example G. Acknowledgments 4. HISTOGRAMS A. Introduction B. Subroutine Descriptions C. Subroutine Declaration and CALL Statements D. Dictionary of Subroutine Arguments E. Computational Methods E.1 Algorithms E.2 Computed Results and Printed Output F. Example G. Acknowledgments 5. STATISTICAL ANALYSIS OF A UNIVARIATE SAMPLE A. Introduction B. Subroutine Descriptions C. Subroutine Declaration and CALL Statements D. Dictionary of Subroutine Arguments and COMMON Variables E. Computational Methods E.1 Algorithms E.2 Computed Results and Printed Output F. Example G. Acknowledgments 16. ONE-WAY ANALYSIS OF VARIANCE A. Introduction B. Subroutine Descriptions C. Subroutine Declaration and CALL Statements D. Dictionary of Subroutine Arguments and COMMON Variables E. Computational Methods E.1 Algorithms E.2 Computed Results and Printed Output F. Example G. Acknowledgments 7. CORRELATION ANALYSIS A. Introduction B. Subroutine Descriptions C. Subroutine Declaration and CALL Statements D. Dictionary of Subroutine Arguments and COMMON Variables E. Computational Methods E.1 Algorithms E.2 Computed Results and Print F. Example G. Acknowledgments 8. LINEAR LEAST SQUARES A. Introduction B. Subroutine Descriptions C. Subroutine Declaration and CALL Statements D. Dictionary of Subroutine Arguments and COMMON Variables E. Computational Methods E.1 The Linear Least Squares Algorithm E.2 Computed Results and Printed Output F. Examples G. Acknowledgments 9. NONLINEAR LEAST SQUARES A. Introduction B. Subroutine Descriptions B.1 Nonlinear Least Squares Estimation Subroutines B.2 Derivative Step Size Selection Subroutines B.3 Derivative Checking Subroutines C. Subroutine Declaration and CALL Statements D. Dictionary of Subroutine Arguments and COMMON Variables E. Computational Methods E.1 Algorithms E.1.a Nonlinear Least Squares Estimation E.1.b Derivative Step Size Selection E.1.c Derivative Checking E.2 Computed Results and Printed Output E.2.a The Nonlinear Least Squares Estimation Subroutines E.2.b The Derivative Step Size Selection Subroutines E.2.c The Derivative Checking Subroutines F. Examples G. Acknowledgments 110. DIGITAL FILTERING A. Introduction B. Subroutine Descriptions B.1 Symmetric Linear Filter Subroutines B.2 Autoregressive or Difference Linear Filter Subroutines B.3 Gain and Phase Function Subroutines C. Subroutine Declaration and CALL Statements D. Dictionary of Subroutine Arguments and COMMON Variables E. Computational Methods E.1 Algorithms E.2 Computed Results and Printed Output F. Examples G. Acknowledgments 11. COMPLEX DEMODULATION A. Introduction B. Subroutine Descriptions C. Subroutine Declaration and CALL Statements D. Dictionary of Subroutine Arguments and COMMON Variables E. Computational Methods E.1 Algorithms E.2 Computed Results and Printed Output F. Example G. Acknowledgments 12. CORRELATION AND SPECTRUM ANALYSIS A. Introduction B. Subroutine Descriptions B.1 Correlation Analysis B.1.a Univariate Series B.1.b Bivariate Series B.2 Spectrum Estimation B.2.a Univariate Series B.2.b Bivariate Series C. Subroutine Declaration and CALL Statements D. Dictionary of Subroutine Arguments and COMMON Variables E. Computational Methods E.1 Algorithms E.1.a Correlation Analysis E.1.b Spectrum Analysis E.2 Computed Results and Printed Output E.2.a Correlation Analysis E.2.b Spectrum Analysis F. Examples G. Acknowledgments 113. ARIMA MODELING A. Introduction B. Subroutine Descriptions B.1 ARIMA Estimation Subroutines B.2 ARIMA Forecasting Subroutines C. Subroutine Declaration and CALL Statements D. Dictionary of Subroutine Arguments and COMMON Variables E. Computational Methods E.1 Algorithms E.1.a ARIMA Estimation E.1.b ARIMA Forecasting E.2 Computed Results and Printed Output E.2.a The ARIMA Estimation Subroutines E.2.b The ARIMA Forecasting Subroutines F. Examples G. Acknowledgments Appendix A. CONTINUITY OF VERTICAL PLOTS ON THE CDC CYBER 840 AND 855 Appendix B. WEIGHTED LEAST SQUARES Appendix C. ESTIMATING THE NUMBER OF RELIABLE DIGITS IN THE RESULTS OF A FUNCTION Appendix D. LIST OF STARPAC SUBPROGRAM NAMES D.1 Subprograms specifically written for STARPAC D.2 Subprograms from NL2SOL D.3 Subprograms from miscellaneous public domain sources D.4 Subprograms from LINPACK and BLAS D.5 Subprograms specifying machine dependent constants Appendix E. LIST OF STARPAC LABELED COMMON NAMES References. 1 STARPAC The Standards Time Series and Regression Package Janet R. Donaldson and Peter V. Tryon National Bureau of Standards Washington, DC 20234 STARPAC, the Standards Time Series and Regression Package, is a library of Fortran subroutines for statistical data analysis developed by the Statistical Engineering Division of the National Bureau of Standards, Boulder, Colorado. Earlier versions of this library were distributed by the SED under the name STATLIB [Tryon and Donaldson, 1978]. STARPAC incorporates many changes to STATLIB, including additional statistical techniques, improved algorithms and enhanced portability. STARPAC emphasizes the statistical interpretation of results, and, for this reason, comprehensive printed reports of auxiliary statistical information, often in graphical form, are automatically provided to augment the basic statistical computations performed by each user-callable STARPAC subroutine. STARPAC thus provides the best features of many stand-alone statistical software programs within the flexible environment of a subroutine library. Key words: data analysis; nonlinear least squares; STARPAC; statistical computing; statistical subroutine library; statistics; STATLIB; time series analysis. 1 ^PY^- ^IGE^- ^IS010^G^- ^SM10^- ^IOP^- ^IS532^G^- ^SM532^- ^IL0650^- ^ISTF00^- ^IC0000^- ^IMV0100011000^-^IMH0110008500^- ^IS532^G^- ^SM532^- ^IL0650^- ^-^PN^- 1 CHAPTER 1 INTRODUCTION TO USING STARPAC A. Overview of STARPAC and Its Contents STARPAC is a portable library of approximately 150 user-callable ANSI 77 Fortran subroutines for statistical data analysis. Designed primarily for time series analysis and for nonlinear least squares regression, STARPAC also includes subroutines for normal random number generation, line printer plots, basic statistical analyses and linear least squares. Emphasis has been placed on facilitating the interpretation of statistical analyses, and, for this reason, comprehensive printed reports of auxiliary statistical information, often in graphical form, are automatically provided to augment the basic statistical computations performed by each user-callable STARPAC subroutine. STARPAC thus provides the best features of many stand-alone statistical software programs within the flexible environment of a subroutine library. STARPAC is designed to be easy to use; in many situations, only a few lines of elementary Fortran code are required for the users' main programs. A fundamental STARPAC philosophy is to provide two or more user-callable subroutines for each method of analysis: one which minimizes the complexity of the CALL statement, automatically producing a comprehensive printed report of the results; and one or more others which provide user control of the computations, allow suppression of all or part of the printed reports, and/or provide storage of computed results for further analyses. STARPAC was developed and is maintained by the Center for Applied Mathematics of the National Bureau of Standards, Boulder, Colorado. Users' comments and suggestions, which have had significant impact already, are highly valued and always welcomed. Comments and suggestions should be directed to: Janet R. Donaldson National Bureau of Standards Mail Code 713 325 Broadway Boulder, CO 80303-3328. B. Documentation Conventions The documentation for the various STARPAC subroutine families uses a standard format description of the information needed to use a STARPAC subroutine, including one or more examples. References to chapter sections within the STARPAC documentation refer to the identified section within the current chapter unless explicitly stated otherwise. Figures are identified by the section in which they occur. For example, figure B-1 refers to the first figure in section B of this chapter (chapter 1). Names of INTEGER and REAL STARPAC subroutine arguments are consistent with the implicit Fortran convention for specifying variable type. That is, variable names beginning with I through N are type INTEGER while all others are type REAL unless otherwise explicitly typed DOUBLE PRECISION or COMPLEX. All dimensioned variables are explicitly declared in STARPAC documentation by means of INTEGER, REAL, DOUBLE PRECISION, or COMPLEX statements, as <1-1> 1appropriate. The convention used to specify the dimension statements is discussed below in section D.2. The precision of the STARPAC library is indicated in the printed reports generated by STARPAC: an S following the STARPAC version number in the output heading indicates the single precision version is being used, while a D indicates the double precision version. The STARPAC documentation is designed for use with both single and double precision versions. Subroutine arguments which are double precision in both versions are declared DOUBLE PRECISION; similarly, arguments which are single precision in both versions are declared REAL. Arguments whose precision is dependent upon whether the single or double precision version of STARPAC is being used are declared . If the double precision version of the STARPAC library is being used, then the user should substitute DOUBLE PRECISION for ; if the single precision version is being used, then the user should substitute REAL for . Other precision-dependent features are discussed as they occur. C. A Sample Program The sample program shown below illustrates the use of STARPAC subroutines. This example was executed on the CDC CYBER 180/840 at the U.S. Department of Commerce Boulder Laboratories under the NOS 2.3 operating system using a single precision version of the STARPAC library. The code shown is portable ANSI 77 Fortran. Section D below uses this example to discuss Fortran programming as it relates to STARPAC. The data used in this example are 84 relative humidity measurements taken at Pikes Peak, Colorado. STARPAC subroutine PP, documented in chapter 2, plots the data versus time-order and STARPAC subroutine STAT, documented in chapter 5, prints a comprehensive statistical analysis of the data. <1-2> 1Program: PROGRAM EXAMPL C C DEMONSTRATE STAT AND PP USING SINGLE PRECISION VERSION OF STARPAC C RUN ON CYBER 180/840. C C N.B. DECLARATION OF Y AND X MUST BE CHANGED TO DOUBLE PRECISION C IF DOUBLE PRECISION VERSION OF STARPAC IS USED. C REAL Y(100), X(100) DOUBLE PRECISION DSTAK(100) C COMMON /CSTAK/ DSTAK COMMON /ERRCHK/ IERR C C SET UP INPUT AND OUTPUT FILES C [CHAPTER 1, SECTION D.4, DESCRIBES HOW TO CHANGE OUTPUT UNIT.] C CALL IPRINT(IPRT) OPEN (UNIT=IPRT, FILE='FILENM') OPEN (UNIT=5, FILE='DATA') C C DEFINE LDSTAK, THE LENGTH OF DSTAK C LDSTAK = 100 C C READ NUMBER OF OBSERVATIONS INTO N AND C DATA INTO VECTOR Y C READ (5,100) N READ (5,101) (Y(I), I=1,N) C C CREATE A VECTOR OF ORDER INDICES IN X C DO 10 I=1,N X(I) = I 10 CONTINUE C C PRINT TITLE, PLOT OF DATA AND ERROR INDICATOR C WRITE (IPRT,102) CALL PP (Y, X, N) WRITE (IPRT,103) IERR C C PRINT TITLE, STATISTICAL ANALYSIS OF DATA AND ERROR INDICATOR C WRITE (IPRT,102) CALL STAT (Y, N, LDSTAK) WRITE (IPRT,103) IERR C C FORMAT STATEMENTS C 100 FORMAT (I5) 101 FORMAT (11F7.4) 102 FORMAT ('1DAVIS-HARRISON PIKES PEAK RELATIVE HUMIDITY DATA') 103 FORMAT (' IERR = ', I1) END <1-3> 1Data: 84 0.6067 0.6087 0.6086 0.6134 0.6108 0.6138 0.6125 0.6122 0.6110 0.6104 0.7213 0.7078 0.7021 0.7004 0.6981 0.7242 0.7268 0.7418 0.7407 0.7199 0.6225 0.6254 0.6252 0.6267 0.6218 0.6178 0.6216 0.6192 0.6191 0.6250 0.6188 0.6233 0.6225 0.6204 0.6207 0.6168 0.6141 0.6291 0.6231 0.6222 0.6252 0.6308 0.6376 0.6330 0.6303 0.6301 0.6390 0.6423 0.6300 0.6260 0.6292 0.6298 0.6290 0.6262 0.5952 0.5951 0.6314 0.6440 0.6439 0.6326 0.6392 0.6417 0.6412 0.6530 0.6411 0.6355 0.6344 0.6623 0.6276 0.6307 0.6354 0.6197 0.6153 0.6340 0.6338 0.6284 0.6162 0.6252 0.6349 0.6344 0.6361 0.6373 0.6337 0.6383 <1-4> 1 ^PY^- ^IGE^- ^IOL^- ^IS010^G^- ^SM10^- ^IL0810^- ^ISTF00^- ^IC0000^- ^IMV0055008500^-^IMH0075011000^- ^IS010^G^- ^SM10^- ^-^PN^- 1DAVIS-HARRISON PIKES PEAK RELATIVE HUMIDITY DATA STARPAC 2.07S (11/07/87) -I---------I---------I---------I---------I---------I---------I---------I---------I---------I---------I- .7418 - + + - I I I I I I I I .7271 - + - I + I I + + I I I I I .7125 - - I I I + I I I I + + I .6978 - + - I I I I I I I I .6831 - - I I I I I I I I .6685 - - I I I + I I I I I .6538 - + - I I I I I ++ I I + + + + I .6391 - + + + - I + + + + + + I I + + + ++ + + I I + + ++ + ++ + + + I I + + + + + I .6244 - + + + + + + + - I + + + +++ + I I + ++ + + I I + + + I I + +++ + I .6098 - ++ + ++ - I + I I I I I I I .5951 - ++ - -I---------I---------I---------I---------I---------I---------I---------I---------I---------I---------I- 1.0000 9.3000 17.6000 25.9000 34.2000 42.5000 50.8000 59.1000 67.4000 75.7000 84.0000 IERR = 0 <1-5> 1DAVIS-HARRISON PIKES PEAK RELATIVE HUMIDITY DATA STARPAC 2.07S (11/07/87) +STATISTICAL ANALYSIS N = 84 FREQUENCY DISTRIBUTION (1-6) 5 25 35 8 1 0 0 4 4 2 MEASURES OF LOCATION (2-2) MEASURES OF DISPERSION (2-6) UNWEIGHTED MEAN = 6.3734048E-01 WTD STANDARD DEVIATION = 3.2405213E-02 WEIGHTED MEAN = 6.3734048E-01 WEIGHTED S.D. OF MEAN = 3.5356987E-03 MEDIAN = 6.2915000E-01 RANGE = 1.4670000E-01 MID-RANGE = 6.6845000E-01 MEAN DEVIATION = 2.1076417E-02 25 PCT UNWTD TRIMMED MEAN= 6.2885952E-01 VARIANCE = 1.0500979E-03 25 PCT WTD TRIMMED MEAN = 6.2885952E-01 COEF. OF. VAR. (PERCENT) = 5.0844430E+00 A TWO-SIDED 95 PCT CONFIDENCE INTERVAL FOR MEAN IS 6.3030811E-01 TO 6.4437284E-01 (2-2) A TWO-SIDED 95 PCT CONFIDENCE INTERVAL FOR S.D. IS 2.8137110E-02 TO 3.8211736E-02 (2-7) LINEAR TREND STATISTICS (5-1) OTHER STATISTICS SLOPE = -2.4736661E-04 MINIMUM = 5.9510000E-01 S.D. OF SLOPE = 1.4414086E-04 MAXIMUM = 7.4180000E-01 SLOPE/S.D. OF SLOPE = T = -1.7161450E+00 BETA ONE = 3.7288258E+00 PROB EXCEEDING ABS VALUE OF OBS T = .090 BETA TWO = 5.9283926E+00 WTD SUM OF VALUES = 5.3536600E+01 WTD SUM OF SQUARES = 3.4208200E+01 TESTS FOR NON-RANDOMNESS WTD SUM OF DEV SQUARED = 8.7158122E-02 STUDENTS T = 1.8025871E+02 NO. OF RUNS UP AND DOWN = 47 WTD SUM ABSOLUTE VALUES = 5.3536600E+01 EXPECTED NO. OF RUNS = 55.7 WTD AVE ABSOLUTE VALUES = 6.3734048E-01 S.D. OF NO. OF RUNS = 3.82 MEAN SQ SUCCESSIVE DIFF = 3.6382337E-04 MEAN SQ SUCC DIFF/VAR = .346 DEVIATIONS FROM WTD MEAN NO. OF + SIGNS = 22 NO. OF - SIGNS = 62 NO. OF RUNS = 14 EXPECTED NO. OF RUNS= 33.5 S.D. OF RUNS = 3.51 DIFF./S.D. OF RUNS = -5.550 NOTE - ITEMS IN PARENTHESES REFER TO PAGE NUMBER IN NBS HANDBOOK 91 (NATRELLA, 1966) IERR = 0 <1-6> 1 ^PY^- ^IGE^- ^IS010^G^- ^SM10^- ^IOP^- ^IS532^G^- ^SM532^- ^IL0650^- ^ISTF00^- ^IC0000^- ^IMV0100011000^-^IMH0110008500^- ^IS532^G^- ^SM532^- ^IL0650^- ^-^PN^- 1D. Using STARPAC The following subsections provide general information needed when using STARPAC, including a discussion of Fortran programming as it relates to STARPAC usage. Although only elementary knowledge of Fortran is required to use STARPAC, users may still have to consult with a Fortran text and/or their Computing Center staff when questions arise. D.1 The PROGRAM Statement The PROGRAM statement is used to name the user's main program. The name EXAMPL is assigned to the main program in this example. The program name cannot be the name of any variable in the user's main program and, in addition, cannot be the name of any other subroutine or function called during execution of the user's code. Specifically, it cannot be the name of any subroutine within STARPAC. To ensure that the name of a STARPAC subroutine is not inadvertently chosen for the name of the main program, users should consult with the local installer of STARPAC to obtain a list of the STARPAC subroutine names. D.2 The Dimension Statements The user's program must include dimension statements to define the sizes and types of the vectors, matrices and three-dimensional arrays required by each STARPAC subroutine used; STARPAC itself has no inherent upper limit on problem size. Within the STARPAC documentation for the subroutine declaration and CALL statements, lowercase identifiers in the dimension statements represent integer constants which must equal or exceed the value of the identically- spelled uppercase argument. For example, if the documentation specifies the minimum dimension of a variable as XM(n,m), and if the number of observations N is 15, and the number of columns of data M is 3, then (assuming the single precision version of STARPAC is being used) the minimum array size is given by the dimension statement REAL XM(15,3). The exact dimensions assigned to some vectors and matrices must be supplied in the CALL statements to some STARPAC subroutines. For example, the argument IXM is defined as "the exact value of the first dimension of the matrix XM as declared in the calling program." Continuing the example from the preceding paragraph, if the statement REAL XM(20,5) is used to dimension the matrix XM for a particular subroutine, and IXM is an argument in the CALL statement, then IXM must have the value 20 regardless of the value assigned to the variable N. Many STARPAC subroutines require a work area for internal computations. This work area is provided by the DOUBLE PRECISION vector DSTAK. The rules for defining DSTAK are as follows. 1. Programs which call subroutines requiring the work vector DSTAK must include the statements DOUBLE PRECISION DSTAK (ldstak) COMMON /CSTAK/ DSTAK where ldstak indicates the integer constant used to dimension DSTAK. <1-7> 1 2. Since all STARPAC subroutines use the same work vector, the length of DSTAK must equal or exceed the longest length required by any of the individual STARPAC subroutines called by the user's program. 3. The length, LDSTAK, of the work vector DSTAK must be specified in the CALL statement of any STARPAC subroutine using DSTAK to enable STARPAC to verify that there will be sufficient work area for the problem. It is recommended that a variable LDSTAK be set to the length of DSTAK, and that this variable be used in each CALL statement requiring the length of DSTAK to be specified. Then, if a future modification to the user's program requires the length of DSTAK to be changed, the only alterations required in the existing code would be to the DOUBLE PRECISION dimension statement and to the statement which assigns the length of DSTAK to LDSTAK. STARPAC manages its work area using subroutines modeled after those in ACM Algorithm 528: Framework for a Portable Library [Fox et al. 1978a]. Although STARPAC and the Framework share the same COMMON for their work areas, there are differences between the STARPAC management subroutines and those of the Framework. In particular, the STARPAC management subroutines re- initialize DSTAK each time the user invokes a STARPAC subroutine requiring work area, destroying all data previously stored in DSTAK; the Framework only initializes DSTAK the first time any of its management subroutines are invoked, preserving work area allocations still in use. Thus, users must be cautious when utilizing STARPAC with other libraries which employ the Framework, such as PORT [Fox et al., 1978b]. The sample program shown in figure C-1 provides an example of the use of dimensioned variables with STARPAC. The REAL vector Y, used by both subroutines PP and STAT, contains the 84 relative humidity measurements; its minimum length, N (the number of observations), is 84. The REAL vector X used by subroutine PP contains the corresponding time order indices of the data; its minimum length is also 84. The DOUBLE PRECISION vector DSTAK contains the work area needed by STAT for intermediate computations; its minimum length, 49 in this case, is defined in section D of chapter 4. In this example, the dimensions of Y, X, and DSTAK, are each 100, exceeding the required minimum values. D.3 The CALL Statements The STARPAC CALL statement arguments provide the interface for specifying the data to be used, controlling the computations, and providing space for any returned results. The CALL statements used in the example (fig. C-1) are CALL PP(Y, X, N) and CALL STAT(Y, N, LDSTAK). Note that scalar arguments may be specified either by a variable preset to the desired value, as was done in the example, or by the actual numerical values. For example, CALL PP(Y, X, 84) and CALL STAT(Y, 84, 100) could have been used instead of the forms shown. We recommend using variables rather than the actual numerical values in order to simplify future changes in the program. When variables are used, changes need to be made in only one place; numerical values have to be changed every place they occur. The use of variables can also clarify the meaning of the program. <1-8> 1D.4 STARPAC Output Most STARPAC subroutines produce extensive printed reports, freeing the user from formatting and printing all statistics of interest. The standard output device is used for these reports. The user has the options of titling the reports and changing the output device. The first page of the report from each STARPAC subroutine does not start on a new page. This allows the user to supply titles. For example, WRITE (6, 100) 100 FORMAT ('1DAVIS-HARRISON PIKES PEAK RELATIVE HUMIDITY DATA') CALL PP (Y, X, N) will print the title DAVIS-HARRISON PIKES PEAK RELATIVE HUMIDITY DATA on the top line of a new page, immediately preceding the output produced by the call to subroutine PP. Users should note that titles more than one line in length can cause a printed report designed for one page to extend beyond the bottom of the page. The unit number, IPRT, of the output device used by STARPAC is returned by STARPAC subroutine IPRINT. Users can change the output device unit number by including with their program a subroutine IPRINT which will supersede the STARPAC subroutine of the same name. The subroutine must have the form SUBROUTINE IPRINT(IPRT) IPRT = u RETURN END where u is an integer value specifying the output unit to which all STARPAC output will be written. D.5 STARPAC Error Handling STARPAC provides extensive error-checking facilities which include both printed reports and a program-accessible error flag variable. There are essentially two types of errors STARPAC can detect. The first type of error involves incorrect problem specification, i.e., one or more of the input arguments in the subroutine statement has an improper value. For example, the number of observations, N, might have an obviously meaningless non-positive value. In the case of improper problem specification STARPAC generates a printed report identifying the subroutine involved, the error detected, and the proper form of the subroutine CALL statement. The latter is provided because improper input is often the result of an incorrectly specified subroutine argument list. A second type of error can be thought of as a computation error: either the initiated calculation cannot be completed or the results from the called subroutine are questionable. For example, when the least squares model and data are found to be singular, the desired computations cannot be completed; when one or more of the standardized residuals from a least squares fit cannot be computed because the standard deviation of the residual is zero, the results of the error estimates from the least squares regression may be questionable. If a computation error is detected, STARPAC generates a report which identifies the error, and, to aid the user in determining the cause of <1-9> 1the error, summarizes the completed results in a printed report. STARPAC error reports cannot be suppressed, even when the normal output from the STARPAC subroutine has been suppressed. (STARPAC output must be directed to a separate output device [see section D.4] when users do not want any STARPAC reports displayed under any conditions.) Because of this, users seldom have to consciously handle STARPAC error conditions in their code. When proper execution of the user's program depends on knowing whether or not an error has been detected, the error flag can be examined from within the user's code. When access to the error flag is desired, the statement COMMON /ERRCHK/ IERR must be placed with the Fortran declaration statements in the user's program. Following the execution of a STARPAC subroutine, the variable IERR will be set to zero if no errors were detected, and to a nonzero value otherwise; the value of IERR may indicate the type of error [e.g., see chapter 9, section D, argument IERR]. If the CALL statement is followed with a statement of the form IF (IERR .NE. 0) STOP then the program will stop when an error is detected. (In figure C-1, the value of IERR is printed following each CALL statement to show the value returned.) D.6 Common Programming Errors When Using STARPAC STARPAC error-checking procedures catch many programming errors and print informative diagnostics when such errors are detected. However, there are some errors which STARPAC cannot detect. The more common of these are discussed below. 1. The most common error involves array dimensions which are too small. Although certain arguments are checked by STARPAC to verify that array dimensions are adequate, if incorrect information is supplied to STARPAC, or if the dimension of an array which is not checked is too small, the program will produce erroneous results and/or will stop prematurely. Users should check the dimension statements in their program whenever difficulties are encountered in using STARPAC. 2. The second most common error involves incorrect CALL statements, that is, CALL statements in which the STARPAC subroutine name is misspelled, the arguments are incorrectly ordered, one or more arguments are omitted, or the argument types (INTEGER, REAL, DOUBLE PRECISION, and COMPLEX) are incorrect. Users having problems using STARPAC should carefully check their declaration and CALL statements to verify that they agree with the documentation. 3. The third most common error involves incorrect specification of the work vector DSTAK. Programs which call STARPAC subroutines requiring work area must include both the DOUBLE PRECISION statement dimension DSTAK and the COMMON /CSTAK/ DSTAK statement. 4. The final common error involves user-supplied subroutines which have the same name as a subroutine in the STARPAC library. Users should <1-10> 1 consult with the local installer of STARPAC to obtain a list of all STARPAC subroutine names. This list can then be used to ensure that a STARPAC subroutine name has not been duplicated. Users who have not found the cause of a problem after checking the possibilities mentioned above should consult with their Computing Center advisers. <1-11> 1 ^PY^- ^IGE^- ^IS010^G^- ^SM10^- ^IOP^- ^IS532^G^- ^SM532^- ^IL0650^- ^ISTF00^- ^IC0000^- ^IMV0100011000^-^IMH0110008500^- ^IS532^G^- ^SM532^- ^IL0650^- ^-^PN^- 1 CHAPTER 2 LINE PRINTER GRAPHICS A. Introduction STARPAC contains 36 subroutines for producing 2 basic styles of line printer plots. The first, called a page plot, uses a single 11 x 14 inch page of line printer paper. The second, called a vertical plot, is designed for plotting time series. The user specifies only the y-axis values since the x-axis values (independent variable) are assumed to be equally spaced and ordered consecutively. The independent variable in the resulting plot is oriented vertically and the plot will continue over as many pages as necessary to plot one point per line. Within these two basic styles the user has many options, including controlling the plot symbol, plotting multivariate values, designating missing data, using log scales and specifying plot limits and plot size. Users are directed to section B for a brief description of the subroutines. The actual declaration and CALL statements are given in section C and the subroutine arguments are defined in section D. The algorithms used and output produced by these subroutines are discussed in section E. Sample programs are shown in section F. B. Subroutine Descriptions PP (Page Plot) and VP (Vertical Plot) are the simplest of the STARPAC line printer plot subroutines. For both, plot limits are automatically set by the range of the data and other control parameters are set to the default values given in section D. The remaining plotting subroutines are named by adding letters to the beginning and/or end of PP and VP. * Prefix S (e.g., SPP) indicates the user controls the plot symbol for each point. * Prefix M indicates the subroutine will accept multivariate y-axis values (e.g., MPP). * Suffix M subroutines allow data with missing observations to be plotted (e.g., VPM). * Suffix L indicates log scales can be used (e.g., PPL). * Suffix C subroutines allow control of the parameters which specify the plot limits, size, scale, etc. (e.g., VPC). The following table, which indicates the capabilities of each of the STARPAC plotting subroutines, can be used to select from the available subroutines. Subroutine declaration and CALL statements are given in section C, listed in the same order as in the table. <2-1> 1 STARPAC Plotting Subroutines Plot Symbol Multiple Page Vertical Missing Log Control Control Y-Axis Plot Plot Data Scale Parameters Name (S) (M) (PP) (VP) (M) (L) (C) PP * PPL * * PPC * * * PPM * * PPML * * * PPMC * * * * SPP * * SPPL * * * SPPC * * * * SPPM * * * SPPML * * * * SPPMC * * * * * MPP * * MPPL * * * MPPC * * * * MPPM * * * MPPML * * * * MPPMC * * * * * VP * VPL * * VPC * * * VPM * * VPML * * * VPMC * * * * SVP * * SVPL * * * SVPC * * * * SVPM * * * SVPML * * * * SVPMC * * * * * MVP * * MVPL * * * MVPC * * * * MVPM * * * MVPML * * * * MVPMC * * * * * <2-2> 1C. Subroutine Declaration and CALL Statements NOTE: Argument definitions and sample programs are given in section D and section F, respectively. The conventions used to present the following declaration and CALL statments are given in chapter 1, sections B and D. Page Plots PP: Print Y versus X scatterplot; linear axes; default control values and axis limits; no missing values allowed Y(n), X(n) : : CALL PP (Y, X, N) === PPL: Print Y versus X scatterplot; log or linear axes; default control values and axis limits; no missing values allowed Y(n), X(n) : : CALL PPL (Y, X, N, ILOG) === PPC: Print Y versus X scatterplot; log or linear axes; user-supplied control values and axis limits; no missing values allowed Y(n), X(n), YLB, YUB, XLB, XUB : : CALL PPC (Y, X, N, ILOG, + ISIZE, NOUT, YLB, YUB, XLB, XUB) === PPM: Print Y versus X scatterplot; linear axes; default control values and axis limits; missing values allowed Y(n), YMISS, X(n), XMISS : : CALL PPM (Y, YMISS, X, XMISS, N) === <2-3> 1PPML: Print Y versus X scatterplot; log or linear axes; default control values and axis limits; missing values allowed Y(n), YMISS, X(n), XMISS : : CALL PPML (Y, YMISS, X, XMISS, N, ILOG) === PPMC: Print Y versus X scatterplot; log or linear axes; user-supplied control values and axis limits; missing values allowed Y(n), YMISS, X(n), XMISS, YLB, YUB, XLB, XUB : : CALL PPMC (Y, YMISS, X, XMISS, N, ILOG, + ISIZE, NOUT, YLB, YUB, XLB, XUB) === SPP: Print Y versus X scatterplot with individual plot symbols specified by user; linear axes; default control values and axis limits; no missing values allowed Y(n), X(n) INTEGER ISYM(n) : : CALL SPP (Y, X, N, ISYM) === SPPL: Print Y versus X scatterplot with individual plot symbols specified by user; log or linear axes; default control values and axis limits; no missing values allowed Y(n), X(n) INTEGER ISYM(n) : : CALL SPPL (Y, X, N, ISYM, ILOG) === SPPC: Print Y versus X scatterplot with individual plot symbols specified by user; log or linear axes; user-supplied control values and axis limits; no missing values allowed Y(n), X(n), YLB, YUB, XLB, XUB INTEGER ISYM(n) : : CALL SPPC (Y, X, N, ISYM, ILOG, + ISIZE, NOUT, YLB, YUB, XLB, XUB) === <2-4> 1 SPPM: Print Y versus X scatterplot with individual plot symbols specified by user; linear axes; default control values and axis limits; missing values allowed Y(n), YMISS, X(n), XMISS INTEGER ISYM(n) : : CALL SPPM (Y, YMISS, X, XMISS, N, ISYM) === SPPML: Print Y versus X scatterplot with individual plot symbols specified by user; log or linear axis; default control values and axis limits; missing values allowed Y(n), YMISS, X(n), XMISS INTEGER ISYM(n) : : CALL SPPML (Y, YMISS, X, XMISS, N, ISYM, ILOG) === SPPMC: Print Y versus X scatterplot with individual plot symbols specified by user; log or linear axes; user-supplied control values and axis limits; missing values allowed Y(n), YMISS, X(n), XMISS, YLB, YUB, XLB, XUB INTEGER ISYM(n) : : CALL SPPMC (Y, YMISS, X, XMISS, N, ISYM, ILOG, + ISIZE, NOUT, YLB, YUB, XLB, XUB) === MPP: Print plot of multiple Y vectors versus a common X vector; linear axes; default control values and axis limits; no missing values allowed YM(n,m), X(n) : : CALL MPP (YM, X, N, M, IYM) === <2-5> 1MPPL: Print plot of multiple Y vectors versus a common X vector; log or linear axes; default control values and axis limits; no missing values allowed YM(n,m), X(n) : : CALL MPPL (YM, X, N, M, IYM, ILOG) === MPPC: Print plot of multiple Y vectors versus a common X vector; log or linear axes; user-supplied control values and axis limits; no missing values allowed YM(n,m), X(n), YLB, YUB, XLB, XUB : : CALL MPPC (YM, X, N, M, IYM, ILOG, + ISIZE, NOUT, YLB, YUB, XLB, XUB) === MPPM: Print plot of multiple Y vectors versus a common X vector; linear axes; default control values and axis limits; missing values allowed YM(n,m), YMMISS(m), X(n), XMISS : : CALL MPPM (YM, YMMISS, X, XMISS, N, M, IYM) === MPPML: Print plot of multiple Y vectors versus a common X vector; log or linear axes; default control values and axis limits; missing values allowed YM(n,m), YMMISS(m), X(n), XMISS : : CALL MPPML (YM, YMMISS, X, XMISS, N, M, IYM, ILOG) === MPPMC: Print plot of multiple Y vectors versus a common X vector; log or linear axes; user-supplied control values and axis limits; missing values allowed YM(n,m), YMMISS(m), X(n), XMISS, YLB, YUB, XLB, XUB : : CALL MPPMC (YM, YMMISS, X, XMISS, N, M, IYM, ILOG, + ISIZE, NOUT, YLB, YUB, XLB, XUB) === <2-6> 1 Vertical Plots VP: Print vertical plot of Y versus input order; linear axes; default control values and axis limits; no missing values allowed Y(n) : : CALL VP (Y, N, NS) === VPL: Print vertical plot of Y versus input order; log or linear horizontal (Y) axis; default control values and axis limits; no missing values allowed Y(n) : : CALL VPL (Y, N, NS, ILOG) === VPC: Print vertical plot of Y versus input order; log or linear horizontal (Y) axis; user-supplied control values and axis limits; no missing values allowed Y(n), YLB, YUB, XLB, XINC : : CALL VPC (Y, N, NS, ILOG, + ISIZE, IRLIN, IBAR, YLB, YUB, XLB, XINC) === VPM: Print vertical plot of Y versus input order; linear axis; default control values and axis limits; missing values allowed Y(n), YMISS : : CALL VPM (Y, YMISS, N, NS) === VPML: Print vertical plot of Y versus input order; log or linear horizontal (Y) axis; default control values and axis limits; missing values allowed Y(n), YMISS : : CALL VPML (Y, YMISS, N, NS, ILOG) === <2-7> 1VPMC: Print vertical plot of Y versus input order; log or linear horizontal (Y) axis; user-supplied control values and axis limits; missing values allowed Y(n), YMISS, YLB, YUB, XLB, XINC : : CALL VPMC (Y, YMISS, N, NS, ILOG, + ISIZE, IRLIN, IBAR, YLB, YUB, XLB, XINC) === SVP: Print vertical plot of Y versus input order with individual plot symbols specified by user; linear axis; default control values and axis limits; no missing values allowed Y(n) INTEGER ISYM(n) : : CALL SVP (Y, N, NS, ISYM) === SVPL: Print vertical plot of Y versus input order with individual plot symbols specified by user; log or linear horizontal (Y) axis; default control values and axis limits; no missing values allowed Y(n) INTEGER ISYM(n) : : CALL SVPL (Y, N, NS, ISYM, ILOG) === SVPC: Print vertical plot of Y versus input order with individual plot symbols specified by user; log or linear horizontal (Y) axis; user-supplied control values and axis limits; no missing values allowed Y(n), YLB, YUB, XLB, XINC INTEGER ISYM(n) : : CALL SVPC (Y, N, NS, ISYM, ILOG, + ISIZE, IREFLN, IBAR, YLB, YUB, XLB, XINC) === <2-8> 1SVPM: Print vertical plot of Y versus input order with individual plot symbols specified by user; linear axis; default control values and axis limits; missing values allowed Y(n), YMISS INTEGER ISYM(n) : : CALL SVPM (Y, YMISS, N, NS, ISYM) === SVPML: Print vertical plot of Y versus input order with individual plot symbols specified by user; log or linear horizontal (Y) axis; default control values and axis limits; missing values allowed Y(n), YMISS INTEGER ISYM(n) : : CALL SVPML (Y, YMISS, N, NS, ISYM, ILOG) === SVPMC: Print vertical plot of Y versus input order with individual plot symbols specified by user; log or linear horizontal (Y) axis; user-supplied control values and axis limits; missing values allowed Y(n), YMISS, YLB, YUB, XLB, XINC INTEGER ISYM(n) : : CALL SVPMC (Y, YMISS, N, NS, ISYM, ILOG, + ISIZE, IRLIN, IBAR, YLB, YUB, XLB, XINC) === MVP: Print vertical plot of multiple Y vectors versus input order; linear axis; default control values and horizontal (Y) axis limits; no missing values allowed YM(n,m) : : CALL MVP (YM, N, M, IYM, NS) === <2-9> 1MVPL: Print vertical plot of multiple Y vectors versus input order; log or linear horizontal (Y) axis; default control values and axis limits; no missing values allowed YM(n,m) : : CALL MVPL (YM, N, M, IYM, NS, ILOG) === MVPC: Print vertical plot of multiple Y vectors versus input order; log or linear horizontal (Y) axis; user-supplied control values and axis limits; no missing values allowed YM(n,m), YLB, YUB, XLB, XINC : : CALL MVPC (YM, N, M, IYM, NS, ILOG, + ISIZE, YLB, YUB, XLB, XINC) === MVPM: Print vertical plot of multiple Y vectors versus input order; linear axis; default control values and axis limits; missing values allowed YM(n,m), YMMISS(m) : : CALL MVPM (YM, YMMISS, N, M, IYM, NS) === MVPML: Print vertical plot of multiple Y vectors versus input order; log or linear horizontal (Y) axis; default control values and axis limits; missing values allowed YM(n,m), YMMISS(m) : : CALL MVPML (YM, YMMISS, N, M, IYM, NS, ILOG) === MVPMC: Print vertical plot of multiple Y vectors versus input order; log or linear horizontal (Y) axis; user-supplied control values and axis limits; missing values allowed YM(n,m), YMMISS(m), YLB, YUB, XLB, XINC : : CALL MVPMC (YM, YMMISS, N, M, IYM, NS, ILOG, + ISIZE, YLB, YUB, XLB, XINC) === <2-10> 1D. Dictionary of Subroutine Arguments and COMMON Variables NOTE: --> indicates that the argument is input to the subroutine and that the input value is preserved; <-- indicates that the argument is returned by the subroutine; <-> indicates that the argument is input to the subroutine and that the input value is overwritten by the subroutine; --- indicates that the argument is input to some subroutines and is returned by others; *** indicates that the argument is a subroutine name; ... indicates that the variable is passed via COMMON. IBAR --> The indicator variable used to designate whether a vertical plot is to be a bar plot or not. Bar plots connect the plotted points to the reference line [see argument IRLIN] with a string of plot symbols, as is done for example in the correlation plots. [See chapter 12.] If IBAR >= 1, the plot is a bar plot. If IBAR <= 0, it is not. The default value is IBAR = 0. When IBAR is not an argument in the subroutine CALL statement the default value is used. IERR ... An error flag returned in COMMON /ERRCHK/. [See chapter 1, section D.5.] Note that using (or not using) the error flag will not affect the printed error messages that are automatically provided. IERR = 0 indicates that no errors were detected and that the plot was completed satisfactorily. IERR = 1 indicates that improper input was detected or that some error prevented the plot from being completed. ILOG --> The indicator variable used to designate whether the axes are to be on a log or linear scale. ILOG is a two-digit integer, pq, where the value of p is used to designate the scale of the x-axis and the value of q is used to designate the scale of the y-axis. If p = 0 (q = 0) the x-axis (y-axis) is on a linear scale; if p <> 0 (q <> 0) the x-axis (y-axis) is on a log scale. For vertical plots, the value of q is used to specify the scale on the horizontal-axis and the value of p is ignored. The default value is ILOG = 0, corresponding to linear scale for both the x-axis and the y-axis. When ILOG is not an argument in the subroutine CALL statement the default value is used. IRLIN --> The indicator variable used to designate whether zero or the series mean is to be plotted as a reference line on the vertical plots or whether no reference line should be used. If IRLIN <= -1, no reference line is plotted. If IRLIN = 0, a reference line is plotted showing the location of zero on the plot. If IRLIN >= 1, a reference line is plotted showing the series mean. The default value is IRLIN = -1. When IRLIN is not an argument in the subroutine CALL statement the default value is used. ISIZE --> The indicator variable used to designate the size of a page plot. ISIZE is a two-digit integer, pq, where the value of p is used to designate the size of the x-axis and the value of q is used to <2-11> 1 designate the size of the y-axis. If p = 0 (q = 0) the x-axis (y-axis) is the maximum possible, 101 columns (51 rows), i.e., 101 (51) plot positions. If p <> 0 (q <> 0) the x-axis (y-axis) is half the maximum, or 51 columns (26 rows). For vertical plots, the value of q is used to specify the size of the horizontal-axis and the value of p is ignored. The default value is ISIZE = 0, corresponding to a plot of 51 rows by 101 columns. When ISIZE is not an argument in the subroutine CALL statement the default value is used. ISYM --> The vector of dimension at least N that contains the values designating the plotting symbol to be used for each point. The plot symbols designated by each possible integer value are given below. ISYM()<= 1 -> + ISYM() = 9 -> E ISYM() =16 -> L ISYM() =23 -> S ISYM() = 2 -> . ISYM() =10 -> F ISYM() =17 -> M ISYM() =24 -> T ISYM() = 3 -> * ISYM() =11 -> G ISYM() =18 -> N ISYM() =25 -> U ISYM() = 4 -> - ISYM() =12 -> H ISYM() =19 -> 0 ISYM() =26 -> V ISYM() = 5 -> A ISYM() =13 -> I ISYM() =20 -> P ISYM() =27 -> W ISYM() = 6 -> B ISYM() =14 -> J ISYM() =21 -> Q ISYM() =28 -> Y ISYM() = 7 -> C ISYM() =15 -> K ISYM() =22 -> R ISYM()>=29 -> Z ISYM() = 8 -> D IYM --> The exact value of the first dimension of the matrix YM as specified in the calling program. M --> The number of columns of data in YM. N --> The number of observations. NOUT --> The number of points falling outside the plot limits that are to be listed following the plot. If NOUT >= 1, a message giving the total number of points falling outside the plot limits and a list of the coordinates of these points (up to a maximum of NOUT or 50, whichever is smaller) is printed. If NOUT = 0, only a message listing the number of points falling outside the limits is printed. If NOUT < 0, no points are listed and no message is given. The default value is NOUT = 0. When NOUT is not an argument in the subroutine CALL statement the default value is used. NS --> The sampling frequency of the points plotted on a vertical plot. If NS = 1, every point is plotted; if NS = 2, every second point is plotted; if NS = 3, every third point is plotted, etc. The default value is NS = 1. When NOUT <= 0 or NS is not an argument in the subroutine CALL statement the default value is used. X --> The vector of dimension at least N that contains the x-axis values. XINC --> The increment to be used for labeling the x-axis (i.e., the vertical-axis) on vertical plots. The x-axis labels are XLB, XLB + NS*XINC, XLB + 2*NS*XINC, etc. The default value is XINC = 1.0. When XINC is not an argument in the subroutine CALL statement the default value is used. <2-12> 1XLB --> The lower bound for the x-axis. For page plots: The default value is the smallest x-axis value within the range of the y-axis values to be plotted. If XLB >= XUB, the default value is used. For vertical plots: The default value is 1.0. For both page and vertical plots, when XLB is not an argument in the subroutine CALL statements the default value is used. (The plot limits may be adjusted slightly from the user-supplied values when the plotting subroutine uses a log scale.) XMISS --> The missing value code used within the vector X to indicate that a value is missing. The user must indicate missing observations by putting the missing value code in place of each missing observation. Missing data are not indicated on page plots in any way. XUB --> The upper bound for the x-axis. The default value is the largest x-axis value within the range of the y-axis values to be plotted. If XLB >= XUB or XUB is not an argument in the subroutine CALL statement the default value is used. (The plot limits may be adjusted slightly from the user-supplied value when the plotting subroutines use a log scale.) Y --> The vector of dimension at least N that contains the y-axis values. YLB --> The lower bound for the y-axis. The default value is the smallest y-axis value within the range of the x-axis values to be plotted. If YLB >= YUB or YLB is not an argument in the subroutine CALL statement the default value is used. (The plot limits may be adjusted slightly from the user-supplied value when the plotting subroutines use a log scale.) YM --> The matrix of dimension at least N by M whose columns each contain one of the M sets of N observations to be plotted against a common X vector. YMISS --> The missing value code used within the vector Y to indicate a value is missing. The user must indicate missing observations by putting the missing value code in place of each missing observation. Missing data are indicated on vertical plots by the word "MISSING" next to the right axis of the appropriate line. Missing data are not indicated on page plots in any way. YMMISS --> The vector of dimension at least M that contains the codes used within each of the M columns of YM to indicate a value is missing, where the first element of YMMISS is the missing value code for the first column of YM, etc. The user must indicate missing observations by putting the appropriate missing value code in place of each missing observation. Missing data are indicated on <2-13> 1 vertical plots by the word "MISSING" next to the right axis of the appropriate line. Missing data are not indicated on page plots in any way. YUB --> The upper bound for the y-axis. The default value is the largest y-axis value within the range of the x-axis values to be plotted. If YLB >= YUB or YUB is not an argument in the subroutine CALL statement the default value is used. (The plot limits may be adjusted slightly from the user-supplied value when the plotting subroutines use a log scale.) E. Computational Details Plotting Symbols. The plotting symbol used depends on the type of plot and whether or not more than one point falls on a given plot position. If two to nine points fall on a single plot position, the integer corresponding to the number of points is used as the plotting symbol. When 10 or more values fall on a single position the plotting symbol X is used. This is the only way that integers or X are used as plot symbols. Subroutines without an S or M prefix use the plotting symbol + to indicate one point on a single printer position. For subroutines with an S prefix the user-supplied vector ISYM of integer values is used to specify the plotting symbol for each data point. The Fourier spectrum plot shown in chapter 12 is an example of this option. Subroutines with an M prefix use a different letter as the plot symbol for each column of the matrix of the dependent variables (y-axis): A for the first, B for the second, ..., Z for columns 25 and beyond, with X still only used to indicate that 10 or more points fell on a single plot position. Continuity of Vertical Plots. Normally, a line printer will automatically provide margins at the top and bottom of each page, causing a break in the continuity of a vertical plot or any other output continuing over two or more pages. However, these automatic page-ejects can be suppressed by the user on many systems. Appendix A gives the control sequence necessary to suppress automatic page-ejects on a Cyber computer. Users of other systems should consult their Computer Center staff for any equivalent method available. F. Examples A sample program, its data and the resulting output for the STARPAC plotting routine MPP is listed at the end of this section. The program uses MPP to display the 12 years of monthly airline data listed on page 531 of Box and Jenkins [1976] versus month. The year is indicated by the plotting symbol (A = 1949, B = 1950, etc.). Other examples of STARPAC plots can be found in the output of many of the subroutines discussed elsewhere. The output from the complex demodulation subroutines includes a sample of the simple vertical plot style (VP) and of the vertical plot of multivariate data style (MVPC) (chapter 11); the output from the autocorrelation and cross correlation subroutines includes vertical plots using the bar plot option style (VPC) (chapter 12); and the output from <2-14> 1the Fourier spectrum subroutines (chapter 12) is produced using the symbol plot style (SPPC). Program: PROGRAM EXAMPL C C DEMONSTRATE MPP USING SINGLE PRECISION VERSION OF STARPAC C RUN ON CYBER 180/840. C C N.B. DECLARATION OF X AND YM MUST BE CHANGED TO DOUBLE PRECISION C IF DOUBLE PRECISION VERSION OF STARPAC IS USED. C REAL X(20), YM(20,20) C C SET UP INPUT AND OUTPUT FILES C [CHAPTER 1, SECTION D.4, DESCRIBES HOW TO CHANGE OUTPUT UNIT.] C CALL IPRINT(IPRT) OPEN (UNIT=IPRT, FILE='FILENM') OPEN (UNIT=5, FILE='DATA') C C SPECIFY NECESSARY DIMENSIONS C IYM = 20 C C READ NUMBER OF OBSERVATIONS AND NUMBER OF COLUMNS OF DATA C X-AXIS VALUES C Y-AXIS VALUES C READ (5,100) N, M READ (5,101) (X(I), I=1,N) READ (5,101) ((YM(I,J), I=1,N), J=1,M) C C PRINT TITLE AND CALL MPP FOR PLOT C WRITE (IPRT,102) CALL MPP (YM, X, N, M, IYM) C C FORMAT STATEMENTS C 100 FORMAT (2I5) 101 FORMAT (12F6.1) 102 FORMAT ('1RESULTS OF STARPAC PLOT SUBROUTINE MPP') END <2-15> 1Data: 12 12 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 112.0 118.0 132.0 129.0 121.0 135.0 148.0 148.0 136.0 119.0 104.0 118.0 115.0 126.0 141.0 135.0 125.0 149.0 170.0 170.0 158.0 133.0 114.0 140.0 145.0 150.0 178.0 163.0 172.0 178.0 199.0 199.0 184.0 162.0 146.0 166.0 171.0 180.0 193.0 181.0 183.0 218.0 230.0 242.0 209.0 191.0 172.0 194.0 196.0 196.0 236.0 235.0 229.0 243.0 264.0 272.0 237.0 211.0 180.0 201.0 204.0 188.0 235.0 227.0 234.0 264.0 302.0 293.0 259.0 229.0 203.0 229.0 242.0 233.0 267.0 269.0 270.0 315.0 364.0 347.0 312.0 274.0 237.0 278.0 284.0 277.0 317.0 313.0 318.0 374.0 413.0 405.0 355.0 306.0 271.0 306.0 315.0 301.0 356.0 348.0 355.0 422.0 465.0 467.0 404.0 347.0 305.0 336.0 340.0 318.0 362.0 348.0 363.0 435.0 491.0 505.0 404.0 359.0 310.0 337.0 360.0 342.0 406.0 396.0 420.0 472.0 548.0 559.0 463.0 407.0 362.0 405.0 417.0 391.0 419.0 461.0 472.0 535.0 622.0 606.0 508.0 461.0 390.0 432.0 <2-16> 1 ^PY^- ^IGE^- ^IOL^- ^IS010^G^- ^SM10^- ^IL0810^- ^ISTF00^- ^IC0000^- ^IMV0055008500^-^IMH0075011000^- ^IS010^G^- ^SM10^- ^-^PN^- 1RESULTS OF STARPAC PLOT SUBROUTINE MPP STARPAC 2.07S (11/07/87) -I---------I---------I---------I---------I---------I---------I---------I---------I---------I---------I- 622.0000 - L - I I I L I I I I I 570.2000 - - I K I I K I I L I I I 518.4000 - - I J L I I I I J I I L K I 466.6000 - I I K - I L L I I I I J L I I K I I 414.8000 - L L H - I K H 2 K K I I L K L I I I I H I 363.0000 - K J J G J K - I I 2 I H I I J K G I I I 2 I I J H H I 311.2000 - I H G G J - I I F H I H I I F I I H H G I I G G G E G H I 259.4000 - F E F - I I I G 2 E F E D E G I I G F E D F F I I D I 207.6000 - F D E F - I E E D C C 2 I I F D C D I I D C D C C 2 I I D C B B C C I 155.8000 - B - I C C B B A A C I I A B A A B B I I B A 2 I I 2 A A B A I 104.0000 - A - -I---------I---------I---------I---------I---------I---------I---------I---------I---------I---------I- 1.0000 2.1000 3.2000 4.3000 5.4000 6.5000 7.6000 8.7000 9.8000 10.9000 12.0000 <2-17> 1 ^PY^- ^IGE^- ^IS010^G^- ^SM10^- ^IOP^- ^IS532^G^- ^SM532^- ^IL0650^- ^ISTF00^- ^IC0000^- ^IMV0100011000^-^IMH0110008500^- ^IS532^G^- ^SM532^- ^IL0650^- ^-^PN^- 1 CHAPTER 3 NORMAL RANDOM NUMBER GENERATION A. Introduction STARPAC contains two subroutines for generating pseudo-random numbers (noise) which obey a normal probability law with mean mu and standard deviation sigma. Such random numbers are often useful for evaluating data analysis procedures or computer programs. Users are directed to section B for a brief description of the subroutines. The declaration and CALL statements are given in section C and the subroutine arguments are defined in section D. The algorithm used by these subroutines is discussed in section E. A sample program showing the use of these subroutines is given in section F. B. Subroutine Descriptions STARPAC subroutine NRAND generates a vector of standard (zero mean and unit standard deviation) normal (Gaussian) random numbers. There is no printed output from this subroutine. STARPAC subroutine NRANDC generates Gaussian noise with mean mu and standard deviation sigma using the transformation z = sigma*y + mu where y is a standard normal psuedo-random number; mu is the desired mean (see section D, argument YMEAN); and sigma is the desired standard deviation (see section D, argument SIGMA). There is no printed output from NRANDC. C. Subroutine Declaration and CALL Statements NOTE: Argument definitions and a sample program are given in section D and section F, respectively. The conventions used to present the following declaration and CALL statments are given in chapter 1, sections B and D. NRAND: Generate a vector of normal pseudo-random numbers with zero mean and unit standard deviation Y(n) : : CALL NRAND (Y, N, ISEED) === <3-1> 1NRANDC: Generate a vector of normal pseudo-random numbers with mean YMEAN and standard deviation SIGMA Y(n) : : CALL NRANDC (Y, N, ISEED, YMEAN, SIGMA) === D. Dictionary of Subroutine Arguments NOTE: --> indicates that the argument is input to the subroutine and that the input value is preserved; <-- indicates that the argument is returned by the subroutine; <-> indicates that the argument is input to the subroutine and that the input value is overwritten by the subroutine; --- indicates that the argument is input to some subroutines and is returned by others; *** indicates that the argument is a subroutine name; ... indicates that the variable is passed via COMMON. IERR ... An error flag returned in COMMON /ERRCHK/. [See chapter 1, section D.5.] Note that using (or not using) the error flag will not affect the printed error messages that are automatically provided. IERR = 0 indicates that no errors were detected. IERR = 1 indicates that improper input was detected. ISEED --> The basis for the pseudo-random number generation. For ISEED <> 0, use of the same value of ISEED will always generate the same data set. For ISEED = 0, a different data set will be produced by each call to NRAND or NRANDC in the user's program, although the numbers generated will not differ from run to run. N --> The number of random numbers to be generated. SIGMA --> The standard deviation of the generated random numbers. Y <-- The vector of dimension at least N that contains the generated normal pseudo-random numbers. YMEAN --> The mean of the generated random numbers. E. Computational Methods The normal pseudo-random number generation procedure is that of Marsaglia and Tsang [1984]. The same pseudo-random numbers (to within final round-off error) will be generated on any computer. The code was written by Boisvert <3-2> 1and Kahanar of the National Bureau of Standards Scientific Computing Division. F. Example The sample program shown below illustrates the use of both NRAND and NRANDC. NRAND is used to generate a standard normal pseudo-random sample of size 50 from a normal population with zero mean and unit standard deviation. NRANDC is then used to generate a sequence of normal pseudo-random numbers with a mean of 4 and standard deviation 0.5. The same seed is used for both NRAND and NRANDC. Therefore, the values generated by NRANDC are YMEAN plus SIGMA times the values generated by NRAND, i.e., YMEAN(I,2) = YMEAN + SIGMA*YMEAN(I,1) for I = 1, ..., N. The generated random numbers are displayed using STARPAC plot subroutine MVP. There is no output from NRAND and NRANDC. <3-3> 1Program: PROGRAM EXAMPL C C DEMONSTRATE NRAND AND NRANDC AND DISPLAY RESULTS WITH MVP USING C SINGLE PRECISION VERSION OF STARPAC RUN ON CYBER 180/840. C C N.B. DECLARATION OF YM MUST BE CHANGED TO DOUBLE PRECISION IF C DOUBLE PRECISION VERSION OF STARPAC IS USED. C REAL YM(100,2) C C SET UP OUTPUT FILE C [CHAPTER 1, SECTION D.4, DESCRIBES HOW TO CHANGE OUTPUT UNIT.] C CALL IPRINT(IPRT) OPEN (UNIT=IPRT, FILE='FILENM') C C SPECIFY NECESSARY DIMENSIONS C IYM = 100 C C SET THE SEED C THE NUMBER OF VALUES TO BE GENERATED C THE NUMBER OF SETS OF DATA TO BE GENERATED C ISEED = 531 N = 50 M = 2 C C GENERATE STANDARD NORMAL PSEUDO-RANDOM NUMBERS INTO COLUMN 1 OF YM C AND NORMAL PSEUDO-RANDOM NUMBERS WITH MEAN 4.0 AND C STANDARD DEVIATION 0.5 INTO COLUMN 2 OF YM C CALL NRAND (YM(1,1), N, ISEED) C YMEAN = 4.0 SIGMA = 0.5 CALL NRANDC (YM(1,2), N, ISEED, YMEAN, SIGMA) C C PRINT TITLE AND CALL MVP TO PLOT RESULTS, C SAMPLING EVERY OBSERVATION C WRITE (IPRT,100) CALL MVP (YM, N, M, IYM, 1) C C FORMAT STATEMENTS C 100 FORMAT ('1RESULTS OF STARPAC NORMAL PSUEDO-RANDOM NUMBER', 1 ' GENERATION SUBROUTINES', 2 ' DISPLAYED WITH STARPAC PLOT SUBROUTINE MVP') END Data: NO DATA NEEDED FOR THIS EXAMPLE <3-4> 1 ^PY^- ^IGE^- ^IOL^- ^IS010^G^- ^SM10^- ^IL0810^- ^ISTF00^- ^IC0000^- ^IMV0055008500^-^IMH0075011000^- ^IS010^G^- ^SM10^- ^-^PN^- 1RESULTS OF STARPAC NORMAL PSUEDO-RANDOM NUMBER GENERATION SUBROUTINES DISPLAYED WITH STARPAC PLOT SUBROUTINE MVP STARPAC 2.07S (11/07/87) -2.7388 -1.9718 -1.2049 -.4380 .3289 1.0959 1.8628 2.6297 3.3967 4.1636 4.9305 -I---------I---------I---------I---------I---------I---------I---------I---------I---------I---------I- 1.0000 I A B I 2.0000 I A B I 3.0000 I A BI 4.0000 I A B I 5.0000 I A B I 6.0000 I A B I 7.0000 I A B I 8.0000 I A B I 9.0000 IA B I 10.000 I A B I 11.000 I A B I 12.000 I A B I 13.000 I A B I 14.000 I A B I 15.000 I A B I 16.000 I A B I 17.000 I A B I 18.000 I A B I 19.000 I A B I 20.000 I A B I 21.000 I A B I 22.000 I A B I 23.000 I A B I 24.000 I A B I 25.000 I A B I 26.000 I A B I 27.000 I A B I 28.000 I A B I 29.000 I A B I 30.000 I A B I 31.000 I A B I 32.000 I A B I 33.000 I A B I 34.000 I A B I 35.000 I A B I 36.000 I A B I 37.000 I A B I 38.000 I A B I 39.000 I A B I 40.000 I A B I 41.000 I A B I 42.000 I A B I 43.000 I A B I 44.000 I A B I 45.000 I A B I 46.000 I A B I 47.000 I A B I 48.000 I A B I 49.000 I A B I 50.000 I A B I <3-5> 1 ^PY^- ^IGE^- ^IS010^G^- ^SM10^- ^IOP^- ^IS532^G^- ^SM532^- ^IL0650^- ^ISTF00^- ^IC0000^- ^IMV0100011000^-^IMH0110008500^- ^IS532^G^- ^SM532^- ^IL0650^- ^-^PN^- 1G. Acknowledgments The code used to generate the pseudo-random numbers was written by Boisvert and Kahanar of the National Bureau of Standards Scientific Computing Division. <3-6> 1 ^PY^- ^IGE^- ^IS010^G^- ^SM10^- ^IOP^- ^IS532^G^- ^SM532^- ^IL0650^- ^ISTF00^- ^IC0000^- ^IMV0100011000^-^IMH0110008500^- ^IS532^G^- ^SM532^- ^IL0650^- ^-^PN^- 1 CHAPTER 4 HISTOGRAMS A. Introduction STARPAC contains two subroutines for producing histograms. Both subroutines produce a one-page printout which includes, in addition to the histogram, a number of summary statistics (mean, median, standard deviation, cell fractions, etc.) and several tests for normality. Users are directed to section B for a brief description of the subroutines. The declaration and CALL statements are given in section C and the subroutine arguments are defined in section D. The algorithms used and output produced by these subroutines are discussed in section E. Sample programs and their output are shown in section F. B. Subroutine Descriptions HIST provides the analysis described in section A using a preset procedure for choosing the number of cells. The lower and upper bounds of the histogram are chosen from the range of the observations. HISTC provides the same analysis as HIST but allows the user to specify the number of cells and the upper and lower cell boundaries. Statistics are based only on the data within the user-supplied bounds. C. Subroutine Declaration and CALL Statements NOTE: Argument definitions and sample programs are given in sections D and F, respectively. The conventions used to present the following declaration and CALL statments are given in chapter 1, sections B and D. HIST: Compute and print a histogram and summary statistics, with automatic selection of number of cells and cell boundaries Y(n) DOUBLE PRECISION DSTAK(ldstak) COMMON /CSTAK/ DSTAK : : CALL HIST (Y, N, LDSTAK) === <4-1> 1HISTC: Compute and print a histogram and summary statistics with user control of number of cells and cell boundaries Y(n) DOUBLE PRECISION DSTAK(ldstak) COMMON /CSTAK/ DSTAK : : CALL HISTC (Y, N, NCELL, YLB, YUB, LDSTAK) === D. Dictionary of Subroutine Arguments NOTE: --> indicates that the argument is input to the subroutine and that the input value is preserved; <-- indicates that the argument is returned by the subroutine; <-> indicates that the argument is input to the subroutine and that the input value is overwritten by the subroutine; --- indicates that the argument is input to some subroutines and is returned by others; *** indicates that the argument is a subroutine name; ... indicates that the variable is passed via COMMON. DSTAK ... The DOUBLE PRECISION vector in COMMON /CSTAK/ of dimension at least LDSTAK. DSTAK provides workspace for the computations. The first LDSTAK locations of DSTAK will be overwritten during subroutine execution. IERR ... An error flag returned in COMMON /ERRCHK/. [See chapter 1, section D.5.] Note that using (or not using) the error flag will not affect the printed error messages that are automatically provided. IERR = 0 indicates that no errors were detected. IERR = 1 indicates that improper input was detected. LDSTAK --> The length of the DOUBLE PRECISION workspace vector DSTAK. LDSTAK must equal or exceed the appropriate value given below, where if the single precision version of STARPAC is being used P = 0.5, otherwise P = 1.0. [See chapter 1, section B.] For HIST: LDSTAK >= (17+N)/2 + 26*P For HISTC: LDSTAK >= (17+N)/2 + max(NCELL, 26)*P N --> The number of observations. NCELL --> The number of cells in the histogram. If NCELL <= 0 or NCELL is not an argument in the subroutine CALL statement the subroutine will choose the number of cells. Y --> The vector of dimension at least N that contains the observed data. <4-2> 1YLB --> The lower bound for constructing the histogram. The interval [YLB, YUB] is divided into NCELL increments. If YLB >= YUB, the lower and upper bounds for constructing the histogram will be the minimum and maximum observations. YUB --> The upper bound for constructing the histogram. The interval [YLB, YUB] is divided into NCELL increments. If YLB >= YUB, the lower and upper bounds for constructing the histogram will be the minimum and maximum observations. E. Computational Methods E.1 Algorithms The code and output for the histogram subroutines are modeled after early versions of MINITAB [Ryan et al., 1974]. E.2 Computed Results and Printed Output The output from the histogram family of subroutines includes a summary of the input data in addition to the actual histogram. This summary includes the following information, where the actual output headings are given by the uppercase phrases enclosed in angle braces (<...>). Results which correspond to subroutine CALL statements arguments are identified by the argument name in uppercase. In the formulas, x(1), x(2), ..., x(k) denotes the ordered observations of Y such that YLB <= Y(i) <= YUB, i = 1, ..., N, i.e., x(1) is the smallest observation of Y such that YLB <= Y(i), x(k) is the largest observation of Y such that Y(i) <= YUB, etc. The value of expressions enclosed in square brackets, e.g., [(k/2) + 1], is the largest integer less than or equal to the value of the expression. * , N * * * , YLB * , YUB * , NCELL * , k, where k = the number of observations for which YLB <= Y(i) <= YUB, i = 1, ..., N * , x(1), where x(1) = the smallest observation such that YLB <= Y(i), i = 1, ..., N * , x(k), where x(k) = the largest observation such that Y(i) <= YUB, i = 1, ..., N <4-3> 1 * , xmean, where k xmean = 1/k SUM x(i) i=1 * , xmedian, where xmedian = x([(k+1)/2]) if k is odd xmedian = 0.5*( x([k/2]) + x([(k/2)+1]) ) if k is even * <25 PCT TRIMMED MEAN>, xtrim, where k-[k/4] xtrim = 1/(k-(2*[k/4])) SUM x(i) i=1+[k/4] * , s, where k s = sqrt[ 1/(k-1) SUM (x(i)-xmean)**2 ] i=1 * , r, where k r = 1/(s*k) SUM |x(i)-xmean| i=1 * , sqrt(beta1), where k beta1 = k/((k-1)**3 * s**6) (SUM (x(i)-xmean)**3)**2 i=1 * , beta2, where, k beta2 = k/((k-1)**2 * s**4) SUM (x(i)-xmean)**4 i=1 Information provided for each cell, u = 1, ..., NCELL, of the histogram includes the following. * , cmid, where cmid = YLB + (YUB-YLB)/(2*NCELL) * , the cumulative fraction of the observations which are in cells 1 through u * <1-CUM. FRACT.>, the cumulative fraction of the observations which are in cells u through NCELL * , the fraction of the observations which are in cell u * , the actual number of observations which are in cell u. <4-4> 1 The histogram itself displays the actual number of observations in each cell when the largest number of observations per cell does not exceed 50. When the largest number of observations per cell does exceed 50 then the histogram displays the cell fraction. F. Example The example program below uses HIST to analyze the 39 measurements of the velocity of light shown on page 81 of Mandel [1964]. <4-5> 1Program: PROGRAM EXAMPL C C DEMONSTRATE HIST USING SINGLE PRECISION VERSION OF STARPAC C RUN ON CYBER 180/840. C C N.B. DECLARATION OF Y MUST BE CHANGED TO DOUBLE PRECISION IF C DOUBLE PRECISION VERSION OF STARPAC IS USED. C REAL Y(200) DOUBLE PRECISION DSTAK(200) C COMMON /CSTAK/ DSTAK C C SET UP INPUT AND OUTPUT FILES C [CHAPTER 1, SECTION D.4, DESCRIBES HOW TO CHANGE OUTPUT UNIT.] C CALL IPRINT(IPRT) OPEN (UNIT=IPRT, FILE='FILENM') OPEN (UNIT=5, FILE='DATA') C C SPECIFY NECESSARY DIMENSIONS C LDSTAK = 200 C C READ NUMBER OF OBSERVATIONS C OBSERVED DATA C READ (5,100) N READ (5,101) (Y(I), I=1,N) C C PRINT TITLE AND CALL HIST TO ANALYZE RESULTS C WRITE (IPRT,102) CALL HIST (Y, N, LDSTAK) C C FORMAT STATEMENTS C 100 FORMAT (I5) 101 FORMAT (13F5.1) 102 FORMAT ('1RESULTS OF STARPAC HISTOGRAM SUBROUTINE HIST') END Data: 39 0.4 0.6 1.0 1.0 1.0 0.5 0.6 0.7 1.0 0.6 0.2 1.9 0.2 0.4 0.0 -0.4 -0.3 0.0 -0.4 -0.3 0.1 -0.1 0.2 -0.5 0.3 -0.1 0.2 -0.2 0.8 0.5 0.6 0.8 0.7 0.7 0.2 0.5 0.7 0.8 1.1 <4-6> 1 ^PY^- ^IGE^- ^IOL^- ^IS010^G^- ^SM10^- ^IL0810^- ^ISTF00^- ^IC0000^- ^IMV0055008500^-^IMH0075011000^- ^IS010^G^- ^SM10^- ^-^PN^- 1RESULTS OF STARPAC HISTOGRAM SUBROUTINE HIST STARPAC 2.07S (11/07/87) HISTOGRAM NUMBER OF OBSERVATIONS = 39 MINIMUM OBSERVATION = -5.00000000E-01 MAXIMUM OBSERVATION = 1.90000000E+00 HISTOGRAM LOWER BOUND = -5.00000000E-01 HISTOGRAM UPPER BOUND = 1.90000000E+00 NUMBER OF CELLS = 9 OBSERVATIONS USED = 39 25 PCT TRIMMED MEAN = 4.23809524E-01 MIN. OBSERVATION USED = -5.00000000E-01 STANDARD DEVIATION = 5.06689395E-01 MAX. OBSERVATION USED = 1.90000000E+00 MEAN DEV./STD. DEV. = 7.99040249E-01 MEAN VALUE = 4.10256410E-01 SQRT(BETA ONE) = 3.10646614E-01 MEDIAN VALUE = 5.00000000E-01 BETA TWO = 3.33793260E+00 FOR A NORMAL DISTRIBUTION, THE VALUES (MEAN DEVIATION/STANDARD DEVIATION), SQRT(BETA ONE), AND BETA TWO ARE APPROXIMATELY 0.8, 0.0 AND 3.0, RESPECTIVELY. TO TEST THE NULL HYPOTHESIS OF NORMALITY, SEE TABLES OF CRITICAL VALUES PP. 207-208, BIOMETRIKA TABLES FOR STATISTICIANS, VOL. 1. SEE PP. 67-68 FOR A DISCUSSION OF THESE TESTS. INTERVAL CUM. 1-CUM. CELL NO. NUMBER OF OBSERVATIONS MID POINT FRACT. FRACT. FRACT. OBS. + 0 10 20 30 40 50 ------------------------------------------ +---------+---------+---------+---------+---------+ -3.666667E-01 .128 1.000 .128 5 +++++ -1.000000E-01 .256 .872 .128 5 +++++ 1.666667E-01 .410 .744 .154 6 ++++++ 4.333333E-01 .564 .590 .154 6 ++++++ 7.000000E-01 .846 .436 .282 11 +++++++++++ 9.666667E-01 .949 .154 .103 4 ++++ 1.233333E+00 .974 .051 .026 1 + 1.500000E+00 .974 .026 .000 0 1.766667E+00 1.000 .026 .026 1 + <4-7> 1 ^PY^- ^IGE^- ^IS010^G^- ^SM10^- ^IOP^- ^IS532^G^- ^SM532^- ^IL0650^- ^ISTF00^- ^IC0000^- ^IMV0100011000^-^IMH0110008500^- ^IS532^G^- ^SM532^- ^IL0650^- ^-^PN^- 1G. Acknowledgments The code and output for the histogram subroutines is modeled on that in early versions of MINITAB [Ryan et al., 1974]. <4-8> 1 ^PY^- ^IGE^- ^IS010^G^- ^SM10^- ^IOP^- ^IS532^G^- ^SM532^- ^IL0650^- ^ISTF00^- ^IC0000^- ^IMV0100011000^-^IMH0110008500^- ^IS532^G^- ^SM532^- ^IL0650^- ^-^PN^- 1 CHAPTER 5 STATISTICAL ANALYSIS OF A UNIVARIATE SAMPLE A. Introduction STARPAC contains 4 subroutines for performing a comprehensive statistical analysis of a univariate sample. They each compute 53 different statistics which summarize the sample through measures of location (mean, median, etc), measures of dispersion (standard deviation, mean deviation, etc.) and diagnostic features such as tests for outliers, non-normality, trends and non-randomness (assuming the input order of the data is a meaningful time sequence). Common statistics such as Student's t and confidence intervals for the mean and standard deviation are also included. NBS Technical Note 756, A User's Guide to the OMNITAB Command "STATISTICAL ANALYSIS," by H. H. Ku [1973] provides a complete discussion of the output of these subroutines, which is the same output as that provided by the OMNITAB II Command STATISTICAL [Hogben et al., 1971]. Users are directed to section B for a brief description of the subroutines. The declaration and CALL statements are given in section C and the subroutine arguments are defined in section D. The algorithms used and output produced by these subroutines are discussed in section E. Sample programs and their output are shown in section F. B. Subroutine Descriptions STAT computes and prints the 53 descriptive statistics described in section A. STATS provides the same analysis as STAT but allows the user to suppress the printed output and store the computed statistics for further use. STATW and STATWS perform a weighted analysis and are otherwise identical to STAT and STATS, respectively. C. Subroutine Declaration and CALL Statements NOTE: Argument definitions and sample programs are given in sections D and F, respectively. The conventions used to present the following declaration and CALL statments are given in chapter 1, sections B and D. STAT: Compute and print 53 statistics describing the input data Y(n) DOUBLE PRECISION DSTAK(ldstak) COMMON /CSTAK/ DSTAK : : CALL STAT (Y, N, LDSTAK) === <5-1> 1STATS: Compute and optionally print 53 statistics describing input data; return statistics Y(n), STS(53) DOUBLE PRECISION DSTAK(ldstak) COMMON /CSTAK/ DSTAK : : CALL STATS (Y, N, LDSTAK, STS, NPRT) === STATW: Compute and print 53 statistics describing weighted input data Y(n), WT(n) DOUBLE PRECISION DSTAK(ldstak) COMMON /CSTAK/ DSTAK : : CALL STATW (Y, WT, N, LDSTAK) === STATWS: Compute and optionally print 53 statistics describing weighted input data; return statistics Y(n), WT(n), STS(53) DOUBLE PRECISION DSTAK(ldstak) COMMON /CSTAK/ DSTAK : : CALL STATWS (Y, WT, N, LDSTAK, STS, NPRT) === D. Dictionary of Subroutine Arguments and COMMON Variables NOTE: --> indicates that the argument is input to the subroutine and that the input value is preserved; <-- indicates that the argument is returned by the subroutine; <-> indicates that the argument is input to the subroutine and that the input value is overwritten by the subroutine; --- indicates that the argument is input to some subroutines and is returned by others; *** indicates that the argument is a subroutine name; ... indicates that the variable is passed via COMMON. DSTAK ... The DOUBLE PRECISION vector in COMMON /CSTAK/ of dimension at least LDSTAK. DSTAK provides workspace for the computations. The first LDSTAK locations of DSTAK will be overwritten during subroutine execution. <5-2> 1IERR ... An error flag returned in COMMON /ERRCHK/. [See chapter 1, section D.5.] Note that using (or not using) the error flag will not affect the printed error messages that are automatically provided. IERR = 0 indicates that no errors were detected. IERR = 1 indicates that improper input was detected. LDSTAK --> The length of the DOUBLE PRECISION workspace vector DSTAK. LDSTAK must equal or exceed (N/2) + 7. N --> The number of observations. NPRT --> The argument controlling printed output. If NPRT = 0 the printed output is suppressed. If NPRT <> 0 the printed output is provided. STS --> The vector of dimension at least 53 that contains the computed statistics. The contents of STS are listed below, along with applicable references; the number in parenthesis is the location within STS that the given statistic is stored. In the formulas, x denotes the ordered observations of Y for which the weight is nonzero, i.e., x(1) is the smallest observation of Y with a nonzero weight, x(k) is the largest observation of Y with a nonzero weight, etc. The weight associated with x(i) is denoted w(i). Zero weighted observations are not included in the analysis. The value of expressions enclosed in square brackets, e.g., [(k/2) + 1], is the largest integer less than or equal to the value of the expression. (1) NUMBER OF OBSERVATIONS, N (2) NUMBER OF NONZERO WEIGHTS, k (3) UNWEIGHTED MEAN [Dixon and Massey, 1957, p. 14], k xmean = 1/k SUM x(i) i=1 (4) WEIGHTED MEAN [Brownlee, 1965, pp. 95-97], k SUM w(i)*x(i) i=1 xwtdmean = --------------- k SUM w(i) i=1 (5) MEDIAN [Dixon and Massey, 1957, p. 70], xmedian = x([(k+1)/2]) if k is odd xmedian = 0.5*( x([k/2]) + x([(k/2)+1]) ) if k is even <5-3> 1 (6) MID-RANGE [Dixon and Massey, 1957, p. 71], xmid = 0.5*(x(1) + x(k)) (7) 25 PCT UNWTD TRIMMED MEAN [Crow and Siddiqui, 1967], k-[k/4] xtrim = 1/[k-2*[k/4]] SUM x(i) i=1+[k/4] (8) 25 PCT WTD TRIMMED MEAN, k-[k/4] SUM w(i)*x(i) i=1+[k/4] xwtdtrim = ------------------- k-[k/4] SUM w(i) i=1+[k/4] (9) WEIGHTED STANDARD DEVIATION [Snedecor and Cochran, 1967, p. 44], k s = sqrt( 1/(k-1) SUM w(i)*(x(i)-xwtdmean)**2 ) i=1 (10) WTD S.D. OF MEAN [Brownlee, 1965, p. 80], k smean = s / sqrt( SUM w(i) ) i=1 (11) RANGE [Snedecor and Cochran, 1967, p. 39], xrange = x(k) - x(1) (12) MEAN DEVIATION [Duncan, 1965, p. 50], k xmeandev = 1/k SUM |x(i)-xwtdmean| i=1 (13) VARIANCE [Snedecor and Cochran, 1967, p. 44], k s2 = 1/(k-1) SUM w(i)*(x(i)-xwtdmean)**2 i=1 (14) COEF. OF VAR. (PERCENT) [Snedecor and Cochran, 1967, p. 62], cvar = |100*s/xwtdmean| <5-4> 1 (15) LOWER CONFIDENCE LIMIT, MEAN [Natrella, 1966, pp. 2-2, 2-3], xmean - t(0.025)*smean where t(0.025) is the appropriate t-statistic with (k-1) degrees of freedom (16) UPPER CONFIDENCE LIMIT, MEAN [Natrella, 1966, pp. 2-2, 2-3], xmean + t(0.025)*smean where t(0.025) is the appropriate t-statistic with (k-1) degrees of freedom (17) LOWER CONFIDENCE LIMIT, S.D. [Natrella, 1966, p. 2-7], s*sqrt((k-1)/chi2(0.975)) where chi2(0.975) is the appropriate chi-square statistic with (k-1) degrees of freedom (18) UPPER CONFIDENCE LIMIT, S.D. [Natrella, 1966, p. 2-7], s*sqrt((k-1)/chi2(0.025)) where chi2(0.025) is the appropriate chi-square statistic with (k-1) degrees of freedom (19) SLOPE [Fisher, 1950, p. 136], k B = 12/(k*(k**2-1)) SUM i*(x(i) - xwtdmean) i=1 (20) S.D. OF SLOPE, k sqrt( -B**2*k*(k**2-1) + 12 SUM (x(i)-xwtdmean)**2 ) i=1 sB = ---------------------------------------------------- sqrt( k*(k**2-1)*(k-2) ) (21) SLOPE/S.D. OF SLOPE = T, t0 = B / sB with (k-2) degrees of freedom (22) PROB EXCEEDING ABS VALUE OF OBS T [Brownlee, 1965, p. 344], Prob (t < -|t0| and t > +|t0| ) (23) NO. OF RUNS UP AND DOWN [Brownlee, 1965, p. 223], r (24) EXPECTED NO. OF RUNS [Bradley, 1965, p. 279], E(r) = (2k-1)/3 <5-5> 1 (25) S.D. OF NO. OF RUNS [Bradley, 1965, p. 279], rsd = sqrt( [(16*k-29)/90] ) (26) MEAN SQ SUCCESSIVE DIFF [Brownlee, 1965, p. 222], k-1 D2 = 1/(k-1) SUM (x(i+1)-x(i))**2 i=1 (27) MEAN SQ SUCC DIFF/VAR [Brownlee, 1965, p. 222], D2/s2 (28) NO. OF + SIGNS, u = number of times sign of (x(i)-xwtdmean) is positive (29) NO. OF - SIGNS, v = number of times sign of (x(i)-xwtdmean) is negative (30) NO. OF RUNS [Brownlee, 1965, p. 224], RUNS = 1 + number of changes in sign of (x(i)-xwtdmean) (31) EXPECTED NO. OF RUNS [Brownlee, 1965, p. 227], E(RUNS) = 1 + (2*u*v)/k (32) S.D. OF RUNS [Brownlee, 1965, p. 230], RUNSsd = sqrt( 2*u*v*(2*u*v-u-v)/((u+v)**2*(k-1)) ) (33) DIFF./S.D. OF RUNS [Brownlee, 1965, p. 230], [RUNS - E(RUNS)] / RUNSsd (34) MINIMUM [Natrella, 1966, p. 19-1], x(1) = smallest value with nonzero weight (35) MAXIMUM [Natrella, 1966, p. 19-3], x(k) = largest value with nonzero weight (36) BETA ONE [Snedecor and Cochran, 1967, p. 86], k beta1 = k/((k-1)**3*s**6) * ( SUM (x(i)-xwtdmean)**3 )**2 i=1 (37) BETA TWO [Snedecor and Cochran, 1967, p. 87], k beta2 = k/((k-1)**2*s**4) SUM (x(i)-xwtdmean)**4 i=1 <5-6> 1 (38) WTD SUM OF VALUES, k SUM w(i)*x(i) i=1 (39) WTD SUM OF SQUARES, k SUM w(i)*x(i)**2 i=1 (40) WTD SUM OF DEVS SQUARED, k SUM w(i)*(x(i)-xwtdmean)**2 i=1 (41) STUDENT'S T [Brownlee, 1965, p. 296], k t = sqrt( SUM w(i) )*xwtdmean/s with (k-1) degrees of freedom i=1 (42) WTD SUM ABSOLUTE VALUES, k SUM w(i)*|xi| i=1 (43) WTD AVE ABSOLUTE VALUES, k SUM w(i)*|xi| i=1 ------------- k SUM w(i) i=1 (44-53) FREQUENCY DISTRIBUTION [Freund and Williams, 1958, p. 17]. WT <-- The vector of dimension at least N that contains the weights. A zero weight excludes the corresponding observation from the analysis. If the weights are all equal to 1.0, the resulting analysis is equivalent to an unweighted analysis. Y --> The vector of dimension at least N that contains the observations. The tests for trend and randomness will not be meaningful unless the sample is ordered with respect to time or some other relevant variable. <5-7> 1E. Computational Methods E.1. Algorithms Formulas for the computed statistics are given in section D under argument STS. The code for the statistical analysis subroutines is adapted from OMNITAB II [Hogben et al., 1971]. E.2. Computed Results and Printed Output The output consists of a one-page display of the 53 statistics listed for argument STS in section D. The argument, NPRT, controlling the printed output is discussed in section D. F. Example The example program below uses STAT to analyze the 39 measurements of the velocity of light shown on page 81 of Mandel [1964]. <5-8> 1Program: PROGRAM EXAMPL C C DEMONSTRATE STAT USING SINGLE PRECISION VERSION OF STARPAC C RUN ON CYBER 180/840. C C N.B. DECLARATION OF Y MUST BE CHANGED TO DOUBLE PRECISION IF C DOUBLE PRECISION VERSION OF STARPAC IS USED. C REAL Y(200) DOUBLE PRECISION DSTAK(200) C COMMON /CSTAK/ DSTAK C C SET UP INPUT AND OUTPUT FILES C [CHAPTER 1, SECTION D.4, DESCRIBES HOW TO CHANGE OUTPUT UNIT.] C CALL IPRINT(IPRT) OPEN (UNIT=IPRT, FILE='FILENM') OPEN (UNIT=5, FILE='DATA') C C SPECIFY NECESSARY DIMENSIONS C LDSTAK = 200 C C READ NUMBER OF OBSERVATIONS C OBSERVED DATA C READ (5,100) N READ (5,101) (Y(I), I=1,N) C C PRINT TITLE AND CALL STAT TO PERFORM STATISTICAL ANALYSIS C WRITE (IPRT,102) CALL STAT (Y, N, LDSTAK) C C FORMAT STATEMENTS C 100 FORMAT (I5) 101 FORMAT (13F5.1) 102 FORMAT ('1RESULTS OF STARPAC STATISTICAL ANALYSIS', * ' SUBROUTINE STAT') END Data: 39 0.4 0.6 1.0 1.0 1.0 0.5 0.6 0.7 1.0 0.6 0.2 1.9 0.2 0.4 0.0 -0.4 -0.3 0.0 -0.4 -0.3 0.1 -0.1 0.2 -0.5 0.3 -0.1 0.2 -0.2 0.8 0.5 0.6 0.8 0.7 0.7 0.2 0.5 0.7 0.8 1.1 <5-9> 1 ^PY^- ^IGE^- ^IOL^- ^IS010^G^- ^SM10^- ^IL0810^- ^ISTF00^- ^IC0000^- ^IMV0055008500^-^IMH0075011000^- ^IS010^G^- ^SM10^- ^-^PN^- 1RESULTS OF STARPAC STATISTICAL ANALYSIS SUBROUTINE STAT STARPAC 2.07S (11/07/87) +STATISTICAL ANALYSIS N = 39 FREQUENCY DISTRIBUTION (1-6) 5 3 8 3 7 7 5 0 0 1 MEASURES OF LOCATION (2-2) MEASURES OF DISPERSION (2-6) UNWEIGHTED MEAN = 4.1025641E-01 WTD STANDARD DEVIATION = 5.0668940E-01 WEIGHTED MEAN = 4.1025641E-01 WEIGHTED S.D. OF MEAN = 8.1135237E-02 MEDIAN = 5.0000000E-01 RANGE = 2.4000000E+00 MID-RANGE = 7.0000000E-01 MEAN DEVIATION = 4.0486522E-01 25 PCT UNWTD TRIMMED MEAN= 4.2380952E-01 VARIANCE = 2.5673414E-01 25 PCT WTD TRIMMED MEAN = 4.2380952E-01 COEF. OF. VAR. (PERCENT) = 1.2350554E+02 A TWO-SIDED 95 PCT CONFIDENCE INTERVAL FOR MEAN IS 2.4600615E-01 TO 5.7450667E-01 (2-2) A TWO-SIDED 95 PCT CONFIDENCE INTERVAL FOR S.D. IS 4.1408928E-01 TO 6.5301023E-01 (2-7) LINEAR TREND STATISTICS (5-1) OTHER STATISTICS SLOPE = -4.0080972E-03 MINIMUM = -5.0000000E-01 S.D. OF SLOPE = 7.2760495E-03 MAXIMUM = 1.9000000E+00 SLOPE/S.D. OF SLOPE = T = -5.5086172E-01 BETA ONE = 9.6501319E-02 PROB EXCEEDING ABS VALUE OF OBS T = .585 BETA TWO = 3.3379326E+00 WTD SUM OF VALUES = 1.6000000E+01 WTD SUM OF SQUARES = 1.6320000E+01 TESTS FOR NON-RANDOMNESS WTD SUM OF DEV SQUARED = 9.7558974E+00 STUDENTS T = 5.0564517E+00 NO. OF RUNS UP AND DOWN = 23 WTD SUM ABSOLUTE VALUES = 2.0600000E+01 EXPECTED NO. OF RUNS = 25.7 WTD AVE ABSOLUTE VALUES = 5.2820513E-01 S.D. OF NO. OF RUNS = 2.57 MEAN SQ SUCCESSIVE DIFF = 2.8289474E-01 MEAN SQ SUCC DIFF/VAR = 1.102 DEVIATIONS FROM WTD MEAN NO. OF + SIGNS = 20 NO. OF - SIGNS = 19 NO. OF RUNS = 8 EXPECTED NO. OF RUNS= 20.5 S.D. OF RUNS = 3.08 DIFF./S.D. OF RUNS = -4.056 NOTE - ITEMS IN PARENTHESES REFER TO PAGE NUMBER IN NBS HANDBOOK 91 (NATRELLA, 1966) <5-10> 1 ^PY^- ^IGE^- ^IS010^G^- ^SM10^- ^IOP^- ^IS532^G^- ^SM532^- ^IL0650^- ^ISTF00^- ^IC0000^- ^IMV0100011000^-^IMH0110008500^- ^IS532^G^- ^SM532^- ^IL0650^- ^-^PN^- 1G. Acknowledgments The code and output for the statistical analysis subroutines is adapted from OMNITAB II [Hogben et al., 1971].