
          RAPDistance Programs;   Version 1.04
     for the Analysis of Patterns of RAPD Fragments.

PURPOSE of the PROGRAMS

     The relatedness of DNA samples may be assessed by comparing RAPD
(Randomly Amplified Polymorphic DNA) or RFLP (Restriction Fragment Length
Polymorphism) fragments of DNA: fragments are obtained from each sample and
separated according to their sizes, and the presence/absence of shared
fragments is used to estimate the relatedness of the DNA samples.  The
RAPDistance programs are designed to help record and analyse the
fragment data; the program prompts are written for RAPD data, but the
programs can be used, with care, for RFLP data.  There is a program for
encoding the primary data (presence/absence of bands), and others for
editing the resulting file.  Alternatively, the data can be recorded
using a spreadsheet program such as Excel and transformed into the
format used by the RAPDistance programs, or vice versa.

     The primary data may then be used to calculate pairwise distances
between the samples using any of several metrics, and the distances
are stored in files formatted for:

     - the NJTREE or NTSYS tree-building programs;

     - the matrix-comparing program DIPLOMO;

     - the statistical analysis program WINAMOVA;

     - the phylogenetic analysis programs PAUP and PHYLIP;

     - the multivariate analysis package NTSYS.

     There are also programs for PTP (Permutation Tail Probability)
assessment of the significance of trees generated by the
neighbour-joining method, others for determining which bands best
correlate with differences established by the tree-building or
multivariate analyses, and others for assessing which samples produced
RAPD bands that correlate worst with a provided distance matrix.  To
support these analyses there is a program, 3DIST, which calculates the
pairwise distances between the taxa (the samples) defined by any
combination of 1, 2 or 3 vectors (e.g. the map co-ordinates of the sites
where the sample organisms were collected, or the results of a principal
co-ordinates analysis), so that these can be compared, using DIPLOMO,
with distances obtained from RAPD/RFLP data.

     The programs are written in ANSI C and were developed using
Borland C++ 3.0.  The aim has been to allow the package to run on the
widest possible range of IBM-compatible personal computers; neither
Windows nor a numerical co-processor is required.

     The package uses batch files to provide a Main menu and four
sub-menus.  There are specific HELP files for each group of programs,
and these are accessed through the HELP-MENU; in addition, HELP.TXT is an
ASCII file containing all the HELP files, so that a text editor can be
used to search for particular words.  Also included in the package is
the file RAPDTXT.DOC (a Microsoft Word document) containing the complete
set of Help/Notes in manual form.  It is recommended that you print this
file from Microsoft Word.

     At present we distribute the package only as executable files; we
will provide versions with other sample/band number combinations on
request.

     The package is freeware and can be freely distributed. There is no
registration fee. However, users of the package are asked to contact the
authors by Email so that they can be placed on the distribution list for
advice on updates.

     Users of Macintosh computers who would like to use the package
should read the file MAC.DOC.


The authors are:
  John Armstrong(1), Adrian Gibbs (2), Rod Peakall (3), Georg Weiller (4)
    of the Australian National University, Canberra, Australia.

(1) Research School of Biological Sciences, Institute of
     Advanced Studies, A.N.U., P.O. Box 475, A.C.T.,2601.
      ph. 61 (0)6 249 2490; fax. 61 (0)6 249 4437;
          email: JohnA@rsbs.anu.edu.au
(2) address as for (1).  ph. 61 (0)6 249 4211; fax. 61
     (0)6 249 4437;
          email: Gibbs@rsbs.anu.edu.au
(3) Department of Botany and Zoology, A.N.U., P.O. Box 4,
     A.C.T., 0200.
      ph. 61 (0)6 249 0022; fax. 61 (0)6 249 5573;
              email: Rod.Peakall@anu.edu.au
(4) address as for (1).  ph. 61 (0)6 249; fax. 61 (0)6 249 4437;
         email: Weiller@rsbs.anu.edu.au
Contact:
     (1) for programming of the RAPDistance programs;
     (3) for RAPD applications and WINAMOVA analysis of RAPD data;
     (4) & (2) for comparative data analysis, especially using DIPLOMO.



BAND DATA

     The RAPD technique produces a mixture of DNA fragments from each
DNA sample being compared; in the RAPD method, a short synthetic DNA
molecule (the primer) binds to the sequence at both ends of each fragment
and primes PCR replication of the intervening sequence.  The fragments are
separated according to their sizes by electrophoresis through a gel,
resulting in a characteristic pattern of bands from each sample.  When
the fragment mixtures produced from two or more samples are compared,
the fragments forming bands at the same position in the gel are assumed
to be homologous.  The similarity of the DNA samples is then computed
from the presence/absence of shared bands.

     The first stage in data analysis is to inspect the band patterns
produced by each primer with all the DNA samples being compared, and to
determine the total number of distinct bands produced by that primer
(the `Total number of bands').  Then the presence/absence of every band
in each pattern is recorded.  Initially this is best recorded on paper as
a matrix with each sample (track) forming one column and each band one
row.  These HELP files use this convention, although if you examine a
RAPDistance datafile directly you will find that the data are actually
stored with each sample forming one row.



STARTING THE RAPDistance PROGRAMS

     The programs are installed (see file README.DOC) in an appropriate
sub-directory (usually \RAPD).  The datafiles they produce can be kept in
the same sub-directory as the RAPDistance programs, but it is perhaps
better to keep them in another (such as \RAPD\DATA).  In that case the
data directory MUST contain a copy of the FILENAME.DAT file, and ';C:\RAPD'
must be added to the PATH statement of the AUTOEXEC.BAT file.

     The programs are started by the command:

     C:\RAPD>RAPD<Enter>

The program responds with a welcoming message and a disclaimer, and also
states the maximum number of samples and bands this version of the
program can handle.  Another <Enter> produces the Main Menu of options.

     There are 4 sub-menus.

M1 - for the RAPDistance Input/Editing programs

M2 - for Distance Calculation Programs

M3 - for Analysis Programs

M4 - for Help documentation

Each submenu finishes with the options:

MM   Return to MAIN-MENU
 Q   QUIT.
    Which option?  Type number/letter(s) and Enter:

                    ---------------------------

     When you start, familiarise yourself with the package using the
RAPD test datafile EXAMPLE.DAT, which is included.

Note that most options in most programs have a default value, which is
chosen by pressing <Enter>.


DATA STORAGE/LISTING/STATISTICS.

Option 1.  This program (RAPDIN.C) stores data in a RAPDistance
        datafile.

     The program will ask you for:

     - a name for the datafile that is to hold the data;

     - the number of samples;

     - a name for each sample;

     - the number of 'populations' into which the samples are to be
       grouped if they are to be statistically analysed (n.b. the
       samples for each population must be recorded consecutively in
       a single group);

     - the number of samples in each 'population';

     - the number of primers that were used to generate the data (n.b.
       each primer will give one set of band patterns, and these must be
       recorded as a block of rows in the matrix);

     - the name of each primer;

     - the length of the primer (number of nucleotides), which is
       required for calculating 'genetic distances' by metric 13;

     - the 'total number of bands' generated by each primer.

     The responses to these questions define the arrangement of the
     data in the file, and the program then prompts you to record whether
     each of the defined bands is present, '1', or absent, '0'.  There is no
     provision for 'missing data', such as the absence of data for some
     primer/sample combinations.  The name of the RAPDistance datafile
     made by this program is also held in the file FILENAME.DAT, which
     holds the default name of the current datafile; therefore if, for some
     reason, you split these programs/files between directories or computers,
     be sure to move this file as well as the datafile itself.

Option 2.  This program (RAPDLIST.C) makes an ASCII file of the stored data
     for printing.  This program has been modified to give warnings if the
     limits for the number of bands, etc., have been exceeded.  In
     particular, watch the number of populations when you are combining
     datafiles or deleting samples, and use the editing options supplied to
     tidy up the new population groupings.

Option 3. This program (RAPDSTAT.C) gives some statistics relating to the
     data in a datafile.

     This program assesses and prints the total number of samples, primers
     and bands.

     So that replicates can be removed before statistical analysis,
     the program checks for:

     - duplicate samples, i.e. those that have the same band pattern;

     - bands that are invariant, or have 'singleton' differences (one sample
     differs from all the others) or 'parsimony' differences (i.e. at least
     two samples share a state that differs from the others);

     - duplicate bands, i.e. those that have the same pattern across samples.


EDITING.

Option 4.  This program (RAPDED1.C) edits data elements of a datafile
and/or adds data for extra DNA samples.

     It allows data elements (defined by sample and band; numbered as
recorded, unless the order has been changed) to be edited and/or the
data for extra DNA samples to be added to the datafile.  Each extra
sample is stored as a new population, therefore it may be necessary to
use Option 9 subsequently to change the order of samples, so that all
members of each population are adjacent in the datafile, and then to
redefine the populations using Option 10.

     In this and Options #5 and #6, a copy (*.OLD) of the file being
edited is saved.

Option 5.  This program (RAPDED2.C) deletes selected DNA samples from a
datafile.

     It allows a subset of the sample data in a datafile to be selected,
either by deleting those samples you don't want to retain, or by
selecting those you wish to retain.  The samples are identified by
number (i.e. the order in which they were recorded, if this has not been
changed by Option 9), not by name.  If you are uncertain of the order
then use Option 2 and check it; the samples in the matrix are numbered
left to right.  Give the numbers of the chosen samples in any order, but
ONLY ONCE, then enter -99.

Option 6.  This program (RAPDED3.C) deletes selected bands from a datafile.

     This program is analogous to that for selecting a subset of the
sample data (Option 5).

Option 7.  This program (RAPDED4.C) combines datafiles - same samples
more bands.

     This program combines datafiles, but only checks that those being
combined are compatible in that they have the same number of samples,
and the primers are of the same length.  It does not compare the names
of the samples, and the combined data file uses the names given in the
first datafile.

Option 8.  This program (RAPDED5.C) combines datafiles - same bands more
samples.

N.B. This option will inevitably increase the number of populations.  Make
sure this is not a problem by using option #2 on the datafiles BEFORE
combining them.  If necessary use Option #10 to redefine the populations.

     This program combines datafiles, but only checks that those being
combined are compatible in that they have the same number of bands and
primers.  Again it uses the names given in the first datafile.

Option 9.  This program (RAPDED6.C) re-arranges the order of DNA samples
in a datafile.

     This program moves the data for individual samples in the datafile.
The samples are numbered in the order in which they were stored, unless
this has been changed by subsequent editing.  The program moves the
samples, one at a time, and renumbers the others.  So if, for example,
you move sample 10 to become sample 4, then the original sample 4
becomes sample 5, 5 becomes 6, etc, etc.  It is therefore essential to
plan the changes logically; it is usually best to move those with
smaller numbers before those with larger; if you are uncertain of the
order of the samples, check them using Option 2.  If, for example, you
wish to change the order among 8 samples, so that samples 3, 4 and 7
become the 1st, 2nd and 3rd in the file, then move 3 to 1 so that:

                   samples 1 2 3 4 5 6 7 8
                    become 3 1 2 4 5 6 7 8

now move 4 to 2 so that the order becomes:

                           3 4 1 2 5 6 7 8

finally move 7 to 3 and the final order is:
                           3 4 7 1 2 5 6 8
and then finish the run with -99.

Option 10.  This program (RAPDED7.C) re-defines the groupings of samples
in a datafile.

     This enables you to redefine the groupings of samples in a datafile
for subsequent statistical analysis.  First use Option 9 to reorder
those that you intend to group into the same population so that they are
adjacent to one another in the matrix.  Then use this program to
redefine the populations; the program only checks that the total number
of samples redefined does not exceed the number in the datafile.

Option 11.  This program (RAPDED8.C) performs the same function as
Option 10, but for primers (i.e. groupings of bands).


CONVERTING FILE FORMATS.

Option 12.  This program (RAPDPAUP.C) produces a datafile in PAUP format
from a RAPDistance datafile.

Option 13.  This program (RAPD2SS.C) produces a Spreadsheet format file
from a RAPDistance datafile.

Option 14.  This program (SS2RAPD.C) produces a RAPDistance datafile
from a Spreadsheet format file.

For Option 14 to work properly the spreadsheet MUST BE CORRECTLY FORMATTED.
The spreadsheet file must be saved as TSV (Tab-Separated Values) text.
You must do all the necessary editing within the spreadsheet: if you
edit the spreadsheet output file with another editor, the TABS may be
expanded into spaces (up to 8 of them), whereas SS2RAPD expects exactly
one tab between adjacent data items.  Always check the output from this
program using Option 2 (Help MENU H1).

     All three programs require the name of the output file to be given
or a default accepted.  Details of the formats are given in the Help
MENU option W.

     M2 MENU covers options 21 and 22 below.

CALCULATING PAIRWISE DISTANCES.

Option 21.  This program (RAPDALG.C) calculates pairwise distances
between the DNA samples from the band data in a RAPDistance datafile,
using any of 18 metrics, and provides the results as triangular matrices
in six files:

     - *.NJT has the format required by some public versions of the
tree-building NJTREE and UPGMA programs;

     - *.DIP has that required by the DIPLOMO matrix comparison program;

     - *.DIM is also for use with DIPLOMO (see Help Section W);

     - *.PHY has that required by the phylogenetic inference package PHYLIP;

     - *.NTS has that required by the numerical taxonomy package NTSYS-pc;

     - *.DIS has that required by the statistical package WINAMOVA.

     The algorithms calculate the similarity of pairs of samples, each
of which is characterized by the presence (1) or absence (0) of bands.
For example, two samples x and y, each having 10 possible band
positions, might provide the following data:

     Band Position        x       y
          1               1       0
          2               1       1
          3               1       0
          4               0       0
          5               1       1
          6               0       1
          7               0       0
          8               1       0
          9               1       1
         10               0       1

This data pattern can be described numerically as:

     n   = the number of band positions                 = 10
     nx  = the number of bands present in x             =  6
     ny  = the number of bands present in y             =  5
     n11 = the number of positions where x=1 AND y=1    =  3
     n00 = the number of positions where x=0 AND y=0    =  2
     n01 = the number of positions where x=0 AND y=1    =  2
     n10 = the number of positions where x=1 AND y=0    =  3

and, of course, nx = n11 + n10 = 6, and ny = n11 + n01 = 5.

     Similarities can be calculated from these data in various ways; at
present 18 of them are encoded in Option 21.  Metrics 1 to 11 are
described in the NTSYS-pc Manual (pp. 7-14) (see Option (I)) and are in
the same order as in those programs:

1)  2*n11/((2*n11)+n01+n10) - Dice.

Czekanowski, J. (1913).  Zarys metod statystycznych w zastosowaniu do
     antropologii.  Travaux de la Societe des Sciences de Varsovie III.
     Classes des sciences mathematiques et naturelles. no.5.

Dice, L.R. (1945).  Measures of the amount of ecologic association
     between species. Ecology 26: 297-302.

Nei, M. and Li, W.H. (1979).  Mathematical model for studying genetic
     variation in terms of restriction endonucleases. Proc. Natl. Acad.
     Sci. USA 76: 5269-5273.

Sorensen, T. (1948).  A method of establishing groups of equal amplitude
     in plant sociology based on the similarity of species content and
     its application to analyses of the vegetation on Danish commons.
     K. Dan. Vidensk. Selsk. Biol. Skr. (Copenhagen) 5:1-34.

2)  n11/(n-n00) - Jaccard.

Jaccard, P. (1901).  Etude comparative de la distribution florale dans
     une portion des Alpes et du Jura. Bull. Soc. Vaudoise Sci. Nat.
     37: 547-579;  (1908). Nouvelles recherches sur la distribution
     florale. Bull. Soc. Vaud. Sci. Nat. 44: 223-270.

3)  n11/(n01+n10) - Kulczynski 1.

Kulczynski, S. (1927). Die Pflanzenassoziationen der Pieninen.  Bull.
     Intern. Acad. Pol. Sci. Lett. Cl. Sci. Math. Nat., B(Sci. Nat.),
     1927 (Suppl.2): 57-203.

4)  0.5*((n11/(n11+n01)) + (n11/(n11+n10))) - Kulczynski 2.

Kulczynski, S. ibid.


5)  ((n11*n00)-(n01*n10))/sqrt((n11+n01)*(n01+n00)*(n11+n10)*(n10+n00))
               - the Phi coefficient, or Pearson's Phi coefficient.

Sokal, R.R. and Sneath, P.H.A. (1963).  Principles of Numerical
     Taxonomy. Freeman. p. 134.

6)  n11/n - Russell and Rao.

Russell, P.F. and Rao, T.R. (1940). On habitat and association of
     species of anopheline larvae in south-eastern Madras.  J. Malar.
     Inst. India 3: 153-178.

7)  n11/(n11+2*(n10+n01)) - Sokal and Sneath 1, or Anderberg.

Sokal, R.R. and Sneath, P.H.A. (1963). Principles of Numerical Taxonomy.
     Freeman.  p.128 et seq.

8)  0.25*((n11/(n11+n10))+(n11/(n11+n01))+(n00/(n00+n10))+
               (n00/(n00+n01))) - Sokal and Sneath 2.

Sokal, R.R. and Sneath, P.H.A. ibid.

9)  n11/sqrt((n11+n10)*(n11+n01)) - Ochiai

Ochiai, A. (1957). Zoogeographic studies on the soleoid fishes found in
     Japan and its neighbouring regions. Bull. Jap. Soc. Sci. Fish 22:
     526-530.

10)  n11*n00/sqrt((n11+n10)*(n11+n01)*(n00+n10)*(n00+n01))
               - Sokal and Sneath 3.

Sokal, R.R. and Sneath, P.H.A. ibid.

11)  ((n11*n00)-(n10*n01))/((n11*n00)+(n10*n01)) - Yule and Kendall.

Yule, G.U. and Kendall, M.G. (1950). An Introduction to the theory of
     Statistics.  14th edition. Hafner.

12)  (0.5*(sqrt((F*F)+(8*F))-F))**(1/r), where F = 2*n11/(nx+ny) and
               r is the primer length (number of nucleotides) - Upholt.

Upholt, W.B. (1977). Estimation of DNA sequence divergence from
     comparison of restriction endonuclease digests.  Nucleic Acids Res.
     4: 1257-1265.

13)  'Evolutionary distance estimate' (K)

Li, W.-H. and Graur, D. (1991). 'Fundamentals of Molecular Evolution'
     Sinauer. pp 61-3.

14)  (n11+n00)/n - Simple Matching (or Apostol)

Apostol, B.L. et al. (1993). Estimation of the number of full sibling
     families at an oviposition site using RAPD-PCR markers:
     applications to the mosquito Aedes aegypti.  Theor. Appl. Genet.
     86: 991-1000.

15)  n*(1-(n11/n)) - Excoffier.

Excoffier, L., Smouse, P.E., and Quattro, J.M. (1992). Analysis of
     molecular variance inferred from metric distances among DNA
     haplotypes: application to human mitochondrial DNA restriction
     data.  Genetics 131: 479-491.

16)  (n11+n00)/(n11+2*(n10+n01)+n00) - Rogers and Tanimoto.

Rogers, D.J. and Tanimoto, T.T. (1960).  A computer program for
     classifying plants.  Science 132: 1115-1118.

17)  (n11+n00)/(n11+0.5*(n10+n01)+n00) - Sneath and Sokal.

Sokal, R.R. and Sneath, P.H.A. ibid.

18)  (n11-(n10+n01)+n00)/n - Hamann.

Spath, H. (1980).  Cluster Analysis Algorithms.  Trans. Ursula Bull.
     Ellis Horwood (Halsted/Wiley), Chichester, England.

     Gower (1985), Jackson et al. (1989) and Skroch et al. (1992) have
discussed the relationships between some of these similarity measures
(Gower, J.C. 1985.  Measures of similarity, dissimilarity and distance.
In Kotz, S. and Johnson, N.L. (eds). 'Encyclopedia of Statistical
Sciences', Wiley, New York. Vol 5. pp 397-405;  Jackson, D.A., Somers,
K.M. and Harvey, H.H. 1989.  Similarity coefficients: measures of
co-occurrence and association or simply measures of occurrence? American
Naturalist 133: 436-453;  Skroch, P., Tivang, J. and Nienhuis, J. 1992.
Analysis of genetic relationships using RAPD marker data.  'Applications
of RAPD Technology to Plant Breeding'. Joint Plant Breeding Symp. Series
Nov 1992, pp 26-30).

     Most of the metrics give similarity values in the range 0 to +1, and
the programs convert these to distances as (1-s) or SQRT(1-s).  Metrics
5, 11 and 18 give values in the range -1 to +1, and are converted to the
range 0 to +1 by the transformation (s+1)/2.  Metric 3 gives similarity
values in the range 0 to infinity, but does not give a sensible estimate
when more than 50% of the bands are shared; the converse is true for
Metric 13.  Metric 11 may give problems with some data: if the program
stops abnormally, reporting an arithmetic error (e.g. Domain or Range),
use M1-menu Option 3 (RAPDSTAT.C) to check your data, as there may be
duplicate samples.

     N.B.  Metrics 12 and 13 are the only ones to require the primer
length data.  Currently we have not resolved how to deal with different
primer lengths (which are allowed in the datafiles), and all
calculations are made using the primer length of the first group.

     The choice between the complement of the similarity estimate and the
square root of its complement as the distance measure is provided
because metrics 1, 4, 5, 8, 9, 10, 11, 17 and 18 are non-metric, and for
them the latter conversion may be more appropriate.

     Gower (1985) points out that some of the metrics are related by
simple monotonic functions; the distances they produce are linearly or
curvilinearly related.  This can be confirmed by using DIPLOMO to
compare distances calculated by different metrics.  Thus metrics 1, 2,
3, 4, 7, 9, 12 and 13 form one group, and 5, 8 and 10 form another that
is closely similar to the group of 14, 15, 16, 17 and 18.  Metric 11 gives
distances that are most closely, but curvilinearly, related to those of
the 5/8/10 group, and metric 6 produces distances that are poorly
related to those produced by any of the other metrics.  Incidentally,
PAUP character distances are identical to those produced by metrics
14/15/16/17/18.

     We have also compared, by DIPLOMO analysis, the pairwise distances
calculated from one data set using different metrics.  This was a RAPD
analysis of DNA samples of 20 individuals of a single plant species that
yielded a total of 100 bands with 5 primers (Vidya Jagadish, pers.
comm.).  DIPLOMO analysis was used to compare the pairwise plant DNA
RAPD distances with the geographical distances between the sites where
the plants were collected.  The largest correlations (0.285-0.275, d.f.
189) were obtained with the 5/8/10 metrics, metrics 14/15 were almost as
large (both 0.273), all the others gave significantly poorer
correlations; metric group 1/2/4 etc 0.218-0.232, metric 11 0.229, and
metric 6 - 0.006!  We suspect, however, that different correlations may
be obtained with different data sets; and advise that, if possible, each
data set be tested with a representative set of metrics against an
independent matrix of relationships.


Option 22.  This program (3DIST.C) produces a distance matrix from 1-, 2-
or 3-dimensional data.

The program converts numerical characteristics of the samples, for up to
3 vectors, into a pairwise distance matrix.  The numerical characteristics
could be, for example, the samples' positions in 1, 2 or 3 dimensions of an
ordination (Multidimensional Scaling or Principal Co-ordinates) or their
geographical positions as defined by the two coordinates of latitude and
longitude.  The input file containing the vector data is named 3DIST.DAT,
with each record (line) containing 1, 2 or 3 values as appropriate; the
values are separated by SPACE or TAB characters.  The output files are
3DIST.DIP, 3DIST.DIM and 3DIST.NJT, which can be used with the RAPDCORL
program (see section K), the DIPLOMO package (see section O) or Option 31
(to construct a tree representation).
(See also MDSDIST, in Support Programs.)

Option 23.  This program (NDIST.C) produces a distance matrix from a data
matrix.

The program converts the numerical characteristics of the samples to a
distance matrix in a similar way to that described for Option 22 above,
but allows up to 35 columns.

The program operates in two modes.

        1) The user selects a column, and the matrix of pairwise
differences is generated and output in the same six formats as noted in
Option 21, i.e. *.NJT, *.DIP ... *.DIS.

        2) All the columns are used one by one, and the resulting
matrices are output to the DIP file NDIST.DIP, which can then be used with
the DIPLOMO program.



Option 31.  This batch file (31.BAT) generates a Neighbor Joining tree.

     This option requires a file of pairwise distances in *.NJT format,
see Help-Menu H21.


The option runs the programs via NJ.BAT, which deletes any existing
NJTREE.OUT file before running NJTREE.EXE to produce a new NJTREE.OUT
file containing the inter-OTU/node branching order and branch lengths.
This file is then used by the TDRAW.EXE program to produce a dendrogram
in the files TDRAW.ASC and TDRAW.HP; the former is an ASCII
representation of the tree, the latter is in Hewlett-Packard Graphics
Language and is converted into Postscript format by the HPGL2PS.EXE
program, using the command line

  >HPGL2PS TDRAW.HP > TDRAW.PS

  At this point you are given the option of adding a title to the tree
diagram.  If you choose to do this a further file TDRAW.PSA is generated.

  The tree can then be printed from the TDRAW.ASC or TDRAW.PS/PSA files,
and any files to be saved must be renamed, or they will be overwritten
by subsequent runs.

  The two files TDRAW.PS and TDRAW.PSA contain Postscript code but, as
they are ordinary ASCII text files, it is relatively easy to find the
locations (near the end of the file) where the branch lengths and branch
labels are set up for printing.  Consequently you can edit the file and,
for example, change the (probably) cryptic sample names (such as A1) to
the full species names.

     A few datasets are not accepted by the TDRAW.EXE program: the
computer hangs with a domain error and must be rebooted.  If this
happens, delete or rename any existing NJTREE.OUT file, run NJTREE and
give the name of the data file when prompted.  Then interpret the
NJTREE.OUT file directly, and draw a tree or network by hand from the
tree description in the file.  Each line in the file identifies,
progressively, the branches that fuse at each node, and the lengths of
those branches; remember that when two OTUs or nodes join, the new node
takes the smaller number.  For example, when the branch from OTU 2 joins
the branch from NODE 5, the new node is called NODE 2.

     The TREEDIS program may then be used to calculate the total branch
length of the tree; it also produces two files of the patristic (actual)
pairwise distances within the calculated tree.  TREEDIS.STA is in STATS
format: it has three header lines and then the distances as an upper
right triangle, but with each distance on a separate line (replacing the
headers with a single comment line makes this a component of a *.DIP
file).  TREEDIS.DMX contains the same data as a lower left triangular
matrix, and could be modified for use in NTSYS.

        The text file TREES.DOC contains detailed instructions from the
authors of the NJTREE and TDRAW programs.  In particular it shows the
parameters to be used to re-arrange the tree layout.  This is very useful
if you wish to emphasise outliers by defining them as forming the root
of the tree.

     ERRORS If you get other errors with this option see notes in the file
README.DOC.  In particular note that the file TDRAW.DEF must be present
in the same directory as TDRAW.EXE.

POSTSCRIPT EXTENSIONS
        As the tree output is usually the preferred way to illustrate the
sample relationships, some work has been done to enhance the Postscript
output.
        Colour is now available together with the choice of paper or
transparencies as the final medium.  As noted earlier it is quite feasible
to modify the TDRAW.PS/PSA files to change sample names etc.  Similarly
you can add colour changes to the lines or text in order to highlight
branched sections of the tree.
        The file EXAMPLE.PS is the TDRAW.PS file you would get by running
the EXAMPLE.DAT datafile with the Jaccard algorithm (metric 2).  The file
COLOUR.PS is derived from EXAMPLE.PS, with added definitions for colour and
changes to the Postscript commands that draw the lines and the text.  If
you have a colour Postscript printer, make copies of the two files and
compare their texts to see the differences; you will see that it is
a simple task to modify your output.
        You may need to add other code if your printer has more than one
tray.  It will depend on the make of printer etc.  We do not have wide
experience in this area but we can give some advice if you are having
problems.


Option 32.  This batch file (32.BAT) does a Permutation Tail Probability
(PTP) analysis of a Neighbor Joining tree.

     The relationships of any set of objects, such as DNA samples, can
always be represented as a tree; a tree-generating program will produce a
tree from pairwise distances even if these have been calculated from
randomised data.  Thus it is important to test whether a tree calculated
from a set of distances reflects a tree-like signal in the data, or is
merely an artefact of the algorithm.  This can be tested very simply
using the DIPLOMO program (Help-MENU Option M) to compare the data
matrix used to generate a tree with the patristic distances in the
resulting tree, or alternatively by the PTP (Permutation Tail
Probability) test (Faith, D.P. and Cranston, P.S. (1991). Could a
cladogram this short have arisen by chance alone?  On permutation tests
for cladistic structure.  Cladistics 7: 1-28).

     The PTP test can be applied to trees because a tree calculated from
random distances will usually have a larger 'total branch length' than a
tree calculated from data with 'structure'.  The test is therefore done
by first calculating a tree from the 'original' data and measuring its
total branch length.  Next the presence/absence data for each band
position are randomised to scramble any tree-like signal they contain,
and a tree and its total branch length are calculated from the pairwise
distances of the randomised data.  This process is repeated many times
to estimate the mean total branch length of trees representing the
randomised data, and the standard deviation of those lengths.  The
'tree-likeness' of the original (unrandomised) tree is then assessed as
a Z-value, namely the difference between the total branch length of the
'original' tree and the mean of the 'random' trees, expressed as a
number of standard deviations of the randomised tree lengths.
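
     The Z-value calculation can be sketched in Python as follows.  This
is a minimal illustration under stated assumptions, not the code run by
PTP.BAT: `tree_length` is a hypothetical, caller-supplied function that
stands in for this option's distance, tree-building and branch-length
steps, and `permute_bands` and `ptp_z` are illustrative names.  The
default of 20 replicates follows the number of iterations used by
PTP.BAT.

```python
import random
import statistics
from typing import Callable, List, Optional, Sequence

Matrix = List[List[int]]  # rows = samples, columns = bands (1 = present, 0 = absent)


def permute_bands(data: Sequence[Sequence[int]], rng: random.Random) -> Matrix:
    """Shuffle each band's presence/absence values among the samples.

    Each band keeps its number of 1's, but any tree-like signal shared
    between bands is scrambled. The input is not modified."""
    out = [list(row) for row in data]
    for band in range(len(data[0])):
        column = [row[band] for row in data]
        rng.shuffle(column)
        for sample, value in enumerate(column):
            out[sample][band] = value
    return out


def ptp_z(data: Sequence[Sequence[int]],
          tree_length: Callable[[Matrix], float],
          replicates: int = 20,
          seed: Optional[int] = None) -> float:
    """Return (L_original - mean(L_random)) / sd(L_random).

    tree_length is a hypothetical caller-supplied function standing in for
    the distance -> neighbour-joining tree -> total branch length steps."""
    rng = random.Random(seed)
    original = tree_length([list(row) for row in data])
    lengths = [tree_length(permute_bands(data, rng)) for _ in range(replicates)]
    sd = statistics.stdev(lengths)
    if sd == 0:
        raise ValueError("random tree lengths do not vary; Z is undefined")
    return (original - statistics.mean(lengths)) / sd
```

A strongly negative Z means that the original tree is shorter than the
trees obtained from randomised data, which is what a tree-like signal
would be expected to produce.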


     This option executes the batch file PTP.BAT.  Via the program
COPYFILE, this requests the name of the RAPDistance datafile, copies the
name to RAPDNAME.DAT and the contents of the datafile to RAPD.IN.  The
program NUMIND then reads RAPD.IN and records the number of samples
(num_ind) in the file NUMIND.DAT.  Next RAPDALGC displays the list of
available algorithms (see Help-menu H21) and asks which distance metric
is to be used for the distance calculations; the number of the chosen
algorithm is stored in the file RAPDALGC.DAT.


     PTP.BAT then:
     - deletes any existing TREEDIS.DAT file;
     - runs RAPDALGM (a modified RAPDALG.C), which reads RAPD.IN and
       produces the NJTREE input file RAPD.NJT using the chosen metric,
       and renames RAPD.NJT to NJT.IN;
     - deletes any existing NJTREE.OUT file and runs XNJTREE, which reads
       NJT.IN and produces NJTREE.OUT;
     - runs XTDRAW, which reads NJTREE.OUT and produces TDRAW.LOG;
     - runs XTREEDIS, which reads TDRAW.LOG and produces TREEDIS.DMX;
     - runs EXTRACT, which obtains the total branch length of the tree
       from TREEDIS.DMX and appends it to TREEDIS.DAT.

     The randomisation steps then follow.  PTP.BAT runs RAPDRAN to
randomise the data for each band in RAND.IN, then determines the total
branch length of the resulting tree using the same steps as above, and
repeats this process 20 times.  MANSTD then calculates the Z-value from
the data in the TREEDIS.DAT file and stores the result in PTP.OUT.

     While the above calculations are being made, the original distance
values and the sets of values obtained with the randomised data are
sorted into distance categories and shown as histograms with 20
intervals of width 0.05 (0.00-0.05, 0.05-0.10, ..., 0.95-1.00).
Algorithms that do not produce values in the range 0 to 1 (namely
metrics 3 and 15) will not give useful results.
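
     The binning into 20 intervals can be sketched as below.  This
assumes that a value falling exactly on a boundary is counted in the
upper interval (with 1.00 placed in the last one) and allows a small
tolerance for rounding in printed distances; the boundary convention
actually used by RAPDHIST and RAPDHRAN is not stated here, and the
function names are illustrative.

```python
from typing import Iterable, List

N_BINS = 20       # intervals 0.00-0.05, 0.05-0.10, ..., 0.95-1.00
TOLERANCE = 1e-9  # assumption: absorbs rounding in printed distances


def histogram_20(distances: Iterable[float]) -> List[int]:
    """Count distances in each of the 20 intervals of width 0.05.

    A value on a boundary goes into the upper interval, except 1.0,
    which is counted in the last interval."""
    counts = [0] * N_BINS
    for d in distances:
        if d < -TOLERANCE or d > 1.0 + TOLERANCE:
            raise ValueError(f"distance {d} is outside 0-1; this metric is unsuitable")
        index = min(max(int(d * N_BINS + TOLERANCE), 0), N_BINS - 1)
        counts[index] += 1
    return counts


def format_histogram(counts: List[int]) -> str:
    """One line per interval: the interval bounds, then the count."""
    return "\n".join(f"{i / N_BINS:.2f}-{(i + 1) / N_BINS:.2f} {c:6d}"
                     for i, c in enumerate(counts))
```

Rejecting values outside 0 to 1 mirrors the warning above that metrics
not producing values in that range will not give useful histograms.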

     The output file RAPDH12.OUT contains the histograms together with
other statistics (e.g. Z-values as above), which indicate the parts of
the range that contribute most to the overall Z-value.  The programs
involved are RAPDHIST, RAPDHRAN and RAPDH12.  The file NJT.IN is used
together with NUMIND.DAT, which records the number of samples.  Files
RAPDH1.DAT and RAPDH2.DAT are generated by RAPDHIST and RAPDHRAN
respectively, and are used by the program RAPDH12.

Option 33 This program (RAPDCORL.C) identifies the bands that correlate
best with a provided distance matrix.

     This option activates the program RAPDCORL, which assesses which
band or bands have the presence/absence pattern that correlates most
closely with the distances between the samples, however those distances
were calculated.  Thus it will determine which bands gave patterns that
contributed most to the distances calculated from all the bands using a
distance measure selected under Option 21.  It could also be used to
check the band patterns against distances derived from other features.
RAPDCORL compares the matrix of between-sample distances with the
presence/absence data for each band in turn, and calculates a
correlation coefficient for each band.  For example, the simple set of
band data:

                   Samples
          A       B       C       D

Band 1    1       1       0       1    (1 = Presence, 0 = Absence)

     gives the following upper right triangle matrix of distances:

               B       C       D
      A        0       1       0   A=1 and B=1, dist = 0;
      B                1       0   A=1 and C=0, dist = 1;
      C                        1              etc
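
     The band-distance matrix in this example, and its correlation with a
matrix of sample distances, can be sketched as below.  This assumes a
Pearson product-moment correlation calculated over the pairs of the
upper triangle taken row by row (A-B, A-C, A-D, B-C, ...); the
coefficient and significance test used by RAPDCORL itself are not
described here, and the function names are illustrative.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence


def band_distances(pattern: Sequence[int]) -> List[int]:
    """Row-wise upper triangle for one band: 0 if two samples share the
    same state (both present or both absent), 1 otherwise."""
    return [int(a != b) for a, b in combinations(pattern, 2)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # e.g. a monomorphic band
    return sxy / sqrt(sxx * syy)


def band_correlations(bands: Sequence[Sequence[int]],
                      sample_distances: Sequence[float]) -> List[float]:
    """bands: one 0/1 value per sample for each band.
    sample_distances: row-wise upper triangle, diagonal excluded."""
    return [pearson(band_distances(b), sample_distances) for b in bands]
```

For Band 1 above, `band_distances([1, 1, 0, 1])` gives the six values
0, 1, 0, 1, 0, 1, matching the A, B and C rows of the triangle.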

     The RAPDCORL program requires the band data in a RAPDistance file,
and any of the types of distance files generated by the RAPDALG or other
programs, namely *.NJT, *.DIP, *.DIM, *.PHY, *.NTS and *.DIS.  Numerical
characteristics of the samples, such as their positions in the 1, 2 or 3
vectors of an ordination (Multi-Dimensional Scaling or Principal
Co-ordinates) or their geographical position (2 map co-ordinates) can be
converted into a suitable distance matrix by the program 3DIST.EXE. (See
Help Menu sections H21/22 for more details.)

     As the distance matrix that represents the differences for a single
band is composed solely of 0/1 values, it might be thought that
calculating the correlation coefficient between this matrix and a
RAPDistance matrix, which has continuous values, would be of little
value.  On the contrary, we have found the method to be very useful for
identifying the bands whose presence/absence patterns are significantly
correlated with the distance estimates.

     There are two output files: RAPDCORL.OUT lists the correlation
coefficients and highlights those that are significant.  The file
BANDS.OUT lists the whole 1/0 pattern so that a visual check can be made.


Option 34 This program (CORLCMP.C) identifies the samples that produce
patterns of RAPD bands that correlate worst with a provided distance
matrix.

     CORLCMP requires two distance matrices: the RAPDistance matrix and
the matrix with which it is being correlated, e.g. a geographical
distance matrix.  The program removes one variable (i.e. one sample, in
the RAPDistance context) at a time from both matrices and recalculates
the correlation coefficient for the remaining elements of the two
matrices.  This is done for each variable in turn; if the correlation
improves, the variable giving the greatest improvement is removed, and
the process is repeated with the remaining data until no further
improvement occurs.  This identifies the samples that are decreasing the
correlation between the matrices and provides information on the
relationships between groups of samples.  The output file is CORLCMP.OUT.
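
     The elimination procedure can be sketched as follows, assuming both
matrices are held as full square lists of lists with the samples in the
same order, and that the correlation is a Pearson coefficient over the
upper-triangle elements.  `greedy_removal`, `min_samples` and the small
`min_gain` threshold (which stops rounding noise being counted as an
improvement) are illustrative assumptions, not parameters of CORLCMP.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def upper(matrix: Sequence[Sequence[float]], keep: List[int]) -> List[float]:
    """Upper-triangle elements for the kept samples only."""
    return [matrix[i][j] for i, j in combinations(keep, 2)]


def greedy_removal(a, b, min_samples: int = 3,
                   min_gain: float = 1e-12) -> Tuple[List[Tuple[int, float]], float]:
    """Repeatedly drop the sample whose removal most improves the
    correlation between a and b; stop when no removal helps.
    Returns [(removed sample index, correlation after removal)], final r."""
    keep = list(range(len(a)))
    best = pearson(upper(a, keep), upper(b, keep))
    removed: List[Tuple[int, float]] = []
    while len(keep) > min_samples:
        trials = []
        for s in keep:
            rest = [k for k in keep if k != s]
            r = pearson(upper(a, rest), upper(b, rest))
            if r == r:  # skip undefined (NaN) correlations
                trials.append((r, s))
        if not trials:
            break
        r, s = max(trials)
        if r <= best + min_gain:
            break
        keep.remove(s)
        removed.append((s, r))
        best = r
    return removed, best
```

The list of removed samples, in the order they were dropped, corresponds
to the samples that most weaken the agreement between the two matrices.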


Option 35 This option (batch file 35.BAT) helps to sort out groupings
in a large dataset.  Each of the samples in one datafile (called the query
datafile) is compared with the samples in another datafile (called the
main datafile).

     This is useful for minimising the number of samples to be included
in a large classification, by identifying samples that group together,
can be placed into subsets of the data, add no further useful
information and can therefore be omitted from further analysis.

     In this way a dataset with more than 100 samples can be analysed
despite the 100-sample limit of the existing programs.  Before the
100-sample limit is reached, clear subgroupings of the samples should
have appeared.  It is then possible to select representative samples
from such groups, and to test all new samples against this
representative set using this option.  If a new sample is clearly a
member of an existing group, it can be stored in a separate file for
that group; if it is not a member of an existing group, it can be added
to the representative set.  Thus the final analysis might be a series of
classifications, one for each group of samples (together with one or
more outgroups), to establish the structure of each group.  In addition
there would be a classification of the representative set of samples to
determine the 'higher level' relationships between the groups.

The algorithm to be used is selected using the program RAPDALGC and the
comparisons are made using the program RAPDCMP1.  The output file is
RAPDCMP1.OUT.





Option 36 This option (batch file 36.BAT) analyses the relationships
between the samples in a datafile.

     This is useful for identifying samples that group together, add no
further useful information and can therefore be omitted from further
analysis.  The algorithm to be used is selected using the program
RAPDALGC and the comparisons are made using the program RAPDCMP2.  The
output file is RAPDCMP2.OUT.


MULTI-VARIATE and PHYLOGENETIC ANALYSIS

     Option 21 produces a distance file (*.NTS) in the format required
by the NTSYS-pc (Numerical Taxonomy and Multivariate Analysis System)
package.  This large software package includes algorithms for principal
co-ordinates analysis and for multi-dimensional scaling, as used by
Weiller, McClure and Gibbs (1995).  Analysing distances by either of
these methods and comparing the results with a neighbor-joining tree
calculated from the same data may indicate whether the data contain a
tree-like phylogenetic signal, and hence whether a tree-like
representation is appropriate.

     Option 21 also produces a distance file (*.PHY) in the format
required for the popular phylogenetic analysis package PHYLIP
(PHYLogenetic Inference Package).

     Option 12 converts a RAPDistance datafile into the format required
for analysis using the PAUP (Phylogenetic Analysis Under Parsimony)
software.

     NTSYS-pc is marketed by Exeter Software, 100 North Country Road,
Suite B, Setauket, NY 11733, USA (fax 516 751-3435).  PAUP may be
obtained from Dr David L. Swofford, Illinois Natural History Survey, 607
East Peabody Drive, Champaign, Illinois, 61820, USA, and PHYLIP from Dr
Joseph Felsenstein, Dept of Genetics SK-50, University of Washington,
Seattle, Washington, 98195, USA (joe@genetics.washington.edu).

Reference
Weiller, G., McClure, M.A. and Gibbs, A.J. (1995). Molecular
     Phylogenetic Analysis. In 'Molecular Basis of Virus Evolution', eds
     A.J. Gibbs, C.H. Calisher and F. Garcia-Arenal, pp. 553-558.
     Cambridge University Press.

STATISTICAL ANALYSIS USING WINAMOVA.

     Analysis of Molecular Variance (AMOVA) is a powerful new
statistical procedure developed by Excoffier, Smouse and Quattro (1992)
for analysing the molecular variance of genetic data sets.  The analysis
is done under Windows on IBM-compatible PCs using the program WINAMOVA;
the source of this program is given below.

     AMOVA studies the variance components among hierarchical partitions
in your data set in a manner analogous to Wright's hierarchical F
statistics.  For example, if your data can be classified into
Regions->Populations->Individuals you can partition the genetic variation
into 3 components: among regions, among populations within regions and among
individuals within populations.  Statistical significance is tested by
random permutation at all levels of the analysis.  An example of the
application of AMOVA to RAPD data is found in Huff, Peakall and Smouse
(1993).  More recently, these authors have developed procedures for
using AMOVA with single locus co-dominant genetic markers such as
allozymes or microsatellites (Peakall, Smouse and Huff 1995).

     Before you can analyse the Molecular Variance of your data you must
calculate a matrix of pairwise distances from the data using the
RAPDistance program; at present, this cannot be done directly by
WINAMOVA.  Raw data can be collated either using Options 1-11 of
RAPDistance, or using a spreadsheet program and Option 14 to convert the
spreadsheet file (SSDATA.S2R) to RAPDistance format.  Note that:

     - your data must be recorded in a logical hierarchical order, with
the individuals of each population adjacent in the file and, similarly,
the populations from the same region adjacent;

     - to reduce the size of the matrix you may wish to remove
monomorphic bands (all 0's or all 1's) as they do not contribute to the
genetic distance calculation, and to remove duplicate samples with
identical profiles.  You can check for such duplications using Option 3
and remove them using Options 5 and 6, but keep track of the details as
these are required in the population files for WINAMOVA.

Steps for an AMOVA analysis using WINAMOVA:

  1. Assemble the data in a RAPDistance file as outlined above.

  2. Type >RAPD <Enter> to start the RAPDistance package;
     - choose Option 21;
     - provide the appropriate data file name;
     - choose the appropriate file naming option;

     - choose one of the distance metric options (see Help-OPTION G);
note that AMOVA is not strictly rigorous with non-Euclidean distances
(Huff, Peakall and Smouse 1993); in practice this may not affect the
result, but the Excoffier distance (metric 15) may be best;

     - when you are informed of 'NORMAL TERMINATION', press any key to
return to the Menu;

     - choose Option Q to quit the RAPDistance program.

  3. Prepare the files required by WINAMOVA.  First there is the
RAPDistance output file named '*.dis', which is formatted for WINAMOVA.
It contains a list of numbers identifying the samples (in the order they
appear in the matrix) in the first line, followed by a lower triangular
pairwise distance matrix, including the principal diagonal:

1 2 3 4 5 6 7 8 9
  0
 13   0
 15  18   0
 15  16  12   0
 20  17  15  15   0
 15  20  16  16  13   0
 13  10  14  16  15  16   0
  7  12  14  18  15  14  10   0
 16  17  15  17  10  13  13  11   0

In addition to this file WINAMOVA requires:

     - a group file named '*.grp'.  The first line gives the number of
groups followed by an optional comment.  The populations belonging to
each group follow, one group per line:

2 regions
1 2
3 4

     - a series of population files named '*1.pop' through to '*n.pop'
where n equals the total number of populations.  The first line of each
file identifies the total number of samples in that population followed
by an optional comment.  Subsequent lines identify the samples belonging
to each population and their frequency.  Note that frequencies greater
than one will only be relevant if samples with the same DNA band pattern
were removed from the raw data, and their frequencies tabulated, before
the pairwise genetic distances were calculated:

12 Population No. 1
1 1
2 4
3 1
4 6

  4. Run WINAMOVA after studying the help and example files that come
with the program.
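
     The three kinds of input file described in step 3 can be generated
with a short script such as the sketch below.  It assumes the distances
are held as a full square matrix and reproduces the layout of the
examples (values right-aligned in a 3-character field, separated by a
space); whether WINAMOVA requires exactly that spacing is an assumption,
and `write_amova_files` and its arguments are hypothetical names.

```python
from pathlib import Path
from typing import Dict, List, Sequence


def dis_text(matrix: Sequence[Sequence[float]]) -> str:
    """Sample numbers on the first line, then the lower triangle including
    the principal diagonal, one matrix row per line."""
    n = len(matrix)
    lines = [" ".join(str(i + 1) for i in range(n))]
    for i in range(n):
        lines.append(" ".join(f"{matrix[i][j]:3g}" for j in range(i + 1)))
    return "\n".join(lines) + "\n"


def grp_text(groups: List[List[int]], comment: str = "") -> str:
    """Number of groups (plus comment), then the populations of each group."""
    lines = [f"{len(groups)} {comment}".rstrip()]
    lines += [" ".join(str(p) for p in group) for group in groups]
    return "\n".join(lines) + "\n"


def pop_text(frequencies: Dict[int, int], comment: str = "") -> str:
    """Total number of samples (plus comment), then 'sample frequency' lines."""
    lines = [f"{sum(frequencies.values())} {comment}".rstrip()]
    lines += [f"{sample} {freq}" for sample, freq in frequencies.items()]
    return "\n".join(lines) + "\n"


def write_amova_files(stem: str, matrix, groups, populations) -> None:
    """Hypothetical helper: write stem.dis, stem.grp and stem1.pop ... stemN.pop."""
    Path(f"{stem}.dis").write_text(dis_text(matrix))
    Path(f"{stem}.grp").write_text(grp_text(groups, "groups"))
    for k, freqs in enumerate(populations, start=1):
        Path(f"{stem}{k}.pop").write_text(pop_text(freqs, f"Population No. {k}"))
```

For example, `pop_text({1: 1, 2: 4, 3: 1, 4: 6}, "Population No. 1")`
reproduces the population file shown in step 3, with the total of 12
samples computed from the frequencies.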

     WINAMOVA is freeware and can be obtained by anonymous ftp.  The
latest version of `WINAMOVA 1.5 for Windows' is contained in a file
called WINAMOVA.ZIP at "anthropologie.unige.ch" (129.194.113.1).  If you
have an FTP client, proceed as follows:

>ftp anthropologie.unige.ch (or ftp 129.194.113.1)
user : anonymous
password : <your id>
cd /pub/comp/win/amova
binary
get winamova.zip
------------------------

The author of WINAMOVA is Dr Laurent Excoffier, Genetics & Biometry Lab,
Dept of Anthropology, University of Geneva, 12, G. Revilliod, 1227
Geneva, Switzerland. Tel: +41 22 702 6965  Fax: 300 0351

     If you like the program and plan to use it regularly please let the
author know by email.  This will ensure that you are notified of any
updates.  Dr Excoffier and colleagues are now developing a major genetic
analysis package that will include AMOVA as an option.  Be sure to
acknowledge the author in any publications resulting from use of the
program.

References

Excoffier, L., Smouse, P.E. and Quattro, J.M. (1992). Analysis of
     molecular variance inferred from metric distances among DNA
     haplotypes: application to human mitochondrial DNA restriction
     data.  Genetics 131: 479-491.

Huff, D.R., Peakall, R. and Smouse, P.E. (1993). RAPD variation within
     and among natural populations of outcrossing Buffalograss (Buchloe
     dactyloides (Nutt.) Engelm.).  Theor. Appl. Genet. 86: 927-934.

Morell, M.K., Peakall, R., Appels, R., Preston, L.R. and Lloyd, H.L.
     (1995). DNA profiling techniques for plant variety identification.
     Aust. J. Exp. Agric. 35: 807-819.

Nei, M. and Li, W.H. (1979). Mathematical model for studying genetic
     variation in terms of restriction endonucleases. Proc. Natl. Acad.
     Sci. USA 76: 5269-5273.

Peakall, R., Smouse, P.E. and Huff, D.R. (1995). Evolutionary
     implications of allozyme and RAPD variation in diploid populations
     of Buffalograss (Buchloe dactyloides (Nutt.) Engelm.). Mol. Ecol. 4:
     135-147.


DIPLOMO (DIstance PLOt MOnitor) ANALYSIS.

     DIPLOMO is a package of programs for making pairwise comparisons of
different estimates of the distances between a set of taxa by plotting
them against each other in a simple scatter plot.  Taxa with similar
relative distance characteristics are thereby grouped graphically.
Groupings of different taxa may be directly identified, and the distance
characteristics of chosen groups compared using DIPLOMO tools to give them
different colours or symbols.  Thus distances calculated using different
metrics (Option 21) can be directly compared with one another, or with
the patristic distances of resulting trees (Help-MENU H).  Also distances
calculated from band patterns can be directly compared with distances
calculated from other features, such as the geographical distances
between the sites where the DNA samples (e.g. plants) were collected;
the required distance matrix of, for example, geographical distances can
be calculated from their map co-ordinates using option 22.

     DIPLOMO can be obtained by sending a DOS disk and a self-addressed
envelope to Dr Georg Weiller, Research School of Biological Sciences,
Australian National University, P.O. Box 475, A.C.T. 2601, or by
anonymous Internet FTP:

     - Host: life.anu.edu.au;
     - Directory: /pub/molecular_biology/software/diplomo;
     - filename: dipxxx.exe (where xxx is the version number);
     - file type: binary (self extraction archive).

     A *.DIP file is assembled with a text editor from the RAPDistance
files calculated by Option 21.  Start with a *.DIP file, as this
contains the sample names and a blank line before the headed matrix,
then append *.DIM files one after another, as each of these contains
only a headed matrix (i.e. the upper triangle of a matrix without the
principal diagonal, with a single header line to identify the matrix).
Other distance files can be added provided the format is correct and,
most importantly, the order of the comparisons in each matrix is
identical.


SUPPORT PROGRAMS

RAPDRAN scrambles the data for each band in a RAPDistance datafile.  Note
that the name of the input datafile cannot be set by the user; it is
fixed as RAPD.IN, and the output file has the same name.  You must
therefore copy your original datafile to RAPD.IN; this also protects
your original datafile from being changed.

RAPDMAT takes a distance matrix file in *.NTS format, namely a lower
left triangle of distances, and converts this to two versions of a full
square matrix.  They are distinguished by the extension used.  The file
with extension .MAT has the first record containing the dimension of the
matrix, and the matrix elements following one per record.  The file with
extension .MTX has the sample names substituted for the dimension.
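
     The conversion can be sketched as follows, assuming the lower
triangle is read row by row (d11; d21 d22; d31 d32 d33; ...) as in the
NTSYS-pc layout described under FILE FORMATS, and that the .MAT elements
are written row by row.  The .MTX variant is not shown, and the function
names are illustrative rather than those of RAPDMAT.

```python
from typing import List, Sequence


def lower_to_square(values: Sequence[float], n: int) -> List[List[float]]:
    """Expand a row-wise lower triangle (diagonal included) into a full
    symmetric n x n matrix."""
    expected = n * (n + 1) // 2
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    matrix = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            matrix[i][j] = matrix[j][i] = values[k]  # mirror into the upper half
            k += 1
    return matrix


def mat_records(matrix: List[List[float]]) -> List[str]:
    """Records of a .MAT-style file: the dimension, then one element per record."""
    return [str(len(matrix))] + [f"{v:g}" for row in matrix for v in row]
```

Writing `"\n".join(mat_records(m)) + "\n"` to a file would then give the
one-element-per-record layout described above.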

MDSDIST is a similar program to 3DIST (see section H22).  It converts the
data in an output file from the NTSYS ordination programs into
distances.  MDSDIST requires the input file to be renamed MDSDIST.DAT.
The input data are converted to an upper triangular distance matrix,
which can be used by DIPLOMO or the tree drawing option 31.
****  NB the format used in the file MDSDIST.DAT is that defined in the
NTSYS package for Multi-Dimensional Scaling arrays, WITH THE LABELLING
INFORMATION REMOVED.  See Help MENU section H for an example.  At most 3
columns of data are allowed.  The program has been modified so that the
input data can be arranged row-wise (the usual case) or by columns.  In
either case the initial record must contain the 4 parameters 1, rows,
cols, 0, in that order.
The output files are MDSDIST.DIP, MDSDIST.DIM and MDSDIST.NJT which can be
used in conjunction with either the RAPDCORL program (see Help MENU H33) or
the DIPLOMO package (see Help MENU O) or the tree drawing option (Help MENU
H31).
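
     A sketch of the conversion for row-wise input is given below.  It
assumes that the distance between two samples is the Euclidean distance
between their ordination co-ordinates (the metric used by MDSDIST and
3DIST is not stated here) and that the parameters are whole numbers; the
column-wise arrangement is not handled, and the function names are
illustrative.  The comment convention follows the MDS format described
under FILE FORMATS.

```python
from itertools import combinations
from math import dist
from typing import List

COMMENT_MARKS = ('"', "'", "`")  # records starting with these are comments


def read_mds(text: str) -> List[List[float]]:
    """Parse '1 rows cols 0' followed by row-wise co-ordinates."""
    tokens: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(COMMENT_MARKS):
            tokens.extend(stripped.split())
    ntype, nrow, ncol, nmiss = (int(t) for t in tokens[:4])
    if ntype != 1 or nmiss != 0:
        raise ValueError("first record must be: 1 rows cols 0")
    if not 1 <= ncol <= 3:
        raise ValueError("at most 3 columns of co-ordinates are allowed")
    values = [float(t) for t in tokens[4:]]
    if len(values) != nrow * ncol:
        raise ValueError(f"expected {nrow * ncol} values, got {len(values)}")
    return [values[r * ncol:(r + 1) * ncol] for r in range(nrow)]


def upper_distances(coords: List[List[float]]) -> List[float]:
    """Euclidean distance for each pair, upper triangle row by row."""
    return [dist(p, q) for p, q in combinations(coords, 2)]
```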

RAPDH produces a detailed histogram from an .NJT file.  This gives a
useful overall impression of the distribution of the distance values and
it enables one to identify the sample pairs that are outliers, etc.  The
output file is RAPDH.OUT.

RAPDMIN examines an NJT file and lists the 30 pairs of samples which are
closest, i.e. which have the minimum distances.  This is helpful if you
wish to look for patterns in the data which may allow you to reduce the
number of samples.  The output file is RAPDMIN.OUT.
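
     The selection can be sketched as below, assuming the names and the
row-wise upper-triangle distances have already been read from the .NJT
file.  `closest_pairs` is an illustrative name, and ties are broken
alphabetically here, which may differ from RAPDMIN.

```python
import heapq
from itertools import combinations
from typing import List, Sequence, Tuple


def closest_pairs(names: Sequence[str], upper: Sequence[float],
                  k: int = 30) -> List[Tuple[float, str, str]]:
    """Return the k sample pairs with the smallest distances, closest first.

    upper holds the upper triangle row by row (A-B, A-C, ..., B-C, ...),
    as in an .NJT file."""
    n = len(names)
    if len(upper) != n * (n - 1) // 2:
        raise ValueError("distance count does not match the number of samples")
    scored = ((d, a, b) for d, (a, b) in zip(upper, combinations(names, 2)))
    return heapq.nsmallest(k, scored)
```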

RAPDSET generates a RAPDistance datafile in which 1's are randomly
distributed at a nominated density.  Some of the algorithms have a
discriminating power which is related to the overall frequency of the
1's (presence) data.  A statistic for the 1's density of a datafile is
included in the output from Option 3 RAPDSTAT.
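
     Both ideas can be sketched as follows, assuming that an exact number
of 1's (the nominated density times the number of cells, rounded) is
placed at random positions; RAPDSET may instead set each cell
independently with that probability.  The function names are
illustrative, and the density statistic is computed here simply as the
proportion of 1's.

```python
import random
from typing import List, Optional


def random_dataset(n_samples: int, n_bands: int, density: float,
                   seed: Optional[int] = None) -> List[List[int]]:
    """Samples x bands 0/1 matrix with round(density * cells) randomly
    chosen cells set to 1."""
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must lie between 0 and 1")
    rng = random.Random(seed)
    cells = n_samples * n_bands
    ones = set(rng.sample(range(cells), round(density * cells)))
    return [[1 if s * n_bands + b in ones else 0 for b in range(n_bands)]
            for s in range(n_samples)]


def ones_density(data: List[List[int]]) -> float:
    """Proportion of 1's (presence scores) in a dataset."""
    return sum(map(sum, data)) / sum(len(row) for row in data)
```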

TBRLEN uses the file TDRAW.LOG which is an output file from running the
program TDRAW in option 31.  TDRAW.LOG contains details of the nodes and
of the branch lengths for the current tree.  TBRLEN summarises this
information and gives a value for the sum of the branch lengths.

TEXTSCAN examines a file and indicates for each character whether or not
it belongs to the alphanumeric set, i.e. (A...Z, a...z, 0...9).  A
character in the set is printed as it is.  For a character not in the
set, its ASCII value is printed in brackets, e.g. (10), followed by the
character itself.  For characters that are invisible when printed, such
as SPACE (32) and LINE FEED (10), you will therefore see only the
numeric value.  This program is useful when you suspect that a file is
corrupted and you want to check it thoroughly.
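
     The behaviour described can be sketched as follows, reading the file
as raw bytes so that every character, including carriage returns, is
reported.  The exact output layout of TEXTSCAN is an assumption, and
`scan` is an illustrative name.

```python
import string

ALPHANUMERIC = set(string.ascii_letters + string.digits)


def scan(data: bytes) -> str:
    """Alphanumeric characters are printed as they are; any other byte is
    shown as its value in brackets followed by the character itself."""
    out = []
    for byte in data:
        char = chr(byte)
        out.append(char if char in ALPHANUMERIC else f"({byte}){char}")
    return "".join(out)
```

For example, `print(scan(open("RAPD.IN", "rb").read()), end="")` would
scan the file RAPD.IN.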

FILE FORMATS

These brief notes outline the file formats used.  If they are unclear,
check them by printing the output files from your experiments with the
RAPDistance data file named EXAMPLE.DAT.

RAPDistance datafile:
	Name of the data file.
	Number of Samples.    
	Names of the Samples (One per record).
	Number of Populations.
	Size of each population (One per record).
	Number of Bands.
	Number of Primers.
	Names of the Primers (One per record).
	Lengths of the Primers (One per record).
	Number of bands in each Primer group (One per record).
	Code names for the bands (One per record).
The presence/absence of each band ('0' or '1'), one per record 
in the order sample 1 x band 1, sample 1 x band 2, etc; samples 
are rows, bands are columns.
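
     A reader for this layout might look like the sketch below.  It
assumes the header contains no blank lines, that primer lengths are whole
numbers, and that the presence/absence values may be split across
records in any way as long as they appear in sample-by-band order;
`RapdData` and `parse_rapd` are illustrative names, not part of
RAPDistance.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RapdData:
    title: str
    samples: List[str]
    population_sizes: List[int]
    primers: List[str]
    primer_lengths: List[int]
    bands_per_primer: List[int]
    band_names: List[str]
    scores: List[List[int]]  # rows = samples, columns = bands


def parse_rapd(text: str) -> RapdData:
    """Read the records in the order listed above."""
    lines = iter([line.strip() for line in text.splitlines()])

    def take(count: int) -> List[str]:
        return [next(lines) for _ in range(count)]

    title = take(1)[0]
    samples = take(int(take(1)[0]))                            # count, then names
    population_sizes = [int(x) for x in take(int(take(1)[0]))]  # count, then sizes
    n_bands = int(take(1)[0])
    n_primers = int(take(1)[0])
    primers = take(n_primers)
    primer_lengths = [int(x) for x in take(n_primers)]
    bands_per_primer = [int(x) for x in take(n_primers)]
    band_names = take(n_bands)
    values = [int(tok) for line in lines for tok in line.split()]
    if len(values) != len(samples) * n_bands:
        raise ValueError(f"expected {len(samples) * n_bands} presence/absence values")
    scores = [values[s * n_bands:(s + 1) * n_bands] for s in range(len(samples))]
    return RapdData(title, samples, population_sizes, primers,
                    primer_lengths, bands_per_primer, band_names, scores)
```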


Distance files output from the Option 21 (RAPDALG)

	1) NJTREE or UPGMA  (*.NJT)
	Title
	Names of the Samples(Tracks) one per record
	Blank line
	Data for the Upper Triangular Matrix excluding the diagonal values.
	(Thus if there are N samples there will be N*(N-1)/2 values).

	2) DIPLOMO      (*.DIP)
	Name1           (No spaces, max 13 characters; longer names truncated)
	Name2
	Name n
            (Empty line)
	Title for Distance measure #1      (One record max 60 characters)
	Distance data in upper right triangle excluding diagonal elements
	any number per record (i.e. as a matrix or even one column)
	Title for Distance measure #2      (One record max 60 characters)
	Distance data in upper right triangle excluding diagonal elements
	any number per record.

	3) PHYLIP       (*.PHY)
	Record 1 contains the name of the first sample followed by the row
	of data for that sample (i.e. N-1 values).
	Records 2 to N are as above, with one value fewer each time, so the
	last record contains only the sample name.
	Data for the Upper Triangular Matrix excluding the diagonal values.
	(Thus if there are N samples there will be N*(N-1)/2 values).

	4) NTSYS-pc       (*.NTS)
	Record of parameters
	Names of the Samples(Tracks) one per record
	Data for the Lower Triangular Matrix including the diagonal values.
	(Thus if there are N samples there will be N*(N+1)/2 values).

	5) AMOVA          (*.DIS)
	Record of column numbers
	Data for the Lower Triangular Matrix including the diagonal values.
	(Thus if there are N samples there will be N*(N+1)/2 values).
	The AMOVA (or Windows based WINAMOVA) package requires additional
	data input files describing the sample grouping.
	See section J for full details.

	6) PAUP           (*.PAU) 
	#NEXUS
	begin data;
	dimensions ntax = ...  nchar = ... ;
	matrix
	(Taxon name #1)
	Data for that taxon
	..
	..
	(Taxon name #2)
	Data for that taxon
	..
	..
	etc.
	;
	endblock;
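
     The NJTREE and PAUP layouts above can be produced with a sketch such
as the following.  It assumes the upper triangle of the .NJT file is
written one matrix row per record with four decimal places, and that the
PAUP data are 0/1 characters; these formatting choices and the function
names are illustrative rather than those of RAPDALG or the
PAUP-conversion option.  The single quotes around taxon names containing
spaces or punctuation follow the NEXUS convention.

```python
from typing import List, Sequence


def njt_text(title: str, names: Sequence[str],
             matrix: Sequence[Sequence[float]]) -> str:
    """Title, names one per record, a blank line, then the upper triangle
    (diagonal excluded) with one matrix row per record."""
    n = len(names)
    lines: List[str] = [title, *names, ""]
    for i in range(n - 1):
        lines.append(" ".join(f"{matrix[i][j]:.4f}" for j in range(i + 1, n)))
    return "\n".join(lines) + "\n"


def nexus_name(name: str) -> str:
    """Quote a taxon name if it contains characters NEXUS treats specially."""
    if name.replace("_", "").isalnum():
        return name
    return "'" + name.replace("'", "''") + "'"


def nexus_text(names: Sequence[str], data: Sequence[Sequence[int]]) -> str:
    """A minimal PAUP data block of 0/1 characters, one taxon per entry."""
    nchar = len(data[0])
    lines = ["#NEXUS", "begin data;",
             f"dimensions ntax = {len(names)} nchar = {nchar};", "matrix"]
    for name, row in zip(names, data):
        if len(row) != nchar:
            raise ValueError(f"taxon {name} has {len(row)} characters, not {nchar}")
        lines.append(nexus_name(name))
        lines.append("".join(str(v) for v in row))
    lines += [";", "endblock;"]
    return "\n".join(lines) + "\n"
```

For N samples the .NJT output contains N*(N-1)/2 values, as stated
under 1) above.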

MDS (Multi-Dimensional Scaling) for data to NTSYS  (*.MDS)

	1) First record four integers NTYPE NROW NCOL NMISS
	   where NTYPE = 1 indicates rectangular matrix
		 NROW = number of rows
		 NCOL = number of columns
		 NMISS = Flag for missing values = 0 (None)
	2) The data values follow in free format row-wise.
	   Records starting with " ' or ` are treated as comments.

Spreadsheet datafile   (*.R2S)
	a) Number of Primers, Number of Bands, Number of Samples, 
	Number of Populations, Size of Population#1....Size of Population #n
	b) Heading Primer Names identified with correct column (Band)
	c) Heading Numbers/Primer identified with correct column (Band)
	d) Sample name, Population then the 0/1 values for each band.
	       Repeated as required.
	Example :-

	2       6       10      3       3       3       4
        AA      AA      AA      AA      ZZ      ZZ
         1       2       3       4       1       2
	A1      1       1       1       1       1       1       0       
	A2      1       1       1       1       1       1       1       
	A3      1       0       0       0       0       0       0       
	B1      2       1       0       1       0       1       1       
	B2      2       0       0       1       0       1       1       
	B3      2       0       0       1       0       0       1       
	C1      3       1       1       0       0       0       0       
	C2      3       1       0       0       0       1       1       
	C3      3       1       1       1       0       0       0       
	C4      3       1       0       1       0       1       1       

           N.B. The data entries are separated by Tabs.  When saving 
	your file ensure that you use TEXT(Tab delimited).
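
     A reader for this layout can be sketched as below.  Splitting on any
whitespace (rather than strictly on tabs) is an assumption that
tolerates empty leading cells in the two heading rows, but it means
sample names must not contain spaces; `parse_r2s` and the keys of the
returned dictionary are illustrative names.

```python
from typing import Dict, List


def parse_r2s(text: str) -> Dict[str, object]:
    """Read the tab-delimited layout; blank lines and empty leading cells
    in the heading rows are ignored."""
    rows: List[List[str]] = [line.split() for line in text.splitlines() if line.strip()]
    head = [int(x) for x in rows[0]]
    n_primers, n_bands, n_samples, n_pops = head[:4]
    population_sizes = head[4:4 + n_pops]
    primer_of_band = rows[1]                       # primer name for each band column
    number_in_primer = [int(x) for x in rows[2]]   # band number within its primer
    if len(primer_of_band) != n_bands or len(number_in_primer) != n_bands:
        raise ValueError("heading rows must have one entry per band")
    samples = []
    for row in rows[3:3 + n_samples]:
        values = [int(v) for v in row[2:]]
        if len(values) != n_bands:
            raise ValueError(f"sample {row[0]} has {len(values)} bands, not {n_bands}")
        samples.append((row[0], int(row[1]), values))  # name, population, 0/1 values
    if len(samples) != n_samples:
        raise ValueError(f"expected {n_samples} samples, found {len(samples)}")
    return {"n_primers": n_primers, "population_sizes": population_sizes,
            "primer_of_band": primer_of_band,
            "number_in_primer": number_in_primer, "samples": samples}
```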


WINAMOVA files
	The program requires the RAPDistance output file named 
'*.dis'.  It contains a list of numbers identifying the samples 
(in the order they appear in the matrix) in the first line, 
followed by a lower triangular pairwise distance matrix, 
including the principal diagonal:

1 2 3 4 5 6 7 8 9
  0
 13   0
 15  18   0
 15  16  12   0
 20  17  15  15   0
 15  20  16  16  13   0
 13  10  14  16  15  16   0
  7  12  14  18  15  14  10   0
 16  17  15  17  10  13  13  11   0

In addition to this file WINAMOVA requires:

	- a group file named '*.grp'.  The first line gives 
the number of groups followed by an optional comment.  The 
populations belonging to each group follow, one group per 
line.

2 regions
1 2
3 4

	- a series of population files named '*1.pop' through to 
'*n.pop' where n equals the total number of populations.  The 
first line of each file identifies the total number of samples 
in that population followed by an optional comment.  Subsequent 
lines identify the samples belonging to each population and 
their frequency.  Note that frequencies greater than one will 
only be relevant if samples with the same DNA band pattern were 
removed from the raw data, and their frequencies tabulated, 
before the pairwise genetic distances were calculated:

12 Population No. 1
1 1
2 4
3 1
4 6



                Acknowledgments

     
There are various programs, noted previously, which were
written by other individuals or groups.  They are :-
     
HPGL2PS - Don McCormick, CSIRO, Division of Applied
     Physics, PO Box 218, Lindfield, NSW, Australia 2070.

NJTREE, UPGMA and TDRAW - Li Jin and JWH Ferguson, Center
     for Demographic and Population Genetics, Univ of  
     Texas, P.O. Box 20334, Houston, Texas 77225, USA.

