                      Text Information

                       Version 1.2

         Copyright (C) 1999 by Quentin J. Christensen

Thank you for downloading "Text Information" (I'll call it "TI" for short).
TI is a FREEWARE DOS program I have written which gives you information about
text files.


INDEX:

             SECTION     TITLE

             1...........ABOUT TEXT INFORMATION (TI)
             2...........WHY WAS THIS PROGRAM WRITTEN?
             3...........INSTALLATION & ZIP FILE CONTENTS
             4...........WHAT DO ALL THESE VALUES MEAN?
             5...........COMMAND LINE SWITCHES
             6...........THE FOG INDEX
             7...........SAMPLE OUTPUT OF TI
             8...........THE PATH
             9...........REVISION HISTORY
             10..........DISCLAIMER, DISTRIBUTION AND TERMS OF USE
             11..........PROGRAM SUPPORT AND CONTACT INFORMATION



1. ABOUT TI
-----------

TI will tell you such information as how many words, characters, sentences
and lines are in a text file, as well as calculate the "Fog Index" or
readability of the file.

As far as I know, TI will work fine under all Windows shell environments,
including Windows 95.  I haven't tested it under Win98, OS/2 or any DOS
command interpreters except MS-DOS 5.0 and up, but it should work under most
of these.  Please let me know if you have any problems with it.

As far as I know, TI should also be Y2K compliant, as it never actually does
anything with the date.  Again, if I'm wrong, please let me know, though I
can't think of why it wouldn't be compliant...

Being a 16-bit DOS program, TI cannot handle long filenames or directory
names.  If you want to pass these to TI you will have to abbreviate them to
their 8.3 character DOS filenames.  For example:

    C:\My Documents\my text file.txt

is actually:

    C:\MYDOCU~1\MYTEXT~1.TXT

and that is what you would have to pass to TI.


2. WHY WAS THIS PROGRAM WRITTEN?
--------------------------------

TI was originally created because I'm one of those people who likes silly
meaningless facts and statistics :)  I'd also found a couple of programs
floating around the web which provided such statistics (about text files),
but I thought I could do better - I could make a program which provided even
more meaningless statistics about text files.  Also, the output of many of
these programs was often inaccurate, so I've written TI in the hopes of
providing you with as much ACCURATE information, useful or not, about your
text files.



3. INSTALLATION & ZIP FILE CONTENTS
-----------------------------------

The files which are included in the original 30.5k zip file are:

TI.EXE - The program itself. (22.5k unzipped)
README.TXT - These instructions (formerly ti.txt). (21.2k unzipped)
FILE_ID.DIZ - Brief summary of Text Information. (0.2k unzipped)

In order to use TI, you will have to 'unzip' the files using 'pkzip' or a 
similar unzip program.  Documentation for unzipping files will come with
your unzip program (Though I assume you've already done this, since you're
reading this :)

To use TI simply place the TI.EXE file somewhere and run it!  None of this
windows installation and registry and what have you rubbish, just run it!



4. WHAT DO ALL THESE VALUES MEAN?
---------------------------------

I have aimed to try and calculate these values as accurately as possible.
If you do find that TI gives inaccurate values please let me know, so I can
fix it :)


Assumed (non-)text file

    Basically just reflects whether or not the /T switch was used (see the
    next section, "Command line switches", for an explanation of the
    differences between text and non-text files).


Size of file: #### bytes.

    This is simply the size of file being analysed - how many bytes of disk
    space it takes up.


Whitespace: ####.

    These are all the spaces, tabs and carriage returns (new lines) in the file.


Lines: ####.

    How many lines the file has (the number of ASCII value 10 characters
    found, plus 1, as there's no new line character after the last line).
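
    The counting rule above can be sketched in a few lines of Python.  This
    is only an illustration of the rule as described, not TI's actual code,
    and the function name is made up for the example.

```python
def count_lines(data: bytes) -> int:
    """Count lines as (number of ASCII 10 bytes) + 1, as described above."""
    return data.count(b"\n") + 1


# Two new line characters, so three lines.
print(count_lines(b"first\r\nsecond\r\nthird"))  # 3
```

    Note that under this rule a file whose last line does end with a new
    line character is counted as having one extra, empty, line.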


Blank lines: ####.

    How many lines are 0 characters long.  A line with only spaces will not
    count as a blank line.


Shortest line: #### chars.

    The shortest line in the file, not counting blank lines.


Longest line: #### chars.

    The longest line in the file.  (Even though your text editor may wrap
    lines at 75 - 80 characters, that doesn't mean it puts a new line in.)


Average line length: #### chars.

    The average length of each line, in characters, not counting blank lines.
    Number of characters / number of non-blank lines.
    Only characters with an ASCII value > 32 and < 127 are counted.


Average with blanks: #### chars.

    The average length of each line, in characters, including blank lines.
    Number of characters / total number of lines.
    Only characters with an ASCII value > 32 and < 127 are counted.
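
    As a rough Python sketch of the two averages (an illustration only, not
    TI's code): the function name and the 'lowest' parameter are
    hypothetical, and stripping a trailing carriage return from each line is
    an assumption about how DOS line endings are handled.

```python
def line_averages(data: bytes, lowest: int = 33) -> tuple[float, float]:
    """Return (average over non-blank lines, average over all lines)."""
    # Split on ASCII 10 and drop a trailing CR (assumption) so that DOS
    # line endings do not make an empty line look non-blank.
    lines = [line.rstrip(b"\r") for line in data.split(b"\n")]
    # Count only bytes from 'lowest' up to 126 (> 32 and < 127 by default).
    counted = sum(1 for line in lines for byte in line if lowest <= byte < 127)
    # A blank line is 0 characters long; a line of spaces is not blank.
    non_blank = sum(1 for line in lines if len(line) > 0)
    average = counted / non_blank if non_blank else 0.0
    average_with_blanks = counted / len(lines)
    return average, average_with_blanks


avg, avg_blanks = line_averages(b"abcd\r\n\r\nef gh")
print(f"{avg:.2f} {avg_blanks:.2f}")  # 4.00 2.67
```

    The sample output in section seven appears consistent with spaces being
    counted as well: if that file used CR+LF line endings, 19798 characters
    over 354 non-blank lines gives 55.93.  That is only an observation from
    the figures, and passing lowest=32 is one way to try it.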


Number of pages: ####.

    This is simply Number of lines / Lines per page, to give you a rough guide
    as to how many pages the printed file will take up. (Lines per page is
    discussed next).


Lines per page: ####.

    This value, 55 by default, or set with the /P switch, is how many lines
    per page to use, when calculating the number of pages in the file.
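
    The page estimate is a single division.  A small Python sketch, with a
    hypothetical function name and made-up line counts:

```python
def number_of_pages(lines: int, lines_per_page: int = 55) -> float:
    """Rough page count: number of lines divided by lines per page."""
    return lines / lines_per_page


print(f"{number_of_pages(110):.2f}")      # 2.00
print(f"{number_of_pages(110, 66):.2f}")  # 1.67
```

    The sample output in section seven shows 9.42 pages for 519 lines, which
    matches 518 / 55 rather than 519 / 55, so the program may be dividing the
    count of new line characters instead.  That is a guess from the figures.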


Words: ####.

    A character is any non-whitespace character from ASCII value 33 to ASCII
    value 126 - if you don't have an ASCII chart handy, basically any of the
    keys of the standard 101-key US style keyboard, except tab, enter, space
    and the function/control keys.

    A word is a group of these characters, and is considered ended when a
    whitespace character, or any of the following characters is reached: 
    : ; ? . !

    Using this definition, this value is the number of words found in the file.
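
    The word definition above can be sketched in Python as follows.  This is
    an illustration under stated assumptions, not TI's code: printable
    characters are taken to be ASCII 33 to 126, the character that ends a
    word is not counted as part of it, and any other byte is simply ignored.

```python
WHITESPACE = b" \t\r\n"
WORD_ENDERS = b":;?.!"


def word_lengths(data: bytes) -> list[int]:
    """Return the length of every word found in data."""
    lengths = []
    current = 0
    for byte in data:
        if byte in WHITESPACE or byte in WORD_ENDERS:
            # Assumption: the ending character is not part of the word.
            if current:
                lengths.append(current)
            current = 0
        elif 33 <= byte <= 126:
            current += 1
        # Assumption: any other byte is ignored, as in the default text mode.
    if current:
        lengths.append(current)
    return lengths


lengths = word_lengths(b"Hello, how are you?I am well.")
# Number of words, longest word, shortest word.
print(len(lengths), max(lengths), min(lengths))  # 7 6 1
```

    Note that "Hello," counts as six characters here, because the comma is
    not one of the characters that ends a word.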


Longest word: #### chars.

    Using the above definition of a word, this is the longest word found in the
    file.


Shortest word: #### chars.

    Again using the above definition of a word, this is the shortest word found
    in the file.


Average word length: #### chars.

    The average size of each word in characters.


Sentences: ####.

    A sentence is a group of words ended with either '.' or '?' or '!' or ';'
    or ':' AND followed by a whitespace character or end of file.

    So the text in quotes, but not counting the quotes:

    "Hello, how are you?
    I am well, thank you."

    is two sentences, while:

    "Hello, how are you?I am well, thank you."

    is only one.

    Using this definition, this value is the number of sentences found in the
    file.
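
    A minimal Python sketch of this sentence rule, checked against the two
    quoted examples above.  It is not TI's code, and it does not check that
    any words actually come before the ending character, which is a
    simplification.

```python
SENTENCE_ENDERS = b".?!;:"
WHITESPACE = b" \t\r\n"


def count_sentences(data: bytes) -> int:
    """Count ending characters followed by whitespace or the end of data."""
    count = 0
    for index, byte in enumerate(data):
        if byte in SENTENCE_ENDERS:
            at_end = index + 1 == len(data)
            if at_end or data[index + 1] in WHITESPACE:
                count += 1
    return count


print(count_sentences(b"Hello, how are you?\nI am well, thank you."))  # 2
print(count_sentences(b"Hello, how are you?I am well, thank you."))    # 1
```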


Words per sentence: ####.

    Using the above definitions of words and sentences, this value is the
    average number of words in each sentence.


Long words (> @ chars): ####.

    The '@' is the length of a 'long word', 9 by default, or set with the /F
    switch.

    The #### value is how many words longer than @ chars were found in the
    file.


Fog Index: ####.

    See section six for a full explanation of the fog index.


Long lines (> @ chars): ####.

    The @ is the length of a 'long' line, 80 characters by default, or set
    with the /L switch.

    The #### value is how many lines there are in the file which are greater 
    than @ characters long.  Useful to see how many lines are wider than the
    screen, for instance.


Percentage of text: #### %

    This is an indication of how much 'text' is in the file.

    'Text' includes: tabs, new lines and any character with an ASCII value
    between 32 and 126, inclusive.
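
    A short Python sketch of this percentage (an illustration, not TI's
    code).  Treating both the carriage return (13) and line feed (10) as
    'new lines' is an assumption, made because DOS text files end each line
    with both.

```python
def percentage_of_text(data: bytes) -> float:
    """Percentage of bytes that are tabs, new lines or ASCII 32 to 126."""
    if not data:
        return 0.0
    # 9 = tab, 10 = line feed, 13 = carriage return (assumed to be 'text').
    text = sum(1 for byte in data if byte in (9, 10, 13) or 32 <= byte <= 126)
    return 100.0 * text / len(data)


# Five text bytes and one zero byte.
print(f"{percentage_of_text(b'abc' + bytes([13, 10, 0])):.2f}")  # 83.33
```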

    

5. COMMAND LINE SWITCHES:
-------------------------

TI has several command line switches which give you more control over what
information TI displays.  You may use any combination of these switches, in
any order (as long as you separate them with a space).  Simply add the
switch(es) after the program name on the command line, e.g. to get help, you
can type:

TI /?

You can use either the '/' or the '-' character in front of the switches,
whichever you like best; they both do the same thing.  You can also type the
switches in either upper or lower case.

Here is a list of all the command line switches TI accepts:

-?              Bring up a help screen showing the command line switches.

FILENAME        You must specify a filename to use.  You only need to specify
                a path if FILENAME is in a different directory to the current
                working directory.

-A#             Show how many times each letter of the alphabet occurs in the
                specified text file.  You may optionally specify a character
                in place of the #, and the program will tell you how often it
                appears.  If it is a letter, it will also say how often it is
                upper/lower case.

-C#             Show how many times each letter of the alphabet occurs in the
                specified text file.  You may optionally specify an ASCII
                value in place of the #, and the program will tell you how
                often that character appears in the file.  If it is a letter,
                it will also say how often it is upper/lower case.

-F#             Specify the number of characters there are in a 'long' word.
                This is used in calculating the fog index.  If this switch is
                not used the value will be 9. If it is used you must specify a
                number after the 'F'.

-L#             Display lengths of lines in the file, # is an optional value
                specifying how long a 'long' line is.

-O:name         OUTPUT switch - Prints output to both the screen and a file.
                The filename is optional.  If no filename is given, the input
                name with the extension '.ti' will be used, so if the input
                file is 'myfile.txt' the default output file will be
                'myfile.ti'.  If the output file already exists, the original
                file will be renamed with a '.ti!' extension ('myfile.ti'
                would be backed up to 'myfile.ti!').  This switch exists
                because the method I have used to output text to the screen
                is not compatible with redirecting output with '>', so
                'ti myfile.txt > output.txt' will only give you an empty
                file named output.txt.

-P#             The number of lines per page to use when calculating how many
                pages long the file is (the default value is 55 lines per
                page).

-T              Whether to assume that the file is a text file or not.  'Text'
                in this case means tabs, new lines and any character with an
                ASCII value between 32 and 126 (inclusive).  This switch
                affects how the program treats characters which do *not* count
                as text.  By default, if you do not use this switch, the
                program assumes the input file is to be treated as a text
                file, and any characters not fitting the definition of text
                are ignored.  Using this switch causes these characters to be
                counted when calculating word, sentence and line lengths and
                averages.

-W              Show how many words of each length (1 to 100 characters)
                there are in the document.  All words longer than 99
                characters are displayed as 100+ length words.
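
The rules above (either '/' or '-' in front, upper or lower case, any order)
can be sketched in Python like this.  It is only an illustration of how such
a command line might be split up, not how TI itself does it; the function
name is hypothetical and 'report.ti' is a made-up output name.

```python
from typing import Optional


def parse_args(args: list[str]) -> tuple[Optional[str], dict[str, str]]:
    """Split arguments into a filename and a {switch letter: value} dict."""
    filename = None
    switches = {}
    for arg in args:
        if len(arg) > 1 and arg[0] in "/-":
            # The switch letter is case-insensitive; its value is kept as
            # typed, since -A takes a character whose case may matter.
            letter = arg[1].upper()
            value = arg[2:]
            if letter == "O" and value.startswith(":"):
                value = value[1:]  # -O:name carries its name after a colon
            switches[letter] = value
        else:
            filename = arg
    return filename, switches


print(parse_args(["ti.txt", "/w", "-F12", "/o:report.ti"]))
# ('ti.txt', {'W': '', 'F': '12', 'O': 'report.ti'})
```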



6. THE FOG INDEX
----------------

The Fog Index is calculated as: 0.4 * (W/S + L*100/W)
where W = number of words, S = number of sentences and L = number of long
words.

The fog index is approximately equal to the number of years of schooling you
would need to read the document without too much trouble, so a document with
a fog index of 5 should be readable by someone in grade 5, while one with a
fog index of 12 should be readable by someone with a high school education.

The algorithm I have used (above) was adapted from Robert Gunning's 1952 work
by J.R. Ferguson in 1996, and my thanks go to him.
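
As a worked example, here is the formula in Python, reading the first term as
words per sentence (W/S), as in Gunning's formula.  The function names are
made up for the example, and the second, truncating version is a guess rather
than anything documented: it happens to reproduce the 6.40 shown in the
sample output in section seven, which suggests the program may be using
whole-number division.

```python
def fog_index(words: int, sentences: int, long_words: int) -> float:
    """0.4 * (W/S + L*100/W), using ordinary division."""
    if words == 0 or sentences == 0:
        return 0.0
    return 0.4 * (words / sentences + long_words * 100 / words)


def fog_index_truncated(words: int, sentences: int, long_words: int) -> float:
    """Same formula with both divisions truncated (an assumption)."""
    if words == 0 or sentences == 0:
        return 0.0
    return 0.4 * (words // sentences + long_words * 100 // words)


# W, S and L taken from the sample output in section seven.
print(f"{fog_index(3281, 236, 131):.2f}")            # 7.16
print(f"{fog_index_truncated(3281, 236, 131):.2f}")  # 6.40
```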



7. SAMPLE OUTPUT OF TI
----------------------

Below is the output which you would get, if you typed "ti ti.txt /w /a /l" at
the command prompt (In actual fact, I got this output by using the /O switch
and pasting the output from the output file into this text file).  The
difference between the screen and output file text is only that the text
displayed on the screen will be more colourful :)


Text Info, version 1.2 by Quentin Christensen.    E-Mail: ogo1@mynx.wow.aust.com
Analysis of file: ti.txt    (Assumed text file).

       Size of file:   20834 bytes                        Words:    3281
         Whitespace:    5873                       Longest word:      44 chars
              Lines:     519                      Shortest word:       1 chars
        Blank lines:     165                Average word length:    4.14 chars
      Shortest line:       4 chars                    Sentences:     236
       Longest line:      80 chars           Words per sentence:   13.90
Average line length:   55.93 chars      Long words (>  9 chars):     131
Average with blanks:   38.15 chars                    Fog index:    6.40
    Number of pages:    9.42            Long lines (> 80 chars):       0
     Lines per page:      55                 Percentage of text:  100.00 %

There are 12257 letters in the file (4486 vowels and 7771 consonants):
    920 A's,    153 B's,    498 C's,    372 D's,   1436 E's,    323 F's,
    247 G's,    590 H's,    944 I's,     10 J's,     63 K's,    676 L's,
    278 M's,    846 N's,    834 O's,    266 P's,     13 Q's,    642 R's,
    877 S's,   1110 T's,    404 U's,    106 V's,    323 W's,     87 X's,
    239 Y's,     19 Z's.

Number of letters in word (LW = Letter word):
   293  1 LW's,   684  2 LW's,   599  3 LW's,   596  4 LW's,   368  5 LW's,
   221  6 LW's,   198  7 LW's,   110  8 LW's,    81  9 LW's,    64 10 LW's,
    39 11 LW's,    12 12 LW's,     2 13 LW's,     3 14 LW's,     2 16 LW's,
     1 19 LW,       1 22 LW,       1 25 LW,       1 29 LW,       1 32 LW,
     1 33 LW,       1 35 LW,       1 43 LW,       1 44 LW.

Number of characters per line (CL = character line):
  165   0 CL's,    1   4 CL,      1   5 CL,      2   6 CL's,    1   7 CL,
    4   9 CL's,    1  10 CL,      7  11 CL's,    2  12 CL's,    2  13 CL's,
    2  15 CL's,    6  16 CL's,    2  17 CL's,    1  18 CL,      2  19 CL's,
    2  20 CL's,    1  21 CL,      4  22 CL's,    4  23 CL's,    7  24 CL's,
   10  25 CL's,    4  26 CL's,    2  28 CL's,    4  29 CL's,    3  30 CL's,
    8  32 CL's,    4  33 CL's,    6  34 CL's,    2  35 CL's,    1  37 CL,
    3  38 CL's,    3  39 CL's,    2  40 CL's,    1  41 CL,      1  42 CL,
    2  43 CL's,    2  44 CL's,    8  45 CL's,    3  46 CL's,    1  47 CL,
    2  48 CL's,    3  49 CL's,    3  50 CL's,    5  52 CL's,    3  53 CL's,
    1  54 CL,      3  55 CL's,    1  56 CL,      4  57 CL's,    1  58 CL,
    2  59 CL's,    2  60 CL's,    2  61 CL's,    1  62 CL,      4  64 CL's,
    2  65 CL's,    3  66 CL's,    5  67 CL's,    1  68 CL,      4  69 CL's,
    6  70 CL's,    6  71 CL's,   17  72 CL's,   24  73 CL's,   20  74 CL's,
   29  75 CL's,   27  76 CL's,   29  77 CL's,   16  78 CL's,    4  79 CL's,
    2  80 CL's.


As you can see, TI gives you a lot of information, which is hopefully
useful, although my housemate, James, said I could quote him as saying
"That is quite possibly the most useless program I have ever seen" while he
laughed very hard.  Well, be that as it may, you've downloaded it, and I'd
be interested to know if you agree with him, or if you do in fact find it
useful :)



8. THE PATH
-----------

I would recommend putting TI somewhere in the path, since then you just need
to type 'TI' when you want to run it, rather than 'c:\ti\ti' or whatever.

The path statement should be in your autoexec.bat file, in the root directory
of your startup drive (i.e. c:\).  If it's not and you'd like to add a path
statement (which is pretty useful - every time you type something on the
command line, the computer looks in the current directory and then in the
path for a program with that name), add this line somewhere in the
autoexec.bat file in the root directory of the c: drive:

PATH=c:\dos;c:\path;c:\mydir

The paths above (c:\dos, c:\path and c:\mydir) can be replaced by any paths
to programs you use often.  This means that programs in these directories can
be run from any directory on the computer, although for some programs you
have to actually be in their directory for them to work properly.
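
To make the search order concrete, here is a small Python sketch of the
lookup described above: the current directory first, then each directory in
the path in turn.  It is a simplified model rather than what the command
interpreter really runs, the function name is hypothetical, and trying the
extensions in the order .COM, .EXE, .BAT is the usual DOS behaviour as I
understand it.

```python
import os
from typing import Optional


def find_program(name: str, current_dir: str, path_value: str) -> Optional[str]:
    """Return the first matching program file, or None if there is none."""
    # The current directory is searched before the path directories.
    directories = [current_dir] + [d for d in path_value.split(";") if d]
    for directory in directories:
        for extension in (".COM", ".EXE", ".BAT"):
            candidate = os.path.join(directory, name + extension)
            if os.path.isfile(candidate):
                return candidate
    return None


# Prints the location of TI if it is in one of these directories, else None.
print(find_program("TI", ".", r"c:\dos;c:\path;c:\mydir"))
```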



9. REVISION HISTORY
-------------------

1.2  This release, 24th July 1999
     - Added display of:
           long lines and percent of text in file.
     - Changed command switches:
           -L# became -A#
           -Oname became -O:name
     - Added command switches:
           -L#     Display length of each line in the file, if # switch used,
                   lines greater than # characters long are counted as 'long'.
           -C#     Same as -A# switch, except # is a numerical ASCII value, so
                   'characters' such as newlines (ASCII value 10) can be more
                   easily counted.
           -T      Makes the program assume 'non-text' input and count every
                   character towards totals and averages.
     - Fixed some of the grammar / spelling errors in TI.TXT file (this one).
     - More bug fixes :).


1.1  Second release, 11th February 1999:
     - Added display of:
           blank lines, longest line, shortest line, average line length,
           number of pages and lines per page
     - Added command switches:
           -B      Don't count blank lines when calculating shortest and
                   average line length.
           -P#     count number of pages using # lines per page.
           -Oname  Print program output to a file (name) as well as standard
                   output.  Will backup 'name' if it already exists, if 'name'
                   is not given, will output to inputname.ti.
     - Improved wording of output (eg 'Average word: ##' changed to
       'Average word length: ## chars').
     - Program will pause displaying output every screen.
     - Several (alright, many :) bug fixes, including fixing the number of
       lines in the analysed text file which TI would incorrectly report as
       being one less than it really was (after all I said about making it
       accurate...  well, no one's perfect :)


1.0  The original program, released 11th November 1998:
     - Displayed:
           size of file, whitespace, characters, sentences, words, 
           words per sentence, longest word, lines, shortest word, 
           Long words, Average word length and fog index.
     - Command switches:
           -W      display count of length of words.
           -L#     display letter information (number of each type of letter,
                   or count of particular character).
           -F#     Count words > # chars long as 'long words' when calculating
                   fog index.
           -?      Display on-screen help.



10. DISCLAIMER, DISTRIBUTION AND TERMS OF USE
--------------------------------------------

TI is email-ware.  That means that you can freely distribute the program, and
all I ask is an email to let me know what you think: whether you like it or
not and, if not, what you don't like about it, how I can improve it etc...

Thanks go to Joergen Ibsen, who wrote "APACK", the utility which I packed all
the exe files with, and who asked for a mention in return, so Joergen, here
it is :) and thank you.

DISCLAIMER:  "TI" comes as-is.  I take no responsibility for any adverse
effects caused by the program.  I.e. I've done my best to ensure that TI does
what it's supposed to, but if it doesn't, it's not my responsibility.



11. PROGRAM SUPPORT AND CONTACT INFORMATION
-------------------------------------------

PLEASE NOTE: While I have no intention of changing either my email address or
the URL of the web page, they may change.  If you cannot find my page because
it is not there, I try to keep my current web pages listed on, at least, the
Altavista search engine (http://www.altavista.yellowpages.com.au/).  I don't
have any intention of changing my ICQ number and see no reason why this should
change in the foreseeable future.

I will answer any and all email (or ICQ messages) about TI and will help you
in any way that I can.  If you find any bugs in the program, please let me
know so that I can fix them, and if you have any suggestions for improvements
to the program I'd also love to hear from you.

Regards

Quentin Christensen.

E-Mail address: ogo1@mynx.wow.aust.com  
ICQ number:     12889482
WWW URL:  http://www.ozemail.com.au/~mynx/quentisl/programs/ti.htm