Suggestions for Using Instant Text
It's a whole lot easier to compile a Glossary than to explain all there is to Glossary structure and how to compile one. Instant Text has an excellent Help section explaining all aspects of the program and it is best to read Help on Glossary structure and compilation soon after trying one out (if not before!).
My goal here is to discuss some values one can use for word and phrase frequency when compiling a Glossary. I hope other IT users will join in with their experiences in building glossaries.
You choose Glossary from the IT menu, then Multiple Compilation. You then click on the boxes which appear on the left of the screen, one after the other: Select a Folder (the doctor's folder), Mark the Files (see them whiz by), Extract the text (more entertainment), and then Compilation.
You are then shown the Compilation screen where you set the values which IT will use in compiling the glossary.
Word Section To Include All Words?: No, I leave it blank. I don't want every proper noun in the reports to be put into the Glossary.
Minimum Word Frequency (MWF): A number that depends on the size of the file. An MWF of 5 is good for small files. Use 10 or more for large files (more than 1 MB).
Maximum Words per Phrase (MWPP): Usually a number such as 6, 7, or 8.
Minimum Word Frequency (or MWF) is the minimum number of times a word must occur in the target text (the doctor's files) for it to be included in the Words section of the new glossary.
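To make the idea of a frequency cutoff concrete, here is a minimal Python sketch. It is not how Instant Text works internally; the function name words_meeting_mwf and the simple tokenizing rule are my own assumptions, chosen only to show a minimum frequency filtering a word list.

```python
import re
from collections import Counter

def words_meeting_mwf(text, min_word_frequency):
    """Return {word: count} for words seen at least min_word_frequency times.

    Hypothetical helper for illustration; Instant Text's own rules for
    splitting text into words are not documented here.
    """
    # Assumption: a word is a run of letters or apostrophes, case ignored.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {w: n for w, n in counts.items() if n >= min_word_frequency}

sample = (
    "The patient was seen today. The patient denies pain. "
    "The patient was advised to return."
)
kept = words_meeting_mwf(sample, 3)
print(kept)
```

Lowering the cutoff from 3 to 2 in this sample lets "was" in as well, which is the same effect, on a tiny scale, as lowering the MWF on a doctor's folder.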
Naturally the size of the target text affects the value you should use for MWF. Compiling 1 MB of text with a minimum word frequency of 5 is likely to yield far more words in the new glossary than 100 KB of text will. In general you want a lot of words in your new glossary but not every word. With target text ranging from 150 to 200 files (average around 800 MB),
If the MWF is 3, then all words occurring 3 times or more will appear in the Phrases section of the new glossary (as well as in the Words section). It seems to me that low MWF values such as 2 and 3 generate too many phrases, and the Phrase Advisory gets cluttered.
Maximum Words per Phrase (or MWPP) defines the length of phrases. If you set the MWPP high (9 or 10), you get more phrases than you can use. I generally set MWPP at 6, 7 or 8.
I'm still experimenting and am not sure I have optimum results. (Note: a comma or other punctuation represents the end of a phrase, to Instant Text, so few phrases will have, say, 12 words.)
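The interplay between MWPP and punctuation can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function candidate_phrases is hypothetical, the list of punctuation marks is a guess, and applying a frequency cutoff to phrases is a simplification, not a description of Instant Text's actual compiler.

```python
import re
from collections import Counter

def candidate_phrases(text, max_words_per_phrase, min_frequency):
    """Count word runs of 2..max_words_per_phrase words within one segment.

    Hypothetical sketch: a segment ends at punctuation, so no phrase
    ever spans a comma or a period.
    """
    counts = Counter()
    # Assumption: each of these marks ends a phrase, as a comma does.
    for segment in re.split(r"[.,;:!?()]", text.lower()):
        words = segment.split()
        for start in range(len(words)):
            longest = min(max_words_per_phrase, len(words) - start)
            for length in range(2, longest + 1):
                counts[" ".join(words[start:start + length])] += 1
    return {p: n for p, n in counts.items() if n >= min_frequency}

sample = (
    "The patient was seen today, and the patient was seen again. "
    "The patient was seen in clinic."
)
short = candidate_phrases(sample, 2, 3)
longer = candidate_phrases(sample, 4, 3)
print(sorted(short))
print(sorted(longer))
```

With the limit at 2 only the two-word phrases survive; raising it to 4 adds the longer runs such as "the patient was seen", which is the kind of entry that makes Continuations possible. Because segments stop at punctuation, very long phrases stay rare however high the limit is set.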
Jean Ichbiah of IT says that you get better Continuations with an MWPP of 7, 8 or 9 than with 4 or 5. You also get more Continuations if you compile 1 MB of text rather than 100 KB. Since Continuations save so many keystrokes, both these points are worth keeping in mind when setting MWPP.
Instant Text also gives you two other statistics with regard to Compilation: the number of phrase groups and the average number of phrases per group.
The Phrases section of a Glossary is alphabetized by 2-letter groups. For example, the "aa" group containing "as always" comes before the "as" group containing "again seen".
You don't want the average number of entries in a phrase group too high, because you want all the phrases to appear in your Advisories. If you have a Phrase Group average of 9 and only 4 lines displayed in your Advisories, you will not see 5 of those phrases right off the bat. (Note: You can also use Move to easily reorder the entries in a phrase group.)
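The arithmetic behind phrase groups can be tried out with a small Python sketch. I am assuming here that a phrase's 2-letter group is formed from the initials of its first two words, which is what the "as always" example suggests; the function names are my own and are not part of Instant Text.

```python
from collections import defaultdict

def group_phrases(phrases):
    """Group phrases under the initials of their first two words.

    Assumption for illustration: "as always" falls in group "aa".
    """
    groups = defaultdict(list)
    for phrase in phrases:
        words = phrase.lower().split()
        if len(words) >= 2:
            groups[words[0][0] + words[1][0]].append(phrase)
    # Sort by group key so "aa" comes before "as".
    return dict(sorted(groups.items()))

def average_group_size(groups):
    """Average number of phrases per group."""
    return sum(len(g) for g in groups.values()) / len(groups)

def hidden_entries(group_size, advisory_lines):
    """Entries in a group that do not fit in the visible advisory lines."""
    return max(0, group_size - advisory_lines)

phrases = ["as always", "again seen", "as soon as possible", "and she", "as above"]
groups = group_phrases(phrases)
print(groups)
print(average_group_size(groups))
print(hidden_entries(9, 4))
```

The last line reproduces the case described above: a group of 9 phrases with 4 advisory lines leaves 5 phrases out of sight until you scroll or reorder.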
The figures below give some examples from glossaries I've compiled.
A1, A2, A3, A4 refer to the same source text. Similarly with B. These source texts averaged about 600KB to 1 MB.
For example, for A1, minimum word frequency was set at 14 and maximum words per phrase was 6. This yielded a glossary with 1321 different words and 2484 phrases. There were 376 phrase groups with an average of 6.6 phrases per group. The resulting glossary file was 126KB. A4 yielded better results with parameters of 6 and 8 - more words (2040), more phrases (4231), but perhaps a bit too many phrases per phrase group (9.4).
Similarly, B2 compiled with a minimum word frequency of 15 is too small with only 425 words and 483 phrases. B1 and B4 are likely to be more useful glossaries with over 2400 phrases. Continuations are likely to be richer for B4 because of the greater phrase length.
I hope these comments are helpful if you are using Instant Text, or are thinking of trying the program.
— June 1997