Usage Tips from Instant Text Users

Suggestions for Using Instant Text
Part 3 - Setting Values in Compiling a Glossary

by Jon Knowles

It's a whole lot easier to compile a Glossary than to explain all there is to Glossary structure and how to compile one. Instant Text has an excellent Help section explaining all aspects of the program and it is best to read Help on Glossary structure and compilation soon after trying one out (if not before!).

My goal here is to discuss some values one can use for word and phrase frequency when compiling a Glossary. I hope other IT users will join in with their experiences in building glossaries.

To compile a glossary for a doctor's work:

You choose Glossary from the IT menu, then Multiple Compilation. You then click on the boxes which appear on the left of the screen, one after the other: Select a Folder (the doctor's folder), Mark the Files (see them whiz by), Extract the text (more entertainment), and then Compilation.

You are then shown the Compilation screen where you set the values which IT will use in compiling the glossary.


First Choice: Word Section To Include All Words?

No, I leave it blank. I don't want every proper noun in the reports to be put into the Glossary.


Second Choice: Minimum Word Frequency (MWF)

A number that depends on the size of the file. A MWF of 5 is good for small files. Use 10 or more for large files (more than 1MB).


Third Choice: Maximum Words per Phrase (MWPP)

Usually a number such as 6, 7, or 8.


The second and third choices determine how many words and phrases will appear in your new glossary.

Minimum Word Frequency (or MWF) refers to the minimum number of times a word must be in the target text/Doctor's files so it will be included in the Words section of the new glossary.
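To make the idea of a frequency cutoff concrete, here is a minimal sketch in Python. It is not how Instant Text works internally (that is not described here); the function name words_meeting_mwf and the sample text are made up for illustration, and the way words are split is a simplifying assumption.

```python
import re
from collections import Counter

def words_meeting_mwf(text, min_word_frequency):
    """Return {word: count} for words seen at least min_word_frequency times.

    Hypothetical helper for illustration only, not Instant Text's own logic.
    """
    # Simplification: lowercase everything and treat runs of letters
    # (and apostrophes) as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {w: n for w, n in counts.items() if n >= min_word_frequency}

sample = (
    "The patient is doing well. The patient denies chest pain. "
    "The patient was seen today. Chest pain has resolved."
)

# A higher cutoff keeps fewer words; a lower one lets more through.
print(words_meeting_mwf(sample, 3))
print(words_meeting_mwf(sample, 2))
```

With a cutoff of 3 only "the" and "patient" survive in this tiny sample; dropping the cutoff to 2 also lets in "chest" and "pain". The same effect, on a much larger scale, is what you are tuning with MWF.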

Naturally the size of the target text affects the value you should use for MWF. Compiling 1 MB of text with a minimum word frequency of 5 is likely to yield far more words in the new glossary than compiling 100 KB of text will. In general you want a lot of words in your new glossary but not every word. With target text ranging from 150 to 200 files (average around 800 KB), I have had best results with a MWF of 4, 5 or 6.

If the MWF is 3, then everything occurring 3 times or more is picked up, and it shows in the Phrases section of the new glossary as well as in the Words section. It seems to me that low values of MWF like 2 and 3 generate too many phrases, and the Phrase Advisory gets cluttered up.

Maximum Words per Phrase (or MWPP) defines the length of phrases. If you set the MWPP high (9 or 10), you get more phrases than you can use. I generally set MWPP at 6, 7 or 8.

I'm still experimenting and am not sure I have optimum results. (Note: a comma or other punctuation represents the end of a phrase, to Instant Text, so few phrases will have, say, 12 words.)
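The interplay between MWPP and punctuation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function candidate_phrases is hypothetical, phrases are taken to be runs of 2 or more words, and the list of punctuation marks is my own guess at what "ends a phrase". Instant Text's actual compiler may differ on all of these points.

```python
import re
from collections import Counter

def candidate_phrases(text, max_words_per_phrase, min_frequency):
    """Count word runs that stay inside one punctuation-free segment.

    Hypothetical helper for illustration only, not Instant Text's own logic.
    """
    counts = Counter()
    # Punctuation ends a phrase, so cut the text into segments first.
    for segment in re.split(r"[.,;:!?()]+", text.lower()):
        words = segment.split()
        for start in range(len(words)):
            # Assumption: a phrase has at least 2 words and at most
            # max_words_per_phrase words.
            longest = min(max_words_per_phrase, len(words) - start)
            for length in range(2, longest + 1):
                counts[" ".join(words[start:start + length])] += 1
    return {p: n for p, n in counts.items() if n >= min_frequency}

sample = (
    "The patient was seen today, and the patient was seen again. "
    "The patient was seen in the office."
)

short = candidate_phrases(sample, 3, 3)   # phrases of up to 3 words
longer = candidate_phrases(sample, 4, 3)  # phrases of up to 4 words
print(sorted(short))
print(sorted(longer))
```

Raising the maximum from 3 to 4 adds "the patient was seen" to the result, which is the kind of longer entry that feeds Continuations. Because of the comma, no phrase such as "today and" is ever formed, which is why very long phrases are rare no matter how high MWPP is set.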

Jean Ichbiah of IT says that you get better Continuations with a MWPP of 7, 8 or 9 than with 4 or 5. You also get more Continuations if you compile 1 MB of text than if you compile 100 KB. Since Continuations give you so much savings in keystrokes, both these points are worth keeping in mind when setting MWPP.

Instant Text also gives you two other statistics with regard to Compilation: the number of phrase groups and the average number of phrases per group.

The Phrases section of a Glossary is alphabetized by 2-letter groups. For example, the "aa" group, containing the entry aa = "as always", comes before the "as" group, containing the entry as = "again seen".

You don't want the average number of entries in a phrase group too high, because you want all the phrases to appear in your Advisories. If you have a phrase group average of 9 and only 4 lines displayed in your Advisories, you will not see 5 of the phrases in a typical group right off the bat. (Note: You can also use Move to easily reorder the entries in a phrase group.)
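The grouping and the "how many entries stay out of sight" arithmetic can be sketched as follows. The key rule used here (first letters of the first two words of a phrase) is an assumption that fits the "as always" and "again seen" example; the names group_key, group_phrases and hidden_entries are hypothetical and are not part of Instant Text.

```python
from collections import defaultdict

def group_key(phrase):
    """Two-letter key from the initials of the first two words (assumed rule)."""
    words = phrase.lower().split()
    return "".join(w[0] for w in words[:2])

def group_phrases(phrases):
    """Collect phrases under their two-letter key, keys in alphabetical order."""
    groups = defaultdict(list)
    for phrase in phrases:
        groups[group_key(phrase)].append(phrase)
    return dict(sorted(groups.items()))

def hidden_entries(group_size, advisory_lines):
    """How many entries of a group do not fit in the displayed lines."""
    return max(0, group_size - advisory_lines)

phrases = ["again seen", "as always", "as a result", "at all", "after a"]
groups = group_phrases(phrases)
for key, entries in groups.items():
    print(key, entries, "hidden with 4 lines:", hidden_entries(len(entries), 4))

# The case from the text: a group of 9 entries with 4 lines displayed.
print(hidden_entries(9, 4))
```

In this toy list the "aa" group already has four entries, so it would just fill a 4-line Advisory, while a group of 9 would leave 5 entries out of view until you type further.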

Glossary Examples

The figures below give some examples from glossaries I've compiled.

A1, A2, A3, A4 refer to the same source text. Similarly with B. These source texts averaged about 600KB to 1 MB.

Name   MWF   MWPP   Words   Phrases   Groups   Average   Size (KB)
A1      14      6    1321      2484      376       6.6         126
A2      12      4    1323      1821      376       4.8          92
A3       9      3    1592      1810      405       4.4          92
A4       6      8    2040      4231      446       9.4         226

For example, for A1, minimum word frequency was set at 14 and maximum words per phrase was 6. This yielded a glossary with 1321 different words and 2484 phrases. There were 376 phrase groups with an average of 6.6 phrases per group. The resulting glossary file was 126KB. A4 yielded better results with parameters of 6 and 8 - more words (2040), more phrases (4231), but perhaps a bit too many phrases per phrase group (9.4).

Name   MWF   MWPP   Words   Phrases   Groups   Average   Size (KB)
B1       8      4    1718      2485      413       5.9         124
B2      15      8     425       483      168       2.8          26
B3       5      6    1088      2005      384       5.2          99
B4       4      9    1347      2418      418       5.7         124

Similarly, B2 compiled with a minimum word frequency of 15 is too small with only 425 words and 483 phrases. B1 and B4 are likely to be more useful glossaries with over 2400 phrases. Continuations are likely to be richer for B4 because of the greater phrase length.
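If you keep figures like these for your own compilations, a few lines of Python make it easy to compare runs. The sketch below simply re-enters the numbers from the two tables above and divides Phrases by Groups; the reported Average figures are close to that ratio but not always identical, so treat the ratio as a rough cross-check rather than as the exact formula Instant Text uses. The variable and function names are my own.

```python
# (name, MWF, MWPP, words, phrases, groups, reported average, size in KB)
runs = [
    ("A1", 14, 6, 1321, 2484, 376, 6.6, 126),
    ("A2", 12, 4, 1323, 1821, 376, 4.8, 92),
    ("A3", 9, 3, 1592, 1810, 405, 4.4, 92),
    ("A4", 6, 8, 2040, 4231, 446, 9.4, 226),
    ("B1", 8, 4, 1718, 2485, 413, 5.9, 124),
    ("B2", 15, 8, 425, 483, 168, 2.8, 26),
    ("B3", 5, 6, 1088, 2005, 384, 5.2, 99),
    ("B4", 4, 9, 1347, 2418, 418, 5.7, 124),
]

def phrases_per_group(phrases, groups):
    """Plain ratio of phrases to phrase groups."""
    return phrases / groups

for name, mwf, mwpp, words, phrases, groups, reported, size_kb in runs:
    ratio = phrases_per_group(phrases, groups)
    print(f"{name}: MWF={mwf} MWPP={mwpp} words={words} "
          f"phrases/group={ratio:.2f} (reported {reported})")

# The run with the most phrases overall.
largest = max(runs, key=lambda run: run[4])
print("Most phrases:", largest[0])
```

A listing like this makes the trade-off easy to see at a glance: A4 has by far the most phrases, but also the highest number of phrases per group.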

I hope these comments are helpful if you are using Instant Text, or are thinking of trying the program.

Jon Knowles

— June 1997