How Realistic is the Dom Perignon Test -- Part 2

Posted by Jean Ichbiah, Tue, Nov 19, 2002, 22:13:31

During the Dom Perignon II contest, in June 2000, a discussion arose on comp.sys.palmtops.pilot about the validity of the test. Below is an excerpt of a second exchange between Gordon Walker and Arthur Hagen. This time the discussion revolves around letter frequency distribution.


From: Gordon Walker (gwalker@127.0.0.1)
Subject: Re: Dom Perignon Speed Contest - First Week Interim Results
Newsgroups: alt.comp.sys.palmtops.pilot, comp.sys.palmtops.pilot
Date: 2000/06/22

On Thu, 22 Jun 2000 07:27:37 GMT, Arthur Hagen wrote:

>But there is! Look at the letter distribution, and you see that there's
>a very high percentage of space and the letters t, e, o, n and a.

Of course there is a high percentage of these letters. These are the
most commonly used letters in the English language. From an analysis
of Dickens' "Tale of Two Cities" the top five letters used are:


E: 12.4%
T: 8.9%
A: 8.0%
O: 7.6%
N: 7.0%

A letter frequency table offered for use in decryption gives the
following values for its top six:


E: 130/1000
T: 93/1000
N: 78/1000
R: 77/1000
O: 74/1000
A: 73/1000

>Or look at the very first word, for that matter. All one-space jumps on
>a Fitaly keyboard (and this coincidence holds true for many other words
>too, to a greater or lesser degree: "chance", "error", "witness").

Again, if you stop to think, this is exactly what is expected. I know
nothing about Fitaly except that on the PalmGear page it says "The
patented FITALY key arrangement minimizes pen travel". How can you be
complaining that the keyboard minimises the pen travel when that is
the sole aim with which it was designed?!

Remember that the QWERTY keyboard's design was to place the most
commonly used keys as far away from each other as possible to prevent
jamming in the mechanism. Also it's designed for use with ten fingers,
not one stylus. As such the amount of pen travel when used on a Palm
can only be expected to be approaching worst case. Almost any other
design would be better. But to report that a keyboard created with
diametrically opposing design considerations comes out with radically
different results is to state a truism. To take it up as a fault is
incredible.

For the record, by my reckoning the test sentence can be written in
about 510 key traversals on a qwerty keyboard and in about 280
traversals on the Fitaly layout. Yes, as I did this analysis I noticed
that lots of the words appeared in runs, but that is because it was
designed that way by taking into account the letter frequency patterns
discussed above and the most commonly occurring words.
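
As a rough illustration of how such a traversal count could be made, here is a minimal Python sketch. The post does not say how the figures of 510 and 280 were obtained, so everything in it is an assumption: keys sit on an unstaggered grid, a move between two keys costs the larger of the row and column offsets (so a diagonal neighbour costs 1), and spaces and punctuation are skipped. The FITALY rows are written out from the published layout as best recalled and should be checked against the official one. The names key_positions and travel are made up for this example.

```python
# Assumed layouts, one string per keyboard row. A space marks a cell
# that holds no letter (on FITALY those cells belong to the space bars).
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
FITALY_ROWS = ["zvchwk", "fitaly", "  ne  ", "gdorsb", "qjumpx"]


def key_positions(rows):
    """Map each letter to its (row, column) cell on an unstaggered grid."""
    return {
        ch: (r, c)
        for r, row in enumerate(rows)
        for c, ch in enumerate(row)
        if ch != " "
    }


def travel(text, positions):
    """Sum the key-to-key steps needed to tap out the letters of text.

    Assumption: one step is a move to any of the eight neighbouring
    cells, so the cost of a jump is max(|row offset|, |column offset|).
    Characters that are not on the layout are ignored.
    """
    letters = [ch for ch in text.lower() if ch in positions]
    total = 0
    for a, b in zip(letters, letters[1:]):
        (r1, c1), (r2, c2) = positions[a], positions[b]
        total += max(abs(r1 - r2), abs(c1 - c2))
    return total


QWERTY = key_positions(QWERTY_ROWS)
FITALY = key_positions(FITALY_ROWS)

for word in ["error", "chance", "witness"]:
    print(word, travel(word, QWERTY), travel(word, FITALY))
```

Under these assumptions "error", one of the words Arthur Hagen cites, costs 3 steps on the FITALY grid and 11 on the QWERTY grid. A different distance measure, or proper handling of the space bars, would change the absolute numbers.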

>Of course this is also due to the FitalyStamp layout being more
>efficient, and *designed* to make it easier to reach the keys quickly.
>But yes, there is an overweight of words that are either hard to hack
>one-by-one on a qwerty layout, or easy to hack on a FitalyStamp layout.
>Whether by chance or by design.

Provide details. You say there is an "overweight" of words that are
easy on a Fitaly keyboard. Please list these words, indicate why they
are easier and relate to us how common they are in normal English with
respect to how common they are in the test sentence.

I have done my own frequency analysis of the letters in the test; it
is as follows:

Letter  Count  %     Src1  Src2
e       17     9.34  12.4  13
t       17     9.34  8.9   9.3
o       15     8.24  7.6   7.4
a       13     7.14  8.0   7.3
n       13     7.14  7.0   7.8
s       10     5.49  6.2   6.3
h       8      4.40  6.5   3.5
i       7      3.85  6.7   7.4
y       5      2.75  2.0   1.9
c       4      2.20  2.2   3
d       4      2.20  4.6   4.4
r       4      2.20  6.1   7.7
u       4      2.20  2.7   2.7
w       4      2.20  2.3   1.6
v       3      1.65  0.8   1.3
f       2      1.10  2.2   2.8
l       1      0.55  3.6   3.5
m       1      0.55  2.5   2.5
p       1      0.55  1.6   2.7
b       0      0
g       0      0
j       0      0
k       0      0
q       0      0
x       0      0
z       0      0

You can see that there is a slight lack of 'e's against the standard
tables but apart from that the top 6 characters show a remarkable
correspondence to the standard distributions. Given that the test is
only 182 characters, while one reference is a per-1000-letter table
and the other is drawn from the entire text of a novel, this level of
correspondence is quite striking. In so small a sample it is probably
statistically meaningless to attach much significance to the other
letters, but even if you choose to, you will find that in most cases
the correspondence is quite close.
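
The percentages in the table can be rechecked from the counts with a few lines of Python. This is only a sketch built from the figures quoted in the post: the counts and the two reference columns are copied from the table, and the variable and function names are invented for the example. One thing the arithmetic brings out is that the letter counts sum to 133, so the 182 used as the divisor presumably also counts spaces and punctuation, which the post does not state. The sketch therefore prints each share both ways.

```python
# Figures copied from the table in the post.
TOTAL_CHARS = 182  # length of the test text as given in the post
counts = {
    "e": 17, "t": 17, "o": 15, "a": 13, "n": 13, "s": 10, "h": 8,
    "i": 7, "y": 5, "c": 4, "d": 4, "r": 4, "u": 4, "w": 4, "v": 3,
    "f": 2, "l": 1, "m": 1, "p": 1,
}
# Reference percentages for the six most frequent letters in the test.
src1 = {"e": 12.4, "t": 8.9, "o": 7.6, "a": 8.0, "n": 7.0, "s": 6.2}
src2 = {"e": 13.0, "t": 9.3, "o": 7.4, "a": 7.3, "n": 7.8, "s": 6.3}


def share(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


total_letters = sum(counts.values())  # letters only, no spaces

print("letter  of_chars  of_letters  src1  src2")
for letter in src1:
    of_chars = share(counts[letter], TOTAL_CHARS)
    of_letters = share(counts[letter], total_letters)
    print(f"{letter:<7} {of_chars:>8.2f}  {of_letters:>10.2f}  "
          f"{src1[letter]:>4}  {src2[letter]:>4}")
```

For 'e', 17 out of 182 gives 9.34%, the figure in the table, while 17 out of 133 letters gives about 12.8%. Which divisor is the fairer comparison depends on whether the reference tables count spaces, which the post leaves open.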

>Or take a look at the last sentence - it doesn't even make sense,
>because it lacks either punctuation or a word.

This is irrelevant unless you can demonstrate how this unusual
construction makes it easier to write on the Fitaly rather than the
more natural construction you would suggest.

In short you have brought a rather serious charge against this
competition by stating that it is rigged in favour of one product. Yet
you have no evidence whatsoever apart from some subjective statements
about "overweights" of certain words and the absence of some symbols
that you personally use quite often. Your criticisms seem to be in sum
that it is too good at what it is designed to do - reduce stylus
travel in the entry of normal English text.

[...]

--
Gordon Walker



