In Lookup.pl, our program had to look at each word in the list until it found ``cat''. On a FreeBSD 4.5 computer there are 235,881 words in the list. ``Cat'' is number 31,362. Even though ``Cat'' is near the begining of the alphabet, it will take 31,362 comparisons to find ``Cat''. Now we will look it up the way you lookup a word in a dictionary. With this algorithm it will only take 15 comparisons. Name the file ``dictionary.pl''
First we bring the word list into the array @words.
$wordlist = '/usr/share/dict/words'; open DICTIONARY, $wordlist; @words = <DICTIONARY>;
then, chomp takes the newline off the end of each word in @words.
chomp(@words);
scalar() counts the words in the list
$number = scalar(@words);
We repeatedly divide $number by 2 until it is no longer greater than 1. $log2 is the number of times we divided by 2 which is approximately the number of binary bits in $number which, in turn, is the maximum number of guesses we'll need to make.
$number = scalar(@words); $log2 = 0; $n = $number; while ($n > 1){ print "log2 = $log2\tn = $n\n"; $n = $n / 2; $log2 = $log2 + 1; }
Now we'll initialize the search with what we know at the begining. At first our window on the word list is the whole list.
$begin = 0;
The number of the first word in the list is 0. (All perl arrays start with item 0)
$end = $number - 1;
The number of the last word is 1 less than the number of words. (Because we started at 0)
$middle = int($number /2);
The number of the middle word is about 1/2 the number of words. the ``int()'' function rounds down to the next integer.
$choice = "cat";
Once again we'll lookup ``cat''.
We put the search algorithm in a subroutine ``lookup''. This is some code that we want to run over and over without having to type it again and again. Each time we run lookup, if we haven't found choice, we divide the window in two and continue to look in the half that has choice.
sub lookup { if ($choice eq lc($words[$middle])){ print "$choice is in the dictionary.\n"; last; } if ($choice gt lc($words[$middle])){ $begin = $middle; $middle = int(($middle + $end)/2); } else { $end = $middle; $middle = int(($begin + $middle)/2); } }
The new function ``lc()'' makes a string lower case.
Initialize our loop variable ``$i'' and we're off.
$i = 0; while($i < $log2 + 1){ print "$i\t\t", "$begin $words[$begin] \t", "$middle $words[$middle] \t", "$end $words[$end]\n"; lookup; $i++; }
On the FreeBSD computer it looks like this:
alonzo % perl dictionary.pl log2 = 0 n = 235881 log2 = 1 n = 117940.5 log2 = 2 n = 58970.25 log2 = 3 n = 29485.125 log2 = 4 n = 14742.5625 log2 = 5 n = 7371.28125 log2 = 6 n = 3685.640625 log2 = 7 n = 1842.8203125 log2 = 8 n = 921.41015625 log2 = 9 n = 460.705078125 log2 = 10 n = 230.3525390625 log2 = 11 n = 115.17626953125 log2 = 12 n = 57.588134765625 log2 = 13 n = 28.7940673828125 log2 = 14 n = 14.3970336914062 log2 = 15 n = 7.19851684570312 log2 = 16 n = 3.59925842285156 log2 = 17 n = 1.79962921142578 There are 235881 words in the word list. There are 18 levels in the binary tree. 0 0 A 117940 modificatory 235880 Zyzzogeton 1 0 A 58970 eaglestone 117940 modificatory 2 0 A 29485 Canarsee 58970 eaglestone 3 29485 Canarsee 44227 counterimitate 58970 eaglestone 4 29485 Canarsee 36856 Citrullus 44227 counterimitate 5 29485 Canarsee 33170 chack 36856 Citrullus 6 29485 Canarsee 31327 castra 33170 chack 7 31327 castra 32248 cellaring 33170 chack 8 31327 castra 31787 Catostomus 32248 cellaring 9 31327 castra 31557 catchiness 31787 Catostomus 10 31327 castra 31442 catalogist 31557 catchines 11 31327 castra 31384 cataclasmic 31442 catalogist 12 31327 castra 31355 casuistic 31384 cataclasmic 13 31355 casuistic 31369 catabiotic 31384 cataclasmic 14 31355 casuistic 31362 Cat 31369 catabiotic cat is in the dictionary.