next up previous contents index
Next: Hangman Up: Introduction to Perl Previous: Vocabulary   Contents   Index

Binary Search

In Lookup.pl, our program had to look at each word in the list until it found ``cat''. On a FreeBSD 4.5 computer there are 235,881 words in the list. ``Cat'' is number 31,362. Even though ``Cat'' is near the begining of the alphabet, it will take 31,362 comparisons to find ``Cat''. Now we will look it up the way you lookup a word in a dictionary. With this algorithm it will only take 15 comparisons. Name the file ``dictionary.pl''

First we bring the word list into the array @words.

    $wordlist = '/usr/share/dict/words';
    open DICTIONARY, $wordlist;
    @words = <DICTIONARY>;

then, chomp takes the newline off the end of each word in @words.

    chomp(@words);

scalar() counts the words in the list

    $number = scalar(@words);

We repeatedly divide $number by 2 until it is no longer greater than 1. $log2 is the number of times we divided by 2 which is approximately the number of binary bits in $number which, in turn, is the maximum number of guesses we'll need to make.

    $number = scalar(@words);
    $log2 = 0; 
    $n = $number;
    while ($n > 1){
    print "log2 = $log2\tn = $n\n";
    $n = $n / 2;
    $log2 = $log2 + 1;
    }

Now we'll initialize the search with what we know at the begining. At first our window on the word list is the whole list.

    $begin = 0;

The number of the first word in the list is 0. (All perl arrays start with item 0)

    $end = $number - 1;

The number of the last word is 1 less than the number of words. (Because we started at 0)

    $middle = int($number /2);

The number of the middle word is about 1/2 the number of words. the ``int()'' function rounds down to the next integer.

    $choice = "cat";

Once again we'll lookup ``cat''.

We put the search algorithm in a subroutine ``lookup''. This is some code that we want to run over and over without having to type it again and again. Each time we run lookup, if we haven't found choice, we divide the window in two and continue to look in the half that has choice.

    sub lookup {
        if ($choice eq lc($words[$middle])){
            print "$choice is in the dictionary.\n";
            last;
        } 
        if ($choice gt lc($words[$middle])){
            $begin = $middle;
            $middle = int(($middle + $end)/2);
        } 
        else {
            $end = $middle;
            $middle = int(($begin + $middle)/2);
        }
    }

The new function ``lc()'' makes a string lower case.

Initialize our loop variable ``$i'' and we're off.

    $i = 0;
    while($i < $log2 + 1){
        print "$i\t\t",
        "$begin $words[$begin] \t",
        "$middle $words[$middle] \t", 
        "$end $words[$end]\n";
        lookup;
        $i++;
    }

On the FreeBSD computer it looks like this:

    alonzo % perl dictionary.pl
    log2 = 0        n = 235881
    log2 = 1        n = 117940.5
    log2 = 2        n = 58970.25
    log2 = 3        n = 29485.125
    log2 = 4        n = 14742.5625
    log2 = 5        n = 7371.28125
    log2 = 6        n = 3685.640625
    log2 = 7        n = 1842.8203125
    log2 = 8        n = 921.41015625
    log2 = 9        n = 460.705078125
    log2 = 10       n = 230.3525390625
    log2 = 11       n = 115.17626953125
    log2 = 12       n = 57.588134765625
    log2 = 13       n = 28.7940673828125
    log2 = 14       n = 14.3970336914062
    log2 = 15       n = 7.19851684570312
    log2 = 16       n = 3.59925842285156
    log2 = 17       n = 1.79962921142578
    There are 235881 words in the word list.
    There are 18 levels in the binary tree.
    0      0 A        117940 modificatory  235880 Zyzzogeton
    1      0 A         58970 eaglestone    117940 modificatory
    2      0 A         29485 Canarsee       58970 eaglestone
    3  29485 Canarsee  44227 counterimitate 58970 eaglestone
    4  29485 Canarsee  36856 Citrullus      44227 counterimitate
    5  29485 Canarsee  33170 chack          36856 Citrullus
    6  29485 Canarsee  31327 castra         33170 chack
    7  31327 castra    32248 cellaring      33170 chack
    8  31327 castra    31787 Catostomus     32248 cellaring
    9  31327 castra    31557 catchiness     31787 Catostomus
    10 31327 castra    31442 catalogist     31557 catchines
    11 31327 castra    31384 cataclasmic    31442 catalogist
    12 31327 castra    31355 casuistic      31384 cataclasmic
    13 31355 casuistic 31369 catabiotic     31384 cataclasmic
    14 31355 casuistic 31362 Cat            31369 catabiotic
    cat is in the dictionary.


next up previous contents index
Next: Hangman Up: Introduction to Perl Previous: Vocabulary   Contents   Index
Tom Hunt 2002-06-09