CSc 250: Lecture Notes: Reviewing Lists, Dictionaries, and Files

We have already talked about lists, dictionaries, and reading/writing files. In today’s lecture, we are going to combine all of these concepts and write some example program(s).

A Simple Thesaurus

As we probably all know, a thesaurus is a book that lists words in groups of synonyms and related concepts. In this section, we’ll implement an extremely simple thesaurus. This thesaurus will only return one “similar term” for a given word. This program will read lookup requests from a user on the command line, and return the similar term.

First, let’s define how the thesaurus information will be stored. For now, we will use a simple format that we are used to by now. Each entry will be listed one per line. The initial word will be first, then there will be a vertical bar (|), followed by the similar word. For example:

dark | dim
small | tiny
happy | delighted
fast | speedy
cheap | bargain

Our program thesaurus.py will assume that a file named thesaurus-db.txt exists in the /tmp directory.
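If that file does not exist yet, a short script can create it. The following is a minimal sketch, not part of thesaurus.py itself: the helper name write_database is made up for illustration, and the sketch writes into a fresh temporary directory so that it cannot overwrite anything. To produce the file that thesaurus.py expects, pass '/tmp/thesaurus-db.txt' as the path instead.

```python
import os
import tempfile

# Sample entries in the "initial word | similar word" format described above.
entries = [
    ('dark', 'dim'),
    ('small', 'tiny'),
    ('happy', 'delighted'),
    ('fast', 'speedy'),
    ('cheap', 'bargain'),
]

def write_database(path, entries):
    """Write one 'word | similar' entry per line to the file at path."""
    with open(path, 'w') as f:
        for word, similar in entries:
            f.write(word + ' | ' + similar + '\n')

# Assumption: a temporary directory is used for this demo instead of /tmp.
db_path = os.path.join(tempfile.mkdtemp(), 'thesaurus-db.txt')
write_database(db_path, entries)

# Read the file back to confirm what was written.
with open(db_path, 'r') as f:
    print(f.read())
```

The with statement closes the file automatically once the indented block finishes, which guarantees that everything written actually reaches the disk.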

The first step of our program will be to read the contents of the thesaurus file and save it in a data-structure.

thesaurus = {}

f = open('/tmp/thesaurus-db.txt', 'r')
for line in f:
    sp = line.split(' | ')
    initial_word = sp[0].strip()
    similar_word = sp[1].strip()
    thesaurus[initial_word] = similar_word

After this code is executed, the thesaurus file has been read in. All of the initial words are mapped to their associated similar word in thesaurus. The initial words are the keys, and the similar words are the values.
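To see concretely what the loop builds, here is a small sketch that runs the same parsing steps on a list of strings instead of a file. The list named lines stands in for the lines of the file and is only an assumption for the demo; iterating over an open file yields strings of the same shape, each ending in a newline.

```python
# Stand-in for the lines that "for line in f" would produce.
lines = ['dark | dim\n', 'small | tiny\n', 'happy | delighted\n']

thesaurus = {}
for line in lines:
    sp = line.split(' | ')            # e.g. ['dark', 'dim\n']
    initial_word = sp[0].strip()
    similar_word = sp[1].strip()      # strip() removes the trailing newline
    thesaurus[initial_word] = similar_word

print(thesaurus)
```

Without the call to strip(), each value would still carry its trailing newline character, and lookups would print an extra blank line.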

The next step is to ask the user which word they want similar word(s) for:

import sys

print('Welcome to the thesaurus!')
print('What word would you like to know about?')

word = sys.stdin.readline().strip()

We should be familiar with this technique for reading in information.

Now let’s look up the word in the dictionary and see what we know about it.

if word in thesaurus:
    print('Similar word(s) to "' + word + '": ' + thesaurus[word])
else:
    print('Unable to find similar words to "' + word + '"')
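The membership test with in is one way to guard the lookup. Another option, sketched below, is the dictionary method get, which returns None instead of raising a KeyError when the key is missing. The function name lookup and the small sample dictionary are made up for this illustration; the function returns the message rather than printing it so that it is easy to check.

```python
def lookup(thesaurus, word):
    """Return the message the program would print for word."""
    similar = thesaurus.get(word)     # None when word is not a key
    if similar is None:
        return 'Unable to find similar words to "' + word + '"'
    return 'Similar word(s) to "' + word + '": ' + similar

# Small sample dictionary, assumed for the demo.
thesaurus = {'dark': 'dim', 'fast': 'speedy'}

print(lookup(thesaurus, 'fast'))
print(lookup(thesaurus, 'dinosoar'))
```

Either style works here. The in test reads a little more naturally, while get avoids looking up the same key twice.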

Putting it all together, the program looks like:

###
### Author: Benjamin Dicken
### Description:
### This program reads in a thesaurus database, and provides an interface
### for users to request information about words in the database.
###

thesaurus = {}

f = open('/tmp/thesaurus-db.txt', 'r')
for line in f:
    sp = line.split(' | ')
    initial_word = sp[0].strip()
    similar_word = sp[1].strip()
    thesaurus[initial_word] = similar_word

import sys

print('Welcome to the thesaurus!')
print('What word would you like to know about?')

word = sys.stdin.readline().strip()

if word in thesaurus:
    print('Similar word(s) to "' + word + '": ' + thesaurus[word])
else:
    print('Unable to find similar words to "' + word + '"')

After doing a few test runs, this seems to work well:

$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
fast
Similar word(s) to "fast": speedy
$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
dark
Similar word(s) to "dark": dim
$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
dinosoar
Unable to find similar words to "dinosoar"

Supporting Multiple Similar Words

This program is great, and works well. However, a thesaurus that only suggests one similar word isn’t super useful. Let’s make some changes to this program which will give it the ability to suggest any number of similar words.

For starters, the format of the input file needs to be changed. Again, each thesaurus entry will be on its own line. The initial word will come first, followed by a |. After the |, the list of similar words will be separated by commas. For example:

dark | dim,dull,dusk
small | tiny,little,mini,compact
happy | delighted,cheerful
fast | speedy,rapid,brisk,swift,agile
cheap | bargain,economical

We also need to change the way we parse this file when reading it and saving it into the thesaurus dictionary. Previously, we mapped the initial word (a string) to the similar word (also a string), but now we need to map the initial word to multiple words. To accomplish this, we will map each initial word to a list of words.

f = open('/tmp/thesaurus-db.txt', 'r')
for line in f:
    sp = line.split(' | ')
    initial_word  = sp[0].strip()
    similar_words = sp[1].strip().split(',')
    thesaurus[initial_word] = similar_words

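Notice the difference between this code and the previous version: the text after the | is now split on commas, so each value in thesaurus is a list of strings rather than a single string.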
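Here is a short sketch of what split(',') does to one line of the new format. The second half is an assumption that goes beyond the file format above: if someone edited the file by hand and left spaces after the commas, stripping each piece would clean them up.

```python
line = 'fast | speedy,rapid,brisk,swift,agile\n'

sp = line.split(' | ')
similar_words = sp[1].strip().split(',')
print(similar_words)

# Assumption: a hand-edited entry with spaces after the commas.
spaced = 'tiny, little, mini'
cleaned = [w.strip() for w in spaced.split(',')]
print(cleaned)
```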

Next, read in a word request, the same exact way we did before.

The retrieval/printing also needs to be modified so that it can report multiple similar words:

if word in thesaurus:
    print('Similar word(s) to "' + word + '": ')
    for similar_word in thesaurus[word]:
        print('  ' + similar_word)
else:
    print('Unable to find similar words to "' + word + '"')

Putting it all together looks similar:

###
### Author: Benjamin Dicken
### Description:
### This program reads in a thesaurus database, and provides an interface
### for users to request information about words in the database.
###

thesaurus = {}

f = open('/tmp/thesaurus-db.txt', 'r')
for line in f:
    sp = line.split(' | ')
    initial_word  = sp[0].strip()
    similar_words = sp[1].strip().split(',')
    thesaurus[initial_word] = similar_words

import sys

print('Welcome to the thesaurus!')
print('What word would you like to know about?')

word = sys.stdin.readline().strip()

if word in thesaurus:
    print('Similar word(s) to "' + word + '": ')
    for similar_word in thesaurus[word]:
        print('  ' + similar_word)
else:
    print('Unable to find similar words to "' + word + '"')

A few runs confirm that it works correctly:

$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
fast
Similar word(s) to "fast":
  speedy
  rapid
  brisk
  swift
  agile
$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
dinosoar
Unable to find similar words to "dinosoar"

Adding Definitions

Let’s add each initial word’s definition to the thesaurus database file, so that the program can both define the initial word and give similar word suggestions. Again, we need to change the format of this database. Each entry will be on its own line. Each line will have two separate |s. Before the first will be the initial word, between the first and second will be the definition, and after the second will be the list of suggested words.

dark | having very little or no light | dim,dull,dusk
small | of limited size | tiny,little,mini,compact
happy | delighted, pleased, or glad, as over a particular thing | delighted,cheerful
fast | moving or able to move, operate, function, or take effect quickly | speedy,rapid,brisk,swift,agile
cheap | costing very little | bargain,economical

Now, let’s modify how we read, extract, and store this information:

thesaurus = {}
definitions = {}

f = open('/tmp/thesaurus-db.txt', 'r')
for line in f:
    sp = line.split(' | ')

    initial_word  = sp[0].strip()
    definition    = sp[1].strip()
    similar_words = sp[2].strip().split(',')

    definitions[initial_word] = definition
    thesaurus[initial_word]   = similar_words

The thesaurus words are stored the same way they were before. The definitions are stored in a new dictionary. The keys are the initial words, and the values are the definition strings.
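One thing to watch for: splitting on the exact string ' | ' only works when every bar has a space on both sides. If a line is missing one of those spaces, the split produces two pieces instead of three and sp[2] raises an IndexError. A more forgiving approach, sketched below, is to split on '|' alone and strip each piece. The function name parse_entry is made up for this illustration, and the sketch assumes that no definition contains a | character.

```python
def parse_entry(line):
    """Split one database line into (word, definition, similar_words)."""
    # Split on the bar alone, then strip whitespace from every piece.
    parts = [part.strip() for part in line.split('|')]
    if len(parts) != 3:
        raise ValueError('expected 3 fields, got ' + str(len(parts)))
    word, definition, similar = parts
    return word, definition, similar.split(',')

# The second bar has no space before it, but the line still parses.
print(parse_entry('cheap | costing very little| bargain,economical\n'))
```

Checking the number of pieces up front also gives a clearer error message for a malformed line than an IndexError would.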

Reading in the word request works the same as before.

The retrieval/printing also needs to be modified so that it can report the definition:

if word in thesaurus:
    print('Definition of "' + word + '": ' + definitions[word])
    print('Similar word(s) to "' + word + '": ')
    for similar_word in thesaurus[word]:
        print('  ' + similar_word)
else:
    print('Unable to find similar words to "' + word + '"')

The final result:

###
### Author: Benjamin Dicken
### Description:
### This program reads in a thesaurus database, and provides an interface
### for users to request information about words in the database.
###

thesaurus = {}
definitions = {}

f = open('/tmp/thesaurus-db.txt', 'r')
for line in f:
    sp = line.split(' | ')

    initial_word  = sp[0].strip()
    definition    = sp[1].strip()
    similar_words = sp[2].strip().split(',')

    definitions[initial_word] = definition
    thesaurus[initial_word]   = similar_words

import sys

print('Welcome to the thesaurus!')
print('What word would you like to know about?')

word = sys.stdin.readline().strip()

if word in thesaurus:
    print('Definition of "' + word + '": ' + definitions[word])
    print('Similar word(s) to "' + word + '": ')
    for similar_word in thesaurus[word]:
        print('  ' + similar_word)
else:
    print('Unable to find similar words to "' + word + '"')

A few example runs:

$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
fast
Definition of "fast": moving or able to move, operate, function, or take effect quickly
Similar word(s) to "fast":
  speedy
  rapid
  brisk
  swift
  agile
$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
castaway
Unable to find similar words to "castaway"
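
Because the two dictionaries always share the same keys, another way to organize the data is a single dictionary whose values are themselves small dictionaries. The following is only a sketch of that alternative, not a replacement for the program above. The names entries and describe, and the inner keys 'definition' and 'similar', are made up for this illustration, and the input is a list of strings standing in for the file.

```python
# Stand-in for two lines of the database file.
lines = [
    'dark | having very little or no light | dim,dull,dusk\n',
    'small | of limited size | tiny,little,mini,compact\n',
]

# One dictionary: each initial word maps to a dictionary with two fields.
entries = {}
for line in lines:
    sp = line.split(' | ')
    entries[sp[0].strip()] = {
        'definition': sp[1].strip(),
        'similar': sp[2].strip().split(','),
    }

def describe(entries, word):
    """Return the output lines for word as a list of strings."""
    if word not in entries:
        return ['Unable to find similar words to "' + word + '"']
    entry = entries[word]
    out = ['Definition of "' + word + '": ' + entry['definition'],
           'Similar word(s) to "' + word + '": ']
    for similar_word in entry['similar']:
        out.append('  ' + similar_word)
    return out

for text in describe(entries, 'dark'):
    print(text)
```

With this layout, a word can never end up with a definition but no similar words (or the other way around), since both are stored under the same key.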