We have already talked about lists, dictionaries, and reading/writing files. In today’s lecture, we are going to combine all of these concepts and write some example program(s).
As we probably all know, a thesaurus is a book that lists words in groups of synonyms and related concepts. In this section, we’ll implement an extremely simple thesaurus. This thesaurus will only return one “similar term” for a given word. This program will read lookup requests from a user on the command line, and return the similar term.
First, let’s define how the thesaurus information will be stored.
For now, we will use a simple format that we are used to by now.
Each entry will be listed one per line.
The initial word will be first, then there will be a vertical bar (|), and this is followed by the similar word.
For example:
dark | dim
small | tiny
happy | delighted
fast | speedy
cheap | bargain
Our program thesaurus.py will assume that a file named thesaurus-db.txt exists in the /tmp directory.
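To follow along, the sample entries above need to exist on disk. Here is a minimal sketch of one way to create the file. The helper name write_sample_db and the SAMPLE_ENTRIES list are assumptions made for this example and are not part of thesaurus.py.

```python
# Sample entries, one per line, in the "initial word | similar word" format.
SAMPLE_ENTRIES = [
    'dark | dim',
    'small | tiny',
    'happy | delighted',
    'fast | speedy',
    'cheap | bargain',
]

def write_sample_db(path):
    # Hypothetical helper: write each entry on its own line.
    with open(path, 'w') as f:
        for entry in SAMPLE_ENTRIES:
            f.write(entry + '\n')
```

Calling write_sample_db('/tmp/thesaurus-db.txt') once creates the file at the location the program expects.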
The first step of our program will be to read the contents of the thesaurus file and save it in a data-structure.
thesaurus = {}
f = open('/tmp/thesaurus-db.txt', 'r')
for line in f:
    # Split the line into the part before and the part after the bar.
    sp = line.split(' | ')
    initial_word = sp[0].strip()
    similar_word = sp[1].strip()
    thesaurus[initial_word] = similar_word
f.close()
After this code is executed, the thesaurus file has been read in.
All of the initial words are mapped to their associated similar word in thesaurus.
The initial words are the keys, and the similar words are the values.
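To see what the loop produces, the same parsing steps can be run on a few in-memory lines instead of a file. This is only a sketch: the function name build_thesaurus and the sample list are assumptions introduced for illustration.

```python
def build_thesaurus(lines):
    # Map each initial word to its single similar word.
    thesaurus = {}
    for line in lines:
        sp = line.split(' | ')
        initial_word = sp[0].strip()
        similar_word = sp[1].strip()
        thesaurus[initial_word] = similar_word
    return thesaurus

# Lines as they would come from the file, trailing newlines included.
sample_lines = ['dark | dim\n', 'small | tiny\n', 'fast | speedy\n']
thesaurus = build_thesaurus(sample_lines)
print(thesaurus)
```

The call to strip() is what removes the trailing newline from each similar word, so the printed dictionary contains 'dim' rather than 'dim\n'.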
The next step is to ask the user which word they want similar word(s) for:
import sys
print('Welcome to the thesaurus!')
print('What word would you like to know about?')
word = sys.stdin.readline().strip()
We should be familiar with this technique for reading in information.
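As a reminder of why strip() is there: readline() keeps the newline that the user types when pressing Enter. The sketch below wraps the read in a function so the behaviour can be tried without typing anything. The name read_word and the use of io.StringIO as a stand-in for the keyboard are assumptions for this example.

```python
import io

def read_word(stream):
    # readline() keeps the trailing newline, so strip() removes it
    # along with any surrounding spaces.
    return stream.readline().strip()

# io.StringIO behaves like a file, standing in for sys.stdin here.
fake_input = io.StringIO('  fast \n')
print(read_word(fake_input))
```

In the real program the same function could be called as read_word(sys.stdin).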
Now let’s look up the word in the dictionary and see what we know about it.
if word in thesaurus:
    print('Similar word(s) to "' + word + '": ' + thesaurus[word])
else:
    print('Unable to find similar words to "' + word + '"')
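The in check matters because indexing a dictionary with a key it does not contain raises a KeyError. Below is a small sketch of the same lookup written as a function that returns the message instead of printing it, which makes it easy to try with different words. The name lookup_message is hypothetical.

```python
def lookup_message(thesaurus, word):
    # Check membership first: thesaurus[word] raises KeyError
    # when the word is not a key.
    if word in thesaurus:
        return 'Similar word(s) to "' + word + '": ' + thesaurus[word]
    else:
        return 'Unable to find similar words to "' + word + '"'

thesaurus = {'fast': 'speedy', 'dark': 'dim'}
print(lookup_message(thesaurus, 'fast'))
print(lookup_message(thesaurus, 'dinosoar'))
```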
Putting it all together, the program looks like:
###
### Author: Benjamin Dicken
### Description:
### This program reads in a thesaurus database, and provides an interface
### for users to request information about words in the database.
###
import sys

# Read the thesaurus file into a dictionary.
thesaurus = {}
f = open('/tmp/thesaurus-db.txt', 'r')
for line in f:
    sp = line.split(' | ')
    initial_word = sp[0].strip()
    similar_word = sp[1].strip()
    thesaurus[initial_word] = similar_word
f.close()

# Ask the user for a word.
print('Welcome to the thesaurus!')
print('What word would you like to know about?')
word = sys.stdin.readline().strip()

# Look the word up and report the result.
if word in thesaurus:
    print('Similar word(s) to "' + word + '": ' + thesaurus[word])
else:
    print('Unable to find similar words to "' + word + '"')
After doing a few test runs, this seems to work well:
$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
fast
Similar word(s) to "fast": speedy
$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
dark
Similar word(s) to "dark": dim
$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
dinosoar
Unable to find similar words to "dinosoar"
This program is great, and works well. However, a thesaurus that only suggests one similar word isn’t super useful. Let’s make some changes to this program which will give it the ability to suggest any number of similar words.
For starters, the format of the input file needs to be changed.
Again, each thesaurus entry will be on its own line.
The initial word will come first, followed by a |.
After the |, the list of similar words will be separated by commas.
For example:
dark | dim,dull,dusk
small | tiny,little,mini,compact
happy | delighted,cheerful
fast | speedy,rapid,brisk,swift,agile
cheap | bargain,economical
We also need to change the way we parse this file when reading it and saving it into the thesaurus dictionary.
Previously, we mapped the initial word (a string) to the similar word (also a string), but now we need to map the initial word to multiple words.
To accomplish this, we will map each initial word to a list of words.
thesaurus = {}
f = open('/tmp/thesaurus-db.txt', 'r')
for line in f:
    sp = line.split(' | ')
    initial_word = sp[0].strip()
    # Split the part after the bar on commas to get a list of words.
    similar_words = sp[1].strip().split(',')
    thesaurus[initial_word] = similar_words
f.close()
Notice the differences between this code and the previous version.
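The key change is the extra split(','), which turns the text after the bar into a list. Here is a minimal sketch on a single line, with parse_entry as a hypothetical helper name:

```python
def parse_entry(line):
    # Split the line into the initial word and the comma-separated list.
    sp = line.split(' | ')
    initial_word = sp[0].strip()
    similar_words = sp[1].strip().split(',')
    return initial_word, similar_words

word, similar = parse_entry('dark | dim,dull,dusk\n')
print(word)      # dark
print(similar)   # ['dim', 'dull', 'dusk']
```

Because the sample file has no spaces after the commas, no further stripping is needed. An entry written as dim, dull would keep the leading space in the second word.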
Next, read in a word request, the same exact way we did before.
The retrieval/printing also needs to be modified so that it can report multiple similar words:
if word in thesaurus:
    print('Similar word(s) to "' + word + '": ')
    for similar_word in thesaurus[word]:
        print(' ' + similar_word)
else:
    print('Unable to find similar words to "' + word + '"')
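If one word per line is more output than needed, the list can also be shown on a single line. The sketch below uses the string method join. It is an alternative to the loop above rather than what thesaurus.py does, and the function name format_similar is an assumption for this example.

```python
def format_similar(word, similar_words):
    # ', '.join(...) glues the list items together with a comma and a space.
    return 'Similar word(s) to "' + word + '": ' + ', '.join(similar_words)

print(format_similar('happy', ['delighted', 'cheerful']))
```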
Putting it all together looks similar:
###
### Author: Benjamin Dicken
### Description:
### This program reads in a thesaurus database, and provides an interface
### for users to request information about words in the database.
###
import sys

# Read the thesaurus file; each word maps to a list of similar words.
thesaurus = {}
f = open('/tmp/thesaurus-db.txt', 'r')
for line in f:
    sp = line.split(' | ')
    initial_word = sp[0].strip()
    similar_words = sp[1].strip().split(',')
    thesaurus[initial_word] = similar_words
f.close()

# Ask the user for a word.
print('Welcome to the thesaurus!')
print('What word would you like to know about?')
word = sys.stdin.readline().strip()

# Look the word up and print every similar word.
if word in thesaurus:
    print('Similar word(s) to "' + word + '": ')
    for similar_word in thesaurus[word]:
        print(' ' + similar_word)
else:
    print('Unable to find similar words to "' + word + '"')
A few runs confirm that it works correctly:
$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
fast
Similar word(s) to "fast":
 speedy
 rapid
 brisk
 swift
 agile
$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
dinosoar
Unable to find similar words to "dinosoar"
Let’s add each initial word’s definition to the thesaurus database file, so that the program can both define the initial word and give similar word suggestions.
Again, we need to change the format of this database.
Each entry will be on its own line.
Each line will have two separate |s.
Before the first will be the initial word, between the first and second will be the definition, and after the second will be the list of suggested words.
dark | having very little or no light | dim,dull,dusk
small | of limited size | tiny,little,mini,compact
happy | delighted, pleased, or glad, as over a particular thing | delighted,cheerful
fast | moving or able to move, operate, function, or take effect quickly | speedy,rapid,brisk,swift,agile
cheap | costing very little | bargain,economical
Now, let’s modify how we read, extract, and store this information:
thesaurus = {}
definitions = {}
f = open('/tmp/thesaurus-db.txt', 'r')
for line in f:
    # Three fields: initial word, definition, similar words.
    sp = line.split(' | ')
    initial_word = sp[0].strip()
    definition = sp[1].strip()
    similar_words = sp[2].strip().split(',')
    definitions[initial_word] = definition
    thesaurus[initial_word] = similar_words
f.close()
The thesaurus words are stored the same way they were before. The definitions are stored in a new dictionary. The keys are the initial words, and the values are the definition string.
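To make the two dictionaries concrete, here is a sketch that runs the same parsing on in-memory lines. The function name load_entries and the sample list are assumptions for illustration.

```python
def load_entries(lines):
    # Two dictionaries that share the same keys (the initial words).
    thesaurus = {}
    definitions = {}
    for line in lines:
        sp = line.split(' | ')
        initial_word = sp[0].strip()
        definitions[initial_word] = sp[1].strip()
        thesaurus[initial_word] = sp[2].strip().split(',')
    return thesaurus, definitions

sample_lines = [
    'dark | having very little or no light | dim,dull,dusk\n',
    'small | of limited size | tiny,little,mini,compact\n',
]
thesaurus, definitions = load_entries(sample_lines)
print(definitions['dark'])
print(thesaurus['dark'])
```

Commas inside a definition, as in the entry for happy, are safe because only the third field is split on commas.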
Reading in the word request works the same as before.
The retrieval/printing also needs to be modified so that it can report the definition:
if word in thesaurus:
    print('Definition of "' + word + '": ' + definitions[word])
    print('Similar word(s) to "' + word + '": ')
    for similar_word in thesaurus[word]:
        print(' ' + similar_word)
else:
    print('Unable to find similar words to "' + word + '"')
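Note that only thesaurus is checked before definitions[word] is used. This is safe because both dictionaries are filled from the same lines, so they always have the same keys. Below is a sketch of the same logic as a function that returns the output lines. The name describe is hypothetical.

```python
def describe(word, thesaurus, definitions):
    # Both dictionaries are filled from the same lines, so a word that
    # is a key in thesaurus is also a key in definitions.
    if word not in thesaurus:
        return ['Unable to find similar words to "' + word + '"']
    lines = ['Definition of "' + word + '": ' + definitions[word]]
    lines.append('Similar word(s) to "' + word + '": ')
    for similar_word in thesaurus[word]:
        lines.append(' ' + similar_word)
    return lines

thesaurus = {'cheap': ['bargain', 'economical']}
definitions = {'cheap': 'costing very little'}
for line in describe('cheap', thesaurus, definitions):
    print(line)
```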
The final result:
###
### Author: Benjamin Dicken
### Description:
### This program reads in a thesaurus database, and provides an interface
### for users to request information about words in the database.
###
import sys

# Read the database into two dictionaries keyed by the initial word.
thesaurus = {}
definitions = {}
f = open('/tmp/thesaurus-db.txt', 'r')
for line in f:
    sp = line.split(' | ')
    initial_word = sp[0].strip()
    definition = sp[1].strip()
    similar_words = sp[2].strip().split(',')
    definitions[initial_word] = definition
    thesaurus[initial_word] = similar_words
f.close()

# Ask the user for a word.
print('Welcome to the thesaurus!')
print('What word would you like to know about?')
word = sys.stdin.readline().strip()

# Report the definition and every similar word.
if word in thesaurus:
    print('Definition of "' + word + '": ' + definitions[word])
    print('Similar word(s) to "' + word + '": ')
    for similar_word in thesaurus[word]:
        print(' ' + similar_word)
else:
    print('Unable to find similar words to "' + word + '"')
A few example runs:
$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
fast
Definition of "fast": moving or able to move, operate, function, or take effect quickly
Similar word(s) to "fast":
 speedy
 rapid
 brisk
 swift
 agile
$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
castaway
Unable to find similar words to "castaway"
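One caveat about the loading loop: it assumes every line contains exactly two ' | ' separators. A blank line, or a line with a missing space around a bar, produces fewer than three fields, and sp[2] then raises an IndexError. Below is a minimal sketch of a more forgiving parser that reports such lines as None so the caller can skip them. The helper name parse_line is an assumption and is not part of the program above.

```python
def parse_line(line):
    # Return (word, definition, similar_words), or None if the line
    # does not have exactly three ' | '-separated fields.
    sp = line.split(' | ')
    if len(sp) != 3:
        return None
    return sp[0].strip(), sp[1].strip(), sp[2].strip().split(',')

print(parse_line('small | of limited size | tiny,little\n'))
# The next line has no space before the second bar, so it yields None.
print(parse_line('cheap | costing very little| bargain\n'))
```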