We previously implemented thesaurus.py
in these lecture notes.
In that implementation, we used a text file to store the similar words and definitions.
We will implement the same program, but we will use a sqlite3 database.
Review the linked notes if you forgot how we did this before!
In the previous implementation of thesaurus-db.txt
, we stored all of the information in a text file.
By the end of the lecture notes, the format of the file looked like:
dark | having very little or no light | dim,dull,dusk
small | of limited size | tiny,little
happy | delighted, pleased, or glad, as over a particular thing | delighted,cheerful
fast | moving or able to move, operate, function, or take effect quickly | speedy,rapid,brisk
cheap | costing very little | bargain,economical
Each line has two vertical bars. The word before the first vertical bar is the thesaurus word (the “initial” word). The text between the first and second bar is the definition. Then there is a list of comma-separated words after the second bar, which are the similar words.
We need to design a database schema to store this same information. There are many ways to go about this, but we’ll start with a very simple design.
First, let’s cd
to the desktop and startup/create a new database:
$ cd
$ cd test/
$ sqlite3 thesdb
SQLite version 3.14.0 2016-07-26 15:17:14
Enter ".help" for usage hints.
sqlite>
Next, let’s create the schema. We will design this database to have two tables. One table will represent a word, and will include the word’s definition and a word id:
sqlite> CREATE TABLE word (word TEXT, definition TEXT, wid INT PRIMARY KEY);
Next, we will create a table to represent a word being similar to another word.
This will be a table with just two IDs.
Each ID will be a FOREIGN KEY
into the word table, because it references words in that table:
sqlite> CREATE TABLE similar (wid INT, sid INT, FOREIGN KEY (wid) REFERENCES word(wid), FOREIGN KEY (sid) REFERENCES word(wid));
This will be used to represent two words being “similar”.
Next, let’s populate this database to contain the same information that the text file did.
First, we’ll populate the word
table:
INSERT INTO word VALUES ('dark', 'having very little or no light', 1);
INSERT INTO word VALUES ('small', 'of limited size', 2);
INSERT INTO word VALUES ('happy', 'delighted, pleased, or glad, as over a particular thing', 3);
INSERT INTO word VALUES ('fast', 'moving or able to move, operate, function, or take effect quickly', 4);
INSERT INTO word VALUES ('cheap', 'costing very little', 5);
The next step is to populate the similar
table to relate similar words to each-other.
However, the similar
table only related words that are in the word
table, so we need to add each similar word to the table also.
For expediency, we’ll not include the definitions.
INSERT INTO word VALUES ('dim', '', 6);
INSERT INTO word VALUES ('dull', '', 7);
INSERT INTO word VALUES ('tiny', '', 8);
INSERT INTO word VALUES ('little', '', 9);
INSERT INTO word VALUES ('delighted', '', 10);
INSERT INTO word VALUES ('cheerful', '', 11);
INSERT INTO word VALUES ('speedy', '', 12);
INSERT INTO word VALUES ('rapid', '', 13);
INSERT INTO word VALUES ('brisk', '', 14);
INSERT INTO word VALUES ('bargain', '', 15);
INSERT INTO word VALUES ('economical', '', 16);
Now the word table has the following contents:
sqlite> SELECT * FROM word;
dark|having very little or no light|1
small|of limited size|2
happy|delighted, pleased, or glad, as over a particular thing|3
fast|moving or able to move, operate, function, or take effect quickly|4
cheap|costing very little|5
dim||6
dull||7
tiny||8
little||9
delighted||10
cheerful||11
speedy||12
rapid||13
brisk||14
bargain||15
economical||16
Finally, we can add entries to the similar
table using these IDs.
INSERT INTO similar VALUES (1, 6);
INSERT INTO similar VALUES (1, 7);
INSERT INTO similar VALUES (2, 8);
INSERT INTO similar VALUES (2, 9);
INSERT INTO similar VALUES (3, 10);
INSERT INTO similar VALUES (3, 11);
INSERT INTO similar VALUES (4, 12);
INSERT INTO similar VALUES (4, 13);
INSERT INTO similar VALUES (4, 14);
INSERT INTO similar VALUES (5, 15);
INSERT INTO similar VALUES (5, 16);
sqlite> SELECT * FROM similar;
1|6
1|7
2|8
2|9
3|10
3|11
4|12
4|13
4|14
5|15
5|16
Because SQL is so flexible, getting all of the similar words for a given word can be done in many ways. One technique is pretty straightforward, but is a multi-query process.
Say we want to find the words similar to “fast.”
First, we need to find the wid
of “fast.”
sqlite> SELECT wid FROM word where word == 'fast';
4
Next, find the IDs of all the similar words.
sqlite> SELECT similar.sid FROM word JOIN similar on similar.wid == word.wid WHERE similar.wid == 4;
12
13
14
Finally, select the words with these ids from word
:
sqlite> SELECT word from word where wid == 12;
speedy
sqlite> SELECT word from word where wid == 13;
rapid
sqlite> SELECT word from word where wid == 14;
brisk
Here’s the python implementation, using the database and the queries we previously discussed.
###
### Author: Benjamin Dicken
### Description:
### This program acts as a thesaurus.
### Is uses a SQLite thesaurus database to get the similar words.
###
import sys
import sqlite3
print('Welcome to the thesaurus!')
print('What word would you like to know about?')
word = sys.stdin.readline().strip()
conn = sqlite3.connect('thesdb')
r1 = conn.execute("SELECT wid FROM word WHERE word == '" + str(word) + "'")
wid = r1.fetchone()
if wid is not None:
print('Similar word(s) to "' + word + '": ')
sids = conn.execute("SELECT similar.sid FROM word JOIN similar on similar.wid == word.wid WHERE similar.wid == " + str(wid[0]))
for sid in sids:
r2 = conn.execute("SELECT word FROM word WHERE wid == " + str(sid[0]))
word = r2.fetchone()
print(' ' + str(word[0]))
else:
print('Unable to find similar words to "' + word + '"')
And below are a few runs.
$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
fast
Similar word(s) to "fast":
speedy
rapid
brisk
$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
dark
Similar word(s) to "dark":
dim
dull
$ python3 thesaurus.py
Welcome to the thesaurus!
What word would you like to know about?
dinosoar
Unable to find similar words to "dinosoar"