Most of you in this class have probably used a spell checker at least once (more likely, many times). Spell checkers are built into a number of programs, including Microsoft Word, Apple Pages, Gmail, and others. In this assignment, you’ll implement a program that can suggest and fix spelling issues in text!
The spell checking program should be named spellcheck.py.
The spell check program that you will write should accept two inputs from the console.
The first input will be the name of a text file, which will contain the text that the user wants to be spell-checked.
The second input will be the mode that the program should operate in.
The user can select one of two modes: suggest or replace.
In suggest mode, the program should not actually make any changes to the text.
Rather, it should annotate the text with suggestions for how to spell words differently.
In replace mode, it should print out the contents of the input file with the spelling already fixed.
Below is an example of the prompts and input for this initial interaction with the user.
Enter input file:
words.txt
Enter spellcheck mode (replace or suggest):
replace
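The two prompts above could be read with a small helper like the one below. This is only a sketch: the function name read_settings and its ask parameter are hypothetical, and ending each prompt with a newline is an assumption based on how the example interaction is laid out.

```python
def read_settings(ask=input):
    """Prompt for the input file name, then the mode, and return both."""
    # `ask` defaults to the built-in input(). It is a parameter (an
    # assumption of this sketch) only so the function can be tried out
    # without typing at the keyboard.
    file_name = ask('Enter input file:\n').strip()
    mode = ask('Enter spellcheck mode (replace or suggest):\n').strip()
    return file_name, mode
```

Calling read_settings() with no arguments prompts on the console and returns a pair such as ('words.txt', 'replace').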
Let’s walk through a really simple example to demonstrate the difference in the two modes.
Let’s say that the file words.txt has the below content:
one hudnred dollars
and fiveteen cents
The words hundred (hudnred) and fifteen (fiveteen) are spelled wrong.
In replace mode, the contents of the file should be printed out with misspelled words replaced by correctly-spelled ones.
In suggest mode, the program instead offers suggestions for how to spell the words correctly.
Below is an example of the output of the spell checker in two modes.
replace mode | suggest mode |
|
|
Notice that in replace mode, the program goes ahead and replaces the misspelled words with correctly-spelled ones. In suggest mode, the incorrectly spelled words are annotated with numbers in parentheses, and the suggestions for how to spell the words correctly are shown in the legend.
How was the program able to identify the words that were not spelled correctly? Read the next section to find out.
Your program should open up and read a file named misspellings.txt.
You can assume that this file already exists in the same directory as spellcheck.py.
When testing on your computer, you should create your own misspellings.txt.
This file will include information about how words are misspelled.
Using this information, you can determine which words are spelled wrong in the input file!
You can assume that each line of misspellings.txt is formatted as follows:
correctly_spelled_word:incorrect_a,incorrect_b,incorrect_c, . . .
In other words, the first word on a line of the file is a correctly spelled word. A colon separates that word from a comma-separated string with possible misspellings of that word. For instance:
hundred:hudnred,hundrid
fifteen:fifeteen,fiveteen,fiften,fiveten
This file contains two possible misspellings of the word “hundred” and four possible misspellings of the word “fifteen”. The list of incorrectly spelled words will always have at least one word on the line. There is no particular maximum number of misspellings.
Your program should open and read the misspellings.txt file and store the mapping between misspelled and correctly-spelled words in a dictionary.
Each key of the dictionary should be a misspelled version of a word, and the value it maps to should be the correctly-spelled version of that word.
Given the example content of misspellings.txt shown above, the misspellings dictionary should have the keys and values shown below:
misspellings = {
'hudnred' : 'hundred',
'hundrid' : 'hundred',
'fifeteen' : 'fifteen',
'fiveteen' : 'fifteen',
'fiften' : 'fifteen',
'fiveten' : 'fifteen' }
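One way to turn a single line of the file into dictionary entries is sketched below. The helper name add_entries is hypothetical. It assumes the line follows the format described above exactly: one colon, then comma-separated misspellings with no extra spaces.

```python
def add_entries(line, misspellings):
    """Add every misspelling found on one line to the dictionary."""
    # Split 'hundred:hudnred,hundrid' into the correct word and the rest.
    correct, wrong_words = line.strip().split(':')
    # Each misspelling becomes a key that maps back to the correct word.
    for wrong in wrong_words.split(','):
        misspellings[wrong] = correct


misspellings = {}
add_entries('hundred:hudnred,hundrid\n', misspellings)
add_entries('fifteen:fifeteen,fiveteen,fiften,fiveten\n', misspellings)
print(misspellings)
```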
After you have read in the file and constructed this dictionary, you can use it to determine the correct spelling of an incorrectly-spelled word. How would you do this?
After you’ve opened the input file, loop through each line of it. You can split each line into individual words. For each word on the line, you can check if that word is one of the keys in the misspellings dictionary. If it is, you can assume it is incorrectly spelled. The correct spelling of that word is the value associated with the key, so you can grab it from the dictionary and either replace the original word (if in replace mode) or use it as a suggestion (if in suggest mode).
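The lookup described above might look like the following sketch. The function name find_misspelled is hypothetical, and the sketch ignores capitalization and punctuation for now.

```python
def find_misspelled(line, misspellings):
    """Return (misspelled word, correct spelling) pairs found on a line."""
    found = []
    for word in line.split():
        # A word that is a key in the dictionary is a known misspelling.
        if word in misspellings:
            found.append((word, misspellings[word]))
    return found


misspellings = {'hudnred': 'hundred', 'fiveteen': 'fifteen'}
print(find_misspelled('one hudnred dollars', misspellings))
# [('hudnred', 'hundred')]
```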
The words in the misspellings.txt file will be all in lower case.
However, your program should also be able to replace or suggest fixes for misspelled capitalized words.
For instance, it should be able to change Hudnred to Hundred in replace mode.
Before you try to look up a word in the misspellings dictionary, determine if it is or is not capitalized (meaning that the first letter is upper-case). Then, you can convert it to lower case. If the word does need replacing, then you should make sure to capitalize the correctly spelled word if the original word was also capitalized.
You might find the string methods lower(), isupper(), and capitalize() useful.
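A minimal sketch of the capitalization steps, using the three string methods mentioned above. The function name correct_word is hypothetical. Note that capitalize() also lower-cases the rest of the word, which is fine here because the correct spellings in misspellings.txt are all lower case.

```python
def correct_word(word, misspellings):
    """Return the corrected word, keeping a leading capital if there was one."""
    # Remember whether the first letter was upper case before lowering it.
    was_capitalized = word[:1].isupper()
    fixed = misspellings.get(word.lower())
    if fixed is None:
        # Not a known misspelling: leave the word exactly as it was.
        return word
    return fixed.capitalize() if was_capitalized else fixed


misspellings = {'hudnred': 'hundred'}
print(correct_word('Hudnred', misspellings))
# Hundred
```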
You also should make the spell checker ignore punctuation.
In particular, spellcheck.py should be able to handle a trailing comma (,), period (.), question mark (?), or exclamation point (!).
For instance, it should be able to change hudnred? to hundred? in replace mode.
How can this be accomplished?
Before looking up a word in the misspellings dictionary, check if the last character is one of the four types of punctuation that you should look for (, or . or ? or !).
If it is, create a variable to store that punctuation, and then trim it off of the end of the word.
Then, proceed with checking the spelling.
After you have determined if it is spelled correctly or not, you can put the punctuation at the end of the word.
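The trim, check, and re-attach steps could be sketched as follows. The names PUNCTUATION, split_punctuation, and correct_with_punctuation are hypothetical, and the sketch handles only a single trailing punctuation character.

```python
PUNCTUATION = ',.?!'


def split_punctuation(word):
    """Split a word into (word without trailing punctuation, punctuation)."""
    if word and word[-1] in PUNCTUATION:
        return word[:-1], word[-1]
    return word, ''


def correct_with_punctuation(word, misspellings):
    """Fix the spelling of a word and put any trailing punctuation back."""
    core, mark = split_punctuation(word)
    # get() falls back to the trimmed word itself when it is not a key.
    return misspellings.get(core, core) + mark


misspellings = {'hudnred': 'hundred'}
print(correct_with_punctuation('hudnred?', misspellings))
# hundred?
```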
There are going to be a number of test cases for this PA. There will be at least one (perhaps multiple) test cases for each of the below scenarios:
Thus, you can test out your program in chunks. Don’t try to tackle everything at once. If you follow the development strategy, you should be able to progressively pass more and more test cases, as you add more and more functionality.
Remember, every program should have a main function! Create this function, and plan to call your other functions from main. All-in-all, your program should have at least 4 functions. My recommendation for these four functions: a main function, one to populate the misspellings dictionary, one for replace mode, and one for suggest mode.
Create a function that will be responsible for opening up the misspellings file and reading in the information.
The function can assume that the misspellings.txt file exists.
The function should open the file, read the lines, and populate the misspellings dictionary, as described previously in this specification.
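As a rough sketch, such a function might look like the one below. The name read_misspellings and the file_name parameter are assumptions. Skipping blank lines is also an addition, meant to tolerate an empty line at the end of a hand-made misspellings.txt.

```python
def read_misspellings(file_name='misspellings.txt'):
    """Read the misspellings file into a {misspelled: correct} dictionary."""
    misspellings = {}
    with open(file_name) as misspellings_file:
        for line in misspellings_file:
            line = line.strip()
            if not line:
                continue  # skip blank lines (an assumption of this sketch)
            correct, wrong_words = line.split(':')
            for wrong in wrong_words.split(','):
                misspellings[wrong] = correct
    return misspellings
```

From main, a call such as misspellings = read_misspellings() would then produce the dictionary described earlier.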
Next, add the code to ask the user for the input file and open it up.
If you want, you can read the lines into a list ahead of time using the file object's readlines() method.
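Reading the whole input file into a list up front might look like this sketch. The function name read_input_lines is hypothetical. Each string in the returned list keeps its trailing newline character.

```python
def read_input_lines(file_name):
    """Return the lines of the input file as a list of strings."""
    with open(file_name) as input_file:
        # readlines() returns one string per line, newline included.
        return input_file.readlines()
```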
Next, work on replace mode. I recommend that you create a function for replace mode. At this point, don’t yet worry about capitalized words, punctuation, or suggest mode. Iterate through the contents of the input file, line-by-line and word-by-word. When a correctly-spelled word is encountered, print it out with a trailing space. When an incorrectly-spelled word is encountered, print out the correctly-spelled version instead.
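A first version of replace mode, before capitalization and punctuation are handled, could be sketched like this. The function name replace_mode and its out parameter are assumptions. Passing out=None makes print() write to the console as usual.

```python
def replace_mode(lines, misspellings, out=None):
    """Print each line with known misspellings replaced."""
    for line in lines:
        for word in line.split():
            # Use the correct spelling if the word is a known misspelling,
            # otherwise print the word unchanged; both get a trailing space.
            print(misspellings.get(word, word), end=' ', file=out)
        print(file=out)  # end the output line


misspellings = {'hudnred': 'hundred', 'fiveteen': 'fifteen'}
replace_mode(['one hudnred dollars\n', 'and fiveteen cents\n'], misspellings)
```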
After you have replace mode working, start on suggest mode. The general algorithm for replace mode and suggest mode is similar, so you can start off by copying the replace function and then make changes as necessary.
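The sketch below shows one possible shape for suggest mode. The exact annotation and legend layout used here (a number in parentheses after the word, and a list of suggestions in numbered order) is an assumption, so match the format in the assignment's examples rather than this sketch. The function name suggest_mode is hypothetical.

```python
def suggest_mode(lines, misspellings):
    """Return (annotated lines, legend) without changing any spelling."""
    annotated_lines = []
    legend = []  # legend[0] is the suggestion for annotation (1), and so on
    for line in lines:
        annotated = []
        for word in line.split():
            if word in misspellings:
                legend.append(misspellings[word])
                # Assumed layout: the original word followed by its number.
                annotated.append(word + ' (' + str(len(legend)) + ')')
            else:
                annotated.append(word)
        annotated_lines.append(' '.join(annotated))
    return annotated_lines, legend


misspellings = {'hudnred': 'hundred', 'fiveteen': 'fifteen'}
print(suggest_mode(['one hudnred dollars\n', 'and fiveteen cents\n'], misspellings))
```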
Finally, add support for capitalized words and for trailing punctuation. You might find one or the other of these easier or harder, so choose whichever you think you will be able to accomplish more quickly. Once you have one working, move to the other! Handling punctuation is extra credit.
Below are several examples.
Each example includes the contents of misspellings.txt, the contents of the input file, the suggest mode results, and the replace mode results.
misspellings.txt | words_1.txt |
|
|
replace mode | suggest mode |
|
|
misspellings.txt | words_2.txt |
|
|
replace mode | suggest mode |
|
|
Submit this to Gradescope by Tuesday, April 2nd, by 7:00pm.
Name the file spellcheck.py.