CSc 110 - Spell Check


Most of you have probably used a spell checker at least once in your life (more likely, many times). Spell checkers are built into a number of programs, including Microsoft Word, Apple Pages, Gmail, and others. In this assignment, you’ll implement a program that can suggest and fix spelling issues in text!

Program behavior

The program should be named spellcheck.py and should accept two inputs from the console. The first input is the name of a text file containing the text that the user wants spell-checked. The second input is the mode that the program should operate in: either suggest or replace. In suggest mode, the program should not change the text. Instead, it should annotate the text with suggestions for how to spell words differently. In replace mode, it should print out the contents of the input file with the spelling already fixed. Below is an example of the prompts and input for this initial interaction with the user.

Enter input file:
words.txt
Enter spellcheck mode (replace or suggest):
replace
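The interaction above could be sketched roughly as follows. This is only one possible shape, not a required design. It assumes each prompt is printed on its own line, as in the example, and the `ask` parameter is a hypothetical convenience so the function can be tried without typing at the console.

```python
def read_settings(ask=input):
    """Prompt for the input file name and the mode; return both as a tuple.

    `ask` is a hypothetical parameter (not part of the assignment); it
    defaults to the built-in input() so the function reads from the console.
    """
    print('Enter input file:')
    file_name = ask().strip()
    print('Enter spellcheck mode (replace or suggest):')
    mode = ask().strip()
    return file_name, mode
```

Calling `read_settings()` with no argument reads both answers from the console.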

Let’s walk through a really simple example to demonstrate the difference between the two modes. Let’s say that the file words.txt has the following content:

one hudnred dollars
and fiveteen cents

The words hundred (hudnred) and fifteen (fiveteen) are misspelled. In replace mode, the contents of the file should be printed out with the misspelled words replaced by correctly-spelled ones. In suggest mode, the program instead suggests how to spell the words correctly. Below is an example of the output of the spell checker in the two modes.

replace mode:

Enter input file:
words.txt
Enter spellcheck mode (replace or suggest):
replace

--- OUTPUT ---
one hundred dollars
and fifteen cents

suggest mode:

Enter input file:
words.txt
Enter spellcheck mode (replace or suggest):
suggest

--- OUTPUT ---
one hudnred (1) dollars
and fiveteen (2) cents

--- LEGEND ---
(1) hundred
(2) fifteen

Notice that in replace mode, the program goes ahead and replaces the misspelled words with correctly-spelled ones. In suggest mode, the incorrectly spelled words are annotated with numbers in parentheses, and the suggestions for how to spell the words correctly are shown in the legend.

How was the program able to identify the words that were not spelled correctly? Read the next section to find out.

misspellings.txt

Your program should open and read a file named misspellings.txt. You can assume that this file already exists in the same directory as spellcheck.py. When testing on your computer, you should create your own misspellings.txt. This file will include information about how words are misspelled. Using this information, you can determine which words are spelled wrong in the input file! You can assume that each line of misspellings.txt is formatted as follows:

correctly_spelled_word:incorrect_a,incorrect_b,incorrect_c, . . .

In other words, the first word on a line of the file is a correctly spelled word. A colon separates that word from a comma-separated string with possible misspellings of that word. For instance:

hundred:hudnred,hundrid
fifteen:fifeteen,fiveteen,fiften,fiveten

This file contains two possible misspellings of the word “hundred” and four possible misspellings of the word “fifteen”. The list of incorrectly spelled words will always have at least one word on the line. There is no particular maximum number of misspellings.

Your program should read the misspellings.txt file and store the mapping between misspelled and correctly-spelled words in a dictionary. Each key of the dictionary should be a misspelled version of a word, and the value that key maps to should be the correctly-spelled version of that word. Given the example content of misspellings.txt shown above, the misspellings dictionary should have the keys and values shown below:

misspellings = { 
    'hudnred' : 'hundred',
    'hundrid' : 'hundred',
    'fifeteen' : 'fifteen',
    'fiveteen' : 'fifteen',
    'fiften' : 'fifteen',
    'fiveten' : 'fifteen' }
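As a rough sketch of how such a dictionary might be built, the function below takes the lines of the file as a list of strings. The name `parse_misspellings` is made up for illustration, and skipping blank lines is an assumption that the specification does not address.

```python
def parse_misspellings(lines):
    """Build a {misspelled: correct} dictionary from lines of misspellings.txt."""
    misspellings = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: ignore blank lines
        # Everything before the colon is the correct word;
        # everything after it is a comma-separated list of misspellings.
        correct, wrong_words = line.split(':')
        for wrong in wrong_words.split(','):
            misspellings[wrong] = correct
    return misspellings


example_lines = [
    'hundred:hudnred,hundrid\n',
    'fifteen:fifeteen,fiveteen,fiften,fiveten\n',
]
misspellings = parse_misspellings(example_lines)
print(misspellings['fiveteen'])  # fifteen
```

In the real program, the list of lines would come from the file itself, for example via `open('misspellings.txt').readlines()`.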

After you have read in the file and constructed this dictionary, you can use it to determine the correct spelling of an incorrectly-spelled word. How would you do this?

After you’ve opened up the input file, loop through each line of it. You can split each line into individual words. For each word on the line, you can check if that word is one of the keys in the misspellings dictionary. If it is, you can assume it is incorrectly spelled. The correct spelling of that word is the value associated with the key, so you can grab it from the dictionary and either replace the original word (if in replace mode) or use it as a suggestion (if in suggest mode).
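The lookup just described might look something like this for a single line in replace mode. The function name `replace_line` is hypothetical, and this sketch deliberately ignores capitalization and punctuation. It joins words with single spaces, which may or may not match the exact whitespace your final output needs.

```python
def replace_line(line, misspellings):
    """Return the line with every known misspelling swapped for its fix."""
    fixed_words = []
    for word in line.split():
        if word in misspellings:
            # The word is a key, so it is a known misspelling.
            fixed_words.append(misspellings[word])
        else:
            fixed_words.append(word)
    return ' '.join(fixed_words)


sample = {'hudnred': 'hundred', 'fiveteen': 'fifteen'}
print(replace_line('one hudnred dollars', sample))  # one hundred dollars
```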

Preserving Capitalization

The words in the misspellings.txt file will be all in lower case. However, your program should also be able to replace or suggest fixes for misspelled capitalized words. For instance, it should be able to change Hudnred to Hundred in replace mode.

Before you try to look up a word in the misspellings dictionary, determine if it is or is not capitalized (meaning that the first letter is upper-case). Then, you can convert it to lower case. If the word does need replacing, then you should make sure to capitalize the correctly spelled word if the original word was also capitalized.

You might find the string methods lower(), isupper(), and capitalize() useful.
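One way these three methods could fit together is sketched below. The helper name `fix_word` is an illustration only, and the sketch treats a word as capitalized when its first character is upper-case, as described above.

```python
def fix_word(word, misspellings):
    """Return the corrected word, keeping a leading capital if there was one."""
    was_capitalized = word[:1].isupper()  # word[:1] is safe even for ''
    lowered = word.lower()
    if lowered not in misspellings:
        return word  # not a known misspelling, so leave it untouched
    correct = misspellings[lowered]
    if was_capitalized:
        correct = correct.capitalize()
    return correct


sample = {'hudnred': 'hundred'}
print(fix_word('Hudnred', sample))  # Hundred
print(fix_word('hudnred', sample))  # hundred
```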

Ignoring Punctuation (Extra Credit)

You should also make the spell checker ignore punctuation. In particular, spellcheck.py should be able to handle a trailing comma (,), period (.), question mark (?), or exclamation point (!). For instance, it should be able to change hudnred? to hundred? in replace mode. How can this be accomplished?

Before looking up a word in the misspellings dictionary, check if the last character is one of the four types of punctuation that you should look for (, or . or ? or !). If it is, create a variable to store that punctuation, and then trim it off of the end of the word. Then, proceed with checking the spelling. After you have determined if it is spelled correctly or not, you can put the punctuation at the end of the word.
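The trim-then-restore idea above might be sketched as follows. Both function names are hypothetical, and the sketch only handles a single trailing punctuation character, which is all the description above asks for.

```python
def split_punctuation(word):
    """Split off one trailing , . ? or ! and return (core, punctuation)."""
    if word and word[-1] in ',.?!':
        return word[:-1], word[-1]
    return word, ''


def fix_with_punctuation(word, misspellings):
    """Correct the word if needed, then put the punctuation back."""
    core, punctuation = split_punctuation(word)
    if core in misspellings:
        core = misspellings[core]
    return core + punctuation


sample = {'hudnred': 'hundred'}
print(fix_with_punctuation('hudnred?', sample))  # hundred?
```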

Test Cases

There are going to be a number of test cases for this PA. There will be at least one (perhaps multiple) test cases for each of the below scenarios:

Thus, you can test out your program in chunks. Don’t try to tackle everything at once. If you follow the development strategy, you should be able to progressively pass more and more test cases, as you add more and more functionality.

Development Strategy

(0) Create a main function

Remember, every program should have a main function! Create this function, and plan to call your other functions from main. All in all, your program should have at least 4 functions. My recommendation for these four functions: a main function, one to populate the misspellings dictionary, one for replace mode, and one for suggest mode.

(1) Reading in misspellings.txt

Create a function that will be responsible for opening up the misspellings file and reading in the information. The function can assume that the misspellings.txt file exists. The function should open the file, read the lines, and populate the misspellings dictionary, as described previously in this specification.

(2) Input file

Next, add the code to ask the user for the input file and open it up. If you want, you can read the lines into a list ahead of time using the readlines() function.

(3) Replace Mode

Next, work on replace mode. I recommend that you create a function for replace mode. At this point, don’t yet worry about capitalized words, punctuation, or suggest mode. Iterate through the contents of the input file, line-by-line and word-by-word. When a correctly-spelled word is encountered, print it out with a trailing space. When an incorrectly-spelled word is encountered, print out the correctly-spelled version instead.

(4) Suggest Mode

After you have replace mode working, start on suggest mode. The general algorithm for replace mode and suggest mode is similar, so you can start off by copying the replace function and then make changes as necessary.
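To make the difference from replace mode concrete, here is a rough sketch of the suggest-mode bookkeeping. The function names are made up, and the sketch ignores capitalization and punctuation. The blank line before the legend follows the examples in this document, but the exact spacing the tests expect is an assumption.

```python
def annotate_lines(lines, misspellings):
    """Return (annotated_lines, legend) for suggest mode."""
    annotated = []
    legend = []  # legend[i] is the suggestion for annotation number i + 1
    for line in lines:
        out_words = []
        for word in line.split():
            out_words.append(word)  # the original word is kept as-is
            if word in misspellings:
                legend.append(misspellings[word])
                out_words.append('(' + str(len(legend)) + ')')
        annotated.append(' '.join(out_words))
    return annotated, legend


def format_suggestions(annotated, legend):
    """Combine the annotated text and the legend into one output string."""
    parts = ['--- OUTPUT ---'] + annotated + ['', '--- LEGEND ---']
    for number, correct in enumerate(legend, start=1):
        parts.append('(' + str(number) + ') ' + correct)
    return '\n'.join(parts)


sample = {'hudnred': 'hundred', 'fiveteen': 'fifteen'}
text, legend = annotate_lines(['one hudnred dollars', 'and fiveteen cents'], sample)
print(format_suggestions(text, legend))
```

Unlike replace mode, the misspelled word stays in the text, and the running length of the legend doubles as the annotation counter.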

(5/6) Handle Punctuation / Handle Capitalization

You might find one or the other of these easier or harder, so choose whichever you think you will be able to accomplish more quickly. Once you have one working, move to the other! Handling punctuation is extra credit.

Examples

Below are several examples. Each example includes the contents of misspellings.txt, the contents of the input file, the suggest mode results, and the replace mode results.

Example 1 (no punctuation or capitalized words)

misspellings.txt:

dramatic:datamic,dramati
elephant:elaphant,elofent,elaphent
zoo:zooo,zo

words_1.txt:

joe and his family went to the zoo the other day
the zooo had many animals including an elofent
the elaphant was being too dramati though
after they walked around joe left the zo

replace mode:

Enter input file:
words_1.txt
Enter spellcheck mode (replace or suggest):
replace

--- OUTPUT ---
joe and his family went to the zoo the other day
the zoo had many animals including an elephant
the elephant was being too dramatic though
after they walked around joe left the zoo

suggest mode:

Enter input file:
words_1.txt
Enter spellcheck mode (replace or suggest):
suggest

--- OUTPUT ---
joe and his family went to the zoo the other day
the zooo (1) had many animals including an elofent (2)
the elaphant (3) was being too dramati (4) though
after they walked around joe left the zo (5)

--- LEGEND ---
(1) zoo
(2) elephant
(3) elephant
(4) dramatic
(5) zoo

Example 2 (with punctuation and capitalized words)

misspellings.txt:

hero:hereo,heroc
flew:fluwe
jumped:jumpedd,jimped,jumpped
saved:savved,sived,sivved
superman:sumprean,sumperan
day:dayy,ayy

words_2.txt:

There once was a hero, named superman.
Sumperan, being the hero he is, jumped.
After he jumped, he fluwe!
Then, Sumprean savved the day.

replace mode:

Enter input file:
words_2.txt
Enter spellcheck mode (replace or suggest):
replace

--- OUTPUT ---
There once was a hero, named superman.
Superman, being the hero he is, jumped.
After he jumped, he flew!
Then, Superman saved the day.

suggest mode:

Enter input file:
words_2.txt
Enter spellcheck mode (replace or suggest):
suggest

--- OUTPUT ---
There once was a hero, named superman.
Sumperan, (1) being the hero he is, jumped.
After he jumped, he fluwe! (2)
Then, Sumprean (3) savved (4) the day.

--- LEGEND ---
(1) Superman
(2) flew
(3) Superman
(4) saved

Submission

Submit this to Gradescope by Tuesday, November 3rd, by 7:00pm. Name the file spellcheck.py.