CSc 250 Assignment 2

This assignment has a few more “advanced” bash programs to write than the last one. Specifically, we will need to use complex regular expressions to solve some of these problems. For all problems in this assignment, students are limited to using the following command line tools:

pwd ls cd touch rm rmdir find echo cat sort grep head tail uniq rev cut sed awk date if/else for wc

Notice that we have not covered a few of these commands in lecture. Use the man pages, Google, or office hours to figure out how to use them. When grading your assignments, we will test each problem thoroughly with many types of input, so make sure you test your scripts well!

Obviously, this homework requires the use of the bash shell. If you do not have a Mac, Unix, or Linux computer that can run bash, do this assignment in one of the CS computer labs in Gould-Simpson.

Your bash scripts should be well-formatted and easy for the graders to read. Each script should have a header comment at the top (under the Sha-Bang) that has the following format:

#
# Author: Student Name
# Description:
#    A description of what this program/script does!
#    In this section, you can document how the script works, and what the
#    command-line options are.
#

If any part of your scripts is particularly complex, you should put documentation comments above those lines of code.

Many of the scripts that you are to write require the user to specify an input file. Each script should check that the files passed to it exist. If a file does not exist, an error should be printed in the form:

File X does not exist

Where X is the name of the file that the user specified. We will test your scripts with invalid file names, and points will be deducted if this feature doesn’t work properly.
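
The check described above can be sketched as follows. This is only one possible approach, not a required one: the function name check_file is hypothetical, and printing the message to stdout (rather than stderr) is an assumption, since the assignment only says the error should be "printed".

```shell
#!/bin/bash
# Hypothetical helper (the name check_file is not part of the assignment).
# Prints the required message and returns a non-zero status when the
# given path is not an existing regular file.
check_file() {
    if [ ! -f "$1" ]; then
        echo "File $1 does not exist"
        return 1
    fi
}

# Typical use near the top of a script:
#   check_file "$1" || exit 1
```

Quoting "$1" keeps the check working for file names that contain spaces.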

Problem 1 (25 Points)

In this problem, you will be writing regular expressions that match and don’t match specified strings. For each entry below, write a single regular expression that matches the string listed after “Matches:” and does not match the strings listed after “Doesn’t Match”.

You will put the resulting regular expressions in a file named regex.txt. Each regex will be on its own line of the .txt file. (A) will go on line 1, (B) will go on line 2, and (C) will go on line 3.

When coming up with your solutions, look for patterns and key differences in the strings to match and the strings to ignore. I recommend using RegExr while coming up with your solutions. Also, remember to enable the multi-line flag.

Problem 2 (25 Points)

Write a bash script named three-vowels.sh. This script finds words that contain three (or more) vowels in a row. It takes one positional argument, the name of the input file, and prints each word that contains three consecutive vowels to stdout, one per line.

For example, suppose the file words.txt contains the following:

This is a very curious lion.
Onomatopoeia is a really long word!
Why do you always talk about superfluOus things?
This sentence does not have a word with three vowels.
And neither does this one.

$ ./three-vowels.sh words.txt 
curious
Onomatopoeia
superfluOus

Notice that the search is case insensitive, because it included superfluOus (upper-case O) in the result.
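
As a rough illustration of how grep can pull whole words out of a line, here is a minimal sketch. It assumes that a "word" is a run of letters and that the vowels are a, e, i, o, and u (y is not counted); the function name three_vowels is hypothetical. Other approaches are equally valid.

```shell
#!/bin/bash
# Hypothetical helper: print every word in file $1 that contains three or
# more consecutive vowels, one word per line.
#   -o  print only the matched text (the whole word), not the whole line
#   -i  ignore case, so "uOu" counts as three vowels
#   -E  extended regular expressions, so {3} needs no backslashes
three_vowels() {
    grep -o -i -E '[[:alpha:]]*[aeiou]{3}[[:alpha:]]*' "$1"
}
```

The [[:alpha:]]* on either side of the vowel run extends the match to the rest of the word, which is why punctuation such as the trailing period is not printed.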

Problem 3 (25 Points)

Write a bash script named sequence.sh. This script searches for words with sequences of characters. The user of the script can specify the characters to look for, and the length of the sequence.

The first argument is a string of characters that specifies which characters may appear in a sequence. For example, if the string "rdft" is provided as the first argument, sequence.sh will search for sequences made up of the characters r, d, f, and t in any order. The script only needs to handle alphanumeric characters in this input string (1-9, A-Z, a-z).

The second argument is a number specifying the length of the sequence to search for. As the examples below show, the third argument is the name of the file to search.

For example, suppose the file words.txt contains the following:

Heeeeeyyyyy... How are ya?
This is a very curious lion.
look, look! a book.
Onomatopoeia is a really long word!
Why do you always talk about superfluOus things?
This sentence does not have a word with three vowels.
And neither does this one.

$ ./sequence.sh aeiou 3 words.txt 
Heeeeeyyyyy
curious
Onomatopoeia

$ ./sequence.sh AEIOUaeiou 3 words.txt 
Heeeeeyyyyy
curious
Onomatopoeia
superfluOus

$ ./sequence.sh y 4 words.txt 
Heeeeeyyyyy

$ ./sequence.sh o 2 words.txt 
look
look
book

$ ./sequence.sh abcdefg 2 words.txt 
Heeeeeyyyyy
really
about
sentence
three

Notice that the search this script performs is case-sensitive. When you run your version of the script, you should get the same output for all of these examples.
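
One way to turn the first two arguments into a regular expression is to place the characters inside a bracket expression and use the number as a repetition count. The sketch below assumes the first argument is purely alphanumeric (as the problem states), treats a "word" as a run of letters and digits, and uses a hypothetical function name, find_sequences.

```shell
#!/bin/bash
# Hypothetical helper: $1 = characters, $2 = sequence length, $3 = file.
# Assumes $1 contains only letters and digits, so it can be placed inside
# [ ] without any escaping.
find_sequences() {
    # [$1]{$2,} matches a run of at least $2 of the given characters;
    # the [[:alnum:]]* on each side extends the match to the whole word.
    # No -i flag is used, so the search stays case-sensitive.
    grep -o -E "[[:alnum:]]*[$1]{$2,}[[:alnum:]]*" "$3"
}
```

Double quotes are used around the pattern so that the shell substitutes $1 and $2 before grep sees it.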

Problem 4 (25 Points)

Write a bash script named word-search.sh. This script consumes an input file full of words (one per line) and searches for those words in another file. This script will take two positional arguments. The first is the name of a text file that contains a list of words, one per line. The second is the name of a text file to search in.

This script will grab all of the words from the first file, use a for loop to iterate over them, and search for each one individually in the second file.

For example, suppose the file words.txt contains the following:

torch
employ
trade
palladium
qwerty
azerty
adapted

In this example, we will use this text file discussing palladium. Go ahead and download it if you want to follow along with the examples below. (You should also test your script on other text files in addition to this one.)

$ ./word-search.sh words.txt metal.txt 
torch appears 1 times in metal.txt
employ appears 1 times in metal.txt
trade appears 1 times in metal.txt
palladium appears 33 times in metal.txt
qwerty appears 0 times in metal.txt
azerty appears 0 times in metal.txt
adapted appears 2 times in metal.txt
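
The loop described above might look something like the sketch below. The counting rule is an assumption: this version counts every case-sensitive occurrence of the word, including occurrences inside longer words, and the assignment text does not say whether that is the intended rule. The function name count_words is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: $1 = file with one word per line, $2 = file to search.
count_words() {
    for word in $(cat "$1"); do
        # grep -o prints each match on its own line, so wc -l counts
        # occurrences rather than matching lines. sed strips the padding
        # that some versions of wc put around the number.
        count=$(grep -o "$word" "$2" | wc -l | sed 's/[[:space:]]//g')
        echo "$word appears $count times in $2"
    done
}
```

Using grep -c instead would count matching lines, which gives a different number whenever a word appears more than once on the same line.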

Problem 5 (15 Points, Extra Credit)

Write a script named date.sh. This script will print a day of the week, depending on the argument the user specifies. It takes one positional argument: a string representing the day to print. This argument can legally be one of three values: yesterday, today, or tomorrow. If the value provided by the user is not exactly one of these three, the following should be printed to the command line, and the program should stop:

Invalid option. Use yesterday, today, or tomorrow.

This script will dynamically print the day of the week that corresponds with yesterday, today, or tomorrow.

$ ./date.sh blah
Invalid option. Use yesterday, today, or tomorrow.
$ ./date.sh yesterday
Wednesday
$ ./date.sh today
Thursday
$ ./date.sh tomorrow
Friday
$

$ ./date.sh blah
Invalid option. Use yesterday, today, or tomorrow.
$ ./date.sh yesterday
Monday
$ ./date.sh today
Tuesday
$ ./date.sh tomorrow
Wednesday
$

This script should print the appropriate yesterday, today, and tomorrow days for all days of the week. You can test that this script works for multiple days of the week either by manually changing the date on your computer in your system preferences, or by testing out your script over the course of multiple days.
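
A minimal sketch of the idea, assuming GNU date (as found on most Linux systems), is shown below. The -d option used here is GNU-specific; on macOS and other BSD systems the usual equivalents are date -v-1d and date -v+1d, so treat this as an illustration rather than a portable solution. The function name day_name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper. Relies on GNU date's -d option (an assumption about
# the platform); the +%A format prints the full weekday name.
day_name() {
    if [ "$1" = "yesterday" ] || [ "$1" = "today" ] || [ "$1" = "tomorrow" ]; then
        date -d "$1" +%A
    else
        echo "Invalid option. Use yesterday, today, or tomorrow."
        return 1
    fi
}
```

Checking the argument against the three legal values before calling date ensures that any other string produces the required message instead of being passed to date.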

Submission and grading

This assignment will be graded out of 100 points. With the extra credit, it is possible to earn a total of 115 points.

This was assigned on Thursday, January 26, 2017. It is due Thursday, February 2, 2017, at 5:00pm.