CSc 250: Assignment 1

This first assignment focuses on helping you get familiar with the command-line interface and learn how to use some common command-line tools. For all problems in this assignment, you may use only the following command-line tools:

pwd ls cd mkdir touch rm rmdir find echo cat sort grep head tail uniq rev cut sed if/else

You can also use bash variables however you please. Note that we have not covered a few of these commands in lecture. Use the man pages, Google, or office hours to figure out how to use them. When we grade your assignment, each problem will be thoroughly tested with many types of input, so make sure you test your scripts well!

Obviously, this homework requires the use of the bash shell. If you do not have a Mac, Unix, or Linux computer that can run bash, do this assignment in one of the CS computer labs in Gould-Simpson.

Your bash scripts should be well-formatted and easy for the graders to read. Each script should have a header comment at the top (under the Sha-Bang) that has the following format:

#
# Author: Student Name
# Description:
#    A description of what this program/script does!
#    In this section, you can document how the script works, and what the
#    command-line options are.
# 

If any part of your scripts is particularly complex, you should put documentation comments above those lines of code.

Many of the scripts that you are to write require the user to specify an input file. Each script should check that every file passed to it exists. If a file does not exist, an error should be printed in the form:

File X does not exist

Where X is the name of the file that the user specified. We will test your scripts with invalid file names, and points will be deducted if this feature doesn’t work properly.
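
As a rough illustration of this check (and of where the header comment goes), here is a minimal sketch. The function name check_file is made up for this example and is not required by the assignment. Printing the message to standard output is also an assumption, since the format above does not say which stream to use.

```bash
#!/bin/bash
#
# Author: Student Name
# Description:
#    Sketch of the input-file check. check_file is a hypothetical helper
#    name used only for illustration.
#

# Print the required message and return non-zero if the file is missing.
check_file() {
    if [ ! -e "$1" ]; then
        echo "File $1 does not exist"
        return 1
    fi
}

# Possible use near the top of a script whose first argument is a file name:
#   check_file "$1" || exit 1
```

The -e test is true when the path exists. Quoting "$1" keeps file names that contain spaces in one piece.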

Problem 1

This problem requires writing a bash script that creates a directory structure using several of the bash commands we have learned about so far. First, open up a new bash session with the terminal application. Navigate to the Desktop directory (cd ~/Desktop), and then create an empty directory that you can test the script in (for example, mkdir a1test). cd into this directory. In this directory, write a script named create-dirs-and-files.sh that creates a file/directory structure that looks exactly like the following:

a1
└── documents
    ├── courses
    │   ├── cs250
    │   │   ├── assignment-0.txt
    │   │   ├── assignment-1.txt
    │   │   └── study-guide.txt
    │   └── gen-ed
    │       ├── essay1.txt
    │       └── essay2.txt
    └── personal
        └── todo-list.txt

The a1 directory will have one sub-directory named documents. documents will have two subdirectories named courses and personal. courses will have two subdirectories named cs250 and gen-ed. You will also create all of the shown .txt files in their respective directories.

All of the .txt files must be empty, except for todo-list.txt and study-guide.txt. todo-list.txt must have the following contents:

* Wake up
* Go to class
* Learn all the things

study-guide.txt must look like this:

Unix: A family of operating systems
Bash: A shell interpreter for Unix systems
Python: A programming language
SQL: A language used to write queries for relational DBMSs
Postgres: A relational DBMS

We will be testing this script by executing it and then checking that the directory structure and file contents match what is specified in this assignment. Before you execute the script, there should be no directory named a1 in your current working directory. Running find a1 should print something like:

$ find a1
find: a1: No such file or directory

After executing it (./create-dirs-and-files.sh), you should now see the following when running find a1:

$ find a1
a1
a1/documents
a1/documents/courses
a1/documents/courses/cs250
a1/documents/courses/cs250/assignment-0.txt
a1/documents/courses/cs250/assignment-1.txt
a1/documents/courses/cs250/study-guide.txt
a1/documents/courses/gen-ed
a1/documents/courses/gen-ed/essay1.txt
a1/documents/courses/gen-ed/essay2.txt
a1/documents/personal
a1/documents/personal/todo-list.txt

For this problem, use only the commands: cd mkdir touch echo cat find.
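
To show how these commands fit together, here is a small sketch that builds a different, made-up tree. The names build_demo, demo, notes, empty.txt, and list.txt are all hypothetical, and this is not the a1 structure itself.

```bash
#!/bin/bash
# Sketch only: builds a small hypothetical tree named demo in the
# current directory. build_demo is an illustrative name.
build_demo() {
    mkdir demo                                   # top-level directory
    mkdir demo/notes                             # sub-directory
    touch demo/notes/empty.txt                   # creates an empty file
    # The quotes keep the leading * from being expanded by the shell.
    echo "* First item" > demo/notes/list.txt    # > creates or overwrites
    echo "* Second item" >> demo/notes/list.txt  # >> appends a line
}
```

After calling build_demo, find demo lists the new directories and files, and cat demo/notes/list.txt shows the two lines that were written.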

Problem 2

Write a bash script called num-sort.sh. This script processes a file that contains a list of numbers, one per line. This script takes one positional argument, which is the name of the file to process. An example input file might be named numbers.txt and look like this:

456
345
687
345
567
923
455
345
890
345
438
284
345
887
438
890

num-sort.sh should print to standard output the three numbers that occur most often in the input file, ordered from the highest count to the lowest, with each number preceded by its count. For the example given above, the output should look like so:

$ ./num-sort.sh numbers.txt
5 345
2 890
2 438
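
One possible way to get counts like these is a sort and uniq pipeline. The sketch below is an illustration under assumptions, not the required solution: top_three is a made-up name, and the order of numbers with equal counts is left to sort's fallback comparison of whole lines.

```bash
#!/bin/bash
# top_three is a hypothetical helper name used only for this sketch.
top_three() {
    # sort      : puts equal numbers next to each other
    # uniq -c   : collapses each run into one "count number" line
    # sort -rn  : orders those lines numerically, largest count first
    # head -n 3 : keeps the first three lines
    # sed       : strips the leading spaces that uniq -c adds as padding
    sort "$1" | uniq -c | sort -rn | head -n 3 | sed 's/^ *//'
}
```

uniq only merges adjacent duplicate lines, which is why the first sort is needed.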

Problem 3

Write a bash script named find-name.sh. This script will search through a list of one-word names and determine if a given name exists in the list. This script takes two positional arguments. The first is the name (string) to search for. The second is the name of the file to search in. Invoking this script in bash should look something like this:

$ ./find-name.sh Sally names.txt

or

$ ./find-name.sh BillyBob names.txt

An example input file might be named names.txt and look like this:

Sally
Bill
Donna
Rachel
Benito
Paris
Perris
Zento

find-name.sh will print the results to standard output. If the name being searched for is found, find-name.sh will print YES. If it is not found, it will print NO. Assuming that names.txt has the contents of the example given above, here are a few examples of running find-name.sh and what the output looks like:

$ ./find-name.sh Donna names.txt
YES
$ ./find-name.sh Zachary names.txt
NO
$ ./find-name.sh Ben names.txt
NO
$ ./find-name.sh benito names.txt
NO

Notice that when Ben is searched for, the result is NO. Benito exists in the list of names, which has the word Ben as a substring, but Ben does not exist in the list, so returning NO is correct. Also notice that searching for benito returns NO, because the search is case-sensitive.
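
The whole-line, case-sensitive matching described above can be sketched with grep. This is one possible approach under assumptions, and name_in_list is a made-up name for illustration.

```bash
#!/bin/bash
# name_in_list is a hypothetical helper name used only for this sketch.
# $1 is the name to look for, $2 is the file to search.
name_in_list() {
    # -F : treat the name as a literal string, not a regular expression
    # -x : match whole lines only, so Ben does not match Benito
    # -q : print nothing; the exit status alone reports the result
    # -- : marks the end of options, in case the name starts with a dash
    if grep -Fxq -- "$1" "$2"; then
        echo YES
    else
        echo NO
    fi
}
```

Because grep is case-sensitive unless -i is given, searching for benito does not match Benito.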

Remember to thoroughly test your script!

Problem 4

Write a bash script named sort-column.sh. This script will take as input a CSV file with zero or more pieces of information per line (columns). The user will specify a column number, and the script will output all of the contents of that column in sorted order.

This script takes two positional arguments. The first will be the one-based column number to extract and sort. The second will be the name of the input file.

The following is an example of what an input file (say, people.txt) might look like:

Dylan,Smith,sd@gmail.com
Jan,Yellow,yeljan@yahoo.com
Anne,Cho,anne145@gmail.com
James,Kemp,jkemp@apple.com
Daniel,Talbot,td56@gmail.com

Here are a few examples of what running sort-column.sh on this file would look like with various inputs:

$ ./sort-column.sh 1 people.txt 
Anne
Daniel
Dylan
James
Jan
$
$ ./sort-column.sh 2 people.txt 
Cho
Kemp
Smith
Talbot
Yellow
$
$ ./sort-column.sh 3 people.txt 
anne145@gmail.com
jkemp@apple.com
sd@gmail.com
td56@gmail.com
yeljan@yahoo.com
$

As can be seen, only the entries in the requested column are printed to stdout, and they are printed in ascending sorted order.
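
Extracting one comma-separated column and sorting it can be sketched with cut and sort. This is a minimal illustration under assumptions: sort_column is a made-up name, and the sketch does not handle quoted fields that contain commas.

```bash
#!/bin/bash
# sort_column is a hypothetical helper name used only for this sketch.
# $1 is the one-based column number, $2 is the input file.
sort_column() {
    # cut -d, : split each line on commas
    # -f "$1" : keep only the requested field
    # sort    : print the extracted values in sort's default order
    cut -d, -f "$1" "$2" | sort
}
```

For example, sort_column 2 people.txt would pass the last names to sort.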

Problem 5

Write a script named compare-first-last.sh. This script will compare the beginning lines of one file with the end lines of another file, and determine if they are the same.

This script will take three positional arguments. The first is the name of the file to check the beginning lines in. The second is the name of the file to check the ending lines in. The third is the number of lines to compare.

Say we have two files. The first, named one.txt, has the contents:

Stephen Curry
Eric Bledsoe
Devin Booker
Anthony Davis
Isaiah Thomas
DeMar DeRozan
James Harden
Russell Westbrook
John Wall
Chris Paul
LeBron James

The second, named two.txt, has the contents:

Russell Westbrook
Isaiah Thomas
DeMar DeRozan
James Harden
John Wall
Chris Paul
Anthony Davis
LeBron James
Stephen Curry
Eric Bledsoe
Devin Booker

Both of these files are lists of names of NBA players. Each file has the same names, but in a different order. Notice that the first three names in one.txt are the same as the last three names of two.txt. We can use this script to confirm this. Run:

$ ./compare-first-last.sh one.txt two.txt 3
The first/last 3 lines are identical

However, if we only compare the first two lines, they are not the same:

$ ./compare-first-last.sh one.txt two.txt 2
The first/last 2 lines differ

If we compare the first three lines of two.txt with the last three lines of one.txt, they are not the same. Thus, running the script should produce:

$ ./compare-first-last.sh two.txt one.txt 3
The first/last 3 lines differ

Also, if we try to compare the first/last 4 lines, there will be a difference:

$ ./compare-first-last.sh one.txt two.txt 4
The first/last 4 lines differ

If the beginning/end N lines match as expected, the script should print The first/last N lines are identical (where N is the number provided by the user as a command-line argument). If the beginning/end N lines do not match in any way, the script should print The first/last N lines differ.
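
The comparison itself can be sketched with head, tail, and two bash variables. This is one possible approach under assumptions: compare_first_last is a made-up function name, the file existence checks are left out, and because command substitution drops trailing newlines, blank lines at the very end of either selection are ignored.

```bash
#!/bin/bash
# compare_first_last is a hypothetical helper name used only for this sketch.
# $1 = file to take the beginning lines from
# $2 = file to take the ending lines from
# $3 = number of lines to compare
compare_first_last() {
    first=$(head -n "$3" "$1")   # first N lines of the first file
    last=$(tail -n "$3" "$2")    # last N lines of the second file
    if [ "$first" = "$last" ]; then
        echo "The first/last $3 lines are identical"
    else
        echo "The first/last $3 lines differ"
    fi
}
```

Quoting "$first" and "$last" inside the test keeps each multi-line value as a single string, so the two are compared line for line.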

This script takes two files as input. It should check that both files exist, in the order that they are specified on the command line. If neither file exists, then two File X does not exist messages should be displayed. If only one file does not exist, then display only one message, for the missing file.

Submission and grading

Each problem is worth 20% of your grade. The solutions will be graded on a Mac. You are free to use whatever platform you like while writing the homework (Linux, Mac, or even bash in Windows 10). However, if you want to be really sure your programs will run correctly when being graded, it is your responsibility to test them on a Mac. If programs do not run correctly, crash, etc., you may be given a grade of 0.

This was assigned on Thursday, January 19, 2017. It is due Thursday, January 26, 2017, at noon.

Turn-in instructions:

Following these turn-in instructions closely is very important, because our grading scripts will depend on some of the details. You may lose points if these instructions are not followed precisely!