cs250

Last time we covered bash variables, stdin/stderr/stdout, and bash scripting. Today, we will continue the discussion on scripting, regular expressions, conditionals, and subshells/command-substitution.

Command Substitution

It is often useful to assign the result of running some bash command to a variable, so that the result can be used later. In bash, this can be accomplished with subshells and command substitution. Command substitution executes a command (or sequence of commands) in a subshell and returns whatever is printed to stdout. Output sent to stderr is not captured unless it is redirected to stdout. Assigning the result of a command to a variable looks like so:

VARIABLE_NAME=$( some bash command )

The bash command is placed inside of $( ). Perhaps we want to store the current working directory in a variable. We can do:

$ CUR_WORK_DIR=$(pwd)
$ echo ${CUR_WORK_DIR}
/Users/bddicken/test

We can also chain together commands using pipes within the command substitution:

$ DAYS_IN_WEEK=$( cal | grep 'Mo' | wc -w )
$ echo ${DAYS_IN_WEEK} 
7

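Multiple separate commands can also be run, and their output will be concatenated in the result. Note that the unquoted ${TEXT} in the echo below causes the newline between the two outputs to be printed as a space: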

$ TEXT=$( echo "Command One" ; echo " Command two" )
$ echo ${TEXT}
Command One Command two
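
To make the capture behavior concrete, here is a minimal sketch. The variable names OUT and BOTH are chosen just for this illustration, and the sketch assumes the 2>&1 redirection syntax from the earlier discussion of stdout and stderr. It suggests that only stdout is captured by default, and that quoting the variable keeps the newline between the outputs.

```shell
#!/bin/bash

# Only stdout is captured. The second echo writes to stderr, so its text
# goes to the terminal instead of into OUT.
OUT=$( echo "to stdout" ; echo "to stderr" 1>&2 )
echo "Captured: ${OUT}"

# Redirecting stderr to stdout (2>&1) inside the substitution captures both.
# The braces group the two commands so the redirection applies to both.
BOTH=$( { echo "to stdout" ; echo "to stderr" 1>&2 ; } 2>&1 )
echo "Captured with 2>&1: ${BOTH}"

# The variable keeps the newline between the two outputs. Quoting the
# variable prints it as two lines; leaving it unquoted prints one line.
TEXT=$( echo "Command One" ; echo " Command two" )
echo "${TEXT}"
```

Quoting a variable when printing it ("${TEXT}" rather than ${TEXT}) is generally the safer habit when the exact spacing of the captured output matters.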

Bash conditionals

When writing bash scripts, we can write conditional statements. A conditional statement in English is a statement of the form: “If X is true then do action A. Otherwise, do action B.” In a bash script we can do something similar. We can check the value of a variable or statement, and if it is a specified value we do one set of bash commands, and if not then we execute a different set of command(s).

Conditionals are very common in programming languages, and they are essential to give programs the ability to do useful things. With conditional statements, we can write programs that change their behavior based on values in the program. We will look at conditionals in more depth when we get to learning Python, so for now we will just give a brief overview of how to use them and what they are useful for.

if [ SOMETHING ] ;
then
    CMD 1
    CMD 2
else
    CMD A
    CMD B
fi

This is interpreted in English as follows: “if SOMETHING is true, then execute commands CMD 1 and CMD 2, otherwise (else) execute the commands CMD A and CMD B.”

The line if [ SOMETHING ] ; is the actual condition check. SOMETHING is a statement that can evaluate to either true or false. In a moment, we will see some examples of what SOMETHING can be. then, followed by one or more commands, indicates that “these are the commands that should be run if SOMETHING was true.” else, followed by one or more commands, indicates that “these are the commands that should be run if SOMETHING was not true.” fi simply indicates the end of the whole if statement.

Nearly all high-level programming languages (including Python, Java, C, and more) have some type of if/else statement. The syntax varies slightly from language to language, but the general idea is the same. Bash also supports elif (short for “else if”) for checking additional conditions:

if [ THING1 ] ;
then
    CMD 1
    CMD 2
elif [ THING2 ] ;
then
    CMD X
    CMD Y
else
    CMD A
    CMD B
fi

As you might expect, this is interpreted in English as follows: “if THING1 is true, then execute commands CMD 1 and CMD 2. If THING1 is not true but THING2 is true (elif), then execute the commands CMD X and CMD Y. Otherwise (else) execute the commands CMD A and CMD B.”

We’ve covered the syntax of bash conditionals, but what exactly do these things condition on? In other words, what kinds of things can be put in place of SOMETHING, THING1, and THING2? This documentation page has a detailed list of the types of conditions that we can use in bash, and gives examples of many of them. We will walk through a few of these examples in class.
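
As one illustration of such a condition, here is a minimal sketch that compares integers with the -gt (greater than) and -eq (equal) operators of [ ]. The variables COUNT and RESULT are hypothetical names used only for this example.

```shell
#!/bin/bash

# COUNT and RESULT are hypothetical names chosen for this illustration.
# wc -w counts the words in its input, so COUNT holds the number 3.
COUNT=$( echo "one two three" | wc -w )

if [ "${COUNT}" -gt 3 ]; then
    RESULT="more than three words"
elif [ "${COUNT}" -eq 3 ]; then
    RESULT="exactly three words"
else
    RESULT="fewer than three words"
fi

echo "${RESULT}"
```

Only one branch runs: the first condition that is true wins, and the else branch runs only when none of the conditions are true.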

Below is an example of a script that uses conditionals. The conditions in the if/elif/else statement are string comparisons.

#!/bin/bash

SERIES=${1}

if [ "${SERIES}" = "StarWars" ]; then
    echo "Episode 1: The Phantom Menace"
    echo "Episode 2: Attack of the Clones"
    echo "Episode 3: Revenge of the Sith"
    echo "Episode 4: A New Hope"
    echo "Episode 5: The Empire Strikes Back"
    echo "Episode 6: Return of the Jedi"
    echo "Episode 7: The Force Awakens"
elif [ "${SERIES}" = "LOTR" ]; then
    echo "The Fellowship of the Ring"
    echo "The Two Towers"
    echo "The Return of the King"
elif [ "${SERIES}" = "IndianaJones" ]; then
    echo "Raiders of the Lost Ark"
    echo "The Temple of Doom"
    echo "The Last Crusade"
    echo "The Kingdom of the Crystal Skull"
else
    echo "I do not recognize: ${SERIES}"
fi

How do you think this script behaves for a user? Before running it, try thinking through how this script will work with various command-line arguments.

Another common use of bash conditionals is checking whether a file exists. Bash scripts often take file names as input arguments, and it is good practice to make sure the file actually exists before operating on it.

#!/bin/bash
 
FILE=${1}
  
if [ -f "${FILE}" ]; then
    echo "The file ${FILE} does exist!"
    exit
fi

The following script looks very similar, but it checks that a file does not exist:

#!/bin/bash
 
FILE=${1}
  
if [ ! -f "${FILE}" ]; then
    echo "The file ${FILE} does not exist!"
    exit
fi

When checking if files exist in your homework assignments, use these scripts as a reference.
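
The two checks can be tried out together without preparing any files by hand. The following is a small sketch under some assumptions: it uses mktemp (which creates an empty temporary file and prints its path) to get a file that is known to exist, and the variable names TMP_FILE, BEFORE, and AFTER are made up for this example.

```shell
#!/bin/bash

# mktemp creates an empty temporary file and prints its path.
TMP_FILE=$( mktemp )

# The file was just created, so the -f check should succeed.
if [ -f "${TMP_FILE}" ]; then
    BEFORE="exists"
else
    BEFORE="missing"
fi
echo "Before rm: ${BEFORE}"

# Delete the file, then confirm with ! -f that it is gone.
rm "${TMP_FILE}"

if [ ! -f "${TMP_FILE}" ]; then
    AFTER="missing"
else
    AFTER="exists"
fi
echo "After rm: ${AFTER}"
```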

Bash Loops

We can also write loops in bash. There are three main types of loops in bash: for-loops, while-loops, and until-loops. All of these looping types are useful, but for-loops are the most commonly used (in my experience). In the lecture notes we will focus on learning about for-loops, but feel free to study up on the other types on the page linked above.

for VAR in A B C D ;
do
    CMD 1
    CMD 2
    ...
done

How to interpret this in English: for each element in the list of items A, B, C, and D, run CMD 1, CMD 2, etc., while VAR is set to the value of the current item.

for MOVIE in StarWars LOTR IndianaJones ; do
    echo "Movie Name: ${MOVIE}"
done

Movie Name: StarWars
Movie Name: LOTR
Movie Name: IndianaJones

for PERSON in Anne Fred Tom Felix Sam ; do
    echo "Person: ${PERSON}"
done

Person: Anne
Person: Fred
Person: Tom
Person: Felix
Person: Sam

It can be useful to use command substitution in conjunction with for loops, to iterate over the results of a command. Let’s say that we are currently in a directory that has the following structure of subdirectories and files:

user
├── documents
│   ├── personal-statement.docx
│   └── resume.docx
├── images
│   ├── vacation_1.jpg
│   ├── vacation_2.jpg
│   └── vacation_3.jpg
└── videos
    ├── interview.mov
    ├── vacation-2014.mov
    ├── vacation-2015.mov
    └── vacation-2016.mov

3 directories, 9 files

$ ls user/
documents   images      videos

We can use the results of ls as the sequence of items to iterate over in a loop.

for DIR in $( ls user ) ; do 
    echo "------- Files In user/${DIR} -------"
    ls user/${DIR}
    echo ""
done

------- Files In user/documents -------
personal-statement.docx resume.docx

------- Files In user/images -------
vacation_1.jpg  vacation_2.jpg  vacation_3.jpg

------- Files In user/videos -------
interview.mov       vacation-2014.mov   vacation-2015.mov   vacation-2016.mov
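
One caveat with looping over $( ls ... ) is that the output is split on whitespace, so a directory name containing a space would be treated as two separate items. A possible alternative, sketched below, is to loop over a glob pattern instead. This is not the approach used in these notes, just a variation; the temporary directory tree and the name "my images" are invented for the illustration.

```shell
#!/bin/bash

# Build a small throwaway directory tree (the names are illustrative only).
ROOT=$( mktemp -d )
mkdir -p "${ROOT}/user/documents" "${ROOT}/user/my images"
touch "${ROOT}/user/documents/resume.docx"
touch "${ROOT}/user/my images/vacation_1.jpg"

# The glob user/*/ expands to one item per subdirectory, even when a
# directory name contains a space. Quoting ${DIR} keeps each name intact.
LISTING=$(
    for DIR in "${ROOT}"/user/*/ ; do
        echo "------- Files In ${DIR} -------"
        ls "${DIR}"
        echo ""
    done
)
echo "${LISTING}"

# Clean up the throwaway tree.
rm -r "${ROOT}"
```

With the ls-based loop, "my images" would have produced two iterations ("my" and "images"); the glob version treats it as one directory.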

Loops can also be nested (you can have a loop within a loop). For example, running the following nested loop produces the output shown after it:

for USER in $( who | cut -d " " -f 1 ) ; do 
    echo "User: ${USER}"
    for LETTER in A B C ; do 
        echo "   ${LETTER}"
    done
done

User: bddicken
   A
   B
   C
User: _mbsetupuser
   A
   B
   C
User: bddicken
   A
   B
   C
User: bddicken
   A
   B
   C
User: bddicken
   A
   B
   C
User: bddicken
   A
   B
   C

The command who | cut -d " " -f 1 prints the user name for each open login session on a Unix system, so a user with several open sessions (like bddicken above) appears more than once.
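
If each user should be visited only once, one option is to remove the duplicates before looping. The sketch below assumes the sort command with its -u (unique) option. So that it runs the same way on any machine, it uses a made-up SAMPLE string standing in for the output of who; the terminal names and times in it are invented.

```shell
#!/bin/bash

# SAMPLE stands in for the output of `who`. The session details are made up.
SAMPLE="bddicken console  Jan  1 09:00
_mbsetupuser console  Jan  1 09:00
bddicken ttys000  Jan  1 09:05"

# cut keeps the first space-separated field (the user name);
# sort -u sorts the names and drops repeated ones.
UNIQUE_USERS=$( echo "${SAMPLE}" | cut -d " " -f 1 | sort -u )

for NAME in ${UNIQUE_USERS} ; do
    echo "User: ${NAME}"
done
```

On a real system, replacing echo "${SAMPLE}" with who should give one iteration per distinct logged-in user.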

Regular expressions

While using bash, we often pass a string of characters to the commands that we use. When we want to search a file for some text, we can write cat some_file | grep "data". data is a sequence (string) of four characters that specifies what we want to search for in some_file. If we want to search for a name in a file, we can do grep -i "Billy" names.txt. Billy is a sequence of five characters that specifies the name of the person to search for in names.txt.

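Regular expressions (often referred to as regex) are sets of characters and/or meta-characters that match (or specify) patterns. In essence, a regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions. A regular expression contains one or more of the following: a character set (characters that keep their literal meaning), an anchor (which fixes the position in the line where the match must occur, such as ^ and $), and modifiers (which expand or narrow the range of text to match, such as *).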

There are several websites around the internet that have a visual guide to the regular expressions that you type. These are very helpful for determining the desired regular expression. We will run through a few examples on Regexr, because of its pretty visual aid. Regular expressions are useful in many places. In this course, we will use them when writing bash commands (specifically with grep), and we will also use them when we start learning about Python.

grep can take regular expressions as input when searching for / matching text. Let’s go through a few simple examples. We will start with the following text file named teams.txt:

Cleveland Cavaliers:                  
    LeBron James (23), Kyrie Irving (2) 
Golden State Warriors:                
    Stephen Curry (30), Kevin Durant (35)
Phoenix Suns:                         
    Eric Bledsoe (2), Devin Booker (1)

This file has the names of several NBA teams, and indented on the next line is a list of each team’s most prominent players. First, we will try using a very simple regex to match the name “Kyrie”:

$ cat teams.txt | grep "Kyrie"
    LeBron James (23), Kyrie Irving (2)

It matched, but it grabbed the entire line of text, as we have seen from grep before. A little man-page reading and Googling will reveal that the -o option can be used to only print the exact matching text to stdout:

$ cat teams.txt | grep -o "Kyrie"
Kyrie

Now we are able to grab just the name we are looking for. Next, let’s try searching for all names in the file that have four characters. The pattern below uses \w, which matches a single word character, with a space on either side:

$ cat teams.txt | grep -o " \w\w\w\w "
Eric

$ cat teams.txt | grep -o " \w\w\w\w\w "
James 
Kyrie 
State 
Curry 
Kevin 
Devin

$ cat teams.txt | grep -o " \w\w\w\w\w\w "
LeBron 
Irving 
Durant 
Booker

$ cat teams.txt | grep -o "(\d)"
(2)
(2)
(1)

$ cat teams.txt | grep -o "(\d\d)"
(23)
(30)
(35)

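That only matched the two-digit numbers. The * regex special character, which matches zero or more repetitions of the item before it, can be used to help us here. (Note that the \d shorthand for a digit is not understood by every version of grep. The examples here were run with a grep that supports it; GNU grep’s default syntax does not, and [0-9] can be used there instead.)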

$ cat teams.txt | grep -o "(\d*)"
(23)
(2)
(30)
(35)
(2)
(1)
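
For readers whose grep does not accept \d, the same search can be sketched with the bracket expression [0-9], which matches any single digit. The block below recreates teams.txt in a temporary file so that it is self-contained; the variable names TEAMS_FILE, STAR_MATCHES, and PLUS_MATCHES are hypothetical. It also shows a variant using -E (extended syntax) and +, since * would also accept an empty pair of parentheses.

```shell
#!/bin/bash

# Recreate teams.txt in a temporary file so the example is self-contained.
TEAMS_FILE=$( mktemp )
echo "Cleveland Cavaliers:" > "${TEAMS_FILE}"
echo "    LeBron James (23), Kyrie Irving (2)" >> "${TEAMS_FILE}"
echo "Golden State Warriors:" >> "${TEAMS_FILE}"
echo "    Stephen Curry (30), Kevin Durant (35)" >> "${TEAMS_FILE}"
echo "Phoenix Suns:" >> "${TEAMS_FILE}"
echo "    Eric Bledsoe (2), Devin Booker (1)" >> "${TEAMS_FILE}"

# [0-9] matches any single digit; * allows zero or more of them.
# In grep's default syntax, ( and ) match literal parentheses.
STAR_MATCHES=$( grep -o "([0-9]*)" "${TEAMS_FILE}" )

# With -E (extended syntax), + requires at least one digit, and literal
# parentheses must be escaped with a backslash.
PLUS_MATCHES=$( grep -oE "\([0-9]+\)" "${TEAMS_FILE}" )

echo "${STAR_MATCHES}"
echo "${PLUS_MATCHES}"

rm "${TEAMS_FILE}"
```

On this file both patterns find the same six numbers; they would differ only if the text contained an empty "()".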

Finally, we will try matching all of the team names using the ^ (beginning of a line) anchor special character:

$ cat teams.txt | grep -o "^.*:"
Cleveland Cavaliers:
Golden State Warriors:
Phoenix Suns:

As you can probably start to see, regular expressions are very powerful, and are useful for many searching and text matching applications.

RegexOne is an excellent tutorial for beginners learning how to use regex to match various strings. You will need to know regular expressions for homework assignments and exams, and there will be problems that are more complex than the examples shown in class. Make sure that you understand all of the examples covered in class and in these lecture notes, and work through all of the problems in the RegexOne tutorial.

Regex anchors

The two anchors in regex are $ and ^. $ matches the end of a line. ^ matches the beginning of a line.
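
As a small sketch of both anchors with grep, the block below searches a two-line sample taken from teams.txt. The variable names SAMPLE, TEAM_LINES, and PLAYER_LINES are made up for this illustration.

```shell
#!/bin/bash

# Two lines from teams.txt: a team name and its indented player list.
SAMPLE="Phoenix Suns:
    Eric Bledsoe (2), Devin Booker (1)"

# ':$' matches a colon at the very end of a line (the team line).
# Single quotes keep the shell from treating $ specially.
TEAM_LINES=$( echo "${SAMPLE}" | grep ':$' )

# '^ ' matches a space at the very start of a line (the indented player line).
PLAYER_LINES=$( echo "${SAMPLE}" | grep '^ ' )

echo "Team lines:   ${TEAM_LINES}"
echo "Player lines: ${PLAYER_LINES}"
```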

Regex Modifiers

The following is not necessarily complete documentation of all regex modifiers, but is a set of commonly used ones. You may need to research and use more than these for homework, labs, etc.

Regex Exercises

After reading through the above regex sections, and going through the RegexOne tutorial, try writing regular expressions to match the following specifications. As with the RegexOne tutorial, your regexes should match all of the words under MATCH and should not match any of the words under DON'T MATCH.

MATCH:
BuMbLeBeE
MoNgOoSeS
BlAcKjAcK

DON'T MATCH:
BumbLEbEE
monGooSes
blackjACK

MATCH:
utbe84m6hf84
37ndirrhfmf94
274fhh83hhkjhh
38hkjhjhfjfjj
er9ihihhhhh

DON'T MATCH:
9uot324oitvyo374tb78tb3487tb
lkerhfliuafehkwlehkrugflagrflg
dfhj
sdf
2
43587f

MATCH:
bulllike
hostessship
Amerikkkan
zzz

DON'T MATCH:
person
bull
ship-of-hostess
tired

MATCH:
L.O.L.
A.F.K.
S.G.T.M.
K.O.

DON'T MATCH:
KO.
KO
.LOL.
AF.K

MATCH:
??..
?.?.\\
[\/]

DON'T MATCH:
ABC
12345
two words

MATCH:
John Cooper
Sherlock Holmes
John Watson
James Earl Jones

DON'T MATCH:
John
Sherlock
This is a longer sentence that is not a name

Comments

When writing bash scripts (and when writing code in any language) it is important to write code that is easily readable and easy for a human to follow. One thing that can be done to help with this is adding comments to your code. Comments have no effect on what the script does or computes. The purpose of comments is to “annotate” your code to help readers follow along and understand what the code is doing, why it is needed, and how it is expected to be used.

Let’s say that you were writing a bash script. One of the lines in your bash script looks like this:

PAREN_NUMBERS=$( cat teams.txt | grep -o "(\d\d)" )

At first glance it might be hard to understand what this line of code is doing. To help a reader of the code understand it more quickly, we can add a comment to it. In bash, comment lines begin with #. All text following the # on the same line will not be treated as code, but as a comment. Generally, comments are added above the line of code that they are describing, like so:

# Grab all of the double-digit numbers surrounded by parentheses in teams.txt, and store in a variable
PAREN_NUMBERS=$( cat teams.txt | grep -o "(\d\d)" )

Everything after the # on that first line is not treated as code, but as a comment. It is not necessary to put a comment on every line and section of code. In some cases, the code is clear enough to be “self-documenting.” In other words, the code is written in a clear and concise way, such that documenting it with comments would be redundant. But sometimes we need to use more complex code, and comments are a great way to clarify its purpose and usage expectations.

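It is also common and good practice to put a “header” comment at the top of each script that you write. Header comments commonly include things like the author’s name, contact information, and a description of what the script does and how it is used.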

For example, we might have the following script (one that we wrote earlier in this lecture):

#!/bin/bash

for DIR in $( ls user ) ; do 
    echo "------- Files In user/${DIR} -------"
    ls user/${DIR}
    echo ""
done

We can write a header comment to help a future reader of this script understand it better:

#!/bin/bash
#
# Author: Ben Dicken
# Contact: bdd@email.com
# Description: 
#     This script lists all of the files in each directory in the user dir.
#     This script takes no command-line arguments.
#

for DIR in $( ls user ) ; do 
    echo "------- Files In user/${DIR} -------"
    ls user/${DIR}
    echo ""
done

Notes

Some of the regex notes in this section were highly influenced by (and in a few cases, taken verbatim from) this page and this page, both of which are on tldp.org. I encourage you to read through these pages too.

CSc 250: Lecture Notes: command substitution, conditionals, loops, regex

Review