cs250

The Python str type has many built-in functions for manipulating strings. Today we will cover a number of them, but what we cover is only a subset. You can see the official Python documentation page for a more complete list. Some of the descriptions on this page are taken directly from that site.

In this lesson I will demonstrate many of these concepts in the Python interactive shell.

String Subsequences

As we know by now, strings are just sequences of individual characters. Using the square-bracket syntax, we can grab sub-sequences of the characters in a string.

Below are examples of grabbing a subsequence of a string (a substring) starting from a particular index and going to the end of the string.

>>> title = "The LEGO Batman Movie"
>>> title[0:]
'The LEGO Batman Movie'
>>> title[4:]
'LEGO Batman Movie'
>>> title[6:]
'GO Batman Movie'
>>>

Below are examples of grabbing a substring starting from the beginning and ending at a particular index.

>>> title[:0]
''
>>> title[:4]
'The '
>>> title[:6]
'The LE'

Below are examples of grabbing a substring starting and ending at a specified index.

>>> title[4:10]
'LEGO B'
>>> title[5:15]
'EGO Batman'
>>>
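The slice forms above fit together: for any index, the "up to" slice and the "from" slice split the string into two pieces that join back into the original. Here is a minimal sketch of that idea, reusing the title string from the examples (the names head, tail, and middle are illustrative only):

```python
title = "The LEGO Batman Movie"

# Split the string at index 4: everything before it, and everything from it on.
head = title[:4]
tail = title[4:]

print(repr(head))            # 'The '
print(repr(tail))            # 'LEGO Batman Movie'
print(head + tail == title)  # True

# When both indexes are in range, a start:end slice holds end - start characters.
middle = title[4:10]
print(repr(middle), len(middle))  # 'LEGO B' 6
```

The character at the start index is included in the slice, and the character at the end index is not.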

Functions

str also has several built-in functions that manipulate and perform operations on strings.

Some of these functions alter strings. However, these functions do not alter the contents of the string that is being operated on, but rather return a copy of the string with the described modifications.

Some of the functions do not modify the string, but check that the string has certain properties, search the string, etc. For these, the function returns booleans or integers.

Below, we will take a look at several functions, and also study how the functions can be combined.

str.lower and str.upper

The lower and upper functions return a copy of the string that is converted to all lower case and all upper case, respectively.

>>> title.upper()
'THE LEGO BATMAN MOVIE'
>>> title
'The LEGO Batman Movie'
>>> title.lower()
'the lego batman movie'
>>> title
'The LEGO Batman Movie'

These functions do not change the value of the initial variable, unless you assign the result of the call back to the variable.
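To keep the converted text, the result of the call has to be stored somewhere. The sketch below shows both options, assuming the same title string as above (shouted is a hypothetical variable name chosen for illustration):

```python
title = "The LEGO Batman Movie"

# Calling upper() on its own leaves title untouched; the new string is discarded.
title.upper()
print(title)    # The LEGO Batman Movie

# Store the copy under a new name to keep both versions...
shouted = title.upper()
print(shouted)  # THE LEGO BATMAN MOVIE

# ...or assign back to the same name to replace the old value.
title = title.lower()
print(title)    # the lego batman movie
```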

str.islower and str.isupper

These two functions are similar to the above, but rather than returning a modified string, they check if the string is already upper or lower case.

>>> title = "The LEGO Batman Movie"
>>> upp = "UPPER CASE TEXT"
>>> low = "lower case text"
>>> 
>>> title.isupper()
False
>>> title.islower()
False
>>> upp.isupper()
True
>>> upp.islower()
False
>>> low.isupper()
False
>>> low.islower()
True

Below is an example of first getting an upper-case copy of a string and then checking if it is upper case, which will be true:

>>> title.upper().isupper()
True

>>> title.lower().islower()
True

str.strip, str.lstrip and str.rstrip

These functions remove characters from the left side of the string (lstrip), the right side of the string (rstrip), or both sides of the string (strip). The functions take one argument, which is a string containing all of the characters to remove at the beginning or end (or both) of the string. The order of the characters in the argument does not matter, and removal stops at the first character that is not in the argument. For example:

>>> numbers = 'one two three four'
>>> numbers.lstrip('otwne')
' two three four'
>>> numbers.lstrip('otwne ')
'hree four'
>>> numbers.rstrip('rfoue ')
'one two th'
>>> numbers.rstrip('rfoue')
'one two three '
>>> numbers.strip('rfoue ')
'ne two th'

If no argument is passed to the lstrip, rstrip, and strip functions, they default to removing leading/trailing whitespace, which includes spaces, tabs, and newlines. For example:

>>> spacing = '    LEGOOOOO               '
>>> spacing
'    LEGOOOOO               '
>>> spacing.lstrip()
'LEGOOOOO               '
>>> spacing.rstrip()
'    LEGOOOOO'
>>> spacing.strip()
'LEGOOOOO'
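The example above only uses spaces, so here is a short sketch showing that the no-argument forms also remove tabs and newlines, and that whitespace in the middle of the string is left alone. The strings raw and padded are made up for illustration:

```python
raw = "\t  Batman\n"

# With no argument, all leading/trailing whitespace is removed,
# including the tab and the newline.
print(repr(raw.strip()))   # 'Batman'
print(repr(raw.lstrip()))  # 'Batman\n'
print(repr(raw.rstrip()))  # '\t  Batman'

# Only the ends are affected; spaces between words are kept.
padded = "  The LEGO Movie  "
cleaned = padded.strip()
print(repr(cleaned))       # 'The LEGO Movie'
```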

str.isnumeric

This function returns True if the string is not empty and has only numeric characters in it. For example:

>>> thing_one = '123'
>>> thing_two = 'This is not a number'
>>> thing_one.isnumeric()
True
>>> thing_two.isnumeric()
False
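One possible use of isnumeric is checking text before converting it with int(). The sketch below is only an illustration: parse_count is a hypothetical helper name, and it assumes the input uses ordinary digits 0-9.

```python
def parse_count(text):
    """Return text as an int when it holds only numeric characters, else None.

    Assumes ordinary digits 0-9. A few unusual characters (such as fraction
    characters) also pass isnumeric() but are rejected by int().
    """
    if text.isnumeric():
        return int(text)
    return None

print(parse_count('123'))        # 123
print(parse_count('12 apples'))  # None
print(parse_count('-5'))         # None (the minus sign is not numeric)
print(parse_count('3.14'))       # None (the decimal point is not numeric)
print(parse_count(''))           # None (an empty string is not numeric)
```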

str.isalnum

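This function returns True if the string has only alpha-numeric characters (letters and numbers). Spaces and punctuation are not alpha-numeric, so a string that contains them returns False. For example: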

>>> thing_one = '123'
>>> thing_two = 'This is not a number'
>>> thing_three = 'This string has s0me numbers 77'
>>> thing_four = 'numbersAndLetters77'
>>> thing_one.isalnum()
True
>>> thing_two.isalnum()
False
>>> thing_three.isalnum()
False
>>> thing_four.isalnum()
True

str.isalpha

This function returns True if the string has only alphabetical characters. Spaces are not alphabetical, so a sentence with spaces between its words returns False. For example:

>>> thing_one = '123'
>>> thing_two = 'These are some letters'
>>> thing_three = 'TheseAreSomeLetters'
>>> thing_one.isalpha()
False
>>> thing_two.isalpha()
False
>>> thing_three.isalpha()
True

str.startswith and str.endswith

These functions return True if the string starts or ends with the designated string. The comparison is case-sensitive. For example:

>>> thing_one = '123'
>>> thing_two = 'These are some letters'
>>> thing_one.startswith('123')
True
>>> thing_one.startswith('12')
True
>>> thing_two.startswith('These are')
True
>>> thing_two.startswith('The')
True
>>> thing_two.endswith('ers')
True
>>> thing_two.endswith('lettERS')
False
>>> thing_two.startswith('teese')
False
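Because startswith and endswith are case-sensitive, one way to ignore case is to combine them with lower, converting both strings before comparing. This is a sketch of that combination, not the only approach; the variable names plain and ignoring_case are illustrative:

```python
thing_two = 'These are some letters'

# A direct comparison is case-sensitive, so this does not match.
plain = thing_two.endswith('lettERS')
print(plain)          # False

# Lower-casing both strings first makes the comparison ignore case.
ignoring_case = thing_two.lower().endswith('lettERS'.lower())
print(ignoring_case)  # True

print(thing_two.lower().startswith('these'))  # True
```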

str.find

This function searches for a substring within a string. It takes a single argument, which is the string to search for within the string the function is being called on. This makes it useful for determining if a string contains a particular substring. It returns the index of the first occurrence of the string being searched for. If the string being searched for cannot be found, it returns -1. For example:

>>> beginning = 'It was the best of times, it was the worst of times'
>>> beginning.find('was')
3

Notice that the string 'was' appears multiple times in beginning. However, the index of the first occurrence of the word is what the function returns. A few more examples:

>>> beginning.find('of times')
16
>>> beginning.find('It')
0

And here are some examples showing how the function behaves when the searched-for substring cannot be found. Note that the search is case-sensitive, so 'WORST' does not match 'worst':

>>> beginning.find('sail')
-1
>>> beginning.find('WORST')
-1
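The index returned by find can be combined with the slicing syntax from earlier. The sketch below also shows a detail worth watching for: a match at index 0 is a valid result, so the check should compare against -1 rather than test the result directly. The names index and found_it are illustrative:

```python
beginning = 'It was the best of times, it was the worst of times'

# Use the returned index as the start of a slice.
index = beginning.find('worst')
if index != -1:
    print(index)              # 37
    print(beginning[index:])  # worst of times

# 'It' is found at index 0, and 0 counts as false in an if statement,
# so compare against -1 instead of testing the result directly.
found_it = beginning.find('It') != -1
print(found_it)                      # True
print(beginning.find('sail') != -1)  # False
```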

str.count

The count function computes the number of times that a particular substring appears within the string (not including overlaps). For example:

>>> beginning = 'It was the best of times, it was the worst of times'
>>> beginning.count('was')
2
>>> beginning.count(' was ')
2
>>> beginning.count('best')
1
>>> beginning.count('es')
3
>>> beginning.count('warp')
0

It is important to realize that count only counts non-overlapping matches. Once a match is found, the search continues after the end of that match. For example:

>>> pattern = 'atatatat'
>>> pattern.count('at')
4

>>> pattern.count('atat')
2

The pattern 'atat' actually appears three times (starting at indexes 0, 2, and 4), but since count does not include overlapping matches, it returns a count of 2. Similarly:

>>> pattern.count('atatat')
1
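If overlapping matches need to be counted, count alone is not enough. One possible approach, sketched below, relies on the optional second argument of find, which is the index to start searching from. The function name count_overlapping is hypothetical, and the loop is just one way to do this:

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, including overlapping ones."""
    total = 0
    start = text.find(sub)
    while start != -1:
        total += 1
        # Resume one character past the start of the last match,
        # so a match that overlaps the previous one is still found.
        start = text.find(sub, start + 1)
    return total

pattern = 'atatatat'
print(pattern.count('atat'))               # 2
print(count_overlapping(pattern, 'atat'))  # 3
print(count_overlapping(pattern, 'atatat'))  # 2
print(count_overlapping(pattern, 'at'))    # 4
```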

str.replace

This function returns a copy of the string with all occurrences of the substring old (the first argument) replaced by new (the second argument).

>>> beginning = 'It was the best of times, it was the worst of times'
>>> beginning.replace('was', 'could be')
'It could be the best of times, it could be the worst of times'
>>> beginning.replace('times', 'classes')
'It was the best of classes, it was the worst of classes'
>>> beginning.replace('was the ', '')
'It best of times, it worst of times'
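Since replace returns a new string, calls can be chained, with each call working on the result of the one before it. A short sketch, using the same beginning string (updated and unchanged are illustrative names):

```python
beginning = 'It was the best of times, it was the worst of times'

# Each replace works on the string returned by the previous call.
updated = beginning.replace('times', 'classes').replace('was', 'is')
print(updated)    # It is the best of classes, it is the worst of classes

# The original string is not modified.
print(beginning)  # It was the best of times, it was the worst of times

# If old never appears, the result equals the original string.
unchanged = beginning.replace('zoo', 'park')
print(unchanged == beginning)  # True
```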

str.split

This function splits a string into multiple separate strings based on the separator string specified as the first argument. It returns a list of strings, each string being one substring of the original string. We have not talked about lists in detail yet, but we will soon, at which point this function will be a little easier to understand. Let’s walk through a few examples.

>>> questions = 'Are you OK? Are you feeling well? How have you been?'

>>> questions.split('?')
['Are you OK', ' Are you feeling well', ' How have you been', '']

The result is a list of four strings. As you can see, the strings are the substrings that appeared on either side of the three ? characters in the original string. The last string is empty because nothing follows the final ?. This is the result of “splitting” questions on the ? character.

>>> questions.split('Are')
['', ' you OK? ', ' you feeling well? How have you been?']

Again, the return value is a list of all of the substrings that appeared before/after the word Are.

Another example, in which we split a sentence by spaces, and get each individual word in the resulting string list:

>>> statement = 'My favorite color is green'
>>> statement.split(' ')
['My', 'favorite', 'color', 'is', 'green']

Splitting a string is not much use if we cannot access the individual strings returned in the list. First, we should save the results into a variable.

>>> statement = 'My favorite color is green'
>>> words = statement.split(' ')
>>> print(words)
['My', 'favorite', 'color', 'is', 'green']

To access each individual word, we can use list index syntax to pick out a particular index of the list of words. Accessing each word uses a zero-based count. This means that the first word is at index “0”, the second word is at index “1”, the third word is at index “2”, etc. Asking for an index past the end of the list causes an error, as the last example shows.

>>> words[0]
'My'
>>> words[1]
'favorite'
>>> words[2]
'color'
>>> words[3]
'is'
>>> words[4]
'green'

>>> words[5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
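To avoid an IndexError like the one above, the built-in len function reports how many strings are in the list, so the last valid index is always one less than that. The sketch below assumes the same statement string; last_index and shouted are illustrative names:

```python
statement = 'My favorite color is green'
words = statement.split(' ')

# len gives the number of items; valid indexes run from 0 to len(words) - 1.
print(len(words))         # 5
last_index = len(words) - 1
print(words[last_index])  # green

# A for loop visits every word without needing any indexes at all.
shouted = []
for word in words:
    shouted.append(word.upper())
print(shouted)            # ['MY', 'FAVORITE', 'COLOR', 'IS', 'GREEN']
```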

Write a function named first_mid_last that takes one argument, which is expected to be a complete English sentence. This function should print out the first word, middle word, and last word of the sentence.

Write a function named replace_and_summarize. This function will take two arguments. The first is a word to search for (call this word), and the second is a string to search for the word in (call this sentence). The function will print out how many times the word appeared in the second argument. It will also print out the original sentence with all instances of word surrounded by asterisks (*).

>>> replace_and_summarize("test", "That test was a hard test to take")
"test" was found 2 times.
UPDATED SENTENCE: That *test* was a hard *test* to take

>>> replace_and_summarize("it", "It was the best of times, it was the worst of times")
"it" was found 1 times.
UPDATED SENTENCE: It was the best of times, *it* was the worst of times

>>> replace_and_summarize("was", "It was the best of times, it was the worst of times")
"was" was found 2 times.
UPDATED SENTENCE: It *was* the best of times, it *was* the worst of times

>>> replace_and_summarize("zoo", "It was the best of times, it was the worst of times")
"zoo" was found 0 times.
UPDATED SENTENCE: It was the best of times, it was the worst of times

CSc 250: Python String Manipulation