CSc 250: Lecture Notes: File I/O

Opening a File

As with any programming language, it is useful to be able to read data from a file and write data to a file in Python. In this section we are going to learn how to interact with files in Python.

Before we can interact with a file (read it, write to it, append to it), a file must be opened. Python has a built-in function named open which is used for this purpose. Below is an example:

f = open('some_file_name', 'r')

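This Python statement opens a file named some_file_name in “read” mode (indicated by the 'r') and stores the resulting file object in the variable f. The first argument to open is the name of the file. The second argument is the mode that the file is opened in. Common options for this argument are:

'r'  : read only (the default if no mode is given); the file must already exist
'w'  : write; creates the file if needed and erases any existing contents
'a'  : append; creates the file if needed, and everything written goes to the end
'r+' : read and write; the file must already exist, and its contents are not erased
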
So, for example, doing:

f = open('some_file_name', 'r')

will give you the ability to read some_file_name but not modify it. But:

f = open('some_file_name', 'r+')

will give you the ability to both read and write some_file_name.

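To make the difference between 'r' and 'r+' concrete, here is a small sketch. It assumes it is fine to create a scratch file in a temporary directory (via tempfile.mkdtemp) so that no real file is touched. Writing to a file opened with 'r' raises io.UnsupportedOperation, while 'r+' keeps the existing contents and allows both reading and writing.

```python
import io
import os
import tempfile

# Use a throwaway directory so this demo never touches your real files.
path = os.path.join(tempfile.mkdtemp(), 'some_file_name')

# Create the file with one line of content ('w' creates the file).
f = open(path, 'w')
f.write('hello\n')
f.close()

# 'r' mode: reading works, but any write raises io.UnsupportedOperation.
f = open(path, 'r')
read_only_contents = f.read()
try:
    f.write('more\n')
    write_allowed = True
except io.UnsupportedOperation:
    write_allowed = False
f.close()

# 'r+' mode: existing contents are kept, and we can both read and write.
f = open(path, 'r+')
rplus_contents = f.read()   # reads everything, leaving the position at the end
f.write('world\n')          # so this write lands after the existing text
f.close()

# Read the file back to see the combined result.
f = open(path, 'r')
final_contents = f.read()
f.close()

print(repr(read_only_contents), write_allowed, repr(final_contents))
```

Because read() consumed the whole file first, the write in 'r+' mode ends up after the old text rather than replacing it.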
Reading a file

Let’s say we have a file named roles.txt with the following contents:

Han Solo       | Harrison Ford
Luke Skywalker | Mark Hamill
Batman         | Christian Bale
Thor           | Chris Hemsworth
Sherlock       | Benedict Cumberbatch

To open this file, run:

>>> role_file = open('roles.txt', 'r')

At this point, the file is open and readable, but we have not actually read any of the contents yet.

To read the entire contents of the file as a string, we can call the read function.

>>> roles = role_file.read()

Now roles has the entire contents of the roles.txt file.

>>> print(roles)
Han Solo       | Harrison Ford
Luke Skywalker | Mark Hamill
Batman         | Christian Bale
Thor           | Chris Hemsworth
Sherlock       | Benedict Cumberbatch

The file can also be read one line at a time with the readline() function.

>>> role_file = open('roles.txt', 'r')
>>> line = role_file.readline()
>>> print(line)
Han Solo       | Harrison Ford

>>> line = role_file.readline()
>>> print(line)
Luke Skywalker | Mark Hamill

>>> line = role_file.readline()
>>> print(line)
Batman         | Christian Bale

>>> line = role_file.readline()
>>> print(line)
Thor           | Chris Hemsworth

>>> line = role_file.readline()
>>> print(line)
Sherlock       | Benedict Cumberbatch

Notice that an extra blank line is printed after each print(line). This is because each line read from the file still ends with its newline character (\n), and print() adds another newline by default. As with reading from standard input via sys.stdin.readline(), we want to rstrip() the newline away:

>>> role_file = open('roles.txt', 'r')
>>> line = role_file.readline().rstrip()
>>> print(line)
Han Solo       | Harrison Ford
>>> line = role_file.readline().rstrip()
>>> print(line)
Luke Skywalker | Mark Hamill
>>> line = role_file.readline().rstrip()
>>> print(line)
Batman         | Christian Bale
>>> line = role_file.readline().rstrip()
>>> print(line)
Thor           | Chris Hemsworth
>>> line = role_file.readline().rstrip()
>>> print(line)
Sherlock       | Benedict Cumberbatch
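Calling readline() by hand works when we know how many lines there are. Past the last line, readline() does not raise an error; it returns an empty string, while a blank line inside the file comes back as '\n'. A minimal sketch that relies on this to read every line, assuming a shortened copy of roles.txt written to a temporary directory:

```python
import os
import tempfile

# Recreate a two-line roles.txt in a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), 'roles.txt')
f = open(path, 'w')
f.write('Han Solo       | Harrison Ford\n')
f.write('Luke Skywalker | Mark Hamill\n')
f.close()

role_file = open(path, 'r')
lines = []
while True:
    line = role_file.readline()
    if line == '':
        # An empty string means end of file (a blank line would be '\n').
        break
    lines.append(line.rstrip())

# Reading again at the end keeps returning ''.
extra = role_file.readline()
role_file.close()

print(lines)
```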

We can also loop through each line of a text file individually using a for loop.

>>> role_file = open('roles.txt', 'r')
>>> for line in role_file:
...     print(line.rstrip())
...
Han Solo       | Harrison Ford
Luke Skywalker | Mark Hamill
Batman         | Christian Bale
Thor           | Chris Hemsworth
Sherlock       | Benedict Cumberbatch
>>>

A complete program that reads from a file

Let’s write a complete Python program that uses the roles.txt file to do something useful. This program will be called getrole.py. It will treat roles.txt as a database of movie roles and the actors who played them, storing the mapping from each role to its actor in a dictionary.

Once the file is read in, the user will be able to ask which actor played a particular role.

First, create an empty dictionary:

# Create an empty map data structure representing the database of roles.
# All of the movie roles will be stored here.
role_db = {}

Next, open the file:

# Open roles.txt
# This script assumes roles.txt is in the current working directory
role_file = open('roles.txt', 'r')

Now we must read in the contents of roles.txt line-by-line, and save the information into role_db.

# Loop through each line of the file and save into role_db
for line in role_file:
    sp = line.split('|')
    role = sp[0].strip()
    actor = sp[1].strip()
    role_db[role] = actor

# Everything has been read, so the file can be closed
role_file.close()

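One weakness of this loop is that a blank line in roles.txt (a trailing empty line is easy to leave by accident) contains no '|', so sp[1] raises an IndexError. Below is a sketch of the same loading step wrapped in a hypothetical helper named load_roles, which skips such lines; the helper name and the skip-instead-of-crash policy are assumptions, not part of getrole.py.

```python
import os
import tempfile


def load_roles(path):
    """Read a 'role | actor' file into a dict mapping role -> actor.

    Hypothetical helper: blank or malformed lines (no '|') are skipped
    instead of crashing with an IndexError.
    """
    role_db = {}
    role_file = open(path, 'r')
    for line in role_file:
        sp = line.split('|')
        if len(sp) < 2:
            continue  # blank or malformed line
        role = sp[0].strip()
        actor = sp[1].strip()
        role_db[role] = actor
    role_file.close()
    return role_db


# Demo: a small roles file with a trailing blank line, in a temp directory.
path = os.path.join(tempfile.mkdtemp(), 'roles.txt')
f = open(path, 'w')
f.write('Han Solo       | Harrison Ford\n')
f.write('Batman         | Christian Bale\n')
f.write('\n')
f.close()

role_db = load_roles(path)
print(role_db)
```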
Now that we’ve read in and stored the entire role database, we should ask the user which role they’d like to know the actor of. We will let the user ask as many questions as they want in a loop.

# Let the user get role/actor information in a loop
while True:
    print('PROGRAM: What role would you like to know about?')
    line = sys.stdin.readline().rstrip()
    if line == 'exit':
        print('PROGRAM: Bye!')
        sys.exit()
    sp = line.split('tell me who played')
    if len(sp) > 1:
        role = sp[1].strip()
        if role in role_db:
            print(role_db[role] + ' played ' + role)
        else:
            print('PROGRAM: Not sure!')
    else:
        print('PROGRAM: Huh?')

As can be seen, the user will be able to figure out what actor played a role by typing:

tell me who played X

Where X is the name of a role in a movie. If getrole.py can find the role in role_db it will reply with the value associated with the role key. If it does not recognize the role, it will print Not sure!. If the command that it reads in is completely unintelligible, it will print Huh?. Once the user is done, they can type exit to quit the program.
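The command parsing hinges on how split() behaves: 'tell me who played Sherlock'.split('tell me who played') gives ['', ' Sherlock'], so a list with more than one piece means the phrase was present and the role is the second piece. The sketch below pulls the loop body into a hypothetical function named answer (not part of getrole.py, and without the exit handling) so the logic can be tried without typing into standard input.

```python
def answer(line, role_db):
    """Return the reply getrole.py would print for one line of input.

    Hypothetical helper mirroring the loop body above, minus 'exit'.
    """
    sp = line.split('tell me who played')
    if len(sp) > 1:
        role = sp[1].strip()
        if role in role_db:
            return role_db[role] + ' played ' + role
        return 'PROGRAM: Not sure!'
    return 'PROGRAM: Huh?'


role_db = {'Sherlock': 'Benedict Cumberbatch', 'Han Solo': 'Harrison Ford'}

print(answer('tell me who played Sherlock', role_db))
print(answer('tell me who played Darth Vader', role_db))
print(answer('what actor played the Joker?', role_db))
```

The last query contains the word played but not the full phrase, so split() returns a one-element list and the reply is Huh?.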

Putting this all together, we get:

import sys

# Create an empty map data structure representing the database of roles
# All of the movie roles will be stored here
role_db = {}

# Open roles.txt
# This script assumes roles.txt is in the current working directory
role_file = open('roles.txt', 'r')

# Loop through each line of the file and save into role_db
for line in role_file:
    sp = line.split('|')
    role = sp[0].strip()
    actor = sp[1].strip()
    role_db[role] = actor

# Everything has been read, so the file can be closed
role_file.close()

# Let the user get role/actor information in a loop
while True:
    print('PROGRAM: What role would you like to know about?')
    line = sys.stdin.readline().rstrip()
    if line == 'exit':
        print('PROGRAM: Bye!')
        sys.exit()
    sp = line.split('tell me who played')
    if len(sp) > 1:
        role = sp[1].strip()
        if role in role_db:
            print(role_db[role] + ' played ' + role)
        else:
            print('PROGRAM: Not sure!')
    else:
        print('PROGRAM: Huh?')

When run from bash, we get:

$ python3 getrole.py
PROGRAM: What role would you like to know about?
tell me who played Sherlock
Benedict Cumberbatch played Sherlock
PROGRAM: What role would you like to know about?
tell me who played Han Solo
Harrison Ford played Han Solo
PROGRAM: What role would you like to know about?
tell me who played Darth Vader
PROGRAM: Not sure!
PROGRAM: What role would you like to know about?
what actor played the Joker?
PROGRAM: Huh?
PROGRAM: What role would you like to know about?
exit
PROGRAM: Bye!
$

In building this program, we combined several concepts we’ve learned over the past few weeks: advanced string manipulation, dictionaries, lists, reading a file, reading from standard input, etc.

Writing to a file

To write to a file, we have to open the file in write mode. Say that we do not currently have a file named roles.txt in the current working directory, but we want to populate the file with a Python program.

To open the file in write mode:

>>> role_file = open('roles.txt', 'w')

To write a string to the file, use the write() function:

>>> role_file.write('Han Solo       | Harrison Ford')
30
>>>

This writes the string into the file. Notice that the write() function returns the number of characters written to the file (here, 30). Written data is held in a buffer in memory at first, so to make sure it is actually saved to the file on the computer’s hard drive, the file must be closed:

>>> role_file.close()

This is what we get when we try to cat the contents of the file from bash:

$ cat roles.txt
Han Solo       | Harrison Ford$

The file write worked! Notice that a newline was not automatically added to the end of the string, which is why the $ of the bash prompt appears right after the text. The write() function does not add newlines, so if you want line breaks in the file you write, you must include them yourself with the \n character.

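These notes always call close() explicitly. Python also offers the with statement (not otherwise covered in these notes), which closes the file automatically when its indented block ends, even if an error occurs inside it. A minimal sketch, assuming a scratch roles.txt in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'roles.txt')

# The file is closed automatically when the indented block ends.
with open(path, 'w') as role_file:
    count = role_file.write('Han Solo       | Harrison Ford\n')

closed_after_block = role_file.closed  # True: no close() call was needed

with open(path, 'r') as role_file:
    contents = role_file.read()

print(count, closed_after_block, repr(contents))
```

Here write() reports 31 characters: the 30 from before plus the explicit \n.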
Let’s re-open the file and write a few lines with newlines:

>>> role_file = open('roles.txt', 'w')
>>> role_file.write('Thor           | Chris Hemsworth\n')
33
>>> role_file.write('Sherlock       | Benedict Cumberbatch\n')
38
>>> role_file.write('Iron Man       | Robert Downey Jr\n')
34
>>> role_file.close()

Now we can cat the file from bash and we will see:

$ cat roles.txt
Thor           | Chris Hemsworth
Sherlock       | Benedict Cumberbatch
Iron Man       | Robert Downey Jr
$

The three lines that were written to the file (along with the newlines) show up correctly. However, notice that the line we previously wrote to the file (Han Solo | Harrison Ford) is no longer there. We did not explicitly delete the file, so why is it gone?

When a file is opened in write mode ('w'), any existing content is erased (the file is truncated to zero length) as soon as open() is called, and writing then starts at the beginning of the now-empty file. This can be demonstrated with another example.

Say we have a file named text.txt:

$ cat text.txt
ABCDEFGHIJKLMNOPQRSTUVWXYZ
$

This file has a total of 27 characters (the 26 letters of the alphabet and the newline at the end). Now, let’s write a few characters to this file in Python:

>>> text_file = open('text.txt', 'w')
>>> text_file.write('Iron Man\n')
9
>>> text_file.close()

Now the contents of the file are:

$ cat text.txt
Iron Man
$

The old contents are completely gone! Sometimes, overwriting the file each time we want to add something is the desired behavior, but at other times it is not.
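The text.txt example can be reproduced in a short script to confirm that 'w' erases the file at open() time rather than overwriting character by character. If it only overwrote from the start, writing 'Iron Man\n' (9 characters) would leave 'Iron Man\nJKLMNOPQRSTUVWXYZ\n'. This sketch assumes a scratch text.txt in a temporary directory.

```python
import os
import string
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'text.txt')

# Recreate text.txt: the 26 uppercase letters plus a newline (27 characters).
f = open(path, 'w')
f.write(string.ascii_uppercase + '\n')
f.close()

# Just opening in 'w' mode (and closing) already empties the file.
f = open(path, 'w')
f.close()
f = open(path, 'r')
after_open = f.read()
f.close()

# Writing 'Iron Man\n' leaves only those 9 characters in the file.
f = open(path, 'w')
f.write('Iron Man\n')
f.close()
f = open(path, 'r')
after_write = f.read()
f.close()

print(repr(after_open), repr(after_write))
```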

Appending to a file

In append mode, we can write to a file but instead of erasing the previous contents, the content that is written gets appended to the end of the file.

In the same vein as the previous example, let’s say the roles.txt file does not yet exist. To “open” this file in append mode:

>>> role_file = open('roles.txt', 'a')

This opens the (new) file roles.txt in append mode. Now, we will write the same three lines as the earlier example, and close the file.

>>> role_file.write('Thor           | Chris Hemsworth\n')
33
>>> role_file.write('Sherlock       | Benedict Cumberbatch\n')
38
>>> role_file.write('Iron Man       | Robert Downey Jr\n')
34
>>> role_file.close()

The contents of the file are now:

$ cat roles.txt
Thor           | Chris Hemsworth
Sherlock       | Benedict Cumberbatch
Iron Man       | Robert Downey Jr
$

Now let’s try opening the file again, and appending a few more lines:

>>> role_file = open('roles.txt', 'a')
>>> role_file.write('Bane           | Tom Hardy\n')
27
>>> role_file.write('Luke Skywalker | Mark Hamill\n')
29
>>> role_file.write('Superman       | Henry Cavill\n')
30
>>> role_file.close()

And the contents of the file are now…

$ cat roles.txt
Thor           | Chris Hemsworth
Sherlock       | Benedict Cumberbatch
Iron Man       | Robert Downey Jr
Bane           | Tom Hardy
Luke Skywalker | Mark Hamill
Superman       | Henry Cavill

Notice that the additional lines did not over-write the old ones.
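The two append sessions above can be condensed into a sketch that also checks the file did not exist beforehand, showing that 'a' creates a missing file and that a second 'a' session adds to the end instead of erasing. It assumes a scratch roles.txt in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'roles.txt')
existed_before = os.path.exists(path)  # False: nothing there yet

# First session: 'a' creates the missing file and writes at the end.
role_file = open(path, 'a')
role_file.write('Thor           | Chris Hemsworth\n')
role_file.close()

# Second session: reopening in 'a' keeps the old line and adds after it.
role_file = open(path, 'a')
role_file.write('Bane           | Tom Hardy\n')
role_file.close()

role_file = open(path, 'r')
lines = role_file.read().splitlines()
role_file.close()

print(existed_before, lines)
```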

Let’s use what we’ve learned to write a Python program that adds roles to the roles.txt file used by the getrole.py program we wrote earlier. Let’s call this program addrole.py.

Below is a program that does this:

import sys

# Open roles.txt
# This script assumes roles.txt is in the current working directory
role_file = open('roles.txt', 'a')

# Let the user add role/actor information to roles.txt until
# the exit command is given
while True:
    print('PROGRAM: Add a role?')
    line = sys.stdin.readline().rstrip()
    if line == 'exit':
        # Close the file so everything written is saved before exiting
        role_file.close()
        print('PROGRAM: Bye!')
        sys.exit()
    sp = line.split('played')
    if len(sp) > 1:
        actor = sp[0].strip()
        role  = sp[1].strip()
        role_file.write(role + ' | ' + actor + '\n')
        print('PROGRAM: Role added!')
    else:
        print('PROGRAM: Huh?')

First, the roles.txt file is opened in append mode. Then, standard input is read from the user in a loop. If the user enters a line of the form X played Y, addrole.py writes the line Y | X to the file. If the user types exit, the role file is closed (thus saving its contents) and the program exits. On any other input, the program replies with Huh?.

Below is a demonstration of running this program, and the contents of roles.txt are shown before and after the run.

$ cat roles.txt
$ python3 addrole.py
PROGRAM: Add a role?
Henry Cavill played Superman
PROGRAM: Role added!
PROGRAM: Add a role?
Christian Bale played Batman
PROGRAM: Role added!
PROGRAM: Add a role?
Chris Evans played Captain America
PROGRAM: Role added!
PROGRAM: Add a role?
exit
PROGRAM: Bye!
$
$ cat roles.txt
Superman | Henry Cavill
Batman | Christian Bale
Captain America | Chris Evans
$

Now getrole.py and addrole.py can be used together to build a database of roles/actors and look up which actor played a particular role.

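The heart of addrole.py is turning "X played Y" into "Y | X". The sketch below isolates that step in a hypothetical function named format_role (not part of addrole.py) so it can be checked without standard input; it returns None where addrole.py would print Huh?.

```python
def format_role(line):
    """Turn 'ACTOR played ROLE' into the 'ROLE | ACTOR' line addrole.py writes.

    Hypothetical helper; returns None when 'played' does not appear.
    """
    sp = line.split('played')
    if len(sp) > 1:
        actor = sp[0].strip()
        role = sp[1].strip()
        return role + ' | ' + actor + '\n'
    return None


print(repr(format_role('Henry Cavill played Superman')))
print(format_role('hello'))
```

Because split('played') splits on every occurrence, a role whose own name contains played would be cut short; passing a maxsplit of 1, as in line.split('played', 1), splits only on the first occurrence.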
Inserting/Deleting mid-file

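Appending is great, but what if we want to add a line to, or remove a line from, somewhere other than the end of a file? In most cases, the simplest way to do this is to:

1. read the entire file into a list of lines,
2. insert or remove the line in that list, and
3. write the whole list back out to the file, replacing its old contents.

This is perhaps not very efficient, but it is straightforward and works well. Continuing with the running example, say that roles.txt is: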

Superman | Henry Cavill
Batman | Christian Bale
Captain America | Chris Evans

Say we want to insert Thor | Chris Hemsworth to the second line of the file, and shift all of the other lines below it. Using the technique described above, we could do:

data = []
f = open('./roles.txt', 'r')
for line in f:
    data.append(line)
f.close()

data.insert(1, 'Thor | Chris Hemsworth\n')

f = open('./roles.txt', 'w')
for d in data:
    f.write(d)

f.close()

First, the entire file is read in, line-by-line, into the data list. Next, 'Thor | Chris Hemsworth\n' is inserted at index 1 (the second position in the list). Lastly, roles.txt is reopened in write mode (which erases its old contents), data is written out to it, and the file is closed.

After running this, roles.txt looks like:

Superman | Henry Cavill
Thor | Chris Hemsworth
Batman | Christian Bale
Captain America | Chris Evans

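The three steps can be packaged into a reusable function. Below is a sketch using a hypothetical helper named insert_line(path, index, text), which also adds the trailing newline if the caller leaves it off; it assumes a scratch copy of roles.txt in a temporary directory. Note that if the file's last line lacked a newline, inserting after it would join the two lines.

```python
import os
import tempfile


def insert_line(path, index, text):
    """Insert text as a new line at 0-based position index (hypothetical helper)."""
    # Step 1: read every line (each keeps its trailing '\n') into a list.
    data = []
    f = open(path, 'r')
    for line in f:
        data.append(line)
    f.close()

    # Step 2: insert the new line, adding a newline if it is missing.
    if not text.endswith('\n'):
        text = text + '\n'
    data.insert(index, text)

    # Step 3: reopen in 'w' mode (erasing the file) and write everything back.
    f = open(path, 'w')
    for d in data:
        f.write(d)
    f.close()


path = os.path.join(tempfile.mkdtemp(), 'roles.txt')
f = open(path, 'w')
f.write('Superman | Henry Cavill\n')
f.write('Batman | Christian Bale\n')
f.write('Captain America | Chris Evans\n')
f.close()

insert_line(path, 1, 'Thor | Chris Hemsworth')

f = open(path, 'r')
result = f.read().splitlines()
f.close()
print(result)
```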
Removing an arbitrary line can function similarly. To remove the Batman | Christian Bale line from roles.txt, we can do:

data = []
f = open('./roles.txt', 'r')
for line in f:
    data.append(line)
f.close()

data.pop(2)

f = open('./roles.txt', 'w')
for d in data:
    f.write(d)

f.close()

This is exactly the same as the previous code, except that an element (index 2, the Batman line) is popped from the list rather than inserted. After running this code roles.txt looks like:

Superman | Henry Cavill
Thor | Chris Hemsworth
Captain America | Chris Evans

Removed!
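Removing by index requires knowing where the line sits. A variant is to remove lines by their content instead. The sketch below uses a hypothetical helper named remove_matching_lines that follows the same read, modify, rewrite pattern and reports how many lines it dropped; it assumes a scratch roles.txt in a temporary directory.

```python
import os
import tempfile


def remove_matching_lines(path, text):
    """Remove every line whose stripped content equals text (hypothetical helper).

    Returns the number of lines removed.
    """
    data = []
    f = open(path, 'r')
    for line in f:
        data.append(line)
    f.close()

    # Keep only the lines that do not match.
    kept = []
    for line in data:
        if line.strip() != text:
            kept.append(line)

    f = open(path, 'w')
    for d in kept:
        f.write(d)
    f.close()
    return len(data) - len(kept)


path = os.path.join(tempfile.mkdtemp(), 'roles.txt')
f = open(path, 'w')
f.write('Superman | Henry Cavill\n')
f.write('Thor | Chris Hemsworth\n')
f.write('Batman | Christian Bale\n')
f.write('Captain America | Chris Evans\n')
f.close()

removed = remove_matching_lines(path, 'Batman | Christian Bale')

f = open(path, 'r')
result = f.read().splitlines()
f.close()
print(removed, result)
```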

Flushing a file

As mentioned before, when a file is written to, it is not immediately saved to disk. One way to save the file is to call the close() function. This works, but after calling close() we cannot do any more writing to the file unless we call open(...) again.

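Another way to “save” the contents of the file is to call f.flush(). This function “flushes” all of the buffered contents out to the file, and after it is called you can continue to write to the file. (Strictly speaking, flush() hands the data to the operating system, which then writes it to the physical disk; os.fsync(f.fileno()) can be called afterwards when the data must reach the disk immediately.)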

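One way to see flush() at work is to read the file through a second file object while the first one is still open for writing. A sketch, assuming a scratch file named log.txt (a hypothetical name) in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'log.txt')

writer = open(path, 'w')
writer.write('first line\n')

# Push the buffered text out to the file without closing it.
writer.flush()

# A separate reader now sees what was written so far.
reader = open(path, 'r')
seen = reader.read()
reader.close()

# The writer is still open, so we can keep writing after flushing.
writer.write('second line\n')
writer.close()

reader = open(path, 'r')
final = reader.read()
reader.close()

print(repr(seen), repr(final))
```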
File Exercises


Below is a complete Python program named process-file.py which is in the current working directory:

data = []

data_file = open('data.txt', 'r')

for line in data_file:
    data.append(line)

for val in data:
    if 'man' in val:
        print('This is a man')
    elif len(val) <= 4:
        print('Unrecognizable line')
    elif val[len(val)-1] == 'n':
        print('The 6th char is e')
    else:
        print(val)

A text file named data.txt also exists in this directory. Below are several possibilities for the contents of the data.txt file. For each possibility, write what this program would output when run. If you think an error will be generated, just say so.

(A)

Iron Man
Superman
Batman
unknown
UNKNOWN LINE

(B)

Gambit
Wolverine
Magneto
Iceman
Jean

(C)

Thor
Sabretooth
Cyclops
Storm
Professor X
Spiderman


Below is a (nearly) complete Python program named write-file.py which is in the current working directory:

data = X

data_file = open('data.txt', 'w')

index = 0
for val in data:
    if index % 2 == 0:
        data_file.write(val.upper() + '\n')
    if val[0].isupper():
        data_file.write(val + '\n')
    else:
        data_file.write(val + '(needs a cap)')
    index = index + 1

data_file.flush()

Below are several possibilities for the X in this script. For each possibility, write what this program would write to data.txt. If you think an error will be generated, just say so.

(A)

['Lebron', 'Curry', 'Harden', 'Westbrook', 'Leonard', 'Durant', 'DeRozan']

(B)

['antman', 'ANTMAN', 'superman', 'SUPERMAN']

(C)

['ONE', '', 'TWO', '', 'THREE']


(A) Write a function named append_text that takes two arguments, file_name and text. This function will open the file named file_name, append text to the end of the file, and then save the file to disk without closing it.

(B) Modify the function to take a third argument num. The function will repeatedly append text num times, and then flush the file to disk.