Topics:

  • Common Errors
  • Error messages from Python
  • Exception Handling (try, except)
  • Print statement debugging

Introduction


"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." – Brian W. Kernighan

Let's be honest, most of us aren't perfect. And in our zen self-awareness, we are well-served to be sensitive to our predispositions to err in particular ways. We are probably even better served to listen to the computer when it tells us we've made a big mistake. Python can be quite good at communicating our flaws to us, if only we're receptive to the constructive criticism in its spartanly high-contrast black and white print.

This morning we're going to look at common sources of error, see what to look for as feedback from Python, and learn a couple of tricks to both obviate, or if necessary, track down bugs, should they occur.

A few commonly encountered sources of errors include:

  • Mis-typed variable names

  • Mis-typed variables

  • Non-existent variables (or indices in a variable)

  • Unexpected input


If you're very lucky, an error will cause Python to give up right away, while others will cause insidious bugs that sneak through unnoticed until you present your work in lab meeting and someone calls you on your exciting, but seemingly impossible (and ultimately bogus) result.

Strategic Initiative 1: Test Early, Test Often


There's an approach to software engineering called "Test Driven Development". In that way of doing things, before you write a function that does something, you write code that tests it, often in a separate file. It's really convenient to use modules to do this, where you could have a test.py file that imports the relevant functions from your other modules and tests them on as many cases as it can. This also has the advantage that you notice fairly quickly when your new code breaks something that used to work. Especially in scientific programming, this isn't always practical to do, but the general ideas hold: testing is good.

Even if you don't do TDD, you can save yourself lots of time by testing frequently. Debugging 100 lines of code is often more than 10 times as hard as debugging 10 lines. Writing a ton of code that generates output without checking if each component works individually does NOT make you a coding rock star; it makes you sloppy.

Strategic Initiative 2: Be verbose

"Errors should never pass silently." -- The Zen of Python, by Tim Peters

One of the easiest ways to debug code is to print out the value of variables at the point things start going wrong. Of course, if you knew where things were going wrong, you would probably know what was going wrong in the first place, so to find this, a divide-and-conquer approach is often fastest. Start out by putting a bunch of print statements in throughout your code, then narrow things down gradually until you're right at the broken line of code.

It's not a bad idea to include these print statements in moderation before you need them. If you think there will be a subset of your data that will fail a logical test, setup your if and else statements to report the incidence of failure.

# lab_members = {name : [status]}
# pretend this dictionary is already filled with data
 
for name in lab_members:
     if lab_members[name] == 'Post-doc':
          get_a_job(name)
          print "Time for this post-doc to get a job..."
     elif lab_members[name] == 'Grad Student':
          do_research()
          print "Be patient, this one's just a grad student."
     else:
          print "Warning:", name, "is not a post-doc nor a grad student!"

Strategic Initiative 3: Be Boring, Be Obvious

"Programs must be written for people to read, and only incidentally for machines to execute." --Structure and Interpretation of Computer Programs Hal Abelson and Gerald Jay Sussman

As you get better at coding, you will start to take shortcuts and combine lines. As soon as something doesn't behave as you expect, you should decompose your compound statements, as this is a common source of error.

Compound:
helix_aa.extend(range(int(line[21:25].strip()),int(line[33:37].strip())+1))

Expanded:
start_str = line[21:25]
end_str = line[33:37]
start_stripped = start_str.strip()
end_stripped = end_str.strip()
start = int(start_stripped)
end = int(end_stripped)+1
helix_res = range(start,end)
helix_aa.extend(helix_res)

These two code snippets do the same thing, and Python doesn't much care which you use. On the other hand, it's much easier to look at each line of code in the expanded case and see that it does what it should be doing, whereas in the compound form, you could very easily have parentheses that are matched in unexpected ways that throws the whole meaning of things off.

On a related note, it's good practice to comment your code. The Python Style Guide has some good recommendations about comments. Anything you did that required some thinking on your part should ideally be commented, since you might come back to the code weeks or months later and have forgotten why you did it that way. Also try to keep your comments up to date with the code.

Error Messages from Python: Syntactical Mistakes and Others


Much like in spoken languages, Python has rules for what is and is not grammatically allowed. Before running your script, Python will check it to make sure that it does follow all these rules, and will let you know if it doesn't. Keep in mind, though, that it can't catch errors in meaning, so long as your code follows the rules of the language (as an natural language equivalent of things Python can't catch, consider Groucho Marx: "One morning I shot an elephant in my pajamas. How he got into my pajamas I'll never know.").

#!/usr/bin/python
 
while True
    print 'Bioinformatics is AWESOME!'

$ ./errors.py
Traceback (most recent call last):
  File "./errors.py", line 3
    while True
              ^
SyntaxError: invalid syntax

This error message, for its brevity, is quite helpful. We quickly learn the following:

1) The problem is on line 3 of the file errors.py. This may seem trivial in this case, but when you have nested functions conjured from various modules in various directories, knowing which file to look in is a big, big help.

2) There is a problem after True. Note that, for syntax errors, the caret will often point somewhere after the place where the actual error is, and sometimes can point quite a bit after, though in this case it's pretty close.

3) The problem is that there is invalid syntax: we haven't followed the rules of the language. As you've likely seen already, and will no doubt see much more of in the future, there are all kinds of different errors that can show up, usually with a terse message explaining what went wrong.

What is the problem and how is it fixable?

There are lots of other kinds of errors that will almost inevitably crop up in your code. You've even already seen some in our lecture on data structures.

instructors = ['Aaron', 'Nate', 'Peter']
print instructors[10]

$./errors.py
Traceback (most recent call last):
  File "errors.py", line 4, in <module>
    print instructors[10]
IndexError: list index out of range

Again, let's see what we can work out from this error message:
  1. The error is now on line 4 of errors.py
  2. There's some problem with the expression print instructors[10] (presumably in this case you could have looked that up yourself, but Python does try to be helpful)
  3. That problem is of the type IndexError, and in particular there is a list index out of range

By now, it should be clear what the problem is (assuming it wasn't clear to you before I even ran it): instructors isn't long enough to have an item at index 10, so Python throws up its hands and gives up. Sometimes, though, we don't want Python to surrender at the first sign of trouble. If only there were some way to help it deal with problems that we can anticipate...

Easier to Ask Forgiveness than Permission: Try and Except

//"Errors should never pass silently. Unless explicitly silenced." -- The Zen of Python, by Tim Peters

The try and except statements let us attempt to do something that may or may not work, and then respond appropriately if or when something goes wrong. For example, sometimes the file we want to work with isn't available for some reason (and if you're really lucky, it's only because you forgot to plug in an external hard drive).

fh = open('AwesomeDataFile.txt', 'r')

$ ./errors.py
Traceback (most recent call last):
  File "badsyntax.py", line 4, in <module>
    fh = open('AwesomeDataFile.txt', 'r')
IOError: [Errno 2] No such file or directory: 'AwesomeDataFile.txt'

try:
    f1 = open('AwesomeDataFile.txt', 'r')
except IOError:
    print "The file 'AwesomeDataFile.txt' doesn't exist, falling back to 'CrummyDataFile.txt'..."
    f1 = open('CrummyDataFile.txt', 'r')

Sometimes you might have a section of code where multiple things could go wrong. In that case, you can respond to each of them appropriately using multiple except statements. It's even possible to have an except statement without listing the type of error, which will deal with anything not handled by one of the previous except statements.
try:
    do_something()
except IOError:
    handle_ioerror()
except IndexError:
    handle_indexerror()
except:
    print "Some other error happened"

It's considered slightly bad form, however, to have a catch-all except block. If you didn't anticipate the error, it will probably be helpful to you to get the default error message, even if it is a little arcane, rather than a message like the above, which is pretty darn vague.

For a listing of the default kinds of errors (also called "exceptions") that can happen, see the list of Built-in Exceptions in the Python Documentation.


Else and Finally

Similar to the use of else in for and while loops, anything in an else block will run *only if* no errors happened. On the other hand, anything in a finally block runs after the try/except blocks, and will run regardless of whether an exception was thrown; if there was an exception, it doesn't stop it from being thrown, it just waits a bit before it does.

def divide(x,y):
    try:
        result = x/y
    except ZeroDivisionError:
        print "Warning: divide by zero!"
        return float('NaN') #Not a number
    else:
        return result
    finally:
        print "executing final clause"
 
print divide(2,1)
print
print divide(2,0)
print
print divide('A','B')

$./errors.py
executing final clause
2
 
Warning: divide by zero!
executing final clause
nan
 
executing final clause
Traceback (most recent call last):
  File "errors.py", line 16, in <module>
    print divide('A','B')
  File "errors.py", line 3, in divide
    result = x/y
TypeError: unsupported operand type(s) for /: 'str' and 'str'

The first case made it through the division just fine.

The second case raised an exception (a division by zero error), but we anticipated that, so we replace it with a special Not a Number value, and move on.

The third case we found an exception that we didn't anticipate. The exception came up, so we still ran the finally clause, but then raised the exception. Any questions?



Exercises


1) Exception handling we have known.

In lesson 2.1, we learned about two functions that can be applied to remove an item from set: remove() and discard(). We looked at the functions using the following example:
#!/usr/bin/env python
 
list_of_letters = ['a', 'a', 'b', 'c','c','c','d','e']
print 'ORIGINAL'
set_of_letters = set(list_of_letters)
print set_of_letters
 
print 'DISCARD'
set_of_letters.discard('q')
print set_of_letters
 
print 'REMOVE'
set_of_letters.remove('q')
print set_of_letters

a) Now that we've learned more about exception handling, explain what is happening here.

b) Create a script that contains a list of 5 of your favorite beers or wines or soft drinks (depending on your preference) for a tasting party stored as a set. Each time someone drinks a beverage, it is removed from your fridge and cannot be drunk again. Matt drinks a beverage. Adjust the set as appropriate. Rich sees Matt drinking his beverage and wants the same one. Tell him you're out of that beverage.

2) Modify the FASTA parser from Session 4.2 to handle:

1) non-existent filenames
2) files that do not conform to the FASTA format (i.e. >gene for IDs, and strings of A,T,G, or C for sequence).
3) sequences that are in both cases

3) Modify the FASTA parser to identify file compression format and read compressed files.

We need to add functionality to our FASTA parser for use next week. We will be using very large FASTA files (> 3GB) when they are uncompressed. In order to save disk space, we'll be leaving them compressed (~1 GB) while we use them. However, this will require us to use the gzip module to read a compressed file directly. We will also need the mimetypes module, which can identify the type of file we are using.

First, write a new function called open_file_by_mimetype that determines whether a file is gzipped or not (using the mimetypes modules) and returns an open file handle. Then make a function called read_fasta() using your FASTA parser script, but modify it to call this new function such that your parser can read both types of FASTA files automatically.

When you're done, store your new file-opening function in a module called file_tools.py and your improved read_fasta() parser in a module called seqeunce_tools.py.

4) Improving the teacher's code.

Go back and look at the code you used to do Exercise 3.2.3 Now that you have learned about file and string processing, you should be able to understand the wrapper script that was supplied to help you parse the pdb files.

a) The code is very poorly commented. Figure out what is going on at each step and add comments.

b) There are two locations where exception handling is applied. Why is the exception handling necessary? The implementation is very sloppy. Can you rewrite exception handling? Can you rewrite the code again to avoid exception handling all together? (HINT: PDB files are formatted according to characters not whitespace. Go back to the documentation cited in exercise 3.2-2 for additional information on PDB formatting).

5) All code is bug-free until your first user.

You have another coworker who heard about your AMAZING secondary structure analysis code. She asks if you will analyze her protein, interleukin-19, as well (HINT: use PDB code 1N1F). Crud! This protein breaks your code. Why? Rewrite your code to work on both interleukin-19 and on the original H1N1 neuraminidase example.


Solutions

1) Exception handling we have known.

#!/usr/bin/env python
 
# a) We converted the original list_of_letters to a set
# called set_of_letters, which got rid of all duplicates
# in the set.  Then we discarded 'q'.  Discard can handle
# values that don't exist in sets, so discard does not
# return an error because there's no q, but remove can't
# handle values that don't exist in sets, so it returns a
# KeyError.
 
print
listOfBeers = ['Bell\'s Oberon Summer Wheat', 'Dogfishhead 90 Minute IPA', 'Franziskaner', 'Lindeman\'s Raspberry Lambic', 'Mad Hatter IPA']
print 'The party is all stocked with my favorite beers:'
setOfBeers = set(listOfBeers)
for beer in setOfBeers:
    print '\t', beer
print
 
def BxxrHour(x):
    print 'Craving some %s.  Goes to get one.  ' % (x)
    try:
        setOfBeers.remove(x)
    except:
        print "\"Hey! Someone totally pwned my %s!!\"" % (x)
        print 'Beverage FAIL!!!'
    else:
        print 'Epic beverage WINS!'
    finally:
        print "The remaining beverages are:"
        for beer in setOfBeers:
            print '\t', beer
        print
        return setOfBeers
        print
 
print 'Matt wants a beverage:'
setOfBeers = BxxrHour('Bell\'s Oberon Summer Wheat')
 
print 'Rich wants a beverage:'
BxxrHour('Bell\'s Oberon Summer Wheat')
 

2) Modify the FASTA parser from Session 4.2 to handle:

callparser.py

#!/usr/bin/env python
from parser import fastaParser
 
parsed = fastaParser('seq.FASTA')
print parsed
 

fastaParser.py


#!/usr/bin/env python
 
def fastaParser(fastafilename):
    current_gene = ""   # Start with an empty string, just in case
    genes = { }         # Make an empty dictionary of genes
    try:
        fh = open(fastafilename, 'r')
    except IOError:
        print 'Could not find file with filename %s' % (fastafilename)
        result = 'Please verify that your filename is correct and try again.'
        return result
    for lineInd, line in enumerate(fh.readlines()):
        if lineInd == 0:
            if not line[0] == '>':
                print 'File does not conform to FASTA format.'
                result = 'Please try again with FASTA formatted file.'
                fh.close( )
                return result
            else:
                pass
        else:
            pass
        line = line.strip()  # Clear out leading/trailing whitespace
        line = line.upper()  # Deals with whatever case the
                             # sequence is by making it all upper case
        if len(line) > 0 and line[0] == ">":   # This one is a new gene
            current_gene = line[1:]
            genes[current_gene] = ""
        else:                # Add onto the current gene
            genes[current_gene] += line
    fh.close()
    return genes
 

3) Modify the FASTA parser to identify file compression format and read compressed files.

callparser2.py

#!/usr/bin/env python
 
from sequencetools import *
import sys
 
fastafilename = sys.argv[1]
 
x = read_fasta(fastafilename)
print x
 

sequencetools.py

#!/usr/bin/env python
 
def read_fasta(fastaFilename):
    from filetools import open_file_by_mimetype as ofbm
    current_gene = ""   # Start with an empty string, just in case
    geneDict = { }      # Make an empty dictionary of genes
    fh = ofbm(fastaFilename)
    for lineInd, line in enumerate(fh.readlines()):
        if lineInd == 0:
            if not line[0] == '>':
                print 'File does not conform to FASTA format.'
                result = 'Please try again with a FASTA formatted file.'
                fh.close( )
                return result
            else:
                pass
        else:
            pass
        line = line.strip()  # Clear out leading/trailing whitespace
        if lineInd % 10000 == 0:
            print "At line %s." % (lineInd)
        else:
            pass
        line = line.upper() # Deals with whatever case the
                            # sequence is by making it all upper case
        if len(line) > 0 and line[0] == ">":   # This one is a new gene
            current_gene = line[1:]
            geneDict[current_gene] = ""
        else:                # Add onto the current gene
            geneDict[current_gene] += line
    fh.close()
    return geneDict

filetools.py

#!/usr/bin/env python
 
def open_file_by_mimetype(filename):
    import mimetypes
    if 'gzip' in mimetypes.guess_type(filename):
        try:
            import gzip
            fh = gzip.open(filename, 'r')
            print 'File is a gzipped file.'
            print
            return fh
        except IOError:
            print 'Could not find file with filename %s' % (filename)
            result = 'Please verify that your filename is correct and try again.'
            return result
    else:
        try:
            fh = open(filename, 'r')
            print 'File is not a gzipped file.'
            return fh
        except IOError:
            print 'Could not find file with filename %s' % (filename)
            result = 'Please verify that your filename is correct and try again.'
            return result
 

4) Improving the teacher's code.

#!/usr/bin/env python
 

5) All code is bug-free until your first user.

 
#!/usr/bin/env python