Functions, Modules, and Pickles


  • Function definitions
  • Function documentation
  • Functions within functions
  • Modules
  • Making your own libraries
  • Pickles and quick data storage


This afternoon we'll concentrate on our last fundamental programming concept for the course. To date, we've been writing all of our program logic in the main body of our scripts. And we've seen how built-in python functions like raw_input() are used to operate on variables and their values. In this session, we'll learn how to write functions of our own, how to properly document them for ourselves and other users, and how to collect them into modules, and make our own local repositories, or libraries.

If you properly leverage a well-designed function, writing the main logic of your programs becomes almost-too-easy. Instead of writing out meticulous logical statements and loops for every task, you just call forth your previously-crafted logic, which you've vested in well-made functions.


Functions are the basic means to manage complexity in your programs, allowing you to avoid nesting and repeating large chunks of code that could otherwise make your tasks unmanageable. They allow you to bundle code with a known input and a known output into single lines, and you should use them frequently from now on.

We will start with the syntax:

#!/usr/bin/env python
# define the function
def hello(name):
     greeting = "Hello %s!" % (name)
     return greeting
# use the function
functionInput = 'Zaphod Beeblebrox'
functionOutput = hello(functionInput)
print functionOutput

To define a function, you use the keyword def. Then comes the function name, in this case hello, with parentheses containing any input arguments the function might need. In this case, we need a name to form a proper greeting, so we're giving the hello() function a variable argument called name. After that, the function does its thing, executing the indented block of code immediately below. In this case, it creates a greeting "Hello <name>!". The last thing that it does is return that greeting to the rest of the program.

Note that the variable names are different on the inside and the outside of the function: I give it functionInput, although it takes name, and it returns greeting, although that return value is fed into functionOutput. I did this on purpose, as I want to emphasize that the function only knows to expect something, which it internally refers to as name, and then to give something else back. In fact, there is some insulation against the outside world, as you can see in this example:

#!/usr/bin/env python
def hello(name):
    greeting = "Hello %s!" % (name)
    testVariable = "The hotel room is a mess, there's a chicken hangin' out, somebody's baby is in the wardrobe, there's a tiger in the bathroom that Mike Tyson wants back, Stu lost a tooth and eloped, and Doug is missing."
    print 'Inside of the function:', testVariable
    return greeting
testVariable = "What happens in Vegas stays in Vegas."
grt = hello("Stu Price")
print 'Outside of the function:', testVariable

Even though the epic story of a bachelor party gone horrifically awry was assigned to a variable called testVariable inside the function, nothing happened to that variable outside the function. Variables created inside a function occupy their own namespace in memory distinct from variables outside of the function, and so reusing names between the two can be done without you having to keep track of it. (Refer to this article about namespace for more information.) That means you can use functions written by other people without having to keep track of what variables those functions are using internally. Just like a sleazy town in Nevada, what happens in the function stays in the function. (An important exception lies with lists and dictionaries, which you will examine in the exercises.)
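
As a minimal sketch of that insulation, the snippet below shows that a variable created inside a function simply does not exist outside of it. The names make_greeting and secret are made up for illustration, and print is called with parentheses so that the snippet should run under both Python 2 and Python 3.

```python
def make_greeting(name):
    # 'secret' is created inside the function, so it lives only in
    # the function's own namespace
    secret = "what happens in the function stays in the function"
    return "Hello %s!" % name

greeting = make_greeting("Stu Price")
print(greeting)

try:
    print(secret)          # 'secret' was never defined out here
    leaked = True
except NameError:
    print("secret is not visible outside of make_greeting()")
    leaked = False
```

The NameError is Python's way of saying that the outer program has no variable by that name, which is exactly the separation described above.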

Let's have another example, returning to a more pressing subject:

#!/usr/bin/env python
def whichFood(balance):
    if balance < 10:
        return 'ramen'
    elif balance < 100:
        return 'good ramen'
    elif balance < 200:
        return 'better ramen'
    else:
        return 'ramen that is truly profound in its goodness'
print whichFood(14)

Here we've made a slightly more complicated function-- it contains some control statements, and there is more than one way for it to return. We also never explicitly create an input variable (as we did with functionInput in the first example), and we don't store the output to a variable either (as we did with functionOutput).

Finally, we've shown examples with one input variable and one return value, but functions are more flexible than that. They can accept zero, one, or multiple input variables. They don't necessarily need to return anything to the program, and they are also capable of returning multiple values. They can even have other functions nested inside them!
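
The remark about nesting can be sketched as follows. This is only an illustration: describe_sequence and count_base are hypothetical names, and print is called with parentheses so that the snippet should run under both Python 2 and Python 3.

```python
def describe_sequence(seq):
    # count_base() is defined inside describe_sequence(), so it can only
    # be called from within describe_sequence()
    def count_base(base):
        return seq.upper().count(base)

    # a function can also hand back several values at once, as a tuple
    return (len(seq), count_base('G') + count_base('C'))

(length, gc) = describe_sequence('ATGGTGGG')
print("length: %d, G+C count: %d" % (length, gc))
```

Note that the inner function can read seq, the argument of the function that encloses it, without being passed it explicitly.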

Here are a few more examples of the syntax used with functions:

#!/usr/bin/env python
# functions can do their thing without taking input or returning output
def useless():
    print 'What was the point of that?'
def countToTen():
    for i in range(10):
        print i

#!/usr/bin/env python
# functions can also take multiple items in and return multiple items out
def doLaundry(amtDetergent, dirtyClothes):
    cleanClothes = []
    for load in dirtyClothes:
        amtDetergent -= 1
        cleanClothes.append(load)    # each washed load joins the clean pile
    return (amtDetergent,cleanClothes)
amtTide = 5
dirtyLaundry = ['socks','shirts','pants']
(amtTide, cleanLaundry) = doLaundry(amtTide, dirtyLaundry)
print "Amount of Tide left:", amtTide
print cleanLaundry

Above, in doLaundry, I returned a tuple of the two variables enclosed in parentheses. You could also return a list, which works much the same way. (Some more information on the distinctions between tuples and lists can be found at this link.) You could return other objects as well, like dictionaries.

#!/usr/bin/env python
def returnStuff():
    a = '>Gene1'
    b = 'ATGGTGGG'
    return [a,b] # returns the output as a list
print type(returnStuff())
# We can index the output the same as any list
print returnStuff()[0]
print returnStuff()[1]
(x,y) = returnStuff() # stores output to the variables x & y,
                      # so you can access x and y directly
print x
print y
both = returnStuff() # stores the output to the variable both
                     # which will be a list
print both
dictOfStuff = {}
dictOfStuff[returnStuff()[0][1:]] = returnStuff()[1]
print dictOfStuff
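
Functions like returnStuff() are easier to reuse when they carry their own documentation. One common convention, sketched here under the assumption that a short description is all you need, is a docstring: a string placed on the first line of the function body. The function gc_content is a made-up example, and print is called with parentheses so that the snippet should run under both Python 2 and Python 3.

```python
def gc_content(seq):
    """Return the fraction of bases in seq that are G or C.

    seq -- a DNA sequence given as a string (upper or lower case)
    """
    seq = seq.upper()
    if len(seq) == 0:
        return 0.0
    return (seq.count('G') + seq.count('C')) / float(len(seq))

# the docstring is stored on the function itself
print(gc_content.__doc__)
print(gc_content('ATGGTGGG'))
```

In the interactive interpreter, help(gc_content) displays the same text, so anyone using the function can look up what it expects without reading its code.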

So how do functions make our lives easier? We can exploit functions to break difficult tasks into a number of easier tasks, and then these easier tasks into ones easier still, and so on. In a program built this way, even the largest 'raw' code blocks, those with few function calls, are only tens of lines long, and many functions are only a handful of lines. This allows us to program in large, structural sweeps, rather than getting lost in the details. This makes programs both easier to write and easier to read:

def publishAPaper(authors,topic,journal):
    data = doWork(topic)
    figures = analyze(data)
    paper = writePaper(data,figures)

And, a big part of that ease comes with the use of:

Modules


In all of the examples above, we defined our functions right above the code that we hoped to execute. If you have many functions, you can see how this would get messy in a hurry. Furthermore, part of the benefit of functions is that you can call them multiple times within a program to execute the same operations without tiresomely writing them all out again. But wouldn't it be nice to share functions across programs, too? For example, working with genomic data means lots of time getting sequence out of FASTA files, and shuttling that sequence from program to program. Many of the programs we work with overlap to a significant degree, as they need to parse FASTA files, calculate evolutionary rates, and interface with our lab servers, for example -- all of which means that many of them share functions. And if the same function exists in two or more different programs, we hit the same problems that we hit before: complex debugging, decreased readability, and, of course, too much typing.

Modules solve these problems. In short, they're collections of functions and variables (and often objects, which we'll get to towards the end of the course) that are kept together in a single file that can be read and imported by any number of programs.

Using a module: the basics

To illustrate the basics, we'll go through the use of two modules, sys and math, one of which we effectively use all the time. In fact, it's a very, very rare program indeed that doesn't use the sys module. sys contains a lot of really esoteric functions, but it also contains a simple, everyday thing -- what you typed on the command line. To illustrate, if we were to create a new program called and type the following commands into Terminal:

$ ./ argument1 argument2 argument3

then the sys module would contain a list of strings called argv composed of the following: ['./', 'argument1', 'argument2', 'argument3']. We can access the list argv from our program by importing the module sys.

#!/usr/bin/env python
import sys            # gaining access to the module
# you can access variables stored in the module by using a dot
# to get at the variable 'argv' which is stored in 'sys', type:
commandLine = sys.argv
print commandLine


Conveniently, we can access functions stored inside modules. To demonstrate this, I'll use the module math.

#!/usr/bin/env python
import sys
import math
# sys.argv contains only strings, even if you type integers.
# And, remember, the first element is the command itself-- usually
# not very useful.
x = float(sys.argv[1]) # argv stores the command line arguments as strings, but python isn't theoretical physicists, so we can't do math with strings
logX = math.log(x)
print logX
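
The script above fails with an IndexError if you forget to type a number. A slightly more defensive sketch is shown below. The function name log_of_first_argument and the stand-in command line are assumptions for illustration, and print is called with parentheses so that the snippet should run under both Python 2 and Python 3.

```python
import math

def log_of_first_argument(argv):
    # argv[0] is the command itself, so real arguments start at argv[1]
    if len(argv) < 2:
        return None
    x = float(argv[1])   # command-line arguments always arrive as strings
    return math.log(x)

# A hand-made list stands in for sys.argv here (hypothetical command line)
fake_command_line = ['./my_script.py', '1.0']
print(log_of_first_argument(fake_command_line))
print(log_of_first_argument(['./my_script.py']))
```

In a real script you would import sys and call log_of_first_argument(sys.argv) instead of passing a hand-made list.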

Great! Not so hard. It turns out that modules are easy to write, too:

Making a module

Any file of python code with a .py extension can be imported as a module from your script. When you invoke an import operation from a program, all the statements in the imported module are executed immediately. The program also gains access to names assigned in the file (names can be functions, variables, classes, etc.), which can be invoked in the program using the same dot syntax we used for sys.argv: the module name, a dot, and then the name. Go ahead and make your first module by pasting the following code into your text editor and saving it in a .py file that can be imported as greeting_module:

print 'The top of the greeting_module has been read.'
def hello(name):
    greeting = "Hello %s!" % name
    return greeting
def ahoy(name):
    greeting = "Ahoy-hoy %s!" % name
    return greeting
x = 5
print 'The bottom of the greeting_module has been read.'

Now make a new program with the following code, and include your first name as an argument on the Terminal command line when you execute it:

#!/usr/bin/env python
import greeting_module
hi = greeting_module.hello('Nathaniel')
print hi
print greeting_module.x
# What happens if you try 'print x' here?
# Remember how to access argv?
import sys
print greeting_module.hello(sys.argv[1]) # This will take your Terminal argument as input for the greeting module's hello function

And that's it! See-- no more messy function declarations at the beginning of your script. Now if you need any other program to say hi to you, all you need to do is import the greeting module.

Using modules: slightly more than just 'import'

Although creating a basic module is easy, sometimes you want more than just the basics. And although using a module in the most basic manner is easy, it's best to get a more thorough picture of how modules behave.

First, what if you only want one function from a given module? Let's say, as an Alexander Graham Bell loyalist, you really only dealt in 'ahoys' rather than 'hellos.' We need to use a modified syntax for retrieving only the ahoy function from the module, leaving the newfangled hello function preferred by T.A. Edison's entourage out of our program's namespace. (Python still reads and runs the whole module file either way, so the benefit is tidiness rather than saved memory.)

Change the code in your program to the following:

#!/usr/bin/env python
from greeting_module import ahoy
hi = ahoy('everybody')
# if you grab a function from a module with a 'from' statement, you don't need to use the <module>.<function> syntax
print hi

We see that we can now write ahoy('everybody') directly, instead of having to write greeting_module.ahoy('everybody'). And if we wanted to access both functions this way, we could import them both in one statement by changing the import line to the following:

#!/usr/bin/env python
from greeting_module import ahoy, hello

Or, what if we wanted to avoid invoking the greeting_module with every call of a greeting_module function? Rather than writing out all of the function names to import individually (there could be a lot of them), we can use the asterisk wildcard (*) symbol to refer to them.

#!/usr/bin/env python
from greeting_module import *
hi = ahoy('everybody')
hi2 = hello('everybody')

While this may be useful if we are familiar with the contents of the module, including all of the names inside, there are a few reasons to be careful about using the from modulename import * syntax. First, if the module contains a lot of names that we don't need, we will needlessly clutter our program's namespace with them. Second, and perhaps more importantly, if the module being imported contains names that are also used inside your program, whichever definition comes later silently replaces the earlier one, and you lose access to the original. For example, you might have a problem if your program and the imported module each define distinct functions called hello(). If instead you use the syntax import yourmodule, then you can call the function in your program using hello() and the function in the module using yourmodule.hello().

Finally, you can also import variables from modules and assign them new names in your program using the syntax from modulename import variablename as newvariablename.
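
The renaming syntax is one way around the name collisions described above. Here is a small sketch using the standard math module. Our own log() function is a made-up example, and print is called with parentheses so that the snippet should run under both Python 2 and Python 3.

```python
import math

def log(message):
    # our own, unrelated function that happens to be called log
    return "LOG: " + message

# 'from math import log' (or 'from math import *') would overwrite
# our log() above; importing under a new name keeps both usable
from math import log as natural_log

print(log("starting up"))
print(natural_log(1))        # natural logarithm of 1
print(math.sqrt(16))         # the module.name syntax never collides
```

Both approaches avoid the collision: the plain import keeps the module's names behind the math. prefix, and the as form lets you pick a name that is free in your program.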

Where to Store Your Modules: using PYTHONPATH

Over time, you'll end up accumulating lots of these modules, and they'll tend to fall together in meaningful collections. For example, you might have one module for all your functions related to reading and parsing files, and another for common sequence-related tasks. Python keeps its own modules installed in a system directory that you may or may not have access to on a remote server. Therefore, it's useful and simpler to just create your own python modules directory and then let your operating system environment know about it. In MacOS, I accomplish this by placing my modules in /Users/nathaniel/Library/Python/Modules and then adding a few lines to the .bash_profile file in my home directory with the following terminal commands:

$ echo 'PYTHONPATH=/your_home/Library/Python/Modules' >> .bash_profile
$ echo 'export PYTHONPATH' >> .bash_profile
$ source .bash_profile

*NOTE: replace your_home with your own full home path: e.g. '/Users/nathaniel/'. (Remember, you can see what your home path is called in Terminal by typing pwd from your home folder.)

And with that, any .py file that ends up in this directory will be treated as a module by Python. And though this is a good final resting place for your polished modules, you can also prototype them by simply saving them in your current working directory, and moving them over when you're happy with them.
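
Behind the scenes, Python looks for modules in the directories listed in sys.path, and the directories named in PYTHONPATH are added to that list when Python starts. The sketch below imitates this for a single run by appending a directory by hand. The temporary directory and the module name demo_greetings are made up for illustration, and print is called with parentheses so that the snippet should run under both Python 2 and Python 3.

```python
import os
import sys
import tempfile

# Make a throwaway directory to play the role of your modules folder
module_dir = tempfile.mkdtemp()

# Write a tiny module into it (demo_greetings is a hypothetical name)
module_file = open(os.path.join(module_dir, 'demo_greetings.py'), 'w')
module_file.write('def ahoy(name):\n')
module_file.write('    return "Ahoy-hoy %s!" % name\n')
module_file.close()

# Directories listed in PYTHONPATH end up in sys.path when Python starts;
# appending by hand has the same effect for the current run only
sys.path.append(module_dir)

import demo_greetings
print(demo_greetings.ahoy('everybody'))
```

If an import ever fails with "No module named ...", printing sys.path is a quick way to check whether your modules directory was actually picked up.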

So, with this under our belts, why don't we start using an example module? This one here is handy:

Pickles and quick data storage


There are many modules that come with a default installation of python, and one of the more useful ones is pickle. It allows us to store data from a python script very easily into a file, and then when we want it again, we can unpickle the very same stuff! This is recommended if you have large amounts of processed data that you need to dump onto your disk temporarily, to free up memory while you look at other data. In effect, pickling data saves you the time of writing functions to write and read temporary data. It's nice.

There's a lot to pickle, like many built-in pieces of python. Here we'll just cover the basic functionality, which comprises most of its use anyway. pickle has two functions that we want to use. The first is dump(), which takes two arguments. The first argument given to dump() is a variable that we want to store. The second argument given to dump() is a file handle, which tells pickle where to store the data. If we've used pickle.dump() correctly, pickle will store the data to a text file in a special way that will allow pickle to recover the variable with its entire data structure intact later. In order to recover the data, we use the pickle function called load(), which requires just one argument, a file handle that points to a pickled file. If we store the result of pickle.load() to a variable and print the result, we'll see that we recovered exactly what we put in.

#!/usr/bin/env python
import pickle
pickleBrandsToBuryAlive = ['vlasic','heinz','klaussen','kruger']
brandfh = open('thePickleFile','w')
pickle.dump(pickleBrandsToBuryAlive, brandfh)   # store the list in the file
brandfh.close()

#!/usr/bin/env python
import pickle
picklefh = open('thePickleFile')
zombiePickles = pickle.load(picklefh)
print zombiePickles

And there you have it! Zombie pickles! Delicious! You can also store more complicated data structures:

#!/usr/bin/env python
import pickle
brands = {}
brands['west'] = ['kruger',"klein's"]
brands['midwest'] = ['claussen','vlasic','gedney']
brands['east'] = ['mt. olive','b&g']
brands['south'] = ['best maid','goldin']
print brands
brandFileHandle = open('thePickleFile','w')
pickle.dump(brands, brandFileHandle)   # store the whole dictionary
brandFileHandle.close()
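
To convince yourself that nothing is lost along the way, you can dump and load in the same script and compare the results. The sketch below does this with a temporary file. The path and variable names are made up for illustration, the files are opened in binary mode ('wb' and 'rb'), which pickle requires under Python 3 and accepts under Python 2, and print is called with parentheses for the same reason.

```python
import os
import pickle
import tempfile

brands = {'west': ['kruger', "klein's"], 'east': ['mt. olive', 'b&g']}

# a throwaway location for the pickle file
pickle_path = os.path.join(tempfile.mkdtemp(), 'thePickleFile')

out_handle = open(pickle_path, 'wb')
pickle.dump(brands, out_handle)
out_handle.close()       # closing makes sure everything is written to disk

in_handle = open(pickle_path, 'rb')
restored = pickle.load(in_handle)
in_handle.close()

# the restored dictionary has the same contents as the original
print(restored == brands)
```

The restored object is a new dictionary with equal contents, not the original object itself, so changing one afterwards does not affect the other.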


Exercises

1. Practice with functions

Make a function that:

A) Takes an integer x as input and prints x * 2.

B) Takes integers x and y as input and prints x * y.

C) Takes a list xs as input and prints xs[0] * xs[1].

D) Modify the above programs so that the function returns the result instead of printing it, and the output is then printed by the program that called the function.

2. What happens in functions doesn't always stay in functions

As promised, most things that happen in functions stay in the functions, but there are important exceptions. Make the following functions, which should illustrate this property:

A) The function takes an integer as input and increments the integer by one using the '+=' operator. Print the value of the integer before and after the function is called.

B) The function takes a list as input and changes the first element of the list to the string 'x'. Print the value of the list before and after the function is called.

C) The function takes a dictionary as input and adds the key 'x' with value 'y' to this dictionary. Print the dictionary before and after the function is called.

3. Reverse Complement

A) Write a function that takes a DNA sequence as an argument, ensures that the sequence is in capital letters, and then returns the reverse complement of the sequence.

B) Modify the function to ensure that only the characters A, T, G, C and N (for unknown nucleotide) are in the input sequence.

4. Making a module

Create a directory in your PythonCourse directory called pylib, then add it to your PYTHONPATH. Create a module in this directory (parts A and B below import it as exercises). Put your functions from Exercise 1 into this module. Now write two programs that import and call all of the functions in the module in both of these ways:

A) A program that uses the line import exercises.

B) A program that uses the line from exercises import *. What happens if you have print statements in the module? Are they printed when you use the from statement?

C) Add your reverse complement function from Exercise 3 to this module.

5. Make a FASTA parser

Starting with your script from this morning, make a function that takes a FASTA file as input, reads through the file using open(), distinguishes between ID-containing lines and sequence-containing lines, and returns a dictionary with gene IDs as keys and sequences as values. Put this function in your module.

Copy and paste the following lines into a file called testFasta.fa. Create a program that imports the module and prints the sequence corresponding to the gene ID 'gene3.'

#!/usr/bin/env python
from exercises import fastaParser
geneDict = fastaParser('testFasta.fa')
print geneDict['gene3']

6. Pickle Practice

Modify your program from (5) such that instead of printing the data, it pickles it. Now write another program that unpickles that pickle file and prints the sequence of gene3.

7. (Bonus) Create an ORF finder

For our purposes, we will define an open reading frame (ORF) as a start codon followed at some distance by a stop codon in the same frame. This program should take a pickled FASTA file as in (6) as input and output a pickled dictionary of gene name->ORF sequence key-value pairs. If the sequence does not contain an ORF, then the gene name should not be in the dictionary.

8. For This and Giggles.

Try out the following code:

#!/usr/bin/env python
import this

#!/usr/bin/env python
import antigravity


Solutions

1. Practice with functions

#!/usr/bin/env python
# a) Takes an integer x as input, prints x*2 (x multiplied by 2)
def timestwo(x):
    print '%.0f multiplied by 2 is %.0f' % (x, x * 2)
num = float(raw_input('Input number to multiply by 2: '))
x = timestwo(num)
# b) Takes integers x and y as input, prints x * y
# Below I'll generate the list using command arguments, since
# we learned that today, but you could write them into the
# script instead
import sys
commandLine = sys.argv
print 'You entered the numbers', commandLine[1:], 'into the commandline.'
def product(x,y):
    print "The product of the first two numbers is %.0f." % (x*y)
numToMultiply1 = float(sys.argv[1])
numToMultiply2 = float(sys.argv[2])
multiplied = product(numToMultiply1, numToMultiply2)
# c) Takes a list xs as input, prints xs[0] * xs[1]
listOfNumbers = [2,3,3,4]
def product(xs):
    result = xs[0] * xs[1]
    print 'You supplied the list: %s' % (xs)
    print 'The product of the first two numbers in the list is %.0f.' % (result)
multipliedNumbers = product(listOfNumbers)
print multipliedNumbers # returns None
# d) Modify the above programs so that the function returns
# the result instead of printing it. This result is then
# printed by the program that called the function.
listOfNumbers = [2,3,3,4]
def product(xs):
    result = xs[0] * xs[1]
    print 'You supplied the list: %s' % (xs)
    return result
multipliedNumbers = product(listOfNumbers)
print 'The product of the first two numbers in the list is %.0f, but this time we returned the result from the function.' % (multipliedNumbers)

2. What happens in functions doesn't always stay in functions.

#!/usr/bin/env python
# a) The function takes an integer as input, and it increments that integer by one using the '+=' operator. Print the value of the integer before and after the function is called.
def increment(numberToIncrement):
    numberToIncrement += 1
numberToIncrement = 5
print 'The number to increment was', numberToIncrement
increment(numberToIncrement)
print 'The number is still', numberToIncrement
# b) The function takes a list as input, and it changes the first element of the list to the string 'x'. Print the value of the list before and after the function is called.
def modifyList(x):
    x[0] = 'overwrite'
    return x
stringlist = ['1', '33', '5', 'dog'] # could have used list of integers, or any type of list
print 'The list was', stringlist
modifyList(stringlist)
print 'Now the list is', stringlist
# c) The function takes a dictionary as input, and it adds the key 'x' with value 'y' to this dictionary. Print the dictionary before and after the function is called.
def appendToDict(Dict_with_a_new_name):
    Dict_with_a_new_name['x'] = 'y'
Dict = {}
Dict['0'] = 'zero'
Dict['1'] = 'one'
Dict['2'] = 'two'
print 'Before:', Dict
appendToDict(Dict)
print 'After:', Dict

3. Reverse Complement

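def revComp(seq):
    seq=seq.upper()            # Makes seq uppercase
    seq=seq[::-1]              # Reverses seq
    # Replace ACGT with lowercase complement, so that a base that has
    # already been swapped is not swapped back by a later replace
    seq=seq.replace('A','t').replace('T','a').replace('C','g').replace('G','c')
    seq=seq.upper()            # Make seq uppercase again
    # Strip every allowed character; anything left over is improper
    isitempty=seq.replace('A','').replace('T','').replace('C','').replace('G','').replace('N','')
    if isitempty != "":
        print "Careful, improper characters!"
    return seq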
#  Iterative method
def revCompIterative(watson):
    complements = {'A':'T', 'T':'A', 'C':'G', 'G':'C', 'N':'N'}
    watson = watson.upper()
    watsonrev = watson[::-1]
    crick = ""
    for nt in watsonrev:
       crick += complements[nt]
    return crick
print revComp("aTNrg")

4. Making a module.

#!/usr/bin/env python
# Make a directory in /Users/[username]/PythonCourse/pylib
# Open a new terminal window and type the following, substituting your username:
#      echo "PYTHONPATH=/Users/[username]/PythonCourse/pylib" >>.bash_profile
#      echo "export PYTHONPATH" >>.bash_profile
#      source .bash_profile
# Create your module file (imported below as exercises) in the pylib folder, copy in your timestwo() function
# To verify it worked, try part a
#Part a
import exercises
print exercises.timestwo(4) # or whatever your function was called
#Part b --note, this should be run separately from part a
from exercises import *
print timestwo(6)
#Part c
#Copy the reverse complement function from problem 3 to PythonCourse/pylib/

5. Make a FASTA parser

Below is the module where we have stored our functions.
#!/usr/bin/env python
def fastaParser(filename):
        current_gene = ""
        genes = {}
        fh = open(filename, 'r')
        for line in fh:
                line = line.strip()
                if line.startswith('>'):
                        current_gene = line[1:]
                        genes[current_gene] = ''
                else:
                        genes[current_gene] += line
        return genes
def timestwo(x):
# Takes 1 integer x as input, prints x*2
    print '%.0f multiplied by 2 is %.0f' % (x, x*2)
def product1(x,y):
# Takes 2 integers x and y as input, prints x * y
    print "The product of the first two numbers is %.0f." % (x*y)
def product2(xs):
# Takes a list as input, prints xs[0] * xs[1]
    result = xs[0] * xs[1]
    print 'You supplied the list: %s' % (xs)
    print 'The product of the first two numbers in the list is %.0f.' % (result)
def product3(xs):
# Same as product2() except this function returns the
# result instead of printing it. This result can then
# be printed by the program that called the function.
    result = xs[0] * xs[1]
    print 'You supplied the list: %s' % (xs)
    return result

Below is the script that will import functions from the module:
#!/usr/bin/env python
# a) A program that uses the 'import exercises' line.
import exercises
exercises.timestwo(12)
# b) A program that uses the 'from exercises import *' line
from exercises import *
# c) Add your reverse complement function from Exercise 3 to this module.
from exercises import fastaParser
x = fastaParser('seq.FASTA')
print x

6. Pickle Practice
#!/usr/bin/env python
from exercises import fastaParser
import pickle
x = fastaParser('seq.FASTA')
parsedFastaDataHandle = open('parsedFastaData','w')
pickle.dump(x, parsedFastaDataHandle)

#!/usr/bin/env python
import pickle
pickleFileHandle = open('parsedFastaData')
parsedFastaData = pickle.load(pickleFileHandle)
print parsedFastaData['gene3']

7. (bonus) Create an ORF finder

#!/usr/bin/env python
import sys
import pickle
def find_orfs(sequence):
        """ Finds all valid open reading frames in the string 'sequence', and
            returns them as a list"""
        starts = find_all(sequence, 'ATG')
        stop_amber = find_all(sequence, 'TAG')
        stop_ochre = find_all(sequence, 'TAA')
        stop_umber = find_all(sequence, 'TGA')
        stops = stop_amber + stop_ochre + stop_umber
        stops.sort()    # scan stops in order, so the first match is the nearest
        orfs = []
        for start in starts:
                for stop in stops:
                        if start < stop \
                           and (start - stop) % 3 == 0:  # Stop is in-frame
                                # the +3 includes the stop codon
                                orfs.append(sequence[start:stop+3])
                                # break out of the inner for loop
                                # when we hit the first stop codon
                                break
        return orfs
def find_all(sequence, subsequence):
        ''' Returns a list of indexes within sequence that are the start of subsequence'''
        start = 0
        idxs = []
        next_idx = sequence.find(subsequence, start)
        while next_idx != -1:
                idxs.append(next_idx)    # Record where this match starts
                start = next_idx + 1     # Move past this on the next time around
                next_idx = sequence.find(subsequence, start)
        return idxs
fname = sys.argv[1]   # Read in from the first command-line argument
fh = open(fname)      # Open the pickled FASTA dictionary for reading
genedict = pickle.load(fh)
orfdict = {}
for gene in genedict:
    gene_seq = genedict[gene]
    orfs = find_orfs(gene_seq)
    if len(orfs) > 0:
        orfdict[gene] = orfs
print orfdict
fh = open('orfs_out', 'w')
pickle.dump(orfdict, fh)

8. For This and Giggles

import this should print 'The Zen of Python' to the Terminal.
import antigravity opens your web browser and points it to the xkcd comic about import antigravity (a bit meta, no?)