Recent Changes

Saturday, July 14

Friday, March 23

  1. 5:20 pm

Friday, August 5

  1. page Session 10.1 edited ... from numpy import array mpl.loglog(array(fpkms1), array(fpkms2), ',') # Plot a diagonal lin…
    ...
    from numpy import array
    mpl.loglog(array(fpkms1), array(fpkms2), ',')
    # Plot a diagonal line corresponding to equal expression
    mpl.loglog([min(fpkms1), max(fpkms1)],
    [min(fpkms1), max(fpkms1)], 'r:')
    mpl.show()
    Gives us this:
    {FPKMS.png}

    Male and Female Biased
    malebiased = open('male', 'w')
    ...
    malebiased.close()
    femalebiased.close()
    ...
    little gene?
    import sys
    xlocdict = {}
    2:44 pm
  2. file FPKMS.png uploaded
    2:44 pm
  3. page Session 10.1 edited ... MPOS 1-based Mate POSition Solutions Simple Plotting This has some bug in it that I'm not…
    ...
    MPOS
    1-based Mate POSition
    Solutions
    Simple Plotting
    This has some bug in it that I'm not sure of yet, but the basic idea is the same:
    fh = open('../data/cufflinks_out/gene_exp.diff')
    # Get rid of the header line
    fh.readline()
    fpkms1 = []
    fpkms2 = []
    for line in fh:
        data = line.split()
        fpkm1 = float(data[7])
        fpkm2 = float(data[8])
        # If the values are zero, then a loglog plot will fail.
        if fpkm1 == 0 or fpkm2 == 0:
            continue
        fpkms1.append(fpkm1)
        fpkms2.append(fpkm2)
    from matplotlib import pyplot as mpl
    from numpy import array
    mpl.loglog(array(fpkms1), array(fpkms2), ',')
    Male and Female Biased
    malebiased = open('male', 'w')
    femalebiased = open('female', 'w')
    fh = open('../data/cufflinks_out/gene_exp.diff')
    # Get rid of the header line
    fh.readline()
    for line in fh:
        data = line.split()
        # Only look at genes whose last column says 'yes'
        if data[-1] == 'yes':
            print(data[0], data[7], data[8])
            female = float(data[7])
            male = float(data[8])
            if female > male:
                femalebiased.write(data[0] + '\n')
            else:
                malebiased.write(data[0] + '\n')
    malebiased.close()
    femalebiased.close()
    What's your name, little gene?
    import sys
    xlocdict = {}
    fbtrdict = {}
    # Map each XLOC id to the FBtr id found on the same GTF line
    for line in open('../data/cufflinks_out/cuffcmp.combined.gtf'):
        XLOC = ""
        FBtr = ""
        for word in line.split():
            # word[1:-2] drops the leading quote and the trailing quote and semicolon
            if 'XLOC' in word:
                XLOC = word[1:-2]
            elif 'FBtr' in word:
                FBtr = word[1:-2]
        if XLOC == "" or FBtr == "":
            continue
        xlocdict[XLOC] = FBtr
    # Map the second column to the first (FBtr to FBgn, going by the file name)
    for line in open('fbgn_fbtr_fbpp_fb_2011_07.tsv.tsv'):
        line = line.strip()
        if len(line) == 0 or line[0] == "#":
            continue
        line = line.split('\t')
        fbtrdict[line[1]] = line[0]
    # Look up each XLOC id listed in the input file
    infile = open(sys.argv[1])
    for line in infile:
        try:
            fbtr = xlocdict[line.strip()]
            print(fbtrdict[fbtr])
        except KeyError:
            pass
            #print("Can't find a match for ", line.strip())

    2:34 pm
  4. page Session 10.2 edited ... Exercises: 1. Classify each gene into transcribing, paused, or inactive categories by finding pe…
    ...
    Exercises:
    1. Classify each gene into transcribing, paused, or inactive categories by finding peaks of polII near the start of the gene and taking the ratio of promoter to gene polII reads.
    2. Make a heatmap
    ...
    data set. Then sort all
    ...
    to RedFly and get a
    ...
    cis-regulatory modules. Test if either
    ...
    in CRMs. A common way
    ...
    the genome.
    4. Similarly, see if polII signal is associated with either histone modification.
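One possible way to start on exercise 1 is sketched below. It assumes you have already counted polII reads in a promoter window and in the gene body for each gene. The function name `classify_gene` and the thresholds `min_density` and `pause_ratio` are made up for illustration and are not values from the session, so expect to tune them on real data.

```python
def classify_gene(promoter_reads, body_reads, promoter_len, body_len,
                  min_density=0.01, pause_ratio=4.0):
    """Classify one gene from polII read counts.

    The thresholds are hypothetical placeholders, not recommended values.
    """
    # Normalize by window length so promoter and gene body are comparable.
    promoter_density = promoter_reads / promoter_len
    body_density = body_reads / body_len
    # Hardly any polII anywhere: call the gene inactive.
    if promoter_density < min_density and body_density < min_density:
        return 'inactive'
    # PolII piled up at the promoter relative to the gene body: paused.
    if body_density == 0 or promoter_density / body_density >= pause_ratio:
        return 'paused'
    return 'transcribing'
```

For example, `classify_gene(100, 50, 200, 2000)` compares a promoter density of 0.5 reads per base with a body density of 0.025, a ratio of 20, so that gene would be called paused under these placeholder thresholds.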
    2:27 pm
  5. page Session 10.2 edited ... Exercises: 1. Classify each gene into transcribing, paused, or inactive categories by finding pe…
    ...
    Exercises:
    1. Classify each gene into transcribing, paused, or inactive categories by finding peaks of polII near the start of the gene and taking the ratio of promoter to gene polII reads.
    ...
    a heatmap (make two dimensional array) for the
    ...
    gene start.
    2.

    3. Go to RedFly and get a list of annotated cis-regulatory modules. Test if either histone modification is enriched in CRMs. A common way to do this is to compare the amount of signal in a region of interest to a similar distribution of regions randomly selected from the genome.
    4. Similarly, see if polII signal is associated with either histone modification.
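The random-region comparison in exercise 3 could look roughly like the sketch below. It assumes the ChIP signal for one chromosome is a plain list with one value per base and that the CRMs are `(start, end)` pairs on that chromosome. The names `signal_in` and `empirical_pvalue` and the default of 1000 random draws are hypothetical choices, not anything prescribed by the exercise.

```python
import random


def signal_in(signal, start, end):
    """Total signal over the half-open interval [start, end)."""
    return sum(signal[start:end])


def empirical_pvalue(signal, regions, n_random=1000, seed=0):
    """Fraction of random region sets with at least the observed signal."""
    rng = random.Random(seed)
    observed = sum(signal_in(signal, s, e) for s, e in regions)
    at_least_as_high = 0
    for _ in range(n_random):
        total = 0
        for s, e in regions:
            length = e - s
            # Place a region of the same length anywhere on the chromosome.
            new_start = rng.randint(0, len(signal) - length)
            total += signal_in(signal, new_start, new_start + length)
        if total >= observed:
            at_least_as_high += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (at_least_as_high + 1) / (n_random + 1)
```

Matching only the length of each region is the simplest choice. A more careful version might also match things like chromosome or distance to the nearest gene, which this sketch does not attempt.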

    12:55 pm
  6. page Session 10.2 edited ... $ wget 128.32.142.145/~aaron/dmel_gene.gff.gz $ curl -O 128.32.142.145/~aaron/dmel_gene.gff.g…
    ...
    $ wget 128.32.142.145/~aaron/dmel_gene.gff.gz
    $ curl -O 128.32.142.145/~aaron/dmel_gene.gff.gz
    Exercises:
    1. Classify each gene into transcribing, paused, or inactive categories by finding peaks of polII near the start of the gene and taking the ratio of promoter to gene polII reads.
    2. Make a heatmap for the 1kb up- and downstream of each gene for each ChIP data set. Then sort all the heatmaps by the highest polII peak 1kb upstream of the gene start.
    2.
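For the heatmap exercise, here is a minimal sketch of building the two dimensional array and the sort order. It assumes a per-base signal list for one chromosome and a list of gene start positions, and it ignores strand, which is a simplification (minus-strand genes would need their rows reversed). The names `window_rows` and `upstream_peak_order` are hypothetical.

```python
def window_rows(signal, gene_starts, flank=1000):
    """One row per gene: signal from flank bases upstream to flank downstream.

    Genes too close to either end of the chromosome are skipped.
    """
    rows = []
    for start in gene_starts:
        if start - flank < 0 or start + flank > len(signal):
            continue
        rows.append(signal[start - flank:start + flank])
    return rows


def upstream_peak_order(polii_rows, flank=1000):
    """Row indices ordered from highest to lowest upstream polII peak."""
    # The first `flank` values of each row are the upstream half.
    peaks = [max(row[:flank]) for row in polii_rows]
    return sorted(range(len(polii_rows)), key=lambda i: peaks[i], reverse=True)
```

Returning indices rather than sorted rows means the same order can be applied to every ChIP data set, for example `[rows[i] for i in order]`, as long as each set of rows was built from the same genes in the same order. Something like `mpl.imshow(sorted_rows, aspect='auto')` should then draw the heatmap.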

    11:21 am
  7. page Session 10.2 edited ... Let's go to the SRA and find the ChIP input data for stage E4-8. Since this is a time and space …
    ...
    Let's go to the SRA and find the ChIP input data for stage E4-8.
    Since this is a time- and space-consuming process and I'm sure that you could do this on your own, get the bwa-mapped bam files from here. While you are there, get the MACS peak calling files here.
    MACS and other peak finding software use the properties of the experiment to separate the real peak from the random input. One important feature is that there should be two piles of fragments on opposite strands around a real binding site. Use IGV to look at the distribution of reads around eve, a gene that should be expressed or getting ready to be expressed depending on how well they staged the collections.
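To make the two-pile idea concrete, here is a toy sketch that estimates how far apart the forward-strand and reverse-strand piles sit. It is only an illustration of the idea and not how MACS itself is implemented. It assumes you already have the start positions of the forward and reverse reads near one site as lists of integers. The function name `estimate_strand_shift` and the `max_shift` default are hypothetical.

```python
from collections import Counter


def estimate_strand_shift(fwd_starts, rev_starts, max_shift=300):
    """Return the offset at which forward and reverse read piles line up best."""
    fwd = Counter(fwd_starts)
    rev = Counter(rev_starts)
    best_shift, best_score = 0, -1
    for shift in range(max_shift + 1):
        # Count forward/reverse read pairs separated by exactly `shift` bases.
        score = sum(n * rev[pos + shift] for pos, n in fwd.items())
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift
```

At a real binding site you would expect a clear best offset. In random input the score should stay low and roughly flat across offsets.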
    Let's look at the peaks around all genes! Get the dmel gene GFF using wget or curl -O:
    $ wget 128.32.142.145/~aaron/dmel_gene.gff.gz
    $ curl -O 128.32.142.145/~aaron/dmel_gene.gff.gz
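Once the file is downloaded, something like the following sketch could pull out the gene starts without unzipping it first. It assumes the file follows the usual nine-column, tab-separated GFF layout and that gene features have `gene` in the third column, neither of which has been checked against this particular file. The function name `read_gene_starts` is hypothetical.

```python
import gzip


def read_gene_starts(path):
    """Return (chromosome, start, strand) for each 'gene' line in a gzipped GFF."""
    genes = []
    with gzip.open(path, 'rt') as fh:
        for line in fh:
            # Skip comments, directives, and blank lines.
            if line.startswith('#') or not line.strip():
                continue
            fields = line.rstrip('\n').split('\t')
            if len(fields) < 9 or fields[2] != 'gene':
                continue
            chrom, strand = fields[0], fields[6]
            # On the minus strand the gene starts at the larger coordinate.
            start = int(fields[4]) if strand == '-' else int(fields[3])
            genes.append((chrom, start, strand))
    return genes
```

GFF coordinates are 1-based, so subtract one before using them as indices into a Python list or array.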

    11:02 am
