Structure Gazing with Chimera

The HIV-1 CA Protein
Today we will use Chimera to analyze the structure of the HIV-1 CA protein, also known as p24. p24 corresponds to residues 133-351 in the Gag consensus sequence, which encompasses nearly all of the residues of sector 3 in the Dahirel et al. study (see Table S1 for the complete list). This exercise serves two purposes. First, it gives us a stomping ground for learning how to use the Chimera GUI and how to automate structural analysis with Python scripts. Second, we will attempt to recapitulate the structural findings (and figures) of the Dahirel et al. paper, and maybe even do a better job of modeling the sectors of Gag that predominantly fall on the HIV Capsid.

This project will be split into two parts:
1) (Morning) Review of basic X-Ray Crystallography principles, discussion of the paper that served as the basis for modeling by Dahirel et al. ("X-Ray Structures of the Hexameric Building Block of the HIV Capsid"), and an introduction to Chimera.

2) (Afternoon) Modeling the sectors discovered by Dahirel et al. onto available HIV Capsid structures, simplifying complicated tasks in Chimera with a Python script, and recapitulating the figures from the paper.

Part 1 - Chimera and Structures of the HIV Capsid

Before we open up Chimera, let's do a quick review of what statistics we should look for in an X-Ray Crystallography paper to properly (and quickly) judge the quality of the models that we will be studying today.

As with any X-Ray Crystallography paper, you should start your analysis with Table 1. Although all of these statistics obviously have some significance, you can get a very good feel for the quality of the model from just a few stats:

Crystallography Resources:
Evaluating X-Ray Crystal Structure Papers
Awesome Movies by James Holton

Data Collection
  • Space Group: "A space group is a symmetry group that divides space into repeatable domains." Space groups tell us about the symmetry that exists within the crystal, which can sometimes be useful in modeling larger complexes.
  • Resolution: Defines the "featuredness" of the electron density maps, and influences how confident we can be about the placement of atoms in the maps.
  • Redundancy: Average number of times each unique reflection was measured. This should be anywhere from 3-20 and depends partly on symmetry.
  • Completeness: Percentage of the unique reflections that were actually measured. > 90% is generally good, as long as the reflections that are missing are random and not indicative of a systematic problem in data collection.
  • I/sigma ##.#, (#.#): Literally the measured intensity values divided by the standard deviation in these measurements, but can be thought of as signal to noise (or how much more intense the spots are than the background). This should generally be at least 2-3 in the outer (highest resolution) shell, which is the value reported in parentheses.
  • Rsym ##.#, (##.#): Measurement of the agreement between equivalent reflections as defined by crystal symmetry.

Refinement
  • Rwork, % / Rfree, %: Rwork is the agreement between the crystallographic model and the experimental data, and Rfree is the degree to which the current crystallographic model can predict a test set of data not included in refinement (flagged beforehand as the "Free R set"). The difference between these two statistics should generally be 2-6%.
  • Phi and Psi angles (Ramachandran Statistics): Percentage of residues that fall in the various regions of a Ramachandran plot. Almost all residues should be in the favored region (> 95%), a few in the allowed region (at most 5%), and very few to none in the disallowed region (< 1%).
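
The rules of thumb above can be written down as a small checklist. The following is only a sketch: the dictionary keys, the function name, and the example numbers are made up for illustration (they are NOT the values from the paper's Table 1), and the thresholds are simply the ones quoted in the list above.

```python
def check_table1(stats):
    """Return warnings for statistics that fall outside the rules of thumb above."""
    warnings = []
    if not 3 <= stats["redundancy"] <= 20:
        warnings.append("redundancy outside 3-20")
    if stats["completeness"] <= 90:
        warnings.append("completeness not above 90%")
    # Value reported in parentheses in Table 1
    if stats["i_over_sigma_shell"] < 2:
        warnings.append("I/sigma in the shell below 2")
    gap = stats["rfree"] - stats["rwork"]
    if not 2 <= gap <= 6:
        warnings.append("Rfree - Rwork gap outside 2-6%")
    if stats["rama_favored"] <= 95:
        warnings.append("Ramachandran favored not above 95%")
    if stats["rama_disallowed"] >= 1:
        warnings.append("Ramachandran disallowed not below 1%")
    return warnings

# Hypothetical numbers for illustration only (not from any real structure)
example = {
    "redundancy": 5.2,
    "completeness": 98.0,
    "i_over_sigma_shell": 2.5,
    "rwork": 20.0,
    "rfree": 29.0,
    "rama_favored": 97.0,
    "rama_disallowed": 0.0,
}
print(check_table1(example))
```

A checklist like this only flags numbers worth a second look; it does not replace reading the paper and looking at the density.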

Important side note: If we weren't able to find a paper corresponding to this structure, we would still have ways to look up these kinds of statistics. Navigate over to the page for this structure (3H47) on the PDB website, click on "Methods," and find the same statistics that you just looked at in Table 1. Many structures are solved these days by high throughput structural genomics initiatives, and this "Methods" tab (or the header of the PDB file, also accessible from the PDB page) is often the only way that you can look over these statistics.
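
If you prefer to read the PDB file header directly, a few of these statistics can be pulled out with a short script. This is a minimal sketch that assumes the common REMARK 2 / REMARK 3 wording; the exact text of REMARK 3 depends on the refinement program, so the patterns may need adjusting. The sample lines below are hypothetical and are not copied from 3H47.

```python
import re

# Patterns for the usual header wording (an assumption; varies by program)
PATTERNS = {
    "resolution": re.compile(r"^REMARK\s+2\s+RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS"),
    "rwork": re.compile(r"^REMARK\s+3\s+R VALUE\s+\(WORKING SET\)\s*:\s*([\d.]+)"),
    "rfree": re.compile(r"^REMARK\s+3\s+FREE R VALUE\s*:\s*([\d.]+)"),
}

def header_stats(lines):
    """Collect the first match of each pattern from PDB header lines."""
    stats = {}
    for line in lines:
        for key, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match and key not in stats:
                stats[key] = float(match.group(1))
    return stats

# Hypothetical header excerpt for illustration only
sample = [
    "REMARK   2 RESOLUTION.    2.00 ANGSTROMS.",
    "REMARK   3   R VALUE     (WORKING + TEST SET) : 0.212",
    "REMARK   3   R VALUE            (WORKING SET) : 0.210",
    "REMARK   3   FREE R VALUE                     : 0.250",
    "REMARK   3   FREE R VALUE TEST SET SIZE   (%) : 5.000",
]
print(header_stats(sample))
```

To use it on a real file, pass the open file object (or a list of its lines) to header_stats.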

Let's put these skills to use to look at the Table 1 from the HIV Capsid paper:


There are three different structures here, and for the sake of simplicity we will refer to them henceforth as "monomer" (3H47), "hexamer" (3H4E), and "templated" (3GV2) (left to right).

How do these statistics look to you? (Class Discussion)

It is important to note that these are not models of the wild type protein. In the case of the "monomer" and "hexamer" structures, two mutations were made (A14C and E45C) to form hexamers by way of oxidative crosslinking. Two other mutations were also made at the oligomerization interface (W184A and M185A) to reduce aggregation. The "templated" protein is a fusion of a W184A/M185A mutant of p24, a two residue linker, the full length CcmK4 protein, and remnants of an affinity tag. CcmK4 is a protein that readily hexamerizes and has an accessible N terminus, so it was used as an alternative to crosslinking. It is nearly the same size as the p24 monomer, but it does NOT show up anywhere in the electron density.

Now that we know what we're working with, let's get started in Chimera and start looking at these models!

Introduction to Chimera documentation

Overview of pulldown menus
  • File: Opens, saves, and closes files
  • Select: Various methods for selecting things
  • Actions: Modulates display options
  • Presets: Makes nice displays for publication
  • Tools: Modules for doing cool stuff
  • Favorites: Your favorite stuff
  • Help: Various ways to get help

Go ahead and play around with Chimera for the next fifteen minutes to get acclimated to the interface. We'll start with the low resolution "templated" structure.

1) PDB Code 3GV2 (File -> Fetch by ID)
2) Move things around with the mouse (e.g., rotate, translate, zoom)
3) Change the display style using pulldown menus (e.g., ribbons, atom representations)
4) Use presets to change background
5) Save an image
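
The same five steps can also be scripted, which is a preview of what we will do in the afternoon. The sketch below builds a list of Chimera commands and, when run from inside Chimera (where the chimera module is importable), sends each one through runCommand; in plain Python it just prints them. The function names, the 90 degree turn, and the output file name are arbitrary choices for illustration.

```python
try:
    from chimera import runCommand  # only importable inside UCSF Chimera
except ImportError:
    runCommand = None  # plain Python: fall back to printing the commands

def warmup_commands(pdb_id, image_path):
    """Chimera commands mirroring steps 1-5 above."""
    return [
        "open %s" % pdb_id,            # 1) fetch by PDB ID
        "turn y 90",                   # 2) rotate 90 degrees about the y axis
        "~display",                    # 3) hide atoms...
        "ribbon",                      #    ...and show ribbons instead
        "preset apply publication 1",  # 4) publication preset (changes background)
        "copy file %s" % image_path,   # 5) save an image
    ]

def run_all(commands):
    """Execute commands in Chimera, or print them when Chimera is absent."""
    for command in commands:
        if runCommand is None:
            print(command)
        else:
            runCommand(command)

run_all(warmup_commands("3GV2", "3GV2_ribbon.png"))
```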


Basic Use
Let's go through some of the basic functionality of Chimera together now.
You can use the command line (Tools->General Controls->Command Line) to make your life easier. The quick reference guide of Chimera commands can be found here:
Quick Reference Guide
Arguably the most useful command is select, which can be abbreviated as sel. If I wanted to select residues 60-120 from model 0, chains A and B, I would type:
sel #0 :60-120.A-B
Another useful command is the "focus" command. With no arguments, it will zoom out and show you all available structures. The combination of "focus sel" is especially powerful, as it will automatically reposition the viewing window to view whatever you currently have selected.
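
When you need many selections, it can be easier to generate the atom specification strings than to type them. The helper below is a hypothetical convenience function, not part of Chimera; it writes out each chain explicitly (for example #0:60-120.A,60-120.B) rather than relying on a chain range.

```python
def atom_spec(model, start, end, chains):
    """Build a Chimera atom spec for a residue range in one or more chains."""
    parts = ["%d-%d.%s" % (start, end, chain) for chain in chains]
    return "#%d:%s" % (model, ",".join(parts))

def select_and_focus(model, start, end, chains):
    """Commands that select a residue range and center the view on it."""
    return ["sel " + atom_spec(model, start, end, chains), "focus sel"]

for command in select_and_focus(0, 60, 120, "AB"):
    print(command)
```

Inside Chimera, each of these strings could be passed to chimera.runCommand instead of being printed.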

Displaying Sequences
Another very useful control can be found in Tools->Sequence->Sequence, which displays the sequence for chains of our choosing. The selection is synchronized with your current selection in Chimera, which is very useful. What portion of the residues shown in the sequence actually show up in the model? What is going on with the rest of the residues, and what do you think a red box around a residue means?

Orientation of NTD/CTD
The structure is described as "two concentric circles" of the NTDs and the CTDs, which some of their figures show quite well. We can reproduce this representation in Chimera by assigning different colors to the NTD and CTD.

Exercise: Since the paper does not explicitly state where the NTD ends and the CTD begins, we'll use residue 144 as the start of the CTD. Reproduce the following figure:
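
If you get stuck, one possible approach is sketched below as a list of Chimera commands built in Python. Several things here are assumptions: the colors are arbitrary, the model is assumed to be #0, and the first and last residue numbers are placeholders that you should check against the Sequence viewer.

```python
def domain_color_commands(model, first, ctd_start, last,
                          ntd_color="orange", ctd_color="blue"):
    """Commands that show ribbons and color the NTD and CTD differently."""
    return [
        "~display #%d" % model,  # hide atoms
        "ribbon #%d" % model,    # show ribbons
        "color %s #%d:%d-%d" % (ntd_color, model, first, ctd_start - 1),
        "color %s #%d:%d-%d" % (ctd_color, model, ctd_start, last),
    ]

# 144 is the CTD start chosen above; 1 and 231 are assumed placeholders
for command in domain_color_commands(0, 1, 144, 231):
    print(command)
```

Paste the printed commands into the Chimera command line, or pass them to chimera.runCommand from a script.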

Oligomerization Sites
Another very interesting finding from the Pornillos et al. paper was a hypothesis for how oligomerization occurs to form the viral capsid based on crystal packing. A representation of this is shown in Figure 2C of the Pornillos et al. paper:


This is especially interesting to us, since some of the residues found by Dahirel et al. are claimed to be on these oligomerization sites. Let's use Chimera to generate a similar model. Go to Tools -> Higher-Order Structure -> Crystal Contacts, select 3GV2, make sure "Create copies of contacting molecules" is selected, and set "Contact Distance" to 6 Angstroms. A lot of things pop up at once, so let's parse what is going on here.

The box that we can see is the unit cell. The dimensions of the unit cell can be found in Table 1, and our initial model is sitting in the middle of the C axis. We could also have predicted this by knowing the space group, as the "C" in C2 denotes that the monomer will sit on the C axis. The neighboring molecules that were created by our most recent action are known to exist by the symmetry of the crystal, so they are "really" there. Whether these contacts are physiologically relevant is another issue, but Pornillos et al. make a good argument for these sites being relevant to oligomerization. Select individual molecules using the select command (notice Active models has a lot more models available now), and delete the ones that are not in the ab plane of the crystal packing. Also delete the crystal packing diagram (likely model 1) and reproduce the image below:


We've now generated a very useful tool for looking at these oligomerization sites. Save this as a single PDB file before we play with it any further.

Now that we have a good model for the higher order structure up and running, let's analyze the location of the W184A and M185A mutations and try to understand how they impact oligomerization. Select these two residues in each of your models and color them red. Does the resulting model make sense to you? Do these intersubunit contacts occur between NTDs or CTDs?


Comparing Structures with Structural Alignments
Let's compare the 7 Angstrom 3GV2 model to the 2.7 Angstrom 3H4E model. Close your session, fetch both of the structures, and color them with two different colors. The 3H4E structure has two hexamers in the model, and we'll delete the first hexamer (chains A-F) for the sake of simplicity. Align the two structures using MatchMaker (Tools -> Structure Comparison), with 3H4E as the reference, 3GV2 as the structure to match, and "Best-aligning pair of chains between reference and match structure" selected. What differences can you see between the two structures?
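
The same comparison can be expressed as Chimera commands. This sketch assumes a fresh session, so that the first structure opened becomes model #0 and the second becomes model #1; the color names are arbitrary, and the chains are deleted one at a time to keep the atom specifications simple. The "pairing bb" option corresponds to the best-aligning pair of chains setting in the MatchMaker dialog.

```python
def comparison_commands(reference_id, match_id, chains_to_delete):
    """Commands to open two structures, trim the reference, and superimpose."""
    commands = [
        "open %s" % reference_id,  # becomes model #0 in a fresh session
        "open %s" % match_id,      # becomes model #1
        "color purple #0",
        "color cyan #1",
    ]
    # Remove the unwanted chains from the reference, one chain per command
    commands += ["delete #0:.%s" % chain for chain in chains_to_delete]
    # Reference first, then the structure to match
    commands.append("matchmaker #0 #1 pairing bb")
    return commands

for command in comparison_commands("3H4E", "3GV2", "ABCDEF"):
    print(command)
```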

Key: Teal = 3GV2, Purple = 3H4E

1) Look up the data collection and refinement statistics for the highest resolution structure (3H47). Does the structure seem like a well built and trustworthy model? If not, why?

2) Although the 3H47 structure is a monomer, the space group suggests that we may be able to observe a higher order structure through crystal contacts. Load up the structure, go to Tools -> Higher-Order Structure, and play around with "Crystal Contacts." Why is the P6 symmetry in this crystal packing useful for generating this model? Save this as a single PDB file (see the options towards the bottom of the dialog) and close the session.

3) Load your PDB model up in Chimera, and then open the 3GV2 and 3H47 structures. How well do the secondary structure components of 3GV2 line up with the (much) higher resolution 3H47 structure?

4) Open a new session, fetch 3GV2, and enter the following command:
rangecolor bfactor key 2 blue 30 red 50 yellow
This will color the structure by B factors, which relate to the average movement of the atoms at that particular point in the structure. Fetch 3H47 and align it against chain A of 3GV2. Do you see a relationship between the unbuilt regions of 3H47 and the B factors of 3GV2?

5) Fetch the 3H4E structure and delete chains A-F. Now use the "swapaa" command to mutate residues 184 and 185 back to the wild type residues. Why do you think that mutating these residues to alanine slightly decreased oligomerization?

6) Extra Credit: Repeating question 5 with 3H47 throws an error and doesn't work. Can you figure out why?
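
For question 4, it can help to look at the B factors as numbers as well as colors. The sketch below averages the B factor over the atoms of each residue by reading the fixed columns of PDB ATOM records (chain in column 22, residue number in columns 23-26, B factor in columns 61-66). The atom_line helper and the demo values are made up for illustration; with a real file you would pass its lines to mean_b_per_residue instead.

```python
from collections import defaultdict

def mean_b_per_residue(pdb_lines):
    """Average the B factor over the atoms of each (chain, residue number)."""
    values = defaultdict(list)
    for line in pdb_lines:
        if line.startswith("ATOM"):
            chain = line[21]
            resnum = int(line[22:26])
            values[(chain, resnum)].append(float(line[60:66]))
    return {key: sum(b) / len(b) for key, b in values.items()}

def atom_line(serial, name, resname, chain, resnum, b):
    """Build a fixed-column ATOM record with dummy coordinates (demo only)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name, resname, chain, resnum, 0.0, 0.0, 0.0, 1.0, b)

# Made-up records for illustration only
demo = [
    atom_line(1, "N", "PRO", "A", 1, 20.0),
    atom_line(2, "CA", "PRO", "A", 1, 30.0),
    atom_line(3, "N", "ILE", "A", 2, 50.0),
]
print(mean_b_per_residue(demo))
```

Sorting the result by value gives a quick list of the most mobile residues to compare against the unbuilt regions.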