Pyrobaculum: From Gene to Protein

By Michael Forstner

On this webpage I will try to explain briefly which steps to take to get from an identified ORF (i.e., open reading frame) in the Pyrobaculum aerophilum genome to an expressed protein suitable for crystallization studies. As there are too many methods available for doing this, I will not go into the details of any single method, but give an outline of what to do and where to find additional information. For performing the individual steps, please refer to the instruction booklets that come with the reagent kits that you want to use, or to one of the many general texts on basic molecular biology techniques.

Step 1: Identify the ORF and find flanking sequences

An open reading frame is characterized by the presence of a start codon (most likely ATG, but GTG is also very common in Pyrobaculum) at its 5'-end and a stop codon at its 3'-end. Each codon in between codes for one of the 20 amino acids. Once you have identified an ORF of interest in the ORF list, go to the DNA sequence of the Pyrobaculum genome and find the flanking sequences. Depending on how you want to continue, you now have several choices for designing your primers. Let's assume for this short tutorial that you are going to use an expression vector with an N-terminal His6-tag that allows for direct in-frame cloning. The one important thing to keep in mind is that the PCR product you amplify must, after cloning into the expression vector, be in the right reading frame. Thus, if you have a His6-tag, you have to make sure that the reading frame runs on without interruption from the tag through the cloning site into your ORF. Please refer to the specific instructions by the manufacturer of the expression vector for details on the nature of the cloning site.
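As a rough illustration of the ORF definition above, the following minimal Python sketch scans the three forward reading frames of a DNA string for stretches that begin with ATG or GTG and end at the first in-frame stop codon. The function name find_orfs and the min_codons parameter are made up for this example; the sketch ignores the reverse strand and is no substitute for the annotated ORF list.

```python
# Start codons named in the text: ATG, and GTG (common in Pyrobaculum).
START_CODONS = {"ATG", "GTG"}
# The three standard stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) tuples for ORFs on the forward strand.

    start and end are 0-based and end-exclusive; the stop codon is included.
    For each stop codon only the most upstream start codon in that frame
    is reported. min_codons is a hypothetical length filter.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return orfs
```

To cover the opposite strand as well, the same function could be run on the reverse complement of the sequence.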

As we have two DNA libraries available that contain random inserts of DNA pieces of 1-3 and 3-10 kb size, respectively, you should be able to amplify virtually every target DNA that you are interested in. If, for any reason, you don't obtain a PCR product within a reasonable number of trials, chances are that you have bad luck and that your gene of interest is not represented in the library as one piece. Under these circumstances you have the option to isolate whole genomic DNA from Pyrobaculum (you will get a cell pellet from Ken Goodwill) and do a PCR on that, or you can order a clone containing the ORF from Sorel Fitz-Gibbon. For the latter option, you have to provide the ORF number and there is a web-form available for that purpose from the PA genome homepage (password needed).

Step 2: Design PCR primers and amplify region of interest

Your PCR primers have to satisfy several demands: they should be unique sequences within the genome, the two primers should have approximately the same percentage of G and C nucleotides, and that percentage should be the same as or higher than the G/C percentage of the DNA fragment to be amplified. The reason for the first demand is pretty obvious: you do not want to amplify unspecific gene fragments. The reason for the other two is that the G/C content influences the "melting" (i.e., strand separation) temperature of the DNA, with a high G/C content increasing that temperature. You want a similar melting temperature for both primers to make sure that both anneal to the target DNA at the same temperature. This temperature should be higher than the self-annealing temperature of the target DNA, so that the target DNA binds the primers rather than annealing to itself. The annealing temperature of a PCR primer can be easily calculated as shown in Appendix 1.
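The three demands on the primers can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function names are invented for the example, the 5-percentage-point tolerance for "approximately the same" G/C content is an arbitrary placeholder, and uniqueness is tested by exact string matching only (near-matches that could still prime are not detected).

```python
def gc_percent(seq):
    """Percentage of G and C nucleotides in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_binding_sites(genome, primer):
    """Count exact matches of the primer on either strand of the genome."""
    genome = genome.upper()
    total = 0
    # A set avoids counting a palindromic primer twice.
    for query in {primer.upper(), reverse_complement(primer)}:
        pos = genome.find(query)
        while pos != -1:
            total += 1
            pos = genome.find(query, pos + 1)
    return total


def check_primer_pair(forward, reverse, target, max_gc_difference=5.0):
    """Compare G/C content of two primers with each other and with the target.

    max_gc_difference (percentage points) is an assumed tolerance,
    not a value taken from the tutorial.
    """
    gc_f, gc_r, gc_t = gc_percent(forward), gc_percent(reverse), gc_percent(target)
    return {
        "gc_forward": gc_f,
        "gc_reverse": gc_r,
        "gc_target": gc_t,
        "similar_gc": abs(gc_f - gc_r) <= max_gc_difference,
        "gc_not_below_target": min(gc_f, gc_r) >= gc_t,
    }
```

A primer that is unique in the genome should give count_binding_sites(genome, primer) == 1.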

Once you have the primers synthesized, you will need deoxy-nucleotides and a thermostable DNA polymerase. The usual Taq-polymerase is not particularly suitable for that job as it (1) does not have proofreading capability and introduces a lot of random mutations, and (2) adds a single A-overhang to the 3'-ends of the amplified piece of DNA, thus enabling what's called TA-cloning, but that is not what we want to do here. Better use the DNA polymerase from Pyrococcus furiosus, commercially available as Pfu-polymerase, or another of the many proofreading thermostable DNA polymerases that are around.

A typical PCR reaction consists of 20-25 cycles of a temperature program involving DNA strand separation at 95°C, annealing of the primers at the specific annealing temperature (see above; usually in the range of 40-60°C), and a synthesis ("extension") step at the temperature optimum for the polymerase (e.g., 72°C for Taq). After the last cycle, a final extension step of 10-15 minutes is sometimes added. Once the reaction is over, either cool the reaction tube to 4°C if you want to keep the mix for a while, or go forward immediately: purify the amplified product and run a gel to determine purity and size.
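The temperature program described above can be written down as a small data structure before it is typed into the thermocycler. The sketch below is purely illustrative: the class name PcrProgram and all step durations are assumptions (the text gives temperatures, cycle numbers and the final extension time, but no per-step times), and the time estimate ignores the ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass
class PcrProgram:
    cycles: int
    anneal_c: float            # primer-specific annealing temperature
    denature_s: int            # step durations are user-supplied assumptions
    anneal_s: int
    extend_s: int
    denature_c: float = 95.0   # strand separation temperature from the text
    extend_c: float = 72.0     # polymerase optimum, e.g. 72 C for Taq
    final_extension_s: int = 0
    hold_c: float = 4.0        # storage temperature after the run

    def warnings(self):
        """List settings that fall outside the typical ranges in the text."""
        notes = []
        if not 20 <= self.cycles <= 25:
            notes.append("cycle number outside the typical 20-25")
        if not 40.0 <= self.anneal_c <= 60.0:
            notes.append("annealing temperature outside the usual 40-60 C")
        return notes

    def programmed_minutes(self):
        """Sum of the programmed step times, without ramping."""
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return (self.cycles * per_cycle + self.final_extension_s) / 60.0
```

Writing the program down this way mainly serves as a checklist; the actual step times should come from the instructions of the polymerase supplier.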

Step 3: Subclone amplified DNA in expression plasmid and determine sequence

If you see your amplified DNA as a product of the desired size on an agarose-gel after the PCR reaction is done, you have mastered one important step on the way to your recombinant protein. Now you have to purify the DNA (i.e., remove the excess primers and the polymerase) and subclone the product in the expression vector.

For that purpose, you might have restriction sites at the ends of the primers (again: refer to the instructions for your vector) that allow for cloning in the right reading frame. If you made no mistake in the design of your primers, you will now be able to cut your piece of DNA at those restriction sites. This produces specific ends that let you clone your product into the vector (the vector obviously has to have the same or compatible restriction sites) by ligation using DNA ligase. Alternatively, you sometimes will have to clone your PCR product directly, without restriction enzyme digestion. For this purpose you will need a vector with blunt ends, and the ligation efficiency will be lower. Furthermore, you will end up with a number of clones that have the insert the wrong way round. Therefore, sequencing is an important step!

You should determine the sequence of your cloned DNA insert in any case to make sure you have the right gene in the right orientation and there are no random mutations that might yield a mutant protein that you wouldn't want to have at this point. Having the sequence determined is a relatively straightforward step and very fast, compared to what is ahead.
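Once the sequence comes back, the comparison with the expected ORF can be sketched as follows. The example assumes that the read has already been trimmed to exactly the expected insert and has the same length; real sequencing reads contain vector sequence and need a proper alignment program. The function names are made up for this illustration.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def compare_insert(expected, read):
    """Return (orientation, mismatches) for a trimmed sequencing read.

    orientation is "forward" or "reverse", whichever fits better;
    mismatches is a list of (1-based position, expected base, observed base).
    Assumes both sequences have the same length (no insertions/deletions).
    """
    expected, read = expected.upper(), read.upper()
    if len(expected) != len(read):
        raise ValueError("trim the read to the expected insert length first")
    candidates = {"forward": read, "reverse": reverse_complement(read)}
    best = None
    for orientation, seq in candidates.items():
        mismatches = [
            (i + 1, e, s)
            for i, (e, s) in enumerate(zip(expected, seq))
            if e != s
        ]
        if best is None or len(mismatches) < len(best[1]):
            best = (orientation, mismatches)
    return best
```

An empty mismatch list in the "forward" orientation is what you hope to see; a "reverse" result means the insert went in the wrong way round.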

Step 4: Express gene-product as recombinant protein in E.coli

Well, alright. What else would you want to do now?? This step can be difficult or not and nobody can tell you beforehand whether it is going to work or not. In general I would recommend the following basic strategy:

(i) Freshly transform an expression strain of E.coli (e.g. BL21 and its derivatives) with the expression plasmid and grow colonies on an LB-agar plate with a suitable antibiotic. Don't forget the antibiotic; if you do (and you will; I promise): repeat that step!

(ii) Select one colony and use it to inoculate 5-10 ml of LB (and don't forget the antibiotic!!). Grow an overnight culture on a rotary shaker at 37°C at 225 rpm (or so; the bugs don't care, as long as they are well agitated).

(iii) Use this overnight culture to inoculate your expression medium. Either use LB (or a similar rich medium) or a synthetic minimal medium (e.g., if you want to express a Se-Met protein). If you use LB as an expression medium, be aware that the formulation contains small amounts of lactose and that you will induce low-level expression of your recombinant protein. If your protein turns out to interfere with the primary metabolism of E.coli, the cells might not survive. Such things happen for example when you try to overexpress regulatory proteins of primary metabolism, such as phosphofructokinase.

(iv) Grow the cells to an OD600 of approximately 0.5-0.7 at 37°C at 225 rpm. This will ensure your bacteria are in the logarithmic growth phase once you induce expression. By the way - I hope you did not forget to put a suitable antibiotic in the expression medium!

(v) Now it's time to induce expression. If you have chosen a system that makes use of the T7 promoter under lac control (in BL21(DE3) and similar strains, the T7 RNA polymerase gene sits behind the lacUV5 promoter), you will induce expression by the addition of IPTG to whatever concentration the suppliers of the expression system recommend. Continue expression at the optimal temperature. How to determine that is described in Appendix 2.

(vi) At some point you will have to stop expression. Usually, killing the bacteria takes care of that. The time at which you do this depends on your expression efficiency and it is crucial that you try to find the optimum conditions. See Appendix 2 for how to do that. Assuming you have a good idea of when to stop expression, you will now spin down the cells in a suitable cooled rotor, wash them briefly with 0.9% NaCl and resuspend them in a lysis buffer. There are many different recipes for lysis buffers around. Choose one that is suitable for the method of cell disintegration you want to use. I would recommend a neutral Tris or phosphate buffer with a little bit of EDTA and some PMSF added, and disrupting the cells by sonication.

(vii) Once you have spun down the debris after sonication, you should determine whether your expressed protein is in the soluble supernatant or in inclusion bodies (i.e., in the pellet). Don't throw away either before you have determined where your protein is! Hopefully you will find it somewhere. That makes you ready for step 5.

By the way: how much recombinant protein can you expect? This is a question for which there is no universal answer. I have obtained everything from 1 mg/L expression medium up to almost 1 g/L in the case of cytosolic creatine kinase. On average, I would say you can expect something between 10 and 50 mg/L medium. If the expression level in your case is much lower (say less than 1 mg/L), you might want to consider a different expression system or check whether your protein might kill the host cells. Remember: you still have to purify the protein and then crystallize it, so you had better start out with a sufficient quantity to account for the losses along the way.
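A quick back-of-the-envelope calculation helps to decide how much culture to grow. The sketch below is plain arithmetic; the function name and the recovery_fraction parameter (the fraction of expressed protein that survives purification) are assumptions for illustration, and the recovery in particular is something you have to estimate for your own protein.

```python
def culture_volume_liters(target_mg, yield_mg_per_l, recovery_fraction):
    """Culture volume needed to end up with target_mg of pure protein.

    yield_mg_per_l:    expression level in mg per liter of medium
    recovery_fraction: assumed fraction (0-1] that survives purification
    """
    if yield_mg_per_l <= 0:
        raise ValueError("yield must be positive")
    if not 0 < recovery_fraction <= 1:
        raise ValueError("recovery fraction must be in (0, 1]")
    return target_mg / (yield_mg_per_l * recovery_fraction)
```

For example, culture_volume_liters(10, 20, 0.5) asks how many liters are needed for 10 mg of pure protein at 20 mg/L expression if half of the material is lost along the way.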

Step 5: Purify the protein

If everything worked up to this point, chances are very good that you will be able to get a purified protein. A simple first step in your purification efforts should be to exploit the thermostability of the Pyrobaculum proteins. Remember: the bug is a hyperthermophile, whereas E.coli, the expression host, is a mammalian enterobacterium that is not made to withstand temperatures close to boiling (Don't do any experiments on your personal strain of E.coli to prove me wrong, please). So: heating the protein mixture that contains your expressed, recombinant protein will denature practically all E.coli proteins. Upon cooling down that soup you will realize that heat denaturation is rarely reversible and the denatured proteins will precipitate, ready to be spun off in a centrifuge. If you haven't lost your protein in that step, you will have a rather pure protein solution already. To finally clean it, you can now make use of the His6-tag and run the protein over a Ni-column (other metal ions work as well; if you want to use the BioCAD, Zn2+ at a neutral pH has been shown to work best with the specific column material that Perseptive supplies). If that is not enough to purify the protein, you should not need more than one additional ion-exchange or size-exclusion step for a final polishing.

By the way - and this might not be very popular: how about some protein biochemistry at this point? It is usually quite nice to characterize your recombinant protein a little bit. For example, if it is an enzyme: what are the kinetic parameters and the temperature and pH optima for the reaction, can you make some assumptions about the mechanism, does it behave differently from known enzymes of this class? Try to invest some time thinking about the action of your protein. There is a tremendous number of possible experiments that you can do with minute amounts of the partially purified product and that will tell you a lot about the protein!

Step 6: Concentrate and crystallize

Chances are that once you have a pure protein, it will be rather dilute. Concentrate by whatever means you have to a suitable concentration, closely watching whether the protein starts to precipitate at a certain point. Remember that at this point you can also exchange the buffer for something more suitable for crystallization. With a bit of luck you will be able to obtain a protein solution suitable for crystallization and you can go forward and crystallize. Always remember: IF IT IS IMPORTANT, IT CRYSTALLIZES!

Step 7: Solve the structure

Please refer to Bernhard's crystallography tutorial for hints on how to do that 8)


If all the above steps worked for you, consider yourself lucky. If not: you can either go troubleshooting or choose another gene to work with. On average, expression and purification following standard protocols work without major trouble for something between 10 and 20% of proteins, and an even lower percentage yields diffraction-quality crystals in the end. Don't feel bad if you don't end up with pure protein on your first try. There are many ways of troubleshooting the individual steps, and Pyrobaculum has LOTS of ORFs!


Appendix 1: How to calculate the melting temperature of a PCR primer?

A simple formula that yields roughly correct results for a first estimate is:

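Tm = 81.5 + 16.6 * log10([Na+] / (1.0 + 0.7 * [Na+])) + 0.41 * (%GC) - 500 / size

Here Tm is the melting temperature in °C, [Na+] is the molar sodium ion concentration, %GC is the percentage of G and C nucleotides in the primer, and size is the length of the primer in nucleotides.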

This formula holds for DNA/DNA duplexes and is not valid if you want to hybridize RNA to DNA or RNA to RNA!
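The formula translates directly into a few lines of Python. This is a minimal sketch: the function name is invented, and the default sodium concentration of 0.05 M is an assumption chosen only for illustration; use the salt concentration of your own PCR buffer. Pass in only the part of the primer that anneals to the template.

```python
import math


def melting_temperature(annealing_seq, na_molar=0.05):
    """Estimate Tm (in degrees C) of a DNA/DNA duplex with the formula above.

    annealing_seq: the part of the primer that pairs with the template
    na_molar:      sodium ion concentration in mol/L (0.05 is an assumed default)
    """
    seq = annealing_seq.upper()
    size = len(seq)
    if size == 0:
        raise ValueError("empty sequence")
    percent_gc = 100.0 * (seq.count("G") + seq.count("C")) / size
    salt_term = 16.6 * math.log10(na_molar / (1.0 + 0.7 * na_molar))
    return 81.5 + salt_term + 0.41 * percent_gc - 500.0 / size
```

Running the function for both primers of a pair gives a quick check of whether their melting temperatures are close enough to anneal at a common temperature.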

A web-interface to a program allowing for the variation of salt and DNA concentration can be found here. And a simple program enabling you to calculate melting temperatures and other parameters of your oligo can be downloaded here.

A thorough description can be found in Richard Owczarzy, et al.: Predicting Sequence-Dependent Melting Stability of Short Duplex DNA Oligomers, Biopolymers 44(1998), 217-239.

Note: don't count the overhangs (the parts of the primer that do not anneal to the template) when calculating the melting temperature. (BR)

Appendix 2: How to determine the optimal conditions for bacterial expression of a recombinant protein?

To figure out the optimal conditions for the expression of your gene of interest, you will have to set up some trial expressions in small volumes. Aside from the obvious one, the choice of medium, there are two main conditions that you can vary: temperature and expression time. In terms of temperature, remember that E.coli has evolved to live in the intestine of mammals and thus functions best at or around 37°C. Normally, this is the best temperature to start your expression trials, and you should set up small-scale expressions in 5-10 ml of medium at that temperature. Don't vary the growth time of your bacteria before starting expression: grow them to an OD600 of 0.5-0.7 and induce expression. Then stop expression in the individual setups after one hour, two hours, three hours and so forth. You can do finer sampling, but 30-minute intervals are usually fine enough and 1-hour intervals are generally alright.

Then simply run samples of the individual setups on an SDS gel. If expression occurred, you will see a major protein band that is not present in the uninduced bacteria (you have thought of a control, haven't you?). This band will gradually get stronger with time until it reaches a maximum. After that it will either stay constant or start to disappear again. Obviously, the time at which you get maximum expression is what you want to go for. If you have set up many individual expression trials and realize that after five hours you still don't have maximum expression: stop there. One reason for this is that the bacteria at that time will start to metabolize the recombinant protein, and you will end up with an increasing amount of partially hydrolyzed product that will be hard to remove from the intact protein.
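If you quantify the band on the SDS gel (for example by densitometry), picking the harvest time can be sketched as below. The function name, the (hours, intensity) input format and the example numbers are all hypothetical; the five-hour cap follows the advice above for expression at 37°C and is exposed as a parameter because longer times can be needed at lower temperatures.

```python
def best_harvest_time(time_course, max_hours=5.0):
    """Pick the earliest time point with the strongest band.

    time_course: list of (hours_after_induction, band_intensity) pairs,
                 e.g. from densitometry of the SDS gel (hypothetical input)
    max_hours:   time points later than this are ignored
    """
    usable = [(t, s) for t, s in time_course if t <= max_hours]
    if not usable:
        raise ValueError("no time points within the allowed window")
    # Strongest band first; among ties, the earliest time point wins.
    best_time, _ = min(usable, key=lambda point: (-point[1], point[0]))
    return best_time
```

With made-up intensities such as [(1, 10), (2, 40), (3, 55), (4, 55), (5, 30)], the function picks the 3-hour sample, the first time point at which the band has reached its maximum.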

The other variable to optimize is temperature. Sometimes, expression at 37°C will deposit your protein in inclusion bodies. If this happens, you can either try to isolate the protein from the inclusion bodies (sometimes not a bad idea), or you might want to try to reduce the expression temperature. This can force your recombinant protein to be expressed as soluble, cytosolic protein. Again, you will have to vary expression times to optimize your expression levels, and here it can sometimes become necessary to express the protein for a longer time, e.g. overnight, to get sufficient quantities of recombinant protein.

 


This World Wide Web site conceived and maintained by Bernhard Rupp (br@llnl.gov)
Last revised November 10, 1999 12:10
UCRL-MI-125269