The enzyme RNA polymerase performs the delicate task of unwinding the two strands of DNA and transcribing the genetic information into a strand of RNA. But how does it know where to start? Our cells contain 30,000 genes encoded in billions of nucleotides. For each gene, the cell must be able to start transcription at the right place and at the right time.
Specialized DNA sequences next to genes, called promoters, define the proper start site and direction for transcription. Promoters vary in sequence and location from organism to organism. In bacteria, typical promoters contain two regions that interact with the sigma subunit of their RNA polymerase. The sigma subunit binds to these DNA sequences, assists the start of transcription, and then detaches from the polymerase as it continues transcription through the gene. Our cells have a far more complex promoter system, using dozens of different proteins to ensure that the proper RNA polymerase is targeted to each gene. The TATA-binding protein is the central element of this system.
The TATA Box
Our protein-coding genes have a characteristic sequence of nucleotides, termed the TATA box, in front of the start site of transcription. The typical sequence is something like T-A-T-A-a/t-A-a/t, where a/t refers to positions that can be either A or T. Surprisingly, many variations on this theme also work, and one of the challenges in the study of transcription is discovering why some sequences work and others don't. The TATA-binding protein (sometimes referred to as TBP) recognizes this TATA sequence and binds to it, creating a landmark that marks the start site of transcription. When the first structures of TATA-binding protein were determined, researchers discovered that it is not gentle when it binds to DNA. Instead, it grabs the TATA sequence and bends it sharply, as seen in PDB entries such as 1ytb.
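To make the consensus pattern concrete, here is a minimal Python sketch that scans a DNA string for the strict T-A-T-A-a/t-A-a/t motif quoted above. The function name `find_tata_boxes` and the example sequences are hypothetical, chosen only for illustration. Because many variant sequences also work in real promoters, a strict match like this is at best a rough first-pass filter, not a model of what TATA-binding protein actually recognizes. It also looks at one strand only.

```python
import re

# Strict consensus from the text: T-A-T-A-(A or T)-A-(A or T).
# The lookahead lets overlapping matches be reported as well.
TATA_CONSENSUS = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_boxes(sequence):
    """Return (start_index, matched_text) for each consensus match.

    Indices are 0-based and refer to the strand as given; the
    complementary strand is not searched.
    """
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in TATA_CONSENSUS.finditer(seq)]


if __name__ == "__main__":
    # Made-up sequence, not taken from any real promoter.
    print(find_tata_boxes("ggcTATAAAAggc"))
```

A sequence with no match simply gives an empty list, which says nothing about whether a weaker, non-consensus TATA box is present.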
TATA-binding protein works as part of a larger transcription factor, TFIID, that starts the process of transcription. After it binds to the promoter, it recruits additional transcription factors. TFIIB, shown at the top here from PDB entry 1vol, binds next. Then a string of other transcription factors bind, constructing a large protein complex that decides whether or not to start transcription. These may include transcription activators, such as TFIIA shown in the middle from PDB entry 1ytf, that promote the start of transcription. Other factors inhibit the start of transcription, such as the transcription regulator NC2 (negative cofactor 2), shown at the bottom from PDB entry 1jfi. In all of these pictures, TATA-binding protein is shown in blue, a small piece of DNA is shown in red and the transcription factor is shown in green.
TATA-binding protein uses two types of interactions to recognize and hold the TATA sequence, as seen in this structure from PDB entry 1ytb. First, as shown at the top, it has a string of lysine and arginine amino acids (colored dark blue) that interact with the phosphate groups of the DNA (colored bright yellow and red). This glues the protein to the DNA. Second, the protein uses specially-placed amino acids to interact with DNA bases. As shown in the lower picture, four phenylalanine amino acids jam into the DNA minor groove and form the kinks that bend the DNA. There are also two symmetrical asparagine amino acids that form hydrogen bonds at the very center. The combination of the unusual flexibility of TATA DNA sequences and these specific hydrogen bonds allows TATA-binding protein to recognize the proper sequence.
As you look at these structures yourself, notice that TATA-binding protein, even though it is a single protein chain, is composed of two symmetrical halves. This symmetry is easily seen in the two pairs of phenylalanines and the two asparagines shown in the lower figure. It is thought that an ancient gene duplication created this protein by combining two copies of the same gene. For more information on TATA-binding protein from a genomics perspective, visit the Protein of the Month at the European Bioinformatics Institute.
These pictures were created with RasMol. You can create similar pictures by clicking on the accession codes here and picking one of the options for 3D viewing. The phenylalanines shown above are numbers 99, 116, 190, and 207, and the asparagines are numbers 69 and 159.
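If you would rather pull these residues out of a coordinate file yourself, the following Python sketch shows one way to do it. It assumes a text file in the standard fixed-column PDB format and assumes the file numbers its residues the same way as the text above; both are worth checking against the entry you download. The names `RESIDUES_OF_INTEREST` and `pick_residue_atoms` are hypothetical. As a simplification, chains are not distinguished, so atoms from different chains that share a residue name and number are pooled together.

```python
# Residue numbers quoted in the text. Whether a particular file uses the
# same numbering is an assumption that should be checked by hand.
RESIDUES_OF_INTEREST = {
    "PHE": {99, 116, 190, 207},
    "ASN": {69, 159},
}


def pick_residue_atoms(pdb_lines, wanted=RESIDUES_OF_INTEREST):
    """Collect atom names from ATOM records of the wanted residues.

    Returns a dict mapping (residue_name, residue_number) to a list of
    atom names, in the order they appear in the input.
    """
    picked = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, REMARK, and other record types
        res_name = line[17:20].strip()  # PDB columns 18-20
        try:
            res_num = int(line[22:26])  # PDB columns 23-26
        except ValueError:
            continue  # short or malformed line
        if res_num in wanted.get(res_name, ()):
            atom_name = line[12:16].strip()  # PDB columns 13-16
            picked.setdefault((res_name, res_num), []).append(atom_name)
    return picked
```

With a downloaded file, something like `pick_residue_atoms(open("1ytb.pdb"))` would be the typical call, where the filename is only an example of what a local copy might be called.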