NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Cooper GM. The Cell: A Molecular Approach. 2nd edition. Sunderland (MA): Sinauer Associates; 2000.
As in most areas of molecular biology, studies of E. coli have provided the model for subsequent investigations of transcription in eukaryotic cells. As reviewed in Chapter 3, mRNA was discovered first in E. coli. E. coli was also the first organism from which RNA polymerase was purified and studied. The basic mechanisms by which transcription is regulated were likewise elucidated by pioneering experiments in E. coli, in which regulated gene expression allows the cell to respond to variations in the environment, such as changes in the availability of nutrients. An understanding of transcription in E. coli has thus provided the foundation for studies of the far more complex mechanisms that regulate gene expression in eukaryotic cells.
The principal enzyme responsible for RNA synthesis is RNA polymerase, which catalyzes the polymerization of ribonucleoside 5′-triphosphates (NTPs) as directed by a DNA template. The synthesis of RNA is similar to that of DNA, and like DNA polymerase, RNA polymerase catalyzes the growth of RNA chains always in the 5′ to 3′ direction. Unlike DNA polymerase, however, RNA polymerase does not require a preformed primer to initiate the synthesis of RNA. Instead, transcription initiates de novo at specific sites at the beginning of genes. The initiation process is particularly important because this is the primary step at which transcription is regulated.
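As a rough illustration of template-directed synthesis, the following Python sketch copies a DNA template into RNA one nucleotide at a time. The function name and the example template are assumptions made for illustration. The template is supplied in the order the polymerase reads it (3′ to 5′), so the resulting RNA is written 5′ to 3′.

```python
# Base-pairing rules used when RNA is copied from a DNA template.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') copied from a template read 3'->5'."""
    rna = []
    for base in template_3_to_5.upper():
        # Each new nucleotide is joined to the 3' end of the growing chain.
        rna.append(DNA_TO_RNA[base])
    return "".join(rna)


# Hypothetical six-base template, chosen only for illustration.
print(transcribe("TACGGT"))  # AUGCCA
```

No primer appears anywhere in the sketch: the chain simply starts with the first nucleotide, which mirrors de novo initiation.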
E. coli RNA polymerase, like DNA polymerase, is a complex enzyme made up of multiple polypeptide chains. The intact enzyme consists of four different types of subunits, called α, β, β′, and σ (Figure 6.1). The σ subunit is relatively weakly bound and can be separated from the other subunits, yielding a core polymerase consisting of two α, one β, and one β′ subunits. The core polymerase is fully capable of catalyzing the polymerization of NTPs into RNA, indicating that σ is not required for the basic catalytic activity of the enzyme. However, the core polymerase does not bind specifically to the DNA sequences that signal the normal initiation of transcription; therefore, the σ subunit is required to identify the correct sites for transcription initiation. The selection of these sites is a critical element of transcription because synthesis of a functional RNA must start at the beginning of a gene.
Figure 6.1. E. coli RNA polymerase. The complete enzyme consists of five subunits: two α, one β, one β′, and one σ. The σ subunit is relatively weakly bound and can be dissociated from the other four subunits, which ...
The DNA sequence to which RNA polymerase binds to initiate transcription of a gene is called the promoter. The DNA sequences involved in promoter function were first identified by comparisons of the nucleotide sequences of a series of different genes isolated from E. coli. These comparisons revealed that the region upstream of the transcription initiation site contains two sets of sequences that are similar in a variety of genes. These common sequences encompass six nucleotides each, and are located approximately 10 and 35 base pairs upstream of the transcription start site (Figure 6.2). They are called the -10 and -35 elements, denoting their position relative to the transcription initiation site, which is defined as the +1 position. The sequences at the -10 and -35 positions in different promoters are not identical, but they are all similar enough to establish consensus sequences—the bases most frequently found at each position.
Figure 6.2. Sequences of E. coli promoters. E. coli promoters are characterized by two sets of sequences located 10 and 35 base pairs upstream of the transcription start site (+1). The consensus sequences shown correspond to the bases most frequently found in different ...
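A consensus sequence can be derived mechanically by taking the most frequent base at each aligned position. The Python sketch below does this for a handful of six-base elements and counts how many positions of a given promoter element differ from the consensus. The example hexamers are invented for illustration and are not real promoter data; the function names are likewise assumptions.

```python
from collections import Counter


def consensus(aligned):
    """Most frequent base at each position of equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned):
        # most_common(1) gives [(base, count)]; ties go to the base seen first.
        result.append(Counter(column).most_common(1)[0][0])
    return "".join(result)


def mismatches(sequence, reference):
    """Number of positions at which a sequence differs from a reference."""
    return sum(1 for a, b in zip(sequence, reference) if a != b)


# Hypothetical -10 elements from five promoters (illustrative only).
examples = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATAAT"]
cons = consensus(examples)
print(cons)  # TATAAT
print([mismatches(seq, cons) for seq in examples])  # [0, 1, 1, 1, 0]
```

A mismatch count of this kind is only a crude proxy for promoter strength; it treats every position as equally important, which real promoters do not.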
Several types of experimental evidence support the functional importance of the -10 and -35 promoter elements. First, genes with promoters that differ from the consensus sequences are transcribed less efficiently than genes whose promoters match the consensus sequences more closely. Second, mutations introduced in either the -35 or -10 consensus sequences have strong effects on promoter function. Third, the sites at which RNA polymerase binds to promoters have been directly identified by footprinting experiments, which are widely used to determine the sites at which proteins bind to DNA (Figure 6.3). In experiments of this type, a DNA fragment is radiolabeled at one end. The labeled DNA is incubated with the protein of interest (e.g., RNA polymerase) and then subjected to partial digestion with DNase. The principle of the method is that the regions of DNA to which the protein binds are protected from DNase digestion. These regions can therefore be identified by comparison of the digestion products of the protein-bound DNA with those resulting from identical DNase treatment of a parallel sample of DNA that was not incubated with protein. Variations of this basic method, which employ chemical reagents to modify and cleave DNA at particular nucleotides, can be used to identify the specific DNA bases that are in contact with protein. Such footprinting analysis has shown that RNA polymerase generally binds to promoters over approximately a 60-base-pair region, extending from -40 to +20 (i.e., from 40 nucleotides upstream to 20 nucleotides downstream of the transcription start site). The σ subunit binds specifically to sequences in both the -35 and -10 promoter regions, substantiating the importance of these sequences in promoter function. In addition, some E. coli promoters have a third sequence, located upstream of the -35 region, that serves as a specific binding site for the RNA polymerase α subunit.
Figure 6.3. DNA footprinting. A sample containing fragments of DNA radiolabeled at one end is divided into two, and one half of the sample is incubated with a protein that binds to a specific DNA sequence within the fragment. Both samples are then digested with DNase, ...
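The comparison at the heart of a footprinting experiment can be sketched in a few lines of Python: cut sites that appear in the protein-free sample but are missing from the protein-bound sample mark the protected region. The cut positions below are idealized, made-up values (numbered relative to the +1 start site), and the sketch assumes a single contiguous protected region.

```python
def footprint(naked_cuts, bound_cuts):
    """Return (first, last) cut position lost when protein is bound, or None."""
    protected = sorted(set(naked_cuts) - set(bound_cuts))
    if not protected:
        return None
    return protected[0], protected[-1]


# Idealized data: DNase cuts every 5 bases in naked DNA from -60 to +40.
naked = list(range(-60, 45, 5))
# With protein bound, cuts between -40 and +20 are assumed to be absent.
bound = [pos for pos in naked if pos < -40 or pos > 20]

print(footprint(naked, bound))  # (-40, 20)
```

Real digestion patterns are noisier than this: bands are weakened rather than cleanly absent, so an actual analysis would compare band intensities instead of set membership.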
In the absence of σ, RNA polymerase binds nonspecifically to DNA with low affinity. The role of σ is to direct the polymerase to promoters by binding specifically to both the -35 and -10 sequences, leading to the initiation of transcription at the beginning of a gene (Figure 6.4). The initial binding between the polymerase and a promoter is referred to as a closed-promoter complex because the DNA is not unwound. The polymerase then unwinds approximately 15 bases of DNA around the initiation site to form an open-promoter complex in which single-stranded DNA is available as a template for transcription. Transcription is initiated by the joining of two free NTPs. After addition of about the first 10 nucleotides, σ is released from the polymerase, which then leaves the promoter and moves along the template DNA to continue elongation of the growing RNA chain. As it travels, the polymerase unwinds the template DNA ahead of it and rewinds the DNA behind it, maintaining an unwound region of about 17 base pairs in the region of transcription.
Figure 6.4. Transcription by E. coli RNA polymerase. The polymerase initially binds nonspecifically to DNA and migrates along the molecule until the σ subunit binds to the -35 and -10 promoter elements, forming a closed-promoter complex. The polymerase then ...
RNA synthesis continues until the polymerase encounters a termination signal, at which point transcription stops, the RNA is released from the polymerase, and the enzyme dissociates from its DNA template. The simplest and most common type of termination signal in E. coli consists of a symmetrical inverted repeat of a GC-rich sequence followed by four or more A residues (Figure 6.5). Transcription of the GC-rich inverted repeat results in the formation of a segment of RNA that can form a stable stem-loop structure by complementary base pairing. The formation of such a self-complementary structure in the RNA disrupts its association with the DNA template and terminates transcription. Because hydrogen bonding between A and U is weaker than that between G and C, the presence of A residues downstream of the inverted repeat sequences is thought to facilitate the dissociation of the RNA from its template. Other types of transcription termination signals, in both prokaryotic and eukaryotic cells, depend on the binding of proteins that terminate transcription to specific DNA sequences, rather than on the formation of a stem-loop structure in the RNA.
Figure 6.5. Transcription termination. The termination of transcription is signaled by a GC-rich inverted repeat followed by four A residues. The inverted repeat forms a stable stem-loop structure in the RNA, causing the RNA to dissociate from the DNA template.
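The sequence features of this kind of termination signal (a GC-rich inverted repeat followed by a run of A residues) are simple enough to search for directly. The Python sketch below scans one DNA strand for an arm, a short loop, the reverse complement of the arm, and then at least four A residues. The arm length, loop length, GC threshold, and example sequence are all assumptions chosen for illustration, not values taken from any real terminator.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string made of A, C, G, and T."""
    return dna.translate(COMPLEMENT)[::-1]


def gc_fraction(dna):
    """Fraction of bases in the string that are G or C."""
    return (dna.count("G") + dna.count("C")) / len(dna)


def looks_like_terminator(dna, arm=6, loop=4, min_gc=0.6, min_a=4):
    """True if the strand has a GC-rich inverted repeat followed by a run of A."""
    dna = dna.upper()
    span = 2 * arm + loop  # left arm + loop + right arm
    for start in range(len(dna) - span - min_a + 1):
        left = dna[start:start + arm]
        right = dna[start + arm + loop:start + span]
        tail = dna[start + span:start + span + min_a]
        if (right == reverse_complement(left)
                and gc_fraction(left) >= min_gc
                and tail == "A" * min_a):
            return True
    return False


# Hypothetical sequence: arm GCCGCC, loop TTTT, arm GGCGGC, then AAAA.
example = "TTGCCGCCTTTTGGCGGCAAAATT"
print(looks_like_terminator(example))                     # True
print(looks_like_terminator("ATATATATATATATATATATATAT"))  # False
```

Because the two arms are reverse complements, the RNA copied from them can pair with itself, which is the stem-loop described in the text. The fixed arm and loop lengths are a simplification; a more realistic search would allow a range of lengths and imperfect pairing.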
The pioneering studies of gene regulation in E. coli were carried out by François Jacob and Jacques Monod in the 1950s. These investigators and their colleagues analyzed the expression of enzymes involved in the metabolism of lactose, which can be used as a source of carbon and energy via cleavage to glucose and galactose (Figure 6.6). The enzyme that catalyzes the cleavage of lactose (β-galactosidase) and other enzymes involved in lactose metabolism are expressed only when lactose is available for use by the bacteria. Otherwise, the cell is able to economize by not investing energy in the synthesis of unnecessary RNAs and proteins. Thus, lactose induces the synthesis of enzymes involved in its own metabolism. In addition to requiring β-galactosidase, lactose metabolism involves the products of two other closely linked genes: lactose permease, which transports lactose into the cell, and a transacetylase, whose function in lactose metabolism is still unknown. On the basis of purely genetic experiments, Jacob and Monod deduced the mechanism by which the expression of these genes was regulated, thereby formulating a model that remains fundamental to our understanding of transcriptional regulation.
Figure 6.6. Metabolism of lactose. β-galactosidase catalyzes the hydrolysis of lactose to glucose and galactose.
The starting point in this analysis was the isolation of mutants that were defective in regulation of the genes involved in lactose utilization. These mutants were of two types: constitutive mutants, which expressed all three genes even when lactose was not available, and noninducible mutants, which failed to express the genes even in the presence of lactose. Genetic mapping localized these regulatory mutants to two distinct loci, called o and i, with o located immediately upstream of the structural gene for β-galactosidase. Mutations affecting o resulted in constitutive expression; mutants of i were either constitutive or noninducible.
The function of these regulatory genes was probed by experiments in which two strains of bacteria were mated, resulting in diploid cells containing genes derived from both parents (Figure 6.7). Analysis of gene expression in such diploid bacteria provided critical insights by defining which alleles of these regulatory genes are dominant and which recessive. For example, when bacteria containing a normal i gene (i⁺) were mated with bacteria carrying an i gene mutation resulting in constitutive expression (an i⁻ mutation), the resulting diploid bacteria displayed normal inducibility; therefore, the normal i⁺ gene was dominant over the i⁻ mutant. In contrast, matings between normal bacteria and bacteria with an oᶜ mutation (constitutive expression) yielded diploids with the constitutive expression phenotype, indicating that oᶜ is dominant over o⁺. Additional experiments in which mutations in o and i were combined with different mutations in the structural genes showed that o affects the expression of only the genes to which it is physically linked, whereas i affects the expression of genes on both chromosome copies in diploid bacteria. Thus, in an oᶜ/o⁺ cell, only the structural genes that are linked to oᶜ are constitutively expressed. In contrast, in an i⁺/i⁻ cell, structural genes on both chromosomes are regulated normally. These results led to the conclusion that o represents a region of DNA that controls the transcription of adjacent genes, whereas the i gene encodes a regulatory factor (e.g., a protein) that can diffuse throughout the cell and control genes on both chromosomes.
Figure 6.7. Regulation of β-galactosidase in diploid E. coli. The mating of two bacterial strains results in diploid cells that contain genes from both parents. In these examples, it is assumed that the genes encoding β-galactosidase (the z genes) ...
The model of gene regulation developed on the basis of these experiments is illustrated in Figure 6.8. The genes encoding β-galactosidase, permease, and transacetylase are expressed as a single unit, called an operon. Transcription of the operon is controlled by o (the operator), which is adjacent to the transcription initiation site. The i gene encodes a protein that regulates transcription by binding to the operator. Since i⁻ mutants (which result in constitutive gene expression) are recessive, it was concluded that these mutants failed to make a functional gene product. This result implies that the normal i gene product is a repressor, which blocks transcription when bound to o. The addition of lactose leads to induction of the operon because lactose binds to the repressor, thereby preventing it from binding to the operator DNA. In noninducible i mutants (which are dominant over i⁺), the repressor fails to bind lactose, so expression of the operon cannot be induced.
Figure 6.8. Negative control of the lac operon. The i gene encodes a repressor which, in the absence of lactose (top), binds to the operator (o) and blocks transcription of the three structural genes (z, β-galactosidase; y, permease; and a, transacetylase). ...
The model neatly fits the results of the genetic experiments from which it was derived. In i⁻ cells, the repressor is not made, so the lac operon is constitutively expressed. Diploid i⁺/i⁻ cells are normally inducible, since functional repressor is encoded by the i⁺ allele. Finally, in oᶜ mutants a functional operator has been lost and repressor cannot be bound. Consequently, oᶜ mutants are dominant but affect the expression only of linked structural genes.
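The logic of the repressor-operator model is compact enough to express as a small Python sketch that predicts whether β-galactosidase is made for a given haploid or diploid genotype. The class name, function name, and single-character allele labels are assumptions made for illustration. In particular, "s" is used here as a shorthand for the noninducible i mutation (a repressor that cannot bind lactose), and glucose effects are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LacCopy:
    """One copy of the lac region: alleles of i, o, and z."""
    i: str = "+"  # "+" normal, "-" no repressor, "s" noninducible repressor
    o: str = "+"  # "+" normal operator, "c" operator that cannot bind repressor
    z: str = "+"  # "+" functional beta-galactosidase gene, "-" defective


def beta_gal_made(copies, lactose):
    """Predict beta-galactosidase synthesis for one or two lac copies."""
    # The repressor diffuses through the cell, so pool it across copies.
    normal_repressor = any(c.i == "+" for c in copies)
    noninducible_repressor = any(c.i == "s" for c in copies)
    repressor_active = noninducible_repressor or (normal_repressor and not lactose)
    # The operator controls only the genes on its own DNA molecule.
    for c in copies:
        blocked = repressor_active and c.o == "+"
        if c.z == "+" and not blocked:
            return True
    return False


wild_type = [LacCopy()]
print(beta_gal_made(wild_type, lactose=False))  # False: repressed
print(beta_gal_made(wild_type, lactose=True))   # True: induced
# The i- mutation is recessive: the diploid is still repressed without lactose.
print(beta_gal_made([LacCopy(i="+"), LacCopy(i="-")], lactose=False))  # False
# The oc mutation acts only on linked genes.
print(beta_gal_made([LacCopy(o="c", z="-"), LacCopy()], lactose=False))  # False
print(beta_gal_made([LacCopy(o="c"), LacCopy(z="-")], lactose=False))    # True
```

Pooling the repressor across copies while checking the operator copy by copy is what reproduces the different dominance behavior of the i and o mutations.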
Confirmation of this basic model has since come from a variety of experiments, including Walter Gilbert's isolation, in the 1960s, of the lac repressor and analysis of its binding to operator DNA. Molecular analysis has defined the operator as approximately 30 base pairs of DNA, starting a few bases before the transcription initiation site. Footprinting analysis has identified this region as the site to which the repressor binds, blocking transcription. As predicted, lactose binds to the repressor, which then no longer binds to operator DNA. Also as predicted, oᶜ mutations alter sequences within the operator, thereby preventing repressor binding and resulting in constitutive gene expression.
The central principle of gene regulation exemplified by the lactose operon is that control of transcription is mediated by the interaction of regulatory proteins with specific DNA sequences. This general mode of regulation is broadly applicable to both prokaryotic and eukaryotic cells. Regulatory sequences like the operator are called cis-acting control elements, because they affect the expression of only linked genes on the same DNA molecule. On the other hand, proteins like the repressor are called trans-acting factors because they can affect the expression of genes located on other chromosomes within the cell. The lac operon is an example of negative control because binding of the repressor blocks transcription. This, however, is not always the case; many trans-acting factors are activators rather than inhibitors of transcription.
The best-studied example of positive control in E. coli is the effect of glucose on the expression of genes that encode enzymes involved in the breakdown (catabolism) of other sugars (including lactose) that provide alternative sources of carbon and energy. Glucose is preferentially utilized, so as long as glucose is available, enzymes involved in catabolism of alternative energy sources are not expressed. For example, if E. coli are grown in medium containing both glucose and lactose, the lac operon is not induced and only glucose is used by the bacteria. Thus, glucose represses the lac operon even in the presence of the normal inducer (lactose).
Glucose repression (generally called catabolite repression) is now known to be mediated by a positive control system, which is coupled to levels of cyclic AMP (cAMP) (Figure 6.9). In bacteria, the enzyme adenylyl cyclase, which converts ATP to cAMP, is regulated such that levels of cAMP increase when glucose levels drop. cAMP then binds to a transcriptional regulatory protein called catabolite activator protein (CAP). The binding of cAMP stimulates the binding of CAP to its target DNA sequences, which in the lac operon are located approximately 60 bases upstream of the transcription start site. CAP then interacts with the α subunit of RNA polymerase, facilitating the binding of polymerase to the promoter and activating transcription.
Figure 6.9. Positive control of the lac operon by glucose. Low levels of glucose activate adenylyl cyclase, which converts ATP to cyclic AMP (cAMP). Cyclic AMP then binds to the catabolite activator protein (CAP) and stimulates its binding to regulatory sequences ...
Both the positive and negative control mechanisms that we have discussed act at the level of initiation of transcription. An additional mechanism, transcriptional attenuation, regulates the expression of some genes by controlling the ability of RNA polymerase to continue elongation past specific sites. This mode of regulation has been described best in the E. coli trp operon, which encodes five enzymes involved in biosynthesis of the amino acid tryptophan. These genes are expressed only when tryptophan is not available to the cell in its environment, since otherwise the synthesis of additional tryptophan is unnecessary.
The trp operon is regulated in part by a repressor that, when bound to tryptophan, blocks transcription (Figure 6.10). However, transcriptional attenuation provides an additional level of control that results in more stringent regulation than could be achieved by repression of initiation alone. The site of attenuation is located 162 nucleotides downstream of the transcription start site. If tryptophan is abundant, most transcription terminates at this site; only if tryptophan is scarce does transcription continue to yield functional Trp mRNA.
Figure 6.10. Regulation of the tryptophan operon. The operon contains five structural genes involved in the biosynthesis of tryptophan: trpE, D, C, B, and A. Expression of these genes is controlled at two levels. The trpR gene encodes a repressor that, in the presence ...
The mechanism of attenuation depends on the fact that translation in bacteria is coupled with transcription, so ribosomes begin translating the 5′ end of an mRNA while it is still being synthesized. Thus, the rate of translation can affect the structure of the growing RNA chain, which in turn determines whether further transcription can continue. Transcription termination is signaled by a stem-loop structure that forms by complementary base pairing between two specific sequences of the growing Trp mRNA chain (Figure 6.11). This structure forms if translation of the growing chain is proceeding at a normal rate, as it does when tryptophan is present in adequate supply. If tryptophan is scarce, however, protein synthesis stalls at a critical region of the message. If this occurs, the ribosomes bound to the mRNA block formation of the transcription-terminating stem loop, allowing Trp mRNA synthesis to continue.
Figure 6.11. Mechanism of transcriptional attenuation. The trp mRNA is translated while still being synthesized. In the presence of high levels of tryptophan, the ribosomes proceed along the message slightly behind the site of transcription. Under these conditions, ...
The critical region of Trp mRNA contains two adjacent tryptophan codons, so the rate of translation is highly dependent on tryptophan levels; this is the link between transcriptional attenuation and the availability of tryptophan. If tryptophan levels in the cell are low, the ribosome stalls at this point and transcription of Trp mRNA continues. If tryptophan is abundant, translation continues and transcription is terminated.
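The chain of cause and effect in attenuation (tryptophan supply, ribosome stalling, stem-loop formation, termination) can be written out as a short Python sketch. The leader codons below are a made-up sequence containing two adjacent tryptophan codons (UGG) and are not the actual trp leader; the function names are also assumptions.

```python
TRP_CODON = "UGG"  # the codon that specifies tryptophan


def ribosome_stalls(leader_codons, tryptophan_available):
    """The ribosome stalls at a tryptophan codon when tryptophan is scarce."""
    return (not tryptophan_available) and TRP_CODON in leader_codons


def attenuation_outcome(leader_codons, tryptophan_available):
    """Describe what happens to a transcript that has already initiated."""
    if ribosome_stalls(leader_codons, tryptophan_available):
        # The stalled ribosome prevents the terminating stem-loop from forming.
        return "transcription continues"
    # Translation keeps pace, the stem-loop forms, and the polymerase stops.
    return "transcription terminates at the attenuator"


# Hypothetical leader with two adjacent tryptophan codons.
leader = ["AUG", "AAA", "GCA", "UGG", "UGG", "CGC"]
print(attenuation_outcome(leader, tryptophan_available=True))
print(attenuation_outcome(leader, tryptophan_available=False))
```

The sketch covers only attenuation; repression of initiation by the tryptophan-bound repressor is a separate layer of control that acts before this step.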