Overlapping Genes (Genes within Genes)

In the 1940s, Beadle and Tatum proposed the one gene-one enzyme hypothesis, often restated as one gene-one protein, which holds that each gene encodes a single protein. On this view, a gene of 1,500 base pairs would specify a protein of 500 amino acids. However, if the same sequence is read in two different ways, the same base pairs can specify two different amino acid sequences. In other words, the same DNA sequence can direct the synthesis of more than one protein. This was first realized when the total amount of protein synthesized by ØX174 was found to exceed the coding potential of the phage genome. A similar phenomenon is found in the tumour virus SV40, where the combined size of the proteins synthesized from its genes (VP1, VP2 and VP3) is greater than the coding capacity of its DNA molecule (5,200 base pairs, i.e. 1,733 codons). From these observations the concept of overlapping genes emerged.
Fig. 2.8. Genetic map of ØX174. The gene B overlaps with gene A; gene K overlaps with A and C; gene E overlaps with gene D.


Barrell et al. (1970) were the first to give evidence for this possibility, based on the overlapping genes found in bacteriophage ØX174. This virus has an icosahedral capsid with a knob at each vertex, enclosing a single-stranded circular DNA (Fig. 2.8). Sanger et al. (1977) mapped the whole nucleotide sequence of phage ØX174, G4 DNA. Barrell et al. (1976) determined the sequences of genes D, E and J, and the whole sequence of ØX174.

The ØX174 strand is made up of 5,386 nucleotides of known base sequence. If a single reading frame were used, about 1,795 amino acids could be encoded in the sequence, and with an average protein size of about 400 amino acids, only 4-5 proteins could be made.
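The arithmetic behind this estimate can be checked with a minimal sketch. The function names below are hypothetical, chosen only for illustration, and the calculation assumes every nucleotide is coding and ignores start and stop codons, as the estimate in the text does.

```python
# Single-frame coding capacity, using only the figures quoted in the text.
# Assumption: every nucleotide is coding; start/stop codons are ignored.

def codons_in_single_frame(nucleotides: int) -> int:
    """Number of complete codons when a sequence is read in one frame."""
    return nucleotides // 3


def proteins_possible(nucleotides: int, avg_protein_length: int) -> float:
    """How many average-sized proteins one reading frame could encode."""
    return codons_in_single_frame(nucleotides) / avg_protein_length


GENOME_LENGTH = 5386        # nucleotides in the phage strand (from the text)
AVG_PROTEIN_LENGTH = 400    # amino acids (from the text)

print(codons_in_single_frame(GENOME_LENGTH))                  # 1795
print(proteins_possible(GENOME_LENGTH, AVG_PROTEIN_LENGTH))   # between 4 and 5
```

The same function reproduces the other figures given earlier: 1,500 base pairs give 500 codons, and 5,200 base pairs give 1,733 codons.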
Yet ØX174 makes 11 proteins containing a total of more than 2,300 amino acids, well beyond what a single reading frame allows. Genes A and B were characterized by Weisbeek et al. (1977). The sequence of gene A is now known to contain all of gene B, which is translated in a different reading frame from gene A. Similarly, gene E is encoded within gene D. Another translational control mechanism expands the use of gene A: the 37 kDa gene A* protein is formed by reinitiating translation at an internal AUG codon within the gene A message. The two proteins are translated in the same reading frame, but their functions differ (Reinberg et al., 1983). Protein K is initiated near the end of gene A, includes the base sequence of gene B, and terminates in gene C. As an example of how a shift in frame changes the product, the sequence ...G,AAG,TTA,ACA... read in one frame encodes the amino acids lysine, leucine and threonine. Read from one nucleotide earlier, the codons become ...GAA,GTT,AAC,A..., which encode glutamic acid, valine and asparagine, respectively.
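The effect of shifting the reading frame can be illustrated with a short sketch that translates the example sequence in both frames. The codon table is deliberately partial: it holds only the six codons that occur in the example, taken from the standard genetic code (in which GAA specifies glutamic acid, Glu). The function name `translate` and the `offset` parameter are illustrative choices, not part of any library.

```python
# Partial codon table: only the six DNA codons used in the example sequence,
# with their standard genetic code assignments.
CODON_TABLE = {
    "AAG": "Lys", "TTA": "Leu", "ACA": "Thr",
    "GAA": "Glu", "GTT": "Val", "AAC": "Asn",
}


def translate(dna: str, offset: int) -> list[str]:
    """Translate complete codons starting at `offset`.

    Codons missing from the partial table are reported as '?'.
    """
    codons = [dna[i:i + 3] for i in range(offset, len(dna) - 2, 3)]
    return [CODON_TABLE.get(codon, "?") for codon in codons]


SEQUENCE = "GAAGTTAACA"  # the example from the text, written without commas

print(translate(SEQUENCE, 1))  # ['Lys', 'Leu', 'Thr']  (G,AAG,TTA,ACA)
print(translate(SEQUENCE, 0))  # ['Glu', 'Val', 'Asn']  (GAA,GTT,AAC,A)
```

The two calls read the same ten nucleotides, differing only in where the first codon starts, and return entirely different amino acid sequences.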

Shifting the reading frame, i.e. overlapping the code, thus allows the same stretch of DNA to encode two different proteins. Genes can also overlap by a single nucleotide: in the sequence ...TAATG..., TAA acts as the termination codon of gene D and ATG acts as the initiation codon of gene J, so the nucleotide 'A' lying between A and T is shared by the two codons. By contrast, because A* is translated in the same reading frame as gene A, its amino acid sequence corresponds to a segment of protein A. Overlapping genes have also been detected in the virus SV40 and in the tryptophan mRNA of E. coli.
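The single-nucleotide overlap in TAATG can be located with a minimal sketch that looks for a stop codon whose last base is also the first base of ATG. The function name is hypothetical, and the scan checks every position without regard to the actual reading frames of the genes, so it only flags candidate junctions and does not annotate real genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def stop_start_overlaps(dna: str) -> list[tuple[int, int]]:
    """Find stop/start codon pairs that share one nucleotide.

    Returns (stop_index, start_index) pairs, 0-based, where the third base
    of the stop codon is also the first base of the start codon.
    """
    hits = []
    for i in range(len(dna) - 4):
        stop = dna[i:i + 3]        # candidate termination codon
        start = dna[i + 2:i + 5]   # candidate initiation codon, one base shared
        if stop in STOP_CODONS and start == START_CODON:
            hits.append((i, i + 2))
    return hits


print(stop_start_overlaps("TAATG"))  # [(0, 2)]: TAA at 0, ATG at 2
```

Only stop codons ending in A (TAA and TGA) can overlap ATG in this way, so a sequence such as TAGATG yields no hit.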