Whereas each copy of the human genome contains about 20,000 protein-coding genes, it is also home to more than 1 million copies of a short interspersed repetitive element (SINE) known as Alu. For this reason, Doolittle (1997), perhaps only half jokingly, suggested that the genomes of humans “might be ironically viewed as vehicles for the replication of Alu sequences”.
Alu elements are now known to be transposable elements and are restricted to primate genomes, though neither of these facts was recognized until several years after the elements were first discovered. They are not capable of autonomous movement and replication in the genome; rather, they are “parasites” of other elements, such as LINE-1, which encode their own means of transposition. Their origin seems to trace to a duplication of a 7SL RNA gene near the origin of the primates. Today, some Alus are implicated in genomic functions while others continue to cause disease. As a general group of sequences, some are parasitic, some are mutualistic, and many are probably commensal, neither conferring benefit nor doing harm. For an excellent overview of the biology of Alu elements, see Batzer and Deininger (2002).
The history that is most commonly recounted when it comes to the study of genomic components like pseudogenes and transposable elements is that they were long dismissed as irrelevant “junk”. As I noted with reference to pseudogenes, they were not interpreted this way when they were discovered, even though that discovery came 5 years after the idea of “junk DNA” was proposed (Jacq et al. 1977).
Alu elements were first isolated in 1979 and are so named because their isolation involved digestion of genomic DNA with the AluI restriction enzyme (which in turn is named for the bacterium from which it is derived, Arthrobacter luteus) (Houck et al. 1979). Again, if the typical story about noncoding DNA is true, then we should expect the discovery of these elements to have been discussed in terms of their biological insignificance.
Here is what Houck et al. (1979) actually said about the newly discovered elements:
Renatured DNA from human and many other eukaryotes is known to contain 300-nucleotide duplex regions from renatured repeated sequences. These short repeated DNA sequences are widely believed to be interspersed with single copy DNA sequences. In this work we show that at least half of these 300-nucleotide duplexes share a cleavage site for the restriction enzyme AluI. This site is located 170 nucleotides from one end. This Alu family of repeated sequences makes up at least 3% of the genome and is present in several hundred thousand copies.
…
Since these 300-nucleotide sequences, as well as their interspersed unique sequences, occupy such a large fraction of the genome in widely divergent eukaryotes, one imagines that they serve some important biological function. Among other possibilities, it has been proposed that they are involved in gene regulation. Unfortunately, their function remains unproven. In deciding what biological function these repeated sequences might serve, it is important to know the number of different families to which they belong.
…
It has been proposed that the 300-nucleotide interspersed repeated sequences perform a regulatory function either at the DNA or RNA level. The inclusion of over half of these 300-nucleotide sequences in a single family of repetitive sequences (the Alu family) would limit their ability to function as complex regulatory elements.
…
We have found in this work that at least half of 300-nucleotide inverted repeated DNA sequences and half of all other 300-nucleotide repeated sequences belong to one family. Comparing our independent results on inverted repeated DNA sequences, it seems likely that the heterogeneous nuclear RNA duplexes studied by Jelinek are transcribed from the Alu family of repeated sequences. We are currently testing this hypothesis by RNA-DNA hybridization and DNA sequencing. This hypothesis suggests that the function of the Alu family occurs at the level of the heterogeneous nuclear RNA. It has been proposed that such repeated sequences might be processing sites for heterogeneous nuclear RNA. Although other possibilities cannot be ruled out at this time, we find this to be an especially attractive proposal for the function of a single simple class of repeated sequences that are so widely distributed throughout the genome.
In a second paper published in the following year, Rubin et al. (1980) said:
The biological function of this family of sequences is unknown. We and our colleagues have recently noted sequence similarities between a selected portion of the Alu family and several other RNA or DNA sequences, which are known or suspected to be involved in DNA replication, transcription control, and mRNA processing. Together these observations reinforce our belief that a family of DNA sequences which includes 300,000 highly conserved members interspersed throughout much of the mammalian genome, must have an important function.
____________
Part of the Quotes of interest series.
____________
Batzer, M.A. and P.L. Deininger. 2002. Alu repeats and human genomic diversity. Nature Reviews Genetics 3: 370-380.
Doolittle, W.F. 1997. Why we still need basic research. Annals of the Royal College of Physicians and Surgeons of Canada 30: 76-80.
Houck, C.M., F.P. Rinehart, and C.W. Schmid. 1979. A ubiquitous family of repeated DNA sequences in the human genome. Journal of Molecular Biology 132: 289-306.
Jacq, C., J.R. Miller, and G.G. Brownlee. 1977. A pseudogene structure in 5S DNA of Xenopus laevis. Cell 12: 109-120.
Rubin, C.M., C.M. Houck, P.L. Deininger, T. Friedmann, and C.W. Schmid. 1980. Partial nucleotide sequence of the 300-nucleotide interspersed repeated human DNA sequences. Nature 284: 372-374.