Quotes of interest — satellite DNA.

Satellite DNA, also known as tandemly repeated DNA, represents a diverse class of highly repetitive elements consisting of clusters of short repeated sequences. The general category of satellite DNA is now divided into several subcategories according to the size of the individual repeats, though the specific classification scheme can vary among authors. Thus, one may read references to satellites (up to hundreds of base pairs per repeat), minisatellites (10-100 bp per repeat), and microsatellites (only a few bp per repeat).
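As a rough illustration of this size-based scheme, the short Python sketch below sorts a repeat into one of the three classes by the length of its repeat unit. The function name and the exact cutoffs are assumptions made for the example: only the 10-100 bp range for minisatellites comes from the text above, and, as noted, different authors draw the lines differently (some classic satellites have very short repeat units).

```python
def classify_repeat(unit_length_bp: int) -> str:
    """Classify a tandem repeat by the length of its repeat unit, in bp.

    Cutoffs are illustrative assumptions, not a standard:
      under 10 bp  -> microsatellite ("only a few bp per repeat")
      10 to 100 bp -> minisatellite  (range given in the text)
      over 100 bp  -> satellite      ("up to hundreds of bp per repeat")
    """
    if unit_length_bp < 1:
        raise ValueError("repeat unit length must be at least 1 bp")
    if unit_length_bp < 10:
        return "microsatellite"
    if unit_length_bp <= 100:
        return "minisatellite"
    return "satellite"


if __name__ == "__main__":
    # Example unit lengths chosen arbitrarily for demonstration.
    for length in (6, 33, 171):
        print(length, "bp per repeat:", classify_repeat(length))
```

Under these assumed cutoffs, a 6 bp unit such as TTAGGG would be labelled a microsatellite, although by location and function it is usually discussed as a telomeric repeat, which shows how loosely the size classes map onto biology.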

The term “satellite” in the genetic sense was first coined by the Russian cytologist Sergius Navashin in 1912, initially in Russian (“sputnik”) and Latin (satelles), and only later translated to “satellite” (Battaglia 1999). This original usage referred to the morphology of a chromosome possessing a secondary constriction at a certain point along its length. The more familiar usage of “satellite” relates to a small band of DNA with a density different (usually lower, because of a high AT-content) from the bulk of the genomic DNA, and which becomes separated from the main band following CsCl centrifugation (Kit 1961; Sueoka 1961). Satellite DNA was discovered in the early 1960s as an artifact of genetic studies involving this technique of centrifugation.
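To see why an AT-rich sequence forms a lighter band, the sketch below applies the widely used empirical linear relation between GC content and buoyant density in CsCl, ρ ≈ 1.660 + 0.098 × (GC fraction) g/cm³. This relation is not given in the sources quoted here and is brought in as an assumption for illustration; the function names are likewise hypothetical, and real satellites can deviate from the relation because of base modifications or unusual sequence structure.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C.

    Any other character (A, T, N, ...) counts toward the length only.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_count = sum(1 for base in seq if base in "GC")
    return gc_count / len(seq)


def buoyant_density(seq: str) -> float:
    """Estimate CsCl buoyant density in g/cm^3 from GC content.

    Uses the empirical linear approximation rho = 1.660 + 0.098 * GC,
    an assumption of this sketch rather than a formula from the quotes.
    """
    return 1.660 + 0.098 * gc_fraction(seq)


if __name__ == "__main__":
    # Toy sequences: an AT-rich repeat versus a balanced "bulk" sequence.
    at_rich = "AATAT" * 20
    balanced = "ATGC" * 25
    print("AT-rich:", round(buoyant_density(at_rich), 4))
    print("balanced:", round(buoyant_density(balanced), 4))
```

The AT-rich toy sequence gets the lower estimate, which is the sense in which such a fraction would sit apart from the main band as a light "satellite".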

Satellite DNAs are non-protein-coding, and these and other repetitive sequences should have been neglected according to standard renditions of the history of research on noncoding DNA. Does the scientific literature support this claim?

Before “junk DNA” (pre-1972):

A concept that is repugnant to us is that about half of the DNA of higher organisms is trivial or permanently inert (on an evolutionary time scale). Furthermore, at least some of the members of DNA families find expression as RNA. We therefore believe that the organization of DNA into families of related sequences will ultimately be found important to the phenotype. However, at present we can only speculate on the actual role of the repeated sequences.

Britten, R.J. and D.E. Kohne. 1968. Repeated sequences in DNA. Science 161: 529-540.

The existence of repeated sequences in higher organisms led us independently to consider models of gene regulation of the type we describe here. This model depends in part on the general presence of repeated DNA sequences. The model suggests a present-day function for these repeated DNA sequences in addition to their possible evolutionary role as the raw material for creation of novel producer gene sequences. The apparently universal occurrence of large quantities of sequence repetition in the genomes of higher organisms suggests strongly that they have an important current function.

Britten, R.J. and E.H. Davidson. 1969. Gene regulation for higher cells: a theory. Science 165: 349-357.

Although we have localized mouse satellite DNA in the centromeric heterochromatin, this localization does not establish a function for either satellite DNA or heterochromatin. It seems that this function is one which is necessary to the chromosome since the proportion of satellite DNA is maintained in established mouse cell lines even though the chromosomes have undergone other morphological change.

Pardue, M.L. and J.G. Gall. 1970. Chromosomal localization of mouse satellite DNA. Science 168: 1356-1358.

One of the potentially significant aspects of this approach is that it can discover the location of defined DNA sequences on the chromosomes and relate this to their functional distribution at interphase. Thus it seems clear, from the evidence of the enriched content of nucleoli and of centric regions of chromosomes, that these become associated in interphase. The respective functions of centromeres and satellite DNA in this phenomenon are not clear, but a mechanism which obviously coordinates the physical, and perhaps the functional, aspects of different chromosomes may rely to some extent on the chemical homology of the associated satellite DNA.

Jones, K.W. 1970. Chromosomal and nuclear location of mouse satellite DNA in individual cells. Nature 225: 912-915.

Recent reports indicate that the DNA of constitutive heterochromatin is composed to a large extent of short repeated polynucleotide sequences, termed satellite DNA. This discovery has necessitated a critical review of current ideas concerning the origin and function of this portion of the genome of higher organisms. A careful appraisal of the information that has accumulated about heterochromatin since the time of Heitz and on satellite DNA during the last decade suggests that these entities have vital structural functions: they maintain nuclear organization, protect vital regions of the genome, serve as an early pairing mechanism in meiosis, and aid in speciation.

With the assumption that a portion that comprises some 10 percent of the genomes in higher organisms cannot be without a raison d’être, an extensive review led us to conclude that a certain amount of constitutive heterochromatin is essential in multicellular organisms at two levels of organization, chromosomal and nuclear. At the chromosomal level, constitutive heterochromatin is present around vital areas within the chromosomes. Around the centromeres, for example, heterochromatin is believed to confer protection and strength to the centromeric chromatin. Around secondary constrictions, heterochromatic blocks may ensure against evolutionary change of ribosomal cistrons by decreasing the frequency of crossing-over in these cistrons in meiosis and absorbing the effects of mutagenic agents. During meiosis heterochromatin may aid in the initial alignment of chromosomes prior to synapsis and may facilitate speciation by allowing chromosomal rearrangement and providing, through the species specificity of its DNA, barriers against cross-fertilization.

At the nuclear level of organization, constitutive heterochromatin may help maintain the proper spatial relationships necessary for the efficient operation of the cell through the stages of mitosis and meiosis. In the unicellular procaryotes, the presence of a small amount of genetic information in one chromosome obviates the need for constitutive heterochromatin and a nuclear membrane. At higher levels of organization, with an increase in the size of the genome and with evolution of cellular and sexual differentiation, the need for compartmentalization and structural components in the nucleus became imminent. The portion of the genome that was concerned with synthesis of ribosomal RNA was enlarged and localized in specific chromosomes, and the centromere became part of each chromosome when the mitotic spindle was developed in evolution. Concomitant with these changes in the genome, repetitive sequences in the form of constitutive heterochromatin appeared, probably as a result of large-scale duplication. The repetitive DNA’s were kept through natural selection because of their importance in preserving these vital regions and in maintaining the structural and functional integrity of the nucleus.

The association of satellite (or highly repetitive) DNA with constitutive heterochromatin is understandable, since it stresses the importance of the structural rather than transcriptional roles of these entities. Nuclear satellite DNA’s have one property in common despite their species specificity, namely heterochromatization. In this sense the apparent species specificity of satellite DNA may be the result of natural selection for duplicated short polynucleotide segments that are nontranscriptional and can be utilized in specific structural roles.

Yunis, J.J. and W.G. Yasmineh. 1971. Heterochromatin, satellite DNA, and cell function. Science 174: 1200-1209.

After “junk DNA” (1972-1980):

It has recently become possible to measure the interspersion of repetitive and single-copy DNA sequences and to estimate the length of the interspersed sequence elements. Interspersion of repetitive and non-repetitive sequences appears to be a general, if not universal, property of higher organism DNA. Similarities in the lengths of the different classes of sequence are present in the two species for which measurements are available.

These patterns are very likely of functional significance. It is our purpose in this section to focus on the evidence which, in our judgment, leads toward understanding the functional organization of the genome. We do not intend to review the entire subject of DNA sequence organization, and, for example, we only touch on the large literature dealing with satellite DNAs.

In concluding, we return to the question of the organization of DNA sequences. Our approach to gene regulation implies that the location of repetitive sequences provides the hereditary physical basis for the patterns of gene regulation. From this viewpoint, perhaps the most direct and crucial approach to the mechanism of gene regulation in higher organisms now available is the study of DNA sequence organization. More generally, an argument can be made that whether or not this particular model of gene regulation contains some elements of reality, the placement of sequences in the genome is bound to play a basic and significant role. Among the criteria of usefulness for models of gene regulation, therefore, is the extent to which they specify the structural and functional properties of DNA sequence organization. The present state of our technology, in particular of nucleic acid reassociation technology, suggests that the tools are now in hand to unravel the patterns of DNA sequence organization and their functional meaning.

Davidson, E.H. and R.J. Britten. 1973. Organization, transcription, and regulation in the animal genome. Quarterly Review of Biology 48: 565-613.

The DNA of eukaryotic organisms contains serially repeated sequences which vary in amount and complexity from one species to the next. Some of these sequences differ from the bulk of the DNA in G + C content, and hence appear as “satellites” when the DNA is banded in a CsCl density gradient. Several satellite DNAs, such as those of the mouse, the fly, Rhynchosciara and several species of Drosophila have been shown by in situ RNA-DNA hybridization to be located in the centromeric heterochromatin. However, very little is known about the function of satellite DNAs. There is no evidence that they code for proteins, and it is unlikely that they are even transcribed within the cell.

Because of their simple sequences the satellites of D. virilis obviously have no coding function for ordinary proteins. This conclusion is in keeping with the fact, known since the 1920’s, that the heterochromatin of Drosophila contains only a very few genes. Also because of their simple structure, and especially because they are not located in the genetic part of the chromosome, the satellites are poor candidates for regulatory genes. It is difficult to postulate any generalized function for the satellites, necessary for all cells of the organism, since the amount of satellite DNA is reduced so drastically in the polytene tissue. Similarly, there is evidence from D. melanogaster that large segments of the heterochromatin can be deleted without adverse effects either on viability or on the normal mitotic behavior of the chromosomes. Indeed the major known effect of deletion of heterochromatin, as in the sc4L scaR chromosome, is disturbance of meiotic disjunction. If the satellite DNAs have any function, it would seem to lie in the rather ill-defined category of “chromosome mechanics”, possibly including chromosome folding, meiotic pairing, or disjunction. One could even speculate that the major role is an evolutionary one, permitting only chromosomes of closely related populations to pair in meiosis, or to be involved in interchromosome exchange of the sort seen regularly in “Robertsonian fusions”.

Gall, J.G. and D.D. Atherton. 1974. Satellite DNA sequences in Drosophila virilis. Journal of Molecular Biology 85: 633-664.

An increasing proportion of the mysteriously abundant DNA of higher organisms is becoming easier to comprehend, in general terms at least. Variations in nuclear DNA content among organisms are being correlated with specific types of non-genic DNA. Several levels of apparent “bureaucracy” in the genome are becoming defined: (1) unique sequences including structural genes and other specific sequences occurring in one or two copies per genome, (2) repeated genes in a few special instances requiring high output of gene products, (3) moderately repetitive DNA sequences that are interspersed in several patterns with DNA of levels (1) and (2) and that may be involved in regulation of gene expression, and (4) highly repetitive and satellite DNA sequences, which are variable in quantity, located in massive tandem arrays, and are organized into condensed forms of chromatin. The present report has dealt with the fourth level of the hierarchy and has described its involvement in the determination of the macrostructure of chromosomes and the genome as a whole. This fourth level appears to exert the most global form of control through playing roles in adaptation to the environment and in the evolution of new species. The term “chromosome-engineering DNA” seems to express appropriately the mode of action of highly reiterated, simple sequence DNA.

Hatch, F.T., A.J. Bodner, J.A. Mazrimas, and D.H. Moore. 1976. Satellite DNA and cytogenetic evolution: DNA quantity, satellite DNA and karyotypic variations in kangaroo rats (Genus Dipodomys). Chromosoma 58: 155-168.

Proposed functions for satellite DNA were evaluated and formally set forth by Walker (1971) and have since been expanded by Mazrimas and Hatch (1972), Lagowski et al. (1973), Lee (1975), Bostock (1971), Walker (1972), and Comings (1972). In a masterful summary and evaluation of current ideas relating repeated DNA to the organization of the eukaryote chromosome (Cold Spring Harbour Symposia on Quantitative Biology 1973) Swift stated that the function of simple sequence DNAs not only appeared to have most investigators mystified, but that the present theories concerning their function were not accepted with much enthusiasm. He did, however, point out that “There is one major hope for making sense of the fact that many higher organisms seem to carry in every nucleus a large portion of their DNA that looks superficially to be completely worthless. This lies in the comparative approach. When do simple sequence DNAs arise in evolution? Can we find two closely related species one with and one without a major block of heterochromatin?”.

In summary, we believe that the Atractomorpha results focus attention on aspects of repeated DNA which are quite different to previously postulated functions. We argue that a large proportion of the highly repeated localised DNA as well as some of the repeated interspersed DNA acts in regularizing recombinational frequency and position. Thus if repeated DNA really does play a role in homologue recognition and chromosome pairing, it is now clear that only a minimum amount functions in this way, and this minimum amount need not be expressed as visible heterochromatin (as in A. australis).

Miklos, G.L.G. and R.N. Nankivell. 1976. Telomeric satellite DNA functions in regulating recombination. Chromosoma 56: 143-167.

Although much discussion has centred on the possible functions of satellite DNA (Edelman and Gally, 1970; Kohne, 1970; Walker, 1971a, b; Bostock, 1971; Yunis and Yasmineh, 1971; Comings, 1972; Rae, 1972; Jones, 1973; Hennig, 1973; Swift, 1973; Southern, 1974; Tartof, 1975; Hsu, 1975; Hatch et al., 1976) the major problem in evaluating function has been a lack of direct experimental manipulation of the satellite DNA content of any chromosome. Of the postulated functions, the more common ones would assign a role for satellite DNA in determining centromere strength (Walker, 1971a, b), aspects of chromosome pairing for regular segregation of homologs (Yunis and Yasmineh, 1971), involvement in the processes of speciation (Hatch et al., 1976), and alterations in the recombination system (Miklos and Nankivell, 1976).

The most important aspect of satellite DNA remains the nature of its functions. Although a large body of data has been gathered concerning its structure, distribution and properties in several different organisms, most of these results have in fact neither supported nor disproved any one of the particular hypotheses of function (see Comings, 1972; Swift, 1973; Hsu, 1975; Miklos and Nankivell, 1976; for evaluations of functions). The most popular hypothesis on satellite DNA function has been, and still is, that satellite DNA is involved in some aspect of chromosome mechanics such as chromosome pairing.

Yamamoto, M. and G.L.G. Miklos. 1978. Genetic studies on heterochromatin in Drosophila melanogaster and their implications for the functions of satellite DNA. Chromosoma 66: 71-98.

Satellites constitute from 1% to 65% of the total DNA of numerous organisms, including that of animals, plants, and prokaryotes. Their existence has been known for about 15 years, but, although it is thought that they must be biologically important, with few exceptions … their functions are still largely in the realm of speculation. This remains true despite their ubiquity and, except for polytenized tissues, their constancy as a fraction of the total DNA in all tissues of the particular animal or plant species in which they are observed.

The molecular diversity of this group of DNA’s, all taken together and classified as “satellites,” may be reflected in each satellite (or possibly groups of satellites) having a distinct function. This belief is based in part on the fact that there are many exceptions to nearly every generalization that has been made about satellite DNA’s.

Skinner, D.M. 1977. Satellite DNA’s. BioScience 27: 790-796.

The idea that the coordinate regulatory system of animal genomes is encoded in networks of repetitive sequence relationships is now a decade old. We and others have developed the concept that genes could be regulated by specific interactions occurring at repetitive sequences in the DNA genome. The premises have been (i) that the differentiated properties of animal cells derive from diverse and specific cytoplasmic messenger RNA (mRNA) sequence sets and (ii) that the cell-specific populations of mRNA’s result from cell-specific patterns of structural gene transcription.

Davidson, E.H. and R.J. Britten. 1979. Regulation of gene expression: possible role of repetitive sequences. Science 204: 1052-1059.

Evolutionary conservation of W [sex chromosomal] satellite DNA strongly suggests that functional constraints may have limited sequence divergence.

Singh, L., I.F. Purdom, and K.W. Jones. 1980. Sex chromosome associated satellite DNA: evolution and conservation. Chromosoma 79: 137-157.

Since the discovery that satellite DNA is located in heterochromatin, its possible role in mediating various heterochromatic functions has been the subject of both controversy and other reviews. Heterochromatin shows many very well defined functions in such diverse processes as chromosome pairing and segregation, position effect variegation, chromosome rearrangements, speciation, and recombination. All of these functions have been analyzed in great detail either genetically or cytogenetically, but in no case have the specific DNA sequences responsible for these phenomena been determined. Long tandem arrays that can change rapidly in evolution both qualitatively and quantitatively could act to disrupt normal chromosome behavior. However, the question remains whether such simple tandem arrays have an important positive contribution toward any of the functions attributed to heterochromatin.

With the application of recombinant DNA technology to such highly repeated sequences we now have the tools to characterize genetically altered states of heterochromatin with sufficient precision as to answer these questions.

Brutlag, D.L. 1980. Molecular arrangement and evolution of heterochromatic DNA. Annual Review of Genetics 14: 121-144.

After “selfish DNA”, the decade during which noncoding DNA supposedly was ignored (1980-1989):

The foregoing data support the concept that the so-called “junk” or genetically inactive DNA centered around the centromeric region has a function in controlling the separation of centromere (or its replication into two daughter centromeres) at the junction of metaphase-anaphase in mitosis.

Vig, B.K. 1982. Sequence of centromere separation: role of centromeric heterochromatin. Genetics 102: 795-806.

Satellite DNAs were first discovered over twenty years ago as species of DNA which, due to their unusual base composition, band at densities distinct from bulk DNA upon equilibrium sedimentation (Kit, 1961). Subsequently, it was shown that these DNAs are highly repetitious, that they are arranged in long tandem arrays, and that they are localized typically in pericentric or telocentric heterochromatin. Many of these DNAs, including mouse satellite DNA, have been sequenced. Despite detailed knowledge of the structure and location of satellite DNAs, their potential function(s) have only been hypothesized. These range from none (i.e., selfish DNA) to roles in many events including enhanced or reduced recombination, spindle attachment, gene amplification, chromosome pairing and/or segregation. Unfortunately, most of these hypotheses do not readily lend themselves to experimental investigation.

One major conclusion from the work described is that the association of kinetochores with centromeric regions of mouse chromosomes is not simply due to the presence of mouse satellite DNA sequences. However, mouse satellite DNA does appear to play a crucial role in the maintenance of contact between sister chromatids during metaphase.

Lica, L.M., S. Narayanswami, and B.A. Hamkalo. 1986. Mouse satellite DNA, centromere structure, and sister chromatid pairing. Journal of Cell Biology 103: 1145-1151.

Repetitive DNA evolves more rapidly than other genomic regions. Still, long regions of homology can be found between satellites from closely related species. Statistically significant homologies can even be found between satellites from species very distantly related as the Drosophila and Bovine satellites or between animal and plant species. Whether such homologies have any functional significance, is not known.

The interpretation of these homologies can be addressed with respect to two different theories concerning the function of repeated DNA. The striking coincidence between the size of these repeat units and the mononucleosome DNA length suggests that these repeats have a role in determining chromatin structure. In fact, a sequence-dependent phasing of nucleosomes along repetitive DNA has been found in a mouse satellite DNA and in the African green monkey satellite. This could explain the homologies found between these repeats at the sequence level and also the striking conservation of their size. On the other hand, if this DNA is functionless as suggested by some authors, the homologies found could be a consequence of a common origin for many tandemly repeated families. They could have arisen from conserved genomic sequences by independent amplification events. For example, several families of interspersed repetitive sequences found in animal species are known to derive from different tRNA genes by independent amplification events. Thus, the conservation of size could be explained if, for example, nucleosomes have a role in determining the size of the sequence to be amplified.

No experimental approach to the study of the functional significance of these sequences is readily apparent at present. However, Arabidopsis, with its small genome and simple pattern of repeated DNA may eventually be a useful system for the study of these ubiquitous components of the higher eukaryotic genome.

Martinez-Zapater, J.M., M.A. Estelle, and C.R. Somerville. 1986. A highly repeated DNA sequence in Arabidopsis thaliana. Molecular and General Genetics 204: 417-423.

Tandemly repeated DNA families have long attracted considerable attention from genome-watchers, ever since satellite DNAs were originally isolated, over 20 years ago, as subsets of genomic DNA that were separable from the bulk of DNA by isopycnic centrifugation.

Willard, H.F. and J.S. Waye. 1987. Hierarchical order in chromosome-specific human alpha satellite DNA. Trends in Genetics 3: 192-198.

The species specificity of satellite profiles has long been interpreted as evidence for evolutionary instability of this class of DNA. In turn, this has led to the notion that either satellite DNAs have no function and are simply excess DNA, or that any function would be of a general nature involving chromosome condensation, pairing or recombination.

Lohe, A.R. and D.L. Brutlag. 1987. Identical satellite DNA sequences in sibling species of Drosophila. Journal of Molecular Biology 194: 161-170.

A highly conserved repetitive DNA sequence, (TTAGGG)n, has been isolated from a human recombinant repetitive DNA library. Quantitative hybridization to chromosomes sorted by flow cytometry indicates that comparable amounts of this sequence are present on each human chromosome. Both fluorescent in situ hybridization and BAL-31 nuclease digestion experiments reveal major clusters of this sequence at the telomeres of all human chromosomes. The evolutionary conservation of this DNA sequence, its terminal chromosomal location in a variety of higher eukaryotes (regardless of chromosome number or chromosome length), and its similarity to functional telomeres isolated from lower eukaryotes suggest that this sequence is a functional human telomere.

Moyzis, R.K., J.M. Buckingham, L.S. Cram, M. Dani, L.L. Deaven, M.D. Jones, J. Meyne, R.L. Ratliff, and J.-R. Wu. 1988. A highly conserved repetitive DNA sequence, (TTAGGG)n, present in the telomeres of human chromosomes. Proceedings of the National Academy of Sciences of the USA 85: 6622-6626.

The chromosomes of most mammalian species contain centromeric domains which comprise repetitive DNA sequences. Most of these domains contain blocks of simple sequence DNA families, the properties of which give rise to the characteristic C-band patterns present in mammalian chromosomes. More than one simple sequence DNA family can occupy the same centromeric domain. The biological role of these sequences in the function of an active centromere is unknown; however, one of the simple sequence DNAs in the mouse genome can bind microtubule spindle fibers, which may imply an active role for these particular DNA sequences. Recently, we have shown that one member of the human alphoid family of DNA sequences is physically closer to the functional kinetochore within the centromeric domain of human chromosome 9 than are members of the simple sequence DNA family termed satellite III.

Joseph, A., A.R. Mitchell, and O.J. Miller. 1989. The organization of the mouse satellite DNA at centromeres. Experimental Cell Research 183: 494-500.

“Noncoding DNA has been ignored until recently” (beginning around 1989-1990):

The prevailing view that satellite DNA is mostly ‘junk’ whose presence or absence has no bearing on the fitness of its carriers, has been widely accepted. Most of the support for this came from interspecific comparisons. By adding extra heterochromatic materials or by deleting nearby essential (ribosomal RNA) genes, previous studies only addressed the issue indirectly. We have provided the first direct test of this hypothesis by comparing the fitnesses of Drosophila with, and without, a well characterized array of satellite repeats. A fitness effect is clearly detectable. The observed effect is also inconsistent with the view that the functions of satellite DNA, if any, must be in the germ cells.

It is far fetched to think that all satellite DNAs have a useful role, but it is equally unwise to label them universally as junk in the absence of any other direct proof.

Wu, C.-I., J.R. True, and N. Johnson. 1989. Fitness reduction associated with the deletion of a satellite DNA array. Nature 341: 248-251.

The centromere is the major cis-acting genetic locus involved in chromosome segregation in mitosis and meiosis. The mammalian centromere is characterized by large amounts of tandemly repeated satellite DNA and by a number of specific centromere proteins, at least one of which has been shown to interact directly with centromeric satellite DNA sequences. Although direct functional assays of chromosome segregation are still lacking, the data are most consistent with a structural and possibly functional role for satellite DNA in the mammalian centromere.

As a necessary first step in the identification and characterization of DNA at mammalian centromeres, one approach has been to focus on the structure and organization of the DNA from the primary constriction. Although it has been recognized for over 20 years that the centromeric heterochromatin in chromosomes from virtually all complex eukaryotic organisms consists of various families of satellite DNA, they have only recently been taken seriously as candidates for something other than ‘junk’ DNA or genomic ‘flotsam and jetsam’ (Miklos 1985). Satellite DNA families in different mammalian orders (e.g. rodents and primates) appear largely unrelated in terms of their actual sequences; however, similarities in their overall chromosomal organization and in specific short sequences implicated in centromere protein recognition may offer enticing clues to the potential involvement of at least some satellite DNAs in centromere structure and/or function.

Willard, H.F. 1990. Centromeres of mammalian chromosomes. Trends in Genetics 6: 410-416.

____________

Part of the Quotes of interest series.
____________

Other citations

Battaglia, E. 1999. The chromosome satellite (Navashin’s “sputnik” or satelles): a terminological comment. Acta Biologica Cracoviensia, Series Botanica 41: 15-18.

Kit, S. 1961. Equilibrium sedimentation in density gradients of DNA preparations from animal tissues. Journal of Molecular Biology 3: 711-716.

Sueoka, N. 1961. Variation and heterogeneity of base composition of deoxyribonucleic acids: a compilation of old and new data. Journal of Molecular Biology 3: 31-40.

Quotes of interest — science news stories.

We have been told in science news stories since the early 1990s that biologists long neglected the potential significance of noncoding DNA. (Sadly, this is in line with the claims of creationists, who blame “Darwinism” despite the obvious fact that Darwinian adaptationism would expect functions. Some biologists likewise play up the notion that we have ignored noncoding sequences and are only now coming to appreciate them, thanks, no doubt, to their own revolutionary insights, but again, this ignores a diverse literature on the topic spanning the rise of the tools necessary for such work up to the present.) But what about the science stories that were actually written during the period when noncoding DNA was supposedly dismissed as uninteresting (i.e., 1980 to the early 1990s)?

If you had a subscription to Science in the 1980s, you would have read stories like these by their science writer Roger Lewin:

Lewin, R. 1981. Evolutionary history written in globin genes. Science 214: 426-427.

Even though the human β-globin complex contains a relatively large number of active genes, 95 percent of the locus is made up of DNA that does not code for proteins. What is the role of this extra DNA, if any? The pseudogenes constitute just a small proportion of the region, although more pseudogenes might exist. Some of the DNA is made up of representatives of well-known families of repetitive sequences. And the remainder is DNA of no known function or comparable sequence.
“We wanted to test the hypothesis that this extra DNA is ‘junk DNA,’” says Jeffreys, “so we compared the β loci in humans, gorillas, and baboons.” Jeffreys and his colleagues reasoned that if it were junk DNA, then over the 20 to 40 million years of evolution represented by humans, apes, and Old World monkeys both the sequence and the overall quantity of intergenic DNA could be expected to vary. “It turned out that the cluster is remarkably stable,” reports Jeffreys. “The overall pattern and size of the cluster is the same, and the rate of nucleotide substitutions is one-quarter to one-fifth of what would be expected in functionless DNA”. The noncoding DNA therefore appears not to be junk, but what function it might perform is still a mystery.

Lewin, R. 1982. Repeated DNA still in search of a function. Science 217: 621-623.
[Reporting on an NIH International Workshop on Highly Repeated DNA, July 1982]

Interest in repetitive DNA sequences goes back many years but, as with many aspects of molecular biology, the advent of recombinant DNA technology and DNA sequencing now permits previously unmatched scrutiny of the structures of interest.

If mobility is a reality, and most agree that it probably is, then it seems likely that at least some members of repeat families will have important effects in the genome, even if they have no formal function. Enhancing recombination and altering rates of gene expression are obvious possibilities, while the initiation of new species is a more recondite proposal.

The truth is, however, that the functions of the large and motley collection of repeated DNA families are proving particularly resistant to elucidation. Putative functions are many, including, variously, involvement in chromosome pairing, control of gene expression, processing of messenger RNA precursors, and participation in DNA replication. So far none has been established, save for the single exception of a small family that gives rise to 7S RNA, a molecule that recently was serendipitously discovered to be an essential component of a particle that mediates the secretion of proteins from cells.

Some repetitive DNA will undoubtedly be shown to have a function, in the formal sense; some will likely be shown to exert important effects; and the remainder may well have no function or effect at all and can therefore be called selfish DNA. Repetitive DNA constitutes a substantial proportion of the genome (up to 90 percent in some cases), and there is considerable speculation on how it will eventually be divided between these three groups. Current bets would put a small fraction in the function category, with distribution of the rest rising steeply through the effect and selfish categories.

Satellite DNA unquestionably is a puzzle. What determines the number of copies in a repeat family? And how does the genome tolerate so much of it? Perhaps, as Singer has recently promulgated, just a small fraction of the satellite sequences is essential to some genomic function while the remainder is harmless surplus. This, she indicates, is a comfortable middle ground between the extreme selfish DNA position, which sees no function in all this “junk DNA,” and the adaptationist position, which looks for functions in every structure. The same questions and speculations can be applied to dispersed repetitive DNA.

One observation that might be taken as evidence of function in repeated sequences is the frequency of transcription into RNA. A significant proportion of nuclear RNA contains transcripts of repeated sequences, although 90 percent of this is lost in RNA processing and exit to the cytoplasm. Davidson and his colleagues have shown that in sea urchin the spectrum of repeat families that are transcribed changes during development, an appealing argument for some regulatory function. Most intriguing, however, is the discovery that only a small proportion of any repeat family is ever transcribed. “Most members appear to be quiescent, which must make you cautious when isolating samples in search of their function.”

It is clear that, from their abundance, their unusual structure, and their frequent transcription, dispersed repetitive DNA families cannot be ignored. But it is equally clear that for the most part they, like their tandemly repeated relatives, remain a phenomenon in search of a function.


Lewin, R. 1982. Adaptation can be a problem for evolutionists. Science 216: 1212-1213.

Molecular biology of recent years has revealed many new and intriguing categories of DNA, some of which appear to have no role. One explanation of this has been that the nonaptive sequences provide raw material for future evolution. But the logic of natural selection does not allow for selection for future use. More likely is that the accumulation of nonaptive DNA is a consequence of the innate property of repeated sequences of nucleic acid to replicate and move around the genome. Later it may be recruited to perform some role, in which case it becomes an exaptation.

Lewin, R. 1983. A naturalist of the genome. Science 222: 402-405.

Some mobile elements are large and complex, measuring as much as 10,000 nucleotides in length and carrying many genes, while others are simple sections of repeated DNA just a few hundred nucleotides long. Some people would classify all such elements as “junk” or “parasitic” DNA. Others strongly demur and insist that, for instance, although there is yet to be found any convincing evidence for the involvement of a limited class of elements in development in organisms other than maize, the possibility should by no means be dismissed. In any case it is clear that the mobility of certain genetic elements is essential in the generation of the huge diversity of antibodies in vertebrates and in the production of different antigenic coats in certain parasites. Jumping genes clearly represent a potentially rich source of mutation. In addition, an evolutionary link between mobile elements and retroviruses now seems incontrovertible, as does a causal relationship with certain cancers.

Lewin, R. 1985. More progress in messenger RNA splicing. Science 228: 977.

This summer marks 8 years since eukaryotic genes were first discovered to be interrupted by noncoding sequences, known variously as intervening sequences or introns. The discovery raised two sets of questions. The first concerns the origin and function, if any, of introns, which, by its very nature, is a very difficult question to test and therefore remains somewhat in the realms of speculation, although significant insights are being made. The second focuses on the mechanics of removal of these sequences in the production of mature RNA molecules, and in principle should be experimentally more tractable. The immense effort directed at this second question has produced during the past 8 years some conventional biochemistry, some novel and surprising nucleic acid chemistry, and a great deal of frustration.

Lewin, R. 1986. “Computer genome” is full of junk DNA. Science 232: 577-578.

Many biologists were unhappy with the idea that much of the DNA might have no function, says Loomis. “There is a very strong feeling that if a molecule, or any kind of biological structure, exists, then it must be serving some kind of selectively advantageous purpose. I disagree with this viewpoint very strongly.” Loomis prefers to turn the question around. “We should ask, ‘what is the selective advantage of getting rid of a particular structure?’ This is not common thinking.”

It is of course very difficult to prove that a structure or a sequence of DNA has no function. “People will always say, ah, but you haven’t looked under the right conditions,” says Loomis. In the case of multigene families, the best data come from mutation experiments.

Lewin, R. 1988. Chance and repetition. Science 240: 603.

With some kind of concerted effort to map and sequence the entire human genome now appearing to be inevitable, there will be much excitement at the prospect of discovering what is encoded in the 3-billion-base “message”. There are certain to be some surprises, perhaps even equivalent in magnitude to the discovery a decade ago of long, noncoding sequences that interrupt the great majority of eukaryotic genes. But there are many biologists who expect large parts of the genome to be devoid of any function at all: “We face the prospect of trudging through huge tracts of junk DNA,” remarked British molecular biologist Sydney Brenner during one of the many recent panel discussions on the project.

At least some proportion of the DNA in the genomes of most organisms is in the form of these so-called middle repetitive sequences, ranging from 3% to as much as 70%: typically, the bigger the genome, the more repetitive DNA. There is a long tradition in biology that, seeing structures as extensive as these, argues that there must be a functional explanation for them.

Biologists have long speculated about the function of middle repetitive sequences, with regulation of gene expression being one popular notion. Loomis and Gilpin’s perspective, however, is that, although some middle repetitive sequences may have acquired a function once they have formed, there is no need to invoke function as a selective pressure for their origin.

____________

Part of the Quotes of interest series.


Quotes of interest — 1980s edition (part two).

This is the second installment in the quotes of interest series that focuses in particular on research and discussions from the 1980s, when noncoding DNA supposedly was ignored as irrelevant. The important message being offered is that there was plenty of research into possible functions or lack thereof in noncoding sequences of all types, and that whichever conclusion authors reached was based on the evidence available at the time, not ideology. This includes the parallel development of neutral theory, many proponents of which did conclude that pseudogenes were nonfunctional on the basis of their high mutation rates compared with coding sequences. Again, the point is not that no one argued against function (I argue against function at the organism level for most noncoding DNA), but that such arguments were based on evidence, not unsupported assumption.

Members of the Alu family of interspersed repeated sequences and its rodent equivalents may be the normal cellular DNA replication initiation sites. In mammalian cells DNA replication proceeds bidirectionally simultaneously from many sites, and thus the initiation sites for replication might be expected to be interspersed repeated sequences with two-fold rotational symmetry. The inverted repeated examples of the Alu family of interspersed repeated sequences and their Chinese hamster equivalents show these attributes. These considerations raise the question of whether the transcription of these repeated sequences by RNA polymerase III, or the interaction of these sequences with the low molecular weight RNA, or both, may play a role in the initiation of DNA replication.

Jelinek, W.R., T.P. Toomey, L. Leinwand, C.H. Duncan, P.A. Biro, P.V. Choudary, S.M. Weissman, C.M. Rubin, C.M. Houck, P.L. Deininger, and C.W. Schmid. 1980. Ubiquitous, interspersed repeated sequences in mammalian genomes. Proceedings of the National Academy of Sciences of the USA 77: 1398-1402.

We have assigned six members of the human β-actin multigene family to specific human chromosomes. The functional gene, ACTB, is located on human chromosome 7, and the other assigned β-actin-related sequences are dispersed over at least four different chromosomes including one locus assigned to the X chromosome. Using intervening sequence probes, we showed that the functional gene is single copy and that all of the other β-actin related sequences are recently generated in evolution and are probably processed pseudogenes. The entire nucleotide sequence of the functional gene has been determined and is identical to cDNA clones in the coding and 5′ untranslated regions. We have previously reported that the 3′ untranslated region is well conserved between humans and rats (Ponte et al., Nucleic Acids Res. 12:1687-1696, 1984). Now we report that four additional noncoding regions are evolutionarily conserved, including segments of the 5′ flanking region, 5′ untranslated region, and, surprisingly, intervening sequences I and III. These conserved sequences, especially those found in the introns, suggest a role for internal sequences in the regulation of β-actin gene expression.

Our finding of highly conserved blocks of nucleotides in two of the five intervening sequences of β-actin genes raises the possibility that these segments have regulatory functions. Conserved internal regions have been reported previously, such as the internal transcriptional enhancer regions of immunoglobulin genes. However, the locations of these enhancers were initially regarded as a peculiarity of the immunoglobulin gene loci. More recently, internal control regions have been detected (but yet unidentified) for the adenovirus E1A gene, human globin genes, and chicken thymidine kinase gene. Any conclusion that the conserved β-actin intron sequences, especially those of IVS I, function as transcriptional enhancers must await direct experimentation. Nevertheless the evolutionary conservation of the immunoglobulin enhancer segments indicates that other transcriptional enhancers or cis-acting regulatory signals would be under selective pressure. It is interesting to note in this regard that the IVS I of both α- and β-globin genes are the most conserved introns of these genes. The IVS I of the human and mouse β-globin genes, for example, has 81 base pairs matching to give a KN(1) value of 0.302. Therefore these introns may well contain part of the proposed downstream regulatory elements.

Ng, S.-Y., P. Gunning, R. Eddy, P. Ponte, J. Leavitt, T. Shows, and L. Kedes. 1985. Evolution of the functional human β-actin gene and its multi-pseudogene family: conservation of noncoding regions and chromosomal dispersion of pseudogenes. Molecular and Cellular Biology 5: 2720-2732.

Although the presence and similar location of pseudogenes in all the mammalian globin gene clusters suggest that pseudogenes may have some as yet unidentified function, the simplest explanation for their existence is that they are the natural consequence of the mechanisms of gene amplification and sequence divergence. The arrangement of genes within the human α-globin gene cluster is consistent with this possibility.

Proudfoot, N.J. and T. Maniatis. 1980. The structure of a human α-globin pseudogene and its relationship to α-globin gene duplication. Cell 21: 537-544.

In summary, the structural analysis of a number of different globin gene clusters suggests that globin gene families are in evolutionary flux. Perhaps pseudogenes are simply a natural consequence of the mechanisms by which multigene families evolve.

Lacy, E. and T. Maniatis. 1980. The nucleotide sequences of a rabbit β-globin pseudogene. Cell 21: 545-553.

Particularly surprising are the intron-exon splice borders of the H3.3 gene. Not only do they contain the standard splice consensus sequences, but in all cases the introns are flanked by 7-8 base pair direct repeats. The function, if any, of these repeats is unclear, since the repeats include both intron and exon bases. One functional difference between these introns can be inferred from the structures of the previously isolated cDNAs. Three of the cDNAs were shown to contain an unspliced intron, but did not carry introns 2 and 3. This could reflect the preferential splicing out of introns 2 and 3 before the splicing out of intron 1. If there is a tendency toward 5′ to 3′ splicing, the unusual splice junctions seen for the H3.3 gene could act to supersede this tendency. The advantage to the organism to remove intron 1 last is unclear but could point to some as yet undetermined function for this intron. In support of this, we have found that a DNA probe derived from intron 1 hybridizes to a single fragment in a Southern blot of total mouse genomic DNA indicating that the sequences in this intron may be conserved, whereas a DNA probe derived from intron 2 does not hybridize.

Wells, D., D. Hoffman, and L. Kedes. 1987. Unusual structure, evolutionary conservation of non-coding sequences and numerous pseudogenes characterize the human H3.3 histone multigene family. Nucleic Acids Research 15: 2871-2889.

A mouse α-globin-related pseudogene (ψα30.5) completely lacks intervening sequences, and could not code for a functional globin polypeptide because of frameshifts. The widespread occurrence of globin pseudogenes in other species suggests that they are not ‘dead’ genes but may be important in controlling globin expression.

The general hypothesis that pseudogenes control the productive genes in some fashion, nevertheless, remains attractive and we are investigating the hypothesis further, including tests in non-erythroid tissues. Certainly, the widespread occurrence of globin pseudogenes argues strongly for their functional importance.

Vanin, E.F., G.I. Goldberg, P.W. Tucker, and O. Smithies. 1980. A mouse α-globin-related pseudogene lacking intervening sequences. Nature 286: 222-226.

The foregoing data support the concept that the so-called “junk” or genetically inactive DNA centered around the centromeric region has a function in controlling the separation of centromere (or its replication into two daughter centromeres) at the junction of metaphase-anaphase in mitosis.

Vig, B.K. 1982. Sequence of centromere separation: role of centromeric heterochromatin. Genetics 102: 795-806.

A highly conserved repetitive DNA sequence, (TTAGGG)n, has been isolated from a human recombinant repetitive DNA library. Quantitative hybridization to chromosomes sorted by flow cytometry indicates that comparable amounts of this sequence are present on each human chromosome. Both fluorescent in situ hybridization and BAL-31 nuclease digestion experiments reveal major clusters of this sequence at the telomeres of all human chromosomes. The evolutionary conservation of this DNA sequence, its terminal chromosomal location in a variety of higher eukaryotes (regardless of chromosome number or chromosome length), and its similarity to functional telomeres isolated from lower eukaryotes suggest that this sequence is a functional human telomere.

The human genome contains a variety of DNA sequences present in multiple copies. These repetitive DNA sequences are thought to arise by many mechanisms, from direct sequence amplification to the unequal recombination of homologous DNA regions to the reverse flow of genetic information. While it is likely that some of these repetitive DNA sequences influence the structure and function of the human genome, little experimental evidence supports this idea at present.
We reasoned, however, that evolutionary conservation of a particular repetitive DNA sequence family might imply that the sequence is essential to cellular function.

Moyzis, R.K., J.M. Buckingham, L.S. Cram, M. Dani, L.L. Deaven, M.D. Jones, J. Meyne, R.L. Ratliff, and J.-R. Wu. 1988. A highly conserved repetitive DNA sequence, (TTAGGG)n, present in the telomeres of human chromosomes. Proceedings of the National Academy of Sciences of the USA 85: 6622-6626.

____________

Part of the Quotes of interest series.


Quotes of interest — pseudogene.

The term “pseudogene” was coined by Jacq and colleagues in 1977. The standard tale of biologists dogmatically ignoring possible functions of noncoding DNA would have it that such a sequence automatically would be dismissed as “junk” when discovered, especially since the notion of a degraded and now non-coding former gene matches Ohno’s concept of “junk DNA” as originally proposed. The reality is that Jacq et al. (1977) did consider whether the sequence had a function, but based on the available data they concluded that the best explanation is that it is “an evolutionary relic”. They did not cite Ohno.

Summary
The 5S DNA of Xenopus laevis, coding for oocyte-type 5S RNA, consists of many copies of a tandemly repeated unit of about 700 base pairs. Each unit contains a “pseudogene” in addition to the gene. The pseudogene has been partly sequenced and appears to be an almost perfect repeat of 101 residues of the gene. The order of components in the repeat unit is (5′) long spacer-gene-linker-pseudogene (3′) in the “+” strand (or H strand) of the DNA. The possible function of the pseudogene is discussed.

The functions of the different regions of the 5S DNA are only imperfectly understood. The gene region 1-121 codes for the mature oocyte 5S RNA, and the presence of a pppG sequence at residue 1 of the mature 5S RNA defines this residue as the point of initiation of transcription by RNA polymerase III (Roeder, 1976). The point of termination of transcription, however, is less clear. Brown and Brown (1976) have argued that the high A + T-rich sequence of residues 119-123 of the gene region is a signal for the termination of transcription. But low yields of a larger transcription product–about 135 residues long–have been isolated by Denis and Wegnez (1973) in pulse-labeling experiments in Xenopus laevis oocytes. Similar length molecules have also been isolated in heat-shocked Drosophila cells by Rubin and Hogness (1975). While clear evidence that these 135-long molecules are precursors of the mature 5S RNA in Xenopus (or Drosophila) is lacking, their isolation clearly demonstrates that longer transcripts may be synthesized in vivo. It is therefore possible that the structural gene for 5S RNA is larger than the 121 residues of the mature 5S RNA and extends into the region of DNA, linking gene and pseudogene for at least another 15 residues.

Thus the known transcription of the 5S DNA system does not explain the presence of the pseudogene. Moreover, no RNA products corresponding to the pseudogene have been isolated, although it is conceivable that these may well have been overlooked or confused with tRNA in earlier studies (Denis and Wegnez, 1973), especially if they occur only in low yield. We are thus forced to the conclusion that the most probable explanation for the existence of the pseudogene is that it is a relic of evolution. During the evolution of the 5S DNA of Xenopus laevis, a gene duplication occurred producing the pseudogene. Presumably the pseudogene initially functioned as a 5S gene, but then, by mutation, diverged sufficiently from the gene in its sequence so that it was no longer transcribed into an RNA product.

This evolutionary explanation for the presence of the pseudogene, however, is incomplete by itself in that it ignores the conservation in sequence of the pseudogene, and indeed of the entire G + C-rich spacer of 5S DNA. In an attempt to explain this, it has been suggested (Brownlee, 1976) that the pseudogene may be a “transcribed spacer” corresponding to a primary transcript of 5S RNA, which is a transient precursor and has not been detected. If this is so, then most of the G + C-rich region of 5S DNA would be the structural gene for 5S RNA. This function, if true, would provide the necessary selective pressure to conserve the sequence of the “linker” and pseudogene region so that the correct processing of the postulated 300-long precursor was maintained. In the absence of any experimental evidence for such a long precursor, however, this suggestion must be regarded as speculative; it is more probable that the pseudogene is a relic of evolution.

Jacq, C., J.R. Miller, and G.G. Brownlee. 1977. A pseudogene structure in 5S DNA of Xenopus laevis. Cell 12: 109-120.

____________

Part of the Quotes of interest series.


Quotes of interest — Nobel Prize special edition.

The story we have been told by creationists and neo-Panglossian scientists is that most if not all noncoding DNA is functional and that this fact has been obscured by long neglect in the scientific community of the potential importance of noncoding elements. In particular, the “junk DNA” and “selfish DNA” ideas put forth in the 1970s and 1980s are suggested to have stifled interest in the possible biological and medical importance of noncoding sequences, which have long been dismissed as irrelevant. The question is, did the scientific community turn its back on researchers interested in the roles of noncoding elements after 1980?

1983 Nobel Prize in Physiology or Medicine
to Barbara McClintock
for her discovery of mobile genetic elements
[transposable elements]

Barbara McClintock discovered mobile genetic elements in plants more than 30 years ago. The discovery was made at a time when the genetic code and the structure of the DNA double helix were not yet known. It is only during the last ten years that the biological and medical significance of mobile genetic elements has become apparent. This type of element has now been found in microorganisms, insects, animals and man, and has been demonstrated to have important functions.

Such elements were also found to have an important function in the ability of unicellular parasites (trypanosomes) to change their surface properties, thereby avoiding the immune response of the host organism. Recombination of DNA segments proved to be an essential factor in the ability of lymphoid cells to produce a seemingly infinite number of different antibodies to foreign substances. In recent years, evidence has accumulated that transposition of genes or incomplete genes are involved in the transformation of normal cells into tumour cells. Thus, genes controlling cell growth have been found to undergo translocation from one chromosome to another during cancerogenesis. The initial discovery of mobile genetic elements by Barbara McClintock is of great medical and biological significance. It has also resulted in new perspectives on how genes are formed and how they change during evolution.

http://nobelprize.org/nobel_prizes/medicine/laureates/1983/press.html


1993 Nobel Prize in Physiology or Medicine
to Richard J. Roberts and Phillip A. Sharp
for their discovery of split genes
[introns and exons]

Roberts’ and Sharp’s discovery has changed our view on how genes in higher organisms develop during evolution. The discovery also led to the prediction of a new genetic process, namely that of splicing, which is essential for expressing the genetic information. The discovery of split genes has been of fundamental importance for today’s basic research in biology, as well as for more medically oriented research concerning the development of cancer and other diseases.

As a consequence of the discovery that genes are often split, it seems likely that higher organisms in addition to undergoing mutations may utilize another mechanism to speed up evolution: rearrangement (or shuffling) of gene segments to new functional units. This can take place in the germ cells through crossing-over during pairing of chromosomes. This hypothesis seems even more attractive following the discovery that individual exons in several cases correspond to building modules in proteins, so-called domains, to which specific functions can be attributed. An exon in the genome would thus correspond to a particular subfunction in the protein and the rearrangement of exons could result in a new combination of subfunctions in a protein. This kind of process could drive evolution considerably by rearranging modules with specific functions.

http://nobelprize.org/nobel_prizes/medicine/laureates/1993/press.html


____________

Part of the Quotes of interest series.




Quotes of interest — long neglected, some noncoding DNA is actually functional.

I have started a series listing quotes from papers published during the supposed period of neglect of noncoding DNA that, we are told repeatedly by authors of various persuasions, was inspired by the “junk DNA” and “selfish DNA” ideas. For this installment, I want to quote at length from one article which represents a typical discussion of some eukaryotic “junk DNA” turning out to have functions. This is the sort of thing we see regularly in the media and in the scientific literature, so a single example should be sufficient.

The protein-coding portions of the genes account for only about 3% of the DNA in the human genome; the other 97% encodes no proteins. Most of this enormous, silent genetic majority has long been thought to have no real function — hence its name: “junk DNA”. But one researcher’s trash is another researcher’s treasure, and a growing number of scientists believe that hidden in the junk DNA are intellectual riches that will lead to a better understanding of diseases (possibly including cancer), normal genome repair and regulation, and perhaps even the evolution of multicellular organisms.
Rather than the genes, junk DNA “is actually the challenge right now,” says Eric Lander of the Massachusetts Institute of Technology, who is himself a prominent Human Genome Project researcher. And in rising to meet that challenge, geneticists are beginning to formulate a new view of the genome. Rather than being considered a catalogue of useful genes interspersed with useless junk, each chromosome is beginning to be viewed as a complex “information organelle,” replete with sophisticated maintenance and control systems — some embedded in what was thought to be mere waste.

…when geneticists started studying complex, multicellular organisms, it was easy to dismiss the vast reaches of non-protein-coding DNA as a wasteland. Now, however, that notion is being overturned as researchers find that junk DNA is not a single midden heap, but a complex mix of different types of DNA, many of which are vital to the life of the cell.

Some of the earliest indications that junk DNA might have important functions came from studies on gene control. Those studies found that genes have regulatory sequences, short segments of DNA that serve as targets for the “transcription factors” that activate genes. Many of the regulatory sequences lie outside the protein-coding sequences — in the genetic garbage can. “There’s at least five regulatory elements for each [human] gene, probably many more,” says gene control expert Robert Tjian of the University of California, Berkeley. “For a long time it wasn’t appreciated how widespread those elements can be, but now it seems that patches of really important regulatory elements can be buried among the junk DNA.”

Now, however, it appears that some repetitive sequences may contain stretches of DNA needed for gene regulation. What is more, the function of these stretches must be significant, because if their sequences go astray they may result in cancer.

But housing sequences that control the genes isn’t the only role that so-called genetic trash plays. Some repetitive sequences also seem to have a crucial function in maintaining the structure of the genome.

Thus, in a dramatic reversal, the repetitive sequences, once thought to be the epitome of genetic debris, now seem to be needed to maintain the integrity of the chromosomes. But the repetitive sequences aren’t the only forms of genetic garbage moving up in the world. Whereas the repetitive sequences are usually found outside genes, a second type of genetic junk, the introns, are scattered through the genes of higher organisms.

Koop and Hood have found that the DNA of the T cell receptor complex, a crucial immune system protein, shows 71% identity between humans and mice. That finding is startling, since only 6% of the DNA encodes the actual protein sequence, while the rest consists of introns and noncoding regions. “[The finding] certainly questions the assumption that introns are junk,” says Koop. Instead, he says, “it fits the view that chromosomes are information organelles that carry out a variety of functions besides encoding genes, such as maintenance of genome structure and gene regulation.”
That opinion appeals to John Mattick, a molecular biologist at the University of Queensland in Australia, currently on sabbatical at Cambridge University in England. Mattick has proposed that introns provide a previously unsuspected system for regulating gene expression.

“[Mattick’s] idea is very interesting indeed,” says evolutionary geneticist Laurence Hurst of Cambridge University, England. “And it’s perfectly testable.” For example, he says, Mattick’s model predicts that certain genes, like regulatory developmental genes, that must be finely controlled, will likely bear intron-encoded regulatory RNAs.

“There’s too many cases of odd RNAs,” says molecular geneticist Marvin Wickens of the University of Wisconsin, Madison. “It smells like there might be a whole family of regulatory RNAs.” And if that suspicion proves correct, it would be a big boost for Mattick’s new theory, as well as for the status of junk DNA — a status that is likely to keep on rising over the next couple of years. Enough gems have already been uncovered in the genetic midden to show that what was once thought to be waste is definitely being transmuted into scientific gold.

You may be wondering why, out of all the discussions like this that are being published, this is the one I have singled out.

For a simple reason: it was written 14 years ago.

Nowak, R. 1994. Mining treasures from ‘junk DNA’. Science 263: 608-610.

I will talk about the timeline of the “junk DNA” discussion more comprehensively later, but here is what we can tell so far. The term “junk DNA” was coined by Ohno (1972), and in the first detailed discussion of the topic (Comings 1972), the likelihood that some noncoding DNA would be functional was explicitly noted. In any case, Ohno (1972) seems not to have had much influence during the first decade after he coined the term, because in 1980, when “selfish DNA” was introduced, the overwhelming tendency was to assume that all noncoding DNA was present because it was adaptive — this is why Orgel and Crick (1980) and Doolittle and Sapienza (1980) wrote their papers. There was strong resistance to the idea of selfish DNA for at least the first few years after the idea was proposed (Doolittle 1982), and even in the late 1980s there was at most discussion about how much noncoding DNA might be parasitic versus functional. Keep in mind also that DNA sequencing did not become a common method until the late 1970s/early 1980s, and that introns weren’t even discovered until 1977, and then much of the study focused on seeing how abundant they were and on their origin (were they present from the beginning, or did they arise only among eukaryotes?). The term “pseudogene” was coined in 1977 as well. So, the kind of work that people expect when they say detailed functional research wasn’t done could not have started until the 1980s in any case, and in fact there was abundant research investigating possible roles of satellite DNA, introns, and transposable elements during that decade. By the early 1990s, people had begun proposing additional functions for noncoding DNA, including Mattick’s idea about regulatory RNA sequences.

In other words, there was no real period in which noncoding DNA was dismissed by the scientific community, though there was a much-needed shift away from strictly adaptive interpretations in the 1980s. Some individual researchers ignored noncoding regions, but there is no gap in the literature beyond what the methods available at the time made unavoidable. The “new” view of noncoding DNA as potentially important has been proclaimed regularly for at least as long as the claimed period of neglect between 1980 and 1994.

One wonders just how long we will be told that we have long been neglecting noncoding DNA.

________

Part of the Quotes of interest series.
________

Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Doolittle, W.F. 1982. Selfish DNA after fourteen months. In Genome Evolution (eds. G.A. Dover and R.B. Flavell), pp. 3-28. Academic Press, New York.

Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.


Quotes of interest — 1980s edition (part one).

I previously posted a few quotes from the original authors of the “junk DNA” and “selfish DNA” hypotheses. These showed that the early discussions of these notions did not rule out possible functions for noncoding DNA. Nevertheless, creationists, many science writers, and far too many biologists insist on claiming that noncoding DNA was long dismissed as unimportant because of these ideas. I will be discussing the history of research in this field in some detail later, but for the time being I thought it would be interesting to give some more quotes from papers written in top journals during the supposed period of disregard of noncoding sequences. This is quote-mining, of course, so you are encouraged to consult the original sources. I have not hand-picked these; rather, they are the types of papers that come up in searches from this period. By all means, if you know of works from any time that claimed that all noncoding DNA is nonfunctional or discouraged research into possible functions, let me know the citation.

There is obviously a continuum of possible selective advantages (positive or negative) to the organism. We had excluded from our definition of selfish DNA those cases where the selective advantage is very high. To decide whether a repeated sequence is parasitic or not, one must determine whether the presence of the repeated sequence in the population is mainly due to the efficiency with which the sequence spreads intragenomically or mainly due to the reproductive success of those individuals in the population who possess repeated copies of the sequence. Only in the former case do we consider it useful to use the term selfish or parasitic DNA, as opposed to useful or symbiotic DNA — the borderline between the two may not be sharp.

In our recent experience most people will agree, after discussion, that ignorant DNA, parasitic DNA, symbiotic DNA (that is, parasitic DNA which has become useful to the organism) and ‘dead’ DNA of one sort or another are all likely to be present in the chromosomes of higher organisms. Where people differ is in their estimates of the relative amounts. We feel that this can only be decided by experiment.

Orgel, L.E., F.H.C. Crick, and C. Sapienza. 1980. Selfish DNA. Nature 288: 645-646.

Perhaps the most surprising discovery in the initial studies of eukaryotic gene structure has been that many genes contain interruptions in the coding sequences. The origin and the function of these intervening sequences (IVS or introns) are not yet well understood but are the subject of intense investigation.

Wallace, R.B., P.F. Johnson, S. Tanaka, M. Schöld, K. Itakura, and J. Abelson. 1980. Directed deletion of a yeast transfer RNA intervening sequence. Science 209: 1396-1400.

As long ago as 1970, Ohno argued, on the basis of genetic load, that much of the eukaryote genome was little more than junk. This viewpoint, which is still unpalatable to many biologists, now has a substantial supporting DNA data base. More recently, this has led Ohno to conclude that genes in the mammalian genome are like ‘oases in a barren desert’ (Ohno 1982) and that for every copy of a new gene that has arisen during evolution, hundreds of other copies have ‘degenerated’ to swell the ranks of junk DNA (Ohno 1985).

It has, in the past, been commonplace to assume that most, if not all, aspects of the morphology, physiology and behaviour of an organism represent adaptive responses to the environment in which that organism lives. This assumption, however, is difficult to test objectively and represents more an article of faith than of fact. Indeed, biologists have become addicted to the adaptationist viewpoint not so much because of the compelling evidence in favour of it, but rather because it seems so eminently logical and reasonable. This view, of course, assumes that functional explanations must necessarily exist for all facets of the bewildering diversity we see within and between genomes. An alternative extreme viewpoint is that eukaryote genomes are, in effect, simply larger, more sophisticated and embellished prokaryote genomes, loaded with non-coding DNA sequences which are in a constant state of flux but without any significant short-term impact on the phenotype.
To decide which, if either, of these interpretations is the more realistic, we need to determine the number of functional genes within a genome and the proportion of these that are developmentally significant. We also require precise information on the changes that go on within a genome at the molecular level and the extent to which these lead to meaningful evolutionary change. Compared to the differences in structural gene composition between related species, we now know that there are much more striking molecular differences in their repeated DNA components. This raises the question of whether this is because such sequences are important or unimportant. There is also a clear need to distinguish between historical chance and biological necessity as causative factors in determining genome structure.

John, B. and G.L.G. Miklos. 1988. The Eukaryote Genome in Development and Evolution. Allen & Unwin, London. pp. 24-25.

Interest in repetitive DNA sequences goes back many years but, as with many aspects of molecular biology, the advent of recombinant DNA technology and DNA sequencing now permits previously unmatched scrutiny of the structures of interest.

If mobility is a reality, and most agree that it probably is, then it seems likely that at least some members of repeat families will have important effects in the genome, even if they have no formal function. Enhancing recombination and altering rates of gene expression are obvious possibilities, while the initiation of new species is a more recondite proposal.

The truth is, however, that the functions of the large and motley collection of repeated DNA families are proving particularly resistant to elucidation. Putative functions are many, including, variously, involvement in chromosome pairing, control of gene expression, processing of messenger RNA precursors, and participation in DNA replication. So far none has been established, save for the single exception of a small family that gives rise to 7S RNA, a molecule that recently was serendipitously discovered to be an essential component of a particle that mediates the secretion of proteins from cells.

Some repetitive DNA will undoubtedly be shown to have a function, in the formal sense; some will likely be shown to exert important effects; and the remainder may well have no function or effect at all and can therefore be called selfish DNA. Repetitive DNA constitutes a substantial proportion of the genome (up to 90 percent in some cases), and there is considerable speculation on how it will eventually be divided between these three groups. Current bets would put a small fraction in the function category, with distribution of the rest rising steeply through the effect and selfish categories.

Satellite DNA unquestionably is a puzzle. What determines the number of copies in a repeat family? And how does the genome tolerate so much of it? Perhaps, as Singer has recently promulgated, just a small fraction of the satellite sequences is essential to some genomic function while the remainder is harmless surplus. This, she indicates, is a comfortable middle ground between the extreme selfish DNA position, which sees no function in all this “junk DNA,” and the adaptationist position, which looks for functions in every structure. The same questions and speculations can be applied to dispersed repetitive DNA.

One observation that might be taken as evidence of function in repeated sequences is the frequency of transcription into RNA. A significant proportion of nuclear RNA contains transcripts of repeated sequences, although 90 percent of this is lost in RNA processing and exit to the cytoplasm. Davidson and his colleagues have shown that in sea urchin the spectrum of repeat families that are transcribed changes during development, an appealing argument for some regulatory function. Most intriguing, however, is the discovery that only a small proportion of any repeat family is ever transcribed. “Most members appear to be quiescent, which must make you cautious when isolating samples in search of their function.”

It is clear that, from their abundance, their unusual structure, and their frequent transcription, dispersed repetitive DNA families cannot be ignored. But it is equally clear that for the most part they, like their tandemly repeated relatives, remain a phenomenon in search of a function.

Lewin, R. 1982. Repeated DNA still in search of a function. Science 217: 621-623.
[Reporting about International Workshop in Highly Repeated DNA, NIH, July, 1982]

Even though the human β-globin complex contains a relatively large number of active genes, 95 percent of the locus is made up of DNA that does not code for proteins. What is the role of this extra DNA, if any? The pseudogenes constitute just a small proportion of the region, although more pseudogenes might exist. Some of the DNA is made up of representatives of well-known families of repetitive sequences. And the remainder is DNA of no known function or comparable sequence.
“We wanted to test the hypothesis that this extra DNA is ‘junk DNA,’” says Jeffreys, “so we compared the β loci in humans, gorillas, and baboons.” Jeffreys and his colleagues reasoned that if it were junk DNA, then over the 20 to 40 million years of evolution represented by humans, apes, and Old World monkeys both the sequence and the overall quantity of intergenic DNA could be expected to vary. “It turned out that the cluster is remarkably stable,” reports Jeffreys. “The overall pattern and size of the cluster is the same, and the rate of nucleotide substitutions is one-quarter to one-fifth of what would be expected in functionless DNA”. The noncoding DNA therefore appears not to be junk, but what function it might perform is still a mystery.

Lewin, R. 1981. Evolutionary history written in globin genes. Science 214: 426-427.

Since the discovery that many eukaryotic genes are discontinuous, a number of studies have been directed towards identifying a function for intervening sequences (IVSs).

Whilst the results presented here point out a clear role for the intron in one tRNA gene family, a common function for all tRNA intervening sequences is not evident. Perhaps tRNA IVSs represent remnants of evolutionary gene rearrangements and only occasionally evolve a role in RNA synthesis. Alternatively, there may be a common but as yet unidentified function for these IVSs, and the role for the IVS described here for tRNA(tyr) may represent an auxiliary use of the precursor RNA. Clearly, analysis of IVS mutants in other tRNA gene families will be necessary to obtain definitive answers to these questions.

Johnson, P.F. and J. Abelson. 1983. The yeast tRNA(tyr) gene intron is essential for correct modification of its tRNA product. Nature 302: 681-687.

Repetitive sequences are interspersed with single-copy regions in the human genome. Because this arrangement is conserved in heterogeneous nuclear (hn) RNA, the role of repetitive sequences in the control of gene expression at the transcriptional and posttranscriptional levels is conjectured.

A large amount of evidence suggests that most double-stranded regions in hnRNA are transcripts of Alu repeats. The presence of the Alu repeat in mRNA may result from incomplete removal of Alu sequences in the nucleus, such that a region of homology to the Alu repeat is preserved. In this regard, we note that the region of association in RNA complexes (120bp) and the average size of R loops in groups III, IV, and V are significantly smaller than the Alu DNA sequence. This observation could also reflect involvement of Alu sequences in mRNA processing. Recently, evidence of molecular interactions among different species of cytoplasmic RNA has been reported. The presence of Alu repeat transcripts in different cytoplasmic molecules of either mRNA or 7S RNA suggests the potential for in vivo occurrence of interactions involving Alu repeat transcripts. Such interactions may also play a role in the cytoplasmic stability or translation efficiency of mRNA.
Finally, we find that it is most intriguing to have detected a significant frequency of complexes in hybridized RNA of normal T lymphocytes but not of placental tissue. This observation could reflect tissue-specific transcription of Alu sequences.

Calabretta, B., D.L. Robberson, A.L. Maizel, and G.F. Saunders. 1981. mRNA in human cells contains sequences complementary to the Alu family of repeated DNA. Proceedings of the National Academy of Sciences of the USA 78: 6003-6007.

The most striking feature of the Alu repeat family is its large numerical representation in the human genome, which suggests that Alu repeat sequences might be involved in genetic rearrangements, a role which could be identified if we consider the human genome to be a dynamic structure. Although most members of the Alu family are scattered throughout the human genome, some may be clustered in certain genomic regions. Such an arrangement would provide a good opportunity to test the hypothesis that repetitive sequences facilitate genetic rearrangements.

The pattern of interspersion may have been fixed in evolution, with certain Alu repeat members having been recruited for specific cellular functions, for example, in the initiation of DNA replication and as promotor sites for RNA polymerase III.

In general, the human genome seems to be a dynamic structure in which variations can be introduced by sequence rearrangement, certain of which can lead to the formation of circular duplex DNA molecules. This genetic plasticity is quite characteristic of transposable elements, and the consequent genome alterations are relevant to evolutionary changes, while the DNA rearrangements may be involved in human cancer.

Calabretta, B., D.L. Robberson, H.A. Barrera-Saldana, T.P. Lambrou, and G.F. Saunders. 1982. Genome instability in a region of human DNA enriched in Alu repeat sequences. Nature 296: 219-225.

Many students of DNA analysis have been unsuspectingly struck by the regularity and length of banding patterns on sequencing gels produced by simple repetitive DNA. When discussing the meaning of these simple DNA sequences, most professional genome watchers hesitate and simply refer to the enormous amount of sometimes unpalatable literature on the subject. They stress the complexity and intractability of the problem of simple sequences (and repetitive DNA as a whole). Investigators working directly in the field of repetitive sequences must justify the relevance of their efforts before their peers and granting agencies. On this truly meaningful and even existential note, I review here certain related members of the family of simple sequences — the GA(TC)A repeats.

Finally, possible functional implications are touched upon by covering RNA expression data of GA(TC)A-containing sequences. Hypotheses on the control of gene expression by GA(TC)A sequences are not covered because the experimental basis is at best scarce in animal systems. Nevertheless, it should be evident from this review that the conception that all the simple repetitive sequences are just “junk” or genes is simplistic. It is interesting but exceedingly difficult to speculate on why they are a characteristic component of the genomes of present-day animals.

All attempts to identify any natural GA(TC)A translation products in eukaryotes, for example, in monoclonal antibodies, proved fruitless. Hence the question of the functional meaning, if any, of simple, tandemly repeated sequences such as GA(TC)A DNA remains unanswered.
Because of the high copy numbers, the analysis of simple repetitive DNA is a serious, difficult, and unspectacular Sisyphean labor. We have learned to question many of the general preconceptions about the functionality of DNA sequences. Merely because they exist in the genomes of more or less related animal species does not mean that they have a function.

Epplen, J.T. 1988. On simple repeated GA(TC)A sequences in animal genomes: a critical reappraisal. Journal of Heredity 79: 409-417.

The slime moulds can therefore help us to investigate the structure and evolution of repetitive DNA in ‘simple’ eukaryotes and to understand how these sequences contribute to the architecture and function of the eukaryotic genome. Several questions remain, including perhaps the most important: do repetitive sequences perform some definable function?

DNA satellites and mobile genetic elements have both seemingly developed or adapted mechanisms which permit their sequences to multiply in eukaryotic genomes. As suggested a number of years ago, and recently reviewed, this line of thinking suggests that most, if not all, families of repetitive sequence may serve no useful function in eukaryotic DNA. This is the ‘selfish’ or ‘junk’ DNA hypothesis. There have been many supporters of the opposing view that at least some families of repeated sequence must perform some useful function, but so far no fully convincing case has been made for a clearly identifiable role for any repeated sequence family other than repeated genes such as those for rRNA. This may mean either that no such functions exist, or that experimentalists have hitherto possibly not been looking in the right direction. What new information has arisen from recent work that may provide clues as to which new directions to take? …

Hardman, N. 1986. Slime moulds and the origin of foldback DNA. BioEssays 5: 105-111.

There have been several suggested explanations for the presence of noncoding intervening sequences in many eukaryotic structural genes. They may be examples of ‘selfish DNA’, conferring little phenotypic advantage, or they may have some importance in gene expression and/or evolution.

It is possible that the relationship between the location of the splice junction in the gene at the surface of the protein confers a biological advantage and hence is a result of natural selection. Introns and their associated splicing systems could be exploited in many ways during the evolution of a protein.

Craik, C.S., S. Sprang, R. Fletterick, and W.J. Rutter. 1982. Intron-exon splice junctions map at protein surfaces. Nature 299: 180-182.

We conclude from this experiment that the intron in the yeast actin gene does not have an observable function. It is possible that the role of the intron is too subtle to be observed in laboratory conditions of growth or that the intron, while having evolutionary significance, has no present role. To conclude that this is true for all yeast genes that contain introns would of course be premature, but there exist strains in which mitochondrial introns have been removed with no observable effect.

Ng, R., H. Domdey, G. Larson, J.J. Rossi, and J. Abelson. 1985. A test for intron function in the yeast actin gene. Nature 314: 183-184.

Solutions to problems of how introns are dealt with by cells do not address the question of why introns are there at all, questions about intron function. Some introns in some genes perform clearly regulatory roles, since splicing factors specific to the tissue or developmental stage decide when and where splicing should occur (Breitbart et al. 1985). In addition, some introns in some genes contain enhancers or modulators of the expression of those genes (Slater et al. 1985). However, the great majority of introns in protein-coding genes have no such “functions.” Direct experimental as well as indirect comparative data show that most introns can be removed from genes without phenotypic effect (Blake 1985). Thus, in terms of beneficial effects on the fitnesses of organisms, we almost certainly cannot account for the presence of the majority of individual introns, nor for the propensity to have introns at all, even though introns may on the average represent as much as 90% of the length of a gene and perhaps as much as half of the total DNA in some complex eukaryotes such as humans.

Thinking about introns challenges basic concepts of adaptation and function. In particular, it challenges the rather strict adaptationist approach that molecular biologists have traditionally taken toward elements of gene structure.

Doolittle, W.F. 1987. The origin and function of intervening sequences in DNA: a review. American Naturalist 130: 915-928.

Ever since the discovery of split genes, there has been a debate about why they are split. This can be resolved into three separate problems: the origin of the introns that split the genes (separating exons from each other), the role of introns in evolution, and their present function, if any.

Rogers, J. 1985. Exon shuffling and intron insertion in serine protease genes. Nature 315: 458-459.

____________

Part of the Quotes of interest series.