Quote-mine this!

So, I have recently become aware that Genomicron is cited on an intelligent design wiki entry for “junk DNA”. They quote two paragraphs from my post A word about “junk DNA”. Specifically, a paragraph in which I critique the term “junk DNA” as unnecessarily implying non-function for all non-coding DNA, and a paragraph in which I list many (unsubstantiated) hypotheses about universal functions for non-coding DNA. Here are two paragraphs from the post that they don’t quote:

To satisfy this expectation, creationist authors (borrowing, of course, from the work of molecular biologists, as they do no such research themselves) simply equivocate the various types of non-coding DNA, and mistakenly suggest that functions discovered for a few examples of some types of non-coding sequences indicate functions for all (see Max 2002 for a cogent rebuttal to these creationist confusions). Case in point: a few years ago, much ado was made of Beaton and Cavalier-Smith’s (1999) titular proclamation, based on a survey of cryptomonad nuclear and nucleomorphic genomes, that “eukaryotic non-coding DNA is functional”. The point was evidently lost that the function proposed by Beaton and Cavalier-Smith (1999) was based entirely on coevolutionary interactions between nucleus size and cell size.

Does non-coding DNA have a function? Some of it does, to be sure. Some of it is involved in chromosome structure and cell division (e.g., telomeres, centromeres). Some of it is undoubtedly regulatory in nature. Some of it is involved in alternative splicing (Kondrashov et al. 2003). A fair portion of it in various genomes shows signs of being evolutionarily conserved, which may imply function (Bejerano et al. 2004; Andolfatto 2005; Kondrashov 2005; Woolfe et al. 2005; Halligan and Keightley 2006). On the other hand, the largest fraction is comprised of transposable elements — some of which become co-opted by the host genome, some of which play a major role in generating genomic variation, some of which may be involved in cellular stress response, and yet others of which remain detrimental to host fitness (Kidwell and Lisch 2001; Biémont and Vieira 2006). The upshot is that some non-coding DNA is most certainly functional — but when it is, this usually makes sense only in an evolutionary context, particularly through processes like co-option. More broadly, those who would attribute a universal function to non-coding DNA must bear the following in mind: any proposed function for all non-coding DNA must explain why an onion or a grasshopper needs five times more of it than anyone reading this sentence.

Funny that my post Function, non-function, some function: a brief history of junk DNA, in which I discuss how anti-evolutionists are wrong about the history and the science of non-coding DNA, is not quoted.

Here’s a quote they are welcome to use: Simply saying “junk DNA will turn out to have a function” is not a scientifically actionable prediction unless you specify what that function will be and a way to test the proposed function.



Effect versus function.

There has been quite a bit of discussion in the media recently about discoveries of [indirect evidence for] functions in [small portions of] non-coding DNA. Unfortunately, the parts in square brackets are often omitted. It is also the case that many reports overlook the important distinction between effect and function, leaving readers with the impression that non-coding DNA can only be either totally insignificant or vitally important.

Here is the relevant part of the Merriam-Webster Dictionary entry on function:

“The action for which a person or thing is specially fitted, used, or responsible or for which a thing exists.”

And on effect:

“Something that is produced by an agent or cause; something that follows immediately from an antecedent; a resultant condition.”

In other words, a function fulfills a specific role to produce a positive result, with a close fit between cause and outcome shaped by either design (in human technology) or natural selection (in biological systems). Effects are also the outcome of identifiable causes, but they can be positive, neutral, or negative and may be generated directly or indirectly by the causal mechanism. Thus, it is not possible to have a function without any effects, but something can exert an effect — perhaps a very important one — without this constituting a function.

Consider an example. The immune system of the body has a clear function: to defend against pathogens. Viruses likewise have functions, but this only makes sense if one considers the issue from the perspective of the viruses themselves and not of their hosts. Specifically, parts of the virus function in allowing them to circumvent the host’s immunity and to usurp its replication machinery. Viruses do, however, have effects on hosts — usually negative, but apparently sometimes indirectly positive.

The genomes of eukaryotes consist of many types of DNA sequences. The exons that encode proteins make up a small percentage (less than 2% in humans), and the rest is non-coding DNA of various sorts: introns, pseudogenes, satellite DNA, and especially transposable elements (also called TEs, transposons, or mobile elements). The latter represent a diverse set of sequences that are capable of moving about and duplicating in the genome independently of the normal replication process. In this sense, they are often considered “parasites” of the “host” genome. Overall, TEs also make up the largest portion of non-coding DNA in the genomes analyzed so far (at least 45% in humans), although the particular types, abundances, and levels of activity of TEs vary among species.

Some TEs have evidently been co-opted (exapted) to perform functions at the host level, meaning that they have moved from being parasites to integrated participants in the functioning of the genome. This includes regulating genes, involvement in the genetic cutting-and-splicing mechanism of the vertebrate immune system, and perhaps cellular stress response. On the other hand, many diseases can result from mutations caused by the insertion of a TE into an existing gene. From the perspective of the host, TEs can have different effects depending on the context: some TEs are functional but some are detrimental. The large majority, however, have not been shown to fall into either category.

Nevertheless, a lack of evidence for either function or harm does not mean that TEs are without effects. It is well known that the total amount of DNA (genome size) is linked to cell size, cell division rate, metabolic rate, and developmental rate. In other words, a large genome is typically found in large, slowly dividing cells within an organism displaying a low metabolic rate and sluggish development. Conversely, organisms with high metabolic rate or rapid development tend to have small genomes. To the extent that total DNA content directly affects cell size and division, these can be considered effects — by their presence in the aggregate — of non-coding DNA elements.

Is slowing down metabolism or delaying development a function? Some authors think so, but most would argue that these are effects that are tolerated by the organism because they are not overly detrimental. That is, parasites spread within the genome and individually may have little or no effect (and no function), but in sum may have substantial effects on the cell and organism. The amount of accumulation would depend on the tolerance of the organism based on its biology. For example, it is unlikely that a mammal with a high metabolic rate could have a genome the size of a salamander’s.

The point of this discussion is to note that seeking functions for non-coding DNA is an interesting area of research, but that even if most sequences are not functional, they can still be important from a biological perspective. Similarly, one would not invoke function for hosts to explain the existence of viruses, nor would one dismiss viruses as unimportant if functions were never found at the host level. One would, however, focus considerable attention on explaining how viruses spread, why some are more virulent than others, and how they exert their effects.


Signs of selfish DNA.

Some people have trouble understanding the fact that eukaryotic genomes are made up primarily of transposable elements — genetic parasites that are there (at least initially) because they are good at being there. Some take on functions, but others remain causes of disease or simply become inactivated and remain as molecular fossils.

Here is one of those pictures that circulates around in emails that I find amusing, but which also conveys something about the concept of transposable elements in the genome:


Basically, the sign appears to exist largely for the sake of existing. But note that a small portion of it at the bottom has a critical function (“Also, the bridge is out ahead”).

It’s a weak analogy, maybe, but that is a pretty funny sign.


Suggestions for science writers.

In an earlier post, I expressed some frustration at the way discoveries about non-coding DNA are reported. I noted in particular ScienceDaily’s description of the recent publication of the opossum genome sequence. In case you missed it, here it is again:

Opossum Genome Shows ‘Junk’ DNA Source Of Genetic Innovation

(…)

The research, released Wednesday (May 9) also illustrated a mechanism for those regulatory changes. It showed that an important source of genetic innovation comes from bits of DNA, called transposons, that make up roughly half of our genome and that were previously thought to be genetic “junk.”

The research shows that this so-called junk DNA is anything but, and that it instead can help drive evolution by moving between chromosomes, turning genes on and off in new ways.

(…)

It had been initially thought that most of a creature’s DNA was made up of protein-coding genes and that a relatively small part of the DNA was made up of regulatory portions that tell the rest when to turn on and off.

As studies of mammalian genomes advanced, however, it became apparent that that view was incorrect. The regulatory part of the genome was two to three times larger than the portion that actually held the instructions for individual proteins.

Since my post, National Geographic has gotten in on the act as well:

First Decoded Marsupial Genome Reveals “Junk DNA” Surprise

(…)

The study reveals a surprising role in human evolution for “jumping genes”—parasitic bits of “junk DNA” that until now were thought to be nothing more than a nuisance—and may also lead to a number of medical breakthroughs.

(…)

The scientists were also surprised to find that these regulatory sequences have in large part been distributed across the human genome by so-called jumping genes.

These genes have hopped through chromosomes for more than a billion years, leaving behind many copies of themselves. So until now the genes had been widely regarded by scientists as parasites, or “junk DNA,” that played no creative role in evolution.

You can consult my earlier posts for specific complaints on this. More generally, I have the following suggestions for science writers who are reporting on interesting findings about non-coding DNA.

1) Don’t assume that every new discovery is overthrowing some recalcitrant conventional wisdom.

If you want to claim that all scientists have long believed that all non-coding DNA is totally functionless, kindly point to a few examples. Here are a few cases that suggest that you may have a bit of trouble with this.

When Barbara McClintock first characterized transposable elements in 1950, she called them “controlling elements”. Comings (1972), who gave the first detailed discussion of “junk DNA” (his paper, unlike Ohno’s, was an explicit discussion of the topic of “junk DNA” and appeared in print before Ohno [1972], which he cites as “in press”), stated that “being junk doesn’t mean it is entirely useless.”

Orgel and Crick (1980), in their paper introducing the concept of “selfish DNA”, noted very clearly that:

It would be surprising if the host organism did not occasionally find some use for particular selfish DNA sequences, especially if there were many different sequences widely distributed over the chromosomes. One obvious use … would be for control purposes at one level or another. This seems more than plausible.

Doolittle and Sapienza (1980), whose paper appeared along with that of Orgel and Crick (1980), were equally unambiguous on the issue:

We do not deny that prokaryotic transposable elements or repetitive and unique-sequence DNAs not coding for proteins in eukaryotes may have roles of immediate phenotypic benefit to the organism. Nor do we deny roles for these elements in the evolutionary process. We do question the almost automatic invocation of such roles for DNAs whose function is not obvious, when another and perhaps simpler explanation for their origin and maintenance is possible.

2) Don’t imply, intimate, or suggest, directly or indirectly, that the discovery of function in some non-coding DNA sequences means that all non-coding DNA is functional.

Remember to implement the onion test if you are tempted to argue otherwise. Note that simply having “junk DNA found to be functional” as a headline with no qualification or clarification commits the fallacy as well. The last two examples of poor reporting (see here and here) have neglected to mention that the amount of non-coding DNA that was shown to be conserved and presumably functional is less than 5% of the genome. I imagine that a reader’s interpretation may change somewhat when this important detail is made clear.

I am as excited as anyone about new discoveries in genome biology. I have also been critical of the tendency to focus too much on protein-coding genes or simple allele frequency changes in evolutionary science (Gregory 2005). But it does not follow that every new finding is revolutionary in and of itself, nor is it the case that non-coding DNA has been dismissed as unimportant for decades and that its relevance is only now being admitted by stubborn academics. The commentaries of people like Comings, Ohno, Orgel and Crick, and Doolittle and Sapienza were made in response to an overemphasis on functional explanations for all non-coding DNA, but even they did not reject the potential importance of some non-coding elements.

There is a growing frustration among scientists relating to the unnecessary search for “balance” in journalists’ reporting. What I see happening with non-coding DNA is the opposite of this, though equally problematic. To wit, many writers are painting a monochromatic picture of genome biologists when in fact there has always been a full spectrum of opinions regarding the importance of non-coding sequences. The material is exciting; it doesn’t need to be embellished with exaggerated controversy to be worth reading about.

___________

References

Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Gregory, T.R. 2005. Macroevolution and the genome. In The Evolution of the Genome (ed. T.R. Gregory), pp. 679-729. Elsevier, San Diego.

McClintock, B. 1950. The origin and behavior of mutable loci in maize. Proceedings of the National Academy of Sciences of the USA 36: 344-355.

Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.



Non-coding DNA and the opossum genome.

The genome sequence of the gray short-tailed opossum, Monodelphis domestica, was published in today’s issue of Nature (Mikkelsen et al. 2007). It is interesting for many reasons, including its status as the first marsupial genome to be sequenced, its relatively large genome size, and low chromosome number (2n = 18). It is also interesting because it contains a similar number of genes (18,000 – 20,000) to humans, the vast majority of which exhibit close associations with the genes of placental mammals. Also, in keeping with the hypothesis that transposable elements are the dominant type of DNA in most eukaryotic genomes, the comparatively large opossum genome is comprised of 52% transposable elements, the most for any amniote sequenced so far.

One of the most intriguing discoveries about the opossum genome is that changes to protein-coding genes seem not to have been the driving force behind mammalian diversification. Instead, non-coding elements with regulatory functions — mostly derived from formerly parasitic transposable elements — appear to underlie much of the difference.

Now, I would prefer to just talk about the science here, noting that this is yet another great example of the complex nature of genome evolution, the key role played by “non-standard” genetic processes (Gregory 2005), and the ever-increasing relevance of non-coding DNA in genomics. But, inevitably, I must comment on how this discovery has been reported. Here is what ScienceDaily (which I otherwise like a great deal) said about it:

Opossum Genome Shows ‘Junk’ DNA Source Of Genetic Innovation

(…)

The research, released Wednesday (May 9) also illustrated a mechanism for those regulatory changes. It showed that an important source of genetic innovation comes from bits of DNA, called transposons, that make up roughly half of our genome and that were previously thought to be genetic “junk.”

The research shows that this so-called junk DNA is anything but, and that it instead can help drive evolution by moving between chromosomes, turning genes on and off in new ways.

(…)

It had been initially thought that most of a creature’s DNA was made up of protein-coding genes and that a relatively small part of the DNA was made up of regulatory portions that tell the rest when to turn on and off.

As studies of mammalian genomes advanced, however, it became apparent that that view was incorrect. The regulatory part of the genome was two to three times larger than the portion that actually held the instructions for individual proteins.

I will just reiterate two brief points, as I have already dealt with some of these topics in earlier posts (and will undoubtedly have to do so again in the future). One, very few people have actually argued that all non-coding DNA is 100% functionless “junk”, and no one is surprised anymore when a regulatory or other function is observed for some non-coding DNA sequences. Moreover, transposable elements are more commonly labeled as “selfish DNA”, and it has been noted in countless articles that they can and do take on functions at the organism level even if they begin as parasites at the genome level. Two, yet again we are talking about a small portion of the genome such that this should not be considered a demonstration that all non-coding DNA is functional. In particular, the authors identified about 104 million base pairs of DNA that is conserved (i.e., shared and mostly invariant) among mammals, about 29% of which overlapped with protein-coding genes. In other words, about 74 million base pairs of non-coding DNA, much of it derived from former transposable elements, is found to be conserved among mammals and shows signs of being functional in regulation. The genome size of the opossum is probably around 3,500 million bases, which means that this functional non-coding DNA makes up about 2% of the genome.
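For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The inputs are simply the approximate figures quoted above (about 104 million conserved base pairs, about 29% overlapping protein-coding genes, a genome of roughly 3,500 million bases); the variable and function names are my own and purely illustrative, and the result is only as good as those rough inputs.

```python
# Back-of-the-envelope check of the figures quoted in the post.
# All inputs are the approximate values given in the text, not exact measurements.

CONSERVED_BP = 104e6      # conserved among mammals (approximate)
CODING_OVERLAP = 0.29     # share of conserved DNA overlapping protein-coding genes
GENOME_SIZE_BP = 3.5e9    # rough opossum genome size used in the text


def conserved_noncoding_bp(conserved_bp: float, coding_overlap: float) -> float:
    """Conserved base pairs that do not overlap protein-coding genes."""
    return conserved_bp * (1.0 - coding_overlap)


def fraction_of_genome(bp: float, genome_size_bp: float) -> float:
    """Share of the whole genome represented by `bp` base pairs."""
    return bp / genome_size_bp


noncoding_bp = conserved_noncoding_bp(CONSERVED_BP, CODING_OVERLAP)
share = fraction_of_genome(noncoding_bp, GENOME_SIZE_BP)

print(f"Conserved non-coding DNA: about {noncoding_bp / 1e6:.0f} million bp")
print(f"Share of the genome: about {share:.1%}")
```

With these inputs the conserved non-coding portion comes out at roughly 74 million base pairs, or a little over 2% of the genome, which matches the rounded numbers given above.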

A note to science writers. There is nothing surprising about some sequences of non-coding DNA having an important function. The notion that all non-coding DNA has long been assumed to be completely functionless junk is a straw man. And to avoid misleading readers, you really need to specify that most examples of non-coding DNA with a function represent a very small portion of the total genome.

___________

References

Gregory, T.R. 2005. Macroevolution and the genome. In The Evolution of the Genome (ed. T.R. Gregory), pp. 679-729. Elsevier, San Diego.

Mikkelsen, T.S., M.J. Wakefield, B. Aken, C.T. Amemiya, J.L. Chang, S. Duke, M. Garber, A.J. Gentles, L. Goodstadt, A. Heger, J. Jurka, M. Kamal, E. Mauceli, S.M.J. Searle, T. Sharpe, M.L. Baker, M.A. Batzer, P.V. Benos, K. Belov, M. Clamp, A. Cook, J. Cuff, R. Das, L. Davidow, J.E. Deakin, M.J. Fazzari, J.L. Glass, M. Grabherr, J.M. Greally, W. Gu, T.A. Hore, G.A. Huttley, M. Kleber, R.L. Jirtle, E. Koina, J.T. Lee, S. Mahony, M.A. Marra, R.D. Miller, R.D. Nicholls, M. Oda, A.T. Papenfuss, Z.E. Parra, D.D. Pollock, D.A. Ray, J.E. Schein, T.P. Speed, K. Thompson, J.L. VandeBerg, C.M. Wade, J.A. Walker, P.D. Waters, C. Webber, J.R. Weidman, X. Xie, M.C. Zody, J.A.M. Graves, C.P. Ponting, M. Breen, P.B. Samollow, E.S. Lander, and K. Lindblad-Toh. 2007. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 447: 167-177.


What do non-coding DNA and sleep have in common?

Like many scientists, I make use of Web of Science and PubMed to alert me when papers in my field of research are published or when one of my articles is cited. The latter may sound vain, but in fact it is helpful because one’s papers sometimes are cited in unexpected ways that would probably not be discovered by one’s routine literature searches.

This post is about an intriguing example of a connection that I never would have drawn myself and which I probably would not have seen in the literature had I not been alerted that some of my articles on genome size and non-coding DNA had been cited. Specifically, this involves two recent papers on what I would have assumed was a totally unrelated topic: sleep.


I am sure that everyone reading this blog is familiar with the importance of sleep in a physiological sense — if you don’t sleep for a large portion (~1/3!) of every day, both your mind and body suffer. However, in an evolutionary sense, why we sleep remains something of a puzzle. As Savage and West (2007) put it, “Sleep is one of the most noticeable and widespread phenomena occurring in multicellular animals. Nevertheless, no consensus for a theory of its origins has emerged.”

So what does this have to do with non-coding DNA?

In the first paper mentioned above, Savage and West (2007) put forth a framework for studying the function of sleep that relates to cellular damage repair and brain reorganization. This includes linkages with metabolism, body size, and cell size, the latter of which is related to genome size, and thus to the quantity of non-coding DNA. Moreover, the amount of DNA per genome may be related to the genome’s susceptibility to mutational damage (though this remains to be established, and one could come up with a priori arguments for why the relationship might be positive or negative). So, in this case, the amount of non-coding DNA may relate to one or more of the proposed functions of sleep, and thus be relevant to an adaptive interpretation of the question. Interesting stuff.

The second paper, by Rial et al. (2007), takes a rather different approach to the question. They argue that sleep per se is not really necessary — rest would suffice. These authors invoke non-coding DNA not as a possible factor of interest in explanations of sleep, but as an analogy. In particular, Rial and colleagues lament the fact that sleep is almost always approached from an adaptive standpoint because it is relatively complex. “However,” they write, “complexity is not by itself firm proof of adaptation. A well known, complex, but seemingly useless structure, could be invoked as a metaphor for the uselessness of many sleep signs.” That complex, (possibly) largely functionless feature is, you guessed it, non-coding DNA. Sleep, under their interpretation, may be more like non-coding DNA than like the eye, in that it is not the product of adaptation but rather reflects a byproduct of other processes. In addition, just as some non-coding DNA takes on a secondary function, so may some components of sleep.

I am not qualified to comment on the scientific merit of either hypothesis, so I will not say any more about their specific arguments. I am, however, pleasantly surprised to see non-coding DNA making an appearance, both mechanistically and conceptually, in discussions of what I would have thought was an unrelated subject of inquiry. It just goes to show the interconnectedness of science, and the importance of reading outside the bounds of one’s own specialized field.

______________

References

Rial, R.B., M. del Carmen Nicolau, A. Gamundi, M. Akaarir, S. Aparicio, C. Garau, S. Tejada, C. Roca, L. Gene, D. Moranta, and S. Esteban. 2007. The trivial function of sleep. Sleep Medicine Reviews, in press.

Savage, V.M. and G.B. West. 2007. A quantitative, theoretical framework for understanding mammalian sleep. Proceedings of the National Academy of Sciences of the USA 104: 1051-1056.


Comments on "Noncoding DNA and Junk DNA" (re-post).

The following is a re-post of my comments on the recently posted Noncoding DNA and Junk DNA at Sandwalk. Needless to say, I am quite pleased to see such active discussion about non-coding DNA. Passages in italics are excerpts from the original article.

TR Gregory said…

Ryan Gregory has serious doubts about the usefulness of the term as he explains in his excellent article A word about “junk DNA”.

Just to clarify, I think the term could be useful — indeed, it was useful when Ohno coined it. The problem is that it is seldom used in an appropriate way. If the meaning were specified explicitly to be “regions strongly suspected of being non-functional with evidence to back it up” (which, incidentally, is not the original definition according to Ohno (1972) or Comings (1972)), and if people used it only in this way, then I would not have a problem with this. But given the difficulty that people seem to have in accepting that some DNA may truly not have a function at the organism level, I don’t know if we could ever get it to be used with such precision.

…a new term, Junctional DNA, to describe DNA that probably has a function but that function isn’t known… think we don’t need to go there. It’s sufficient to remind people that lots of DNA outside of genes has a function and these functions have been known for decades.

That neologism was suggested in response to Minkel’s appeal for a term that would “make the distinction between functional and nonfunctional noncoding DNA clear to a popular audience”. My main suggestion was to call DNA by what it is known to be, if at all possible, by function (“regulatory DNA”, “structural DNA”) or by type (“pseudogene”, “transposable element”, “intron”). Your definition of “junk DNA” is also more precise than most usages, meaning that you specify that the term only be applied to sequences for which there is evidence (not just assumption) of non-function. That leaves us with something in between for journalists to talk about with a catchy buzzword. “Junctional DNA” lets them specify that we’re not talking about “junk DNA” or “functional DNA” — i.e., there is some evidence for function (e.g., being conserved) but no evidence of what that function is. The main utility would be to stop the very frustrating leap that gets made from “this 1% of the genome may have a function, so the whole thing must have this function” kind of reporting. Now they could say “another 1% has moved into the category of ‘junctional DNA’”. I think that would be considerably less misleading than current wording.

Note that I’m avoiding the term “noncoding” DNA here. This is because to me the term “coding DNA” only refers to the coding region of a gene that encodes a protein … there are many genes for RNAs that are not properly called coding regions so they would fall into the noncoding DNA category … introns in eukaryotic genomes would be “noncoding DNA” as far as I’m concerned. I think that Ryan Gregory and others use the term “noncoding DNA” to refer to all DNA that’s not part of a gene instead of all DNA that’s not part of the coding region of a protein encoding gene. I’m not certain of this.

By definition, non-coding DNA is, and always has been, everything other than exons. The reason this is relevant is that early work in genome biology assumed that there should be a 1 to 1 correspondence between DNA content and protein-coding gene number. This is work that occurred for at least two decades before the discovery of introns, pseudogenes, and other non-coding DNA. Now we have more descriptive names for the categories of DNA that are not the genes, all the genes, and nothing but the genes. I actually don’t know of anyone else who would have a problem calling introns, pseudogenes, and regulatory regions “non-coding DNA”. Certainly, Ohno, Crick, and many others have historically put introns in the same non-protein-coding grouping as pseudogenes. It’s just a category — you also have more specific subcategories to apply to each of the types of non-coding DNA. Perhaps your objection relates to an undue emphasis on the distinction between exons and everything else — well, that’s the history of the past half century of this field, so it should be no surprise that the terminology reflects this.

Read Gregory’s article for the short concise version of this dispute. What it means is that junk DNA threatens the worldviews of both Dembski and Dawkins!

Not quite. What you’re leaving out of this is the possibility of multiple levels of selection. In the original edition of The Selfish Gene (1976, p.76), Dawkins argued that “the simplest way to explain the surplus DNA is to suppose that it is a parasite, or at best a harmless but useless passenger, hitching a ride in the survival machines created by the other DNA”. Cavalier-Smith (1977) drew a similar conclusion (before he had read Dawkins), and Doolittle and Sapienza (1980) and Orgel and Crick (1980) [yes, that Crick] independently developed the concept of “selfish DNA” a few years later. This is an explicitly multi-level selection approach because it specifies that non-coding DNA can be present due to selection within the genome rather than exclusively on the organism (or gene, in Dawkins’s case) (see, e.g., Gregory 2004, 2005). (Incidentally, this idea of parasitic DNA dates back at least to 1945, when Gunnar Östergren characterized B chromosomes in this fashion). Of course, they tended to do what Ohno did and applied this one idea to all non-coding DNA, which is too ambitious. The modern view is more pluralistic (see, e.g., Pagel and Johnstone 1992 vs. Gregory 2003). Some non-coding DNA is just accumulated “junk” (in the definition of evidence-supported non-function that you espouse). Some (perhaps most) is “selfish” or “parasitic” and persists because there is selection within the genome as well as on organisms (in fact, an argument could be, and has been, made that “selfish DNA” would be a much more accurate term than “junk DNA” for most non-coding DNA). Some non-coding DNA is clearly functional at the organism level, including regulatory regions and chromosome structure components. Some of these latter functional non-coding DNA sequences are derived from elements that originally were of one of the first two types, most notably transposable elements that take on a regulatory function through co-option (or, in another manner of thinking, that undergo a shift in level of selection).

Junk DNA is not noncoding DNA and anyone who claims otherwise just doesn’t know what they’re talking about.

I’m afraid I don’t follow what you mean here. By your definition, “junk DNA” is any non-functional sequence of DNA, including pseudogenes (i.e., the original meaning). Those sequences do not encode proteins. Hence, your version of junk DNA is non-coding. I think this reflects the confusion that is imposed by the term “junk DNA”, which is why I generally think it is more obfuscating than enlightening.

________

References

Cavalier-Smith, T. 1977. Visualising jumping genes. Nature 270: 10-12.

Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Dawkins, R. 1976. The Selfish Gene. Oxford University Press, Oxford.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Gregory, T.R. 2003. Variation across amphibian species in the size of the nuclear genome supports a pluralistic, hierarchical approach to the C-value enigma. Biological Journal of the Linnean Society 79: 329-339.

Gregory, T.R. 2004. Macroevolution, hierarchy theory, and the C-value enigma. Paleobiology 30: 179-202.

Gregory, T.R. 2005. Macroevolution and the genome. In The Evolution of the Genome (ed. T.R. Gregory), pp. 679-729. Elsevier, San Diego.

Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.

Östergren, G. 1945. Parasitic nature of extra fragment chromosomes. Botaniska Notiser 2: 157-163.

Pagel, M. and R.A. Johnstone. 1992. Variation across species in the size of the nuclear genome supports the junk-DNA explanation for the C-value paradox. Proceedings of the Royal Society of London, Series B: Biological Sciences 249: 119-124.


Junctional DNA.

JR Minkel at the Scientific American blog has responded to the post on Evolgen about his earlier story regarding “junk DNA” (did you catch all that?). At the end of the post, he asks:

Scientists and scientist bloggers: Again, do you care [if journalists call it junk DNA]? If so, what term would you propose instead, or how would you make the distinction between functional and nonfunctional noncoding DNA clear to a popular audience?

Yes, I care, and here are my suggestions. If you mean the general category without any speculation either way about function, then it is simply and accurately “noncoding DNA”. If it has a function, then you specify what that function is: “regulatory DNA” or “structural DNA” or what have you. If the type of sequence is known, then you can use that as well or instead: “transposable elements” or “mobile DNA” or “pseudogenes” or “introns”. Maybe readers won’t know what those terms mean. This is a good opportunity to inform them.

What is missing is a term to describe a given collection of noncoding DNA for which there is thought to be some function, but for which that function and/or the type of sequence is unknown. This would reside somewhere between “junk DNA” (in the vernacular sense) and “functional DNA” (to which specific names can be applied). I therefore suggest the neologism “junctional DNA” to encompass this category. Note that Petsko (2003) suggested “funk DNA” to represent “functionally unknown DNA”, but I think “junctional DNA” is a little less, uh, funky.

Let me be even more specific. The proposed term “junctional DNA” derives from a dual etymology: 1) a simple portmanteau of “junk” and “functional”; 2) an indication that the sequences so described reside at the crossroads between DNA with no evident function and that with a clear function.

Two terms in one day — “the onion test” and “junctional DNA” — how ’bout that.

Incidentally, my annoyance with such reports has less to do with the terminology than with the fact that the highly conserved sequences in question make up about 5% of the total genome. To jump from this to imply that all noncoding DNA is recognized as functional is inappropriate and misleading. I also wish they would cite the source papers they reference; some of us would like to look up the primary material when we see a summary in a news story.

_______________

Update: Other bloggers (RPM of Evolgen in personal correspondence, Sandwalk) seem to think this term is not needed. I would point out that this post was written in direct response to Minkel’s appeal for a term that would “make the distinction between functional and nonfunctional noncoding DNA clear to a popular audience”. Given that a journalist sees the need for such a term, and that it was coined in response to that need, I think ‘junctional DNA’ could be a useful term.


The onion test.

I am not sure how official this is, but here is a term I would like to coin right here on my blog: “The onion test”.

The onion test is a simple reality check for anyone who thinks they have come up with a universal function for non-coding DNA [1]. Whatever your proposed function, ask yourself this question: Can I explain why an onion needs about five times more non-coding DNA for this function than a human?

The onion, Allium cepa, is a diploid (2n = 16) plant with a haploid genome size of about 17 pg. Human, Homo sapiens, is a diploid (2n = 46) animal with a haploid genome size of about 3.5 pg. This comparison is chosen more or less arbitrarily (there are far bigger genomes than onion, and far smaller ones than human), but it makes the problem of universal function for non-coding DNA clear [2].

Further, if you think perhaps onions are somehow special, consider that members of the genus Allium range in genome size from 7 pg to 31.5 pg. So why can A. altyncolicum make do with one fifth as much regulation, structural maintenance, protection against mutagens, or [insert preferred universal function] as A. ursinum?

Left, A. altyncolicum (7 pg); centre, A. cepa (17 pg); right, A. ursinum (31.5 pg).
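For anyone who wants to check the arithmetic behind the comparisons above, here is a minimal sketch in Python. The genome sizes are the ones quoted in this post; the dictionary and the function name `fold_difference` are just illustrative choices of mine. Note that these are ratios of total haploid genome size, and the sketch makes no attempt to separate coding from non-coding sequence.

```python
# Haploid genome sizes in picograms (pg), as quoted in the post.
GENOME_SIZE_PG = {
    "Homo sapiens": 3.5,
    "Allium altyncolicum": 7.0,
    "Allium cepa": 17.0,
    "Allium ursinum": 31.5,
}


def fold_difference(larger: str, smaller: str) -> float:
    """Return how many times bigger one haploid genome is than another."""
    return GENOME_SIZE_PG[larger] / GENOME_SIZE_PG[smaller]


# Onion versus human: the "about five times" in the onion test.
onion_vs_human = fold_difference("Allium cepa", "Homo sapiens")

# Largest versus smallest Allium genome mentioned in the post.
within_allium = fold_difference("Allium ursinum", "Allium altyncolicum")

print(f"A. cepa vs. H. sapiens: {onion_vs_human:.1f}x")
print(f"A. ursinum vs. A. altyncolicum: {within_allium:.1f}x")
```

The first ratio comes out just under five and the second at four and a half, which is where "about five times" and "one fifth as much" come from.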


There you have it. The onion test. To be applied to any ambitious claims that a universal function has been found for non-coding DNA.

____________

[1] I do not endorse the use of the term “junk DNA”, which I think has deviated far too much from its original meaning and is now little more than a loaded buzzword; the descriptive term “non-coding DNA” is what I use to refer to the majority of eukaryotic sequences (of various types) that do not encode protein products.

[2] Some non-coding DNA certainly has a function at the organismal level, but this does not justify a huge leap from “this bit of non-coding DNA [usually less than 5% of the genome] is functional” to “ergo, all non-coding DNA is functional”.



Macaque genome published.

The April 13 issue of Science includes a collection of papers reporting and analyzing the sequence of the macaque (Macaca mulatta) genome. This marks the third primate genome to be sequenced (after human in 2001 and chimpanzee in 2005). Needless to say, comparisons of three genomes are far more informative than analyses involving only one or two sequences, and the papers contained in the special issue of Science already include some novel insights of evolutionary and medical significance that were previously unattainable. Carl Zimmer at The Loom provides a general summary of some key findings.

There is, rightly, a lot of interest in comparing genes among the three primate species. Non-coding DNA also gets a much-deserved amount of attention; in fact, this time we are fortunate enough to see an entire paper devoted to transposable elements. One general finding of interest is that the number of transposable elements is remarkably similar (and quite high) across the three genomes. Here is the breakdown:


No wonder Ford Doolittle once remarked, probably only half-jokingly, that “our genomes … might be ironically viewed as vehicles for the replication of Alu sequences”. They do, after all, outnumber protein-coding genes by about 50:1.

The Genomes OnLine Database (GOLD) provides a list of other completed and forthcoming genome sequences. The macaque is only the latest in a rapidly growing list of genome projects that will continue to provide exciting new information about the evolution of genomes and the organisms carrying them.