"Because" versus "so that".

I want to make a quick point about how evolution works and how it does not. The reason is that two stories about non-coding DNA posted today include a major misconception about evolution. Unfortunately, this is a misconception attributed in the articles to biologists, so I can only imagine what the state of comprehension is among non-scientists.

The distinction is between “because” and “so that”. In evolution, things evolve “because,” meaning that there are causes and effects that can be identified. Why are some strains of bacteria resistant to antibiotics? Because a mutation that happened to be beneficial under the conditions of antibiotic treatment arose and became common in the population over the course of several generations. By contrast, things do not evolve “so that”. Bacteria do not experience mutations so that they will become resistant to antibiotic agents.

Why is there so much non-coding DNA? Because transposable elements spread, or because there are accidental duplications that are not eliminated by selection, or because of the interaction of some other mutational processes and their consequences (or lack thereof). Non-coding DNA did not accumulate in such quantities so that it might someday be useful, or so that it could be co-opted when needed, or so that evolution would have more potential in the form of genetic raw material.

So why, then, do we see quotes like these?

From Wired [One Scientist’s Junk Is a Creationist’s Treasure]:

“I’ve stopped using the term [‘junk’],” Collins said. “Think about it the way you think about stuff you keep in your basement. Stuff you might need some time. Go down, rummage around, pull it out if you might need it.”

From Reuters [Human instruction book not so simple: studies]:

“It is not the sort of clutter that you get rid of without consequences because you might need it. Evolution may need it,” [Collins] said.

That little extra padding might be just what an animal needs to adapt to some unforeseen circumstance, the researchers said. “They may become useful in the future,” Birney said.

The latter quote, by Ewan Birney, illustrates the problem that can arise when a detailed, nuanced discussion is reduced to a short soundbite. I know this from experience, and I suspect that this is what has happened here, given how his very reasonable interpretation is paraphrased in New Scientist [‘Junk’ DNA makes compulsive reading]:

Birney says that the additional switches may be mutations that appear by accident and then generate new slugs of RNA, but because they are produced randomly, most are evolutionarily neutral ‘passengers’ in the genome. There might be rare occasions, however, when a new RNA does confer an advantage.

Collins, on the other hand, seems to have said his bit to two different reporters, so I strain to give him the benefit of the doubt on this one. When I began this blog, I did not think I would be pointing out obvious misconceptions about evolution, genomes, and DNA as propagated by the likes of Collins or Nature. But here we are.


Junk DNA gets Wired.

There is a new article on the Wired website about junk DNA [One Scientist’s Junk Is a Creationist’s Treasure]. I make a very brief appearance in it, and I just want to clarify what I meant by the statement cited (I’m still learning that even an hour-long interview might result in only a short blurb).

My quote is “Function at the organism level is something that requires evidence”. I make this statement because there are several different sorts of DNA sequences in the genome whose presence can be explained even if they do not benefit (and indeed, even if they slightly harm) the organism carrying them. Pseudogenes, satellite DNA, transposable elements (45% of our genome), and other non-coding sequences may or may not be functional — that requires evidence — and some may exist as a result of accidental duplication or even due to selection at the level of the elements themselves (by “intragenomic selection”). The old assumption that all non-coding DNA must be beneficial to the organism or it would have been deleted by now ignores genome-specific processes by which non-coding DNA evolves.

As I have discussed previously, both hardcore adaptationists (if any exist anymore) and creationists have a vested interest in having all non-coding DNA be functional. I believe that real-world variability in genome size argues strongly against such a prospect (this is the point that people like Ohno, Doolittle, Orgel, and Crick made in the 1970s and 1980s), but of course it is possible. The important point is that yes, some non-coding DNA is functional at the organism level (as opposed to existing for its own sake or because there is no strong selection against it). And certainly, non-coding DNA has effects at the organism level. But current evidence suggests that about 5% of the human genome is functional, and even the least conservative ENCODE participants (whose primary, and important, objective is to identify the functional elements and their features) are betting that 20% is functional.

In the end, it is obvious that non-coding DNA is the product of evolution whether or not it all turns out to be functional. The cases in which former parasites (transposons) have taken on function at the organism level are a perfect illustration of co-option, the same basic process that helps to explain the evolution of complex structures like eyes or flagella. Research into the function of non-coding DNA, which the creationists are eager to cite, can be carried out only within an evolutionary framework: it is meaningless to talk about “conserved non-coding DNA sequences” otherwise.

Finally, let me say one thing about Francis Collins’s quote: “Think about it the way you think about stuff you keep in your basement. Stuff you might need some time. Go down, rummage around, pull it out if you might need it.” With all due respect (which is considerable, given his contribution to the Human Genome Project), it makes no sense to explain the existence of non-coding DNA by saying that it might someday prove useful. Evolution does not work that way. Elements might be co-opted, but maintaining this option explains neither the origin nor the persistence of non-coding sequences.

As to what the creationists have to say, well, I leave that to others with more (or less?) patience to attend to.


Effect versus function.

There has been quite a bit of discussion in the media recently about discoveries of [indirect evidence for] functions in [small portions of] non-coding DNA. Unfortunately, the parts in square brackets are often omitted. It is also the case that many reports overlook the important distinction between effect and function, leaving readers with the impression that non-coding DNA can only be either totally insignificant or vitally important.

Here is the relevant part of the Merriam-Webster Dictionary entry on function:

“The action for which a person or thing is specially fitted, used, or responsible or for which a thing exists.”

And on effect:

“Something that is produced by an agent or cause; something that follows immediately from an antecedent; a resultant condition.”

In other words, a function fulfills a specific role to produce a positive result, with a close fit between cause and outcome shaped by either design (in human technology) or natural selection (in biological systems). Effects are also the outcome of identifiable causes, but they can be positive, neutral, or negative and may be generated directly or indirectly by the causal mechanism. Thus, it is not possible to have a function without any effects, but something can exert an effect — perhaps a very important one — without this constituting a function.

Consider an example. The immune system of the body has a clear function: to defend against pathogens. Viruses likewise have functions, but only from the perspective of the viruses themselves and not of their hosts. Specifically, parts of a virus function to allow it to circumvent the host’s immunity and to usurp the host’s replication machinery. Viruses do, however, have effects on hosts: usually negative, but apparently sometimes indirectly positive.

The genomes of eukaryotes consist of many types of DNA sequences. The exons that encode proteins make up a small percentage (less than 2% in humans), and the rest is non-coding DNA of various sorts: introns, pseudogenes, satellite DNA, and especially transposable elements (also called TEs, transposons, or mobile elements). The latter represent a diverse set of sequences that are capable of moving about and duplicating in the genome independently of the normal replication process. In this sense, they are often considered “parasites” of the “host” genome. Overall, TEs also make up the largest portion of non-coding DNA in the genomes analyzed so far (at least 45% in humans), although the particular types, abundances, and levels of activity of TEs vary among species.

Some TEs have evidently been co-opted (exapted) to perform functions at the host level, meaning that they have moved from being parasites to integrated participants in the functioning of the genome. Examples include gene regulation, the DNA cutting-and-splicing mechanism of the vertebrate immune system, and perhaps the cellular stress response. On the other hand, many diseases can result from mutations caused by the insertion of a TE into an existing gene. From the perspective of the host, then, TEs can have different effects depending on the context: some are functional and some are detrimental. The large majority, however, have not been shown to fall into either category.

Nevertheless, a lack of evidence for either function or harm does not mean that TEs are without effects. It is well known that the total amount of DNA (genome size) is linked to cell size, cell division rate, metabolic rate, and developmental rate. Typically, a large genome is found in large, slowly dividing cells within an organism displaying a low metabolic rate and sluggish development. Conversely, organisms with high metabolic rates or rapid development tend to have small genomes. To the extent that total DNA content directly affects cell size and division rate, these can be considered aggregate effects of the presence of non-coding DNA elements.

Is slowing down metabolism or delaying development a function? Some authors think so, but most would argue that these are effects that are tolerated by the organism because they are not overly detrimental. That is, parasites spread within the genome and individually may have little or no effect (and no function), but in sum may have substantial effects on the cell and organism. The amount of accumulation would depend on the tolerance of the organism based on its biology. For example, it is unlikely that a mammal with a high metabolic rate could have a genome the size of a salamander’s.

The point of this discussion is to note that seeking functions for non-coding DNA is an interesting area of research, but that even if most sequences are not functional, they can still be important from a biological perspective. Similarly, one would not invoke function for hosts to explain the existence of viruses, nor would one dismiss viruses as unimportant if functions were never found at the host level. One would, however, focus considerable attention on explaining how viruses spread, why some are more virulent than others, and how they exert their effects.


Genomics, evolution, and health: comparisons of avian flu genomes.

An article by Steven Salzberg and colleagues is set to appear in the May issue of the journal Emerging Infectious Diseases. In it, the authors describe the results of complete genome sequence comparisons for 36 recent isolates of the avian flu virus (influenza H5N1). Their results “clearly depict the lineages now infecting wild and domestic birds in Europe and Africa and show the relationships among these isolates and other strains affecting both birds and humans”. More specifically,

The isolates fall into 3 distinct lineages, 1 of which contains all known non-Asian isolates. This new Euro-African lineage, which was the cause of several recent (2006) fatal human infections in Egypt and Iraq, has been introduced at least 3 times into the European-African region and has split into 3 distinct, independently evolving sublineages.


Figure 1. Phylogenetic tree of hemagglutinin (HA) segments from 36 avian influenza samples. A 2001 strain (A/duck/Anyang/AVL-1/2001) is used as an outgroup at top. Clade V1 comprises the 5 Vietnamese isolates at the bottom of the tree, and clade V2 comprises the 9 Vietnamese isolates near the top of the tree. The European-Middle Eastern-African (EMA) clade contains the remaining 22 isolates sequenced in this study; the 3 subclades are indicated by red, blue, and purple lines. The reassortant strain, A/chicken/Nigeria/1047–62/2006, is highlighted in red.

This is a study in phylogenetics: it reconstructs evolutionary relationships among viral strains using the same tools that many evolutionary biologists use to study the relationships among species. It is well known that viruses evolve very rapidly, and tracking their past changes contributes to the ability to predict future ones. As the authors conclude,

These findings show how whole-genome analysis of influenza (H5N1) viruses is instrumental to the better understanding of the evolution and epidemiology of this infection, which is now present in the 3 continents that contain most of the world’s population. This and related analyses, facilitated by global initiatives on sharing influenza data, will help us understand the dynamics of infection between wild and domesticated bird populations, which in turn should promote the development of control and prevention strategies.

Evolution is not something that only happened to the myriad fossil specimens housed in museum drawers, and evolutionary biology is not merely relevant to academics tucked away in research labs. Evolution is both an ongoing process and an active and exciting area of research. More than ever, an understanding of the processes involved is relevant to the well-being of people from all regions of the world.


Darwin’s death.

Today, April 19th, is the anniversary of Charles Darwin’s death in 1882. I refer you to an excellent post by PZ Myers on Pharyngula about the details of Darwin’s passing [The Death of Darwin].

Darwin is buried at Westminster Abbey in London, within a few yards of Sir Isaac Newton. A bronze bust of Darwin, installed by his family in 1888, forms part of a memorial to several scholars near the grave. The grave itself is very understated: a simple marble slab in the floor bearing his name and the dates of his birth and death.


There is also a memorial to Darwin in Kent, where Down House is located, in the form of a sundial on the side of the local church.


Charles Robert Darwin, 12 February 1809 – 19 April 1882.


Something to ponder.

For those who fear that acknowledging the historical fact of evolution dooms one to a life of bleak insignificance, consider the following.

You are the product of an absolutely unbroken chain of successful ancestors stretching back nearly 4 billion years. In all that time, over billions of generations, not one member of your lineage ever failed to leave viable offspring who, in their turn, left yet more successful descendants. Not a single earthquake, volcanic eruption, meteorite impact, or glacier ever prevented one of your ancestors from contributing to the subsequent generation. Every one of your forebears prevailed in the face of predators, famines, parasites, diseases, and ill fortune. Whether in competition or cooperation, your antecedents triumphed. An untold number of beings have lived and died on this planet, but never — not a single time — did your line falter.

In this, you are not alone. The same is true of every living being on Earth, to whom you are connected directly through converging lines of common ancestry that date back to the very dawn of life. The world did not know you were coming and the machinations of nature did not have you in mind as an end product, yet here you are. As an individual, you and each of your brethren, cousins, and more distant evolutionary relatives represent an exceedingly, remarkably, staggeringly improbable occurrence — and are all the more wonderful for it.


Chimps are not more evolved than humans or anyone else.

I like New Scientist. I even did a short interview with them about a cool genomics story (“How chemicals can speed up evolution”, 6 May 2006, p. 16). But this headline from their news service really annoys me: [Chimps ‘more evolved’ than humans].

The short news article starts out with “It is time to stop thinking we are the pinnacle of evolutionary success…”, which of course is true except that it was time to stop thinking this 150 years ago, and then continues with “… chimpanzees are the more highly evolved species, according to new research”.

What they mean is that, based on the recent study, the rate at which mutations were fixed by positive selection appears to have been higher in the lineage that led to chimpanzees than in the lineage that led to humans since the two split from a common ancestor several million years ago. Which lineage experienced the changes can be inferred by comparison with the macaque genome, which is less closely related to chimps and humans than those two are to each other; without such an external comparison (an outgroup), one cannot say which lineage had changed, only that one or both of them had. Most likely, this reflects differences in long-term historical population sizes in the two lineages (selection is more effective in large populations, genetic drift more influential in small ones).
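
The outgroup logic can be sketched as a simple parsimony rule applied site by site. This is a minimal illustration under stated assumptions: the sequences are already aligned, the macaque retains the ancestral state at each site, and the function names and toy sequences are hypothetical. It is not the analysis used in the study, which involved testing for selection rather than simply counting differences.

```python
from typing import Dict, Optional


def assign_change(human: str, chimp: str, outgroup: str) -> Optional[str]:
    """Place a human-chimp difference at one aligned site on a lineage.

    Assumes the outgroup (here, macaque) carries the ancestral state, so
    whichever ingroup species differs from it is inferred to have changed.
    """
    if human == chimp:
        return None  # no human-chimp difference at this site
    if chimp == outgroup:
        return "human"  # human carries the derived state
    if human == outgroup:
        return "chimp"  # chimp carries the derived state
    return "unresolved"  # all three differ: the change cannot be placed


def count_lineage_changes(human_seq: str, chimp_seq: str, macaque_seq: str) -> Dict[str, int]:
    """Tally inferred changes per lineage across an aligned toy region."""
    if not len(human_seq) == len(chimp_seq) == len(macaque_seq):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"human": 0, "chimp": 0, "unresolved": 0}
    for h, c, m in zip(human_seq, chimp_seq, macaque_seq):
        lineage = assign_change(h, c, m)
        if lineage is not None:
            counts[lineage] += 1
    return counts


# Hypothetical toy alignment, not real sequence data.
human, chimp, macaque = "ACGTACGT", "TCGAACGA", "ACGTACGA"

# Without an outgroup, we only know how many sites differ.
print(sum(h != c for h, c in zip(human, chimp)))  # 3
# With the outgroup, each difference can be assigned to a lineage.
print(count_lineage_changes(human, chimp, macaque))  # {'human': 1, 'chimp': 2, 'unresolved': 0}
```

The two-species comparison alone says only that three sites differ; adding the outgroup places two of those changes on the chimp branch and one on the human branch, which is the kind of inference the macaque genome makes possible.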

Couching this interesting finding in terms of who is “more evolved” than whom is not helpful, even with the scare quotes. As someone who teaches evolution at the upper-year undergraduate level, I can tell you that students come into the class with a lot of preconceptions about evolution, one of them being the notion that some extant species can be ranked as “more evolved” than others. It is subtle misinformation like this, compounded over many years, that makes my job harder by the time they arrive in my course.

Please, please, PLEASE stop appealing to common misconceptions about evolution in news stories, even if the headline will catch the attention of (previously misinformed) readers.


Genome sequences reduce the complexity of bacterial flagella.

I am not interested in engaging in debates with anti-evolutionists, though I am well aware of their key arguments. The big one, of course, is “irreducible complexity” — traits or features that supposedly could not have evolved because there is no conceivable function for their parts individually nor for a subset of their parts collectively. The bacterial flagellum apparently is the ultimate example of this, which explains why this microscopic protein “motor” can drive an entire philosophical argument along these lines.

I think Darwin said it best (as he often did) in 1871: “Ignorance more frequently begets confidence than does knowledge; it is those who know little, and not those who know much, who so positively assert that this or that problem will never be solved by science.”

There is little doubt among biologists that the evolution of bacterial flagella will be worked out, just as a tremendous amount of information is now available about the evolution of eyes (the previous Paleyan example of a supposedly un-evolvable structure).

Last year, Pallen and Matzke (2006) presented a discussion of how bacterial flagella may have evolved, based in large part on comparisons of sequences from the various protein components. Many of the proteins that make up a flagellum have homologues that serve non-flagellar functions, strongly suggesting that they were co-opted from pre-existing proteins during the evolution of flagella. (See Matzke (2003) for a detailed model of flagellar evolution, and Miller (2004) for a discussion of flagella.) Specifically, there is ever-mounting evidence that bacterial flagella and the type III secretion system (TTSS) that some pathogenic bacteria use to inject toxins into host cells are descended from the same ancestral structure. The fact that the TTSS lacks many of the proteins found in flagella but remains functional (for toxin injection rather than locomotion) clearly indicates that not all of the parts need to be present for the structure to carry out some function.

Pallen and Matzke (2006) noted that further comparisons of complete genome sequences (hence the post on this blog) would reveal additional insights into the evolution of flagella. Enter Liu and Ochman (2007) from the next issue of PNAS.

Liu and Ochman (2007) examined complete genome sequences from 41 species of flagellated bacteria and identified a core set of 24 proteins common to all of them, which were likely present in a very early ancestral bacterium. Moreover, the core genes appear to be the product of multiple rounds of duplication and diversification, perhaps from a single original precursor gene.

The gist of the story is that 1) some genes involved in the construction of flagella in modern bacteria are clearly co-opted from pre-existing genes that were doing something else in the cell (Pallen and Matzke 2006) and 2) a core of about two dozen genes common to all flagellated bacteria (and presumably found in their common ancestor) is the product of duplication and divergence whose reconstructed history agrees very well with the presumed evolutionary relationships among bacteria (Liu and Ochman 2007).

This just goes to show the usefulness of genome data for addressing questions that, for the reason outlined by Darwin, seem unanswerable to some. It also opens the door to some exciting future work.

I asked Howard Ochman what he thought the next key steps would be in this line of study. As he put it, “Naturally we would like to know the function of the structures that were specified by the ancestral set of flagellar genes, and how/why these genes remained functional through their successive duplications. We just completed a companion paper on the bacterial flagellar genes that arose later, and we are now branching out into the other domains of life.”

I will positively assert, out of optimism rather than ignorance, that many more important insights will be forthcoming from these investigations.

(Update: Nick Matzke is very critical of the paper. He also has posted an updated critique that focuses more on the data.)

(Another update: See Carl Zimmer’s post about blogging as scientific debate).

(And yet another update: A complex tail, simply told at ScienceNOW)

_________

References

Aizawa, S.-I. 2001. Bacterial flagella and type III secretion systems. FEMS Microbiology Letters 202: 157-164.

Blocker, A., K. Komoriya, and S.-I. Aizawa. 2003. Type III secretion systems and bacterial flagella: insights into their function from structural similarities. Proceedings of the National Academy of Sciences of the USA 100: 3027-3030.

Gophna, U., E.Z. Ron, and D. Graur. 2003. Bacterial type III secretion systems are ancient and evolved by multiple horizontal-transfer events. Gene 312: 151-163.

Liu, R. and H. Ochman. 2007. Stepwise formation of the bacterial flagellar system. Proceedings of the National Academy of Sciences of the USA 104: 7116-7121.

Matzke, N.J. 2003. Evolution in (Brownian) space: a model for the origin of the bacterial flagellum. Talk.Origins.

Miller, K.R. 2004. The flagellum unspun. In Debating Design: From Darwin to DNA, edited by W. Dembski and M. Ruse. Cambridge University Press, New York, pp. 81-97.

Musgrave, Ian. 2004. Evolution of the bacterial flagellum. In Why Intelligent Design Fails: A Scientific Critique of the New Creationism, edited by M. Young and T. Edis. Rutgers University Press, New Brunswick, NJ.

Nguyen, L., I.T. Paulsen, J. Tchieu, C.J. Hueck, and M.H. Saier. 2000. Phylogenetic analyses of the constituents of Type III protein secretion systems. Journal of Molecular Microbiology and Biotechnology 2: 125-144.

Pallen, M.J., C.W. Penn, and R.R. Chaudhuri. 2005. Bacterial flagellar diversity in the post-genomic era. Trends in Microbiology 13: 143-149.

Pallen, M.J., S.A. Beatson, and C.M. Bailey. 2005. Bioinformatics, genomics and evolution of non-flagellar type-III secretion systems: a Darwinian perspective. FEMS Microbiology Reviews 29: 201-229.

Pallen, M.J. and N.J. Matzke. 2006. From The Origin of Species to the origin of bacterial flagella. Nature Reviews Microbiology 4: 784-790.


Yet another link between dinosaurs and birds.

Score another point for Darwin’s bulldog. T.H. Huxley, who argued nearly 140 years ago that birds are descended from dinosaurs (Huxley 1868), would undoubtedly be pleased with himself were he privy to the wealth of data that has accrued in support of his conjecture. Evidence from the rather good fossil records of dinosaurs and birds strongly supports this notion, and now two additional lines of evidence can be added in its favour.

The first relates to my own area of study, genome size diversity. In a recent paper in Nature, Organ et al. (2007) used the relationship between genome size and cell size in extant taxa (in this case, the size of osteocyte spaces in bone) to infer genome sizes for extinct dinosaurs from various lineages (see Zimmer 2007 for a helpful overview). I have promised to post a detailed discussion of the genome size-flight issue another time, so here I will briefly note that the main finding is that saurischian dinosaurs, the lineage from within which birds are thought to have descended, had small genomes relative to other reptiles, including ornithischian dinosaurs. This suggests that genome size reduction began (but probably did not culminate) prior to the origin of avian flight, and it provides yet another intriguing link between saurischian dinosaurs and birds.

The second was reported in papers by Asara et al. (2007) and Schweitzer et al. (2007) in the April 13 issue of Science, which revealed intriguing similarities between the amino acid sequences of collagen protein fragments from a 68-million-year-old Tyrannosaurus rex bone and those of a modern chicken. Specifically, the T. rex sequences showed higher amino acid sequence identity with chicken (58%), and presumably with other modern birds, than with frog or newt (51%). Note that collagen sequences are usually quite conserved and that the authors were working with protein fragments. Note also that some other interesting comparisons, especially with other living archosaurs (namely alligators or crocodiles), were not possible based on the currently available data. There is plenty more to do as follow-up to this study, but it is a very interesting result. Summaries of the studies can be found in National Geographic and the New York Times.
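
The percentages above are sequence identities. As a minimal sketch of what such a figure means, assuming two fragments have already been aligned, percent identity is the share of aligned positions at which the residues match. The function below, its gap convention (skipping any column that contains a gap), and the toy sequences are hypothetical illustrations, not the procedure or data of Asara et al.; other conventions, such as dividing by the full alignment length, give different numbers.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Columns where either sequence has a gap are skipped, so identity is
    computed only over positions where both residues are present.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped columns under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


# Hypothetical collagen-like fragments (note the Gly-X-Y repeat), not published data.
query = "GPPGAPGPQG"
print(percent_identity(query, "GPPGSPGPQG"))  # 90.0 (9 of 10 positions match)
print(percent_identity(query, "GAPG--GPQG"))  # 87.5 (7 of 8 ungapped positions match)
```

With fragments this short, a single residue difference shifts the result by about ten percentage points, which is one reason to interpret identity values from small protein fragments with some caution.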

This is a nice example of the way in which independent sources of information converge on a common conclusion in evolutionary biology, and how new discoveries simultaneously can raise novel questions and provide innovative means by which to approach them.

———–

References

Asara, J.M., M.H. Schweitzer, L.M. Freimark, M. Phillips, and L.C. Cantley. 2007. Protein sequences from mastodon and Tyrannosaurus rex revealed by mass spectrometry. Science 316: 280-285.

Huxley, T.H. 1868. On the animals which are most nearly intermediate between birds and the reptiles. Annals and Magazine of Natural History, Series 4, 2: 66-75.

Organ, C.L., A.M. Shedlock, A. Meade, M. Pagel, and S.V. Edwards. 2007. Origin of avian genome size and structure in non-avian dinosaurs. Nature 446: 180-184.

Schweitzer, M.H., Z. Suo, R. Avci, J.M. Asara, M.A. Allen, F.T. Arce, and J.R. Horner. 2007. Analyses of soft tissue from Tyrannosaurus rex suggest the presence of protein. Science 316: 277-280.

Zimmer, C. 2007. Jurassic genome. Science 315: 1358-1359.