Genome sequences reduce the complexity of bacterial flagella.

I am not interested in engaging in debates with anti-evolutionists, though I am well aware of their key arguments. The big one, of course, is “irreducible complexity” — traits or features that supposedly could not have evolved because there is no conceivable function for their parts individually nor for a subset of their parts collectively. The bacterial flagellum apparently is the ultimate example of this, which explains why this microscopic protein “motor” can drive an entire philosophical argument along these lines.

I think Darwin said it best (as he often did) in 1871: “Ignorance more frequently begets confidence than does knowledge; it is those who know little, and not those who know much, who so positively assert that this or that problem will never be solved by science.”

There is little doubt among biologists that the evolution of bacterial flagella will be worked out, just as a tremendous amount of information is now available about the evolution of eyes (the previous Paleyan example of a supposedly un-evolvable structure).

Last year, Pallen and Matzke (2006) presented a discussion of how bacterial flagella may have evolved, based in large part on comparisons of sequences from the various protein components. Many of the proteins that make up a flagellum have homologues that serve non-flagellar functions, strongly suggesting that they were co-opted from pre-existing proteins during the evolution of flagella. (See Matzke’s detailed model of flagellar evolution here and a video based on it here, and Ken Miller talking about flagella here.) Specifically, there is ever-mounting evidence that bacterial flagella and the type III secretion system (TTSS) that pathogenic bacteria use to inject toxins into host cells are descended from the same ancestral structure. The fact that the TTSS lacks many of the proteins in flagella but remains functional (for toxin injection rather than locomotion) clearly indicates that not all the parts need to be present for some function to be carried out by the structure.

Pallen and Matzke (2006) noted that further comparisons of complete genome sequences (hence the post on this blog) would reveal additional insights into the evolution of flagella. Enter Liu and Ochman (2007) from the next issue of PNAS.

Liu and Ochman (2007) examined complete genome sequences from 41 species of bacteria with flagella, and were able to identify a core set of 24 proteins common to all of them, which was present in a very early ancestral bacterium. Not only this, but the core genes appear to be the product of multiple rounds of duplication and diversification, perhaps of one original precursor gene.
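
For readers who like to see an idea in code: a "core set" in this sense is simply the intersection of the gene inventories of all the genomes being compared. The sketch below is a toy illustration only, not Liu and Ochman's actual pipeline (which would first need to establish which genes are homologous across species). The genome labels and the presence/absence pattern are invented, and the gene names serve merely as labels.

```python
# Hypothetical gene inventories: genome label -> set of flagellar gene names.
# The labels and the presence/absence pattern are made up for illustration.
genomes = {
    "genome_A": {"fliC", "flgE", "fliG", "motA", "fliS"},
    "genome_B": {"fliC", "flgE", "fliG", "motA", "flgM"},
    "genome_C": {"fliC", "flgE", "fliG", "fliS"},
}


def core_genes(inventories):
    """Return the genes found in every inventory (the 'core' set)."""
    if not inventories:
        return set()
    return set.intersection(*inventories.values())


core = core_genes(genomes)
print(sorted(core))
```

With real data, the interesting part is not the intersection itself but deciding when two genes in different genomes count as "the same" gene, which this sketch assumes has already been done.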

The gist of the story is that 1) some genes involved in the construction of flagella in modern bacteria are clearly co-opted from pre-existing genes that were doing something else in the cell (Pallen and Matzke 2006) and 2) a core of about two dozen genes common to all flagellated bacteria (and presumably found in their common ancestor) is the product of duplication and divergence whose reconstructed history agrees very well with the presumed evolutionary relationships among bacteria (Liu and Ochman 2007).

This just goes to show the usefulness of genome data for addressing questions that, for the reason outlined by Darwin, seem unanswerable to some. It also opens the door to some exciting future work.

I asked Howard Ochman what he thought the next key steps will be in this line of study. As he put it, “Naturally we would like to know the function of the structures that were specified by the ancestral set of flagellar genes, and how/why these genes remained functional through their successive duplications. We just completed a companion paper on the bacterial flagellar genes that arose later, and we are now branching out into the other domains of life.”

I will positively assert, out of optimism rather than ignorance, that many more important insights will be forthcoming from these investigations.

(Update: Nick Matzke is very critical of the paper. He also has posted an updated critique that focuses more on the data.)

(Another update: See Carl Zimmer’s post about blogging as scientific debate).

(And yet another update: A complex tail, simply told at ScienceNOW)

_________

References

Aizawa, S.-I. 2001. Bacterial flagella and type III secretion systems. FEMS Microbiology Letters 202: 157-164.

Blocker, A., K. Komoriya, and S.-I. Aizawa. 2003. Type III secretion systems and bacterial flagella: insights into their function from structural similarities. Proceedings of the National Academy of Sciences of the USA 100: 3027-3030.

Gophna, U., E.Z. Ron, and D. Graur. 2003. Bacterial type III secretion systems are ancient and evolved by multiple horizontal-transfer events. Gene 312: 151-163.

Liu, R. and H. Ochman. 2007. Stepwise formation of the bacterial flagellar system. Proceedings of the National Academy of Sciences of the USA 104: 7116-7121.

Matzke, N.J. 2003. Evolution in (Brownian) space: a model for the origin of the bacterial flagellum. Talk.Origins.

Miller, K.R. 2004. The flagellum unspun. In Debating Design: From Darwin to DNA, edited by W. Dembski and M. Ruse. Cambridge University Press, New York, pp. 81-97.
(available online here)

Musgrave, I. 2004. Evolution of the bacterial flagellum. In Why Intelligent Design Fails: A Scientific Critique of the New Creationism, edited by M. Young and T. Edis. Rutgers University Press, New Brunswick, NJ.
(available online here)

Nguyen, L., I.T. Paulsen, J. Tchieu, C.J. Hueck, and M.H. Saier. 2000. Phylogenetic analyses of the constituents of Type III protein secretion systems. Journal of Molecular Microbiology and Biotechnology 2: 125-144.

Pallen, M.J., C.W. Penn, and R.R. Chaudhuri. 2005. Bacterial flagellar diversity in the post-genomic era. Trends in Microbiology 13: 143-149.

Pallen, M.J., S.A. Beatson, and C.M. Bailey. 2005. Bioinformatics, genomics and evolution of non-flagellar type-III secretion systems: a Darwinian perspective. FEMS Microbiology Reviews 29: 201-229.

Pallen, M.J. and N.J. Matzke. 2006. From The Origin of Species to the origin of bacterial flagella. Nature Reviews Microbiology 4: 784-790.


Whose genome?

The term “genome” is oft-heard but seldom defined, and indeed has more than one meaning. Little wonder, then, that discussions about genome sequences and comparisons thereof can leave otherwise interested audiences more frustrated than enlightened. “What is a genome?” and “whose genome was sequenced?” are legitimate questions, and what follows is an attempt at clarification that is, by necessity, as much philosophical as scientific.

Definition #1: In a broad sense, a genome can be considered as the collective set of genes, non-coding DNA sequences, and all their variants that are located within the chromosomes of members of a given species. This definition does not consider variation among individuals within a species, and instead relates to distinctions between species. It is possible to apply such a definition because, for the most part, animal species do not share DNA extensively and hence their respective gene pools remain distinct (in fact, this forms the basis for defining species under some views). Thus, even though humans and chimpanzees are about 98% identical in terms of their DNA sequences, there is still such a thing as a “human genome” and a “chimpanzee genome” rather than a continuum with humans and chimps at two mildly divergent extremes. This is even true of far closer (but now extinct) relatives of humans such as Neanderthals; on average, the sections of Neanderthal DNA that have been recovered and sequenced are 99.5% identical to the corresponding human sequences, but these, too, are considered to be part of a separate genome.

The genomic similarities described between species are usually based on comparing a few specific regions of DNA from a small number of representative individuals. If other factors are included in the comparison, such as insertions and deletions of DNA, then any two genomes will register a lower level of similarity — say, 95% for chimpanzees and humans rather than 98%. And indeed, no one would ever mistake a chimpanzee genome for a human genome, in part because they differ in DNA amount and chromosome number (human chromosome 2 is a product of fusion of what remain as two separate chromosomes in other great apes).
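
To make the point about insertions and deletions concrete, here is a minimal sketch of how a percent-identity figure shifts depending on whether gap columns in an alignment are ignored or counted as differences. The two aligned sequences are made up, and the function name and its `count_gaps` option are my own for illustration; real genome comparisons involve far more elaborate alignment and filtering.

```python
def percent_identity(seq_a, seq_b, count_gaps=False):
    """Percent identity between two already-aligned sequences.

    Columns containing a gap ('-') are skipped by default; with
    count_gaps=True they are scored as mismatches instead.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            if count_gaps:
                compared += 1  # gap column counted as a difference
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy alignment (invented): one substitution and one 2-bp indel.
aln_1 = "ACGTACGTAC--GTACGTAC"
aln_2 = "ACGTACGTACTTGTACGAAC"

print(f"substitutions only: {percent_identity(aln_1, aln_2):.1f}%")
print(f"indels included:    {percent_identity(aln_1, aln_2, count_gaps=True):.1f}%")
```

The same pair of sequences yields a lower similarity once the indel columns are counted, which is the effect described above for the 98% versus 95% figures.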

Of course, individuals within species are not genetically identical to one another (monozygotic twins notwithstanding), which leads to definition #2.

Definition #2: Because the DNA sequences of even close family members are not identical, it can also be said that each individual carries a unique genome consisting of the DNA in his or her chromosomes. In this case, the focus is entirely on one species and the important factor is the variability that exists among individuals of that species. In terms of DNA sequences on a large scale, members of the same species are extremely similar: overall, any two human beings are probably about 99.9% the same genetically. Nevertheless, complete genome sequencing, though conducted primarily under definition #1, has revealed two major sources of variation among individuals. The first consists of single nucleotide polymorphisms (SNPs, “snips”), which are as their name implies: differences at the level of single base pairs that are present in at least 1% of the population. It is estimated that there are some 3 million SNPs in the human genome (definition #1), with one occurring about every 100-300 base pairs along the more than 3 billion base pair sequence. The second major source of variation, first described in 2006, consists of copy number variants (CNVs). These involve differences among individuals in the insertion and deletion of larger DNA segments. CNVs have proved to be far more common than anyone would have imagined, and can result in differences not just in sequences but in the sizes of genomes among individuals (up to 20 million base pairs in humans, or about 0.5%).
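
The figures quoted above are easy to put side by side with a little arithmetic. The sketch below is only a back-of-envelope check: it rounds the genome to 3 billion base pairs (an assumption, since the text says "more than 3 billion") and shows what average spacing a given SNP count implies, what SNP count a given spacing implies, and what fraction of the genome 20 million base pairs represents. The function and constant names are mine.

```python
GENOME_BP = 3_000_000_000  # rounded genome size (assumption)
SNP_COUNT = 3_000_000      # SNP estimate quoted in the text
CNV_MAX_BP = 20_000_000    # upper CNV size difference quoted in the text


def average_spacing(genome_bp, n_sites):
    """Average number of base pairs per variable site."""
    return genome_bp / n_sites


def sites_for_spacing(genome_bp, spacing_bp):
    """Number of sites implied by one site every `spacing_bp` base pairs."""
    return genome_bp / spacing_bp


spacing = average_spacing(GENOME_BP, SNP_COUNT)
low = sites_for_spacing(GENOME_BP, 300)
high = sites_for_spacing(GENOME_BP, 100)
cnv_fraction = CNV_MAX_BP / GENOME_BP

print(f"{SNP_COUNT:,} SNPs -> one per {spacing:,.0f} bp on average")
print(f"one per 100-300 bp -> {low:,.0f} to {high:,.0f} sites")
print(f"CNV size difference -> {cnv_fraction:.2%} of the genome")
```

Taken at face value, 3 million sites corresponds to an average spacing nearer 1,000 base pairs, while a spacing of 100-300 base pairs corresponds to a count in the tens of millions, so the two SNP figures are best read as rough estimates that may rest on different ways of counting.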

The human genome, definition #1

Two independent research groups reported draft sequences of the human genome in February 2001: the publicly-funded and internationally collaborative Human Genome Project and the private company headed by J. Craig Venter known as Celera Genomics. The interaction of these two initiatives – typically branded as competitive, but also mutually informative – has been discussed many times. The question of how and why the two groups sequenced human DNA is not the subject of interest here – the question at present is whose DNA they analyzed.

The Human Genome Project, being a public effort, had an official policy of releasing all sequence data to public databases within 24 hours of completion, thereby making the information freely available to anyone who carried a copy of the “human genome” in their cells. In keeping with this outlook, the HGP implemented procedures intended to circumvent the focus on individuality and to keep their results in line with definition #1. Thus, they instituted a policy of voluntary donation by dozens of men and women from various ethnic backgrounds, provided samples with random numeric labels, shipped the samples to processing laboratories where they were re-labeled with new randomized codes, destroyed all records of previous labels, and then selected randomly from among the samples. Five to ten samples were collected for every one that was actually assayed, with the source of samples used unknown to both researchers and donors. In other words, the intent was to focus on definition #1 as much as possible and to provide a mixture, or at least a mystery, when it came to the source of the genome sequence of Homo sapiens.

Human nature being what it is, it is likely that most people would find this answer disappointing. Deep down, we want to know whose genome it was. The only information that has been available in this regard is that the largest portion of the source DNA came from a male donor in Buffalo, New York, code named “RPCI-11” (for Roswell Park Cancer Institute, where the genomic library was generated). No name, no other information, and yet somehow it seems satisfying to know that there really is an individual human – a real person in a specific part of the world who walked into a lab, stuck out his arm, and donated his blood – corresponding to all those A’s, T’s, G’s, and C’s.

The situation at Celera was quite different in terms of both data sharing and DNA sampling policies. Celera’s data were not made publicly available during the course of sequencing, and their sampling involved 20 donors, five of whom were selected for analysis, though evidently not entirely at random. In fact, it was later revealed that Celera’s president and lead investigator, J. Craig Venter, was the primary source of DNA for the sequence. Venter argued that revealing this fact would dispel the myth of a single “human genome” (i.e., an excessive emphasis on definition #1 that ignores the individual uniqueness inherent in definition #2). Others may have felt that sequencing his own genome made the resulting sequence the property of one individual rather than of humanity at large (i.e., adopting definition #2 exclusively at the expense of the broadly shared definition #1).

The human genome, definition #2

While the Human Genome Project and Celera’s efforts generated single (partially composite) genome sequences, another major initiative is underway which focuses on variation among individuals at the genomic level; i.e., on definition #2. The International HapMap Project aims to identify associated collections of SNPs known as haplotypes, and currently includes samples from 270 people drawn from four major groups. Thirty sets of “trios” (two parents and a child) have come from the Yoruba people of Ibadan, Nigeria. Forty-five unrelated individuals from Tokyo and 45 from Beijing have provided samples. Thirty trios from residents of the United States with roots in western and northern Europe have also been included. SNP haplotypes may vary among populations and are important in the search for particular genes of medical significance. As with the DNA sequence information of the Human Genome Project, data from the HapMap Project are made freely available. A similar initiative to catalogue human diversity from the perspective of CNVs, the Copy Number Variation Project, has also been launched.

“Whose genome?” and individual identity

The question “whose genome was sequenced?” is predicated on concepts of individuality and personhood, which tend to be applied to the members of only a handful of species. Thus, one might be interested in which strain of fruit fly (Drosophila melanogaster), which population of sea urchins (Strongylocentrotus purpuratus), or which varieties of rice (Oryza sativa) had been sequenced, but it would not make sense to ask “who” the fly, sea urchin, or rice plant was. The situation gains complexity when dealing with vertebrate genomes because humans associate closely and emotionally with members of some species and not with others. The desire (or not) to know “who” was sequenced correlates directly with this. By way of example, consider the fact that a single male pufferfish (Takifugu rubripes), a single female chicken (red jungle fowl, Gallus gallus), two female brown rats (Rattus norvegicus), and a small number of female mice (Mus musculus) of the B6 strain have been sequenced, but that there has not been much interest in “who” these individuals were – nor would many people even think to ask the question.

Now consider “man’s best friend”. Not only is it known that the two reported canine genome sequences were from individual dogs, but it is known who those dogs were: Craig Venter’s poodle, Shadow, and a boxer named Tasha. It was also widely noted that samples for the chimpanzee genome were taken from a captive-born male named Clint who lived at the Yerkes National Primate Research Center in Atlanta, Georgia. Indeed, many a news story reported Clint’s untimely death in 2005 at the young age of 24. One might be tempted to argue that intelligence is the determining factor in this case – dogs and chimps are smart and have personalities, but pufferfish and rats do not. Perhaps. But surely the recently sequenced rhesus macaque (whatever her name was) should qualify under these criteria.

In the end, this post is not meant to be a statement about the apparent arbitrariness of our decisions to grant or deny individuality to members of other species. This is about genomes, and how definition #1 is applied intuitively and automatically when dealing with a species like mouse or rat, whereas one cannot help but invoke definition #2 when dealing with a dog or human. The fact is that all of these species are composed of variable individuals, each with a unique genome under definition #2. Indeed, it is this variation that makes evolutionary divergence – and thus definition #1 – possible at all.


Macaque genome published.

The April 13 issue of Science includes a collection of papers reporting and analyzing the sequence of the macaque (Macaca mulatta) genome. This marks the third primate genome to be sequenced (after human in 2001 and chimpanzee in 2005). Needless to say, comparisons of three genomes are far more informative than analyses involving only one or two sequences, and the papers contained in the special issue of Science already include some novel insights of evolutionary and medical significance that were previously unattainable. Carl Zimmer at The Loom provides a general summary of some key findings.

There is, rightly, a lot of interest in comparing genes among the three primate species. Non-coding DNA also gets a much-deserved amount of attention; in fact, this time we are fortunate enough to see an entire paper devoted to transposable elements. One general finding of interest relates to the number of transposable elements in the three genomes, which is remarkably similar (and quite high) in the three species. Here is the breakdown:


No wonder Ford Doolittle once remarked, probably only half-jokingly, that “our genomes … might be ironically viewed as vehicles for the replication of Alu sequences”. They do, after all, outnumber protein-coding genes by about 50:1.

The Genomes OnLine Database (GOLD) provides a list of other completed and forthcoming genome sequences. The macaque is only the latest in a rapidly growing list of genome projects that will continue to provide exciting new information about the evolution of genomes and the organisms carrying them.