Genome size is good for you.

I imagine that every practicing scientist has experienced, in one form or another, the tendency of many non-scientists to expect all research to be directly beneficial to human health and well-being. I used to respond facetiously to these kinds of expectations when expressed by friends or family members, with something along the lines of “My work has absolutely no practical applications to human welfare whatsoever”.

Of course, this is not true. Genome size is becoming very relevant to fields of inquiry that are likely to have major significance for medicine. Notably, genome size data provide an important indication of the cost and difficulty of sequencing a given genome, and thus represent a prime criterion in the choice of sequencing targets. As an example, I performed a genome size estimate for Biomphalaria glabrata, a planorbid snail that serves as an intermediate host for the trematode flatworm Schistosoma mansoni, which causes the debilitating disease known as schistosomiasis. The genome of B. glabrata is one of the smallest reported so far for a gastropod, and is now being sequenced (along with that of S. mansoni).

More recently, Jenner and Wills (2007) made explicit mention of genome size as an important factor in deciding on the next set of models for evo-devo studies. Discoveries regarding the fundamental genetic underpinnings of development have obvious implications for medical science and here, too, genome size is increasingly seen as important. As they put it,

Whole-genome sequences are an increasingly important resource for many biological disciplines, including evo–devo [15, 49, 50]. However, financial and technical constraints mean that there is currently a preference for species with small genomes. This compounds the bias that is already introduced by the big six. First, putatively general conclusions about genome evolution might actually be specific to those smaller genomes that have been fully sequenced. For example, when focusing only on sequenced genomes, a close correspondence between genome size and gene number in eukaryotes is observed. The C-value paradox becomes apparent only when genome-size data from non-sequenced genomes is included [51]. Second, there are important genetic, morphological, physiological and ecological correlates of genome size in a range of animals and plants [51, 52]. Some correlates seem ubiquitous in animals and plants, such as those between genome size and cell size, body size and the inverse of developmental rate [52]. Others are group specific: genome size correlates mostly with metabolic rate in homeotherms, but with developmental type and ecology in amphibians [53], and is positively correlated with egg size in copepods, plethodontid salamanders and fishes [51, 52, 54]. Studying these correlated traits in phylogenetically disparate taxa could illuminate the relationships between small genome size and rapid development, as well as the evolution of strongly cell-lineage-dependent development in taxa such as tunicates and nematodes, and the partial fragmentation of their Hox clusters [55, 56].

References 51, 52, and 53 in that paragraph are papers of mine, so again I am forced to admit that my work may have some practical application after all.

My main focus is on genome size diversity in eukaryotes, which mostly means differences among species in the abundance of noncoding DNA. In bacteria, most of the genome is composed of protein-coding genes, so unlike in eukaryotes there is a very strong correlation between genome size and gene number. Genome size is generally small in parasites and endosymbionts and larger in free-living species (probably because population bottlenecks and relaxed selection on gene function result in gene loss by deletion bias in bacteria associated with hosts [Mira et al. 2001]).

But this observation is not the link between genome size and human health that I had in mind for this post. In this month’s issue of Antimicrobial Agents and Chemotherapy, Steven Projan argues that genome size is associated with the evolution of antibiotic resistance in bacteria. In Dr. Projan’s own words,

It is observed here that the ability of a given bacterium to evolve toward a multidrug resistance phenotype is a function of genome size. In Table 1, a number of examples are provided, but even an expanded analysis shows that this observation holds true. That is, the larger the genome the greater the propensity of a bacterium to display multidrug resistance phenotypes and the smaller the genome the less likely it is that antibacterial resistance will emerge and disseminate within that species. What is proposed here is that, just as there is a continuum of genome sizes among bacteria, there is a continuum in the ability or propensity of a bacterium to become “multidrug resistant” and that continuum is reflected in the size of the genome. This is not to say that we do not observe resistance to certain agents even in organisms with the smallest genomes (macrolide resistance appears in virtually every pathogen at some level). There is probably a solid biological reason for this observation; organisms with larger genomes are more adaptable to environmental changes because they have more (genetic) information to draw upon. It appears that organisms with smaller genomes have become more “specialized,” residing in particular environmental niches (Treponema pallidum and the Chlamydiae are cases in point), and their lack of versatility in adapting to different environments is also manifest in an inability to develop mechanisms for coping with antibiotics. Indeed, we have learned that virtually each and every time a bacterium either acquires a novel resistance determinant or a mutant strain arises with decreased susceptibility to an antibacterial drug, the bacterium experiences a “fitness burden.” With time, compensatory mutations are selected in which the bacterium accumulates mutations that allow for something like wild-type growth in a strain that is now phenotypically resistant (e.g., topA mutations in gyrB mutant strains). 
Bacteria with larger genomes simply have a greater opportunity to develop these compensatory mutations. It must be emphasized that it does not matter whether we are discussing the acquisition of a novel resistance gene as opposed to a mutation that alters the target or results in up-regulation of an efflux pump. The accumulating evidence tells us that all require some form of adaptation. Another consequence of this phenomenon is that antibiotic cycling in health care settings is unlikely to result in a reversion of the local microflora to susceptibility as the compensatory mutations “lock in” the resistance phenotype.

He continues by noting, “I and several of those I have discussed this observation with were perplexed that it had not previously been articulated. Although to be fair, others have suggested it is a trivial, if not nonsensical, observation and worthy only of cocktail party conversation… in fact, I believe that this is an important guide as to where and which organisms we actually need novel antibacterial agents for.” Projan blames an overemphasis on individual organisms with small genomes for the overlooking of this potentially important pattern. In other words, it is the sort of thing that can only be applied to human health research if one takes a broad view of genomic diversity.

As much fun as it is to study genome size for purely academic reasons, it seems it actually may be good for us too.


More interest in genome size.

The buzz on a few blogs today is the pending release of new books. Sort of the academic blogger equivalent to summer blockbusters, I suppose. In any case, it’s great to see that two of the eagerly anticipated items, Darwinian Detectives by Norman Johnson and The Origins of Genome Architecture by Michael Lynch, will both include significant space devoted to the topic of genome size. Not having read either book, I cannot prudently recommend them to anyone (and it is no secret that I have problems with Lynch’s model, which is not the first and probably not the last one-dimensional explanation), but I do suggest that eyes be kept open for their arrival in June.

On another practical note, genome size is no longer considered just an important criterion for choosing genome sequencing targets; it has also been mentioned as directly relevant in the selection of the next wave of evo-devo models. It is also an interesting and important subject of investigation in its own right, of course.

So, while I may not subscribe to some of the explanations for genome size diversity that have been put forth of late, I am very glad to see that this is an active area of discussion that is gaining more attention every day.

__________

References

Evans, J.D. and D. Gundersen-Rindal. 2003. Beenomes to Bombyx: future directions in applied insect genomics. Genome Biology 4: 107.101-107.104.

Gregory, T.R. 2005. Synergy between sequence and size in large-scale genomics. Nature Reviews Genetics 6: 699-708.

Jenner, R.A. and M.A. Wills. 2007. The choice of model organisms in evo-devo. Nature Reviews Genetics 8: 311-319.

Pryer, K.M., H. Schneider, E.A. Zimmer, and J.A. Banks. 2002. Deciding among green plants for whole genome studies. Trends in Plant Science 7: 550-554.


Units of measurement.

There is sometimes confusion surrounding the units employed in genome size publications. Genome sizes (the amount of DNA per copy of a genome) have traditionally been given in units of mass, namely picograms (1 pg = 10^-12 g). More recently, people have been interested in knowing the number of base pairs per genome rather than genomic mass, which makes sense if one wishes to sequence those base pairs.

The fact is that essentially all genome size measurements represent relative estimates based on the density of stain or fluorescence of dye as compared to a standard (more on this in a later post), with the exception of truly complete genomic sequencing, which is very uncommon for eukaryotes. The units into which these relative data are converted are simply a matter of preference, and if one knows the mass of a given nucleotide then one can easily convert between picograms and base pairs. Indeed, Dolezel et al. (2003) did this calculation from the relative weights of the nucleotide pairs.


The net result is that one easily can convert between picograms and base pairs as follows:

DNA content in bp = (0.978 x 10^9) x DNA content in pg

DNA content in pg = DNA content in bp / (0.978 x 10^9)
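For anyone who wants to script the conversion, here is a minimal Python sketch of the two formulas above. The function and constant names are my own for illustration, and the 3.5 pg input is an arbitrary value rather than a measured genome size.

```python
# Conversion factor from the formulas above: base pairs per picogram of DNA.
BP_PER_PG = 0.978e9


def pg_to_bp(pg: float) -> float:
    """Convert a DNA content in picograms to a number of base pairs."""
    return pg * BP_PER_PG


def bp_to_pg(bp: float) -> float:
    """Convert a number of base pairs to a DNA content in picograms."""
    return bp / BP_PER_PG


if __name__ == "__main__":
    example_pg = 3.5  # arbitrary illustrative value, not a real estimate
    example_bp = pg_to_bp(example_pg)
    print(f"{example_pg} pg is about {example_bp / 1e6:.0f} Mbp")
    print(f"{example_bp:.3e} bp is about {bp_to_pg(example_bp):.2f} pg")
```

Because both directions use the same constant, converting a value to base pairs and back returns the original figure, up to floating-point rounding.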

Yes, there will be a little bit of error involved if there are biases toward AT or GC in the genome. However, “by using the data in Table 1, relative weights of nucleotide pairs can be calculated as follows: AT = 615.3830 and GC = 616.3711, bearing in mind that one phosphodiester linkage involves a loss of one H2O molecule” (Dolezel et al. 2003). In other words, the difference is very slight and is negligible relative to the experimental error inherent in any genome size estimate.
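To see how small the base-composition effect is, the factor can be rebuilt from the two pair weights quoted above. This is only a sketch under stated assumptions: the Avogadro constant is not given in the post and is supplied here, the quoted weights are treated as grams per mole, and the function name and the `gc_fraction` parameter are hypothetical.

```python
AVOGADRO = 6.02214076e23  # per mole; assumption, not taken from the post
AT_PAIR_WEIGHT = 615.3830  # relative weight of an AT pair, as quoted above
GC_PAIR_WEIGHT = 616.3711  # relative weight of a GC pair, as quoted above


def bp_per_pg(gc_fraction: float) -> float:
    """Base pairs per picogram for a genome with the given GC fraction."""
    mean_pair_weight = (
        gc_fraction * GC_PAIR_WEIGHT + (1.0 - gc_fraction) * AT_PAIR_WEIGHT
    )
    grams_per_bp = mean_pair_weight / AVOGADRO
    return 1e-12 / grams_per_bp  # 1 pg = 1e-12 g


if __name__ == "__main__":
    for gc in (0.0, 0.5, 1.0):
        print(f"GC fraction {gc:.1f}: {bp_per_pg(gc):.4e} bp per pg")
    spread = (bp_per_pg(0.0) - bp_per_pg(1.0)) / bp_per_pg(0.5)
    print(f"Spread between all-AT and all-GC: {spread:.2%}")
```

With equal proportions of AT and GC, the result rounds to the 0.978 x 10^9 figure, and even the two impossible extremes (all AT versus all GC) differ by well under 1%.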

To put it very simply, the units are interchangeable with

1 pg = 978 Mbp

__________

Dolezel, J., J. Bartos, H. Voglmayr, and J. Greilhuber. 2003. Nuclear DNA content and genome size of trout and human. Cytometry 51A: 127-128.


A nod to (and from) the Sandwalk.

Larry Moran recently gave the lab a nice nod on his Sandwalk blog. It seems fitting to have one of the first posts on this new blog be an act of reciprocity. So here, in photographic form, is a personal nod to Sandwalk from the Sandwalk.

I’ll probably post a more detailed discussion of the association between genome size, metabolic rate, and flight another day. In the meantime, you can check out the original paper by Chris Organ and colleagues in Nature and the very nice piece about it by Carl Zimmer in Science that prompted Larry’s post. There is also a discussion by Greg Laden which seems pretty reasonable overall.
_________________________________

Further reading:

Gregory, T.R. 2002. A bird’s-eye view of the C-value enigma: genome size, cell size, and metabolic rate in the class Aves. Evolution 56: 121-130.

Organ, C.L., A.M. Shedlock, A. Meade, M. Pagel, and S.V. Edwards. 2007. Origin of avian genome size and structure in non-avian dinosaurs. Nature 446: 180-184.

Zimmer, C. 2007. Jurassic genome. Science 315: 1358-1359.