Reflecting on my first experience with research.

I often tell undergraduates about the importance of conducting research projects in their senior year if they intend to pursue graduate studies in science. There are two main reasons for this. The first is that the labs that they have experienced up to that point in undergraduate courses, though useful for introducing specific concepts, are a poor reflection of what real science is like. As such, it is important for them to experience original lab work, rather than simply following a pre-defined, “cook book” protocol with an expected result. Novel studies have no pre-defined sequence, no a priori expectation of the outcome, and in many cases no established methods in place for generating data. The second is that there are some important lessons that they need to learn about research, perhaps most importantly that whatever they try to do in the lab will not work the first time. The sooner they hit that wall — and get around it — the better. Nature does not give up her secrets easily, I sometimes say.

My first experience with original research came during my fourth year at McMaster University (Hamilton, Canada), though the story actually begins the year before. In the late 1990s, McMaster offered a course on “environmental physiology” which explored the various ways that animals had become adapted to different extreme environments and lifestyles: for example, how insects can survive in deserts or how deep-diving mammals conserve oxygen. I was very interested in organism biology and wanted to take this course, but it was a senior course and was only offered in alternate years. This meant taking it in my third year without the prerequisites, which the instructors agreed to let me (and my roommate) do, as long as we took the prerequisites after the fact.

There were fewer than 10 people in the class (several of whom were graduate students), as it had a reputation for being very intense, with long labs on Friday evenings. Because of the small class size, we came to know the professors rather well, and I naturally asked them to supervise my undergraduate thesis project the following year. Specifically, I worked with Dr. Chris Wood, who is a very well regarded fish physiologist. Most of the work in his lab at the time had to do with metal toxicity, waste excretion, and so on, but one area struck me as particularly interesting: an earlier study had suggested that the fish that grew most rapidly did not swim well. Because I was interested in evolution and organism biology, a possible trade-off between growth and swimming seemed like a very interesting issue to explore. I proposed a project that would look at this trade-off, but would also include a component of feeding competition and social hierarchy (Who gets the most food? Who grows fastest? Do dominant fish swim worse than subordinate fish?).

There were, of course, numerous obstacles to overcome. How could I identify individual fish? How could I measure dominance rank and feeding? How should swimming performance be assessed? What size of fish should be used? And so on. After much trial and error, I settled on a system for identifying fish using a coded system of ink dots on the skin which were made using an injector that was once used to inject anaesthetic into the gums of dental patients. Lesson 1: Be prepared to be creative in terms of what counts as a scientific apparatus. (Dr. Wood once published a study conducted in Kenya which described the fish as being kept in “amber Tusker chambers” — Tusker being the local beer).

Figure from my undergraduate thesis indicating the identification method I developed. Pretty decent for Windows Paint, if I may say so myself.

Growth rate was relatively straightforward in principle: weigh and measure the lengths of the fish at regular intervals. Fortunately, someone else in the lab had constructed a “fish measuring tube”, which was a transparent plastic half-cylinder with a ruler under it, on a slant with drain holes at the bottom. Worked a treat.

Some dude weighing a fish.

Swimming performance was assessed using a huge swim tunnel that had been built years before, but which was modified so I could reach in and get the fish to keep swimming once they had become stuck against the grating at the end of the swim section. I measured both critical swimming speed (Ucrit), an index of maximum sustainable swimming velocity, and burst swimming.

Diagram from my thesis showing the swim tunnel apparatus.

The actual swim tunnel, which we knew as “Big Bertha”.
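The post does not describe the swim protocol in detail. Assuming the standard stepped-velocity test, in which water velocity is raised by a fixed increment at fixed time intervals until the fish fatigues, Ucrit is conventionally calculated with Brett's (1964) formula, Ucrit = Ui + Uii × (Ti / Tii). The minimal sketch below implements that formula; the function and parameter names are hypothetical, and burst swimming (measured separately) is not covered.

```python
def critical_swimming_speed(last_full_velocity: float,
                            velocity_increment: float,
                            time_at_fatigue_velocity: float,
                            interval_duration: float) -> float:
    """Estimate Ucrit with Brett's (1964) formula for a stepped-velocity test.

    last_full_velocity: highest velocity held for a full interval (Ui)
    velocity_increment: size of each velocity step (Uii), same units as Ui
    time_at_fatigue_velocity: time swum at the final velocity before fatigue (Ti)
    interval_duration: prescribed length of each step (Tii), same units as Ti
    """
    if interval_duration <= 0:
        raise ValueError("interval_duration must be positive")
    if not 0 <= time_at_fatigue_velocity <= interval_duration:
        raise ValueError("fatigue must occur within the final interval")
    # Credit the partial final interval in proportion to the time completed.
    return last_full_velocity + velocity_increment * (time_at_fatigue_velocity / interval_duration)


# Hypothetical trial: 3.0 body lengths/s held for a full 30-min step,
# 0.5 BL/s increments, fatigue after 12 min at 3.5 BL/s.
print(critical_swimming_speed(3.0, 0.5, 12.0, 30.0))
```

For the hypothetical trial, the fish completed 12/30 of the final step, so Ucrit works out to 3.0 + 0.5 × 0.4 = 3.2 BL/s.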

The quantification of feeding rates of individual fish (and by extension, their dominance rank) probably represented the most unusual component of the study. These were assessed by feeding the fish X-ray-opaque glass beads mixed into the food and x-raying them so the beads in their stomachs could be counted. McMaster’s biology department is adjacent to a hospital, so I was able to secure the use of a small, portable x-ray machine that I could cart over to the lab on weekends. To make bead-laden food, I ground up fish pellets and mixed them with beads in a spaghetti maker, extruded the mush into strands and cut them into small pieces. I had to prepare all of the food this way as a control, but I only included beads on days when the fish were to be x-rayed. To develop the x-ray images, I would go into the dark room in the radiology department of the hospital. This involved feeding the exposed films through an automatic developing machine which dropped the developed images into a slot outside the dark room. More than once I came out to find radiologists glancing curiously at my pictures of fish.

A typical x-ray film showing the difference in feeding among individual fish.

Close up of one fish showing the beads (white spots) in its stomach.
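Turning bead counts into meal sizes is simple arithmetic once the bead density of the food is known. The sketch below assumes that the number of beads per gram of food has been calibrated (for example, by counting beads in weighed samples of the bead-laden food). It reports each fish's meal in grams, as a percentage of body mass, and as a share of the group's total meal. Ranking fish by that share is one plausible proxy for competitive rank; the class, function, and this particular index are illustrative assumptions, not the exact calculations used in the thesis.

```python
from dataclasses import dataclass


@dataclass
class Fish:
    fish_id: str        # hypothetical ID, e.g. decoded from the ink-dot marks
    body_mass_g: float
    beads_counted: int  # beads visible in the stomach on the x-ray film


def meal_estimates(fish: list[Fish], beads_per_gram_food: float):
    """Return (fish_id, meal_g, meal_pct_body_mass, share_of_group_meal) per fish,
    sorted from the largest to the smallest share of the group's meal."""
    if beads_per_gram_food <= 0:
        raise ValueError("beads_per_gram_food must be positive")
    total_beads = sum(f.beads_counted for f in fish)
    rows = []
    for f in fish:
        meal_g = f.beads_counted / beads_per_gram_food  # beads -> grams of food eaten
        meal_pct = 100.0 * meal_g / f.body_mass_g       # meal relative to body mass
        share = f.beads_counted / total_beads if total_beads else 0.0
        rows.append((f.fish_id, meal_g, meal_pct, share))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Placeholder tank of two fish, for illustration only.
tank = [Fish("A", 10.0, 40), Fish("B", 20.0, 10)]
for row in meal_estimates(tank, beads_per_gram_food=100.0):
    print(row)
```

Expressing meals relative to body mass matters here because larger fish can eat more in absolute terms without being more competitive.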

My first attempt at the study did not go well. The fish I had chosen were too small, and some even escaped from the swim chamber. I had chosen too high a ration, so competition among individuals was negligible. I even had a flood in one tank and came in to find several fish flopping around on the floor; I returned them to their tank. Lesson 2: Be prepared for even the most carefully planned experiment to be a bust at first. In fact, I decided to start again after the holidays with larger fish and different rations. Given the short time involved (two semesters, one of which had by then passed), it took some convincing for Dr. Wood to let me start from scratch. However, I knew I would not be confident in the data from the first run, and I was fairly certain I had worked out the bugs the first time through.

In the end, we discovered that the fish who eat the most and grow the fastest do indeed show poorer sustained swimming performance, but only in the case where the ration is limited. Faster growing fish have better burst swimming abilities, by contrast. This supports the idea that there may be trade-offs between rapid growth and some types of swimming ability, at least when food is limited, meaning that there may be limits on the benefits of growing more quickly than other individuals.

I worked in the Wood Lab during the summer after this study and did two follow-up experiments. We also polished up my thesis and submitted it to a journal — I still remember how excited I was to have my first paper in review. Some major revisions later, the paper was accepted for publication. Lesson 3: Be prepared for peer reviewers to pick apart every detail of your work. (Not to worry, as I have since reviewed several fish physiology and feeding behaviour papers in turn).

I finished these studies more than a decade ago, and I now work in a very different area of biology. Nevertheless, the lessons that I learned during my first experience with original research are as significant as ever now that I advise students. If I have done my job well in this role, you may read a similar account from one of my students 10 years hence.

________

Gregory, T.R. and C.M. Wood. 1998. Individual variation and interrelationships between swimming performance, growth rate, and feeding in juvenile rainbow trout (Oncorhynchus mykiss). Canadian Journal of Fisheries and Aquatic Sciences 55: 1583-1590.

Gregory, T.R. and C.M. Wood. 1999. Interactions between individual feeding behaviour, growth, and swimming performance in juvenile rainbow trout (Oncorhynchus mykiss) fed different rations. Canadian Journal of Fisheries and Aquatic Sciences 56: 479-486.

Gregory, T.R. and C.M. Wood. 1999. The effects of chronic plasma cortisol elevation on the feeding behaviour, growth, competitive ability, and swimming performance of juvenile rainbow trout. Physiological and Biochemical Zoology 72: 286-295.

What would I do with more research support? Part Two: "Targeted exploration".

In the first post in this series, I introduced the background topic of my research focus, namely the evolution and impacts of genome size diversity in animals. Before moving on to the specific projects that I would most like to do in the near term if I had the funds, I want to discuss the basic philosophical approach that much of my lab’s work follows.

As I noted recently, there is a strong tendency among many biologists to assume that only “hypothesis-driven” science is valid and informative. I disagree with this position very strongly, as I think it causes people to focus on narrow questions and runs a real risk of making most science little more than an exercise in confirming and refining what we already know. Moreover, it is only feasible to structure one’s research in the simple, falsificationist hypothesis-testing format if there is extensive background knowledge available. When working in a new area where little is known, this is not possible.

Does this mean we should be allowed to just stumble around without really testing any ideas? Of course it doesn’t. The alternative is to step back from individual hypotheses and to carry out what I call “targeted exploration”. This means that we do not feel it necessary to formulate our research in the simplistic “H0, H1” format with a “yes/no” result structure. Instead, we take what information is available and try to identify patterns. If no information is available at all for some area, then we might explore it with the specific purpose of looking for patterns. Once a possible pattern is identified, we determine ways of testing how broadly it holds and what might be causing it. This involves more exploration, but specifically in areas that are intended to provide the necessary data to test the broad pattern. If the pattern holds, then we can formulate even more specific ideas about causation, leading eventually to the testing of particular hypotheses.

Some important points should be noted. First, targeted exploration does not conflict with focused hypothesis testing. Rather, it ultimately feeds into hypothesis-driven research, but is particularly important because it takes us into new territory rather than working within existing areas. Second, it is not done blind. There is a specific reason to target particular areas. Third, as it does not have a simple refuted/supported result but rather can be set up to reveal many different things, the results can be very informative either way. Finally, because it is based on large-scale sampling, exploration of this type has the beneficial side effect of closing some major gaps in our basic knowledge.

Let me give you an example of how this works.

Insects are by far the most diverse group of animals, at least in terms of described species. However, they have traditionally been poorly covered in animal genome size studies. When I was a graduate student, I compiled the Animal Genome Size Database, which made it possible to look across all the data that were available and see what patterns emerged. Based on work in amphibians, it was apparent that species with complex developmental programs including metamorphosis had smaller genomes than species without metamorphosis. I wondered if something similar might apply to insects, given that there are orders with complete metamorphosis (holometabolous development) and orders with incomplete metamorphosis (hemimetabolous development).

That is step 1: ask a question and look for a pattern. The data for insects were very limited, but it did seem as though insects with complete metamorphosis possess smaller genomes than those lacking complete metamorphosis, making this similar to the case in amphibians. However, there were not really enough data to say much about this, so as part of my graduate work I set out to get more insect data. I added a few hundred species, mostly just whatever I could get locally, and doing my best to include species from several orders with and without metamorphosis. That is step 2: assemble a dataset that can at least be used to identify a possible pattern. At this stage, the sampling is somewhat unconstrained — just get whatever you can, with the question still in mind. Why do it like this? Because a) you don’t have enough information to be very specific in what data you need, b) you’re working in a new area, so any data you get will be informative, and c) you don’t know if the pattern you are looking for is really the main pattern, so it is best to sample more widely in case some other pattern shows up.

Here is what I found:

With the exception of one beetle species out of more than 150 (and I still want to check this myself), no insects with complete metamorphosis appear to have genome sizes larger than 2pg (~ 2 billion base pairs). On the other hand, orders without complete metamorphosis often include species with enormous genomes.
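To make the units concrete: genome sizes in picograms are commonly converted to base pairs with the factor 1 pg ≈ 978 million bp (Doležel et al. 2003), so the 2 pg ceiling corresponds to roughly 1.96 billion bp. The minimal sketch below does the conversion in both directions and checks a value against the ceiling; the function and constant names are hypothetical, chosen for illustration.

```python
BP_PER_PG = 0.978e9  # conversion factor of Dolezel et al. (2003): 1 pg of DNA ~ 978 million bp
HOLOMETABOLOUS_CEILING_PG = 2.0  # the apparent ceiling described above


def pg_to_bp(c_value_pg: float) -> float:
    """Convert a haploid DNA content in picograms to base pairs."""
    return c_value_pg * BP_PER_PG


def bp_to_pg(base_pairs: float) -> float:
    """Convert a genome size in base pairs to picograms."""
    return base_pairs / BP_PER_PG


def exceeds_ceiling(c_value_pg: float, ceiling_pg: float = HOLOMETABOLOUS_CEILING_PG) -> bool:
    """True if a genome size lies above the (apparent) holometabolous ceiling."""
    return c_value_pg > ceiling_pg


print(pg_to_bp(HOLOMETABOLOUS_CEILING_PG))  # the ceiling expressed in base pairs
print(bp_to_pg(3.2e9))                      # a ~3.2 billion bp genome, such as the human one
```

By the same factor, a genome of about 3.2 billion base pairs corresponds to roughly 3.3 pg.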

So, step 3 is then to see whether this holds with a broader sampling. Now we are getting into the targeted exploration. What we need is a) more data from holometabolous orders (do they exceed this threshold and we just haven’t found them?) and b) more from hemimetabolous orders (do most of them have examples that are larger than the threshold?). Since this possible pattern was identified, we have added hundreds of species from both kinds of insects, including about 400 butterflies and moths (holometabolous, none larger than 2pg), 90 wasps, ants, and bees (holometabolous, none larger than 2pg), 75 flies (holometabolous, none larger than 2pg), and 100 dragonflies (about 1/5 of known diversity in North America; hemimetabolous, a few larger than 2pg). So far, so good, and this work continues with current projects on wasps, flies, caddisflies, and stoneflies. But questions remain: Does this hold in additional orders? Is there really a link between development and genome size in insects? Why 2pg? Are there other explanations (e.g., other constraints, phylogenetic effects, differences at the level of mutational mechanisms)?
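Much of step 3 comes down to bookkeeping against the threshold, order by order. A minimal sketch, assuming each measurement is stored as a hypothetical (order, development type, C-value in pg) record; this structure is illustrative and is not the format of the Animal Genome Size Database. The example values are placeholders, not real measurements.

```python
from collections import defaultdict

CEILING_PG = 2.0  # the apparent ceiling for holometabolous insects


def tally_by_order(records, ceiling_pg=CEILING_PG):
    """Summarize hypothetical (order, development, c_value_pg) records.

    Returns {(development, order): (n_species, n_above_ceiling)} so that
    each order can be checked separately against the ceiling.
    """
    counts = defaultdict(lambda: [0, 0])
    for order, development, c_value_pg in records:
        key = (development, order)
        counts[key][0] += 1
        if c_value_pg > ceiling_pg:
            counts[key][1] += 1
    return {key: (n, above) for key, (n, above) in counts.items()}


# Placeholder values for illustration only, not real measurements.
example = [
    ("Lepidoptera", "holometabolous", 0.5),
    ("Lepidoptera", "holometabolous", 1.2),
    ("Odonata", "hemimetabolous", 2.5),
    ("Odonata", "hemimetabolous", 1.0),
]
print(tally_by_order(example))
```

Keeping counts per order, rather than a single pooled yes/no, is what lets new samples answer "does this hold in additional orders?" directly.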

For step 4, we started to test this idea that development constrains genome size in insects. First, we looked at the rate of development (egg to adult) within a single genus (Drosophila), and found a significant correlation with genome size. We have also started looking at “curious” orders that may be exceptions that prove the rule: for example, mayflies have an additional nymphal moult that other hemimetabolous orders don’t, so this may impose an additional constraint and keep their genomes small — I have only looked at one so far (yes, small), but I will let you know how it turns out once we do a large sample. We are also looking at specific comparisons within orders based on a combination of their traits (developmental rate, parasitic vs free living, body size, flight) and phylogenetic relationships. In this case, shifts in lifestyle are especially informative because they may illustrate an evolutionary association between genome size and the characteristics of interest.
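The post does not say which statistic was used for the Drosophila comparison. As an illustration only, a rank (Spearman) correlation between genome size and egg-to-adult development time could be computed as below; the function names and inputs are hypothetical. Species are not statistically independent data points, so a real analysis would also need to account for phylogeny (for example, with phylogenetically independent contrasts).

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the (tie-averaged) ranks."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length (at least 3)")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: genome size (pg) and egg-to-adult time (days) per species.
genome_pg = [0.15, 0.20, 0.25, 0.30]
dev_days = [8.0, 8.5, 9.1, 10.2]
print(spearman(genome_pg, dev_days))
```

A rank-based statistic is used here only because it does not assume a linear relationship; it does nothing about phylogenetic non-independence.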

Assuming these patterns hold up and we are convinced that development is linked with genome size, we will want to know how — thus, step 5. The most likely mechanistic bridge between genome size and organism development is cell division. However, no one has looked at cell division rate across insects with different genome sizes. This would be much more difficult than doing large-scale surveys, but it could be focused on a few representative species with different DNA amounts. If we really want to know if DNA content affects cell division, we would need to examine this experimentally in step 6 — for example, by actively adding or removing different amounts of DNA and observing the effects on cell cycle parameters. I have been trying for a few years to get funding to do this (in yeast initially), but no success.

I think it is obvious that this kind of approach falls outside the typical hypothesis-driven focus. However, it does get us from knowing almost nothing in step 1 to formulating and testing specific hypotheses in step 6. Along the way, we have greatly expanded the available dataset, and have revealed several additional patterns worth exploring within some orders. If I had to express each step in the form of hypotheses, I probably could, but because we are exploring so many questions at once in each step, it makes more sense to just think about questions and make sure the sampling will allow us to generate answers. Without the existing knowledge base, focusing on one hypothesis only is premature and very limiting in what it will accomplish.

Obviously, we are not just interested in insects. Over the rest of the series, I will talk about other groups that we are eager to explore, and will discuss in more detail some of the focused work on mechanisms that I am interested in. Some of these therefore begin at step 1, others at step 6, and some somewhere in between.

How science works.

I am exhausted by the constant necessity of defending discovery and exploration science as being as legitimate as (and, in my opinion, far more influential than) “the” scientific method of testing very focused hypotheses. I am especially tired of having to reassure my graduate students that, despite what other faculty insist on implying, what they are doing is good science. As you can imagine, I am thrilled to see that the excellent resource site Understanding Science (a follow-up to their equally wonderful Understanding Evolution) agrees with me.

The alternative that they propose is pretty complex, but so is the real world.


If you’re one of those people who thinks there is one scientific method and that it involves only testing hypotheses, then you owe it to yourself and to your (and my) students to read this resource.

Flaws of the fudge factor.

A nice property of blogs is that they provide an opportunity to share ideas that, for one reason or another, never made it to publication. Here is a letter to Science that I wrote a few years ago which was bounced. It is in response to a piece called “Testing hypotheses: prediction and prejudice” by Peter Lipton (Science 307: 219-221; see replies by several others in Science 308: 1409-1412).

Flaws of the fudge factor

Lipton’s (1) analysis of common explanations for the intuitive notion that prediction is more convincing than accommodation is interesting and apt. His own contribution to the discussion – namely that differences in the opportunity for “fudging” in support of a favoured hypothesis drive this disparity – is less compelling. A few things to consider in this regard:

1) Who can fudge? Lipton (1) argues that “fudging” is more problematic in accommodation than in prediction, but this appears to be based on the assumption that the predictions are made by one scientist and tested independently by another. Certainly, this pertains to some fields of physics, but in a great many instances in the life sciences, the predictor and tester are one and the same. Thus, there is ample opportunity for (unintentional) “fudging” of predictive tests in support of a preferred hypothesis at all stages, from experimental design to data collection to interpretation. Moreover, one could make the argument that “tests” based on accommodation, which use data generated independently by many other authors and in which errors are random with respect to the hypothesis being tested, are more – not less – objective. It also should be self-evident that if an author ignores contradictory data during an accommodation-based test, then other members of the scientific community are likely to expose this omission.

2) A question of scale. If the comparison being made is in regard to a single datum, then indeed prediction may be far more convincing than accommodation. However, if the scales differ, as they often do in real life, such that the prediction relates to a tractable and therefore simple test but the accommodation deals with multiple, independent types of information, then the latter may be considered much more convincing. Therefore, it matters a great deal that prediction often tests one aspect of an issue whereas accommodation can include a much broader “consilience of inductions” (2).

3) Prediction as confirmation. The history of science is replete with examples in which predictive testing, while undoubtedly particularly convincing, has served mainly to confirm ideas that had been developed primarily on the basis of accommodation. This has included some of the most revolutionary breakthroughs in science, including relativity, natural selection, atomic theory, and genetics (both Mendelian and molecular). Asking which is the superior way to advance science therefore sets up a false dichotomy.

4) An alternative hypothesis: x + y > x. Perhaps the most important point overlooked by Lipton (1) is that hypotheses that are not compatible with existing data – i.e., which have not already achieved accommodation – are rejected before ever being tested by prediction. In this sense, a more parsimonious explanation for the perception that prediction is more powerful than accommodation is that the former is a second-order test that only occurs after the first has been completed. In other words, prediction seems more powerful because its very existence implies that a more fundamental criterion, accommodation, has already been met. It is a mere truism that two tests will be more convincing than one, especially if the second test indicates inherently that the first has been passed.

References
1. Lipton, P. Testing hypotheses: prediction and prejudice. Science 307: 219-221, 2005.

2. Whewell, W. The Philosophy of the Inductive Sciences, Founded Upon Their History, 1840.

What would I do with more research support? Part One: Background.

One of the great joys of being a scientist is that we get to spend our lives exploring the aspects of the natural world that most intrigue and excite us. However, the equally great frustration of being a researcher is that our curiosity and passion invariably outstrip the resources available for our explorations. It often feels like we spend the bulk of our creative energy begging for money, and when this is declined — as it often is — it can be crushing. What keeps us going is the conviction that what we are doing, and what we have not yet found a way to do, is interesting and important and worth pursuing.

The primary focus of my research is the evolution of genome size in animals. Genome size is the amount of DNA in one copy of the chromosome set of a species, generally measured in terms of the number of base pairs (bp) or in mass (in picograms, or 10⁻¹² g). What makes this an intriguing topic of research is the enormous variability that exists across species: in animals, genome sizes range more than 7,000-fold. Think about that for a moment. Some animals have 7,000 times more DNA in their cells than others. Even within vertebrates, there is huge diversity at the genomic level: the largest (lungfish) is 350 times larger than the smallest (pufferfish). Or consider amphibians, which range about 120-fold from the smallest in some frogs to the largest in a few aquatic salamanders.

The human genome contains about 3.2 billion base pairs. In the simplest terms, one might expect this to be the largest genome of all — humans are the most complicated organisms (right?) and that should require the most genes (right?) which in turn means more DNA (right?). This was indeed the assumption when researchers began assessing genome sizes in the late 1940s — before the structure of DNA was elucidated, and even before it had been established that DNA is the hereditary molecule. At this time it was reported that the amount of DNA in a species’ cells is mostly constant (thus, genome size is also called “C-value”). This itself was suggested to indicate that DNA, and not protein, serves as the molecular basis of inheritance. However, it was also obvious by 1951 that the amount of DNA varies dramatically among species, and that the “complexity” of an animal and its genome size are decoupled. There are, it was discovered, salamanders with 40x more DNA per genome than in humans. This made no sense. DNA amount is constant within species because it is what genes are made of, and yet more complicated organisms (which presumably require more genes) may have substantially less DNA in their genomes than simpler organisms. This became known as the “C-value paradox” in the early 1970s.

It was not long before the apparent “paradox” was resolved: most DNA in animal and plant genomes is not genes (it is “non-coding DNA”). This means that genome size need not be related to the number of protein-coding genes, and that there is no reason to expect more complex animals to have more DNA in their genomes. However, this raised many new questions: What is this non-coding DNA? Where does it come from? How does it increase or decrease in amount in different genomes? Does it have any effect on the organism? Does it have any function? Why do some species have so much of it and others so little?

Despite several decades of research, most of these questions remain at best only partially answered. This is where my lab’s research comes in. We are interested in genome size diversity across all animals, in its effects on organism biology, and in the factors ranging in scale from individual DNA elements to ecological properties that accentuate or constrain amounts of DNA in the genomes of different species.

One thing that has become clear over the past several decades is that genome size is not randomly distributed across taxa. Some, like birds, all seem to have relatively small genomes. Others, like salamanders, all have large genomes. The quantity of DNA also relates to important features such as cell size and cell division rate, such that large genomes are found in cells that are big and divide slowly. Because all animals are made of cells, this means that any feature relating to cell size or cell division rate could be indirectly related to genome size. Body size is an obvious possibility, at least when cell numbers are held mostly constant. Metabolic rate is another possibility, because the larger a cell gets, the lower its relative surface area is, and this can influence gas exchange. Developmental rate is yet another, because slower individual cell divisions can add up to protracted development overall.
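The surface-area argument can be made concrete with a deliberately simplified model that treats a cell as a sphere (an assumption for illustration; real cells are not spheres). For a sphere, surface area divided by volume reduces to 3/r, so doubling a cell's radius halves its relative surface area. The second function restates the body-size point: if cell number is held constant, body volume scales directly with cell volume. Both function names are hypothetical.

```python
import math


def sphere_surface_to_volume(radius_um: float) -> float:
    """Surface-area-to-volume ratio (per micrometre) of a spherical cell; equals 3 / r."""
    if radius_um <= 0:
        raise ValueError("radius must be positive")
    area = 4 * math.pi * radius_um ** 2
    volume = (4 / 3) * math.pi * radius_um ** 3
    return area / volume


def body_volume(n_cells: int, cell_volume: float) -> float:
    """Total tissue volume when every cell has the same volume (a simplification)."""
    return n_cells * cell_volume


# Relative surface area falls as a (hypothetical) spherical cell gets bigger.
for r in (2.5, 5.0, 10.0):
    print(r, sphere_surface_to_volume(r))
```

The inverse dependence on radius is the geometric core of the metabolic-rate argument; whether it matters for a given tissue depends on far more than this sketch captures.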

We have found that body size is correlated with genome size not only in some invertebrates like flatworms and copepod crustaceans, but also within specific groups of vertebrates like rodents, bats, and birds. Inverse relationships between genome size and metabolic rate have been reported in both mammals and birds, and in particular it has been argued that flight imposes a constraint on genome size due to its high metabolic demands. This latter idea has been around for several years, but it has recently become the subject of renewed interest and some intriguing new discoveries. For example, my colleague Chris Organ has used fossil cell size measurements to reveal that theropod dinosaurs (the lineage from which birds evolved) already had somewhat reduced genome sizes relative to other lineages before birds evolved, and that pterosaurs (the first vertebrates to evolve flight) also had small genomes. One of my students has been working on flight in birds, and showed that wing parameters associated with flight ability are related to genome size as well. We have also found recently that hummingbirds have the smallest genomes among birds (this isn’t published yet, but we’re writing the paper as we speak).

In terms of development, we have found in insects like lady beetles and vinegar flies that larger genomes are associated with slower overall development. Similar correlations have been known for some time in amphibians. What is more interesting is the pattern that we see with regard to metamorphosis, which represents a period of rapid and extreme physical reorganization. Groups with intensive metamorphosis, like frogs living in deserts that complete their life cycle quickly during wet seasons, have very small genomes (smaller than birds). Others, like aquatic salamanders that have lost the ability to metamorphose, have some of the largest genomes among animals. This also seems to apply to the major lineages of insects. Orders exhibiting complete metamorphosis (“holometabolous development”) appear almost never to exceed about 2 billion base pairs, whereas some without complete metamorphosis (“hemimetabolous development”) can be very large — there are grasshoppers with 5x more DNA than in humans.

Although genome size has been investigated for more than 60 years, some of these trends are only now coming to light. One reason is that we are focusing on the “big picture” now. Another reason is that we have technology that allows us to estimate genome sizes for large numbers of species. To give one example, an undergraduate student and I produced new data for more than 300 species of moths last summer alone. Previously, only 50 moth species had been analyzed (almost all of them in a pilot study I did a few years ago). Of course, this is a minuscule fraction of the 180,000 or so described species in the order, but it’s infinitely better than no information at all. Various students of mine have begun filling other major gaps, including in mammals, birds, insects, worms, and molluscs, but a huge amount of work remains just to get a basic picture of genomic diversity and its significance.

Over the upcoming series of posts, I will highlight some of the projects that I am very interested in undertaking, but which are on indefinite hold due to lack of funds. (It’s not that I haven’t tried — but granting agencies tend not to like this kind of large-scale “discovery” science as compared to the testing of very focused hypotheses). There are several reasons why I think it is worth doing this. First, most members of the public get only snippets of what goes on in research labs, most often provided by news reports. The raw curiosity that drives basic research is not often conveyed, particularly when projects are first conceived (vs. once they’re completed and published). Second, this is the stuff that gets me out of bed in the morning, and I hope that others can share in the excitement that my students and I feel when we think about, and try to answer, these fundamental questions about the diversity of life. Third, I believe it is useful for people to grasp the frustration that every scientist lives with when he or she feels that there are great ideas collecting dust for simple lack of funds. Finally, it provides an opportunity to talk about some intriguing animal groups from a perspective that most people haven’t considered. In that sense, it should be an interesting exercise in thinking about the wondrous biological diversity that surrounds us.

In the meantime, you are welcome to explore the Animal Genome Size Database to get a sense of the tremendous diversity — and glaring gaps in our knowledge — that drive my research program.