Faulty genetics not at fault

Photo by Joy Shan.

Suganthi Balasubramanian, associate research scientist in Mark Gerstein’s lab in Yale’s Molecular Biophysics and Biochemistry Department, was the lead Yale author in a paper published in the Feb. 17 issue of Science as part of a collaborative effort called the 1000 Genomes Project. This project aims to sequence the genomes of a large number of people in order to serve as a comprehensive resource on genetic variation. The research deals with genetic changes called loss-of-function variants, which are predicted to seriously disrupt protein coding within the body. As part of an initiative led by Yale and the Wellcome Trust Sanger Institute, Balasubramanian and fellow researchers worked to experimentally filter a range of these variants to develop a high-quality catalogue of the variants that cause true loss of function.

Q What kind of data analysis did you do, and how did it fit into the larger project?

A Our main role at Yale was to annotate these genetic variations. We know [the variations] are in the genome, but we need some kind of identification, some signpost to see where each one is and what it means. So we looked to see if [the variations] are in a protein-coding region, and, if so, how they change the amino acid sequence of the protein. This was essentially our role in the 1000 Genomes Project: to map and provide functional annotation of all the coding variants.

Q What light does your research shed on other genetics research today?

A Essentially this project contributes several things. First, people generally assume that loss-of-function variants are rare and, when observed, very harmful because they lead to disease. People haven’t questioned why we see so many loss-of-function variants. Our careful analyses show that it’s very important to validate these variants. There are many ways to make erroneous variation calls, and you have to make sure you are really seeing what you think you see. The 1000 Genomes Project provided us with 3,000 loss-of-function variants, and we went through a long process of analysis with computational and experimental filters and came to only 1,285 true loss-of-function variants. There are lots of sequencing studies being done right now that look for genetic variation, but in order for them to be clinically relevant, they have to be carefully validated. Our study provides a high-quality catalogue of loss-of-function variants.

Q Could you describe the process of collecting the data and building the catalogue?

A After the work of many different groups, we receive a huge file that tells us where the different variants are in the human genome. We looked at these files for 185 different people of different ancestries, and we’d map the variants to protein-coding regions and annotate them. I’d look at the genome in a linear representation, and I wanted to know what site the variant was in: was it in a coding gene? Is it in an entron? Is it in a non-coding region? This is what annotation is. I wanted to know where the variants landed, and I’m particularly interested in the ones that land in protein-coding genes, which constitute less than 2 percent of the genome. So to functionally annotate these variants, we first map them to a coding gene, then we figure out what each one does to the protein and what changes in function it causes. This has been done before, but we’re doing it on a large scale and fast.
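
The mapping step described in this answer can be illustrated with a minimal Python sketch. It is not the pipeline used by the Yale group: the `Gene` class, the `annotate_position` function and the toy coordinates are all hypothetical, and the sketch simplifies by treating every exon as protein-coding. It only classifies where a variant lands; working out what the variant does to the protein is a separate step not shown here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gene:
    """Hypothetical toy gene model on a single linear chromosome."""
    name: str
    start: int                     # first base of the gene (inclusive)
    end: int                       # last base of the gene (inclusive)
    exons: List[Tuple[int, int]]   # inclusive (start, end) pairs inside the gene


def annotate_position(position: int, genes: List[Gene]) -> Tuple[Optional[str], str]:
    """Return (gene name, region label) for a variant at `position`.

    Labels: "exon" if the position falls in an exon, "intron" if it falls
    inside a gene but between exons, "intergenic" if it is outside every gene.
    """
    for gene in genes:
        if gene.start <= position <= gene.end:
            for exon_start, exon_end in gene.exons:
                if exon_start <= position <= exon_end:
                    return gene.name, "exon"
            return gene.name, "intron"
    return None, "intergenic"


# Made-up coordinates, for illustration only.
toy_genes = [Gene("GENE_A", 100, 500, [(100, 200), (400, 500)])]

for pos in (150, 300, 600):
    print(pos, annotate_position(pos, toy_genes))
```

A linear scan over genes is enough for a toy example; a file covering whole genomes for many people would call for an indexed lookup such as an interval tree.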

Q What are the clinical implications of this catalogue?

A Essentially, we now have a candidate set of loss-of-function-containing genes, and this can now be used for target gene prioritization for diagnosing and treating diseases.

Q What are the project’s next steps?

A This work was based on only 185 samples. The next step is called Phase I of the 1000 Genomes Project, and Phase I has data on over 1,000 samples now. This work here was only the pilot phase. So we are going to get a much bigger catalogue. Our goal is to use the empirical rules we learned to build a more comprehensive loss-of-function catalogue from the 1000 Genomes data … We’ll also add on other filters. So far we’ve looked at variations one at a time, but now we’ll look at other variations in the same gene to see the overall effect on the gene. Of course we’re also very interested in building an experimental chip where you can basically go and look at these variations in thousands of people. We want to target some specific variations, and we want to develop a system where you can probe only these specific variations. This is our big hope, and we hope it’ll lead to some good clinical discoveries.

Q What would be the clinical benefit of only probing the specific variations?

A When you sequence someone, you typically have 3 to 4 million variations, and it’s impossible to know which ones might be interesting in terms of biological function or need a closer look. It’s like finding a needle in a haystack; we need some way to figure out which variants to look at. So with a small subset to probe, you have a more defined, targeted data set to research experimentally.

Clarification: Feb. 28, 2012

This article was updated to reflect the exact date of publication of an article in Science in which Balasubramanian was the lead Yale author.


  • CharlieWalls

    I am intrigued by your finding “only 1,285 true loss-of-function variants” out of 3,000 presumptive examples. This opens many questions about definitions, procedures and mistakes. Also, where might focused exon sequencing fit — presumably a valuable information for filtering. But most important: conserved sequences in non-coding DNA from this large study should be of extreme value towards a better understanding of the other “99%” of the genome, where transcription is extensive and surely mysteries lurk not yet imagined!
    P.S. Never in my life have I seen ‘intron’ spelt ‘entron’…! Connotes something quite different and says a lot about education at Yale.

  • controlforconfounds

    An undergrad misspells a word most educated people have never seen before, and you draw a conclusion about the state of the education at Yale? Well, then. I certainly won’t take the time to point out stylistic errors in your post…this could go on forever. Let’s just call it mutually assured destruction, shall we?

    • River_Tam

      > An undergrad misspells a word most educated people have never seen before

      As a data point, I went to one of the worst high schools in the country and I learned about introns in my biology class.

      • controlforconfounds

        Keep in mind, this is a journalist’s article, not a scientist’s.

        • River_Tam

          I was just contesting your claim that “most educated people have never seen” the word “intron” before.

          I don’t really care about the mistake, although it does make clear that the journalist doesn’t even have the most rudimentary understanding of the subject she’s writing about AND that she couldn’t be bothered to do the research.

  • CharlieWalls

    The word happens to be in a very conventional computer dictionary and google finds “About 6,230,000 results”. Surely even prep school science exposes one to “intron”. It seems a liberal education without some of the ideas in science, these days, is not very comprehensive. [Coming to genes in eukaryotic cells from experience in prokaryotes, researchers were surprised to find large inserted regions of extraneous sequence information which they called “introns”. The adjacent regions actually coding protein were termed “exons”.]

  • controlforconfounds

    If people were supposed to know every word in the dictionary and its spelling…what would be the point of a dictionary? Obviously such an educated person as you would have no need for one, so you probably don’t even have one in your house or flat, do you? You know a fun word that is in a standard dictionary? Floccinaucinihilipilification. Kudos to Jo Brand on QI.

    • CharlieWalls

      Answers for “control…”, respectively: To see how to spell a word, and Yes, two on shelves, one in computer and most important, spell checker. But it’s not about knowing the words; it’s about what people think about with them (which I assume you really know). The comment above by ‘River_Tam’ is a telling data point. And as to what a journalist may know — these days that’s a question on the minds of many.

  • controlforconfounds

    “And as to what a journalist may know — these days that’s a question on the minds of many.”
    Haha! We may disagree, but that’s pretty funny.

    But still, I went to one of the best high schools in my state, but our biology classes weren’t good. It just depends. You can’t make an assumption about what educated people should know based on one person’s (River_Tam’s) experience, just as you can’t make an assumption about an entire university’s quality based on one word misspelled by one journalist in one article. Also, Joy Shan is a freshman…she hasn’t been at Yale for long, so any knowledge or lack thereof cannot be more than partially attributed to the education here.

  • The Anti-Yale

    Lighten up.

    This is a posting board with anonymous posters doing 98% of the posting (except for myself and Professor Solomon, on this day).

    That means it is one-step-up from graffiti on a bathroom wall.

    You don’t criticize GUM (grammar, usage and mechanics) on a digital bathroom wall.


    M. Div. ’80, etc.

  • The Anti-Yale


    The idea of ‘anonymous editors’ is laughable: Pickeypedia?