
Are reports of 'gene X for disease Y' always correct?

19/4/04. By John Pickrell

Studies claiming that a particular genetic variant is associated with a disease should be viewed with caution, as John Pickrell reports.

Few days go by without a new study linking a genetic variant to an increased risk of developing a complex disease such as diabetes, Alzheimer's or heart disease. These findings have the potential to be immensely important in the prevention, prediction and treatment of illnesses. But how often do these studies live up to their promise?

Although some diseases result from a single faulty gene (cystic fibrosis, for example), most are the result of a tangled web of genetic and environmental factors. Many genes contribute, but each has only a subtle effect.

To look for genetic variation that can explain why some people are likelier than others to get certain diseases, geneticists compare the DNA of a group of healthy individuals with that of a similarly sized group of people with the disease. The results of such association studies can be difficult to interpret or reproduce, however: if the genetic effect is small, it is often hard to tell whether an association is real or a false positive, and a real association may hold only in a certain population.
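As a rough illustration of this comparison, the Python sketch below summarises a case-control study of a single variant: it counts copies of a hypothetical risk allele in patients and in healthy controls, then computes an allelic odds ratio and a Pearson chi-square test on the resulting 2x2 table. The function name `allelic_test` and all counts are invented for illustration; the studies discussed in this article used a variety of designs and tests.

```python
import math


def allelic_test(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic odds ratio and 1-degree-of-freedom Pearson chi-square test.

    Rows of the 2x2 table are cases and controls; columns are copies of the
    risk allele and of the other allele (each person carries two copies).
    """
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)

    total = case_risk + case_other + ctrl_risk + ctrl_other
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical study: 300 patients and 300 controls (600 alleles per group),
# risk-allele frequency 30% in patients versus 25% in controls.
or_, chi2, p = allelic_test(180, 420, 150, 450)
print(f"OR = {or_:.2f}, chi-square = {chi2:.2f}, p = {p:.3f}")
```

With these made-up numbers the odds ratio is about 1.29 and p lands close to 0.05, so a study of a few hundred people can sit right on the boundary of "significance" for a modest effect.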

Reproducible associations

In a report published in the March 2002 edition of the journal Genetics in Medicine, Professor Joel Hirschhorn (Children's Hospital Boston, Harvard Medical School) and colleagues reviewed many of the genetic association studies published between 1986 and 2000.

The researchers identified 268 genes containing variants reported to be associated with one of 133 common diseases or traits, such as lung cancer, Parkinson's disease, epilepsy and schizophrenia. In total, these 268 genes accounted for 603 different gene-disease associations.

(The review did not examine single-gene disorders, nor disease associations with HLA or blood group antigens, as many such links are well established.)


Of the 603 associations, 166 had been studied three or more times and, of these, only six associations were reproduced consistently (being statistically significant in 75 per cent or more of the studies). "That's certainly fewer than I would have thought," said Professor Hirschhorn.

Table 1. The six reproducible gene-disease associations

Disease/trait | Gene (variant)
Alzheimer's disease | Apolipoprotein E (epsilon 4)
Type 1 diabetes | Insulin, INS (upstream VNTR)
Creutzfeldt-Jakob disease | Prion protein
HIV infection/AIDS | Chemokine (C-C motif) receptor 5
Deep vein thrombosis | Factor 5, F5 (Leiden mutation)
Graves disease | CTLA4

Of the six, the most reproducible was the association between ApoE4 (the epsilon 4 variant of apolipoprotein E) and Alzheimer's disease, supported by dozens of statistically significant reports.

What of the other 160 gene-disease associations? Had the follow-up reports refuted the original studies, showing them to be false positives, or were the genetic effects simply too weak to reach significance? A follow-up review co-authored by Joel Hirschhorn, Kirk Lohmueller of Georgetown University in Washington DC and others, published in the February 2003 issue of Nature Genetics, suggested that the situation might not be as dire as those figures imply.

The second review examined 25 of the 160 associations using 'meta-analysis', a statistical technique that pools the results of multiple studies. For eight of the 25 associations, the pooled follow-up studies supported the original report.
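The article does not say exactly how the studies were pooled. One common approach, shown here only as a sketch, is fixed-effect inverse-variance meta-analysis: each study's log odds ratio is weighted by the inverse of its squared standard error, so larger, more precise studies count for more. The function name and the study values are hypothetical.

```python
import math


def pool_fixed_effect(odds_ratios, std_errors):
    """Fixed-effect, inverse-variance pooling of log odds ratios.

    std_errors are the standard errors of each study's log(OR).
    Returns the pooled OR, its 95% confidence interval and a two-sided p-value.
    """
    weights = [1 / se ** 2 for se in std_errors]
    log_ors = [math.log(o) for o in odds_ratios]
    pooled_log = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))

    z = pooled_log / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    ci = (math.exp(pooled_log - 1.96 * pooled_se),
          math.exp(pooled_log + 1.96 * pooled_se))
    return math.exp(pooled_log), ci, p_value


# Four hypothetical studies of the same variant, none significant on its own.
ors = [1.15, 1.25, 1.10, 1.20]
ses = [0.12, 0.12, 0.12, 0.12]
for o, s in zip(ors, ses):
    print("single study p =", round(pool_fixed_effect([o], [s])[2], 3))
pooled_or, ci, pooled_p = pool_fixed_effect(ors, ses)
print(f"pooled OR = {pooled_or:.2f}, 95% CI = {ci[0]:.2f}-{ci[1]:.2f}, p = {pooled_p:.4f}")
```

Each study alone misses the 0.05 cut-off, but pooling them narrows the confidence interval enough to exclude an odds ratio of 1. A fixed-effect model assumes every study estimates the same underlying effect; if effects differ between populations, a random-effects model is usually preferred.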

The researchers concluded that "although false positive associations are abundant, there are also many real associations lurking in the data. These true associations probably confer a modestly higher risk of common disease and thus are difficult to detect".

"Association studies have been 'over-hyped' over the last few decades," said Professor John Todd at the University of Cambridge. "People started with the most optimistic model but are now forced to be realistic."

Improving association studies

Part of the reason false associations get reported lies in the limitations of the statistical methods commonly used to judge whether a result is significant. The conventional cut-off for a 'significant' result (p < 0.05) means that, if a variant has no real effect, a difference at least as large as the one observed would arise by chance only 5 per cent of the time. "However, that means that one in 20 times, a gene-disease link will appear to be 'significant' because of random statistical fluctuations and be incorrectly declared to be biologically important," said Professor Hirschhorn. That is an important issue given the huge numbers of gene-disease links being tested every month.
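The "one in 20" arithmetic can be checked with a small simulation. The sketch assumes, as holds for a valid test, that when a variant has no real effect its p-value is uniformly distributed between 0 and 1; it therefore draws p-values directly instead of simulating genotypes. The function name, number of tests and random seed are arbitrary choices for illustration.

```python
import random


def count_false_positives(n_tests, alpha=0.05, seed=1):
    """Count how many of n_tests null associations are declared 'significant'.

    Every simulated variant has no real effect, so every hit is a false positive.
    Under the null hypothesis a valid test's p-value is uniform on [0, 1].
    """
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(n_tests)]
    return sum(p < alpha for p in p_values)


n_tests = 10_000
hits = count_false_positives(n_tests)
# Expected value is n_tests * alpha = 500, give or take a few dozen.
print(f"{hits} of {n_tests} tests 'significant' ({hits / n_tests:.1%})")
```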

That problem may be made worse by what he calls 'the desk drawer effect': a tendency for scientists to publish only significant findings, even though the same link may have been tested ten times before one study found it. The negative reports end up languishing in the desk drawer. Meanwhile, the first published report tends to overestimate the true genetic effect, because the study that first reaches statistical significance is often one in which chance happened to inflate the effect (a phenomenon also known as the 'winner's curse').
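The winner's curse can also be sketched in a few lines. The model below is a simplifying assumption, not taken from the studies discussed: every study estimates the same modest true effect (a log odds ratio of 0.1, an odds ratio of about 1.1) with normally distributed sampling noise, and only studies that reach p < 0.05 in the risk direction get "published". The function name and parameters are hypothetical.

```python
import random
import statistics


def winners_curse(true_effect=0.1, se=0.15, n_studies=20_000, seed=2):
    """Compare the average estimated effect across all simulated studies with the
    average among only those that reach two-sided p < 0.05 in the risk direction.

    Each study's estimated log odds ratio is drawn from Normal(true_effect, se).
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    published = [b for b in estimates if b / se > 1.96]  # z > 1.96
    return statistics.mean(estimates), statistics.mean(published), len(published)


all_mean, published_mean, n_published = winners_curse()
print(f"true log OR 0.10; mean of all studies {all_mean:.3f}; "
      f"mean of the {n_published} 'significant' studies {published_mean:.3f}")
```

Because only studies in which chance pushed the estimate upward clear the threshold, the "published" average comes out more than three times the true value in this setup.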

Another potential problem is genetic variation between populations, known as population stratification. For example, Northern Europeans are more likely to have type 1 diabetes than people from Southern Europe. "In a study of diabetes, if more Northern Europeans end up in the 'disease sufferers group' than in the healthy control group, then you might detect genetic variation that exists between the two populations but is unrelated to the disease," said Professor Hirschhorn.
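Hirschhorn's diabetes example can be put into numbers. In this sketch the marker has no effect on disease at all; it is simply more common in one hypothetical population (40 per cent allele frequency) than in another (20 per cent). The only difference between the groups is that patients are assumed to be 70 per cent drawn from the first population while controls are 50 per cent. All figures and function names are invented for illustration.

```python
def mixed_allele_freq(share_pop_a, freq_a, freq_b):
    """Allele frequency in a sample drawn share_pop_a from population A
    and the remainder from population B."""
    return share_pop_a * freq_a + (1 - share_pop_a) * freq_b


def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds that an allele copy is the marker allele in cases versus controls."""
    return (freq_cases / (1 - freq_cases)) / (freq_controls / (1 - freq_controls))


# The allele has NO effect on disease within either population.
FREQ_A, FREQ_B = 0.40, 0.20
case_freq = mixed_allele_freq(0.70, FREQ_A, FREQ_B)   # 70% of patients from A
ctrl_freq = mixed_allele_freq(0.50, FREQ_A, FREQ_B)   # 50% of controls from A
print(f"cases {case_freq:.2f}, controls {ctrl_freq:.2f}, "
      f"spurious OR {allelic_odds_ratio(case_freq, ctrl_freq):.2f}")
```

The unequal mix alone produces allele frequencies of 0.34 versus 0.30 and an apparent odds ratio of about 1.2. Matching cases and controls by ancestry, or analysing each population separately, would remove this particular artefact.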

There are ways to remedy these problems, though. One is to agree on stringent statistical criteria for genetic association studies. "It's very important that we define better statistical thresholds for declaring genetic associations," said Professor Hirschhorn. "We should see more reproducible results as the statistical power of studies increases," added Professor Todd.
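One widely used way to tighten the threshold when many associations are tested is the Bonferroni correction, named here as an example rather than as the specific criterion the researchers proposed: divide the overall error rate by the number of tests. The sketch compares the chance of at least one false positive among 100 independent tests, with no real associations, before and after the correction. The function names are hypothetical.

```python
def family_wise_error(per_test_alpha, n_tests):
    """Probability of at least one false positive among n_tests independent
    tests when no real associations exist."""
    return 1 - (1 - per_test_alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold keeping the chance of any false positive at or below
    alpha (by the union bound, this holds even if the tests are correlated)."""
    return alpha / n_tests


n = 100
print(family_wise_error(0.05, n))                           # about 0.994
print(family_wise_error(bonferroni_threshold(0.05, n), n))  # about 0.049
```

The price of the stricter threshold is lower power for any single test, which is one reason larger studies are needed.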

Hirschhorn and Todd both agree that studies must include far more than the few hundred patients and few hundred healthy controls typically used. Thousands of participants may be required to detect very subtle genetic effects. "With small sample sizes, you often detect things by chance alone," said Professor Todd.
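A back-of-envelope power calculation shows why. The sketch uses the standard normal approximation for comparing two proportions, applied to allele counts, and assumes each person contributes two independent alleles (which ignores departures from Hardy-Weinberg equilibrium). The function name and the allele frequencies, 25 per cent in controls versus 30 per cent in patients (an odds ratio of roughly 1.3), are hypothetical.

```python
import math
from statistics import NormalDist


def people_per_group(p_controls, p_cases, alpha=0.05, power=0.8):
    """Approximate number of patients (and as many controls) needed to detect a
    difference in allele frequency with a two-sided allelic test."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for the significance threshold
    z_beta = z(power)           # quantile for the desired power
    p_bar = (p_controls + p_cases) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_controls * (1 - p_controls)
                                      + p_cases * (1 - p_cases))) ** 2
    alleles_per_group = numerator / (p_cases - p_controls) ** 2
    return math.ceil(alleles_per_group / 2)  # two alleles per person


print(people_per_group(0.25, 0.30))                # conventional p < 0.05
print(people_per_group(0.25, 0.30, alpha=0.0005))  # a stricter threshold
```

Under these assumptions a little over 600 patients and 600 controls are needed at the conventional threshold, and about 1,500 of each at the stricter one; smaller effects push the numbers higher still.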

In fact, Professor Todd's team is now carrying out just such a study. The project is looking for new genetic variants linked to type 1 diabetes, with cooperation from clinicians across the UK. The team hopes to recruit up to 8,000 people with diabetes, with blood samples and paperwork collected at clinics nationwide.

The key to creating studies of this size will be for research groups to club together, says Todd, who is working alongside scientists at the Wellcome Trust Sanger Institute near Cambridge. "This year really is the beginning of powerful concerted efforts to uncover genes responsible for complex diseases…we now have a much greater grasp on what's needed for the future."

Links

Joel Hirschhorn, Children's Hospital Boston, Harvard Medical School, research page

John Todd, Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, University of Cambridge

Further reading

Hirschhorn J et al (2002) A comprehensive review of genetic association studies. Genetics in Medicine 4: 45–61.

Lohmueller K et al (2003) Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nature Genetics 33: 177–182.

Wellcome Trust, Gibbs Building, 215 Euston Road, London NW1 2BE, UK T:+44 (0)20 7611 8888