Poster Presentation 43rd Lorne Genome Conference 2022

A hidden Markov model to identify inherited disease-causing variants using shared genetic markers  (#201)

Erandee Robertson 1 2 , Mark Bennett 1 2 3 , Bronwyn Grinton 3 , Karen Oliver 1 2 3 , Thessa Kroes 4 , Mark Corbett 4 , Jozef Gecz 4 5 , Michael Hildebrand 3 6 , Lynette Sadleir 7 , Ingrid Scheffer 3 6 8 9 , Sam Berkovic 3 , Melanie Bahlo 1 2
  1. Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  2. Department of Medical Biology, University of Melbourne, Parkville, Victoria, Australia
  3. Epilepsy Research Centre, Department of Medicine, The University of Melbourne, Austin Health, Heidelberg, Victoria, Australia
  4. Adelaide Medical School & Robinson Research Institute, The University of Adelaide, Adelaide, South Australia, Australia
  5. South Australian Health and Medical Research Institute, Adelaide, South Australia, Australia
  6. Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Victoria, Australia
  7. Department of Paediatrics and Child Health, University of Otago, Wellington, New Zealand
  8. Department of Paediatrics, The University of Melbourne, Royal Children's Hospital, Parkville, Victoria, Australia
  9. Florey Institute, Melbourne, Victoria, Australia

Whole exome and genome sequencing have rapidly improved the detection of rare disease-causing genetic variants; however, thousands of individuals have only clinical chromosomal microarray or SNP array data available. Individuals who inherit the same genetic variant from a common ancestor also share the genomic regions on either side of the disease-causing variant. This suggests that the presence of a disease-causing variant may be inferred by identifying its associated haplotype.

We developed a statistical algorithm called FoundHaplo, a hidden Markov model designed to identify individuals with inherited disease-causing genetic variants using SNP data. Repeat expansions cause at least 50 diseases that are typically inherited with strong founder effects, and are therefore an excellent candidate set of diseases to demonstrate the utility of FoundHaplo. We performed a simulation study to evaluate the performance of FoundHaplo on 29 repeat expansion diseases. FoundHaplo correctly predicted 94% of simulated samples sharing a region of at least 1 cM surrounding the disease-causing variant and 100% of simulated samples sharing 2 cM or more.
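
To illustrate the general idea of detecting a shared haplotype with a hidden Markov model, the following is a minimal sketch and not FoundHaplo's actual model. It assumes a two-state chain (1 = shared with the known disease haplotype, 0 = not shared), observations that record whether a test haplotype matches the disease haplotype at each SNP, and a switching probability that grows with genetic distance. The function names and the parameters err, p_chance and switch_per_cm are hypothetical, chosen only for illustration.

```python
import math

def viterbi_sharing(matches, cm_positions, err=0.01, p_chance=0.6, switch_per_cm=1.0):
    """Most likely state path (1 = shared, 0 = not shared) for one haplotype.

    matches[i] is 1 if the test haplotype carries the same allele as the
    disease haplotype at marker i, else 0. All parameters are illustrative
    assumptions, not values taken from FoundHaplo.
    """
    # Log emission probabilities per state: (mismatch, match).
    log_emit = {
        0: (math.log(1 - p_chance), math.log(p_chance)),  # match by chance
        1: (math.log(err), math.log(1 - err)),            # mismatch only by error
    }
    score = [math.log(0.5) + log_emit[s][matches[0]] for s in (0, 1)]
    backptrs = []
    for i in range(1, len(matches)):
        d = cm_positions[i] - cm_positions[i - 1]
        # Switching probability increases with distance and is capped at 0.5.
        p_switch = 0.5 * (1 - math.exp(-2 * switch_per_cm * d))
        p_switch = min(max(p_switch, 1e-12), 0.5)
        log_stay, log_switch = math.log(1 - p_switch), math.log(p_switch)
        new_score, ptr = [], []
        for s in (0, 1):
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            ptr.append(s if stay >= switch else 1 - s)
            new_score.append(max(stay, switch) + log_emit[s][matches[i]])
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def shared_length_cm(path, cm_positions):
    """Distance in cM between the first and last markers called as shared."""
    shared = [pos for state, pos in zip(path, cm_positions) if state == 1]
    return shared[-1] - shared[0] if shared else 0.0

# Toy data: mismatches flanking a run of matches, markers every 0.1 cM.
toy_matches = [0] * 5 + [1] * 20 + [0] * 5
toy_positions = [0.1 * i for i in range(len(toy_matches))]
toy_path = viterbi_sharing(toy_matches, toy_positions)
toy_length = shared_length_cm(toy_path, toy_positions)
```

In this sketch, a sample would be flagged as a putative carrier when the shared segment spanning the disease locus exceeds some length threshold; how FoundHaplo scores and thresholds sharing is not described here.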

We then focused on three epilepsy-associated disease-causing variants with known founder effects: the repeat expansions that cause familial adult myoclonic epilepsy types 1 and 2 (FAME1, FAME2), and a variant in SCN1B that causes generalised epilepsy with febrile seizures plus. Using FoundHaplo, we searched for putative variant-associated haplotypes in genotype data from ~1,600 individuals with epilepsy. FoundHaplo identified the shared disease-causing haplotype in eight individuals from three families with FAME1, six individuals with FAME2 and ~30 individuals from nine families with the SCN1B variant.

FoundHaplo enables SNP data to be used to determine the genetic cause of disease in more individuals: data from individuals known to carry a disease-causing variant are used to screen patients whose cause of disease is unknown.