Whole exome and genome sequencing have advanced rapidly as tools for detecting rare disease-causing genetic variants; however, thousands of individuals have only clinical chromosomal microarray or SNP array data available. Individuals who inherit the same genetic variant from a common ancestor also share the genomic regions on either side of the disease-causing variant. This suggests that the presence of a disease-causing variant may be inferred by identifying its associated haplotype.
We developed FoundHaplo, a statistical algorithm based on a hidden Markov model that uses SNP data to identify individuals with inherited disease-causing genetic variants. Repeat expansions cause at least 50 diseases that are typically inherited with strong founder effects, which makes them an excellent candidate set of diseases for demonstrating the utility of FoundHaplo. We performed a simulation study to evaluate the performance of FoundHaplo across 29 repeat expansion diseases. FoundHaplo correctly predicted 94% of simulated samples sharing a region of at least 1 cM surrounding the disease-causing variant and 100% of simulated samples sharing 2 cM or more.
We then focused on three epilepsy-associated disease-causing variants with known founder effects: the repeat expansions that cause familial adult myoclonic epilepsy types 1 and 2 (FAME1, FAME2), and a variant in SCN1B that causes generalised epilepsy with febrile seizures plus. Using FoundHaplo, we searched for putative variant-associated haplotypes in genotype data from ~1,600 individuals with epilepsy. FoundHaplo identified the shared disease-causing haplotype in eight individuals from three families with FAME1, six individuals with FAME2 and ~30 individuals from nine families with the SCN1B variant.
FoundHaplo enables the use of SNP data to determine the genetic causes of disease in more individuals: haplotypes from individuals known to carry a disease-causing variant are used to screen patients whose disease has no known cause.
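To illustrate the general idea of screening a test haplotype against a known disease haplotype with a hidden Markov model, the following minimal sketch may help. It is not the FoundHaplo implementation: the two hidden states (shared, not shared), the function names, and the parameters `err`, `leave_rate` and `enter_rate` are assumptions chosen for illustration, and phased biallelic haplotypes coded as 0/1 are assumed.

```python
import math


def _log(p):
    # Guard against log(0) for degenerate probabilities.
    return math.log(max(p, 1e-300))


def viterbi_shared(known, test, freqs, pos_cm,
                   err=0.01, leave_rate=10.0, enter_rate=0.1):
    """Most likely state per marker: 1 = shared with `known`, 0 = not shared.

    known, test : lists of 0/1 alleles (phased haplotypes)
    freqs       : population frequency of allele 1 at each marker
    pos_cm      : genetic position of each marker in centimorgans (increasing)
    err, leave_rate, enter_rate are illustrative values, not fitted ones.
    """
    n = len(known)

    def emissions(i):
        # State 0: the test allele is drawn from population frequencies.
        p0 = freqs[i] if test[i] == 1 else 1.0 - freqs[i]
        # State 1: the test allele copies the known allele, up to error.
        p1 = 1.0 - err if test[i] == known[i] else err
        return _log(p0), _log(p1)

    e0, e1 = emissions(0)
    score = [_log(0.5) + e0, _log(0.5) + e1]
    back = []
    for i in range(1, n):
        d = (pos_cm[i] - pos_cm[i - 1]) / 100.0  # distance in Morgans
        p_leave = 1.0 - math.exp(-leave_rate * d)
        p_enter = 1.0 - math.exp(-enter_rate * d)
        trans = [[_log(1.0 - p_enter), _log(p_enter)],
                 [_log(p_leave), _log(1.0 - p_leave)]]
        emis = emissions(i)
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[r] + trans[r][s] for r in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + emis[s])
        back.append(ptr)
        score = new_score

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def shared_cm_around_locus(path, pos_cm, locus_cm):
    """Length (cM) of the contiguous shared run covering the marker nearest the locus."""
    idx = min(range(len(pos_cm)), key=lambda i: abs(pos_cm[i] - locus_cm))
    if path[idx] == 0:
        return 0.0
    left = right = idx
    while left > 0 and path[left - 1] == 1:
        left -= 1
    while right < len(path) - 1 and path[right + 1] == 1:
        right += 1
    return pos_cm[right] - pos_cm[left]
```

In this sketch, a test haplotype would be flagged as a candidate carrier when `shared_cm_around_locus` returns a segment longer than some chosen threshold around the disease locus; the choice of threshold is left open here.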