Poster Presentation 43rd Lorne Genome Conference 2022

TOP MOVIE: a tandem orthogonal parsimonious machine learning optimized workflow for rapid Mendelian variant interpretation and genomic diagnosis (#134)

Pei Dai 1 2 3, Andrew Honda 3, Rachel Fieldhouse 3, Aaron Statham 3, Thomas Ohnesorg 3, Ben Lundie 3, Eric Lee 3, Matthew Hobbs 3, Arthur Poulet 3 4, Joseph Copty 3, Georgina Hollway 1 3, Michel Tchan 3 5, Kishore R Kumar 5 6, Peter Schols 7, Cyrielle Kint 7, Warren Kaplan 3, Wunna Kyaw 1 3, Tri G Phan 1 2 3, Leslie Burnett 1 2 3 8
  1. Faculty of Medicine, St Vincent's Clinical School, UNSW, Sydney, NSW, Australia
  2. Clinical Immunology Research Consortium of Australasia (CIRCA), Sydney, NSW, Australia
  3. Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  4. University Claude Bernard, Lyon, France
  5. Genetic Medicine, Westmead Hospital, Sydney, NSW, Australia
  6. Molecular Medicine Laboratory and Neurology Department, Concord Repatriation General Hospital, Concord, NSW, Australia
  7. Diploid, Invitae Corporation, Leuven, Belgium
  8. Northern Clinical School, Faculty of Medicine and Health, University of Sydney, St Leonards, NSW, Australia

BACKGROUND: Genomic variant interpretation to the clinical standards of a diagnostic laboratory is a labour-intensive process that can take hours to days. Our aim was to reduce this workload by eliminating informationally redundant genomic attributes, thereby shrinking the number of variants that need manual curation.

METHODS: We evaluated several machine learning algorithms using a clinically validated training dataset of pathogenic and non-pathogenic variants to develop NINO, a parsimonious decision tree genomic classifier built on optimisation of variant annotations in our existing pipeline. We used NINO to generate a candidate list of potentially pathogenic variants and AMELIE, a freely available phenomic classifier, to rank these variants based on phenotypic relevance. The resultant workflow is TOP MOVIE, a Tandem, Orthogonal Parsimonious Mendelian Optimized Variant Interpretation Engine.
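The tandem arrangement described above (a genomic filter followed by a phenotype-based ranking) can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in: the `Variant` fields, the hand-written rules and thresholds in `genomic_filter`, and the `gene_scores` table are assumptions made for illustration, and they are neither the trained NINO tree nor the AMELIE scoring method.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    gene: str
    allele_freq: float   # population allele frequency (hypothetical attribute)
    impact: str          # predicted consequence, e.g. "missense"
    damage_score: float  # hypothetical in silico deleteriousness score, 0..1


def genomic_filter(v: Variant) -> bool:
    """Stage 1 stand-in: a tiny hand-written decision tree.

    The thresholds are illustrative assumptions, not learned values.
    """
    if v.allele_freq > 0.01:
        return False
    if v.impact in {"stop_gained", "frameshift"}:
        return True
    return v.impact == "missense" and v.damage_score >= 0.8


def rank_by_phenotype(candidates, gene_scores):
    """Stage 2 stand-in: order candidates by a per-gene phenotype relevance score."""
    return sorted(candidates, key=lambda v: gene_scores.get(v.gene, 0.0), reverse=True)


def tandem_workflow(variants, gene_scores):
    # The genomic stage shrinks the list; the phenomic stage orders what is left.
    candidates = [v for v in variants if genomic_filter(v)]
    return rank_by_phenotype(candidates, gene_scores)


variants = [
    Variant("v1", "GENE_A", 0.20, "missense", 0.90),
    Variant("v2", "GENE_B", 0.0001, "frameshift", 0.0),
    Variant("v3", "GENE_C", 0.0005, "missense", 0.95),
    Variant("v4", "GENE_D", 0.0002, "synonymous", 0.10),
]
gene_scores = {"GENE_B": 0.3, "GENE_C": 0.9}  # hypothetical phenotype relevance

ranked = tandem_workflow(variants, gene_scores)
print([v.name for v in ranked])
```

In this toy input, the genomic stage discards `v1` and `v4`, and the phenotype stage places `v3` ahead of `v2`, so a curator would review two variants instead of four, starting with the most phenotypically relevant one.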

RESULTS: NINO reduced the number of genomic attributes needing evaluation by an order of magnitude. Adding the phenomic classifier AMELIE in tandem with NINO further decreased the number of candidate variants requiring curation. The resultant TOP MOVIE workflow significantly reduced the variant search space and identified the causative pathogenic variant with an exponential decrease in turnaround time (TAT).

CONCLUSIONS: TOP MOVIE performs as well as human experts but is exponentially faster. It can be easily implemented in any clinical diagnostic laboratory and optimised using that laboratory's existing annotation pipeline and referral population, and it can be customised and updated without any requirement for programming expertise. Our machine-learning optimised parsimonious classifier (NINO) correctly classified known pathogenic variants using only a small proportion of commonly used genomic attributes, suggesting that existing in silico annotation tools may already hold sufficient information content for accurate diagnosis. TOP MOVIE is currently clinically validated for single nucleotide variants and indels (≤ 20 nt) in genomic coding regions; we have not yet applied it to structural variants or non-coding variants.
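One way to make the idea of attribute information content concrete is information gain, a splitting criterion commonly used by decision tree learners. The sketch below is an illustration only: the abstract does not state which criterion was used to build NINO, and the attribute names (`rare`, `damaging`, `odd_position`) and the four labelled rows are invented toy data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in label entropy obtained by splitting on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Invented toy data: each row is one variant with three boolean attributes.
rows = [
    {"rare": True, "damaging": True, "odd_position": True},
    {"rare": True, "damaging": True, "odd_position": False},
    {"rare": True, "damaging": False, "odd_position": True},
    {"rare": False, "damaging": False, "odd_position": False},
]
labels = ["pathogenic", "pathogenic", "benign", "benign"]

gains = {a: information_gain(rows, labels, a) for a in rows[0]}
for attribute, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{attribute}: {gain:.3f} bits")
```

In this toy data `damaging` separates the two classes completely, `rare` is only partly informative, and `odd_position` carries no information about the label, so a parsimonious tree would have no reason to keep it.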