Oral Presentation 43rd Lorne Genome Conference 2022

Detecting genomic variations in COVID-19 virus associated with worse disease outcome (#14)

Priya Ramarao-Milne 1, Yatish Jain 1,2, Michael Kuiper 3, Letitia Sng 1, Natalie Twine 1,2, Laurence O.W. Wilson 1,2, Denis Bauer 1,2,4
  1. Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Sydney, NSW, Australia
  2. Department of Biomedical Sciences, Macquarie University, Sydney, NSW, Australia
  3. Data61, Commonwealth Scientific and Industrial Research Organisation, Sydney, NSW, Australia
  4. Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia

The Global Initiative on Sharing All Influenza Data (GISAID) hosts the largest SARS-CoV-2 viral sequence database to date, with 1.5 million sequences as of 11 May 2021. Despite the large number of sequences deposited, most samples are of limited use for data analysis because their clinical information is poorly annotated. Nonetheless, we identified samples from patients annotated with favourable outcomes (such as mild or asymptomatic disease) as our controls, and samples from patients annotated with unfavourable outcomes (dead, critical) as our cases. In this study, we used our machine learning tool, VariantSpark, to perform an association study on 3412 cases and 7109 controls, with the aim of detecting mutations in SARS-CoV-2 that correlate with poor patient outcome. Our approach identified mutations previously known to impact viral transmission rates and disease severity, such as D614G and V1176F, associated with the Brazil and South Africa variants of concern. We also found mutations in the nsp14 protein, and novel mutations in the spike regions, associated with worse patient outcome. Using our epistasis tool BitEpi, we also identified putative higher-order epistatic interactions, which could represent novel interacting loci that impact disease severity. Lastly, we used AlphaFold to predict the consequences of our candidate mutations on protein structure. Taken together, our study identified novel candidate loci and mutations of interest that warrant further investigation.
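
The case/control framing described above can be illustrated with a minimal sketch. This is not the VariantSpark analysis used in the study; it swaps in a simple per-mutation odds ratio to show how outcome annotations might be mapped to labels and tested against a single mutation. The status vocabularies, the function names (`label_outcome`, `odds_ratio`), and the toy records are all assumptions invented for illustration, and the toy counts carry no biological meaning.

```python
from collections import Counter

# Assumed status vocabularies; real patient-status annotations are free text
# and would need far more careful normalisation than this.
CONTROL_TERMS = {"mild", "asymptomatic"}
CASE_TERMS = {"dead", "critical"}


def label_outcome(status):
    """Return 'case', 'control', or None for an unusable annotation."""
    key = status.strip().lower()
    if key in CASE_TERMS:
        return "case"
    if key in CONTROL_TERMS:
        return "control"
    return None


def odds_ratio(samples, mutation):
    """Odds ratio of being a case given the mutation.

    Adds 0.5 to every cell of the 2x2 table so empty cells do not
    cause a division by zero.
    """
    counts = Counter()
    for status, mutations in samples:
        label = label_outcome(status)
        if label is None:
            continue  # skip samples without a usable outcome annotation
        counts[(label, mutation in mutations)] += 1
    a = counts[("case", True)] + 0.5
    b = counts[("case", False)] + 0.5
    c = counts[("control", True)] + 0.5
    d = counts[("control", False)] + 0.5
    return (a * d) / (b * c)


# Toy records, invented for illustration: (patient status, set of mutations)
toy_samples = [
    ("Dead", {"D614G", "V1176F"}),
    ("Critical", {"V1176F"}),
    ("Mild", {"D614G"}),
    ("Asymptomatic", set()),
    ("unknown", {"V1176F"}),  # dropped: no usable outcome annotation
]

if __name__ == "__main__":
    for name in ("D614G", "V1176F"):
        print(name, odds_ratio(toy_samples, name))
```

A single-locus statistic like this ignores interactions between mutations, which is one reason a multivariate method and a dedicated epistasis search are used in the study itself.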