Individuals of European ancestry are disproportionately represented in human genetic studies, to the detriment of scientific inquiry and the equitable translation of genomics research. One genomic study type, expression quantitative trait locus (eQTL) mapping, identifies statistical associations between the genotype at a given locus and variation in gene expression. eQTL studies have been valuable tools for understanding the regulatory consequences of disease-associated genetic variants. However, not all eQTLs are shared across populations. Understanding the biological and genomic features of these population-specific eQTLs could allow us to predict the portability of eQTLs identified in European cohorts to understudied populations.
Here we use summary statistics from two published multi-population eQTL studies to classify eQTLs as population-specific or shared between at least two populations (African American, European American, Indonesian, etc.). We train machine learning models to predict whether an eQTL is specific to its discovery population using publicly available information on the evolutionary, functional, and expression properties of these eQTLs. Of all features considered, we find that allele frequency, eQTL effect size, gene conservation (e.g. LOEUF, phyloP), and gene expression measures are the most informative predictors. The success of our model in classifying eQTLs as shared or population-specific in a held-out test set of eQTLs (auROC > 80%) suggests that the properties of an eQTL could be used to assess the probability that it is specific to a population. Since current Eurocentric biases in genomic resources are likely to persist for some time, our approach could be an important step toward a more equitable understanding of gene regulation, and hence more equitable personalised medicine.
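To make the reported metric concrete, the following is a minimal sketch of how an auROC can be computed for a binary "population-specific vs. shared" classifier on a held-out set. It is not the model or pipeline used in this study: the function name `auroc`, the toy labels, and the toy scores are all hypothetical and chosen purely for illustration. The sketch uses the pairwise interpretation of auROC, namely the fraction of (population-specific, shared) eQTL pairs in which the population-specific eQTL receives the higher predicted score, with ties counted as one half.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: 1 = population-specific eQTL, 0 = shared eQTL (illustrative coding).
    scores: predicted probability (or any ranking score) of being population-specific.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    # Count positive-negative pairs ranked correctly; ties contribute 0.5.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out eQTLs and model scores (not data from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

# 8 of the 9 positive-negative pairs are ranked correctly.
toy_auc = auroc(toy_labels, toy_scores)
print(f"auROC = {toy_auc:.3f}")
```

An auROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value above 0.8 means that in more than 80% of such pairs the population-specific eQTL is ranked above the shared one.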