Poster Presentation 43rd Lorne Genome Conference 2022

Using consortium-level epigenetic data to identify cell type-specific genes provides a computational anchor to define biological diversity in single cell data   (#257)

Yuliangzi Sun 1, Woo Jun Shim 1, Sophie Shen 1, Quan Nguyen 1, Nathan Palpant 1
  1. Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia

Current methods for cell type identification from single cell data rely on unsupervised clustering algorithms that partition data according to data structure and parameter settings but lack any biologically meaningful reference point for defining cell diversity. This study develops a novel cell type identification and visualisation method that integrates cell type-specific gene information as a computational anchor to define any heterogeneous cell population in single cell data. We use patterns of H3K27me3 domains deposited across hundreds of diverse cell types to determine genes governing cell type identity in an unsupervised manner. The method devises a repressive tendency score (RTS) that represents the association between each gene and broad H3K27me3 domains, enabling prediction of cell type-specific regulatory genes from any cell type. Importantly, the RTS is a simple, quantitative value assigned to all protein coding genes, so it can be used to study orthologous gene expression data without the limitations imposed by reference epigenetic data linked to specific cell types. We use the quantitative value of the RTS as an unsupervised biological anchor point to identify any cell type. Borrowing the concept of a topographical contour plot, we treat the RTS values identified in cells as contour lines in a map. For each cell, we first identify the most abundant RTS priority gene, then use its corresponding RTS value to quantify the position of the cell in the contour plot. The visualisation is based on a weighted density estimation plot in which the RTS value is the weighting parameter that adjusts cell density in a 2D UMAP space. We couple the cell type identification UMAP with a computational method that draws on epigenetic co-modulation of genes, providing an unsupervised way to reveal the structural and regulatory basis of any cell type and to parse the gene programs underpinning cell identity and function.
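The per-cell weighting and density steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how RTS priority genes are selected, which kernel or bandwidth is used, or how cells expressing no priority gene are handled, so the priority gene set, the Gaussian kernel, the bandwidth and the zero-weight fallback are all assumptions here. The function names, gene names, RTS values and coordinates are hypothetical.

```python
import math


def cell_rts_weights(expression, rts, priority_genes):
    """Return (gene, RTS) for the most abundant priority gene in each cell.

    expression: list of dicts mapping gene -> expression value, one per cell.
    rts: dict mapping gene -> RTS value.
    priority_genes: set of genes treated as RTS priority genes (assumed given).
    """
    result = []
    for cell in expression:
        candidates = {g: v for g, v in cell.items()
                      if g in priority_genes and v > 0}
        if not candidates:
            # Assumption: a cell expressing no priority gene gets zero weight.
            result.append((None, 0.0))
            continue
        top = max(candidates, key=candidates.get)
        result.append((top, rts[top]))
    return result


def weighted_density(points, weights, query, bandwidth=1.0):
    """Weighted 2D Gaussian kernel density estimate at one query point.

    points: list of (x, y) embedding coordinates, one per cell.
    weights: per-cell weights (here, the RTS of the top priority gene).
    bandwidth: kernel width (assumed; the abstract does not specify one).
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    norm = 2.0 * math.pi * bandwidth ** 2
    acc = 0.0
    for (x, y), w in zip(points, weights):
        d2 = (query[0] - x) ** 2 + (query[1] - y) ** 2
        acc += w * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return acc / (total * norm)


# Toy data: three cells, three hypothetical genes, made-up RTS values.
expression = [
    {"GENE_A": 5.0, "GENE_B": 1.0, "GENE_C": 0.0},
    {"GENE_A": 0.0, "GENE_B": 2.0, "GENE_C": 9.0},
    {"GENE_A": 0.0, "GENE_B": 0.0, "GENE_C": 3.0},
]
rts = {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.1}
priority_genes = {"GENE_A", "GENE_B"}          # assumed selection
umap = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]    # stand-in for UMAP coordinates

assignments = cell_rts_weights(expression, rts, priority_genes)
weights = [w for _, w in assignments]
density_at_cells = [weighted_density(umap, weights, p) for p in umap]
```

In this toy example the density is highest near the cell whose top priority gene has the largest RTS, which is the behaviour the contour analogy suggests. On real data the same idea would normally be evaluated over a grid in the embedding using array operations rather than Python loops.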