Poster Presentation 43rd Lorne Genome Conference 2022

PreSTEge: An ensemble machine model to Predict Stochastic Translation Efficiency genome-wide (#169)

Attila Horvath 1 , Yoshika Janapala 1 , Eduardo Eyras 1 , Ross Hannah 1 , Thomas Preiss 1 2
  1. Australian National University, Reid, ACT, Australia
  2. Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia

Translational control represents the true endpoint of gene expression regulation, allowing immediate and selective changes in protein level within minutes, which is impossible at the level of transcription. It is therefore crucial to measure protein synthesis rate accurately in both physiological and pathological conditions. The traditional measure of protein synthesis rate, termed Translation Efficiency (TE), is defined as monosome occupancy, inferred from ribo-seq experiments, normalised by the RNA copy numbers estimated from transcriptome profiling. This measure does not take into account the detrimental effect of collided ribosomes (disomes) on efficient protein synthesis. To overcome this limitation, we built a theoretical framework to understand ribosome collision and its relation to translational dynamics, and designed an improved measure of protein synthesis rate. To test our newly developed measure, we used Translation Complex Profile sequencing (TCP-seq), an improved version of ribo-seq that can profile both monosomes and disomes. Our theoretical model revealed that disomes can result from micro-collisions as a natural consequence of high-rate translation. This finding was confirmed by a thorough simulation featuring an in-silico translatome with several codon patterns, each resulting in unique monosome and disome patterns. These characteristics resemble the experimentally measured TCP-seq data. After putative stalling sites were detected and removed using a Hidden Markov model-based peak detector, the rate of monosomes and disomes proved to be indicative of the protein synthesis rate as well as the cellular localisation of the translated messenger RNA. Here we present PreSTEge, which utilises the predictive power of stochastic disome formation in translation dynamics based on the predictions of an independently trained Support Vector Machine and a Conditional Forest model.
As revealed by our thorough benchmark against traditional TE measures, PreSTEge represents the most accurate measure available for protein synthesis rate.
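
For readers unfamiliar with the traditional TE measure referred to above, the following is a minimal Python sketch of one common way to compute it: monosome footprint counts normalised by RNA abundance, per gene. It is not the PreSTEge method and does not model disomes. The function name, the counts-per-million normalisation, the pseudocount and the toy gene names are illustrative assumptions, not details taken from this work.

```python
import math


def translation_efficiency(ribo_counts, rna_counts, pseudocount=1.0):
    """Return log2 TE per gene (hypothetical helper, not part of PreSTEge).

    ribo_counts: dict of gene -> monosome footprint read count (ribo-seq)
    rna_counts:  dict of gene -> RNA read count (transcriptome profiling)
    Counts are scaled to counts per million within each library so that
    differences in sequencing depth cancel out; the pseudocount (an
    assumption) avoids division by zero for unexpressed genes.
    """
    ribo_total = sum(ribo_counts.values())
    rna_total = sum(rna_counts.values())
    te = {}
    # Only genes present in both libraries can be scored.
    for gene in ribo_counts.keys() & rna_counts.keys():
        ribo_cpm = (ribo_counts[gene] + pseudocount) / ribo_total * 1e6
        rna_cpm = (rna_counts[gene] + pseudocount) / rna_total * 1e6
        te[gene] = math.log2(ribo_cpm / rna_cpm)
    return te


# Toy example with made-up counts for two hypothetical genes.
ribo = {"geneA": 400, "geneB": 100}
rna = {"geneA": 100, "geneB": 400}
te = translation_efficiency(ribo, rna)
print(sorted(te.items()))
```

In this toy input, geneA has more footprints per unit of RNA than geneB and therefore receives the higher TE value. Because only monosome footprints enter the ratio, collided ribosomes are invisible to it, which is the limitation the abstract describes.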