Poster Presentation 43rd Lorne Genome Conference 2022

dedUCE: efficient identification of ultraconserved elements and applications to genome assembly completeness (#271)

Cadel Watson¹, Mitchell J Cummins², Yasir Kusay¹, Maxine Halbheer¹, Eric Urng¹, John S Mattick², Richard J Edwards²
  1. School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia
  2. School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia

Ultraconserved elements (UCEs) are DNA sequences that are extremely conserved and are found almost unchanged in the genomes of multiple divergent species [1]. UCEs have been found in a wide variety of organisms, including mammals, fish, insects, birds, and plants. Whilst the evidence suggests that they are the result of natural selection, indicating biological importance, their function has thus far proven elusive [2]. The recent explosion in the quality and quantity of reference genomes across multiple taxa provides new opportunities for investigating the prevalence, evolution and role of UCEs. However, the field is hampered by a lack of fast, resource-efficient algorithms for identifying UCEs. Furthermore, common alignment-based algorithms fail to identify non-syntenic UCEs.

Here, we present dedUCE, a novel tool for identifying all UCEs in a set of genomes. dedUCE uses a hash-based algorithm to rapidly identify core UCE kmers that are shared by multiple genomes, before extending and merging candidates into a final comprehensive but non-redundant set of UCEs. dedUCE can support UCEs appearing out-of-order due to genetic rearrangements and/or assembly artefacts, and is able to return UCEs with inexact homology and support. The tool can then apply these UCEs to genome assemblies and produce three related measures of completeness, based on the presence of UCEs and their ordering in a consensus syntenic map. 
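The core-k-mer stage described above can be illustrated with a minimal Python sketch. It is not dedUCE's implementation: it assumes exact, forward-strand matching, genomes supplied as plain strings, and Python's built-in dict hashing as a stand-in for whatever hash structure dedUCE actually uses. The function names `shared_kmers` and `extend_and_merge` and the parameters `k`, `min_support` and `min_length` are hypothetical. Shared k-mers are merged into runs along the first genome, and a run is kept only if its whole merged sequence still occurs in enough genomes. Reverse complements, inexact homology, rearrangement-aware merging and UCEs absent from the first genome are all out of scope here.

```python
from collections import defaultdict


def shared_kmers(genomes: list[str], k: int, min_support: int) -> set[str]:
    """Return k-mers found in at least `min_support` distinct genomes.

    Hypothetical helper: Python's dict hashing stands in for a dedicated
    k-mer hash table; matching is exact and forward-strand only.
    """
    owners: dict[str, set[int]] = defaultdict(set)
    for gi, seq in enumerate(genomes):
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(gi)
    return {kmer for kmer, gs in owners.items() if len(gs) >= min_support}


def extend_and_merge(genomes: list[str], k: int, min_support: int,
                     min_length: int) -> list[str]:
    """Merge runs of shared k-mers along genomes[0] into candidate UCEs.

    Hypothetical sketch: a run is kept only if its full merged sequence
    still occurs exactly in at least `min_support` genomes.
    """
    core = shared_kmers(genomes, k, min_support)
    ref = genomes[0]
    runs: list[tuple[int, int]] = []
    start, end = None, 0
    for i in range(len(ref) - k + 1):
        if ref[i:i + k] in core:
            if start is None:
                start = i
            end = i + k  # run now covers ref[start:end]
        elif start is not None:
            runs.append((start, end))
            start = None
    if start is not None:
        runs.append((start, end))

    candidates = []
    for s, e in runs:
        seq = ref[s:e]
        # Re-check support for the whole merged run, not just its k-mers.
        support = sum(seq in g for g in genomes)
        if e - s >= min_length and support >= min_support:
            candidates.append(seq)
    return candidates


genomes = ["AAACGTACGTTTT", "GGACGTACGTCC", "CCCACGTACGTA"]
print(extend_and_merge(genomes, k=4, min_support=3, min_length=8))
```

With these three toy genomes, which share only the 8 bp core ACGTACGT, the call prints `['ACGTACGT']`. Re-checking support for each merged run guards against stitching together k-mers that are shared individually but are not contiguous in the other genomes; a production tool would split such runs rather than discard them.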

Preliminary results show that dedUCE can identify all UCEs in a group of 40 mammalian genomes in 8 hours on a 16-core machine, orders of magnitude faster than previous algorithms. When applied to four target assemblies using UCEs identified in closely related organisms, dedUCE provides a good complement to existing completeness measures, especially in targeting non-coding regions and identifying local rearrangements of UCE pairs.
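To make the idea of presence- and order-based completeness concrete, the following Python sketch scores an assembly against UCEs listed in consensus syntenic order. The scores are hypothetical illustrations, not the three measures dedUCE reports, which this abstract does not define: the fraction of UCEs found, the fraction of adjacent consensus pairs found on the same contig in the expected order, and a count of reversed pairs as a crude proxy for local rearrangements. It assumes exact, forward-strand matches and uses only the first hit per UCE; `locate`, `completeness` and the example sequences are hypothetical.

```python
from typing import Optional


def locate(uce: str, assembly: dict[str, str]) -> Optional[tuple[str, int]]:
    """Return (contig, offset) of the first exact forward-strand hit, or None."""
    for name, seq in assembly.items():
        pos = seq.find(uce)
        if pos != -1:
            return name, pos
    return None


def completeness(consensus: list[str], assembly: dict[str, str]) -> dict:
    """Score an assembly against UCEs listed in consensus syntenic order.

    Hypothetical measures for illustration only:
      presence      - fraction of consensus UCEs found in the assembly
      pair_order    - fraction of adjacent consensus pairs (both found, same
                      contig) whose order matches the consensus
      swapped_pairs - number of such pairs found in reverse order
    """
    hits = [locate(u, assembly) for u in consensus]
    present = sum(h is not None for h in hits)
    same_contig = in_order = 0
    for a, b in zip(hits, hits[1:]):
        if a is None or b is None or a[0] != b[0]:
            continue  # pair missing or split across contigs
        same_contig += 1
        if a[1] < b[1]:
            in_order += 1
    return {
        "presence": present / len(consensus),
        "pair_order": in_order / same_contig if same_contig else 0.0,
        "swapped_pairs": same_contig - in_order,
    }


consensus = ["ACGTAC", "TTGGCC", "GATTAC", "CCCGGG"]
assembly = {"ctg1": "NNACGTACNNGATTACNNTTGGCCNN", "ctg2": "NNNN"}
print(completeness(consensus, assembly))
```

In this toy assembly three of the four UCEs are present, and the second and third appear in swapped order on `ctg1`, so the call prints `{'presence': 0.75, 'pair_order': 0.5, 'swapped_pairs': 1}`. Pairs split across contigs are skipped rather than penalised, since contig order is not known from the assembly alone.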

dedUCE therefore improves on existing methods for UCE identification, enabling large-scale analysis, and provides new tools for measuring assembly completeness.

  1. Gill Bejerano, Michael Pheasant, Igor Makunin, Stuart Stephen, W. James Kent, John S. Mattick, and David Haussler (2004). Ultraconserved Elements in the Human Genome. Science, 304(5675):1321–1325.
  2. Konstantinos Kritsas, Samuel E. Wuest, Daniel Hupalo, Andrew D. Kern, Thomas Wicker, and Ueli Grossniklaus (2012). Computational analysis and characterization of UCE-like elements (ULEs) in plant genomes. Genome Research, 22(12):2455–2466.