Oral Presentation 43rd Lorne Genome Conference 2022

Small but mitey: high-quality long-read assembly of a streamlined mite genome from contaminated sequencing data (#17)

Richard J Edwards 1 2 , Stephanie H Chen 1 2 3 , Jason G Bragg 3 4
  1. School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
  2. Evolution & Ecology Research Centre, University of New South Wales, Sydney, NSW, Australia
  3. Research Centre for Ecosystem Resilience, Australian Institute of Botanical Science, The Royal Botanic Garden Sydney, Sydney, NSW, Australia
  4. School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, NSW, Australia

As pilot data for a project on myrtle rust resistance, we previously assembled two Myrtaceae genomes using 10x Chromium linked reads: Rhodamnia argentea (silver malletwood) and Syzygium oleosum (blue lilly pilly). Both draft genomes achieved scaffolding (N50 > 850 kb) and completeness (BUSCO v3 embryophyta_odb9 > 90%) of sufficient quality to be annotated by NCBI RefSeq. However, signs of arthropod sequence contamination were subsequently found in the Rhodamnia argentea assembly. We therefore sought to identify and eliminate this contamination during improvement and curation of the genome for publication.

A risk-averse analysis flagged 49.6 Mb (11.95%) on 2,996 of 15,781 scaffolds as being of possible arthropod origin. An improved assembly of the same tree, incorporating ~50X long-read (ONT) sequencing, has confirmed this contamination as 11 scaffolds (34.6 Mb) that are distinct from the 75 R. argentea assembly scaffolds (346.7 Mb), which makes contamination a more likely explanation than the integration of horizontally transferred genes. Taxonomic analysis of predicted protein-coding genes using Taxolotl (https://github.com/slimsuite/taxolotl) suggested that the contamination most likely originates from some form of mite (Order: Trombidiformes), but limited NCBInr mite sequences precluded better taxonomic resolution. Curiously, these contamination scaffolds showed a high depth of coverage (~36X), but a fairly low BUSCO completeness of 58.1% (v5 Augustus, metazoa_odb10 n=954), apparently inconsistent with typical mite genomes.
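To put the two contamination estimates above on the same footing, the short sketch below redoes the arithmetic using only the figures quoted in this abstract. It assumes that the 11 contamination scaffolds and the 75 R. argentea scaffolds together make up the whole improved assembly, which the abstract does not state explicitly; the function and variable names are illustrative only.

```python
# Figures quoted in the abstract, in Mb.
FLAGGED_MB = 49.6          # possible arthropod sequence in the linked-read draft
FLAGGED_FRACTION = 0.1195  # the same sequence as a fraction of that draft (11.95%)
CONTAM_MB = 34.6           # 11 contamination scaffolds in the long-read assembly
HOST_MB = 346.7            # 75 R. argentea scaffolds in the long-read assembly


def implied_total(part_mb: float, fraction: float) -> float:
    """Total size implied when part_mb makes up `fraction` of the whole."""
    return part_mb / fraction


def share(part_mb: float, rest_mb: float) -> float:
    """Fraction of (part + rest) contributed by part."""
    return part_mb / (part_mb + rest_mb)


# Size of the linked-read draft implied by "49.6 Mb (11.95%)".
draft_total_mb = implied_total(FLAGGED_MB, FLAGGED_FRACTION)

# Share of the long-read assembly on contamination scaffolds.
# Assumption: the two scaffold sets are the entire improved assembly.
contam_share = share(CONTAM_MB, HOST_MB)

print(f"Implied draft assembly size: {draft_total_mb:.1f} Mb")
print(f"Contamination share of improved assembly: {contam_share:.2%}")
```

Under that assumption, the confirmed contamination is a somewhat smaller share of the improved assembly than the risk-averse 11.95% flagged in the draft.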

Phylogenomic analysis with available mite genomes identified the closest relative as Aculops lycopersici, a microscopic (0.2 mm long) eriophyoid mite with a heavily streamlined 32.5 Mb genome. The original low completeness appears to result from a combination of genome reduction and poor performance of that BUSCO configuration; BUSCO v5 MetaEuk eukaryota_odb10 (n=255) reports 82.8% completeness, approaching the 86.3% of A. lycopersici. Here, we discuss the evidence that we have assembled a highly complete but streamlined genome from an unknown eriophyoid mite, plus the need to improve genomic representation of contaminating pest species.