CpG islands (CGIs) are a conserved class of DNA sequence found at vertebrate gene regulatory elements. CGIs are characterised by sequence features compatible with transcriptional activation, including elevated CpG density, elevated GC content and the presence of transcription factor binding sites (TFBS). Additionally, CGIs lack DNA methylation (5-methylcytosine, 5mC), a chemical modification of DNA associated with gene silencing. CGIs have largely been studied in hypermethylated vertebrate genomes, where the lack of 5mC is a defining feature. However, it is unclear whether 5mC is the primary driving force behind CGI evolution and regulatory function. Unlike vertebrate genomes, invertebrate genomes are typically sparsely methylated, so the possibility that invertebrate genomes contain CGIs has received little attention. This study aims to establish whether CGIs are a vertebrate-specific innovation or a deeply conserved feature of metazoan regulatory elements that exists independently of 5mC.
In this study, non-methylated CpG island-like sequences (NMIs) were isolated and sequenced from eight invertebrate species with variable levels of genomic 5mC using BioCAP-seq, a biochemical method based on protein affinity pulldown of CpG-rich DNA. Analysis of the invertebrate NMI maps revealed higher CpG and GC content at NMIs than at control regions of the genome. Enriched BioCAP-seq signal, indicative of NMI presence, was found at in silico predicted CGIs. Whole-genome bisulfite sequencing and ATAC-seq data confirmed that invertebrate NMIs are hypomethylated and associated with accessible chromatin, respectively. Promoter-associated NMIs were enriched in motifs corresponding to methyl-sensitive and chromatin remodelling TFBS and were more highly conserved than non-NMI promoters (phastCons score, p-value < 0.001), in line with the decreased CpG mutation rates described at vertebrate CGIs.
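The sequence features compared here, CpG content and GC content, can be illustrated with a minimal sketch. It assumes the widely used observed/expected CpG ratio and the classical length, GC and ratio thresholds (200 bp, 50%, 0.6) for in silico CGI prediction. The abstract does not state which prediction criteria or metrics were used in the study, so the function names and default thresholds below are illustrative assumptions only.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    n_cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    return n_cpg * len(seq) / (n_c * n_g)


def is_cgi_like(seq: str, min_len: int = 200,
                min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Hypothetical helper: apply assumed classical CGI thresholds."""
    return (len(seq) >= min_len
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) > min_oe)
```

Under these assumptions, a candidate region and a matched control region can each be scored with `gc_content` and `cpg_obs_exp` and the two distributions compared. The thresholds in `is_cgi_like` would likely need adjusting for genomes whose base composition differs from that of vertebrates.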
In summary, invertebrate NMIs resemble vertebrate CGIs, challenging the assumption that 5mC is a major determinant of CGI function. Understanding the epigenetic factors necessary for CGI evolution will provide valuable insights into the fundamental mechanisms that control gene expression.