Neuropsychiatric disorders are a spectrum of complex and highly debilitating conditions. Genetic risk plays an important part in who develops neuropsychiatric disorders, and many risk genes are being identified, but the underlying mechanisms are poorly understood. Genome-wide association studies (GWAS) have identified loci of potential disease significance and have highlighted the importance of non-coding and intergenic regions of the genome in disease risk. However, assigning biological significance to GWAS loci has been limited by the scope of traditional sequencing approaches. In this study, we aim to mitigate the limitations of short- and long-read sequencing by pairing them with CaptureSeq.
We performed short-read (SR), capture short-read (SR CapSeq), capture long-read (LR CapSeq) and capture long-read with selection of reads >1.5 kb (LR FracCap) sequencing on three regions of postmortem human brain (cerebellum, superior temporal cortex and striatum). We compared the performance of “targeted” short- and long-read sequencing (CaptureSeq) based on their ability to identify and quantify risk gene isoforms and lncRNAs. The “targets” comprised 3147 regions (including protein-coding genes, lncRNAs and intergenic regions) linked to neuropsychiatric disorders.
Compared to SR, the target regions showed on average 164-fold, 68-fold and 55-fold enrichment in SR CapSeq, LR CapSeq and LR FracCap, respectively. LR FracCap sequencing identified and quantified the largest number of captured risk genes (LR CapSeq = 129, LR FracCap = 135) and isoforms (LR CapSeq = 302, LR FracCap = 363), followed by LR CapSeq. The two long-read techniques were also highly effective in identifying novel transcriptomic features missed by the two short-read methods, such as potentially novel lncRNAs in intergenic regions (LR CapSeq = 200, LR FracCap = 195). These results indicate the potential of long-read capture sequencing for effective quantification of the transcriptome and detection of novel features.
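The abstract does not state how fold enrichment was calculated. One common convention, assumed here purely for illustration, is the ratio of the on-target read fraction in a capture library to the on-target read fraction in the untargeted SR library. The function names and read counts in the sketch below are hypothetical and are not taken from the study.

```python
def on_target_fraction(on_target_reads: int, total_reads: int) -> float:
    """Fraction of mapped reads that overlap the capture targets."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= on_target_reads <= total_reads:
        raise ValueError("on_target_reads must lie between 0 and total_reads")
    return on_target_reads / total_reads


def fold_enrichment(
    capture_on_target: int,
    capture_total: int,
    baseline_on_target: int,
    baseline_total: int,
) -> float:
    """Ratio of on-target fractions: capture library over untargeted baseline.

    Assumed definition; the study may have used a different metric
    (for example, normalized coverage over the target regions).
    """
    baseline = on_target_fraction(baseline_on_target, baseline_total)
    if baseline == 0:
        raise ValueError("baseline has no on-target reads; enrichment is undefined")
    capture = on_target_fraction(capture_on_target, capture_total)
    return capture / baseline


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not data from the study):
    # 1% of baseline reads on target versus 50% of capture reads on target.
    value = fold_enrichment(
        capture_on_target=50_000,
        capture_total=100_000,
        baseline_on_target=1_000,
        baseline_total=100_000,
    )
    print(f"fold enrichment: {value:.1f}")
```

Under this definition, enrichment depends on the baseline on-target fraction as well as on capture efficiency, so values from different library types are comparable only when they share the same untargeted reference.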