Long-Read Data Analysis Tools

The Transcriptomics module offers a variety of tools specifically designed for the analysis of long-read RNA-sequencing data from PacBio or Oxford Nanopore Technologies (ONT) platforms.

Long-Read Alignment with minimap2: Minimap2 is a sequence alignment tool designed to efficiently and accurately map long and noisy DNA or RNA sequences against a reference genome or sequence collection. It uses a 'minimizer' indexing approach to rapidly identify potential alignment anchors, which are then refined through dynamic programming to generate accurate and sensitive alignments.
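To make the minimizer idea concrete, here is a minimal Python sketch. It is not minimap2's implementation: minimap2 hashes k-mers and considers both strands, whereas this sketch assumes plain lexicographic ordering on the forward strand only. The function names `minimizers` and `shared_anchors` and the small `k` and `w` values are illustrative assumptions.

```python
def minimizers(seq, k=5, w=4):
    """Return sorted (position, k-mer) minimizers of seq.

    For every window of w consecutive k-mers, the lexicographically
    smallest k-mer is selected (leftmost on ties). This ordering is a
    simplification chosen for readability.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first smallest element, so ties go to the left
        offset = min(range(w), key=lambda j: window[j])
        selected.add((start + offset, window[offset]))
    return sorted(selected)


def shared_anchors(ref, read, k=5, w=4):
    """Pair reference and read positions that share a minimizer k-mer."""
    index = {}
    for pos, kmer in minimizers(ref, k, w):
        index.setdefault(kmer, []).append(pos)
    return [
        (ref_pos, read_pos)
        for read_pos, kmer in minimizers(read, k, w)
        for ref_pos in index.get(kmer, [])
    ]


if __name__ == "__main__":
    ref = "ACGTTGCATGCCGATAGCTAGGCTA"
    read = ref[6:20]  # an error-free read taken from offset 6
    for ref_pos, read_pos in shared_anchors(ref, read):
        print(ref_pos, read_pos)
```

Only the selected k-mers are indexed, which keeps the index small. In a real aligner, anchors like these would then be chained and refined by dynamic programming, as described above.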
Transcript Identification and Quantification: OmicsBox provides several tools for identifying and quantifying transcripts from long-read RNA-sequencing data, each suited to a different use case:
- PacBio-based Identification with IsoSeq: PacBio's IsoSeq pipeline preprocesses PacBio single-molecule sequencing data and defines transcript models. This composable workflow combines existing tools and algorithms with a novel clustering technique to handle the increasing data output from PacBio sequencing platforms.
- Identification and Quantification with FLAIR: FLAIR enables transcriptome reconstruction and quantification from long-read RNA sequencing data. During reconstruction, it uses reference annotations and/or short-read data to correct splice junctions observed in long reads, then identifies both known and novel transcript isoforms. For quantification, FLAIR can map long reads to either a newly reconstructed transcriptome or a provided reference transcriptome.
- Identification and Quantification with IsoQuant: IsoQuant performs genome-based analysis of long RNA reads, enabling reconstruction and quantification of transcript models with high precision and good recall. When a reference annotation is provided, IsoQuant assigns reads to annotated isoforms based on intron-exon structure and performs quantification at both gene and isoform levels.
- Reference-free Isoform Reconstruction with the isON-pipeline: The isON-pipeline reconstructs transcriptomes from long-read sequencing data (PacBio or ONT) without requiring reference annotations or genomes. This three-component pipeline (isONclust3, isONcorrect, and isONform) is particularly well-suited for non-model organisms.
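Two ideas from the list above, correcting splice junctions against known sites and assigning reads to annotated isoforms by intron-exon structure, can be illustrated with a simplified sketch. It does not reproduce the algorithms used by FLAIR or IsoQuant. The function names, the 10 bp correction window, and the representation of introns as (start, end) tuples are assumptions made for illustration, and donor and acceptor sites are not distinguished.

```python
def correct_junctions(read_introns, known_sites, window=10):
    """Snap each intron boundary to the nearest known splice site.

    Boundaries with no known site within `window` bp are left unchanged.
    """
    def snap(pos):
        candidates = [s for s in known_sites if abs(s - pos) <= window]
        if not candidates:
            return pos
        return min(candidates, key=lambda s: abs(s - pos))

    return [(snap(start), snap(end)) for start, end in read_introns]


def assign_isoform(read_introns, annotation):
    """Return the annotated isoforms whose intron chain equals the read's."""
    chain = tuple(read_introns)
    return sorted(
        name for name, introns in annotation.items() if tuple(introns) == chain
    )


if __name__ == "__main__":
    # Hypothetical annotation: isoform name -> ordered list of introns
    annotation = {
        "tx1": [(100, 200), (300, 400)],
        "tx2": [(100, 200), (300, 450)],
    }
    known_sites = sorted(
        {pos for introns in annotation.values() for intron in introns for pos in intron}
    )
    noisy_read = [(103, 198), (301, 447)]  # junctions shifted by alignment noise
    corrected = correct_junctions(noisy_read, known_sites)
    print(corrected)                              # [(100, 200), (300, 450)]
    print(assign_isoform(corrected, annotation))  # ['tx2']
```

A read whose corrected intron chain matches no annotated isoform returns an empty list, which is the kind of read a reconstruction step would consider as evidence for a novel isoform.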
Curation of Long-Read Transcriptomes with SQANTI3: SQANTI3 enables quality control and filtering of custom transcriptomes generated from long-read RNA sequencing data. It compares these transcriptomes against reference transcriptomes and incorporates orthogonal data, including short reads, CAGE peaks, polyA peaks, and polyA motifs.
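The comparison against a reference can be pictured as a classification by splice-junction structure. The sketch below uses four of the structural category names from SQANTI (FSM, ISM, NIC, NNC) but is only a rough approximation: it assumes plus-strand transcripts on a single gene, ignores the remaining categories and all orthogonal data, and the `classify` function and the "mono-exon" label are hypothetical.

```python
def is_contiguous_subchain(sub, full):
    """True if `sub` occurs as a run of consecutive introns within `full`."""
    n = len(sub)
    return any(tuple(full[i:i + n]) == tuple(sub) for i in range(len(full) - n + 1))


def classify(query_introns, reference):
    """Classify a transcript by comparing its intron chain to a reference.

    reference: dict mapping transcript name -> ordered list of (donor, acceptor).
    """
    query = tuple(query_introns)
    if not query:
        return "mono-exon"  # placeholder label, not handled in this sketch
    ref_chains = [tuple(chain) for chain in reference.values()]
    if query in ref_chains:
        return "FSM"  # full splice match: every junction matches one reference transcript
    if any(is_contiguous_subchain(query, chain) for chain in ref_chains):
        return "ISM"  # incomplete splice match: consecutive subset of reference junctions
    donors = {d for chain in ref_chains for d, _ in chain}
    acceptors = {a for chain in ref_chains for _, a in chain}
    if all(d in donors and a in acceptors for d, a in query):
        return "NIC"  # novel in catalog: new combination of known splice sites
    return "NNC"      # novel not in catalog: at least one unknown splice site


if __name__ == "__main__":
    reference = {
        "ref1": [(100, 200), (300, 400), (500, 600)],
        "ref2": [(100, 200), (300, 450)],
    }
    print(classify([(100, 200), (300, 400), (500, 600)], reference))  # FSM
    print(classify([(300, 400), (500, 600)], reference))              # ISM
    print(classify([(100, 200), (300, 450), (500, 600)], reference))  # NIC
    print(classify([(100, 200), (310, 400)], reference))              # NNC
```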
Combining Transcriptomes with TAMA Merge: TAMA Merge combines multiple transcriptome annotations into a single, unified transcriptome. This tool is particularly useful when transcriptome reconstruction has been performed separately on individual samples that need to be integrated.
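As a rough illustration of what merging per-sample transcriptomes involves, the sketch below groups transcript models that share chromosome, strand, and an identical intron chain, and keeps the widest observed transcript ends. This is a simplified stand-in, not TAMA Merge's algorithm: it assumes exact junction matches with no tolerance thresholds, merges mono-exon transcripts only when their coordinates are identical, and uses a hypothetical input layout.

```python
from collections import defaultdict


def merge_transcriptomes(samples):
    """Merge transcript models across samples by shared intron chain.

    samples: dict mapping sample name -> list of transcripts, where each
    transcript is a dict with "id", "chrom", "strand", and "exons"
    (a list of (start, end) tuples). This layout is an assumption.
    """
    groups = defaultdict(list)
    for sample, transcripts in samples.items():
        for tx in transcripts:
            exons = sorted(tx["exons"])
            introns = tuple(
                (exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)
            )
            # Mono-exon transcripts have no introns, so key them by exact coordinates
            mono_key = None if introns else exons[0]
            key = (tx["chrom"], tx["strand"], introns, mono_key)
            groups[key].append((sample, tx["id"], exons[0][0], exons[-1][1]))

    merged = []
    for (chrom, strand, introns, _), members in groups.items():
        merged.append({
            "chrom": chrom,
            "strand": strand,
            "introns": introns,
            "start": min(m[2] for m in members),  # widest 5'/3' extent observed
            "end": max(m[3] for m in members),
            "sources": sorted(f"{sample}:{tx_id}" for sample, tx_id, _, _ in members),
        })
    return merged


if __name__ == "__main__":
    samples = {
        "A": [{"id": "a1", "chrom": "chr1", "strand": "+",
               "exons": [(100, 200), (300, 400)]}],
        "B": [{"id": "b1", "chrom": "chr1", "strand": "+",
               "exons": [(90, 200), (300, 420)]},
              {"id": "b2", "chrom": "chr1", "strand": "+",
               "exons": [(100, 200), (350, 400)]}],
    }
    for model in merge_transcriptomes(samples):
        print(model["start"], model["end"], model["introns"], model["sources"])
```

Recording the source transcripts for each merged model keeps it possible to trace a unified transcript back to the samples that support it.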