BAM File Quality Control
Introduction
This tool provides a comprehensive evaluation of alignment files from RNA-Seq datasets. It uses the Python package RSeQC, which provides a number of modules that quickly inspect sequence quality, nucleotide composition bias, PCR bias, and GC bias. The RNA-Seq specific modules evaluate:
- sequencing saturation
- mapped reads distribution
- coverage uniformity
- strand specificity
- transcript level RNA integrity
- and more.
Please cite RSeQC and Samtools as:
- Wang L., Wang S. and Li W. (2012). RSeQC: quality control of RNA-seq experiments. Bioinformatics (Oxford, England), 28(16), 2184-5.
- Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G. and Durbin R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England), 25(16), 2078-9.
Run Bam Quality Control for RNA-Seq alignment data
The tool can be found in the Transcriptomics Module of OmicsBox under Bam File Quality Control. The wizard accepts one or more input BAM files, an optional BED, GFF, or GTF reference, and several parameters that depend on the input data (Figure 1 and Figure 2).
Input
Aligned RNA-Seq short reads in BAM format (single-end or paired-end) can be provided as input.
Configuration
- Gene Models
A BED file with gene models can be provided. It has to match the version of the reference genome used in the previous mapping step: chromosome names have to match, and the BED file is expected to have 12 tab-separated columns.
Providing a BED file is optional but recommended, because many statistics and plots are not available without it.
- Minimum Mapping Quality
The minimum mapping quality (Phred-scaled) an alignment must have to be considered "uniquely mapped".
- Read Alignment Length
Set this to the original read length. For example, the CIGAR strings "101M", "68M140N33M", and "53M1D48M" all indicate a read alignment length of 101.
- Read Sample Rate
The number of aligned reads used to calculate the mismatch and deletion profiles. The default value is 1000000.
- Minimum Intron Length
Minimum intron length in base pairs. The default value is 50.
- Min Reads for Junction Calls
The minimum number of supporting reads necessary to call a junction. The default value is 1.
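Two of the settings above depend on properties of the input files that can be checked before running the wizard: the read alignment length implied by the CIGAR strings, and the 12-column layout of the BED file. The following is a minimal Python sketch of both checks. It is an illustration only, not the code used by OmicsBox or RSeQC, and the function names `read_length_from_cigar` and `is_bed12_line` are hypothetical. It assumes the SAM convention that the operations M, I, S, = and X consume read bases, while D, N, H and P do not.

```python
import re

# CIGAR operations that consume bases of the read, following the SAM convention.
# D (deletion), N (skipped region / intron), H and P do not add to the read length.
QUERY_CONSUMING_OPS = set("MIS=X")
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def read_length_from_cigar(cigar: str) -> int:
    """Sum the lengths of all CIGAR operations that consume read bases."""
    tokens = CIGAR_TOKEN.findall(cigar)
    # Reject strings that contain anything besides well-formed <length><op> tokens.
    if not tokens or "".join(n + op for n, op in tokens) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return sum(int(n) for n, op in tokens if op in QUERY_CONSUMING_OPS)


def is_bed12_line(line: str) -> bool:
    """Return True if a BED record has exactly 12 tab-separated columns."""
    return len(line.rstrip("\n").split("\t")) == 12


# The three CIGAR strings from the Read Alignment Length description.
for cigar in ("101M", "68M140N33M", "53M1D48M"):
    print(cigar, read_length_from_cigar(cigar))
```

For the three example strings, the intron skip (140N) and the deletion (1D) are ignored, so each one sums to 101, which is the value to enter as Read Alignment Length.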
Results
- Table with the main information about all the analyzed samples (Figure 4).
- Report with specific information about each sample (Figure 3).
- Charts can be created from the main table side panel.
Charts
The side panel of the table (Figure 4) contains action buttons for the different chart categories (some examples below):
- General Charts
These charts are related to general characteristics.
- Annotation Based Charts
These charts are related to the distance of Full Splice Match (FSM) and Incomplete Splice Match (ISM) transcripts to annotated Transcription Start Sites (TSS) and Transcription Termination Sites (TTS).