BAM File Quality Control

Introduction

This tool provides a comprehensive evaluation of alignment files from RNA-Seq datasets. It uses the Python package RSeQC, which provides a number of modules that quickly inspect sequence quality, nucleotide composition bias, PCR bias, and GC bias. The RNA-Seq-specific modules evaluate:

  • sequencing saturation
  • mapped reads distribution
  • coverage uniformity
  • strand specificity
  • transcript level RNA integrity
  • and more.

Please cite RSeQC and Samtools as:

  • Wang L., Wang S. and Li W. (2012). RSeQC: quality control of RNA-seq experiments. Bioinformatics (Oxford, England), 28(16), 2184-5.
  • Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G. and Durbin R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England), 25(16), 2078-9.

Run BAM Quality Control for RNA-Seq alignment data

The tool can be found in the Transcriptomics Module of OmicsBox under Bam File Quality Control. The wizard accepts one or more input BAM files, an optional BED, GFF, or GTF reference, and several parameters that depend on the input data (Figure 1 and Figure 2).

Input

Aligned RNA-Seq short reads in BAM format (single-end or paired-end) can be provided as input.

Configuration

  • Gene Models
    A BED file with gene models can be provided. It has to match the version of the reference genome used during the previous mapping step: chromosome names have to match, and the BED file is expected to have 12 tab-separated columns.
    Providing a BED file is optional but recommended, because many statistics and plots are not available without it.
  • Minimum Mapping Quality
    The minimum mapping quality (Phred-scaled) an alignment must have to be considered "uniquely mapped".
  • Read Alignment Length
    Set this to the original read length. For example, the CIGAR strings "101M", "68M140N33M", and "53M1D48M" all indicate a read alignment length of 101.
  • Read Sample Rate
    The number of aligned reads used to calculate the mismatch and deletion profiles. The default value is 1000000.
  • Minimum Intron Length
    Minimum intron length in base pairs. The default value is 50.
  • Min Reads for Junction Calls
    The minimum number of supporting reads necessary to call a junction. The default value is 1.
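
The Read Alignment Length and Gene Models settings above can be checked before running the wizard. The following is a minimal sketch, not part of OmicsBox or RSeQC: the function names `cigar_read_length` and `is_bed12_line` are hypothetical, and the set of read-consuming CIGAR operations follows the SAM specification. Whether the tool itself counts soft-clipped bases toward the read length is an assumption here, since none of the example CIGAR strings contain clipping.

```python
import re

# CIGAR operations that consume bases of the read, per the SAM specification.
# Assumption: soft clips (S) count toward the original read length.
QUERY_CONSUMING_OPS = set("MIS=X")
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_read_length(cigar: str) -> int:
    """Sum the lengths of all CIGAR operations that consume read bases."""
    tokens = CIGAR_TOKEN.findall(cigar)
    # Reject empty strings and strings with characters the pattern did not cover.
    if not tokens or "".join(n + op for n, op in tokens) != cigar:
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return sum(int(n) for n, op in tokens if op in QUERY_CONSUMING_OPS)


def is_bed12_line(line: str) -> bool:
    """Return True if a BED record has exactly 12 tab-separated columns."""
    return len(line.rstrip("\n").split("\t")) == 12


# The three CIGAR strings from the Read Alignment Length description.
for cigar in ("101M", "68M140N33M", "53M1D48M"):
    print(cigar, cigar_read_length(cigar))
```

Skipped regions (N) and deletions (D) consume only reference positions, which is why "68M140N33M" and "53M1D48M" both give 101 rather than a larger number.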

Figure 1: Input Page

Figure 2: Configuration Page

Results

  • Table with the main information about all the analyzed samples (Figure 4).
  • Report with specific information about each sample (Figure 3).
  • Charts can be created from the main table side panel.

Charts

The side panel of the results table (Figure 4) contains action buttons, one for each chart category. Some examples are shown below:

  • General Charts
    These charts summarize general characteristics of the aligned reads.
  • Annotation Based Charts
    These charts are related to the distance of Full Splice Match (FSM) and Incomplete Splice Match (ISM) transcripts to annotated Transcription Start Sites (TSS) and Transcription Termination Sites (TTS).

Figure 3: Report Page

Figure 4: Results Table

Figure 5: Clipping Profile Chart

Figure 6: Deletion Profile Chart

Figure 7: Read GC Content Distribution Chart

Figure 8: Read NVC Distribution

Figure 9: Read Quality Chart

Figure 10: Read Duplication Rate

Figure 11: Inner Distance Chart

Figure 12: Junction Annotation Pie Chart

Figure 13: Junction Saturation Chart

Figure 14: Gene Body Coverage Distribution