
Single-cell RNA-Seq Filtering

Introduction

In addition to filtering low-quality reads, as in bulk RNA-seq analysis, it is important to filter out low-quality cells in single-cell RNA-Seq (scRNA-Seq) datasets. This can be done by filtering cells on a few quality metrics derived from their expression profiles. This approach is based on the following assumptions:

  • Cells with a comparatively high total number of counts could correspond to doublets or multiplets, that is, two or more cells that were captured and sequenced together as if they were a single cell.
  • Cells with a comparatively low number of counts may result from sample preparation or sequencing artifacts, for example broken cells that have lost part of their RNA content, cells with low mRNA capture efficiency, or cells with poor PCR amplification during library preparation. In any case, the counts recorded for these cells do not reflect their original mRNA content.
  • Cells with a high percentage of total counts coming from mitochondrial RNA (mtRNA) could correspond to dying cells in which apoptosis has begun. This feature can also indicate broken cells: when the cell membrane is damaged, cytoplasmic mRNA leaks out while the mtRNA is retained inside the mitochondrial membrane, so the mtRNA percentage is higher than in intact cells.
  • Cells with a low number of detected features (genes) may have damaged mRNA. Reads obtained from damaged mRNA may not match the reference used during quantification, which leads to a low mapping rate and, ultimately, to few detected genes.
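These assumptions translate into three per-cell statistics: total counts, number of detected features, and percentage of counts from mitochondrial genes. The following is a minimal Python sketch of how they can be computed from a count table, assuming a features-by-cells matrix stored as nested lists and a set of mitochondrial gene names. The function `qc_metrics`, the gene names (human-style `MT-` prefixes) and the values are hypothetical and do not describe how the application computes these metrics internally.

```python
from typing import Dict, List, Sequence, Set


def qc_metrics(
    counts: Sequence[Sequence[int]],  # rows = features (genes), columns = cells
    features: Sequence[str],          # feature name for each row
    mito_genes: Set[str],             # names of mitochondrial genes (assumed input)
) -> List[Dict[str, float]]:
    """Return total counts, detected features and % mitochondrial counts per cell."""
    n_cells = len(counts[0]) if counts else 0
    metrics = []
    for cell in range(n_cells):
        column = [row[cell] for row in counts]
        total = sum(column)
        # A feature counts as "detected" in a cell if it has at least one count.
        detected = sum(1 for value in column if value > 0)
        mito = sum(value for name, value in zip(features, column) if name in mito_genes)
        pct_mito = 100.0 * mito / total if total > 0 else 0.0
        metrics.append({"total_counts": total, "n_features": detected, "pct_mito": pct_mito})
    return metrics


# Hypothetical toy table: 4 genes x 3 cells
features = ["ACTB", "GAPDH", "MT-CO1", "MT-ND1"]
counts = [
    [10, 0, 3],
    [5, 2, 0],
    [5, 8, 1],
    [0, 10, 0],
]
for cell, m in enumerate(qc_metrics(counts, features, {"MT-CO1", "MT-ND1"})):
    print(cell, m)
```

In this toy table the second cell has 90% of its counts in mitochondrial genes and would be a candidate for removal under the mitochondrial criterion, while the third cell stands out for its low total counts.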

Violin plots are widely used by single-cell data analysts to identify cells with these characteristics (Figure 1). In this plot, each dot represents a cell; its position on the Y-axis is the value of the measured statistic, and its position on the X-axis is random (jitter) so that the dots do not overlap. The shape of the violin represents the density of cells at each value, giving a general idea of the distribution. Additionally, a box plot is drawn on top of the violin: the line in the middle of the box shows the median, and the bottom and top edges of the box show the first (Q1) and third (Q3) quartiles, respectively. Dots beyond the whiskers are considered outliers. Thus, to identify cells with a high number of counts in Figure 1, for example, we could set a threshold in the area where the violin narrows down or, to be more restrictive, at the end of the upper whisker of the box plot.


Figure 1. Violin plot showing the distribution of the total number of counts per cell in a count table. Red lines show suggested thresholds for cell filtering.
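The box-plot rule described for Figure 1 can also be computed numerically instead of read off the plot. The following minimal Python sketch assumes the common Tukey convention, in which the fences lie 1.5 × IQR beyond the box and the whiskers reach the most extreme data points inside those fences; plotting tools may use other conventions. The function names and the count values are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def tukey_fences(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" uses linear interpolation between data points,
    # as numpy's default percentile does.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def whisker_ends(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Whiskers end at the most extreme observations still inside the fences."""
    low, high = tukey_fences(values, k)
    inside = [v for v in values if low <= v <= high]
    return min(inside), max(inside)


def outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Observations beyond the fences (drawn as isolated dots in a box plot)."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]


# Hypothetical total counts per cell
total_counts = [1200, 1500, 1800, 2000, 2100, 2300, 2500, 9000]
print(tukey_fences(total_counts))  # (787.5, 3287.5)
print(whisker_ends(total_counts))  # (1200, 2500)
print(outliers(total_counts))      # [9000]
```

With these values, the cell with 9,000 counts lies above the upper fence and would be flagged as a possible doublet; a Maximum Counts threshold placed at the upper whisker end (2,500) removes it while keeping the remaining cells.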

Run Single Cell RNA-Seq Filtering

Input

To perform Single Cell Filtering, a Count Matrix object must be open. It can be loaded from different formats via Transcriptomics → Load → Single Cell RNA-Seq Count Matrix, or generated from FASTQ sequencing files with the Single Cell RNA-Seq Quantification tool, available under Transcriptomics → Single Cell RNA-Seq → Single Cell RNA-Seq Quantification.

Once the count matrix is loaded, go to Side Panel → Actions → Filtering (Figure 2).


Figure 2. Launch filtering from a scRNA-Seq Count Matrix Side Panel.

Configuration

The following filters can be applied in the wizard (Figure 3). Default values are taken from the count table.

  • Minimum Cells. Keep only features (genes, exons, etc.) detected in at least this number of cells. This filter excludes features that carry little information; it removes rows from the count table.
  • Minimum Counts. Discard cells with fewer than this number of reads/counts.
  • Maximum Counts. Discard cells with more than this number of reads/counts.
  • Minimum Features. Discard cells with fewer than this number of detected features.
  • Maximum Features. Discard cells with more than this number of detected features.
  • Maximum % Mitochondrial Genes. Discard cells in which more than this percentage of the total counts comes from mitochondrial genes.
  • Mitochondrial Genes File. File with a list of mitochondrial genes, one per line. The gene IDs or names in this file must match the ones used in the count table.
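To make the effect of each parameter concrete, the following Python sketch applies the same kinds of thresholds to a small in-memory count table. It is an illustration under stated assumptions, not the application's implementation: the functions `read_gene_list` and `filter_count_table` are hypothetical, the per-cell metrics are computed on the unfiltered table, the feature filter (Minimum Cells) is applied afterwards to the retained cells, and minimum and maximum thresholds are treated as inclusive, following the "fewer than"/"more than" wording of the options.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def read_gene_list(lines: Iterable[str]) -> Set[str]:
    """Parse a mitochondrial genes file: one gene ID or name per line, blanks ignored."""
    return {line.strip() for line in lines if line.strip()}


def filter_count_table(
    counts: Sequence[Sequence[int]],  # rows = features, columns = cells
    features: Sequence[str],
    cells: Sequence[str],
    mito_genes: Set[str],
    min_cells: int,
    min_counts: int,
    max_counts: int,
    min_features: int,
    max_features: int,
    max_pct_mito: float,
) -> Tuple[List[List[int]], List[str], List[str]]:
    # 1) Cell filters, computed on the unfiltered table (assumed order).
    keep_cells = []
    for j in range(len(cells)):
        column = [row[j] for row in counts]
        total = sum(column)
        detected = sum(1 for v in column if v > 0)
        mito = sum(v for name, v in zip(features, column) if name in mito_genes)
        pct_mito = 100.0 * mito / total if total else 0.0
        if (min_counts <= total <= max_counts
                and min_features <= detected <= max_features
                and pct_mito <= max_pct_mito):
            keep_cells.append(j)
    # 2) Feature filter: keep rows detected (count > 0) in at least min_cells retained cells.
    new_counts, new_features = [], []
    for name, row in zip(features, counts):
        sub = [row[j] for j in keep_cells]
        if sum(1 for v in sub if v > 0) >= min_cells:
            new_counts.append(sub)
            new_features.append(name)
    return new_counts, new_features, [cells[j] for j in keep_cells]


# Hypothetical file content; in practice this could be open("mito_genes.txt")
mito = read_gene_list(["MT-CO1\n", "MT-ND1\n", "\n"])
features = ["ACTB", "GAPDH", "RARE1", "MT-CO1", "MT-ND1"]
cells = ["c1", "c2", "c3", "c4", "c5"]
counts = [
    [10, 0, 3, 40, 60],  # ACTB
    [5, 2, 0, 30, 50],   # GAPDH
    [0, 0, 1, 0, 0],     # RARE1
    [5, 8, 1, 20, 5],    # MT-CO1
    [0, 10, 0, 10, 5],   # MT-ND1
]
# Listed genes missing from the table point to an ID/name mismatch.
print(sorted(mito - set(features)))  # []
new_counts, new_features, new_cells = filter_count_table(
    counts, features, cells, mito,
    min_cells=2, min_counts=10, max_counts=110,
    min_features=2, max_features=5, max_pct_mito=50.0,
)
print(new_cells)     # ['c1', 'c4']
print(new_features)  # ['ACTB', 'GAPDH', 'MT-CO1']
```

Here c2 is removed for its mitochondrial percentage (90%), c3 for too few counts (5) and c5 for too many (120); RARE1 and MT-ND1 are then dropped because they are detected in fewer than two of the remaining cells.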

Output

The output is a new scRNA-Seq Count Matrix object, like the one shown in Figure 2, from which the cells (and features) that did not pass the filters have been removed.


Figure 3. scRNA-Seq Filtering wizard.