Single Cell RNA-Seq Trajectory Inference

Introduction

Monocle3 is an scRNA-Seq data analysis toolkit developed by the Trapnell lab, used mainly for trajectory inference. Trajectory inference aims to reconstruct the developmental trajectory of single cells, mapping out their developmental paths or states. Cells are ordered by "pseudotime", which measures how far each cell has progressed along a biological process. The essential input for trajectory inference is knowledge of the starting points, or root cells. Please cite Monocle3 as:

Qiu, Xiaojie, et al. "Reversed Graph Embedding Resolves Complex Single-Cell Trajectories." Nature Methods, vol. 14, no. 10, 21 Aug. 2017, pp. 979–982, 10.1038/nmeth.4402.

Qiu, Xiaojie, et al. "Single-Cell mRNA Quantification and Differential Analysis with Census." Nature Methods, vol. 14, no. 3, 23 Jan. 2017, pp. 309–315, 10.1038/nmeth.4150.

Trapnell, Cole, et al. "Pseudo-Temporal Ordering of Individual Cells Reveals Dynamics and Regulators of Cell Fate Decisions." Nature Biotechnology, vol. 32, no. 4, 1 Apr. 2014, pp. 381–386, 10.1038/nbt.2859.

**Figure 1.** Monocle3 Wizard in OmicsBox

Accessing Monocle3 in OmicsBox

Trajectory analysis requires knowing the starting, or root, cells. For this reason, the Monocle3 Trajectory Inference wizard becomes available right after the clustering analysis. After completing the scRNA-Seq Clustering, click "Trajectory Analysis" on the side panel to start Monocle3 (refer to Figure 2).

**Figure 2**. Trajectory Inference with Monocle3, available as the side panel option of scRNA-Seq Clustering output (After Seurat Clustering Analysis)

Select Starting Points (Root Cells) of the Trajectory

Select the root node (a collection of root cells) for Monocle3, as it serves as the reference point for trajectory construction. OmicsBox offers two methods to provide this information:

Cell Metadata: The Monocle3 wizard automatically reads the cell metadata information from the clustering results.
Metadata Group: The options available will display all potential columns with root information based on the information in the cell metadata.
Root Cells: After selecting the column with potential starting point information, decide on the actual starting point. If you're considering experimental time, the starting point might be the initial capture time (0h) or something similar. Alternatively, it could be a specific cell type, like a hematopoietic stem cell or Seurat’s cluster-x.
List of Root Cells: Supply a text file with the list of cell IDs as root cells (with one name per line). These cells act as the starting points for the trajectory.
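
The two ways of supplying root cells can be pictured with a minimal Python sketch. It is not OmicsBox code: the function names, the metadata layout (cell ID mapped to column values), and the example cell IDs are all hypothetical, chosen only to show that both routes end in the same thing, a list of cell IDs.

```python
import io

def read_root_cell_ids(handle):
    """Read one cell ID per line, skipping blank lines and repeated IDs."""
    seen = set()
    ids = []
    for line in handle:
        cell_id = line.strip()
        if cell_id and cell_id not in seen:
            seen.add(cell_id)
            ids.append(cell_id)
    return ids

def roots_from_metadata(metadata, column, value):
    """Return IDs of cells whose metadata column equals the chosen value."""
    return [cell_id for cell_id, row in metadata.items() if row.get(column) == value]

# Hypothetical cell metadata: cell ID -> {column name: value}
metadata = {
    "AAAC-1": {"time": "0h", "seurat_clusters": "3"},
    "AAAG-1": {"time": "24h", "seurat_clusters": "1"},
    "AACT-1": {"time": "0h", "seurat_clusters": "3"},
}

# Route 1: metadata group "time", root value "0h"
roots_by_metadata = roots_from_metadata(metadata, "time", "0h")

# Route 2: a text file with one cell ID per line (simulated here in memory)
roots_by_file = read_root_cell_ids(io.StringIO("AAAC-1\n\nAACT-1\nAAAC-1\n"))

print(roots_by_metadata)
print(roots_by_file)
```

When preparing a root-cell file, the IDs need to match the cell names used in the clustering results exactly; stray whitespace and blank lines are the usual source of mismatches.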

**Figure 3.** Selection of starting points for the trajectory analysis

Configuration 1. Data Pre-processing

Transformations: Monocle3 can perform transformations independently or use the transformation applied during Seurat’s clustering analysis. When "Raw counts" is selected, the following options are available to fine-tune normalization and feature selection.
Normalization Method: Normalization aims to minimize non-biological variation. Two options are available: log-normalization and size-factor normalization. Log-normalization log-transforms the counts, which compresses the range of highly expressed, high-variance genes. Size-factor normalization removes per-cell bias, such as differences in sequencing depth, by dividing each cell's counts by its size factor. Normalization can also be skipped by selecting "none".
Principal component analysis (PCA): This classic dimension reduction method creates linear combinations of gene expression values, termed principal components (PCs). The PCs are orthogonal to each other and capture most of the gene expression variation in far fewer dimensions than the original gene space.
1. Dimensions: This refers to the number of dimensions post-PCA. Selecting the top 50 principal components for datasets exceeding 5,000 cells is advisable.
2. Scaling: Scaled data facilitates model learning. Scaling before PCA computation is beneficial when dealing with variables in different units.
3. Embeddings: Monocle3 can recompute the UMAP embeddings from scratch or use Seurat’s UMAP/t-SNE embeddings for trajectory analysis. When "Re-Compute UMAP" is selected, the following options are enabled to compute the UMAP embeddings.
UMAP Minimum Distance: This parameter dictates UMAP's cell clustering tightness. Low values result in dense cell clusters, while higher values emphasize preserving broad topological structure.
UMAP Neighbours: This balances local versus global structures. Lower values direct UMAP to concentrate on local structures, while higher values emphasize a broader view, potentially sacrificing fine details.
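
To make the Normalization Method options above more concrete, here is a small Python sketch. It rests on assumptions that are not stated in this manual: the size factor is taken as each cell's total count divided by the geometric mean of all cells' totals (one common definition), and log-normalization is taken as log2 of the size-factor-corrected counts plus a pseudo-count of 1. The function names are illustrative only.

```python
import math

def size_factors(counts):
    """counts: one list of gene counts per cell. Returns one size factor per cell."""
    totals = [sum(cell) for cell in counts]
    if any(t <= 0 for t in totals):
        raise ValueError("every cell needs at least one count")
    # Assumed definition: cell total divided by the geometric mean of all totals
    geo_mean = math.exp(sum(math.log(t) for t in totals) / len(totals))
    return [t / geo_mean for t in totals]

def size_factor_normalize(counts):
    """Divide each cell's counts by that cell's size factor."""
    factors = size_factors(counts)
    return [[x / f for x in cell] for cell, f in zip(counts, factors)]

def log_normalize(counts, pseudocount=1.0):
    """Assumed variant: log2 of size-factor-corrected counts plus a pseudo-count."""
    return [[math.log2(x + pseudocount) for x in cell]
            for cell in size_factor_normalize(counts)]

# Three cells, three genes; cells 1 and 2 differ only in sequencing depth
counts = [[10, 0, 30], [20, 0, 60], [5, 5, 10]]
print(size_factors(counts))
print(size_factor_normalize(counts))
print(log_normalize(counts))
```

In this toy input the first two cells have the same expression profile at different depths, so after size-factor normalization their rows become (numerically) identical, which is the bias the option is meant to remove.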

**Figure 4.** Parameter tuning for Monocle3-based trajectory analysis.

Configuration 2. Clustering and Trajectory Control

Clustering:

Clustering cells during the trajectory analysis significantly reduces the computational complexity of learning the trajectory graph. In a trajectory, clusters represent the stable checkpoints (cell states) of a biological process. Monocle3 in OmicsBox can perform its own clustering or use the clusters inferred during Seurat’s clustering analysis.

Cluster Method: Select the algorithm for clustering. If "Seurat-Clusters" is selected, the cluster labels from the Seurat clustering results are transferred to Monocle3. The other options are "Louvain Clustering" and "Leiden Clustering", for which the following parameters are available:
Nearest Neighbours: Set the number of nearest neighbours (k) used to build the k-nearest-neighbour graph on which the clustering is performed.
Resolution: Set the clustering resolution. A higher value generates a larger number of smaller clusters, and a lower value generates a smaller number of larger clusters.
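
Louvain and Leiden are community-detection methods that operate on a k-nearest-neighbour graph of the cells, so the Nearest Neighbours value controls how densely that graph is connected. The following Python sketch builds such a graph by brute force on a toy 2-D embedding. It is an illustration under simple assumptions (Euclidean distance, hypothetical coordinates and function name), not the implementation used by Monocle3 or OmicsBox.

```python
import math

def knn_graph(points, k):
    """Return {index: [indices of the k nearest other points]} by Euclidean distance."""
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    graph = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort by distance; ties are broken by index to keep the result deterministic
        others.sort(key=lambda j: (math.dist(p, points[j]), j))
        graph[i] = others[:k]
    return graph

# Hypothetical embedding: two well-separated groups of three cells each
embedding = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2),
             (5.0, 5.0), (5.1, 5.0), (5.0, 5.3)]

graph = knn_graph(embedding, k=2)
for cell, neighbours in graph.items():
    print(cell, neighbours)
```

With k=2 every cell links only to members of its own group. Raising k to 3 or more forces links across the two groups, which is why larger values tend to merge fine structure into broader clusters.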

Fine Tune Trajectory:

Allow Disjoint Graph: When activated, different partitions can be merged into a single trajectory. Otherwise, distant partitions are assigned "Infinite" pseudotime.
Allow Loops: Whether to look for potential cyclic trajectories within the data.
Number of Centers: Set the expected number of centres in the trajectory.
Prune Branches: Whether to remove branches that do not meet the specific length criteria.
Minimum Branching Length: Set the minimum length of the branch (number of centres in a branch).

**Figure 4.** Parameter tuning for Monocle3-based trajectory analysis and clustering.

Side Panel Actions

The side panel actions become available once the analysis has completed and the results are open. The currently available options are:

UMAP/t-SNE: This will open up an interactive wizard.
Add Cell Metadata: Add information per cell.
Summary Report: Produces a summary of the analysis.
Extract Count: Retrieves the count of different Pseudotime Ranges.
Autocorrelation: Analyze differences in the gene expression along the trajectory.
Differential Expression: Analyze the differences in gene expression.

**Figure 5.** Side Panel Action after the Monocle3 trajectory analysis in OmicsBox

Actions: UMAP/t-SNE

Opens UMAP/t-SNE wizard for intuitive analysis of the results.

**Figure 6.** Interactive UMAP/t-SNE Plots for intuitive interpretation of the results.

Actions: Summary Report

Generates a summary report of the analysis.

**Figure 7.** Detailed summary of the results and the parameters used during the analysis.

Actions: Extract Cluster Counts

Group: Select the group for which you want to extract the counts. It includes all the columns of the cell-level metadata describing different attributes of the cells.
Subgroup(s) to Extract: Select the subgroups (e.g., a specific cell type) from which to extract the counts.

**Figure 8.** OmicsBox wizard for the extraction of raw counts from Monocle3 results.

Actions: Autocorrelation Analysis

Monocle3 provides a way of finding genes that vary between groups of cells in UMAP space. It uses a statistic from spatial autocorrelation analysis called Moran's I, which Cao & Spielmann et al. showed to be effective in finding genes that vary in single-cell RNA-seq datasets. Visit Monocle3 Autocorrelation Analysis using OmicsBox to learn more.
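
For readers unfamiliar with Moran's I, the sketch below computes the textbook form of the statistic, I = (N / W) · Σ w_ij (x_i − x̄)(x_j − x̄) / Σ (x_i − x̄)², where w_ij is the weight between neighbouring cells and W is the sum of all weights. It is a plain Python illustration with a hypothetical four-cell neighbour chain, and it does not reproduce how Monocle3 builds its neighbour graph or computes significance.

```python
def morans_i(values, weights):
    """values: expression per cell; weights: {(i, j): w_ij} for neighbouring cells."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    total_w = sum(weights.values())
    if denom == 0 or total_w == 0:
        raise ValueError("need non-constant values and at least one non-zero weight")
    num = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    return (n / total_w) * (num / denom)

# Hypothetical neighbour graph: four cells in a chain 0 - 1 - 2 - 3 (symmetric weights)
chain = {(0, 1): 1, (1, 0): 1, (1, 2): 1, (2, 1): 1, (2, 3): 1, (3, 2): 1}

clustered = morans_i([1, 1, 5, 5], chain)    # neighbouring cells have similar expression
alternating = morans_i([1, 5, 1, 5], chain)  # neighbouring cells have opposite expression

print(clustered, alternating)
```

Positive values indicate that nearby cells have similar expression of the gene, values near zero indicate no spatial pattern, and negative values indicate that neighbours tend to differ.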

**Figure 9.** OmicsBox wizard for Monocle3 Autocorrelation Analysis.

Actions: Differential Expression

For differential expression analysis, refer to the single-cell differential expression tutorial. In this context, the pseudotime range labels are used instead of cluster labels.

**Figure 10.** OmicsBox wizard for Monocle3 Differential Expression Analysis.

Side Panel Charts

Expression Trends: Charts the trends in gene expression.
Distribution of Cell in Pseudotime: Displays how cells are distributed across different pseudotime values.

**Figure 11.** Options for data visualization after trajectory analysis in OmicsBox

Expression Trend (Side-Panel Charts)

Gene ID/Name: Choose the feature or gene for which you want to plot the trend.
Scaling: Large variations in counts can sometimes obscure finer details. Scaling adjusts the data range to highlight these subtleties.
Log Transform: When a few very large values dominate the range, a log transformation compresses them so that variation among the smaller values becomes visible. A pseudo-count of 0.5 is added to the raw counts before the log transformation.
Smoothness: Modulate the trend line's smoothness to fit your preference.
Colour Cells By: Colour the cells based on the clusters or partitions inferred by Monocle3.
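
The effect of the Log Transform option can be checked with a few lines of Python. The 0.5 pseudo-count comes from the description above; the base of the logarithm is not stated there, so the natural logarithm used here is an assumption, as is the function name.

```python
import math

def log_transform(raw_counts, pseudocount=0.5):
    """Add the pseudo-count, then take the natural log (the base is an assumption)."""
    return [math.log(x + pseudocount) for x in raw_counts]

raw = [0, 1, 10, 1000]
transformed = log_transform(raw)

for x, y in zip(raw, transformed):
    print(x, round(y, 3))
```

The pseudo-count keeps zero counts defined (the log of 0 is not), and the transformed values span a much narrower range than the raw counts, so a single highly expressed cell no longer flattens the rest of the trend.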

**Figure 12.** Wizard for plotting gene/feature expression trends along pseudotime using monotonic spline in OmicsBox

Output

Monocle3 in OmicsBox delivers the outputs expected of a conventional trajectory analysis: a primary output table, four plots, and a succinct report detailing the parameters used and the results obtained.

Main Output Table with Pseudotime Information: This table provides detailed pseudotime data for the analyzed cells.
Trajectory UMAP: A visualization showing the trajectory of cells using the UMAP (Uniform Manifold Approximation and Projection) technique.
Expression UMAP: A plot that illustrates how the expression of a particular gene varies across cells in UMAP embeddings.
Expression Trends: Displays the trends in gene expression across pseudotime.
Distribution of Cells Over Pseudotime Ranges: This visualization depicts how cells are spread across various pseudotime values.

Output Table Fields:

Cell: The names of the cells, as provided in the count table and experimental design file.
Pseudotime: The pseudotime assigned to each cell by Monocle3. Cells not allocated a pseudotime will not have a value in this column.
Pseudotime Range: The pseudotime interval into which the cell falls. For cells without an assigned pseudotime, the field states so explicitly. The intervals are left-closed and right-open, i.e. of the form [a, b).
Cluster: The clusters to which the cells have been assigned.
Partition: This refers to the assigned super-cluster or partition.
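
The left-closed, right-open convention of the Pseudotime Range field means that a value sitting exactly on a boundary belongs to the interval that starts there. The Python sketch below shows this with hypothetical interval edges and hypothetical labels ("unassigned", "out of range"); the actual edges and wording used by OmicsBox may differ.

```python
import math
from bisect import bisect_right

def pseudotime_range(value, edges):
    """Label a pseudotime with its [lo, hi) interval; edges must be sorted ascending."""
    # Cells with no pseudotime (missing, infinite or NaN) get an explicit label
    if value is None or math.isinf(value) or math.isnan(value):
        return "unassigned"
    idx = bisect_right(edges, value) - 1
    if idx < 0 or idx >= len(edges) - 1:
        return "out of range"
    return f"[{edges[idx]}, {edges[idx + 1]})"

edges = [0, 5, 10, 15]  # hypothetical interval boundaries

for pt in [0, 4.99, 5, 12.3, float("inf"), None]:
    print(pt, "->", pseudotime_range(pt, edges))
```

Because the right end is open, the last edge has to lie above the largest pseudotime in the data for every assigned cell to receive a range.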

Results (Trajectory UMAP)

The Trajectory UMAP combines a UMAP coloured by the continuum of pseudotime with a superimposed line graph that represents the overall progression pattern among the cells. Using the pseudotime slider, users can focus on cells within a particular pseudotime range. Cells that have not been assigned a pseudotime are not coloured, so in those regions only the progression line is shown. The plot also supports interactive selection of cells, allowing users to pick a starting cell and run the trajectory analysis interactively.

**Figure 11.** Interactive Trajectory UMAP of Monocle3 in OmicsBox

Results (Expression Trend) (Side-Panel Charts)

The expression trend plots the expression of a chosen feature or gene in each cell against the cell's pseudotime, fitted with a monotonic spline interpolation. This visualization offers insight into how the expression of a specific gene changes along pseudotime in different cell clusters or partitions.

**Figure 12.** Expression trend of the selected feature in OmicsBox

Results (Distribution of Cells Across Pseudotime Range) (Side-Panel Charts)

This visualization displays the distribution of cells based on their pseudotime range, showcasing the number of cells within each specific range. By correlating this distribution with cell type annotations, one can identify progenitor cell types or intermediate cell states, which often possess lower pseudotime values. The intervals for these ranges are defined as left-closed (right-open).
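
Relating the distribution to cell type annotations amounts to a cross-tabulation of pseudotime range against cell type. A minimal Python sketch follows; the ranges, cell types, and counts are hypothetical and only illustrate the bookkeeping, which can be reproduced from the exported output table.

```python
from collections import Counter, defaultdict

def distribution_by_range(assignments):
    """assignments: iterable of (pseudotime_range, cell_type) pairs, one per cell."""
    per_range = defaultdict(Counter)
    for pt_range, cell_type in assignments:
        per_range[pt_range][cell_type] += 1
    return per_range

# Hypothetical per-cell annotations
assignments = [
    ("[0, 5)", "HSC"), ("[0, 5)", "HSC"), ("[0, 5)", "MPP"),
    ("[5, 10)", "MPP"), ("[5, 10)", "Monocyte"),
    ("[10, 15)", "Monocyte"), ("[10, 15)", "Monocyte"),
]

table = distribution_by_range(assignments)
for pt_range, counts in table.items():
    # Total cells in the range, followed by the per-type breakdown
    print(pt_range, sum(counts.values()), dict(counts))
```

In this toy table the earliest range is dominated by the stem-cell label, which is the pattern one would look for when checking that the chosen root cells make biological sense.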