Time Course Expression Analysis
Introduction
This tool performs time-course expression analysis of count data from RNA-seq experiments. Based on the maSigPro R package, it detects genomic features (e.g. genes) with significant temporal expression changes and significant differences between experimental groups. maSigPro, which belongs to the Bioconductor project, implements a two-step regression strategy to find genes with significant expression profile differences in time-course RNA-seq experiments.
Expression Data
The Time Course Expression Analysis application expects gene expression levels in a count table. In OmicsBox, count tables can be generated with the Create Count Table application.
Count tables can also be imported from a text file. Go to transcriptomics → Load → Load RNA-Seq Count Table (Figure 2) and select your .txt file containing the count table.
Run Analysis
Go to transcriptomics → Differential Expression Analysis. If no count table project is open, the first wizard page (Figure 3) asks you to upload either a Count Table Project (.box file) or a Count Table File (.txt, .csv, or .tsv file). On the second wizard page, choose the "Time Course Expression Analysis" option (Figure 4).
If a count table is already loaded in OmicsBox (see the section above), it will be used for the analysis. In this case, the analysis can be run either by clicking "Diff. Expression Analysis" in the Side Panel or by going to transcriptomics → Differential Expression Analysis. The first wizard page then asks you to select the type of differential expression analysis (Figure 4).
On the following pages, you can specify the analysis parameters, which are divided into three sections: Preprocessing Data (Figure 5), Experimental Design (Figure 6), and Analysis Options (Figure 7).
Preprocessing Data Page
- Filter low count genes:
  - CPM Filter: Establish a filter to exclude genes with low counts across libraries, as those genes may interfere with the subsequent statistical approximations. Filtering is performed on a count-per-million (CPM) basis to account for differences in library size between samples (e.g. a CPM of 1 corresponds to a count of 6 in a sample with 6 million reads).
  - Samples reaching CPM Filter: Set the minimum number of samples in which a gene's CPM must be above the filter level (i.e. the gene is considered expressed). For example, if this value is set to 5, the gene must be above the given CPM in at least 5 samples. A common choice is the number of samples in the smallest group (e.g. in an experiment with two replicates per condition or group, a gene should be expressed in at least two samples). Set the value to 0 if no filter is desired.
- Normalization procedure:
  - Normalization Method: Normalization is an important step to make the samples comparable and to remove possible biases (such as sequencing depth bias) in count data. You can select the normalization method to be used:
    - TMM: Weighted trimmed mean of M-values. In this method, weights are obtained from the delta method on binomial data (this method is recommended).
    - RPKM: Reads Per Kilobase per Million mapped reads. This method corrects for gene length and the number of sequencing reads (gene length is required).
    - Upper-quartile: The upper quartile (75th percentile) of the counts in each library is used to calculate the scale factors for normalization.
    - None: No normalization method is applied.
  - Feature Length File: For RPKM normalization, load a tab-delimited file (or ID-Value object) with two columns containing the name and length of each gene or genomic feature.
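The low-count filter and the RPKM option above can be illustrated with a minimal Python sketch. It is not the OmicsBox implementation: the function names are hypothetical, library sizes are assumed to be the column totals of the count table, "above the filter level" is read as strictly greater, and feature lengths are assumed to be in base pairs.

```python
import csv


def cpm(counts, lib_sizes):
    """Convert one gene's raw counts to counts per million (CPM)."""
    return [c / n * 1e6 for c, n in zip(counts, lib_sizes)]


def cpm_filter(count_table, cpm_threshold, min_samples):
    """Keep genes whose CPM is above cpm_threshold in at least min_samples samples.

    count_table maps gene name -> list of raw counts (one per sample).
    Library sizes are taken as the column totals of the table (an assumption).
    """
    if min_samples == 0 or not count_table:  # 0 disables the filter
        return dict(count_table)
    n_samples = len(next(iter(count_table.values())))
    lib_sizes = [sum(row[j] for row in count_table.values()) for j in range(n_samples)]
    kept = {}
    for gene, counts in count_table.items():
        n_expressed = sum(1 for v in cpm(counts, lib_sizes) if v > cpm_threshold)
        if n_expressed >= min_samples:
            kept[gene] = counts
    return kept


def read_feature_lengths(lines):
    """Parse tab-delimited 'name<TAB>length' lines into {name: length in bp}."""
    lengths = {}
    for row in csv.reader(lines, delimiter="\t"):
        try:
            lengths[row[0].strip()] = float(row[1])
        except (IndexError, ValueError):
            continue  # skip blank lines, a header row or malformed lines
    return lengths


def rpkm(counts, length_bp, total_reads):
    """RPKM = count / (feature length in kb) / (mapped reads in millions)."""
    return [c / (length_bp / 1e3) / (n / 1e6) for c, n in zip(counts, total_reads)]
```

For instance, `cpm([6], [6_000_000])` gives a CPM of about 1, matching the example in the CPM Filter description, and a gene of 2,000 bp with 500 reads in a library of 10 million mapped reads has an RPKM of 25.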
Experimental Design Page
- Experimental design file: Select your .txt file containing your experiment descriptors associated with each sample in tab-delimited format. As shown below, rows correspond to samples and columns to experimental descriptors. A column must contain the associated time points for each sample, and another column should show the assignment of samples to experimental groups. Make sure that the names in the first column of the experimental design table are exactly the same as the sample names in the count table header. If your experimental design file has fewer samples than the count table, only the samples contained in this file will be analyzed.
Sample Time Group
B12_A6_06hpi_1 6 A6
B12_A6_06hpi_2 6 A6
B12_A6_06hpi_3 6 A6
B12_A6_12hpi_1 12 A6
B12_A6_12hpi_2 12 A6
B12_A6_12hpi_3 12 A6
B12_A6_18hpi_1 18 A6
B12_A6_18hpi_2 18 A6
B12_A6_18hpi_3 18 A6
B12_A6_24hpi_1 24 A6
B12_A6_24hpi_2 24 A6
B12_A6_24hpi_3 24 A6
B12_K1_06hpi_1 6 K1
B12_K1_06hpi_2 6 K1
B12_K1_06hpi_3 6 K1
B12_K1_12hpi_1 12 K1
B12_K1_12hpi_2 12 K1
B12_K1_12hpi_3 12 K1
B12_K1_18hpi_1 18 K1
B12_K1_18hpi_2 18 K1
B12_K1_18hpi_3 18 K1
B12_K1_24hpi_1 24 K1
B12_K1_24hpi_2 24 K1
B12_K1_24hpi_3 24 K1
pps_A6_06hpi_1 6 A6
pps_A6_06hpi_2 6 A6
pps_A6_06hpi_3 6 A6
pps_A6_12hpi_1 12 A6
pps_A6_12hpi_2 12 A6
pps_A6_12hpi_3 12 A6
pps_A6_18hpi_1 18 A6
pps_A6_18hpi_2 18 A6
pps_A6_18hpi_3 18 A6
pps_A6_24hpi_1 24 A6
pps_A6_24hpi_2 24 A6
pps_A6_24hpi_3 24 A6
pps_K1_06hpi_1 6 K1
pps_K1_06hpi_2 6 K1
pps_K1_06hpi_3 6 K1
pps_K1_12hpi_1 12 K1
pps_K1_12hpi_2 12 K1
pps_K1_12hpi_3 12 K1
pps_K1_18hpi_1 18 K1
pps_K1_18hpi_2 18 K1
pps_K1_18hpi_3 18 K1
pps_K1_24hpi_1 24 K1
pps_K1_24hpi_2 24 K1
pps_K1_24hpi_3 24 K1
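Because sample names in the design file must match the count table header exactly, a quick consistency check before running the analysis can save a failed run. The following Python sketch is hypothetical and not part of OmicsBox: it assumes a header row followed by Sample, Time and Group columns as in the example above, and it raises an error on unmatched names, which is our own choice. Count-table samples absent from the design are simply dropped, mirroring the behavior described for design files with fewer samples than the count table.

```python
import csv


def read_design(lines):
    """Parse a tab-delimited design table into {sample: (time, group)}.

    Assumes a header row followed by rows whose first three columns are
    sample name, time point and group, as in the example above.
    """
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip the header row
    design = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        design[row[0].strip()] = (float(row[1]), row[2].strip())
    return design


def samples_to_analyze(design, count_header):
    """Return the count-table samples listed in the design, in count-table order.

    Raises ValueError when a design sample name has no exact match in the
    count table header (this strict check is our own choice).
    """
    header = set(count_header)
    missing = [s for s in design if s not in header]
    if missing:
        raise ValueError(f"design samples not found in count table: {missing}")
    return [s for s in count_header if s in design]
```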
Analysis Options
- Design Type: Choose the design type that matches your experiment.
  - Single Series Time Course: Detects genes that show significant expression changes over time. You only have to select the time factor of your experimental design in "Targets".
  - Multiple Series Time Course: Finds genes with significant temporal expression changes and significant differences between experimental groups. You have to set the time and experimental factors and select the control condition of your experimental design in "Targets".
- Statistical Settings:
  - Significance Level (Alpha): The level of FDR control used for variable selection in the stepwise regression.
  - R-squared Cutoff: Minimum R-squared of a gene's regression model for the gene to be considered significant.
- Visualization of Results:
  - Number of Clusters: Set the number of clusters used to group genes by similar expression profiles.
  - Clustering Method: Choose a clustering method for data partitioning.
    - Hierarchical Clustering: Performs a hierarchical cluster analysis using a set of dissimilarities for the features being clustered.
    - K-Means Clustering: Divides the features into K clusters such that the sum of squared distances from each point to the center of its assigned cluster is minimized.
    - Model-Based Clustering: Fits Gaussian mixture models with the EM algorithm, initialized by hierarchical clustering, and selects the optimal model according to the Bayesian Information Criterion (BIC). This method computes an optimal number of clusters. Keep in mind that it takes longer to run.
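To show what the K-Means objective means for expression profiles, here is a minimal pure-Python sketch of Lloyd's algorithm. It is an illustration only, not the clustering code used by OmicsBox or maSigPro; the function names are hypothetical, and the first k profiles are used as initial centers so that the result is deterministic, whereas practical implementations usually use random restarts.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, max_iter=100):
    """Assign each gene to one of k clusters with Lloyd's algorithm.

    profiles maps gene -> list of expression values (same length for all genes).
    The first k profiles are the initial centers (a simplification).
    Returns a dict gene -> cluster index (0..k-1).
    """
    genes = list(profiles)
    if not 1 <= k <= len(genes):
        raise ValueError("k must be between 1 and the number of genes")
    centers = [list(profiles[g]) for g in genes[:k]]
    labels = {}
    for _ in range(max_iter):
        # Assignment step: every gene goes to its nearest center.
        new_labels = {
            g: min(range(k), key=lambda c: squared_distance(profiles[g], centers[c]))
            for g in genes
        }
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each center becomes the mean profile of its members.
        for c in range(k):
            members = [profiles[g] for g in genes if labels[g] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Each iteration can only lower the within-cluster sum of squares, which is why the loop stops once assignments are stable.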
Results
Once the input counts have been processed and analyzed with the "Time Course Expression Analysis" tool, a new tab opens containing the statistical results of the stepwise regression (Figure 8):
- Tags: genes labeled with a tag are found significantly expressed over time and/or between conditions (R-squared ≥ R-squared Cutoff).
  - Cluster tags. There are as many cluster tags as the number specified in the "Number of Clusters" parameter, and each cluster is assigned a different color. Genes found significantly expressed over time are grouped by expression profile using the selected clustering algorithm. Thus, genes labeled with e.g. the "Cluster-1" tag change their expression over time and follow a similar trend.
  - Condition tags. Only available when the "Multiple Series Time Course" option is chosen. Genes labeled with this tag are found to be differentially expressed between conditions. Condition tags are always red, and their text indicates the comparison made. For example, with three conditions A (control), B, and C, there are two red condition tags: "AvsB" and "AvsC".
- Feature: feature or gene name.
- P-Value: if it’s significant, it indicates that the gene expression changes over time.
- P-Value_beta0: if it’s significant, it means that the gene’s expression at time point 0 is different from 0.
- P-Value_Time: if it’s significant, the gene expression follows a linear trend, especially at the beginning; that is, it increases or decreases linearly.
- P-Value_Time2: if it’s significant, the gene expression profile has a curvature; that is, the expression behavior changes at some point (e.g. the expression first increases and then decreases).
- R-squared: how well the regression model obtained for that gene fits its expression data.
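The selection summarized by the Tags, P-Value and R-squared columns can be illustrated with a simplified Python sketch. It assumes that per-gene model p-values are adjusted with the Benjamini-Hochberg FDR procedure and that a gene is kept when its adjusted p-value is at most the significance level and its R-squared reaches the cutoff. maSigPro's per-gene stepwise variable selection is not reproduced, and all function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def r_squared(observed, fitted):
    """Coefficient of determination of a fitted model for one gene."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for a constant profile")
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1 - ss_res / ss_tot


def select_genes(results, alpha, r2_cutoff):
    """Keep genes with BH-adjusted p-value <= alpha and R-squared >= r2_cutoff.

    results maps gene -> (model p-value, R-squared).
    """
    genes = list(results)
    adjusted = benjamini_hochberg([results[g][0] for g in genes])
    return [
        g for g, q in zip(genes, adjusted)
        if q <= alpha and results[g][1] >= r2_cutoff
    ]
```

A gene with a highly significant model but a low R-squared is dropped by the second condition, which is the role of the R-squared Cutoff parameter.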
For the "Multiple Series Time Course", additional p-values are calculated for each control vs. condition comparison. For two conditions named "A" (control) and "B":
- P-Value_BvsA: if it’s significant, the gene’s expression at time point 0 in condition B is different from that in condition A.
- P-Value_TimexB: if it’s significant, the linear trend of the gene expression differs between conditions A and B; that is, the expression increases or decreases more in one condition than in the other.
- P-Value_Time2xB: if it’s significant, the curvature of the expression profile differs between A and B (e.g. in condition A the expression increases and then decreases, but in condition B it never decreases).
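The p-value columns above can be read as tests on the coefficients of the per-gene regression model fitted by maSigPro. Assuming a quadratic (degree 2) time model, as the Time and Time2 columns suggest, and two conditions with A as control, the model for sample $i$ can be written as follows, where $T_i$ is the time point, $D_i$ is a dummy variable equal to 1 for samples of condition B and 0 otherwise, and $\varepsilon_i$ is the error term:

```latex
y_i = \beta_0 + \beta_1 D_i + \delta_0 T_i + \delta_1 T_i D_i + \gamma_0 T_i^2 + \gamma_1 T_i^2 D_i + \varepsilon_i
```

Under this reading, P-Value_beta0 tests $\beta_0$, P-Value_BvsA tests $\beta_1$, P-Value_Time and P-Value_TimexB test $\delta_0$ and $\delta_1$, and P-Value_Time2 and P-Value_Time2xB test $\gamma_0$ and $\gamma_1$. In a single series design, only the terms without $D_i$ remain.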
Only the genes that have passed the established Significance Level are shown in the new tab. For further details please refer to the maSigPro User's Guide.
Some p-values may be missing. A missing p-value means that the corresponding term is not significant in the gene’s expression profile, so it is not included in that gene’s regression model and its value is not stored.
Additionally, a report opens showing a summary of the time-course expression analysis results, including the clusters of features with similar expression profiles (Figure 9).
Side Panel Options
Actions
Summary Report
Generate a summary of the results as shown in Figure 9.
Fisher’s Exact Test
Fisher’s Exact Test (FET) can be used to find biological functions (represented by GO terms or other annotations) that are over- or under-represented in a set of genes (test set) with respect to a reference group (reference set). Roughly speaking, it tests whether the proportion of genes annotated with a specific biological function in the test set is significantly higher or lower than the proportion in the reference set. For more details about the analysis and the results, please see the Fisher’s Exact Test section of the Functional Analysis module. Here, the test set consists of the genes labeled with the selected tag(s).
- Test Set.
  - Groups. Select the tags to test for functional enrichment. The genes labeled with the selected tag(s) will be used as the test set.
- Reference Set.
  - Remaining Genes. If checked, the remaining genes in the count table (those not in the test set) are used as the reference set.
  - Groups. Only available if the "Remaining Genes" option is unchecked. The genes labeled with the selected tag(s) will be used as the reference set. These tags cannot overlap with the tags selected for the test set.
If a gene is present both in the test and in the reference sets, it will be removed from the reference.
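How the test and reference sets are built from tags, and what a one-sided Fisher's exact test computes for a single annotation, can be sketched in Python as follows. This is a hypothetical illustration, not the OmicsBox code: the function names are ours, and only the over-representation tail is shown (under-representation would sum the opposite tail, x ≤ a).

```python
from math import comb


def build_sets(tagged, all_genes, test_tags, ref_tags=None):
    """Build test and reference gene sets from tags.

    tagged maps tag -> set of genes. With ref_tags=None the reference set is
    every remaining gene of the count table ("Remaining Genes"); otherwise it
    is the union of the reference tags. Genes present in both sets are removed
    from the reference.
    """
    test = set().union(*(tagged[t] for t in test_tags))
    if ref_tags is None:
        return test, set(all_genes) - test
    if set(test_tags) & set(ref_tags):
        raise ValueError("reference tags must not overlap with test tags")
    ref = set().union(*(tagged[t] for t in ref_tags))
    return test, ref - test


def fisher_over_representation(test, ref, annotated):
    """One-sided Fisher's exact test p-value for over-representation.

    Computes P(X >= a), where a is the number of annotated genes in the test set
    and X follows the hypergeometric distribution implied by the 2x2 table.
    """
    a = len(test & annotated)   # test set, annotated
    c = len(ref & annotated)    # reference set, annotated
    n = len(test) + len(ref)    # all genes in the 2x2 table
    k = a + c                   # annotated genes overall
    t = len(test)               # size of the test set
    tail = sum(comb(k, x) * comb(n - k, t - x) for x in range(a, min(k, t) + 1))
    return tail / comb(n, t)
```

For example, if all three genes of a test set carry an annotation that none of three reference genes carry, the one-sided p-value is 1/20 = 0.05.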
Charts
Different statistics charts can be generated for a global visualization of the results. These charts can be found in the Side Panel of the TimeCourse Results viewer.
MDS Plot
Generates a two-dimensional scatterplot in which the distances represent the typical log2 fold changes between samples. You can select an experimental factor by which you want to color the MDS graphic (Figure 11).
Experiment-wide Expression Profiles
Plot showing the expression levels across samples for each cluster of genes (Figure 12).
Summary Expression Profiles
Plot showing the median expression level of each cluster of genes across time (Figure 13).
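The values behind a summary profile can be sketched as a median per cluster and time point. This Python sketch is hypothetical: it assumes that all values of a cluster's genes at a given time point, replicates included, are pooled before taking the median, and it covers the single series case; for multiple series the same computation would be repeated per group.

```python
from collections import defaultdict
from statistics import median


def summary_profiles(expr, sample_times, clusters):
    """Median expression per cluster and time point.

    expr maps gene -> {sample: expression value}; sample_times maps sample -> time;
    clusters maps gene -> cluster label. All values of a cluster's genes at a given
    time point (replicates included) are pooled before taking the median.
    Returns {cluster: {time: median}} with time points in ascending order.
    """
    pooled = defaultdict(lambda: defaultdict(list))
    for gene, cluster in clusters.items():
        for sample, value in expr[gene].items():
            pooled[cluster][sample_times[sample]].append(value)
    return {
        cluster: {t: median(values) for t, values in sorted(by_time.items())}
        for cluster, by_time in pooled.items()
    }
```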
Export
- Export Raw Counts. Export the original count matrix.
- Export Normalized Counts. During the Time Course Expression Analysis, raw counts are transformed according to the normalization method selected in the analysis configuration. This tool exports the count table with normalized values.
- Export Table. Export the information contained in the main results table into a text file.
Context Menu Options
Expression Profile by Gene
Graph of the expression profile over time of a particular gene (Figure 14). To display it, right-click the chosen gene and select the "Show Expression Profile" option.