Time Course Expression Analysis

Introduction

This tool is designed to perform time-course expression analysis of count data arising from RNA-seq technology. Based on the maSigPro program, this application allows the detection of genomic features (e.g. genes) with significant temporal expression changes and significant differences between experimental groups. The software package maSigPro, which belongs to the Bioconductor project, implements a two-step regression strategy to find genes with significant expression profile differences in time course RNA-seq experiments.

Please cite maSigPro as:
Conesa A, Nueda MJ, Ferrer A, Talón M. maSigPro: a method to identify significantly differential expression profiles in time-course microarray experiments. Bioinformatics. 2006 May 1;22(9):1096-102. doi:10.1093/bioinformatics/btl056

Nueda MJ, Tarazona S, Conesa A. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics. 2014;30(18):2598-2602. doi:10.1093/bioinformatics/btu333

**Figure 1:** Time Course Expression Interface

Expression Data

The time course expression analysis application expects gene expression levels in a count table. In OmicsBox, count tables can be generated via the Create Count Table application.

Count tables can also be imported from a text file. Go to transcriptomics → Load → Load RNA-Seq Count Table (Figure 2) and select your .txt file containing the count table.

Run Analysis

Go to transcriptomics → Differential Expression Analysis. If no count table project is open, the first wizard page (Figure 3) asks you to load either a Count Table Project (.box file) or a Count Table File (.txt, .csv, or .tsv file). On the second wizard page, choose the "Time Course Expression Analysis" option (Figure 4).

If a count table is already loaded in OmicsBox (see the section above), it will be used for the analysis. In this case, the analysis can be started either by clicking on "Diff. Expression Analysis" in the Side Panel or by going to transcriptomics → Differential Expression Analysis. The first wizard page will then ask you to select the type of differential expression analysis (Figure 4).

In the next pages, it is possible to specify different analysis parameters, which are divided into three distinct sections: Preprocessing Data (Figure 5), Experimental Design (Figure 6), and Analysis Options (Figure 7).

**Figure 4:** Differential Expression Analysis Options wizard page.

Preprocessing Data Page

Filter low count genes:
CPM Filter: Establish a filter to exclude genes with low counts across libraries, as those genes may interfere with the subsequent statistical approximations. Filtering is performed on a count-per-million (CPM) basis to account for differences in library size between samples (e.g. a CPM of 1 corresponds to a count of 6 in a sample with 6 million reads).
Samples reaching CPM Filter: Set the minimum number of samples in which the gene's CPM must be above the filter level (that is, in which the gene is considered expressed). For example, if this value is set to 5, the gene's CPM has to be above the given level in at least 5 samples. The number of samples in the smallest group is usually taken (e.g. in an experiment with two replicates for each condition or group, a gene should be expressed in at least two samples). Set the value to 0 if no filter is desired.
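
The interplay of the two filter parameters can be illustrated with a minimal Python sketch. The function names, the example counts, and the library sizes are hypothetical, and the library size is assumed to be the total number of reads per sample. This illustrates the idea only and is not the implementation used by the tool.

```python
def cpm(count, library_size):
    """Counts per million: scale a raw count by the sample's library size."""
    return count * 1_000_000 / library_size


def passes_cpm_filter(counts, library_sizes, cpm_cutoff, min_samples):
    """True if the gene's CPM exceeds cpm_cutoff in at least min_samples samples."""
    expressed = sum(
        1 for c, size in zip(counts, library_sizes) if cpm(c, size) > cpm_cutoff
    )
    return expressed >= min_samples


# Hypothetical example: four samples with 6 million reads each.
library_sizes = [6_000_000] * 4
gene_counts = [12, 9, 3, 0]

# A count of 6 in a 6-million-read library corresponds to a CPM of 1.
print(cpm(6, 6_000_000))

# Is the gene above 1 CPM in at least 2 samples?
print(passes_cpm_filter(gene_counts, library_sizes, cpm_cutoff=1.0, min_samples=2))
```

With `min_samples=0` every gene passes, which matches the "no filter" setting described above.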
Normalization procedure:
Normalization Method: Normalization is an important step to make the samples comparable and to remove possible biases (such as sequencing depth bias) in count data. You can select the normalization method to be used:
- TMM: Weighted trimmed mean of M-values. In this method, weights are obtained from the delta method on Binomial Data (this method is recommended).
- RPKM: Reads Per Kilobase per Million mapped reads. This method corrects for gene length and the number of sequencing reads (gene length is required).
- Upper-quartile: 75% quantile for the counts for each library is used to calculate the scale factors for normalization.
- None: No normalization method is applied.
Feature Length File: For RPKM normalization, load a tab-delimited file (or ID-Value object) with two columns containing the name and length of each gene or genomic feature.
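
As an illustration of how the feature lengths enter the RPKM calculation, here is a minimal Python sketch. It assumes a feature length file without a header line and with lengths given in base pairs; the function names and the example values are hypothetical and do not reflect the tool's internal code.

```python
import csv
import io


def read_feature_lengths(handle):
    """Parse a two-column, tab-delimited file: feature name, feature length (bp)."""
    lengths = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        lengths[row[0]] = int(row[1])
    return lengths


def rpkm(count, length_bp, library_size):
    """Reads per kilobase of feature per million mapped reads."""
    return count * 1e9 / (length_bp * library_size)


# Hypothetical feature length file content (no header line assumed).
length_file = io.StringIO("geneA\t2000\ngeneB\t500\n")
lengths = read_feature_lengths(length_file)

# 100 reads on a 2 kb gene in a library of 10 million mapped reads.
print(rpkm(100, lengths["geneA"], 10_000_000))
```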

Experimental Design Page

Experimental design file: Select your .txt file containing your experiment descriptors associated with each sample in tab-delimited format. As shown below, rows correspond to samples and columns to experimental descriptors. A column must contain the associated time points for each sample, and another column should show the assignment of samples to experimental groups. Make sure that the names in the first column of the experimental design table are exactly the same as the sample names in the count table header. If your experimental design file has fewer samples than the count table, only the samples contained in this file will be analyzed.


Sample  Time    Group
B12_A6_06hpi_1  6   A6
B12_A6_06hpi_2  6   A6
B12_A6_06hpi_3  6   A6
B12_A6_12hpi_1  12  A6
B12_A6_12hpi_2  12  A6
B12_A6_12hpi_3  12  A6
B12_A6_18hpi_1  18  A6
B12_A6_18hpi_2  18  A6
B12_A6_18hpi_3  18  A6
B12_A6_24hpi_1  24  A6
B12_A6_24hpi_2  24  A6
B12_A6_24hpi_3  24  A6
B12_K1_06hpi_1  6   K1
B12_K1_06hpi_2  6   K1
B12_K1_06hpi_3  6   K1
B12_K1_12hpi_1  12  K1
B12_K1_12hpi_2  12  K1
B12_K1_12hpi_3  12  K1
B12_K1_18hpi_1  18  K1
B12_K1_18hpi_2  18  K1
B12_K1_18hpi_3  18  K1
B12_K1_24hpi_1  24  K1
B12_K1_24hpi_2  24  K1
B12_K1_24hpi_3  24  K1
pps_A6_06hpi_1  6   A6
pps_A6_06hpi_2  6   A6
pps_A6_06hpi_3  6   A6
pps_A6_12hpi_1  12  A6
pps_A6_12hpi_2  12  A6
pps_A6_12hpi_3  12  A6
pps_A6_18hpi_1  18  A6
pps_A6_18hpi_2  18  A6
pps_A6_18hpi_3  18  A6
pps_A6_24hpi_1  24  A6
pps_A6_24hpi_2  24  A6
pps_A6_24hpi_3  24  A6
pps_K1_06hpi_1  6   K1
pps_K1_06hpi_2  6   K1
pps_K1_06hpi_3  6   K1
pps_K1_12hpi_1  12  K1
pps_K1_12hpi_2  12  K1
pps_K1_12hpi_3  12  K1
pps_K1_18hpi_1  18  K1
pps_K1_18hpi_2  18  K1
pps_K1_18hpi_3  18  K1
pps_K1_24hpi_1  24  K1
pps_K1_24hpi_2  24  K1
pps_K1_24hpi_3  24  K1
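
Before running the analysis, it can be useful to check that the sample names in the design file match the count table header. The following Python sketch shows one way to do this. The function names are hypothetical, and raising an error for unknown sample names is an assumption made for illustration, not documented tool behavior.

```python
import csv
import io


def read_design(handle):
    """Read a tab-delimited design file into {sample: {factor: value}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    sample_column = reader.fieldnames[0]  # the first column holds the sample names
    design = {}
    for row in reader:
        sample = row.pop(sample_column)
        design[sample] = dict(row)
    return design


def samples_to_analyze(design, count_table_header):
    """Return the design samples in count-table order; fail on unknown names."""
    missing = [s for s in design if s not in count_table_header]
    if missing:
        raise ValueError(f"Samples not found in count table: {missing}")
    # Samples that are in the count table but not in the design are left out.
    return [s for s in count_table_header if s in design]


# Hypothetical excerpt of a design file and a count table header.
design_text = "Sample\tTime\tGroup\nB12_A6_06hpi_1\t6\tA6\nB12_K1_06hpi_1\t6\tK1\n"
header = ["B12_A6_06hpi_1", "B12_A6_06hpi_2", "B12_K1_06hpi_1"]

design = read_design(io.StringIO(design_text))
print(samples_to_analyze(design, header))
```

In this example the count table contains one sample that is absent from the design, so only the two samples listed in the design are kept.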

Analysis Options Page

Design Type: Choose the design type to adjust the analysis.
Single Series Time Course: Detects genes that show significant expression changes over time. You only have to select the time factor of your experimental design in "Targets".
Multiple Series Time Course: Finds genes with significant temporal expression changes and significant differences between experimental groups. You have to set the time and experimental factors, and select the control condition of your experimental design in "Targets".
Statistical Settings:
Significance Level (Alfa): The level of FDR control used for variable selection in the stepwise regression.
R-squared Cutoff: Cutoff value for the R-squared of the regression model.
Visualization of Results:
Number of Clusters: Establish a number of clusters to group genes by similar expression profiles.
Clustering Method: Choose a clustering method for data partitioning.
- Hierarchical Clustering: Performs a hierarchical cluster analysis using a set of dissimilarities for the features being clustered.
- K-Means Clustering: Divides the genes into K clusters such that the sum of squared distances from each point to its assigned cluster center is minimized.
- Model-Based Clustering: Fits Gaussian mixture models by EM, initialized by hierarchical clustering, and selects the optimal model according to BIC. This method computes the optimal number of clusters itself. Keep in mind that it requires more time.

Results

Once the input counts have been processed and analyzed via the "Time Course Expression Analysis" tool, a new tab is opened containing statistical results obtained by the stepwise regression statistical test (Figure 8):

Tags: genes labeled with a tag show significant expression changes over time and/or between conditions (R-squared ≥ R-squared Cutoff).
Cluster tags. There are as many cluster tags as the number specified in the "Number of Clusters" parameter, and each cluster is assigned a different color. Genes with significant expression changes over time are grouped by expression profile using the selected clustering algorithm. Thus, genes labeled with, e.g., the "Cluster-1" tag change their expression over time and follow a similar trend.
Condition tags. Only available when the "Multiple Series Time Course" option is chosen. Genes labeled with this tag are found to be differentially expressed between conditions. The condition tags are always red and the text is the comparison made. For example, if we have three conditions A (control), B, and C, we will have two red condition tags: "AvsB" and "AvsC".
Feature: feature or gene name.
P-Value: if it’s significant, it indicates that the gene expression changes over time.
P-Value_beta0: if it’s significant, it means that the gene’s expression at time point 0 is different from 0.
P-Value_Time: if it’s significant, it means that the gene expression follows a linear trend, especially at the beginning. That is, that it increases or decreases linearly.
P-Value_Time2: if it’s significant, it means that the gene expression profile has a curvature. That is, the expression behavior changes at some point (e.g. the expression first increases and then decreases).
R-squared: how well the data fits the model obtained for that gene’s expression.

For the "Multiple Series Time Course" design, additional p-values are calculated, one set for each control vs condition combination. For two conditions named "A" (control) and "B":

P-Value_BvsA: if it’s significant, it means that the gene’s expression at time point 0 in condition B is different from that in A.
P-Value_TimexB: if it’s significant, it means that the linear trend of the gene expression is different between conditions A and B. That is, the gene expression in one condition increases or decreases more than in the other condition.
P-Value_Time2xB: if it’s significant, it means that the curvature of the expression profile is different for A and B (e.g. in condition A the expression increases and then decreases, but in condition B it never decreases).
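
The column names above suggest a quadratic regression model with one dummy variable for the non-control condition. Under that assumption, the terms of the model for a single sample can be sketched in Python as follows. The function `design_row` is hypothetical, and the dictionary keys simply reuse the p-value column suffixes; please refer to the maSigPro User's Guide for the exact model definition.

```python
def design_row(time, group, control="A", treatment="B"):
    """Regression terms for one sample in a two-group quadratic time-course model."""
    if group not in (control, treatment):
        raise ValueError(f"Unknown group: {group}")
    d = 1 if group == treatment else 0  # dummy variable: 0 = control, 1 = condition B
    return {
        "beta0": 1,                # intercept: expression of the control at time 0
        "BvsA": d,                 # offset of condition B at time 0
        "Time": time,              # linear trend of the control
        "TimexB": time * d,        # difference in linear trend for condition B
        "Time2": time ** 2,        # curvature of the control
        "Time2xB": time ** 2 * d,  # difference in curvature for condition B
    }


# A control sample only contributes to the intercept, Time and Time2 terms.
print(design_row(6, "A"))

# A sample of condition B additionally contributes to the three "B" terms.
print(design_row(6, "B"))
```

Each regression coefficient is tested separately, which is why one p-value is reported per term.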

Only the genes that have passed the established Significance Level are shown in the new tab. For further details please refer to the maSigPro User's Guide.

Some p-values may be missing. A missing p-value means that the corresponding term is not significant for that gene’s expression profile, so it is not included in the gene’s expression model and no value is stored.

Additionally, a report will open that shows a summary of the time-course expression analysis results, including the clusters of features with similar expression profiles (Figure 9).

Side Panel Options

Actions

Summary Report

Generate a summary of the results as shown in Figure 9.

Rename Features

This option allows modifying the sequence IDs in the Feature column using different methods:

Add: Add a prefix or suffix to all IDs in the table.
Replace: Replace specific text within the IDs. The text to be replaced must be defined in the Find parameter using a regular expression (regex).
Mapping: Use a mapping file to rename features. The mapping file must be a tab-separated text file with two columns: the first column contains the original feature IDs from the dataset, and the second column contains the new feature names. If duplicate IDs occur during renaming, you can define how they are handled:
- Sum Rows: Combine counts for all matching features.
- First Row: Retain only the counts of the first occurrence.
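
The two duplicate-handling strategies of the Mapping method can be illustrated with a short Python sketch. The function `rename_features` and the example data are hypothetical, and keeping IDs that are absent from the mapping unchanged is an assumption made for this illustration.

```python
def rename_features(counts, mapping, duplicates="sum"):
    """Rename feature IDs via a mapping and merge rows that get the same name.

    counts:     {feature_id: [count per sample]}
    mapping:    {original_id: new_name}; unmapped IDs are kept as-is (assumption)
    duplicates: "sum" (Sum Rows) or "first" (First Row)
    """
    if duplicates not in ("sum", "first"):
        raise ValueError("duplicates must be 'sum' or 'first'")
    renamed = {}
    for feature_id, row in counts.items():
        new_name = mapping.get(feature_id, feature_id)
        if new_name not in renamed:
            renamed[new_name] = list(row)
        elif duplicates == "sum":
            renamed[new_name] = [a + b for a, b in zip(renamed[new_name], row)]
        # "first": the row stored for the first occurrence is kept
    return renamed


# Hypothetical example: two transcripts are mapped to the same gene name.
counts = {"tx1": [10, 0], "tx2": [5, 3], "tx3": [1, 1]}
mapping = {"tx1": "geneA", "tx2": "geneA"}

print(rename_features(counts, mapping, duplicates="sum"))
print(rename_features(counts, mapping, duplicates="first"))
```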

Fisher’s Exact Test

Fisher’s Exact Test (FET) can be used to find biological functions (represented by GO terms or other annotations) that are over- or under-represented in a set of genes (test set) with respect to a reference group (reference set). Roughly speaking, it tests if the proportion of genes annotated with a specific biological function in the test set is significantly higher or lower than the proportion in the reference set. For more details about the analysis and the results, please visit the Fisher’s Exact Test section of the Functional Analysis module. In this case, the test set consists of the genes labeled with the selected tag(s).

Test Set.
Groups. Select the tags to test for functional enrichment. The genes labeled with the selected tag(s) will be used as the test set.
Reference Set.
Remaining Genes. If checked, use the remaining genes in the count table, that are not part of the test set, as the reference set.
Groups. Only available if the "Remaining Genes" option is unchecked. The genes labeled with the selected tag(s) will be used as the reference set. They can’t overlap with the tags selected in the test set.

If a gene is present both in the test and in the reference sets, it will be removed from the reference.
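
To make the test more concrete, the following Python sketch builds the test and reference sets as described above and computes a two-sided Fisher's exact test p-value for a single annotation. All names and the example gene sets are hypothetical. The sketch uses the common definition of the two-sided p-value (the sum of the probabilities of all tables that are at most as likely as the observed one) and is not necessarily identical to the implementation used by the tool.

```python
from math import comb


def build_sets(test_genes, reference_genes):
    """Genes present in both sets are removed from the reference set."""
    test = set(test_genes)
    reference = set(reference_genes) - test
    return test, reference


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]].

    a: annotated genes in the test set       b: other genes in the test set
    c: annotated genes in the reference set  d: other genes in the reference set
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability of k annotated genes in the test set.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at most as probable as the observed one.
    return sum(
        p for p in map(prob, range(k_min, k_max + 1)) if p <= observed * (1 + 1e-9)
    )


# Hypothetical example: g4 is in both sets and is dropped from the reference.
test, reference = build_sets(
    {"g1", "g2", "g3", "g4"}, {"g4", "g5", "g6", "g7", "g8"}
)
annotated = {"g1", "g2", "g3", "g5"}  # genes carrying the annotation of interest

a = len(test & annotated)
b = len(test - annotated)
c = len(reference & annotated)
d = len(reference - annotated)
print(a, b, c, d)
print(fisher_exact_two_sided(a, b, c, d))
```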

**Figure 10:** Fisher’s Exact Test wizard from time course results.

Charts

Different statistics charts can be generated for a global visualization of the results. These charts can be found in the Side Panel of the Time Course Results viewer.

MDS Plot

Generates a two-dimensional scatterplot in which the distances represent the typical log2 fold changes between samples. You can select an experimental factor by which you want to color the MDS graphic (Figure 11).

Experiment-wide Expression Profiles

Plot showing the expression levels across samples for each cluster of genes (Figure 12).

**Figure 12:** Experiment-wide Expression Profile

Summary Expression Profiles

Plot showing the median expression level of each cluster of genes across time (Figure 13).

**Figure 13:** Summary Expression Profile
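
The idea behind a summary profile can be sketched in a few lines of Python. This hypothetical example pools the values of all genes and replicates of one cluster at each time point and takes the median. How the tool aggregates genes and replicates exactly is not specified here, so this is only one plausible reading.

```python
from collections import defaultdict
from statistics import median


def median_profile(expression, sample_times):
    """Median expression per time point over all genes and samples of a cluster.

    expression:   {gene: {sample: value}} for the genes of one cluster
    sample_times: {sample: time point}
    """
    values_by_time = defaultdict(list)
    for per_sample in expression.values():
        for sample, value in per_sample.items():
            values_by_time[sample_times[sample]].append(value)
    return {t: median(v) for t, v in sorted(values_by_time.items())}


# Hypothetical cluster with two genes, two time points and two replicates each.
sample_times = {"s1": 6, "s2": 6, "s3": 12, "s4": 12}
cluster = {
    "geneA": {"s1": 1.0, "s2": 3.0, "s3": 5.0, "s4": 7.0},
    "geneB": {"s1": 2.0, "s2": 4.0, "s3": 6.0, "s4": 10.0},
}

print(median_profile(cluster, sample_times))
```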

Export

Besides the generic Export Table, this object contains the following export options.

Export Raw Counts

Export the raw counts to a text file. It will not contain the genes discarded during the filtering step.

Export Normalized Counts

Export the normalized counts to a text file. It will not contain the genes discarded during the filtering step.

Export Experimental Design

Export the experimental design to a tab-separated file. The first column will contain the samples, whereas the remaining columns will contain the experimental factors.

Besides the generic context menu options, the available actions for this object depend on whether one or multiple rows are selected.

With one row selected:

Show Expression Profile: Generates a plot with the average expression across samples of the given gene over time (Figure 14).

With multiple rows selected:

Extract Selection to New Tab: Extract the data from the selected rows and open it in a new tab.