Combining Transcriptomes with TAMA Merge

Introduction

TAMA Merge is a computational tool that merges multiple transcriptomes into a single combined transcriptome. It does this by merging transcript models whose start sites, end sites, and splice junctions lie within configurable distance thresholds of each other. This tool may be useful for the following purposes:

Extending a reference annotation with novel transcript models, e.g. as defined by FLAIR.
Comparing two transcriptomes by merging them, thereby creating a joint set of transcript model identities.
On data sets with biological replicates, merging multiple transcriptomes defined on individual samples into a common transcriptome ("Call & Join" approach to isoform identification for biological replicates).

Please cite TAMA Merge as:

Kuo, Richard I., et al. "Illuminating the dark side of the human transcriptome with long read transcript sequencing." BMC Genomics 21 (2020): 1-22.

Run TAMA Merge

TAMA Merge can be found under Transcriptomics → Long-Reads Analysis → Combining Transcriptomes with TAMA Merge. The wizard consists of three pages, on which the input files, the merging parameters, and the output options are defined.

Input

TAMA Merge receives the following inputs:

Transcriptome: Transcript annotations in BED12 or GTF format. At least one file needs to be provided.
Give Priority to…: One of the provided input files may be chosen to take priority. Its transcription start and end sites, as well as its splice sites, are then given priority over those of the other files. This is recommended e.g. for combining a reference transcriptome annotation with a custom generated one. It also causes the gene and transcript IDs of the prioritized file to be transferred into the merged transcriptome. If "None" is selected here, all provided transcriptomes have equal priority.
Note that supplied BED12 files have to use the following format in their 4th column ("name"): "gene_id;transcript_id". As this format is not very commonly used, we generally recommend the use of .gtf files as inputs, which will automatically be converted into a suitable .bed file.
Though generally intended to merge multiple transcriptomes, running TAMA Merge on only one transcriptome is also possible and may make sense if the goal is to collapse similar transcript models.
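The required BED12 name format can be illustrated with a short Python sketch. It rewrites the 4th column of each record to "gene_id;transcript_id". The function name set_bed12_names and the gene_of mapping (transcript ID to gene ID) are hypothetical and only serve as an illustration; they are not part of TAMA Merge.

```python
def set_bed12_names(bed_lines, gene_of):
    """Rewrite column 4 of BED12 records as 'gene_id;transcript_id'.

    bed_lines: iterable of tab-separated BED12 lines.
    gene_of:   hypothetical dict mapping transcript IDs to gene IDs.
    """
    out = []
    for line in bed_lines:
        line = line.rstrip("\n")
        # Pass empty lines and header lines through unchanged.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            out.append(line)
            continue
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}")
        name = fields[3]
        # Assumption: a name that already contains ';' is already formatted.
        if ";" not in name:
            fields[3] = f"{gene_of[name]};{name}"
        out.append("\t".join(fields))
    return out
```

A transcript ID that is missing from gene_of raises a KeyError, so incomplete mappings are noticed before the file is used as input.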

Algorithm Options

This page provides some more detailed options to configure the algorithm:

Capped: Defines whether the transcript start sites in the provided transcriptomes can be trusted, or whether shorter transcripts should always be merged into longer transcripts. Enabling this option is generally recommended for merging transcriptomes created with tools such as FLAIR or IsoQuant, as these already implement their own logic for determining transcription start sites.
Exon Ends: Whether the outer boundaries of the terminal exons (transcript start and end) of a merged transcript model should be taken from the most common or from the longest exon among the merged models.
5' Threshold: The threshold in base pairs for the five prime end within which transcript models should be merged.
Splice Junction Threshold: The threshold in base pairs for the splice junctions within which transcript models should be merged.
3' Threshold: The threshold in base pairs for the three prime end within which transcript models should be merged.

Note that transcript models from the same transcriptome that fall within the given thresholds of each other will also be combined, even in the transcriptome selected for priority.
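How the three thresholds interact can be illustrated with a simplified Python sketch. It is not TAMA Merge's actual implementation: it assumes plus-strand transcripts, treats start sites as trusted (no merging of shorter models into longer ones), and ignores priorities. The Transcript class and the within_thresholds function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    start: int  # 5' end (plus strand assumed in this sketch)
    end: int    # 3' end
    # Splice junctions as (donor, acceptor) positions, in order.
    junctions: list = field(default_factory=list)


def within_thresholds(a, b, five_prime, junction, three_prime):
    """Return True if a and b would be merge candidates in this sketch."""
    # Models with a different number of junctions are never matched here.
    if len(a.junctions) != len(b.junctions):
        return False
    if abs(a.start - b.start) > five_prime:
        return False
    if abs(a.end - b.end) > three_prime:
        return False
    # Every junction boundary must lie within the junction threshold.
    return all(
        abs(don_a - don_b) <= junction and abs(acc_a - acc_b) <= junction
        for (don_a, acc_a), (don_b, acc_b) in zip(a.junctions, b.junctions)
    )
```

For example, two models whose ends differ by 5 bp and whose junctions differ by at most 2 bp match with thresholds of 10, 2 and 10, but no longer match once the splice junction threshold is lowered to 1.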

Output

Output File Prefix: Set a name which will serve as a prefix for all output files.
Output Directory: Define a directory (existing or new) in which to save the output files.
Save Merged Transcriptome as .bed: Whether to save the merged transcriptome as a .bed file.
Save Merged Transcriptome as .gtf: Whether to save the merged transcriptome as a .gtf file.

Results

TAMA Merge has the following outputs:

Merged Transcriptome as .gtf and/or .bed file: The main output of TAMA Merge is the set of transcript annotations produced by the merge.
Summary Report with information on the number of genes as well as the number of transcripts before and after the merging process.
Merge Report .txt file which maps the transcript IDs of the input files to the transcript IDs in the merged transcriptome.
Gene Report .txt file which contains information about each gene, e.g. how many transcripts it had before and after the merge.
Transcript Report .txt file which contains information about each transcript, e.g. which source transcripts from which files were merged to create it.
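The gene and transcript counts given in the Summary Report can be cross-checked on any .gtf file with a small Python sketch such as the following. It only assumes standard GTF attributes (gene_id and transcript_id in the 9th column); the function name count_genes_and_transcripts is hypothetical and not part of the tool.

```python
import re

# Matches GTF attribute pairs such as: gene_id "G1";
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def count_genes_and_transcripts(gtf_lines):
    """Count distinct gene_id and transcript_id values in GTF lines."""
    genes, transcripts = set(), set()
    for line in gtf_lines:
        # Skip empty lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(_ATTR.findall(fields[8]))
        if "gene_id" in attrs:
            genes.add(attrs["gene_id"])
        if "transcript_id" in attrs:
            transcripts.add(attrs["transcript_id"])
    return len(genes), len(transcripts)
```

Running it on an input file and on the merged .gtf, e.g. with count_genes_and_transcripts(open(path)), gives the numbers before and after the merge.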