Subcellular Localization Prediction with PSORTb

Introduction

PSORT predicts protein localization sites from the amino acid sequence by applying rules derived from experimental observations. For example, for a Gram-negative organism, the possible localization sites are the cytoplasm, cytoplasmic membrane, periplasm, outer membrane, and extracellular space.

OmicsBox assigns subcellular localization sites to proteins based on their amino acid sequence via PSORTb. PSORTb is an algorithm for bacterial or archaeal protein sequences that uses a probabilistic system to predict the most probable localization. Once sites are predicted, the corresponding cellular component GO terms can be merged with the existing annotations.

Run

Starting with a previously loaded .box/.b2g project with PROTEIN sequences, the PSORTb tool can be found under Functional Analysis → Subcellular Localization Prediction with PSORTb.

If the loaded project contains nucleotide sequences, use the "Translate Longest ORF" tool first to obtain the predicted protein sequences, and then run PSORTb on them.

**Figure 1.** Run PSORTb in the Functional Analysis menu.

Wizard and parameters

The wizard allows adjusting the algorithm parameters (Figure 2).

PSORTb performs different analyses depending on the Organism Type and the Gram Stain. It can be used with sequences from Gram-positive bacteria, Gram-negative bacteria, or archaea. For more details on the core algorithm, visit psortb.org.

The algorithm returns a score between 0 and 10 for each localization site. The Cutoff parameter sets the minimum score a localization site must reach to be considered a possible localization.
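The role of the cutoff can be illustrated with a minimal Python sketch. It is not OmicsBox or PSORTb code: the function name `candidate_sites`, the example scores, and the cutoff value are made up for illustration, and treating a score equal to the cutoff as passing is an assumption.

```python
def candidate_sites(scores, cutoff):
    """Return (site, score) pairs that reach the cutoff, best score first.

    Assumption: a score equal to the cutoff counts as passing.
    """
    passing = [(site, score) for site, score in scores.items() if score >= cutoff]
    return sorted(passing, key=lambda pair: pair[1], reverse=True)


# Made-up scores for one protein from a Gram-negative organism.
scores = {
    "cytoplasm": 0.2,
    "cytoplasmic membrane": 0.5,
    "periplasm": 8.0,
    "outer membrane": 1.0,
    "extracellular space": 0.3,
}

# Illustrative cutoff value; use the one set in the wizard.
print(candidate_sites(scores, cutoff=7.5))
```

With a lower cutoff, more than one site can pass, which is the situation in which a secondary localization becomes possible.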

**Figure 2.** PSORTb wizard where the user can adjust the parameters.

Results

The tool iterates over the input sequences and analyzes each of them with PSORTb 3. The process opens a new tab, and the results are shown in a table as they come back.

The table contains one row for each sequence. The table columns are:

Sequence name: shows each sequence identifier.
Final localization: contains the predicted localization name.
Final score: represents the prediction score for the localization.
GO ID: the Gene Ontology ID associated with the predicted localization.
Secondary Localization: a possible secondary localization when there is more than one score above the cutoff.
The next 6 columns, hidden by default, show the score for all possible localizations.
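As a rough illustration of how the columns relate to each other, the sketch below assembles one table row from the sites that passed the cutoff. Everything in it is hypothetical: `build_row` and `SITE_TO_GO` are invented names, the GO terms are the standard cellular component terms with matching names rather than a confirmed OmicsBox mapping, and leaving the fields empty when no site passes is an assumption.

```python
# Illustrative site-to-GO mapping (assumption: OmicsBox may assign other terms).
SITE_TO_GO = {
    "cytoplasm": "GO:0005737",             # cytoplasm
    "cytoplasmic membrane": "GO:0005886",  # plasma membrane
    "periplasm": "GO:0042597",             # periplasmic space
    "outer membrane": "GO:0009279",        # cell outer membrane
    "extracellular space": "GO:0005615",   # extracellular space
}


def build_row(sequence_name, ranked):
    """Build one result row.

    `ranked` holds (site, score) pairs that are already above the cutoff,
    ordered from highest to lowest score.
    """
    final_site, final_score = ranked[0] if ranked else (None, None)
    secondary = ranked[1][0] if len(ranked) > 1 else None
    return {
        "Sequence name": sequence_name,
        "Final localization": final_site,
        "Final score": final_score,
        "GO ID": SITE_TO_GO.get(final_site),
        "Secondary Localization": secondary,
    }


# Made-up example: two sites passed the cutoff for this sequence.
row = build_row("seq1", [("periplasm", 6.0), ("outer membrane", 4.0)])
print(row)
```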

Merge GO information

The GO IDs from the prediction can be merged into the original Blast2GO project as cellular component annotations of the sequences.

The merge option is available in the right-side panel of the PSORTb results (Figure 3).

The merge wizard asks for the OmicsBox project file into which the GO results should be merged, and adds the GO information to that project by matching the Sequence Name. Note: The initial OmicsBox project must be saved as a file before running the Merge GOs option.
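Conceptually, matching on the Sequence Name works like the following Python sketch. It only illustrates the idea and is not how OmicsBox implements the merge: the function `merge_go`, the dictionary layout, and the handling of unmatched names are assumptions.

```python
def merge_go(project, predictions):
    """Add each predicted GO ID to the project entry with the same sequence name.

    project:     {sequence name: set of existing GO IDs}
    predictions: {sequence name: predicted cellular component GO ID}
    Returns the merged annotations and the names with no match in the project.
    """
    merged = {name: set(go_ids) for name, go_ids in project.items()}
    unmatched = []
    for name, go_id in predictions.items():
        if name in merged:
            merged[name].add(go_id)
        else:
            # Assumption: predictions without a matching sequence are skipped.
            unmatched.append(name)
    return merged, unmatched


# Made-up project annotations and PSORTb predictions.
project = {"seq1": {"GO:0003677"}, "seq2": set()}
predictions = {"seq1": "GO:0005737", "seq2": "GO:0009279", "seq9": "GO:0042597"}

merged, unmatched = merge_go(project, predictions)
print(merged)
print(unmatched)
```

Because the match is made by name, sequence identifiers in the PSORTb results need to be identical to those in the target project.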

For more information regarding PSORTb, visit the psortb.org documentation page.