Functional Annotation with EggNOG

EggNOG-Mapper

eggNOG-mapper is a tool for fast functional annotation of novel sequences (genes or proteins) using precomputed eggNOG-based orthology assignments. Typical use cases include the annotation of novel genomes, transcriptomes, or metagenomic gene catalogs. Functional annotation based on orthology predictions is considered more precise than traditional homology searches because it avoids transferring annotations from paralogs (duplicated genes that are more likely to have diverged in function). (Figures 1 and 2)

The methodology behind the tool and its database is described in detail on the eggNOG website: http://eggnogdb.embl.de/#/app/methods

Input

Genes or Proteins: A multi-FASTA file containing gene or protein sequences.
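
Before submitting a file, it can help to confirm that it parses as multi-FASTA. The following is a minimal sketch of such a check, not part of eggNOG-mapper itself; the function name `read_multi_fasta` and the example records are hypothetical, and the rule that the record ID is the first token of the header line is an assumption based on common FASTA usage.

```python
from typing import Dict, Iterable


def read_multi_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Assumption: the record ID is the first whitespace-delimited
            # token of the header line.
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header without an identifier")
            current_id = header[0]
            if current_id in records:
                raise ValueError(f"duplicate record ID: {current_id}")
            records[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequence lines of one record may be wrapped; join them.
            records[current_id] += line
    return records


# Hypothetical two-record example.
example = """>gene1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>gene2
MSLNEQ
""".splitlines()

records = read_multi_fasta(example)
print(len(records), "records parsed")
```

To check a file on disk, pass an open file handle instead of the example lines, for instance `read_multi_fasta(open("proteins.fasta"))`, where the file name is a placeholder.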

Configuration

Taxonomic Scope: Fix the taxonomic scope used for annotation, so only orthologs from a particular clade are used for functional transfer. By default, this is automatically adjusted for every query sequence.
Target Orthologs: Define what type of orthologs should be used for functional transfer.
GO Evidence: Defines what type of GO terms should be used for annotation:
experimental = Use only terms inferred from experimental evidence
non-electronic = Use only non-electronically curated terms
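
The effect of the two GO Evidence settings can be illustrated with a small sketch. It is not eggNOG-mapper's implementation: the function `filter_go_terms` and the example annotations are hypothetical, and the mapping of each setting to Gene Ontology evidence codes (experimental as EXP, IDA, IPI, IMP, IGI, IEP; electronic as IEA) is an assumption based on the standard GO evidence code categories.

```python
from typing import List, Tuple

# Standard GO evidence codes. Assumption: "experimental" corresponds to the
# GO experimental-evidence category, and "electronic" to IEA only.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
ELECTRONIC_CODES = {"IEA"}


def filter_go_terms(annotations: List[Tuple[str, str]], mode: str) -> List[str]:
    """Return the GO IDs whose evidence code is allowed under the given mode.

    annotations: (go_id, evidence_code) pairs
    mode: "experimental" or "non-electronic"
    """
    if mode == "experimental":
        return [go for go, code in annotations if code in EXPERIMENTAL_CODES]
    if mode == "non-electronic":
        return [go for go, code in annotations if code not in ELECTRONIC_CODES]
    raise ValueError(f"unknown GO evidence mode: {mode}")


# Hypothetical annotations attached to one ortholog.
annotations = [
    ("GO:0005524", "IDA"),  # inferred from direct assay (experimental)
    ("GO:0016301", "IEA"),  # inferred from electronic annotation
    ("GO:0006468", "ISS"),  # inferred from sequence similarity (curated)
]

experimental_terms = filter_go_terms(annotations, "experimental")
non_electronic_terms = filter_go_terms(annotations, "non-electronic")
print(experimental_terms)
print(non_electronic_terms)
```

Under these assumptions, the experimental setting is the stricter of the two: every term it keeps is also kept by the non-electronic setting, but not the other way round.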

Results

The result table (Figure 3) summarizes all annotations that could be transferred with eggNOG-mapper. In addition to sorting and filtering, the context menu lets you inspect individual results in more detail.

The annotation details (Figure 4) provide link-outs where possible and give detailed information about the annotated GO terms.

**Figure 4.** eggNOG-mapper annotation details.

References

Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Jaime Huerta-Cepas, Damian Szklarczyk, Lars Juhl Jensen, Christian von Mering and Peer Bork. Submitted (2016).

Huerta-Cepas J et al. (2019). eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research, 47(D1), D309-D314.

PfamScan

Pfam is a database of protein families. Briefly, each Pfam entry comprises a seed alignment, which is used to build a profile hidden Markov model (HMM) with the HMMER software (http://hmmer.org/). The profile HMM is then searched against a sequence database called pfamseq, and all matches scoring above the curated threshold (chosen to exclude any known false positives) are aligned back to the profile HMM to generate the full alignment. Where possible, each entry is annotated with functional information derived from the literature. To keep the resource sustainable as it scales, pfamseq is derived only from UniProt Knowledgebase (UniProtKB) sequences that belong to Reference Proteomes, rather than from the entirety of UniProtKB. (Figure 5)
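
The role of the curated per-family threshold can be sketched in a few lines. This is an illustration only, not the Pfam or HMMER implementation: the `Hit` record, the family names `FAM_A`, `FAM_B` and `FAM_C`, the scores, and the threshold values are all hypothetical, and treating a score equal to the threshold as passing is an assumption.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    sequence_id: str
    family: str
    bit_score: float


def filter_by_threshold(hits: List[Hit], thresholds: Dict[str, float]) -> List[Hit]:
    """Keep hits whose score reaches the curated threshold of their family.

    Hits to families without a known threshold are dropped.
    Assumption: a score equal to the threshold counts as a match.
    """
    return [
        hit
        for hit in hits
        if hit.family in thresholds and hit.bit_score >= thresholds[hit.family]
    ]


# Hypothetical per-family thresholds and search hits.
thresholds = {"FAM_A": 25.0, "FAM_B": 40.0}
hits = [
    Hit("seq1", "FAM_A", 31.5),  # above the FAM_A threshold
    Hit("seq2", "FAM_A", 12.0),  # below the FAM_A threshold
    Hit("seq3", "FAM_B", 40.0),  # exactly at the FAM_B threshold
    Hit("seq4", "FAM_C", 99.0),  # no threshold known for FAM_C
]

kept = filter_by_threshold(hits, thresholds)
for hit in kept:
    print(hit.sequence_id, hit.family, hit.bit_score)
```

Because each family has its own threshold, the same score can be a match for one family and a rejection for another, which is why a single global cutoff would not reproduce this behaviour.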

Input

Genes or Proteins: A multi-FASTA file containing gene or protein sequences.

Results

The result table (Figure 6) summarizes all PfamScan annotations. In addition to sorting and filtering, the context menu lets you inspect individual results in more detail.

The annotation details (Figure 7) provide link-outs where possible and give detailed information about the annotated GO terms.

**Figure 7.** PfamScan annotation details.

References

The Pfam protein families database in 2019. S. El-Gebali, J. Mistry, A. Bateman, S.R. Eddy, A. Luciani, S.C. Potter, M. Qureshi, L.J. Richardson, G.A. Salazar, A. Smart, E.L.L. Sonnhammer, L. Hirsh, L. Paladin, D. Piovesan, S.C.E. Tosatto, R.D. Finn. Nucleic Acids Research (2019). doi: 10.1093/nar/gky995