Tools for mining secondary metabolite biosynthesis gene clusters¶

2metDB¶

2metDB / SecmetDB is a standalone tool (a web server installed locally on the user's machine) for mining PKS and NRPS biosynthetic gene clusters in whole-genome protein FASTA files. The algorithms used are the same as those of the PKS/NRPS web server / Predictive Blast Server.

Reference (PKS/NRPS web server):

Bachmann, B. O. and Ravel, J., 2009, Methods Enzymol. 458:181-217

Link:

http://secmetdb.sourceforge.net/

antiSMASH¶

See the antiSMASH page.

ARTS¶

ARTS (Antibiotic Resistant Target Seeker) uses antiSMASH to predict biosynthetic gene clusters (BGCs), prioritizes detected BGCs and identifies drug targets.

Reference:

Link:

https://arts.ziemertlab.com

BAGEL¶

BAGEL is a mining tool and database for ribosomally synthesized and post-translationally modified peptides (RiPPs), for example lanthipeptides and bacteriocins. BAGEL can identify RiPP biosynthetic gene clusters in (meta)genomic data, and classify and analyze the putative products.

References:

Link:

BAGEL 3: http://bagel.molgenrug.nl/
BAGEL 4: http://bagel4.molgenrug.nl/

BiG-SCAPE¶

The Biosynthetic Gene Similarity Clustering and Prospecting Engine (BiG-SCAPE) is a tool that uses distances between gene clusters (e.g. identified with antiSMASH) to build sequence similarity networks, which are then used to define gene cluster families. By mapping known gene clusters from the MIBiG dataset onto these networks, the data can be used for sequence-based dereplication of gene clusters.
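The idea of turning pairwise distances into a network and then into families can be illustrated with a minimal sketch. It assumes a simple rule (link two clusters when their distance is at or below a cutoff, then take connected components as families); this is a simplification for illustration, not BiG-SCAPE's actual clustering method. The cluster names, the distance values, the cutoff of 0.3 and the `MIBiG_` prefix used to mark reference clusters are all hypothetical.

```python
def cluster_families(distances, cutoff=0.3):
    """Link BGC pairs with distance <= cutoff and return the connected
    components of that network as families (sorted lists of names)."""
    neighbors = {}
    for (a, b), d in distances.items():
        # Register both nodes so isolated BGCs still form a family.
        neighbors.setdefault(a, set())
        neighbors.setdefault(b, set())
        if d <= cutoff:
            neighbors[a].add(b)
            neighbors[b].add(a)

    families, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbors[node] - members)
        seen |= members
        families.append(sorted(members))
    return families


def dereplicated(families, reference_prefix="MIBiG_"):
    """Return query BGCs that share a family with a reference cluster."""
    hits = set()
    for family in families:
        if any(name.startswith(reference_prefix) for name in family):
            hits.update(n for n in family if not n.startswith(reference_prefix))
    return sorted(hits)


# Hypothetical pairwise distances (0 = identical, 1 = unrelated).
distances = {
    ("query_1", "query_2"): 0.10,
    ("query_2", "MIBiG_A"): 0.25,
    ("query_1", "MIBiG_A"): 0.40,
    ("query_3", "query_1"): 0.90,
    ("query_3", "query_2"): 0.95,
    ("query_3", "MIBiG_A"): 0.85,
}

families = cluster_families(distances)
known = dereplicated(families)
print(families)
print(known)
```

In this toy input, query_1 ends up in the same family as the reference cluster through query_2 even though its direct distance to the reference is above the cutoff, which is the behavior one gets from connected components and a reason real tools may use stricter grouping rules.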

Links:

CASSIS and SMIPS¶

A toolkit consisting of the tools CASSIS (Cluster Assignment by Islands of Sites) and SMIPS (Secondary Metabolites by InterProScan). SMIPS uses domain annotations provided by InterProScan to predict anchor genes encoding core biosynthetic enzymes (PKS, NRPS, DMATS) in eukaryotic genomic sequences. The data obtained with SMIPS then serve as input for CASSIS, which uses an automated motif-based search for transcription factor binding sites to predict other genes associated with the "anchor" gene, i.e. gene clusters. The toolkit is available as a web server and for download.
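The anchor gene step can be pictured as a rule lookup over per-gene domain annotations. The sketch below is only an illustration under stated assumptions: the short domain labels (`KS`, `AT`, `A`, `PCP`, `C`, `DMATS`), the required domain sets in `ANCHOR_RULES` and the gene names are hypothetical stand-ins, not the InterPro accessions or the decision rules that SMIPS actually uses.

```python
# Hypothetical rules: an anchor class is called when all of its
# required domain labels are present on a gene (assumption, not SMIPS's rules).
ANCHOR_RULES = {
    "PKS": {"KS", "AT"},
    "NRPS": {"A", "PCP", "C"},
    "DMATS": {"DMATS"},
}


def predict_anchor_genes(domain_annotations):
    """Map each gene to the anchor classes whose required domains it carries.

    domain_annotations: dict of gene name -> list of domain labels.
    Genes that match no rule are omitted from the result.
    """
    anchors = {}
    for gene, domains in domain_annotations.items():
        found = set(domains)
        classes = [name for name, required in ANCHOR_RULES.items()
                   if required <= found]
        if classes:
            anchors[gene] = classes
    return anchors


# Hypothetical domain annotations, e.g. parsed from an annotation table.
annotations = {
    "gene_001": ["KS", "AT", "ACP"],
    "gene_002": ["P450"],
    "gene_003": ["A", "PCP", "C"],
    "gene_004": ["KS", "AT", "A", "PCP", "C"],
    "gene_005": ["DMATS"],
}

anchors = predict_anchor_genes(annotations)
for gene, classes in anchors.items():
    print(gene, "/".join(classes))
```

A gene carrying the domains of two classes (gene_004 here) is reported with both labels, which is one simple way to represent hybrid enzymes. The resulting list of anchor genes is the kind of input a downstream cluster-boundary predictor would start from.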

Reference:

Wolf, T., et al., 2015, Bioinformatics 32:1138-43

Links:

CLUSEAN¶

CLUSEAN // CLuster SEquence ANalyzer

CLUSEAN is a BioPerl-based annotation pipeline for secondary metabolite biosynthetic gene clusters. It allows automated homology searches, identification of conserved protein domains in PKS and NRPS gene clusters, classification of enzymes, and specificity predictions for NRPS A-domains. The CLUSEAN results are annotated in EMBL files and can also be exported to MS Excel.

Reference:

Weber, T., et al., 2009, J. Biotechnol. 140:13-7

Link:

https://bitbucket.org/tilmweber/clusean

ClusterFinder¶

ClusterFinder uses a probabilistic approach to detect putative secondary metabolite gene clusters in genomic and metagenomic data. ClusterFinder is available as standalone software and is also integrated into antiSMASH and IMG-ABC.

Reference:

Cimermancic, P., et al., 2014, Cell 158:412-21

Download source code:

https://github.com/petercim/ClusterFinder

ClustScan Professional¶

See the ClustScan Professional entry in the PKS/NRPS tools section.

eSNaPD // environmental Surveyor of Natural Product Diversity¶

eSNaPD is a tool to survey secondary metabolite gene cluster diversity in metagenomic DNA sequences. It also takes sample metadata into account, i.e. the geographic sampling location.

References:

Link:

http://esnapd2.rockefeller.edu/

EvoMining¶

EvoMining uses phylogenomics to identify secondary metabolite biosynthetic gene clusters (BGCs) that encode duplicates of primary metabolism enzymes that display a divergent phylogeny. This is based on the observation that such primary metabolic isoenzymes are often included in secondary metabolite BGCs.

Reference:

Cruz-Morales, P., et al., bioRxiv, doi: http://dx.doi.org/10.1101/020503

Link:

http://148.247.230.39/newevomining/new/evomining_web/index.html (currently offline)
https://github.com/nselem/EvoMining/wiki

FunGeneClusterS¶

FunGeneClusterS predicts fungal gene clusters based on genome and transcriptome data. An R-based web server and an offline version are available.

References:

Andersen, M. R., et al., 2013, Proc. Natl. Acad. Sci. U. S. A. 110:E99-107
Vesth, T. C., et al., 2016, Synth. Syst. Biotechnol., in press

Link:

https://fungiminions.shinyapps.io/FunGeneClusterS

MIDDAS-M¶

MIDDAS-M (a motif-independent de novo detection algorithm for SMB gene clusters) is a gene cluster mining tool that uses genome and transcriptome data to identify gene clusters in fungal genomes.

Reference:

Umemura, M., et al., 2013, PLoS One 8:e84028

Link:

http://133.242.13.217/MIDDAS-M (currently offline)

MIPS-CG¶

MIPS-CG (motif-independent prediction without core genes) attempts to identify completely novel secondary metabolite biosynthetic gene clusters using only genome data. It uses neither known sequences (or motifs) of core genes nor transcriptome data.

References:

Link (Note: currently offline):

http://www.fung-metb.net/

NaPDoS // Natural Products Domain Seeker¶

See the NaPDoS entry in the PKS/NRPS tools section.

PhytoClust¶

PhytoClust is dedicated to the detection of biosynthetic gene clusters for secondary metabolites in plant genomes.

References:

Töpfer, N., et al., 2017, Nucleic Acids Res. doi: 10.1093/nar/gkx404

Link:

http://phytoclust.weizmann.ac.il/

PKMiner¶

PKMiner is a domain classifier predicting novel biosynthetic gene clusters of type II PKSs and aromatic polyketide chemotypes based on their annotated aromatase and cyclase domains.

References:

Kim, J., Yi, G. S., 2012, BMC Microbiol. 12:169

Link:

http://pks.kaist.ac.kr/pkminer

plantiSMASH¶

plantiSMASH is a version of antiSMASH dedicated to plant genomes.

References:

Kautsar, S. A., et al., 2017, Nucleic Acids Res. doi: 10.1093/nar/gkx305

Link:

http://plantismash.secondarymetabolites.org

PRISM / GNP¶

GNP (Genes to Natural Products) is an integrated platform to link gene cluster data (for PKS/NRPS clusters) to LC-MS/MS data. Within the genome mining modules of GNP, gene clusters can be detected and putative biosynthetic products predicted. In a second step, these predictions can be used to identify the corresponding peaks in LC-MS/MS data of the strains within the GNP / iSNAP Database module. The PRISM component provides a web-based genome mining tool for nonribosomal peptides and type I and II polyketides that also predicts the structures of their products. After its initial release, PRISM was extended with support for RiPP cluster detection and analysis. The later PRISM 3 release enables structure prediction for a greater range of secondary metabolites.

References:

Link:

GNP Genome: http://magarveylab.ca/gnp/#!/genome
PRISM: http://magarveylab.ca/prism

Source code:

PRISM: https://github.com/magarveylab/prism-releases

RiPPMiner¶

RiPPMiner predicts chemical structures of ribosomally synthesized and post-translationally modified peptides (RiPPs).

Reference:

Agrawal, P., et al., 2017, Nucleic Acids Res. doi: 10.1093/nar/gkx408

Link:

http://www.nii.ac.in/rippminer.html

RODEO¶

RODEO (Rapid ORF Description and Evaluation Online) detects biosynthetic gene clusters (BGCs) encoding ribosomally synthesized and post-translationally modified peptides (RiPPs). RODEO is available as a standalone application and is also integrated into antiSMASH.

References:

Link:

Main page: http://www.ripp.rodeo/
Webtool: http://rodeo.scs.illinois.edu/
antiSMASH

SANDPUMA¶

SANDPUMA (Specificity of AdenylatioN Domain Prediction Using Multiple Algorithms) predicts substrate specificities of NRPS adenylation domains. SANDPUMA is integrated into antiSMASH 4.

Reference:

Link:

https://bitbucket.org/chevrm/sandpuma

SBSPKS¶

SBSPKS (Structure-based sequence analysis of PKS and NRPS) allows various chemical analyses for experimentally characterized biosynthetic gene clusters (BGCs) encoding PKS/NRPS. Version 2 was recently released.

Reference:

Khater, S., et al., 2017, Nucleic Acids Res. doi: 10.1093/nar/gkx344

Link:

http://www.nii.ac.in/sbspks2.html

SeMPI¶

SeMPI (Secondary Metabolite Prediction and Identification) predicts structures of secondary metabolites biosynthesized by type I modular PKS. It uses antiSMASH and StreptomeDB 2.0 as backend engines. SeMPI can also be used as a dereplication tool.

Reference:

Zierep, P. F., et al., 2017, Nucleic Acids Res. doi: 10.1093/nar/gkx289

Link:

http://www.pharmaceutical-bioinformatics.de/sempi/

SMURF / Secondary Metabolite Unknown Region Finder¶

SMURF is a web-based search platform to mine secondary metabolite biosynthetic gene clusters in fungi. SMURF employs an HMM-based search strategy to identify conserved domains in PKS, NRPS, hybrid PKS/NRPS and terpenoid gene clusters.

Reference:

Khaldi, N., et al., 2010, Fungal Genet. Biol. 47:736-41

Link:

http://jcvi.org/smurf/index.php