Tools for mining secondary metabolite biosynthesis gene clusters
2metDB / SecmetDB is a standalone tool (a web server installed locally on the user's machine) for mining PKS and NRPS biosynthetic gene clusters in whole-genome protein FASTA files. It uses the same algorithms as the PKS/NRPS web server / Predictive Blast Server.
Reference (PKS/NRPS web server):
see antiSMASH page.
ARTS (Antibiotic Resistant Target Seeker) uses antiSMASH to predict biosynthetic gene clusters (BGCs), prioritizes detected BGCs and identifies drug targets.
BAGEL is a mining tool and database for ribosomally synthesized and post-translationally modified peptides (RiPPs) such as lanthipeptides or bacteriocins. BAGEL can identify RiPP biosynthetic gene clusters in (meta)genomic data, and classify and analyze the putative products.
- de Jong, A., et al., 2006, Nucleic Acids Res. 34:W273-9
- de Jong, A., et al., 2010, Nucleic Acids Res. 38:W647-51
- van Heel, A. J., et al., 2013, Nucleic Acids Res. 41:W448-53
CASSIS and SMIPS
Toolkit consisting of the tools CASSIS (Cluster Assignment by Islands of Sites) and SMIPS (Secondary Metabolites by InterProScan). SMIPS uses domain annotation provided by InterProScan to predict anchor genes encoding core biosynthetic enzymes (PKS, NRPS, DMATS) in eukaryotic genomic sequences. The data obtained with SMIPS then serve as input for CASSIS, which uses an automated motif-based search for transcription factor binding sites to predict other genes associated with the "anchor" gene, i.e. gene clusters. The tool is available as a web server and for download.
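The two-step idea described above (first flag anchor genes from domain annotations, then grow a cluster around an anchor) can be illustrated with a minimal sketch. This is not the SMIPS or CASSIS implementation: the domain labels, the `ANCHOR_RULES` table, and the function names `find_anchor_genes` and `extend_cluster` are hypothetical, and the extension step is simplified to "neighbouring genes whose promoter carries a shared motif", which is assumed to be supplied as a precomputed set.

```python
from typing import Dict, List, Set

# Hypothetical rules: enzyme class -> domains that must all be present.
# Real SMIPS rules are based on InterProScan annotations and differ from these.
ANCHOR_RULES: Dict[str, Set[str]] = {
    "PKS": {"KS", "AT"},
    "NRPS": {"A", "C", "PCP"},
    "DMATS": {"DMATS"},
}


def find_anchor_genes(annotations: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return genes whose domain set satisfies at least one anchor rule."""
    anchors: Dict[str, List[str]] = {}
    for gene, domains in annotations.items():
        classes = [name for name, required in ANCHOR_RULES.items()
                   if required <= domains]
        if classes:
            anchors[gene] = classes
    return anchors


def extend_cluster(gene_order: List[str], anchor: str,
                   motif_hits: Set[str]) -> List[str]:
    """Grow a contiguous cluster around the anchor gene.

    A neighbour is added while its promoter carries the shared motif
    (membership in motif_hits); extension stops at the first gene without it.
    """
    idx = gene_order.index(anchor)
    left = idx
    while left > 0 and gene_order[left - 1] in motif_hits:
        left -= 1
    right = idx
    while right < len(gene_order) - 1 and gene_order[right + 1] in motif_hits:
        right += 1
    return gene_order[left:right + 1]


# Toy data: gene identifiers and domain labels are made up for illustration.
annotations = {
    "g1": {"MFS"},
    "g2": {"P450"},
    "g3": {"KS", "AT", "ACP"},
    "g4": {"MT"},
    "g5": set(),
}
anchors = find_anchor_genes(annotations)
cluster = extend_cluster(["g1", "g2", "g3", "g4", "g5"], "g3", {"g2", "g4"})
print(anchors)
print(cluster)
```

In this toy example only `g3` qualifies as an anchor, and the predicted cluster spans `g2` to `g4` because extension stops at the first neighbour without the motif on each side.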
CLUSEAN // CLuster SEquence ANalyzer
CLUSEAN is a BioPerl-based annotation pipeline for secondary metabolite biosynthetic gene clusters. It allows automated homology searches, identification of conserved protein domains in PKS and NRPS gene clusters, classification of enzymes, and specificity predictions for NRPS A-domains. The CLUSEAN results are annotated in EMBL files and can also be exported to MS Excel.
ClusterFinder uses a probabilistic approach to detect putative secondary metabolite gene clusters in genomic and metagenomic data. ClusterFinder is available as standalone software and is also integrated into antiSMASH and IMG-ABC.
Download source code:
see ClustScan Professional entry in the PKS/NRPS tools section.
eSNaPD // environmental Surveyor of Natural Product Diversity
eSNaPD is a tool to survey secondary metabolite gene cluster diversity in metagenomic DNA sequences, also taking into account sample metadata, e.g. the geographic sampling location.
- Reddy, B. V., et al., 2012, Appl. Environ. Microbiol. 78:3744-52
- Owen, J. G., et al., 2013, Proc. Natl. Acad. Sci. U. S. A. 110:11797-802
- Charlop-Powers, Z., et al., 2014, Proc. Natl. Acad. Sci. U. S. A. 111:3757-62
EvoMining uses phylogenomics to identify secondary metabolite biosynthetic gene clusters (BGCs) that encode duplicates of primary metabolism enzymes but display a divergent phylogeny. This is based on the observation that such primary metabolic isoenzymes are often included in secondary metabolite BGCs.
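The "duplicates of primary metabolism enzymes" part of this idea can be illustrated with a small sketch. It is not the EvoMining algorithm: it only flags genomes that carry more copies of an enzyme family than is typical across the compared genomes, and it leaves out the phylogenetic analysis entirely. The function name `expanded_families`, the use of the median as the "typical" copy number, and the toy data are all assumptions made for illustration.

```python
from statistics import median
from typing import Dict, List, Tuple


def expanded_families(copy_counts: Dict[str, Dict[str, int]]) -> List[Tuple[str, str]]:
    """Flag (genome, family) pairs with more copies than the median genome.

    copy_counts maps genome -> {enzyme family -> number of homologs}.
    Flagged pairs are only candidates; whether the extra copy is
    phylogenetically divergent has to be checked separately.
    """
    families = sorted({fam for counts in copy_counts.values() for fam in counts})
    flagged: List[Tuple[str, str]] = []
    for fam in families:
        typical = median(counts.get(fam, 0) for counts in copy_counts.values())
        for genome in sorted(copy_counts):
            if copy_counts[genome].get(fam, 0) > typical:
                flagged.append((genome, fam))
    return flagged


# Toy data: genome and enzyme family names are placeholders.
counts = {
    "genomeA": {"enolase": 1, "aroB": 1},
    "genomeB": {"enolase": 2, "aroB": 1},
    "genomeC": {"enolase": 1, "aroB": 1},
}
print(expanded_families(counts))
```

Here only `genomeB` is flagged for the enolase family, since it carries two copies where the other genomes carry one.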
Prediction of fungal gene clusters based on genome and transcriptome data. An R-based web server and an offline version are available.
- Andersen, M. R., et al., 2013, Proc. Natl. Acad. Sci. U. S. A. 110:E99-107
- Vesth, T. C., et al., 2016, Synth. Syst. Biotechnol., in press
MIDDAS-M (a motif-independent de novo detection algorithm for SMB gene clusters) is a gene cluster mining tool that uses genome and transcriptome data to identify gene clusters in fungal genomes.
MIPS-CG (motif-independent prediction without core genes) attempts to identify completely novel secondary metabolite biosynthetic gene clusters using only genome data. It uses neither known sequences (or motifs) of core genes nor transcriptome data.
Link (Note: currently offline):
NaPDoS // Natural Products Domain Seeker
see NaPDoS entry in the PKS/NRPS tools section.
PhytoClust is dedicated to detection of biosynthetic gene clusters for secondary metabolites in plant genomes.
PKMiner is a domain classifier predicting novel biosynthetic gene clusters of type II PKSs and aromatic polyketide chemotypes based on their annotated aromatase and cyclase domains.
plantiSMASH is a version of antiSMASH dedicated to plant genomes.
PRISM / GNP
GNP (Genes to Natural Products) is an integrated platform that links gene cluster data (for PKS/NRPS clusters) to LC-MS/MS data. Within the genome mining modules of GNP, gene clusters can be detected and putative biosynthetic products predicted. In a second step, these predictions can be used to identify corresponding peaks in LC-MS/MS data of the strains within the GNP / iSNAP Database module. The PRISM component provides a web-based genome mining tool for nonribosomal peptides and type I and II polyketides, including structure prediction. After its initial release, PRISM was extended with support for RiPP cluster detection and analysis, and PRISM 3 now enables structure prediction for a greater range of secondary metabolites.
- Skinnider, M. A., et al., 2015, Nucleic Acids Res. 43:9645-62
- Johnston, C. W., et al., 2015, Nat. Commun. 6:8421
- Skinnider, M. A., et al., 2016, Proc. Natl. Acad. Sci. U. S. A. 113:E6343-51
- Skinnider, M. A., et al., 2017, Nucleic Acids Res. doi: 10.1093/nar/gkx320
RiPPMiner predicts chemical structures of ribosomally synthesized and post-translationally modified peptides (RiPPs).
RODEO (Rapid ORF Description and Evaluation Online) detects biosynthetic gene clusters (BGCs) encoding ribosomally synthesized and post-translationally modified peptides (RiPPs). RODEO is available as a standalone application and is also integrated into antiSMASH 4.
- Tietz, J. I., et al., 2017, Nat. Chem. Biol. 13:470-478
- Blin, K., et al., 2017, Nucleic Acids Res. doi: 10.1093/nar/gkx319
- Main page: http://www.ripprodeo.org/
- Webtool: http://rodeo.scs.illinois.edu/
SANDPUMA (Specificity of AdenylatioN Domain Prediction Using Multiple Algorithms) predicts substrate specificities of NRPS adenylation domains. SANDPUMA is integrated into antiSMASH 4.
- Chevrette, M. G., et al., 2017, Bioinformatics 33:3202-10
- Blin, K., et al., 2017, Nucleic Acids Res. doi: 10.1093/nar/gkx319
SBSPKS (Structure based sequence analysis of PKS and NRPS) allows various chemical analyses for experimentally characterized biosynthetic gene clusters (BGCs) encoding PKS/NRPS. Version 2 was recently released.
SeMPI (Secondary Metabolite Prediction and Identification) predicts structures of secondary metabolites biosynthesized by type I modular PKS. It uses antiSMASH and StreptomeDB 2.0 as backend engines. SeMPI can also be considered a dereplication tool.
SMURF / Secondary Metabolite Unknown Region Finder
SMURF is a web-based search platform to mine secondary metabolite biosynthetic gene clusters in fungi. SMURF employs an HMM-based search strategy to identify conserved domains in PKS, NRPS, hybrid PKS/NRPS and terpenoid gene clusters.