Tools for mining secondary metabolite biosynthesis gene clusters
2metDB / SecmetDB is a standalone tool (a web server installed locally on the user's machine) for mining PKS and NRPS biosynthetic gene clusters in whole-genome protein FASTA files. The algorithms used are the same as those of the PKS/NRPS web server / Predictive Blast Server.
Reference (PKS/NRPS web server):
see antiSMASH page.
BAGEL is a mining tool and database for ribosomally synthesized and post-translationally modified peptides (RiPPs), for example lanthipeptides and bacteriocins. BAGEL can identify RiPP biosynthetic gene clusters in (meta)genomic data, and classify and analyze their putative products.
- de Jong, A., et al., 2006, Nucleic Acids Res. 34:W273-9
- de Jong, A., et al., 2010, Nucleic Acids Res. 38:W647-51
- van Heel, A. J., et al., 2013, Nucleic Acids Res. 41:W448-53
CASSIS and SMIPS
Toolkit consisting of the tools CASSIS (Cluster Assignment by Islands of Sites) and SMIPS (Secondary Metabolites by InterProScan). SMIPS uses domain annotation provided by InterProScan to predict anchor genes encoding core biosynthetic enzymes (PKS, NRPS, DMATS) in eukaryotic genomic sequences. The data obtained with SMIPS then serve as input for CASSIS, which uses an automated motif-based search for transcription factor binding sites to predict other genes associated with the anchor gene, i.e. the gene cluster. The toolkit is available as a web server and for download.
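The anchor-gene step described above can be pictured with a minimal Python sketch. It assumes that domain annotations have already been parsed into a mapping from gene ID to a list of domain labels. The labels, the rule table `ANCHOR_RULES` and the function `classify_anchor_genes` are hypothetical and chosen for illustration only; they are not the signatures or rules that SMIPS itself applies.

```python
# Illustrative rules only: the domain labels and required combinations below
# are assumptions for this sketch, not the rules used by SMIPS.
ANCHOR_RULES = {
    "PKS": {"ketosynthase", "acyltransferase"},
    "NRPS": {"adenylation", "condensation"},
    "DMATS": {"dmats"},
}


def classify_anchor_genes(domains_by_gene):
    """Return {gene_id: [anchor types]} for genes matching at least one rule.

    A gene matches a rule when all domains required by that rule are present
    in its annotation.
    """
    anchors = {}
    for gene_id, domains in domains_by_gene.items():
        present = set(domains)
        hits = sorted(
            name for name, required in ANCHOR_RULES.items() if required <= present
        )
        if hits:
            anchors[gene_id] = hits
    return anchors


# Hypothetical annotation for four genes.
example_annotation = {
    "gene_001": ["ketosynthase", "acyltransferase", "acyl_carrier"],
    "gene_002": ["adenylation", "condensation", "peptidyl_carrier"],
    "gene_003": ["kinase"],
    "gene_004": ["dmats"],
}

anchor_genes = classify_anchor_genes(example_annotation)
print(anchor_genes)
```

In this toy input, `gene_003` carries no rule-matching domain combination and is therefore not reported as an anchor gene. A real workflow would derive the labels from InterProScan output rather than from hand-written lists.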
CLUSEAN // CLuster SEquence ANalyzer
CLUSEAN is a BioPerl-based annotation pipeline for secondary metabolite biosynthetic gene clusters. It allows automated homology searches, identification of conserved protein domains in PKS and NRPS gene clusters, classification of enzymes, and specificity predictions for NRPS A-domains. The CLUSEAN results are annotated in EMBL files but can also be exported to MS Excel.
ClusterFinder uses a probabilistic approach to detect putative secondary metabolite gene clusters in genomic and metagenomic data. ClusterFinder is available as standalone software and is also integrated into antiSMASH and IMG-ABC.
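One way to picture a probabilistic approach of this kind is a two-state hidden Markov model that walks along the protein domains of a genome and assigns each position a posterior probability of lying inside a gene cluster. The Python sketch below shows that general idea with a plain forward-backward computation. The state names, the two domain classes and all probabilities are made-up toy values for illustration; they are not ClusterFinder's trained parameters or its code.

```python
def posterior_cluster_probability(observations, start, trans, emit):
    """Forward-backward for a discrete HMM.

    Returns, for each position, P(state == "bgc" | all observations).
    """
    if not observations:
        return []
    states = list(start)
    n = len(observations)

    # Forward pass, normalised at every step to avoid numerical underflow.
    forward = []
    prev = None
    for i, obs in enumerate(observations):
        cur = {}
        for s in states:
            prior = start[s] if i == 0 else sum(prev[r] * trans[r][s] for r in states)
            cur[s] = prior * emit[s][obs]
        total = sum(cur.values())
        cur = {s: v / total for s, v in cur.items()}
        forward.append(cur)
        prev = cur

    # Backward pass, also normalised per step.
    backward = [None] * n
    nxt = {s: 1.0 for s in states}
    backward[n - 1] = nxt
    for i in range(n - 2, -1, -1):
        cur = {
            s: sum(trans[s][r] * emit[r][observations[i + 1]] * nxt[r] for r in states)
            for s in states
        }
        total = sum(cur.values())
        cur = {s: v / total for s, v in cur.items()}
        backward[i] = cur
        nxt = cur

    # Combine both passes and renormalise to get per-position posteriors.
    posteriors = []
    for f, b in zip(forward, backward):
        joint = {s: f[s] * b[s] for s in states}
        posteriors.append(joint["bgc"] / sum(joint.values()))
    return posteriors


# Toy parameters (assumptions for illustration only).
toy_start = {"bgc": 0.1, "background": 0.9}
toy_trans = {
    "bgc": {"bgc": 0.8, "background": 0.2},
    "background": {"bgc": 0.1, "background": 0.9},
}
toy_emit = {
    "bgc": {"biosynthetic": 0.7, "other": 0.3},
    "background": {"biosynthetic": 0.1, "other": 0.9},
}
toy_domains = ["other", "other", "biosynthetic", "biosynthetic",
               "biosynthetic", "other", "other"]

toy_posteriors = posterior_cluster_probability(toy_domains, toy_start, toy_trans, toy_emit)
for domain, p in zip(toy_domains, toy_posteriors):
    print(f"{domain:13s} {p:.3f}")
```

Positions whose posterior exceeds a chosen cut-off could then be merged into candidate cluster regions; the cut-off itself would be a further assumption to tune.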
Download source code:
see ClustScan Professional entry in the PKS/NRPS tools section.
eSNaPD // environmental Surveyor of Natural Product Diversity
eSNaPD is a tool to survey secondary metabolite gene cluster diversity in metagenomic DNA sequences. It also takes sample metadata into account, e.g. the geographic sampling location.
- Reddy, B. V., et al., 2012, Appl. Environ. Microbiol. 78:3744-52
- Owen, J. G., et al., 2013, Proc. Natl. Acad. Sci. U. S. A. 110:11797-802
- Charlop-Powers, Z., et al., 2014, Proc. Natl. Acad. Sci. U. S. A. 111:3757-62
EvoMining uses phylogenomics to identify secondary metabolite biosynthetic gene clusters (BGCs) that encode duplicates of primary metabolism enzymes with a divergent phylogeny. The approach is based on the observation that such primary metabolic isoenzymes are often included in secondary metabolite BGCs.
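The "duplicates of primary metabolism enzymes" part of this idea can be illustrated with a small Python sketch that flags genomes carrying unusually many copies of an enzyme family. It is a simplified proxy under stated assumptions: the input table, the function `find_expanded_families` and the threshold (mean plus one population standard deviation) are hypothetical choices for this example, and the phylogenetic analysis of the extra copies is not shown.

```python
from statistics import mean, pstdev


def find_expanded_families(copy_counts):
    """Flag genomes with unusually many copies of an enzyme family.

    copy_counts maps family name -> {genome name: number of copies}.
    Returns {family: [genomes whose copy number exceeds mean + 1 SD]}.
    The threshold is an assumption made for this sketch.
    """
    expanded = {}
    for family, per_genome in copy_counts.items():
        counts = list(per_genome.values())
        threshold = mean(counts) + pstdev(counts)
        hits = sorted(g for g, n in per_genome.items() if n > threshold)
        if hits:
            expanded[family] = hits
    return expanded


# Hypothetical copy numbers for two enzyme families in four genomes.
toy_counts = {
    "enolase": {"genome_A": 1, "genome_B": 1, "genome_C": 1, "genome_D": 3},
    "gapdh": {"genome_A": 1, "genome_B": 1, "genome_C": 1, "genome_D": 1},
}

expanded_families = find_expanded_families(toy_counts)
print(expanded_families)
```

In a phylogenomic workflow, the extra copies flagged in this way would be the candidates to place on a tree of the enzyme family, to check whether they diverge from the copies involved in primary metabolism.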
Prediction of fungal gene clusters based on genome and transcriptome data. An R-based web server and an offline version are available.
- Andersen, M. R., et al., 2013, Proc. Natl. Acad. Sci. U. S. A. 110:E99-107
- Vesth, T. C., et al., 2016, Synth. Syst. Biotechnol., in press
MIDDAS-M (a motif-independent de novo detection algorithm for secondary metabolite biosynthesis (SMB) gene clusters) is a gene cluster mining tool that uses genome and transcriptome data to identify gene clusters in fungal genomes.
MIPS-CG (motif-independent prediction without core genes) attempts to identify completely novel secondary metabolite biosynthetic gene clusters using only genome data. It uses neither known sequences (or motifs) of core genes nor transcriptome data.
Link (Note: currently offline):
NaPDoS // Natural Products Domain Seeker
see NaPDoS entry in the PKS/NRPS tools section.
PRISM / GNP
GNP (Genes to Natural Products) is an integrated platform to link gene cluster data (for PKS/NRPS clusters) to LC-MS/MS data. Within the genome mining modules of GNP, gene clusters can be detected and their putative biosynthetic products predicted. In a second step, these predictions can be used to identify the corresponding peaks in LC-MS/MS data of the strains within the GNP / iSNAP Database module. The PRISM component provides a web-based genome mining tool for nonribosomal peptides and type I and II polyketides, including prediction of the chemical structures of their products. Recently, PRISM was extended with support for RiPP cluster detection and analysis.
- Skinnider, M. A., et al., 2015, Nucleic Acids Res. 43:9645-62
- Johnston, C. W., et al., 2015, Nat. Commun. 6:8421
- Skinnider, M. A., et al., 2016, Proc. Natl. Acad. Sci. U. S. A. 113:E6343-51
SMURF / Secondary Metabolite Unknown Region Finder
SMURF is a web-based search platform to mine secondary metabolite biosynthetic gene clusters in fungi. SMURF employs an HMM-based search strategy to identify conserved domains in PKS, NRPS, hybrid PKS/NRPS and terpenoid gene clusters.