GenomeDepot: microbial genomic data management system
About | Installation | User guide | Administration guide | Developer guide |
GenomeDepot, a data management system for comparative genomics, is an open-source, web-based platform for annotation, management and comparative analysis of microbial genomic sequences and associated data, including ortholog families, protein domains, operons, regulatory interactions, metagenomic samples, strains, taxonomy and metadata.
GenomeDepot is a tool for creating web portals for microbial genome collections, each containing hundreds to thousands of genomes. The web portals are built on the Django framework and backed by a MySQL database that aggregates gene annotations generated by various bioinformatic tools. The genome annotation tools are installed in separate Conda environments and run by the GenomeDepot annotation pipeline. GenomeDepot employs Django Q, a multiprocessing task queue, for scheduling and executing pipeline jobs. In addition to the pipeline-generated data, administrators of a GenomeDepot-based portal can import gene annotations from text files or enter them manually in the site administration panel.
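The pattern described above (each tool in its own Conda environment, with jobs handed to Django Q) can be illustrated with a minimal sketch. This is not GenomeDepot's actual pipeline code: the environment name, file names and the helper functions `build_conda_command`, `run_tool` and `enqueue_tool` are hypothetical and exist only for illustration. The sketch relies on `conda run -n <env>` to execute a command inside a named environment and on `async_task` from `django_q.tasks` to queue the work.

```python
import subprocess


def build_conda_command(env_name, tool_args):
    """Return an argv list that runs tool_args inside the named Conda environment."""
    return ["conda", "run", "-n", env_name, *tool_args]


def run_tool(env_name, tool_args):
    """Run one annotation tool and return its exit code (needs conda on PATH)."""
    completed = subprocess.run(build_conda_command(env_name, tool_args), check=False)
    return completed.returncode


def enqueue_tool(env_name, tool_args):
    """Queue run_tool as a Django Q task (needs a configured Django project)."""
    # Imported lazily so the helpers above can be used without Django installed.
    from django_q.tasks import async_task

    return async_task(run_tool, env_name, tool_args)


# Hypothetical environment name and input files, for illustration only.
example_command = build_conda_command(
    "hmmer-env",
    ["hmmscan", "--tblout", "out.tbl", "Pfam-A.hmm", "proteins.faa"],
)
```

Keeping each tool in its own environment avoids dependency conflicts between tools, and building the command as a list rather than a shell string sidesteps quoting problems with file paths.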
Demo GenomeDepot-based genome collection portal
Background image: B. burgdorferi bacteria. Photo by Janice Haney Carr, Claudia Molins, USCDCP on Pixnio
Main page image: Streptococcus pneumoniae bacterial colonies that were grown on primary isolation medium. Photo by Dr. Richard Facklam, USCDCP on Pixnio
Earth spinning rotating animation. Amirabbaszakavi, CC BY-SA 4.0, via Wikimedia Commons
GenomeDepot employs a wide variety of genome data transformation and analysis tools:
eggNOG-mapper https://github.com/eggnogdb/eggnog-mapper
License: GNU GPL v3.0
References: [1] eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Carlos P. Cantalapiedra, Ana Hernandez-Plaza, Ivica Letunic, Peer Bork, Jaime Huerta-Cepas. 2021. Molecular Biology and Evolution, msab293, https://doi.org/10.1093/molbev/msab293
[2] eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Jaime Huerta-Cepas, Damian Szklarczyk, Davide Heller, Ana Hernández-Plaza, Sofia K Forslund, Helen Cook, Daniel R Mende, Ivica Letunic, Thomas Rattei, Lars J Jensen, Christian von Mering, Peer Bork. Nucleic Acids Res. 2019 Jan 8; 47(Database issue): D309–D314. doi: 10.1093/nar/gky1085
AMRFinderPlus https://github.com/ncbi/amr
License: Public Domain
Reference: Feldgarden M, Brover V, Gonzalez-Escalona N, Frye JG, Haendiges J, Haft DH, Hoffmann M, Pettengill JB, Prasad AB, Tillman GE, Tyson GH, Klimke W. AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence. Sci Rep. 2021 Jun 16;11(1):12728. doi: 10.1038/s41598-021-91456-0. PMID: 34135355; PMCID: PMC8208984. https://pubmed.gov/34135355
antiSMASH https://github.com/antismash/antismash
License: GNU GPL v3.0
Reference: antiSMASH 6.0: improving cluster detection and comparison capabilities. Kai Blin, Simon Shaw, Alexander M Kloosterman, Zach Charlop-Powers, Gilles P van Weezel, Marnix H Medema, Tilmann Weber. Nucleic Acids Research (2021). doi: 10.1093/nar/gkab335
PhiSpy https://github.com/linsalrob/PhiSpy
License: MIT License
Reference: Sajia Akhter, Ramy K. Aziz, Robert A. Edwards; PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucl Acids Res 2012; 40 (16): e126. doi: 10.1093/nar/gks406
eCIS-screen https://github.com/ipb-jianyang/eCIS-screen
License: GPL-3.0 license
Reference: Chen L, Song N, Liu B, Zhang N, Alikhan NF, Zhou Z, Zhou Y, Zhou S, Zheng D, Chen M, Hapeshi A, Healey J, Waterfield NR, Yang J, Yang G. Genome-wide Identification and Characterization of a Superfamily of Bacterial Extracellular Contractile Injection Systems. Cell Rep. 2019 Oct 8;29(2):511-521.e2. doi: 10.1016/j.celrep.2019.08.096. PMID: 31597107; PMCID: PMC6899500.
Fama https://github.com/aekazakov/Fama
License: LBNL BSD 3-clause
Reference: Kazakov A, Novichkov P. Fama: a computational tool for comparative analysis of shotgun metagenomic data. Great Lakes Bioinformatics conference (poster presentation). 2019. https://iseq.lbl.gov/mydocs/fama_glbio2019_poster.pdf
GapMind https://github.com/morgannprice/PaperBLAST
License: GPL-3.0 license
Reference: Price MN, Deutschbauer AM, Arkin AP. GapMind: Automated Annotation of Amino Acid Biosynthesis. mSystems. 2020 Jun 23;5(3):e00291-20. doi: 10.1128/mSystems.00291-20. PMID: 32576650; PMCID: PMC7311316.
DefenseFinder https://github.com/mdmparis/defense-finder
License: GPL-3.0 license
Reference: “Systematic and quantitative view of the antiviral arsenal of prokaryotes” Nature Communications, 2022, Tesson F., Hervé A., Mordret E., Touchon M., d’Humières C., Cury J., Bernheim A.
MacSyFinder https://github.com/gem-pasteur/macsyfinder
License: GPL-3.0 license
Reference: “MacSyFinder: A Program to Mine Genomes for Molecular Systems with an Application to CRISPR-Cas Systems.” PLoS ONE 2014. Abby S., Néron B., Ménager H., Touchon M., Rocha EPC.
geNomad https://github.com/apcamargo/genomad
License: ACADEMIC, INTERNAL, RESEARCH & DEVELOPMENT, NON-COMMERCIAL USE ONLY, LICENSE
Reference: Camargo, A.P., Roux, S., Schulz, F. et al. Identification of mobile genetic elements with geNomad. Nat Biotechnol 42, 1303–1312 (2024). https://doi.org/10.1038/s41587-023-01953-y
HMMER https://github.com/EddyRivasLab/hmmer
License: BSD 3-clause
Reference: S. R. Eddy. Accelerated profile HMM searches. PLOS Comp. Biol., 7:e1002195, 2011
Muscle https://github.com/rcedgar/muscle
License: GPL-3.0 license
Reference: Edgar RC. Muscle5: High-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny. Nature Communications 13.1 (2022): 6968. https://www.nature.com/articles/s41467-022-34630-w.pdf
NCBI BLAST+ https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST
License: Public Domain
Reference: Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009 Dec 15;10:421. doi: 10.1186/1471-2105-10-421. PMID: 20003500; PMCID: PMC2803857.
Django https://github.com/django/django
License: BSD 3-Clause
Reference: Django Software Foundation, 2019. Django, Available at: https://djangoproject.com.
parasail https://github.com/jeffdaily/parasail
License: Battelle BSD-style
Reference: Daily, Jeff. (2016). Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments. BMC Bioinformatics, 17(1), 1-11. doi:10.1186/s12859-016-0930-z
Biopython https://doi.org/10.1093/bioinformatics/btp163
License: BSD 3-Clause
Reference: Cock, P.J.A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009 Jun 1; 25(11) 1422-3 pmid:19304878
JBrowse v1 https://jbrowse.org/jbrowse1.html
License: GNU LGPL
Reference: Buels R et al. JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biology (2016).
samtools https://github.com/samtools/samtools
License: The MIT/Expat License
Reference: Twelve years of SAMtools and BCFtools. Petr Danecek, James K Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A McCarthy, Robert M Davies, Heng Li. GigaScience, Volume 10, Issue 2, February 2021, giab008, https://doi.org/10.1093/gigascience/giab008
tabix https://doi.org/10.1093/gigascience/giab007
License: The MIT/Expat License
Reference: HTSlib: C library for reading/writing high-throughput sequencing data. James K Bonfield, John Marshall, Petr Danecek, Heng Li, Valeriu Ohan, Andrew Whitwham, Thomas Keane, Robert M Davies. GigaScience, Volume 10, Issue 2, February 2021, giab007.
POEM https://github.com/Rinoahu/POEM_py3k
License: GPL-3.0 license
Reference: Identifying Core Operons in Metagenomic Data. Xiao Hu, Iddo Friedberg. bioRxiv 2019.12.20.885269; doi: https://doi.org/10.1101/2019.12.20.885269
Xiao R. (2019). POEM py3k: GitHub repository. Available online at: https://github.com/Rinoahu/POEM_py3k (accessed December 16, 2020).
Scribl https://github.com/chmille4/Scribl
License: MIT License
Reference: Miller CA, Anthony J, Meyer MM, Marth G. Scribl: an HTML5 Canvas-based graphics library for visualizing genomic data over the web. Bioinformatics. 2013 Feb 1;29(3):381-3. doi: 10.1093/bioinformatics/bts677. Epub 2012 Nov 19. PMID: 23172864; PMCID: PMC3562066.
Continue to installation and configuration…