GenomeDepot documentation

GenomeDepot: microbial genomic data management system

About Installation User guide Administration guide Developer guide

Introduction

GenomeDepot, a data management system for comparative genomics, is an open-source, web-based platform for annotation, management, and comparative analysis of microbial genomic sequences and associated data, including ortholog families, protein domains, operons, regulatory interactions, metagenomic samples, strains, taxonomy, and metadata.

GenomeDepot is designed for building web portals for microbial genome collections, each containing hundreds to thousands of genomes. The web portals are built on the Django framework and backed by a MySQL database that aggregates gene annotations generated by various bioinformatic tools. The genome annotation tools are installed in separate Conda environments and run by the GenomeDepot annotation pipeline. GenomeDepot employs Django Q, a multiprocessing task queue, for scheduling and executing pipeline jobs. In addition to the pipeline-generated data, administrators of a GenomeDepot-based portal can import gene annotations from text files or enter them manually in the site administration panel.
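As an illustration of how a pipeline might invoke a tool that lives in its own Conda environment, here is a minimal Python sketch. It is not GenomeDepot's actual code: the function names and the environment name `genomedepot-hmmer` are hypothetical, and the only external behaviour assumed is the standard `conda run -n <env> <command>` invocation.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_conda_command(env_name: str, tool_args: Sequence[str]) -> List[str]:
    """Return an argument list that runs a tool inside a named Conda environment."""
    if not env_name:
        raise ValueError("env_name must not be empty")
    if not tool_args:
        raise ValueError("tool_args must contain at least the executable name")
    # `conda run -n <env> <command...>` executes the command in that environment.
    return ["conda", "run", "-n", env_name, *tool_args]


def run_tool(env_name: str, tool_args: Sequence[str]) -> int:
    """Run the tool and return its exit code (requires conda on PATH)."""
    command = build_conda_command(env_name, tool_args)
    print("Running:", shlex.join(command))
    completed = subprocess.run(command, check=False)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical environment name; the tool arguments are only an example.
    cmd = build_conda_command("genomedepot-hmmer", ["hmmsearch", "-h"])
    print(shlex.join(cmd))
```

Keeping each tool in a separate environment avoids dependency conflicts between annotation tools; a task queue worker could call a function like `run_tool` for each pipeline step.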

Demo GenomeDepot-based genome collection portal

Image Credits

Background image: B. burgdorferi bacteria. Photo by Janice Haney Carr, Claudia Molins, USCDCP on Pixnio

Main page image: Streptococcus pneumoniae bacterial colonies that were grown on primary isolation medium. Photo by Dr. Richard Facklam, USCDCP on Pixnio

Earth spinning rotating animation. Amirabbaszakavi, CC BY-SA 4.0, via Wikimedia Commons

Software credits

GenomeDepot employs a wide variety of genome data transformation and analysis tools:

eggNOG-mapper https://github.com/eggnogdb/eggnog-mapper

License: GNU GPL v3.0

References: [1] eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Carlos P. Cantalapiedra, Ana Hernandez-Plaza, Ivica Letunic, Peer Bork, Jaime Huerta-Cepas. 2021. Molecular Biology and Evolution, msab293, https://doi.org/10.1093/molbev/msab293

[2] eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Jaime Huerta-Cepas, Damian Szklarczyk, Davide Heller, Ana Hernández-Plaza, Sofia K Forslund, Helen Cook, Daniel R Mende, Ivica Letunic, Thomas Rattei, Lars J Jensen, Christian von Mering, Peer Bork. Nucleic Acids Res. 2019 Jan 8; 47(Database issue): D309–D314. doi: 10.1093/nar/gky1085

AMRFinderPlus https://github.com/ncbi/amr

License: Public Domain

Reference: Feldgarden M, Brover V, Gonzalez-Escalona N, Frye JG, Haendiges J, Haft DH, Hoffmann M, Pettengill JB, Prasad AB, Tillman GE, Tyson GH, Klimke W. AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence. Sci Rep. 2021 Jun 16;11(1):12728. doi: 10.1038/s41598-021-91456-0. PMID: 34135355; PMCID: PMC8208984. https://pubmed.gov/34135355

antiSMASH https://github.com/antismash/antismash

License: GNU GPL v3.0

Reference: antiSMASH 6.0: improving cluster detection and comparison capabilities. Kai Blin, Simon Shaw, Alexander M Kloosterman, Zach Charlop-Powers, Gilles P van Weezel, Marnix H Medema, Tilmann Weber. Nucleic Acids Research (2021). doi: 10.1093/nar/gkab335

PhiSpy https://github.com/linsalrob/PhiSpy

License: MIT License

Reference: Sajia Akhter, Ramy K. Aziz, Robert A. Edwards; PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucl Acids Res 2012; 40 (16): e126. doi: 10.1093/nar/gks406

eCIS-screen https://github.com/ipb-jianyang/eCIS-screen

License: GPL-3.0 license

Reference: Chen L, Song N, Liu B, Zhang N, Alikhan NF, Zhou Z, Zhou Y, Zhou S, Zheng D, Chen M, Hapeshi A, Healey J, Waterfield NR, Yang J, Yang G. Genome-wide Identification and Characterization of a Superfamily of Bacterial Extracellular Contractile Injection Systems. Cell Rep. 2019 Oct 8;29(2):511-521.e2. doi: 10.1016/j.celrep.2019.08.096. PMID: 31597107; PMCID: PMC6899500.

Fama https://github.com/aekazakov/Fama

License: LBNL BSD 3-clause

Reference: Kazakov A, Novichkov P. Fama: a computational tool for comparative analysis of shotgun metagenomic data. Great Lakes Bioinformatics conference (poster presentation). 2019. https://iseq.lbl.gov/mydocs/fama_glbio2019_poster.pdf

GapMind https://github.com/morgannprice/PaperBLAST

License: GPL-3.0 license

Reference: Price MN, Deutschbauer AM, Arkin AP. GapMind: Automated Annotation of Amino Acid Biosynthesis. mSystems. 2020 Jun 23;5(3):e00291-20. doi: 10.1128/mSystems.00291-20. PMID: 32576650; PMCID: PMC7311316.

DefenseFinder https://github.com/mdmparis/defense-finder

License: GPL-3.0 license

Reference: “Systematic and quantitative view of the antiviral arsenal of prokaryotes.” Nature Communications, 2022. Tesson F., Hervé A., Mordret E., Touchon M., d’Humières C., Cury J., Bernheim A.

MacSyFinder https://github.com/gem-pasteur/macsyfinder

License: GPL-3.0 license

Reference: “MacSyFinder: A Program to Mine Genomes for Molecular Systems with an Application to CRISPR-Cas Systems.” PLoS ONE, 2014. Abby S., Néron B., Ménager H., Touchon M., Rocha EPC.

geNomad https://github.com/apcamargo/genomad

License: ACADEMIC, INTERNAL, RESEARCH & DEVELOPMENT, NON-COMMERCIAL USE ONLY, LICENSE

Reference: Camargo, A.P., Roux, S., Schulz, F. et al. Identification of mobile genetic elements with geNomad. Nat Biotechnol 42, 1303–1312 (2024). https://doi.org/10.1038/s41587-023-01953-y

HMMER https://github.com/EddyRivasLab/hmmer

License: BSD 3-clause

Reference: S. R. Eddy. Accelerated profile HMM searches. PLOS Comp. Biol., 7:e1002195, 2011

Muscle https://github.com/rcedgar/muscle

License: GPL-3.0 license

Reference: Edgar RC. Muscle5: High-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny. Nature Communications 13.1 (2022): 6968. https://www.nature.com/articles/s41467-022-34630-w.pdf

NCBI BLAST+ https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST

License: Public Domain

Reference: Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009 Dec 15;10:421. doi: 10.1186/1471-2105-10-421. PMID: 20003500; PMCID: PMC2803857.

Django https://github.com/django/django

License: BSD 3-Clause

Reference: Django Software Foundation, 2019. Django, Available at: https://djangoproject.com.

parasail https://github.com/jeffdaily/parasail

License: Battelle BSD-style

Reference: Daily, Jeff. (2016). Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments. BMC Bioinformatics, 17(1), 1-11. doi:10.1186/s12859-016-0930-z

Biopython https://doi.org/10.1093/bioinformatics/btp163

License: BSD 3-Clause

Reference: Cock, P.J.A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009 Jun 1; 25(11) 1422-3 pmid:19304878

JBrowse v1 https://jbrowse.org/jbrowse1.html

License: GNU LGPL

Reference: Buels R et al. JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biology (2016).

samtools https://github.com/samtools/samtools

License: The MIT/Expat License

Reference: Twelve years of SAMtools and BCFtools. Petr Danecek, James K Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A McCarthy, Robert M Davies, Heng Li. GigaScience, Volume 10, Issue 2, February 2021, giab008, https://doi.org/10.1093/gigascience/giab008

tabix https://doi.org/10.1093/gigascience/giab007

License: The MIT/Expat License

Reference: HTSlib: C library for reading/writing high-throughput sequencing data. James K Bonfield, John Marshall, Petr Danecek, Heng Li, Valeriu Ohan, Andrew Whitwham, Thomas Keane, Robert M Davies. GigaScience, Volume 10, Issue 2, February 2021, giab007, https://doi.org/10.1093/gigascience/giab007

POEM https://github.com/Rinoahu/POEM_py3k

License: GPL-3.0 license

Reference: Identifying Core Operons in Metagenomic Data. Xiao Hu, Iddo Friedberg. bioRxiv 2019.12.20.885269; doi: https://doi.org/10.1101/2019.12.20.885269

Xiao R. (2019). POEM py3k: GitHub repository. Available online at: https://github.com/Rinoahu/POEM_py3k (accessed December 16, 2020).

Scribl https://github.com/chmille4/Scribl

License: MIT License

Reference: Miller CA, Anthony J, Meyer MM, Marth G. Scribl: an HTML5 Canvas-based graphics library for visualizing genomic data over the web. Bioinformatics. 2013 Feb 1;29(3):381-3. doi: 10.1093/bioinformatics/bts677. Epub 2012 Nov 19. PMID: 23172864; PMCID: PMC3562066.

Continue to installation and configuration…
