Biocluster Mirrors: Difference between revisions

Latest revision as of 04:00, 18 March 2026

Application	Installed Versions	Description
alphafold-db	20210917 20220118 20230405 20240213	Alphafold Databases
alphafold3-db	20250520	Alphafold3 Databases
bart2-db	20240302	Databases for Bart2
BUSCO-db	4 5	Based on evolutionarily-informed expectations of gene content of near-universal single-copy orthologs, BUSCO metric is complementary to technical metrics like N50.
card-prevalence	3.0.6 4.0.1	November 2019 release - 85 pathogens, 116914 resistomes, and 182532 AMR allele sequences based on sequence data acquired from NCBI on July 31, 2019, analyzed using RGI 5.0.0 (DIAMOND homolog detection) and CARD 3.0.7. Includes pre-compiled k-mer classifier data for pathogen-of-origin prediction.
checkm-db	20150116	CheckM Database
checkm2-db	20230511	CheckM2 Database
checkv-db	1.5	The main change in v1.5 was to remove putatively non-viral sequences to minimize high-confidence matches to the CheckV database for non-viral sequences.
chocophlan	0.1.1
clusterblast	20170105
dorado-db	20240501	Databases for dorado
foldseek-db	20230921	Foldseek computes for each match a simple estimate for the probability that the match is a true positive match given its structural bit score. Here, hits within the same superfamily are TP, hits to another fold are FP, and hits to the same family or to another superfamily are ignored.
funannotate-db	20220428 20240515	funannotate is a pipeline for genome annotation (built specifically for fungi, but will also work with higher eukaryotes). Installation, usage, and more information can be found at http://funannotate.readthedocs.io
gatkbundle	20191118	The GATK resource bundle is a collection of standard files for working with human resequencing data with the GATK. We provide several versions of the bundle corresponding to the various reference builds.
genbank	250	GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
gtdb	207 214 226	GENOME TAXONOMY DATABASE
humann-db	201901b	Databases for HUMAnN
interpro	68.0 101.0 102.0 103.0 104.0 105.0	InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites.
KneadData-db	20230405	Databases for KneadData
kraken2-db	20220327 20230314 20240112 20250714	Kraken 2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
megares-db	3.00	The MEGARes V3.0 database contains sequence data for nearly 9,000 hand-curated resistance genes for antimicrobial drugs, biocides and metals, with an annotation structure that is optimized for use with high throughput sequencing.
ncbi-blastdb	20220318 20230124 20250211	BLAST search pages under the Basic BLAST section of the NCBI BLAST home page (http://blast.ncbi.nlm.nih.gov/) use a standard set of BLAST databases for nucleotide, protein, and translated BLAST searches.
pfam	32.0 35.0 37.0 37.3	The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs)
pgap-db	2021-07-01.build5508	Databases for pgap
refseq-db	211 230	The Reference Sequence (RefSeq) collection provides a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins.
silva	138.1	SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya).
tailseeker-db	20161215	Database Indexes for Tailseeker
uniprot	2024_06 2025_01 2025_02	The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.

@@ Line 6: / Line 6: @@
 |-
 |[https://github.com/deepmind/alphafold/ alphafold-db]
-|20210917<br>20220118
+|20210917<br>20220118<br>20230405<br>20240213
 |Alphafold Databases
+|-
+|[https://github.com/google-deepmind/alphafold3 alphafold3-db]
+|20250520
+|Alphafold3 Databases
+|-
+|[https://github.com/zanglab/bart2 bart2-db]
+|20240302
+|Databases for Bart2
 |-
 |[https://busco.ezlab.org/ BUSCO-db]
-|4
+|4<br>5
 |Based on evolutionarily-informed expectations of gene content of near-universal single-copy orthologs, BUSCO metric is complementary to technical metrics like N50.
 |-
 |[https://card.mcmaster.ca card-prevalence]
-|3.0.6
+|3.0.6<br>4.0.1
 |November 2019 release - 85 pathogens, 116914 resistomes, and 182532 AMR allele sequences based on sequence data acquired from NCBI on July 31, 2019, analyzed using RGI 5.0.0 (DIAMOND homolog detection) and CARD 3.0.7. Includes pre-compiled k-mer classifier data for pathogen-of-origin prediction.
 |-
@@ Line 20: / Line 28: @@
 |20150116
 |CheckM Database
+|-
+|[https://github.com/chklovski/CheckM2 checkm2-db]
+|20230511
+|CheckM2 Database
+|-
+|[https://portal.nersc.gov/CheckV/ checkv-db]
+|1.5
+|The main change in v1.5 was to remove putatively non-viral sequences to minimize high-confidence matches to the CheckV database for non-viral sequences.
 |-
 |[http://huttenhower.sph.harvard.edu/humann2 chocophlan]
@@ Line 29: / Line 45: @@
 |
 |-
-|[https://www.ebi.ac.uk/ena ena]
+|[https://github.com/nanoporetech/dorado dorado-db]
-|20230208
+|20240501
-|The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing.
+|Databases for dorado
+|-
+|[https://github.com/steineggerlab/foldseek foldseek-db]
+|20230921
+|Foldseek computes for each match a simple estimate for the probability that the match is a true positive match given its structural bit score. Here, hits within the same superfamily are TP, hits to another fold are FP, and hits to the same family or to another superfamily are ignored.
 |-
 |[https://github.com/nextgenusfs/funannotate funannotate-db]
-|20220428
+|20220428<br>20240515
 |funannotate is a pipeline for genome annotation (built specifically for fungi, but will also work with higher eukaryotes). Installation, usage, and more information can be found at http://funannotate.readthedocs.io
 |-
@@ Line 46: / Line 66: @@
 |-
 |[https://gtdb.ecogenomic.org/ gtdb]
-|207
+|207<br>214<br>226
 |GENOME TAXONOMY DATABASE
 |-
@@ Line 54: / Line 74: @@
 |-
 |[https://www.ebi.ac.uk/interpro/ interpro]
-|68.0
+|68.0<br>101.0<br>102.0<br>103.0<br>104.0<br>105.0
 |InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites.
 |-
@@ Line 61: / Line 81: @@
 |Databases for KneadData
 |-
-|[https://ccb.jhu.edu/software/kraken/ kraken2-db]
+|[https://benlangmead.github.io/aws-indexes/k2 kraken2-db]
-|20220327
+|20220327<br>20230314<br>20240112<br>20250714
-|Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
+|Kraken 2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
+|-
+|[https://www.meglab.org/megares/ megares-db]
+|3.00
+|The MEGARes V3.0 database contains sequence data for nearly 9,000 hand-curated resistance genes for antimicrobial drugs, biocides and metals, with an annotation structure that is optimized for use with high throughput sequencing.
 |-
 |[ftp://ftp.ncbi.nlm.nih.gov/blast/db/ ncbi-blastdb]
-|20170702<br>20180404<br>20190320<br>20190808<br>20201212<br>20220318<br>20230124
+|20220318<br>20230124<br>20250211
 |BLAST search pages under the Basic BLAST section of the NCBI BLAST home page (http://blast.ncbi.nlm.nih.gov/) use a standard set of BLAST databases for nucleotide, protein, and translated BLAST searches.
 |-
 |[https://pfam.xfam.org/ pfam]
-|32.0<br>35.0
+|32.0<br>35.0<br>37.0<br>37.3
 |The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs)
 |-
-|pgap-db
+|[https://github.com/ncbi/pgap pgap-db]
 |2021-07-01.build5508
-|
+|Databases for pgap
 |-
 |[https://www.ncbi.nlm.nih.gov/refseq/ refseq-db]
-|211
+|211<br>230
 |The Reference Sequence (RefSeq) collection provides a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins.
 |-
@@ Line 84: / Line 108: @@
 |138.1
 |SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya).
+|-
+|[https://github.com/hyeshik/tailseeker tailseeker-db]
+|20161215
+|Database Indexes for Tailseeker
 |-
 |[https://www.uniprot.org/ uniprot]
-|2018_04<br>2020_06<br>2021_02<br>2022_05
+|2024_06<br>2025_01<br>2025_02
 |The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.
 |}
