Difference between revisions of "Biocluster Mirrors"
(Created page with "{| style="width: 900px; margin: 0 auto" class="wikitable" border="1" cellpadding="1" cellspacing="1" |- ! Data Set ! Website ! Update Schedule ! File System Path ! |- | Bioc...") |
Mediawikiapi (talk | contribs) |
||
(52 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
− | {| style= | + | {| style='margin: 0 auto' class='wikitable' border='1' cellpadding='1' cellspacing='1' |
|- | |- | ||
− | ! | + | !Application |
− | ! | + | !Installed Versions |
− | + | !Description | |
− | |||
− | ! | ||
|- | |- | ||
− | + | |[https://github.com/deepmind/alphafold/ alphafold-db] | |
− | | [ | + | |20210917<br>20220118<br>20230405<br>20240213 |
− | | | + | |Alphafold Databases |
− | | | ||
|- | |- | ||
− | | | + | |[https://github.com/zanglab/bart2 bart2-db] |
− | | [http:// | + | |20240302 |
− | | | + | |Databases for Bart2 |
− | | / | + | |- |
+ | |[https://busco.ezlab.org/ BUSCO-db] | ||
+ | |4 | ||
+ | |Based on evolutionarily-informed expectations of gene content of near-universal single-copy orthologs, the BUSCO metric is complementary to technical metrics like N50. | ||
+ | |- | ||
+ | |[https://card.mcmaster.ca card-prevalence] | ||
+ | |3.0.6 | ||
+ | |November 2019 release - 85 pathogens, 116914 resistomes, and 182532 AMR allele sequences based on sequence data acquired from NCBI on July 31, 2019, analyzed using RGI 5.0.0 (DIAMOND homolog detection) and CARD 3.0.7. Includes pre-compiled k-mer classifier data for pathogen-of-origin prediction. | ||
+ | |- | ||
+ | |[https://ecogenomics.github.io/CheckM/ checkm-db] | ||
+ | |20150116 | ||
+ | |CheckM Database | ||
+ | |- | ||
+ | |[https://github.com/chklovski/CheckM2 checkm2-db] | ||
+ | |20230511 | ||
+ | |CheckM2 Database | ||
+ | |- | ||
+ | |[https://portal.nersc.gov/CheckV/ checkv-db] | ||
+ | |1.5 | ||
+ | |The main change in v1.5 was to remove putatively non-viral sequences to minimize high-confidence matches to the CheckV database for non-viral sequences. | ||
+ | |- | ||
+ | |[http://huttenhower.sph.harvard.edu/humann2 chocophlan] | ||
+ | |0.1.1 | ||
+ | | | ||
+ | |- | ||
+ | |[http://www.molgen.ua.ac.be/bioinfo/projects/clusterblast/clusterblast.htm clusterblast] | ||
+ | |20170105 | ||
+ | | | ||
+ | |- | ||
+ | |[https://github.com/nanoporetech/dorado dorado-db] | ||
+ | |20240501 | ||
+ | |Databases for dorado | ||
+ | |- | ||
+ | |[https://www.ebi.ac.uk/ena ena] | ||
+ | |20241008 | ||
+ | |The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. | ||
+ | |- | ||
+ | |[https://github.com/steineggerlab/foldseek foldseek-db] | ||
+ | |20230921 | ||
+ | |Foldseek computes for each match a simple estimate for the probability that the match is a true positive match given its structural bit score. Here, hits within the same superfamily are TP, hits to another fold are FP, and hits to the same family or to another superfamily are ignored. | ||
+ | |- | ||
+ | |[https://github.com/nextgenusfs/funannotate funannotate-db] | ||
+ | |20220428<br>20240515 | ||
+ | |funannotate is a pipeline for genome annotation (built specifically for fungi, but will also work with higher eukaryotes). Installation, usage, and more information can be found at http://funannotate.readthedocs.io | ||
+ | |- | ||
+ | |[https://software.broadinstitute.org/gatk/download/bundle gatkbundle] | ||
+ | |20191118 | ||
+ | |The GATK resource bundle is a collection of standard files for working with human resequencing data with the GATK. We provide several versions of the bundle corresponding to the various reference builds. | ||
+ | |- | ||
+ | |[https://www.ncbi.nlm.nih.gov/genbank/ genbank] | ||
+ | |250 | ||
+ | |GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. | ||
+ | |- | ||
+ | |[https://gtdb.ecogenomic.org/ gtdb] | ||
+ | |207<br>214 | ||
+ | |GENOME TAXONOMY DATABASE | ||
+ | |- | ||
+ | |[https://github.com/biobakery/humann/ humann-db] | ||
+ | |201901b | ||
+ | |Databases for HUMAnN | ||
+ | |- | ||
+ | |[https://www.ebi.ac.uk/interpro/ interpro] | ||
+ | |68.0<br>101.0<br>102.0 | ||
+ | |InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites. | ||
+ | |- | ||
+ | |[https://github.com/biobakery/kneaddata KneadData-db] | ||
+ | |20230405 | ||
+ | |Databases for KneadData | ||
+ | |- | ||
+ | |[https://benlangmead.github.io/aws-indexes/k2 kraken2-db] | ||
+ | |20220327<br>20230314<br>20240112 | ||
+ | |Kraken 2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. | ||
+ | |- | ||
+ | |[ftp://ftp.ncbi.nlm.nih.gov/blast/db/ ncbi-blastdb] | ||
+ | |20220318<br>20230124 | ||
+ | |BLAST search pages under the Basic BLAST section of the NCBI BLAST home page (http://blast.ncbi.nlm.nih.gov/) use a standard set of BLAST databases for nucleotide, protein, and translated BLAST searches. | ||
+ | |- | ||
+ | |[https://pfam.xfam.org/ pfam] | ||
+ | |32.0<br>35.0<br>37.0 | ||
+ | |The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). | ||
+ | |- | ||
+ | |pgap-db | ||
+ | |2021-07-01.build5508 | ||
+ | | | ||
+ | |- | ||
+ | |[https://www.ncbi.nlm.nih.gov/refseq/ refseq-db] | ||
+ | |211 | ||
+ | |The Reference Sequence (RefSeq) collection provides a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. | ||
+ | |- | ||
+ | |[https://www.arb-silva.de silva] | ||
+ | |138.1 | ||
+ | |SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya). | ||
+ | |- | ||
+ | |[https://github.com/hyeshik/tailseeker tailseeker-db] | ||
+ | |20161215 | ||
+ | |Database Indexes for Tailseeker | ||
+ | |- | ||
+ | |[https://www.uniprot.org/ uniprot] | ||
+ | |2022_05<br>2023_04<br>2024_04<br>2024_05 | ||
+ | |The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information. | ||
|} | |} |
Latest revision as of 03:00, 10 October 2024
Application | Installed Versions | Description |
---|---|---|
alphafold-db | 20210917, 20220118, 20230405, 20240213 | Alphafold Databases |
bart2-db | 20240302 | Databases for Bart2 |
BUSCO-db | 4 | Based on evolutionarily-informed expectations of gene content of near-universal single-copy orthologs, the BUSCO metric is complementary to technical metrics like N50. |
card-prevalence | 3.0.6 | November 2019 release - 85 pathogens, 116914 resistomes, and 182532 AMR allele sequences based on sequence data acquired from NCBI on July 31, 2019, analyzed using RGI 5.0.0 (DIAMOND homolog detection) and CARD 3.0.7. Includes pre-compiled k-mer classifier data for pathogen-of-origin prediction. |
checkm-db | 20150116 | CheckM Database |
checkm2-db | 20230511 | CheckM2 Database |
checkv-db | 1.5 | The main change in v1.5 was to remove putatively non-viral sequences to minimize high-confidence matches to the CheckV database for non-viral sequences. |
chocophlan | 0.1.1 | |
clusterblast | 20170105 | |
dorado-db | 20240501 | Databases for dorado |
ena | 20241008 | The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. |
foldseek-db | 20230921 | Foldseek computes for each match a simple estimate for the probability that the match is a true positive match given its structural bit score. Here, hits within the same superfamily are TP, hits to another fold are FP, and hits to the same family or to another superfamily are ignored. |
funannotate-db | 20220428, 20240515 | funannotate is a pipeline for genome annotation (built specifically for fungi, but will also work with higher eukaryotes). Installation, usage, and more information can be found at http://funannotate.readthedocs.io |
gatkbundle | 20191118 | The GATK resource bundle is a collection of standard files for working with human resequencing data with the GATK. We provide several versions of the bundle corresponding to the various reference builds. |
genbank | 250 | GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. |
gtdb | 207, 214 | GENOME TAXONOMY DATABASE |
humann-db | 201901b | Databases for HUMAnN |
interpro | 68.0, 101.0, 102.0 | InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites. |
KneadData-db | 20230405 | Databases for KneadData |
kraken2-db | 20220327, 20230314, 20240112 | Kraken 2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. |
ncbi-blastdb | 20220318, 20230124 | BLAST search pages under the Basic BLAST section of the NCBI BLAST home page (http://blast.ncbi.nlm.nih.gov/) use a standard set of BLAST databases for nucleotide, protein, and translated BLAST searches. |
pfam | 32.0, 35.0, 37.0 | The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). |
pgap-db | 2021-07-01.build5508 | |
refseq-db | 211 | The Reference Sequence (RefSeq) collection provides a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. |
silva | 138.1 | SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya). |
tailseeker-db | 20161215 | Database Indexes for Tailseeker |
uniprot | 2022_05, 2023_04, 2024_04, 2024_05 | The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information. |