Biocluster Applications

From Carl R. Woese Institute for Genomic Biology - University of Illinois Urbana-Champaign
Revision as of 15:15, 2 March 2015 by Mediawikiapi
Application Installed Versions Description
454 2.6, 2.7, 2.8 The GS Data Analysis Software package includes the tools to investigate complex genomic variation in samples including de novo assembly, reference guided alignment and variant calling, and low abundance variant identification and quantification.
AbundantOTU+ 0.91b AbundantOTU+ is the successor of AbundantOTU with additional functionality. Unlike AbundantOTU, AbundantOTU+ also handles sequences from rare species.
abyss 1.2.5, 1.3.3, 1.3.4 ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
afni 2011_12_21_2014 AFNI (which might be an acronym for Analysis of Functional NeuroImages) is a set of C programs for processing, analyzing, and displaying functional MRI (FMRI) data - a technique for mapping human brain activity. It runs on Unix+X11+Motif systems, including SGI, Solaris, Linux, and Mac OS X. It is available free (in C source code format, and some precompiled binaries) for research purposes.
allpathslg 42911, 49856, 50095 ALLPATHS-LG is our original short read assembler and it works on both small and large (mammalian size) genomes. To use it, you should first generate ~100 base Illumina reads from two libraries: one from ~180 bp fragments, and one from ~3000 bp fragments, both at about 45x coverage. Sequence from longer fragments will enable longer-range continuity.
amos 3.1.0 The AMOS consortium is committed to the development of open-source whole genome assembly software
AmpliconNoise 1.2.7 AmpliconNoise is a collection of programs for the removal of noise from 454 sequenced PCR amplicons. It involves two steps: the removal of noise from the sequencing itself and the removal of PCR point errors. This project also includes the Perseus algorithm for chimera removal.
art 20110922 ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using the user's own read error model or quality profiles.
artemis 14.0, 16.0 Artemis is a free genome browser and annotation tool that allows visualisation of sequence features, next generation data and the results of analyses within the context of the sequence, and also its six-frame translation.
aspera-connect 3.5.1 Aspera Connect is a self-installing web browser plug-in that powers high-speed uploads and downloads with the Aspera Connect Server, also enabling web-based transfers for the Aspera web applications faspex™ and Shares.
asymptote 2.32 Asymptote is a powerful descriptive vector graphics language that provides a natural coordinate-based framework for technical drawing. Labels and equations are typeset with LaTeX, for high-quality PostScript output.
augustus 2.6.1 AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences.
b2g4pipe 2.5 adds environmental variables to blast2go pipe
babraham bioinformatics Description of the module
bali-phy 2.1.1 BAli-Phy is a Bayesian posterior sampler that employs Markov chain Monte Carlo to explore the joint space of alignment and phylogeny given molecular sequence data. Simultaneous estimation eliminates bias toward inaccurate alignment guide-trees, employs more sophisticated substitution models during alignment and automatically utilizes information in shared insertion/deletions to help infer phylogenies.
bam2fastq 1.1.0 There are a growing number of general-purpose SAM/BAM manipulation programs, including SAMtools, Picard, and Bamtools. This tool is not intended to duplicate the complex suite of tasks those programs perform. Rather, it is simply intended to extract raw sequences (with qualities). We envision this tool being primarily useful to those wishing to duplicate or extend previous analyses.
bamtools 0.9.0, 2.3.0 BamTools is a project that provides both a C++ API and a command-line toolkit for reading, writing, and manipulating BAM (genome alignment) files.
batik 1.7 Batik is a Java-based toolkit for applications or applets that want to use images in the Scalable Vector Graphics (SVG) format for various purposes, such as display, generation or manipulation.
bcbio-nextgen 0.6.5 A python toolkit providing best-practice pipelines for fully automated high throughput sequencing analysis. You write a high level configuration file specifying your inputs and analysis parameters. This input drives a parallel pipeline that handles distributed execution, idempotent processing restarts and safe transactional steps. The goal is to provide a shared community resource that handles the data processing component of sequencing analysis, providing researchers with more time to focus on the downstream biology.
bcftools 1.0 reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
beast 2.1.3 Bayesian evolutionary analysis by sampling trees
bedops 2.4.2 BEDOPS is an open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data of arbitrary scale. Tasks can be easily split by chromosome for distributing whole-genome analyses across a computational cluster.
bedtools 2.10.0, 2.10.1, 2.17.0, 2.20.1, 2.21.0 Collectively, the bedtools utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF.
biodatabase 1.0 Scripts to create, delete, and manage mysql databases on IGB's biodatabase server
bioperl 1.6.924 BioPerl project is an international association of developers of open source Perl tools for bioinformatics, genomics and life science research.
biopieces 0.48 adds environmental variables, software, and bins to biopieces
bismark 0.13.0 A bisulfite read mapper and methylation caller
blast+ 2.2.25+, 2.2.28+ adds environment variables for blast+
blast-intel 2.2.26 The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
blast 2.2.26 adds environment variables for blast
blast2go 2.5 Description of the module
blat 0.34 Blat is an alignment tool like BLAST, but it is structured differently. On DNA, Blat works by keeping an index of an entire genome in memory.
boost-intel 1.54 Boost provides free peer-reviewed portable C++ source libraries.
boost 1.54, 1.55 Boost provides free peer-reviewed portable C++ source libraries.
bowtie 0.12.8, 0.12.9, 1.0.0 Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
bowtie2 2.0.0-beta6, 2.0.2, 2.0.5, 2.1.0 adds environmental variables to Bowtie2
breseq 0.24 Determine mutations in evolved microbes from next-generation sequencing data.
bs-seeker2 may-28-2014 A versatile aligning pipeline for bisulfite sequencing data
bwa 0.5.9, 0.7.5a, 0.7.10 BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome
bzip2 1.0.6 bzip2 is a freely available, patent free (see below), high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression.
cafe 2.2, 3.0 adds environmental variables to cafe
casava 1.8.2 adds environmental variables to casava
cd-hit 4.6, 4.6.1 CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences. CD-HIT was originally developed by Dr. Weizhong Li at Dr. Adam Godzik's Lab at the Burnham Institute (now Sanford-Burnham Medical Research Institute)
cdbfasta 0.99 fast indexing and retrieval of fasta records from flat file databases.
cgal 4.4 Homepage:
cgat 0.2.3 The CGAT code collection contains scripts and pipelines developed by CGAT. The collection contains scripts for genomics and next-generation sequencing analysis, but also general purpose scripts.
chance 1.0 CHANCE - CHip-seq ANalytics and Confidence Estimation. songlab.ucsf.edu/Software.html
chimera 1.5.3, 1.6.2 adds environmental variables for chimera
chlorop 1.1 predicts the presence of chloroplast transit peptides (cTP) in protein sequences and the location of potential cTP cleavage sites.
ChromHMM 1.10 ChromHMM is software for learning and characterizing chromatin states. ChromHMM can integrate multiple chromatin datasets, such as ChIP-seq data of various histone modifications, to discover de novo the major recurring combinatorial and spatial patterns of marks. ChromHMM is based on a multivariate Hidden Markov Model that explicitly models the presence or absence of each chromatin mark. The resulting model can then be used to systematically annotate a genome in one or more cell types. By automatically computing state enrichments for large-scale functional and annotation datasets, ChromHMM facilitates the biological characterization of each state. ChromHMM also produces files with genome-wide maps of chromatin state annotations that can be directly visualized in a genome browser.
Circleator 1.0.0rc4 The Charm City Circleator–or Circleator for short–is a Perl-based visualization tool developed at the Institute for Genome Sciences in the University of Maryland’s School of Medicine.
circos 0.67 Circos is a software package for visualizing data and information. It visualizes data in a circular layout, which makes Circos ideal for exploring relationships between objects or positions. There are other reasons why a circular layout is advantageous, not the least being the fact that it is attractive.
cisa 1.3 Homepage:
cisgenome 2.0 An integrated tool for tiling array, ChIP-seq, genome and cis-regulatory element analysis.
cisMetalysis 1.3 New meta-analysis tools reveal common transcriptional regulatory basis for multiple determinants of behavior
ClonalFrame 1.2 In a nutshell, ClonalFrame identifies the clonal relationships between the members of a sample, while also estimating the chromosomal position of homologous recombination events that have disrupted the clonal inheritance.
clustal-omega 1.2.1 Clustal Omega is the latest addition to the Clustal family. It offers a significant increase in scalability over previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours.
clustalw 2.1 adds environmental variables to clustalw
cmake 2.8.12.2 CMake, the cross-platform, open-source build system
cufflinks 1.1.0, 1.3.0, 2.0.2, 2.1.1, 2.2.0, 2.2.1 Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.
cutadapt 1.2 cutadapt removes adapter sequences from high-throughput sequencing data. This is usually necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing microRNAs.
cython 0.21.1 Cython is an optimising static compiler for both the Python programming language and the extended Cython programming language (based on Pyrex). It makes writing C extensions for Python as easy as Python itself.
cytoscape 2.8.1, 2.8.3 Cytoscape is an open source software platform for visualizing complex networks and integrating these with any type of attribute data. A lot of Apps are available for various kinds of problem domains, including bioinformatics, social network analysis, and semantic web.
deconseq 0.4.3 The DeconSeq tool can be used to automatically detect and efficiently remove sequence contaminations from genomic and metagenomic datasets. It is easily configurable and provides a user-friendly interface.
dendropy 3.12.0 DendroPy is a Python library for phylogenetic computing. It provides classes and functions for the simulation, processing, and manipulation of phylogenetic trees and character matrices, and supports the reading and writing of phylogenetic data in a range of formats, such as NEXUS, NEWICK, NeXML, Phylip, FASTA, etc. Application scripts for performing some useful phylogenetic operations, such as data conversion and tree posterior distribution summarization, are also distributed and installed as part of the library. DendroPy can thus function as a stand-alone library for phylogenetics, a component of more complex multi-library phyloinformatic pipelines, or as a scripting "glue" that assembles and drives such pipelines.
diamond 0.6.13 DIAMOND is a new high-throughput program for aligning a file of short reads against a protein reference database such as NR, at 20,000 times the speed of BLASTX, with high sensitivity
EagleView 2.2 adds environmental variables to EagleView
eeglab 13.3.2b EEGLAB is an interactive Matlab toolbox for processing continuous and event-related EEG, MEG and other electrophysiological data incorporating independent component analysis (ICA), time/frequency analysis, artifact rejection, event-related statistics, and several useful modes of visualization of the averaged and single-trial data.
efiest alpha, beta, devel adds environmental variables for EFI QUEST development branch
efignn 0.2.0 adds environmental variables for EFI QUEST development branch
emacs 24.3 Homepage:
EMBOSS 6.5.7 adds environmental variables to EMBOSS
emirge july-21-2014 EMIRGE reconstructs full length ribosomal genes from short read sequencing data.
erange 3.2.1, 3.3.0 A package of Python scripts designed to analyze ultra-high-throughput sequencing data from the Illumina/Solexa platform for RNA-seq and ChIP-seq in metazoan genomes. The RNA-seq portion of ERANGE is described in our Nature Methods paper 'Mapping and quantifying mammalian transcriptomes by RNA-Seq' (Mortazavi, 2008). ERANGE is built on top of Cistematic.
erplab 4.0.2.3 ERPLAB Toolbox is a free, open-source Matlab package for analyzing ERP data. It is tightly integrated with EEGLAB Toolbox, extending EEGLAB’s capabilities to provide robust, industrial-strength tools for ERP processing, visualization, and analysis. A graphical user interface makes it easy for beginners to learn, and Matlab scripting provides enormous power for intermediate and advanced users.
est-precompute 0, 48, 49 Homepage:
estscan 3.0.3 ESTScan is a program that can detect coding regions in DNA sequences, even if they are of low quality. ESTScan will also detect and correct sequencing errors that lead to frameshifts.
ete2 2.2.1072 ETE is a python programming toolkit that assists in the automated manipulation, analysis and visualization of phylogenetic trees. It provides a wide range of tree handling options, node annotation features, programmatic access to the phylomeDB database (containing thousands of pre-calculated phylogenetic trees), and automatic orthology and paralogy detection. In addition, ETE implements an interactive tree visualization system as well as a highly customizable tree drawing engine to create PDF and SVG tree images. Note that, although ETE is mainly developed as a tool for phylogenetic analysis, it can also be used to deal with clustering trees or any other data that can be represented as a hierarchical tree.
fasta 36.3.5d adds environment variables for fasta
fasta_splitter 1 Description of the module
fastqc 0.10.1, 0.11.2 FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.
fastsimcoal2 1.1.1 fast sequential Markov coalescent simulation of genomic data under complex evolutionary models.
fasttree 2.1.7 FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory. For large alignments, FastTree is 100-1,000 times faster than PhyML 3.0 or RAxML 7.
fastx_toolkit 0.0.13, 0.0.14 adds environmental variables to fastx_toolkit
ffmpeg 2.1.3 FFmpeg is a complete, cross-platform solution to record, convert and stream audio and video. It includes libavcodec - the leading audio/video codec library
findpeaks 3.1.9.2 adds environmental variables to findpeaks
flexbar 2.5-beta Flexbar — flexible barcode and adapter removal, version 2.4
freebayes 0.9.6 adds environmental variables to freebayes
gams 24.2 The General Algebraic Modeling System (GAMS) is a high-level modeling system for mathematical programming and optimization. It consists of a language compiler and a stable of integrated high-performance solvers. GAMS is tailored for complex, large scale modeling applications, and allows you to build large maintainable models that can be adapted quickly to new situations.
gatk 1.6-5, 1.6-13, 2.5-2, 2.6-4, 3.2-2, 3.3-0 The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
gcc 4.8.1 The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as libraries for these languages (libstdc++, libgcj,...). GCC was originally written as the compiler for the GNU operating system. The GNU system was developed to be 100% free software, free in the sense that it respects the user's freedom.
gdal 1.10.1 GDAL is a translator library for raster geospatial data formats that is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation. As a library, it presents a single abstract data model to the calling application for all supported formats.
geneid 1.0 adds environmental variables to geneid
genomer 0.0.10 adds environment variables for genomer
genometools 1.5.1, 1.5.2 The GenomeTools genome analysis system is a free collection of bioinformatics tools (in the realm of genome informatics) combined into a single binary named gt. It is based on a C library named “libgenometools” which consists of several modules.
gerp jun-11-2014 Homepage:
gff 2.1 Description of the module
glimmer 3.02 adds environment variables for glimmer
globusconnect 1.6 Allows you to use globus connect 1.6 to transfer files on globus
gmap 2011-09-14, 2013-03-31 adds environmental variables to gmap and gsnap
gmes v2.3e adds environmental variables to gmes
gnuplot 4.6.3 Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms.
graphviz 2.32.0 Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. It has important applications in networking, bioinformatics, software engineering, database and web design, machine learning, and in visual interfaces for other technical domains.
groopm 0.2.10.19 GroopM is a metagenomic binning toolset. It leverages spatio-temporal dynamics (differential coverage) to accurately (and almost automatically) extract population genomes from multi-sample metagenomic datasets.
gsl 1.16 The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. It is free software under the GNU General Public License. The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total with an extensive test suite.
hapcompass 0.7.5 HAPCOMPASS: A fast cycle basis algorithm for accurate haplotype assembly of sequence data
hdf4 4.2.10 HDF (also known as HDF4) is a library and multi-object file format for storing and managing data between machines. There are two versions of HDF: HDF4 and HDF5. HDF4 is the first HDF format. Although HDF4 is still funded, new users who are not constrained to using HDF4 should use HDF5.
hdf5 1.8.11 HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.
hmmer-mpi 2.32-MPI-0.92 adds environmental variables to hmmer-mpi
hmmer 2.3.2, 3.0, 3.1b1 HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
HOMER 4.7.2 HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis. It is a collection of command line programs for unix-style operating systems written in Perl and C++. HOMER was primarily written as a de novo motif discovery algorithm and is well suited for finding 8-20 bp motifs in large scale genomics data. HOMER contains many useful tools for analyzing ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C and numerous other types of functional genomics sequencing data sets.
htseq 0.5.4, 0.6.1 HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.
htslib 1.0 a C library for reading/writing high-throughput sequencing data
humann 0.99 HUMAnN is a pipeline for efficiently and accurately determining the presence/absence and abundance of microbial pathways in a community from metagenomic data.
icc 2013.5.192 Intel® Parallel Studio XE 2013 SP1 provides C/C++ and Fortran developers cutting edge performing compilers and libraries, the right parallel programming models, and complementary and compatible analysis tools.
idba-mt 20140307 IDBA-MT is an iterative De Bruijn Graph De Novo short read assembler for meta-transcriptomes. It is a purely de novo assembler based on paired-end RNA sequencing reads only. IDBA-MT is post-processing software for IDBA-UD contigs that removes chimeric contigs and extends contig length using paired-end read information.
idba 1.1.0, 1.1.1 IDBA is a practical iterative De Bruijn Graph De Novo Assembler for sequence assembly in bioinformatics. Most assemblers based on the de Bruijn graph build a de Bruijn graph with a specific k to perform the assembly task.
IGV 2.1.24, 2.3.40 The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
IGVTools 2.1.24 adds environment variables for IGVTools
IMAGE 2.33 adds environmental variables to IMAGE
ImageMagick 6.7.8-9 ImageMagick® is a software suite to create, edit, compose, or convert bitmap images. It can read and write images in a variety of formats (over 100) including DPX, EXR, GIF, JPEG, JPEG-2000, PDF, PhotoCD, PNG, Postscript, SVG, and TIFF. Use ImageMagick to resize, flip, mirror, rotate, distort, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves.
infernal 1.1, 1.1.1, 1.1rc1, 1.1rc2 Infernal ('INFERence of RNA ALignment') is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs).
inparanoid 4.1 adds environmental variables for inparanoid
iprscan 4.8-45, 5.2-44, 5.2-45, 5.4-47, 5.7-48 InterProScan is a bioinformatics tool that provides a one-stop-shop for automated sequence analysis of both protein and nucleic acid, the latter via a full six-frame translation. It offers the ability to identify both structural and functional regions of interest, based upon methods and models that have been generated by a large number of member groups ('member databases').
isaac-aligner 01.14.04.17, 01.14.11.07 Isaac: Ultra-fast whole genome secondary analysis on Illumina sequencing platform
isaac-variantcaller 1.0.6 Isaac: Ultra-fast whole genome secondary analysis on Illumina sequencing platform
JAGS 3.4.0 JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation not wholly unlike BUGS. JAGS was written with three aims in mind:
java-32 1.7.0_71 Java is a programming language and computing platform first released by Sun Microsystems in 1995. There are lots of applications and websites that will not work unless you have Java installed, and more are created every day. Java is fast, secure, and reliable. From laptops to datacenters, game consoles to scientific supercomputers, cell phones to the Internet, Java is everywhere!
java 1.6.0_41, 1.7.0_07, 1.7.0_21, 1.7.0_55, 1.8.0_25 Java is a programming language and computing platform first released by Sun Microsystems in 1995. There are lots of applications and websites that will not work unless you have Java installed, and more are created every day. Java is fast, secure, and reliable. From laptops to datacenters, game consoles to scientific supercomputers, cell phones to the Internet, Java is everywhere!
jellyfish 1.1.6, 1.1.11 JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. JELLYFISH can count k-mers using an order of magnitude less memory and an order of magnitude faster than other k-mer counting packages by using an efficient encoding of a hash table and by exploiting the 'compare-and-swap' CPU instruction to increase parallelism.
khmer 1.3 khmer is a library and suite of command line tools for working with DNA sequence. It is primarily aimed at short-read sequencing data such as that produced by the Illumina platform. khmer takes a k-mer-centric approach to sequence analysis, hence the name.
kmergenie 1.5854, 1.6950 Description of the module
kraken 0.10.5b Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts by other bioinformatics software to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.
krona 2.2 adds environment variables for krona
lamarc 2.1.8 adds environment variables for lamarc
lapack 3.5.0 Homepage:
last 278, 545 Finds similar regions between sequences.
lastz 1.02.00 adds environmental variables to lastz
libstree 0.4.2 libstree is a generic suffix tree implementation, written in C. It can handle arbitrary data structures as elements of a string. Unlike most demo implementations, it is not limited to simple ASCII character strings.
libxml2 2.9.1 Libxml2 is the XML C parser and toolkit developed for the Gnome project (but usable outside of the Gnome platform). It is free software available under the MIT License.
libxslt 1.1.28 Libxslt is the XSLT C library developed for the GNOME project. XSLT itself is an XML language to define transformations for XML.
LR-TRIRLS 20060531 LR-TRIRLS stands for Logistic Regression with Truncated Regularized Iteratively Re-weighted Least Squares. This is our contribution to LR computation.
lua 5.1.4, 5.2.2 adds environmental variables to lua
MACS 1.4.2, 1.4.2-1, 2.0.10, 2.1.0 Next generation parallel sequencing technologies made chromatin immunoprecipitation followed by sequencing (ChIP-Seq) a popular strategy to study genome-wide protein-DNA interactions, while creating challenges for analysis algorithms. We present Model-based Analysis of ChIP-Seq (MACS) on short-read sequencers such as the Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites.
mafft 6.953, 7.130 adds environmental variables to mafft
MaryGold 0.2 The package enables detection of sequence variation between metagenomic samples.
matlab r2010b, r2013b, r2014b MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications.
mauve 2.3.1 Mauve is a system for efficiently constructing multiple genome alignments in the presence of large-scale evolutionary events such as rearrangement and inversion. Multiple genome alignment provides a basis for research into comparative genomics and the study of evolutionary dynamics. Aligning whole genomes is a fundamentally different problem than aligning short sequences.
mblast 1.4.2 Homepage:
mcl 12-068 adds environmental variables to orthomcl
MEGAN 4.70.4, 5.1.0 In metagenomics, the aim is to understand the composition and operation of complex microbial consortia in environmental samples through sequencing and analysis of their DNA. Similarly, metatranscriptomics and metaproteomics target the RNA and proteins obtained from such samples.
meme 4.10.0-1 Motif-based sequence analysis tools
MetaGeneMark 2010 adds environmental variables to MetaGeneMark
metAMOS 1.2 metAMOS is an integrated assembly and analysis pipeline for metagenomic data. It is built around the Bambus2 metagenomic scaffolder and includes many current tools for assembly, gene finding, and taxonomic classification.
metaphlan 1.7.8 MetaPhlAn is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. MetaPhlAn relies on unique clade-specific marker genes identified from 3,000 reference genomes
metatrans 11-4-2014 Translate metagenomic reads
MetaVelvet 1.2.02, 1.2.02-kmer245 adds environmental variables to MetaVelvet
mfinder 1.2 adds environmental variables to mfinder
mira 3.2.1, 4.0.2 MIRA is a whole genome shotgun and EST sequence assembler for Sanger, 454, Solexa (Illumina), IonTorrent and PacBio data (the latter at the moment only CCS and error-corrected CLR reads).
MOCAT 1.1 adds environmental variables for MOCAT
ModalClust 0.2, 0.3 adds environmental variables to ModalClust
mothur 1.8.0, 1.28.0, 1.31.2, 1.32.0, 1.32.1, 1.33.3 This project seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community. We have incorporated the functionality of dotur, sons, treeclimber, s-libshuff, unifrac, and much more. In addition to improving the flexibility of these algorithms, we have added a number of other features including calculators and visualization tools.
MotifScan 6 Swan Motif Scanning Software
mpich 3.0.4 MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard.
mpich2 1.4.1p1, 1.5 MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard.
mrmr 20090420 mRMR (minimum Redundancy Maximum Relevance) algorithm
msort 1.0 adds environmental variables to msort
MultiMarkdown 4.6 MultiMarkdown, or MMD, is a tool to help turn minimally marked-up plain text into well formatted documents, including HTML, PDF
multiz 2009-jan-21 Alignment software
MUMmer 3.23 adds environmental variables to MUMmer
muscle 3.8.31 adds environmental variables to muscle
mytaxa 0.1.0 MyTaxa represents a new algorithm that extends the Average Amino Acid Identity (AAI) concept (Konstantinidis and Tiedje, PNAS 2005) to identify the taxonomic affiliation of a query genome sequence or a sequence of a contig assembled from a metagenome, including short sequences (e.g., 100-1,000nt long), and to classify sequences representing novel taxa at three levels (whenever possible), i.e., species, genus and phylum.
NAMD 2.10b2 NAMD, recipient of a 2002 Gordon Bell Award and a 2012 Sidney Fernbach Award, is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 200,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR. NAMD is distributed free of charge with source code. You can build NAMD yourself or download binaries for a wide variety of platforms. Our tutorials show you how to use NAMD and VMD for biomolecular modeling.
nco 4.4.2 NCO manipulates data stored in netCDF-accessible formats, including DAP, HDF4, and HDF5. It also exploits the geophysical expressivity of many CF (Climate & Forecast) metadata conventions, the flexible description of physical dimensions translated by UDUnits, the network transparency of OPeNDAP, the storage features (e.g., compression, chunking, groups) of HDF (the Hierarchical Data Format), and many powerful mathematical and statistical algorithms of GSL (the GNU Scientific Library). NCO is fast, powerful, and free.
netcdf 4.3.1.1 NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.
networkx 1.9.1 NetworkX is a Python language software package for the creation, manipulation, and study of the structure, dynamics, and function of complex networks.
node 0.10.28 Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications.
novocraft 2.08, 2.08.02, 2.08.03, 3.00.01, 3.00.02, 3.00.05, 3.02 Powerful tool designed for mapping of short reads onto a reference genome from Illumina, Ion Torrent, and 454 NGS platforms.
oases-kmer245 0.2.8 Oases is a de novo transcriptome assembler designed to produce transcripts from short read sequencing technologies, such as Illumina, SOLiD, or 454 in the absence of any genomic assembly.
oases 0.2.8 Oases is a de novo transcriptome assembler designed to produce transcripts from short read sequencing technologies, such as Illumina, SOLiD, or 454 in the absence of any genomic assembly.
ocaml 4.01 OCaml compiler
OLB 1.9.4 adds environmental variables to OLB
oligotyping 1.4 a novel computational method that can help microbial ecologists to investigate concealed diversity at an extremely precise level within their closely related organisms by utilizing very subtle variations among 16S Ribosomal RNA gene pyro-tag sequences.
opam 1.1.1 Package manager for OCaml
openmpi-intel 1.6.3 The Open MPI Project is an open source MPI-2 implementation that is developed and maintained by a consortium of academic, research, and industry partners.
openmpi 1.4.3, 1.6.3 The Open MPI Project is an open source MPI-2 implementation that is developed and maintained by a consortium of academic, research, and industry partners.
orthomcl 2.0.2, 2.0.7 adds environmental variables to orthomcl
pagit 1.64 Tools to automatically generate high-quality sequence by ordering contigs, closing gaps, correcting sequence errors and transferring annotation.
paml 4.4 adds environmental variables to paml
pandaseq 2.8 a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence.
panther 1.03 score protein sequences against the entire PANTHER HMM library and analyze your sequences
parallel-netcdf 1.4.1 Parallel netCDF (PnetCDF) is an I/O library that supports data access to netCDF files in parallel, a collaborative work of Northwestern University and Argonne National Laboratory.
parallel 20121122 adds environmental variables to parallel
pathway-tools 16.5 adds environmental variables to pathway-tools
PBJelly 12.9.14 adds environmental variables to PBJelly with smrtanalysis-1.4.0
peaksplitter 1.0 adds environmental variables to peaksplitter
pear 0.9.5 an ultrafast, memory-efficient and highly accurate paired-end read merger. It is fully parallelized and can run with as little as a few kilobytes of memory.
perfsuite 1.0.0 adds environmental variables to perfsuite
perl 5.16.1, 5.20.1 Perl 5 is a highly capable, feature-rich programming language with over 25 years of development. Perl 5 runs on over 100 platforms from portables to mainframes and is suitable for both rapid prototyping and large scale development projects.
pfamscan 1.0 PFamscan
PfamScan 1.5 search a FASTA file against a library of Pfam HMMs
phast 1.3 Homepage:
phred 0.020425.c adds environmental variables to phred
phylip 3.69 adds environmental variables to phylip
phylocsf may-16-2014 Phylogenetic analysis of multi-species genome sequence alignments to identify conserved protein-coding regions
phymmbl 4.0 adds environmental variables to phymmbl
picard-tools 1.34, 1.73, 1.90 A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats.
picrust 1.0.0, devel PICRUSt (pronounced 'pie crust') is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.
platanus 1.2.1 Genome Assembler
plink 1.90beta This is a comprehensive update to Shaun Purcell's PLINK command-line program, developed by Christopher Chang with support from the NIH-NIDDK's Laboratory of Biological Modeling, the Purcell Lab at Mount Sinai School of Medicine, and others.
polyphen 2.2.2 PolyPhen-2 (Polymorphism Phenotyping v2) is a software tool which predicts possible impact of amino acid substitutions on the structure and function of human proteins using straightforward physical and evolutionary comparative considerations.
pplacer 1.0, 1.1 adds environmental variables to pplacer
prank 121002 adds environmental variables to prank
prinseq 0.20.4 PRINSEQ can be used to filter, reformat, or trim your genomic and metagenomic sequence data. It generates summary statistics of your sequences in graphical and tabular format. It is easily configurable and provides a user-friendly interface.
prodigal 2.0, 2.2 adds environmental variables to prodigal
proj 4.8.0 Program proj (release 3) is a standard Unix filter function which converts geographic longitude and latitude coordinates into Cartesian coordinates, (λ,φ)→(x,y), by means of a wide variety of cartographic projection functions.
prokka 1.10 Prokka is a software tool for the rapid annotation of prokaryotic genomes. A typical 4 Mbp genome can be fully annotated in less than 10 minutes on a quad-core computer, and scales well to 32 core SMP systems. It produces GFF3, GBK and SQN files that are ready for editing in Sequin and ultimately submitted to GenBank/DDBJ/ENA.
pydnase 0.1.7 a library for analyzing DNase-seq data
pygraphviz 1.3rc1 PyGraphviz is a Python interface to the Graphviz graph layout and visualization package. With PyGraphviz you can create, edit, read, write, and draw graphs using Python to access the Graphviz graph data structure and layout algorithms.
pyqt 4.11.2 PyQt is the Python bindings for Digia's Qt cross-platform application development framework.
pysam 0.8.0 Pysam is a Python module for reading and manipulating Samfiles. It's a lightweight wrapper of the samtools C-API.
python 2.6.6, 2.7.3, 2.7.3-sqlite, 2.7.5-galaxy, 3.2.3, 3.4.1 Python is a programming language that lets you work quickly and integrate systems more effectively.
qiime 1.3.0, 1.5.0, 1.6.0, 1.7.0, 1.8 Loads QIIME's environmental variables
qt 4.8.6 Qt is a cross-platform application and UI framework for developers using C++ or QML, a CSS & JavaScript like language
quake 0.3.4 Quake is a package to correct substitution sequencing errors in experiments with deep coverage (e.g. >15X), specifically intended for Illumina sequencing reads.
quast 2.2, 2.3 QUAST evaluates genome assemblies. It can work both with and without a given reference genome. The tool accepts multiple assemblies, and is thus suitable for comparison.
quest alpha, devel adds environmental variables for EFI QUEST development branch
R-experimental experimental R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
R 2.15.0, 2.15.1, 2.15.2, 3.0.0, 3.0.2, 3.1.0, 3.1.1, experimental R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
rapsearch2 2.22 Reduced Alphabet based Protein similarity Search
RAxML 7.3.0 RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees. It was originally derived from fastDNAml, which in turn was derived from Joe Felsenstein’s dnaml, which is part of the PHYLIP package.
ray 2.20, 2.30 Ray -- Parallel genome assemblies for parallel DNA sequencing
rdp_classifier 2.5 The RDP Classifier is a naive Bayesian classifier that can rapidly and accurately provide taxonomic assignments from domain to genus, with confidence estimates for each assignment.
rdxplorer 3.2 adds environmental variables to rdxplorer. Use 'python rdxplorer.py' to run.
repeatmasker 3.28, 4.0.5 RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences
rockhopper 1.2, 2.0.2 Rockhopper is a comprehensive and user-friendly system for computational analysis of bacterial RNA-seq data. As input, Rockhopper takes RNA sequencing reads output by high-throughput sequencing technology (FASTQ, QSEQ, FASTA, SAM, or BAM files)
rsa-tools 2012-10-09 adds environmental variables to rsa-tools
rstudio 0.97.312 adds environmental variables to R
ruby 1.9.3 Ruby is a language of careful balance. Its creator, Yukihiro “Matz” Matsumoto, blended parts of his favorite languages (Perl, Smalltalk, Eiffel, Ada, and Lisp) to form a new language that balanced functional programming with imperative programming.
samtools 0.1.16, 0.1.18, 0.1.19, 1.0 Samtools is a suite of programs for interacting with high-throughput sequencing data. It consists of three separate repositories
sam_comp 0.7 adds environmental variables to sam_comp with Python 2.7.3
scaffold-builder 2.1 The abundance of repeat elements in genomes can impede the assembly of a single sequence. The tool Scaffold_builder was designed to generate scaffolds (super contigs of sequences joined by N-bases) using the homology provided by a closely related reference sequence.
scipy 0.15.1 Python-based ecosystem of open-source software for mathematics, science, and engineering.
scons 2.3.4 SCons is an Open Source software construction tool; that is, a next-generation build tool.
shotmap 11-4-2014 Shotmap is a software workflow that functionally annotates and compares shotgun metagenomes
SICER 1.1 A clustering approach for identification of enriched domains from histone modification ChIP-Seq data
signalp 4.1 adds environmental variables to perl
sip 4.16.3 SIP is a tool that makes it very easy to create Python bindings for C and C++ libraries. It was originally developed to create PyQt, the Python bindings for the Qt toolkit, but can be used to create bindings for any C or C++ library.
siphy 0.5 SiPhy implements rigorous statistical tests to detect bases under selection from multiple alignment data
smrtanalysis 1.4.0, 2.0.1 SMRT Analysis is a bioinformatics software suite for analyzing single molecule, real-time DNA sequencing data from Pacific Biosciences. This repository contains a link to download and view the SMRT Analysis source code. It is provided here for reference only and is currently not buildable.
SnpEff 3.2, 3.3e Genetic variant annotation and effect prediction toolbox. It annotates and predicts the effects of variants on genes (such as amino acid changes).
SOAP2 2.2 adds environmental variables to SOAP2
SOAPdenovo-Trans 1.02 SOAPdenovo-Trans is a de novo transcriptome assembler based on the SOAPdenovo framework, adapted to alternative splicing and differing expression levels among transcripts. The assembler provides a more accurate, complete and faster way to construct full-length transcript sets.
SOAPdenovo 2.04 adds environmental variables to SOAPdenovo
SOAPdenovo2 r240 adds environmental variables to SOAPdenovo2
SOAPec 2.01, 2.02 http://soap.genomics.org.cn/about.html
SOAPErrorCorrection 0.04 adds environmental variables to SOAPErrorCorrection
SOAPGapCloser 1.12 adds environmental variables to SOAPGapCloser
SOAPprepare 0.1 adds environmental variables to SOAPdenovo
SolexaQA 3.1.2 SolexaQA calculates sequence quality statistics and creates visual representations of data quality for second-generation sequencing data.
spades 3.0.0, 3.1.1, 3.5.0 SPAdes (St. Petersburg genome assembler) is intended for both standard isolates and single-cell MDA bacteria assemblies.
speciateit 184 This is a beta version of a speciation pipeline for 16S rRNA amplicon data. In principle, the pipeline can be used for speciation based on any highly preserved gene.
speciate_it 184 This is a beta version of a speciation pipeline for 16S rRNA amplicon data. In principle, the pipeline can be used for speciation based on any highly preserved gene.
sratoolkit 2.1.16, 2.3.5-2 The NCBI SRA Toolkit enables reading (dumping) of sequencing files from the SRA database and writing (loading) files into the .sra format (Note that this is not required for submission).
srna-tools 20130118 adds environmental variables to srna-tools
ssaha2 2.5.5 SSAHA is a software tool for very fast matching and alignment of DNA sequences. It achieves its fast search speed by converting sequence information into a 'hash table' data structure, which can then be searched very rapidly for matches.
SSPACE-premium 2.3.1 SSPACE is a script able to extend and scaffold pre-assembled contigs using one or more mate pairs or paired end libraries, or even a combination.
SSPACE 1.1 SSPACE is a script able to extend and scaffold pre-assembled contigs using one or more mate pairs or paired end libraries, or even a combination.
stacks 0.9996, 0.99994, 1.2.0 Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform.
STAR 2.4.0h1 STAR (Spliced Transcripts Alignment to a Reference) is an RNA-seq read aligner
structure 2.3.4 The program structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed.
stubb 2.1
subread 1.3.6, 1.4.5-p1, 1.4.6-p1 The Subread package comprises a suite of software programs for processing next-gen sequencing read data including:
supfam 1.75 SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes.
tabix 0.2.6 Tabix indexes a TAB-delimited genome position file in.tab.bgz and creates an index file in.tab.bgz.tbi when region is absent from the command-line.
targetp 1.1 Secretory signal peptides, mitochondrial targeting peptides and chloroplast transit peptides in eukaryotes.
tassel 3.0, 4.0, 5.0 While TASSEL has changed considerably since its initial public release in 2001, its primary function continues to be providing tools to investigate the relationship between phenotypes and genotypes
tbb 4.2-3 Intel® Threading Building Blocks (Intel® TBB) lets you easily write parallel C++ programs that take full advantage of multicore performance, that are portable and composable, and that have future-proof scalability.
tcoffee 10-r1613 A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures
test amos-3.1.0, gcc-4.8.1, isaac-aligner-1.14.04.17, netcdf-4.3.1.1, perl-5.20.1 Perl 5 is a highly capable, feature-rich programming language with over 25 years of development. Perl 5 runs on over 100 platforms from portables to mainframes and is suitable for both rapid prototyping and large scale development projects.
tmhmm 2.0 Prediction of transmembrane helices in proteins
tophat 1.4.1 TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
tophat2 2.0.4, 2.0.5, 2.0.6, 2.0.7, 2.0.8, 2.0.10, 2.0.13 TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
treemix 1.12 TreeMix is a method for inferring the patterns of population splits and mixtures in the history of a set of populations. In the underlying model, the modern-day populations in a species are related to a common ancestor via a graph of ancestral populations. We use the allele frequencies in the modern populations to infer the structure of this graph.
trf 4.04 Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences.
trimmomatic 0.22, 0.27, 0.30, 0.32 Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data.
trim_galore 0.3.7 A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisulfite-Seq) libraries.
trinityrnaseq-intel r2013-02-25, r2014-04-13, r2014-07-17 Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.
trinityrnaseq r2012-06-08, r2013-02-25, r2013-08-14, r2014-04-13, r2014-07-17 Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.
trinotate r20130826, r20140708 Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms.
truesight 0.06 Self-training Algorithm for Splice Junction Detection using RNA-seq.
uchime 4.2.40 UCHIME is an algorithm for detecting chimeric sequences. It is implemented in the uchime_ref and uchime_denovo commands.
ucsc 20130806, v312 This directory contains Genome Browser and Blat application binaries built for standalone
udunits 2.1.24 The UDUNITS package supports units of physical quantities. Its C library provides for arithmetic manipulation of units and for conversion of numeric values between compatible units. The package contains an extensive unit database, which is in XML format and user-extendable. The package also contains a command-line utility for investigating units and converting values.
usearch 4.2.66, 6.0.307, 6.1.544, 7.0.959, 7.0.1090, 8.0.1517 USEARCH is a unique sequence analysis tool with thousands of users world-wide. USEARCH offers search and clustering algorithms that are often orders of magnitude faster than BLAST.
USeq 8.5.1 USeq is a collection of software tools for both low and high level analysis of next generation, ultra high throughput signature sequencing data from the Solexa, SOLiD, and 454 platforms.
vcftools 0.1.7, 0.1.11, 0.1.12b VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.
velvet-kmer245 1.2.10 Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.
velvet 1.1.04, 1.2.08, 1.2.10 Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.
VelvetOptimiser 2.2.5 VelvetOptimiser is a multi-threaded Perl script for automatically optimising the three primary parameter options (K, -exp_cov, -cov_cutoff) for the Velvet de novo sequence assembler.
vim 7.4 Vim is an advanced text editor that seeks to provide the power of the de-facto Unix editor 'Vi', with a more complete feature set. It's useful whether you're already using vi or using a different editor.
vsearch 1.0.7 VSEARCH stands for vectorized search, as the tool takes advantage of parallelism in the form of SIMD vectorization as well as multiple threads to perform accurate alignments at high speed.
weblogo 2.8.2, 3.3 WebLogo is a web based application designed to make the generation of sequence logos as easy and painless as possible.
wessim 1.0 A whole-exome sequencing simulator based on in silico exome capture
wgs 7.0 This tool provides sequence similarity searching against the EMBL (WGS) database using the FASTA suite of programs.
wise 2.2.3-rc7 Wise2 has four main executable programs using sequence inputs which are designed to provide access to the main algorithms sensibly. The algorithms you are interested in are genewise (compares protein information to genomic DNA) and estwise (compares protein information to EST/cDNA).
yasm 1.2.0 Yasm is a complete rewrite of the NASM assembler under the 'new' BSD License (some portions are under other licenses, see COPYING for details). Yasm currently supports the x86 and AMD64 instruction sets, accepts NASM and GAS assembler syntaxes, outputs binary, ELF32, ELF64, 32 and 64-bit Mach-O, RDOFF2, COFF, Win32, and Win64 object formats, and generates source debugging information in STABS, DWARF 2, and CodeView 8 formats
zlib 1.2.8 zlib is designed to be a free, general-purpose, legally unencumbered -- that is, not covered by any patents -- lossless data-compression library for use on virtually any computer hardware and operating system. The zlib data format is itself portable across platforms.