Difference between revisions of "Biocluster Applications"

From Carl R. Woese Institute for Genomic Biology - University of Illinois Urbana-Champaign
Line 486:
 |-
 |efiest_v2
-|beta, devel, devlocal, shared, sharedbeta, sharedlocal
+|beta, devel, devlocal, prod, shared, sharedbeta, sharedlocal
 |adds environmental variables for EFI shared library development branch
 |-
Line 494:
 |-
 |efignn_v2
-|beta, devel, devlocal
+|beta, devel, devlocal, prod
-|adds environmental variables for EFI QUEST development branch
+|adds environmental variables for EFI GNT prod branch
 |-
 |[http://eigen.tuxfamily.org/ eigen]

Revision as of 05:00, 16 August 2017

Application Installed Versions Description
454 2.6, 2.7, 2.8 The GS Data Analysis Software package includes the tools to investigate complex genomic variation in samples including de novo assembly, reference guided alignment and variant calling, and low abundance variant identification and quantification.
a2ps 4.14 GNU a2ps is an Any to PostScript filter. Of course it processes plain text files, but also pretty prints quite a few popular languages.
AbundantOTU+ 0.91b AbundantOTU+ is the successor of AbundantOTU with additional functionality. Unlike AbundantOTU, AbundantOTU+ also deals with sequences from rare species.
abyss 1.2.5, 1.3.3, 1.3.4 ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
AEGeAn 0.14.1 The AEGeAn Toolkit is designed for the Analysis and Evaluation of Genome Annotations. The toolkit includes a variety of analysis programs as well as a C library whose API provides access to AEGeAn's core functions and data structures.
afni 2011_12_21_2014 AFNI (which might be an acronym for Analysis of Functional NeuroImages) is a set of C programs for processing, analyzing, and displaying functional MRI (FMRI) data - a technique for mapping human brain activity. It runs on Unix+X11+Motif systems, including SGI, Solaris, Linux, and Mac OS X. It is available free (in C source code format, and some precompiled binaries) for research purposes.
agent 0.1.3 AGEnt performs in silico subtractive hybridization of core genome sequences, such as those produced by Spine, against a query genomic sequence to identify accessory genomic sequences (AGEs) in the query genome. Sequences are aligned using Nucmer, outputting sequences and sequence characteristics of those regions in the query genome that are not found in the core genome. If gene coordinate information is provided, a list of accessory genes in the query genome will also be provided.
AlignGraph 20160714 AlignGraph is software that extends and joins contigs or scaffolds by reassembling them with help provided by a reference genome of a closely related organism.
allpathslg 42911, 49856, 50095 ALLPATHS-LG is the Broad Institute's original short read assembler and it works on both small and large (mammalian size) genomes. To use it, you should first generate ~100 base Illumina reads from two libraries: one from ~180 bp fragments, and one from ~3000 bp fragments, both at about 45x coverage. Sequence from longer fragments will enable longer-range continuity.
amos 3.1.0 The AMOS consortium is committed to the development of open-source whole genome assembly software.
amphora 2.0 An Automated Phylogenomic Inference Pipeline for Bacterial and Archaeal Sequences.
AmpliconNoise 1.2.7 AmpliconNoise is a collection of programs for the removal of noise from 454 sequenced PCR amplicons. It involves two steps: the removal of noise from the sequencing itself and the removal of PCR point errors. This project also includes the Perseus algorithm for chimera removal.
anaconda2 4.1.1 The Anaconda platform provides an enterprise-ready data analytics platform that empowers companies to adopt a modern open data science analytics architecture. Processing multi-workload data analytics – from batch through interactive to real-time – the platform is used for both ad hoc and production deployments.
ART-MountRainer 20160605 ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data.
art 20110922 ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using a user's own read error model or quality profiles.
artemis 14.0, 16.0 Artemis is a free genome browser and annotation tool that allows visualisation of sequence features, next generation data and the results of analyses within the context of the sequence, and also its six-frame translation.
aspera-connect 3.5.1 Aspera Connect is a self-installing web browser plug-in that powers high-speed uploads and downloads with the Aspera Connect Server, also enabling web-based transfers for Aspera web application faspex™ and Shares.
ASTRAL 4.7.8 ASTRAL is a Java program for estimating a species tree given a set of unrooted gene trees. ASTRAL is statistically consistent under the multi-species coalescent model (and thus is useful for handling ILS). It finds the tree that maximizes the number of induced quartet trees in the set of gene trees that are shared by the species tree.
ASTRID 20151120 ASTRID is an ILS-aware tool for species tree estimation.
asymptote 2.32 Asymptote is a powerful descriptive vector graphics language that provides a natural coordinate-based framework for technical drawing. Labels and equations are typeset with LaTeX, for high-quality PostScript output.
atlas 3.10.2 The ATLAS (Automatically Tuned Linear Algebra Software) project is an ongoing research effort focusing on applying empirical techniques in order to provide portable performance. At present, it provides C and Fortran77 interfaces to a portably efficient BLAS implementation, as well as a few routines from LAPACK.
augustus 2.6.1, 3.2.1 AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences.
autoconf 2.65 Autoconf is an extensible package of M4 macros that produce shell scripts to automatically configure software source code packages. These scripts can adapt the packages to many kinds of UNIX-like systems without manual user intervention. Autoconf creates a configuration script for a package from a template file that lists the operating system features that the package can use, in the form of M4 macro calls.
automake 1.15 Automake is a tool for automatically generating Makefile.in files compliant with the GNU Coding Standards.
awscli 1.7.36 The AWS Command Line Interface (CLI) is a unified tool to manage your AWS services. With just one tool to download and configure, you can control multiple AWS services from the command line and automate them through scripts.
b2g4pipe 2.5 Blast2Go without GUI frontend for command line use
babraham bioinformatics Variety of projects made available by the Babraham Bioinformatics group
bali-phy 2.1.1 BAli-Phy is a Bayesian posterior sampler that employs Markov chain Monte Carlo to explore the joint space of alignment and phylogeny given molecular sequence data. Simultaneous estimation eliminates bias toward inaccurate alignment guide-trees, employs more sophisticated substitution models during alignment and automatically utilizes information in shared insertion/deletions to help infer phylogenies.
bam2fastq 1.1.0 There are a growing number of general-purpose SAM/BAM manipulation programs, including SAMtools, Picard, and Bamtools. This tool is not intended to duplicate the complex suite of tasks those programs perform. Rather, it is simply intended to extract raw sequences (with qualities). We envision this tool being primarily useful to those wishing to duplicate or extend previous analyses.
BamM 20150521 The primary motivation for building BamM was to replace PySam in GroopM.
bamtools 0.9.0, 2.3.0, 2.4.0 BamTools is a project that provides both a C++ API and a command-line toolkit for reading, writing, and manipulating BAM (genome alignment) files.
barrnap 0.5, 0.7 BAsic Rapid Ribosomal RNA Predictor
batik 1.7 Batik is a Java-based toolkit for applications or applets that want to use images in the Scalable Vector Graphics (SVG) format for various purposes, such as display, generation or manipulation.
bazel 0.4.4 Bazel is Google's own build tool, now publicly available in Beta. Bazel has built-in support for building both client and server software, including client applications for both Android and iOS platforms. It also provides an extensible framework that you can use to develop your own build rules.
bcbio-nextgen 0.6.5 A Python toolkit providing best-practice pipelines for fully automated high throughput sequencing analysis. You write a high level configuration file specifying your inputs and analysis parameters. This input drives a parallel pipeline that handles distributed execution, idempotent processing restarts and safe transactional steps. The goal is to provide a shared community resource that handles the data processing component of sequencing analysis, providing researchers with more time to focus on the downstream biology.
bcftools 1.0, 1.2, 1.3.1 reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
bcl2fastq 2.17 bcl2fastq Conversion Software both demultiplexes data and converts BCL files generated by Illumina sequencing systems to standard FASTQ file formats for downstream analysis.
beagle 4.1 Beagle is a software package that performs genotype calling, genotype phasing, imputation of ungenotyped markers, and identity-by-descent segment detection.
beast 2.1.3 Bayesian evolutionary analysis by sampling trees
bedops 2.4.2 BEDOPS is an open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data of arbitrary scale. Tasks can be easily split by chromosome for distributing whole-genome analyses across a computational cluster.
bedtools 2.10.0, 2.10.1, 2.17.0, 2.20.1, 2.21.0, 2.24.0, 2.25.0 Collectively, the bedtools utilities are a Swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, VCF.
BETA 1.0.7 Binding and Expression Target Analysis (BETA) is a software package that integrates ChIP-seq of transcription factors or chromatin regulators with differential gene expression data to infer direct target genes.
biodatabase 1.1 Scripts to create, delete, and manage mysql databases on IGB's biodatabase server
biom-format 2.1.5 The BIOM file format (canonically pronounced biome) is designed to be a general-use format for representing biological sample by observation contingency tables. BIOM is a recognized standard for the Earth Microbiome Project and is a Genomics Standards Consortium supported project.
bioperl 1.6.924 BioPerl project is an international association of developers of open source Perl tools for bioinformatics, genomics and life science research.
biopieces 0.48 The Biopieces are a collection of bioinformatics tools that can be pieced together in a very easy and flexible manner to perform both simple and complex tasks. The Biopieces work on a data stream in such a way that the data stream can be passed through several different Biopieces, each performing one specific task: modifying or adding records to the data stream, creating plots, or uploading data to databases and web services.
biopsy 0.3.0 Biopsy is a framework for optimising the settings of any program or pipeline which produces a measurable output. It is particularly intended for bioinformatics, where computational pipelines take a long time to run, making optimisation of parameters using crude methods unfeasible. Biopsy will use a range of discrete optimisation strategies to rapidly find the settings that perform the best.
bismark 0.13.0, 0.14.5, 0.15.0, 0.16.1 A bisulfite read mapper and methylation caller
blast+ 2.2.25+, 2.2.28+, 2.2.31, 2.3.0, 2.5.0, 2.6.0 BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit. The BLAST+ applications have a number of performance and feature improvements over the legacy BLAST applications.
blast-intel 2.2.26 The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
blast 2.2.26 The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
blast2go 2.5 Blast2GO is an ALL in ONE tool for functional annotation of (novel) sequences and the analysis of annotation data.
blat 0.34 Blat is an alignment tool like BLAST, but it is structured differently. On DNA, Blat works by keeping an index of an entire genome in memory.
blobtools 0.9.19.5 Application for the visualisation of (draft) genome assemblies and general assembly QC using TAGC (Taxon-annotated Gc-Coverage) plots
boost-intel 1.54 Boost provides free peer-reviewed portable C++ source libraries.
boost 1.54, 1.55, 1.59.0 Boost provides free peer-reviewed portable C++ source libraries.
bowtie 0.12.8, 0.12.9, 1.0.0, 1.1.2 Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
bowtie2 2.0.0-beta6, 2.0.2, 2.0.5, 2.1.0, 2.2.5, 2.2.6 Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
BRAKER1 1.9 BRAKER1: Unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS
breakdancer 1.1 BreakDancer-1.1, released under GPLv3, is a Perl/Cpp package that provides genome-wide detection of structural variants from next generation paired-end sequencing reads.
breseq 0.24, 0.25d, 0.28.1 breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA re-sequencing data for microbial sized genomes. breseq is a command line tool implemented in C++ and R.
bridger r2014-12-01 Bridger is an efficient de novo transcriptome assembler for RNA-Seq data. It expects as input RNA-Seq reads (single or paired) in fasta or fastq format, outputs all transcripts in fasta format, without using a reference genome.
bs-seeker2 may-28-2014 A versatile aligning pipeline for bisulfite sequencing data
busco 2.0, 3.0 BUSCO v3 provides quantitative measures for the assessment of genome assembly, gene set, and transcriptome completeness, based on evolutionarily-informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB v9.
bwa 0.5.9, 0.7.5a, 0.7.10, 0.7.15 BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome
BWASP 20160106 Bisulfite-seq data Workflow Automation Software and Protocols. The BWASP repository encompasses code we developed in the Brendel Group for scalable, reproducible analyses of bisulfite sequencing data.
bzip2 1.0.6 bzip2 is a freely available, patent-free, high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression.
cafe 2.2, 3.0 Computational analysis of (gene) family evolution.
canu 1.2, 1.3, 1.4, 1.5 Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). The software is currently alpha level; feel free to use it and report any issues encountered.
casava 1.8.2 The CASAVA 1.8.2 package processes sequencing reads provided by RTA or OLB.
cctools 5.2.0 The Cooperative Computing Tools (cctools) contains Parrot, Chirp, Makeflow, Work Queue, SAND, and other software.
cd-hit 4.5.8, 4.6, 4.6.1, 4.6.1a CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences. CD-HIT was originally developed by Dr. Weizhong Li at Dr. Adam Godzik's Lab at the Burnham Institute (now Sanford-Burnham Medical Research Institute).
cdbfasta 0.99 fast indexing and retrieval of fasta records from flat file databases.
cegma v2.4.010312 CEGMA (Core Eukaryotic Genes Mapping Approach) is a computational method for building a highly reliable set of gene annotations in the absence of experimental data.
cgal 4.4 CGAL is a software project that provides easy access to efficient and reliable geometric algorithms in the form of a C++ library. CGAL is used in various areas needing geometric computation, such as geographic information systems, computer aided design, molecular biology, medical imaging, computer graphics, and robotics. The library offers data structures and algorithms like triangulations, Voronoi diagrams, Boolean operations on polygons and polyhedra, point set processing, arrangements of curves, surface and volume mesh generation, geometry processing, alpha shapes, convex hull algorithms, shape analysis, AABB and KD trees...
cgat 0.2.3 The CGAT code collection contains scripts and pipelines developed by CGAT. The collection contains scripts for genomics and next-generation sequencing analysis, but also general purpose scripts.
cgfp 20161205 This repository contains a protocol and several scripts to assist users in combining Sequence Similarity Network (SSN) information with ShortBRED output. This approach, called 'chemically-guided functional profiling,' is useful for studying large protein families in metagenomic and metatranscriptomic datasets.
chance 1.0 CHANCE - CHip-seq ANalytics and Confidence Estimation.
CheckM 0.9.7 CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.
ChIA-PET2 0.9.2.1 ChIA-PET2 is a versatile and flexible pipeline for analysing different variants of ChIA-PET data from raw sequencing reads to chromatin loops.
chimera 1.5.3, 1.6.2 UCSF Chimera is a highly extensible program for interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories, and conformational ensembles. High-quality images and animations can be generated.
chlorop 1.1 predicts the presence of chloroplast transit peptides (cTP) in protein sequences and the location of potential cTP cleavage sites.
ChromHMM 1.10 ChromHMM is software for learning and characterizing chromatin states. ChromHMM can integrate multiple chromatin datasets such as ChIP-seq data of various histone modifications to discover de novo the major recurring combinatorial and spatial patterns of marks. ChromHMM is based on a multivariate Hidden Markov Model that explicitly models the presence or absence of each chromatin mark. The resulting model can then be used to systematically annotate a genome in one or more cell types. By automatically computing state enrichments for large-scale functional and annotation datasets, ChromHMM facilitates the biological characterization of each state. ChromHMM also produces files with genome-wide maps of chromatin state annotations that can be directly visualized in a genome browser.
circlator 1.4.1 A tool to circularize genome assemblies.
Circleator 1.0.0rc4 The Charm City Circleator–or Circleator for short–is a Perl-based visualization tool developed at the Institute for Genome Sciences in the University of Maryland’s School of Medicine.
circos 0.67 Circos is a software package for visualizing data and information. It visualizes data in a circular layout, which makes Circos ideal for exploring relationships between objects or positions. There are other reasons why a circular layout is advantageous, not the least being the fact that it is attractive.
cisa 1.3 CISA: Contig Integrator for Sequence Assembly
cisgenome 2.0 An integrated tool for tiling array, ChIP-seq, genome and cis-regulatory element analysis.
cisMetalysis 1.3 New meta-analysis tools reveal common transcriptional regulatory basis for multiple determinants of behavior
ClonalFrame 1.2 In a nutshell, ClonalFrame identifies the clonal relationships between the members of a sample, while also estimating the chromosomal position of homologous recombination events that have disrupted the clonal inheritance.
clustal-omega 1.2.1 Clustal Omega is the latest addition to the Clustal family. It offers a significant increase in scalability over previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours.
clustalw 2.1 Clustal W is a general purpose multiple alignment program for DNA or proteins.
cmake 2.8.12.2, 3.3.1, 3.8.0 CMake, the cross-platform, open-source build system
cnvnator 0.3.2 a tool for CNV discovery and genotyping from depth-of-coverage by mapped reads
CONCOCT 0.4.1 A program for unsupervised binning of metagenomic contigs by using nucleotide composition, coverage data in multiple samples and linkage data from paired end reads.
coreutils 8.27 The GNU Core Utilities are the basic file, shell and text manipulation utilities of the GNU operating system. These are the core utilities which are expected to exist on every operating system.
crass 0.3.12 Crass was designed primarily for short read data like the kind produced from the Illumina sequencing platform. However, it has been tested on Sanger and Ion Torrent data as well. The current constraints are that reads shorter than 76bp in length will not be used and reads longer than 2000bp cannot be used.
crisprtools 0.1.8 Crisprtools was developed to parse the crispr file format, which is an XML markup for describing Clustered Regularly Interspersed Short Palindromic Repeats. Crisprtools is written in C++ and uses libcrispr for all the heavy lifting.
cuda 8.0 CUDA® is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).
cufflinks 1.1.0, 1.3.0, 2.0.2, 2.1.1, 2.2.0, 2.2.1 Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.
curl 7.49.1 libcurl is a free and easy-to-use client-side URL transfer library, supporting DICT, FILE, FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, Telnet and TFTP.
cutadapt 1.8.1 cutadapt removes adapter sequences from high-throughput sequencing data. This is usually necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing microRNAs.
cython 0.21.1 Cython is an optimising static compiler for both the Python programming language and the extended Cython programming language (based on Pyrex). It makes writing C extensions for Python as easy as Python itself.
cytoscape 2.8.1, 2.8.3 Cytoscape is an open source software platform for visualizing complex networks and integrating these with any type of attribute data. A lot of Apps are available for various kinds of problem domains, including bioinformatics, social network analysis, and semantic web.
DALIGNER 20150421 DALIGNER provides commands to find all significant local alignments between reads encoded in a Dazzler database.
DAZZ_DB 20150421 To facilitate the multiple phases of the dazzler assembler, we organize all the read data into what is effectively a database of the reads and their meta-information.
deconseq 0.4.3 The DeconSeq tool can be used to automatically detect and efficiently remove sequence contaminations from genomic and metagenomic datasets. It is easily configurable and provides a user-friendly interface.
deepTools 1.6.0, 2.0 deepTools addresses the challenge of handling the large amounts of data that are now routinely generated from DNA sequencing centers. To do so, deepTools contains useful modules to process the mapped reads data to create coverage files in standard bedGraph and bigWig file formats.
deep_q_rl 20170324 This package provides a Lasagne/Theano-based implementation of the deep Q-learning algorithm described in:
dendropy 3.12.0, 4.0.3 DendroPy is a Python library for phylogenetic computing. It provides classes and functions for the simulation, processing, and manipulation of phylogenetic trees and character matrices, and supports the reading and writing of phylogenetic data in a range of formats, such as NEXUS, NEWICK, NeXML, Phylip, FASTA, etc. Application scripts for performing some useful phylogenetic operations, such as data conversion and tree posterior distribution summarization, are also distributed and installed as part of the library. DendroPy can thus function as a stand-alone library for phylogenetics, a component of more complex multi-library phyloinformatic pipelines, or as a scripting "glue" that assembles and drives such pipelines.
detonate 1.9 DETONATE (DE novo TranscriptOme rNa-seq Assembly with or without the Truth Evaluation) consists of two component packages, RSEM-EVAL and REF-EVAL. Both packages are mainly intended to be used to evaluate de novo transcriptome assemblies, although REF-EVAL can be used to compare sets of any kinds of genomic sequences.
diamond 0.6.13, 0.7.9, 0.8.5, 0.8.36 DIAMOND is a new high-throughput program for aligning a file of short reads against a protein reference database such as NR, at 20,000 times the speed of BLASTX, with high sensitivity
DISCOVAR-denovo 52488 DISCOVAR de novo can generate de novo assemblies for both large and small genomes.
DosageConvertor 1.0.3 DosageConvertor is a C++ tool to convert dosage files (in VCF format) from Minimac3 to other formats such as MaCH or PLINK.
EagleView 2.2 EagleView is an information-rich genome assembler viewer with data integration capability. EagleView can display a dozen different types of information including base qualities, machine specific trace signals, and genome feature annotations. It provides an easy way for inspecting visually the quality of a genome assembly and validating polymorphism candidate sites (e.g., SNPs) reported by polymorphism discovery tools. It can also facilitate data interpretation and hypothesis generation.
edirect 20150409 Entrez Direct (EDirect) is an advanced method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window.
eeglab 13.3.2b EEGLAB is an interactive Matlab toolbox for processing continuous and event-related EEG, MEG and other electrophysiological data incorporating independent component analysis (ICA), time/frequency analysis, artifact rejection, event-related statistics, and several useful modes of visualization of the averaged and single-trial data.
efidb 00000000, 20150212, 20150403, 20150522, 20150804, 20151013, 20151209, 20160219, 20160414, 20160609, 20160921, 20161121, 20170412, 20170515 Env variables for EFI database release 64
efidb_v2 ip62, ip63, ip64 Env variables for EFI database release 63
efiest alpha, beta, devel adds environmental variables for EFI QUEST development branch
efiest_v2 beta, devel, devlocal, prod, shared, sharedbeta, sharedlocal adds environmental variables for EFI shared library development branch
efignn 0.0.0, 0.2.0, 0.2.1, 0.2.2, 0.2.4 adds environmental variables for EFI QUEST development branch
efignn_v2 beta, devel, devlocal, prod adds environmental variables for EFI GNT prod branch
eigen 2.0.17, 3.2.4 Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.
emacs 24.3 GNU Emacs is an extensible, customizable text editor—and more. At its core is an interpreter for Emacs Lisp, a dialect of the Lisp programming language with extensions to support text editing.
EMBOSS 6.5.7 EMBOSS is The European Molecular Biology Open Software Suite. EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community. The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web. Also, as extensive libraries are provided with the package, it is a platform to allow other scientists to develop and release software in true open source spirit. EMBOSS also integrates a range of currently available packages and tools for sequence analysis into a seamless whole. EMBOSS breaks the historical trend towards commercial software packages.
emirge july-21-2014 EMIRGE reconstructs full length ribosomal genes from short read sequencing data.
erange 3.2.1, 3.3.0 A package of Python scripts designed to analyze ultra-high-throughput sequencing data from the Illumina/Solexa platform for RNA-seq and ChIP-seq in metazoan genomes. The RNA-seq portion of ERANGE is described in our Nature Methods paper 'Mapping and quantifying mammalian transcriptomes by RNA-Seq' (Mortazavi, 2008). ERANGE is built on top of Cistematic.
erplab 4.0.2.3 ERPLAB Toolbox is a free, open-source Matlab package for analyzing ERP data. It is tightly integrated with EEGLAB Toolbox, extending EEGLAB’s capabilities to provide robust, industrial-strength tools for ERP processing, visualization, and analysis. A graphical user interface makes it easy for beginners to learn, and Matlab scripting provides enormous power for intermediate and advanced users.
est-precompute 0, 48, 49, 50, 51, 52 Homepage:
estscan 3.0.3 ESTScan is a program that can detect coding regions in DNA sequences, even if they are of low quality. ESTScan will also detect and correct sequencing errors that lead to frameshifts.
ete2 2.2.1072 ETE is a python programming toolkit that assists in the automated manipulation, analysis and visualization of phylogenetic trees. It provides a wide range of tree handling options, node annotation features, programmatic access to the phylomeDB database (containing thousands of pre-calculated phylogenetic trees), and automatic orthology and paralogy detection. In addition, ETE implements an interactive tree visualization system as well as a highly customizable tree drawing engine to create PDF and SVG tree images. Note that, although ETE is mainly developed as a tool for phylogenetic analysis, it can also be used to deal with clustering trees or any other data that can be represented as a hierarchical tree.
exonerate 2.2.0 exonerate is a generic tool for pairwise sequence comparison.
fasta 36.3.5d The FASTA programs find regions of local or global similarity between Protein or DNA sequences, either by searching Protein or DNA databases, or by identifying local duplications within a sequence. Other programs provide information on the statistical significance of an alignment. Like BLAST, FASTA can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
fasta_splitter 0.1.1 This script divides a large FASTA file into a set of smaller, approximately equally sized files. It works with whole sequences, never dividing a sequence in the middle.
fastqc 0.10.1, 0.11.2, 0.11.4, 0.11.5 FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.
fastsimcoal2 1.1.1, 2.5.2.21 While preserving all the simulation flexibility of simcoal2, fastsimcoal is now implemented under a faster continuous-time sequential Markovian coalescent approximation, allowing it to efficiently generate genetic diversity for different types of markers along large genomic regions, for both present and ancient samples. It includes a parameter sampler allowing its integration into Bayesian or likelihood parameter estimation procedures.
fasttree 2.1.7 FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory. For large alignments, FastTree is 100-1,000 times faster than PhyML 3.0 or RAxML 7.
fastuniq 1.1 FastUniq is an ultrafast de novo tool for removal of duplicates in paired short DNA sequence reads in FASTQ format. FastUniq identifies duplicates by comparing sequences between read pairs and does not require complete genome sequences as prerequisites. FastUniq is capable of simultaneously handling reads with different lengths and results in highly efficient running time.
fastx_toolkit 0.0.13, 0.0.14 The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
fdupes 1.6.1 FDUPES is a program for identifying duplicate files residing within specified directories.
ffmpeg 2.1.3 FFmpeg is a complete, cross-platform solution to record, convert and stream audio and video. It includes libavcodec - the leading audio/video codec library
fftw 3.3.5 FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST).
findpeaks 3.1.9.2 Findpeaks was developed to perform analysis of ChIP-Seq experiments. It uses a naive algorithm for identifying regions of high coverage, which represent Chromatin Immunoprecipitation enrichment of sequence fragments, indicating the location of a bound protein of interest. A minimum height threshold is used to determine which 'peaks' are shown in the UCSC compatible wig file - if no threshold is used, all reads are shown. It collects and sorts the reads along each chromosome, and identifies areas of enrichment, termed 'Peaks'. These peaks are particularly important in Chromatin Immunoprecipitation experiments (ChIP-Seq or ChIP-Solexa experiments), as they indicate the location of a bound protein of interest.
flexbar 2.5-beta, 2.7 Flexbar — flexible barcode and adapter removal
FragGeneScan 1.19 FragGeneScan is an application for finding (fragmented) genes in short reads.
freebayes 0.9.6 FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.
fsl 5.0.8 FSL is a comprehensive library of analysis tools for FMRI, MRI and DTI brain imaging data.
gams 24.2 The General Algebraic Modeling System (GAMS) is a high-level modeling system for mathematical programming and optimization. It consists of a language compiler and a stable of integrated high-performance solvers. GAMS is tailored for complex, large scale modeling applications, and allows you to build large maintainable models that can be adapted quickly to new situations.
gatk 1.6-5, 1.6-13, 2.5-2, 2.6-4, 3.2-2, 3.3-0, 3.4-0, 3.5, 3.6 The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
gcc 4.8.1, 4.9.3 The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as libraries for these languages (libstdc++, libgcj,...). GCC was originally written as the compiler for the GNU operating system. The GNU system was developed to be 100% free software, free in the sense that it respects the user's freedom.
gdal 1.10.1, 1.11.2 GDAL is a translator library for raster geospatial data formats that is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation. As a library, it presents a single abstract data model to the calling application for all supported formats.
gdc-client 1.0.1 The raw sequence files, typically stored as BAM or FASTQ, make up the bulk of data. The size of a single file can vary greatly depending on the specific analysis; however, some of the whole genome BAM files in The Cancer Genome Atlas (TCGA) reach sizes of 200-300 GB. In such cases, a high performance data download and submission client is essential.
geneid 1.0, 1.4.4 geneid is a program to predict genes in anonymous genomic sequences designed with a hierarchical structure.
GeneMark 4.32 Novel genomic sequences can be analyzed either by the self-training program GeneMarkS (sequences longer than 50 kb) or by GeneMark.hmm with Heuristic models. For many species pre-trained model parameters are ready and available through the GeneMark.hmm page. Metagenomic sequences can be analyzed by MetaGeneMark, the program optimized for speed.
GeneSeqer 20140226 Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus.
genetorrent 3.8.7 CGHub provides binary and source distributions of the GeneTorrent client software for downloading sequence data from CGHub's repository.
gengetopt 2.22.6 Gengetopt is a tool to write command line option parsing code for C programs.
genomer 0.0.10 Genomer is command line glue for genome projects. I wrote this tool to simplify the small but tedious tasks required when finishing a genome. Genomer does not perform assembly, gap closing or genome annotation. Genomer does however make it easy to reorganise contigs in a genome, map annotations on to the genome and generate the files required to submit a genome.
genometools 1.5.1, 1.5.2, 1.5.7 The GenomeTools genome analysis system is a free collection of bioinformatics tools (in the realm of genome informatics) combined into a single binary named gt. It is based on a C library named “libgenometools” which consists of several modules.
gerp jun-11-2014 GERP identifies constrained elements in multiple alignments by quantifying substitution deficits. These deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur because the element has been under functional constraint. We refer to these deficits as "Rejected Substitutions". Rejected substitutions are a natural measure of constraint that reflects the strength of past purifying selection on the element.
gevalt 2.0 GEVALT (GEnotype Visualization and ALgorithmic Tool) is designed to simplify and expedite the process of genotype analysis and disease association tests by providing a common interface to several common tasks relating to such analyses. It is aimed for analysis of unrelated individuals as well as two-generation families.
gff 2.1 GFF-Ex, a Genome Feature extraction package, extracts Gene, Exon, Intron, Upstream Region of Gene (Promoters), Intergenic and CDS/cDNA sequences when supplied with the Genome Feature File (gff) along with the corresponding genome/chromosome sequence. GFF-Ex is a fusion of shell and Perl, developed for platforms supporting UNIX based file system.
gffcompare 0.9.7 Compares and evaluates the accuracy of RNA-Seq transcript assemblers (Cufflinks, Stringtie); collapses (merges) duplicate transcripts from multiple GTF/GFF3 files (e.g. resulting from assembly of different samples); and classifies transcripts from one or multiple GTF/GFF3 files as they relate to reference transcripts provided in an annotation file (also in GTF/GFF3 format).
gffutils 0.8.3, 0.8.4 gffutils is a Python package for working with GFF and GTF files in a hierarchical manner. It allows operations which would be complicated or time-consuming using a text-file-only approach.
glibc 2.14, 2.14.1 The GNU C Library project provides the core libraries for the GNU system and GNU/Linux systems
glimmer 3.02 Glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses. Glimmer (Gene Locator and Interpolated Markov ModelER) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA.
gmap 2011-09-14, 2013-03-31, 2016-09-23 GMAP: A Genomic Mapping and Alignment Program for mRNA and EST Sequences
gmes v2.3e Unsupervised training is an important feature of the GeneMark-ES algorithm that identifies protein coding genes in eukaryotic genomes. This is the only eukaryotic gene finder that can perform gene prediction without curated training sets.
gmp 6.0.0 GMP is needed to compile gcc.
gnupg 2.0.22 GnuPG is a complete and free implementation of the OpenPGP standard as defined by RFC4880 (also known as PGP). GnuPG allows you to encrypt and sign your data and communication, and features a versatile key management system as well as access modules for all kinds of public key directories. GnuPG, also known as GPG, is a command line tool with features for easy integration with other applications. A wealth of frontend applications and libraries are available. GnuPG also provides support for S/MIME and Secure Shell (ssh).
gnuplot 4.6.3 Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms
gperf 3.0.4 GNU gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only.
graphlan 1.0 GraPhlAn is a software tool for producing high-quality circular representations of taxonomic and phylogenetic trees. GraPhlAn focuses on concise, integrative, informative, and publication-ready representations of phylogenetically- and taxonomically-driven investigation.
graphviz 2.32.0 Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. It has important applications in networking, bioinformatics, software engineering, database and web design, machine learning, and in visual interfaces for other technical domains.
groopm 0.2.10.19, 0.3.4 GroopM is a metagenomic binning toolset. It leverages spatio-temporal dynamics (differential coverage) to accurately (and almost automatically) extract population genomes from multi-sample metagenomic datasets.
gsl 1.16 The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. It is free software under the GNU General Public License. The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total with an extensive test suite.
gtool 0.7.5 GTOOL is a program for transforming sets of genotype data for use with the programs SNPTEST and IMPUTE.
guidance 2.01 GUIDANCE is meant to be used for weighting, filtering or masking unreliably aligned positions in sequence alignments before subsequent analysis. For example, align codon sequences (nucleotide sequences that code for proteins) using PAGAN, remove columns with low GUIDANCE scores, and use the remaining alignment to infer positive selection using the branch-site dN/dS test. Other analyses where GUIDANCE filtering could be useful include phylogeny reconstruction, reconstruction of the history of specific insertion and deletion events, inference of recombination events, etc.
gvcftools 0.16 gvcftools is a set of utilities to help create and analyze Genome VCF (gVCF) files.
hap.py 0.3.0 This is a set of programs based on htslib to compare VCF files by specified haplotype.
hapcompass 0.7.5 HAPCOMPASS: A fast cycle basis algorithm for accurate haplotype assembly of sequence data. HapCompass for polyploid genomes can currently be used to create accurate pairwise SNP phasings. However, it currently only produces an entire haplotype assembly that is consistent with the data. We are currently working on modifying the algorithm to produce the best haplotype assembly given the resolved compass graph.
hdf4 4.2.10 HDF (also known as HDF4) is a library and multi-object file format for storing and managing data between machines. There are two versions of HDF: HDF4 and HDF5. HDF4 is the first HDF format. Although HDF4 is still funded, new users who are not constrained to using HDF4 should use HDF5.
hdf5 1.8.11, 1.8.16 HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.
HiC-Pro 2.7.1 HiC-Pro was designed to process Hi-C data, from raw fastq files (paired-end Illumina data) to the normalized contact maps. Since version 2.7.0, HiC-Pro can analyse data from digestion protocols as well as data from protocols that do not require restriction enzyme such as DNase Hi-C.
hisat2 2.0.4 HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as against a single reference genome).
hmmer-mpi 2.32-MPI-0.92 adds environmental variables to hmmer-mpi
hmmer 2.3.2, 3.0, 3.1b1, 3.1b2 HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
HOMER 4.7.2 HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis. It is a collection of command line programs for unix-style operating systems written in Perl and C++. HOMER was primarily written as a de novo motif discovery algorithm and is well suited for finding 8-20 bp motifs in large scale genomics data. HOMER contains many useful tools for analyzing ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C and numerous other types of functional genomics sequencing data sets.
htseq 0.7.2 HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.
htslib 1.0, 1.3.2 a C library for reading/writing high-throughput sequencing data
humann 0.99 HUMAnN is a pipeline for efficiently and accurately determining the presence/absence and abundance of microbial pathways in a community from metagenomic data.
hwloc 1.11.0 The Portable Hardware Locality (hwloc) software package provides a portable abstraction (across OS, versions, architectures, ...) of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores and simultaneous multithreading.
icc 2013.5.192 Intel® Parallel Studio XE 2013 SP1 provides C/C++ and Fortran developers cutting edge performing compilers and libraries, the right parallel programming models, and complementary and compatible analysis tools.
idba-mt 20140307 IDBA-MT is an iterative De Bruijn Graph De Novo short read assembler for meta-transcriptome. It is a purely de novo assembler based on paired-end RNA sequencing reads only. IDBA-MT is a post-processing software for IDBA-UD contigs for removing chimeric contigs and extending contig length using paired-end reads information.
idba 1.1.0, 1.1.1 IDBA is a practical iterative De Bruijn Graph De Novo Assembler for sequence assembly in bioinformatics. Most assemblers based on de Bruijn graph build a de Bruijn graph with a specific k to perform the assembling task.
ideas 1.05 This program is designed to segment genomes in multiple cell types simultaneously so as to better identify functional elements and epigenomic variation/conservation patterns, both globally and locally, across all cell types.
IGV 2.1.24, 2.3.40, 2.3.68, 2.3.95 The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
IGVTools 2.1.24, 2.3.68 The igvtools utility provides a set of tools for pre-processing data files.
IMAGE 2.33 IMAGE stands for Iterative Mapping and Assembly for Gap Elimination. It is software designed to close gaps in any draft assembly using Illumina paired end reads.
ImageMagick 6.7.8-9 ImageMagick® is a software suite to create, edit, compose, or convert bitmap images. It can read and write images in a variety of formats (over 100) including DPX, EXR, GIF, JPEG, JPEG-2000, PDF, PhotoCD, PNG, Postscript, SVG, and TIFF. Use ImageMagick to resize, flip, mirror, rotate, distort, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves.
impute2 2.3.2 IMPUTE2 is a computer program for phasing observed genotypes and imputing missing genotypes.
infernal 1.1, 1.1.1, 1.1rc1, 1.1rc2 Infernal ('INFERence of RNA ALignment') is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs).
inparanoid 4.1 The InParanoid program was developed at the Center for Genomics and Bioinformatics to address the need to identify orthologs. Homologs that originate from a speciation event are called orthologs and homologs that originate from a gene duplication event are called paralogs. If a duplication event predates the speciation event the paralogs are called outparalogs, and they can be present in different species. If instead an ortholog undergoes one or several duplication events, the resulting paralogs are called inparalogs, and they are co-orthologs to one or more orthologs in another species. Since an outparalog pair ought to have a more diversified function than inparalogs, it is useful to distinguish between the two. Furthermore, clustering inparalogs together allows proper identification of both one-to-one and many-to-many orthology cases.
interproscan 5.19, 5.22-61.0 InterProScan is the software package that allows sequences (protein and nucleic) to be scanned against InterPro's signatures. Signatures are predictive models, provided by several different databases, that make up the InterPro consortium.
io_lib 1.13.10 A fully developed set of DNA sequence assembly (Gap4 and Gap5), editing and analysis tools (Spin) for Unix, Linux, MacOSX and MS Windows.
iprscan 4.8-45, 5.2-44, 5.2-45, 5.4-47, 5.7-48, iprscan-5.10-50 InterProScan is a bioinformatics tool that provides a one-stop-shop for automated sequence analysis of both protein and nucleic acid, the latter via a full six-frame translation. It offers the ability to identify both structural and functional regions of interest, based upon methods and models that have been generated by a large number of member groups ('member databases').
isaac-aligner 01.14.04.17, 01.14.11.07 Isaac: Ultra-fast whole genome secondary analysis on Illumina sequencing platform
isaac-variantcaller 1.0.6 Isaac: Ultra-fast whole genome secondary analysis on Illumina sequencing platform
ITSx 1.0.11 ITSx is an open source software utility to extract the highly variable ITS1 and ITS2 subregions from ITS sequences, which is commonly used as a molecular barcode for e.g. fungi. As the inclusion of parts of the neighbouring, very conserved, ribosomal genes (SSU, 5S and LSU rRNA sequences) in the sequence identification process can lead to severely misleading results, ITSx identifies and extracts only the ITS regions themselves.
JAGS 3.4.0 JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation not wholly unlike BUGS. JAGS was written with three aims in mind:
java-32 1.7.0_71 Java is a programming language and computing platform first released by Sun Microsystems in 1995. There are lots of applications and websites that will not work unless you have Java installed, and more are created every day. Java is fast, secure, and reliable. From laptops to datacenters, game consoles to scientific supercomputers, cell phones to the Internet, Java is everywhere!
java 1.6.0_41, 1.7.0_07, 1.7.0_21, 1.7.0_55, 1.8.0_25, 1.8.0_65, 1.8.0_121 Java is a programming language and computing platform first released by Sun Microsystems in 1995. There are lots of applications and websites that will not work unless you have Java installed, and more are created every day. Java is fast, secure, and reliable. From laptops to datacenters, game consoles to scientific supercomputers, cell phones to the Internet, Java is everywhere!
jellyfish 1.1.6, 1.1.11, 2.2.6 Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. JELLYFISH can count k-mers quickly by using an efficient encoding of a hash table and by exploiting the 'compare-and-swap' CPU instruction to increase parallelism.
jemalloc 4.0.4 jemalloc is a general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support.
kallisto 0.42.4 kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.
khmer 1.3, 2.0 khmer is a library and suite of command line tools for working with DNA sequence. It is primarily aimed at short-read sequencing data such as that produced by the Illumina platform. khmer takes a k-mer-centric approach to sequence analysis, hence the name.
kmergenie 1.5854, 1.6950 KmerGenie estimates the best k-mer length for genome de novo assembly. Given a set of reads, KmerGenie first computes the k-mer abundance histogram for many values of k. Then, for each value of k, it predicts the number of distinct genomic k-mers in the dataset, and returns the k-mer length which maximizes this number. Experiments show that KmerGenie's choices lead to assemblies that are close to the best possible over all k-mer lengths.
kraken 0.10.5b Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts by other bioinformatics software to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.
KREATION 20170214 KREATION calculates the contribution of each assembly.
krona 2.2 Krona allows hierarchical data to be explored with zoomable HTML5 pie charts. Krona charts can be created using an Excel template or Krona Tools, which includes support for several bioinformatics tools and raw data formats.
lamarc 2.1.8 LAMARC is a program which estimates population-genetic parameters such as population size, population growth rate, recombination rate, and migration rates. It approximates a summation over all possible genealogies that could explain the observed sample, which may be sequence, SNP, microsatellite, or electrophoretic data. LAMARC and its sister program Migrate are successor programs to the older programs Coalesce, Fluctuate, and Recombine, which are no longer being supported. The programs are memory-intensive but can run effectively on workstations; we support a variety of operating systems.
lapack 3.5.0, 3.6.0 LAPACK is written in Fortran 90 and provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also provided, as are related computations such as reordering of the Schur factorizations and estimating condition numbers. Dense and banded matrices are handled, but not general sparse matrices. In all areas, similar functionality is provided for real and complex matrices, in both single and double precision.
last 278, 545 Finds similar regions between sequences.
lastz 1.02.00 LASTZ is a program for aligning DNA sequences, a pairwise aligner. Originally designed to handle sequences the size of human chromosomes and from different species, it is also useful for sequences produced by NGS sequencing technologies such as Roche 454.
libcrispr 1.0.1 A .crispr file is an open-source XML markup for describing clustered regularly interspaced short palindromic repeats (CRISPR). This includes information about the repeats, spacers and their arrangement.
libsbml 5.12.1 Systems Biology Markup Language (SBML), a free and open interchange format for computer models of biological processes.
libsbmlsim 1.1.0 LibSBMLSim is a library for simulating an SBML model which contains Ordinary Differential Equations (ODEs). LibSBMLSim provides a simple command-line tool and several APIs to load an SBML model, perform numerical integration (simulate) and export its results. Both explicit and implicit methods are supported by LibSBMLSim.
libstree 0.4.2 libstree is a generic suffix tree implementation, written in C. It can handle arbitrary data structures as elements of a string. Unlike most demo implementations, it is not limited to simple ASCII character strings.
libxml2 2.9.1 Libxml2 is the XML C parser and toolkit developed for the Gnome project (but usable outside of the Gnome platform), it is free software available under the MIT License.
libxslt 1.1.28 Libxslt is the XSLT C library developed for the GNOME project. XSLT itself is an XML language for defining transformations of XML.
llvm 4.0.0 The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
lobSTR 3.0.3 lobSTR is a tool for profiling Short Tandem Repeats (STRs) from high throughput sequencing data.
longranger 2.1.0, 2.1.2 Long Ranger is a set of analysis pipelines that processes Chromium sequencing output to align reads and call and phase SNPs, indels, and structural variants. There are five main pipelines, each triggered by a longranger command
loupe 2.1.0 Loupe is a genome browser designed to visualize the Linked-Read data produced by the 10x Chromium Platform. Loupe is named for a jeweler's loupe, which is used to inspect gems.
lp_solve 5.5.2 lp_solve is a free (see LGPL for the GNU lesser general public license) linear (integer) programming solver based on the revised simplex method and the Branch-and-bound method for the integers.
LR-TRIRLS 20060531 LR-TRIRLS stands for Logistic Regression with Truncated Regularized Iteratively Re-weighted Least Squares. This is our contribution to LR computation.
lua 5.1.4, 5.2.2 Lua is a powerful, fast, lightweight, embeddable scripting language. Lua combines simple procedural syntax with powerful data description constructs based on associative arrays and extensible semantics. Lua is dynamically typed, runs by interpreting bytecode for a register-based virtual machine, and has automatic memory management with incremental garbage collection, making it ideal for configuration, scripting, and rapid prototyping.
lumpy-sv 0.2.13 A probabilistic framework for structural variant discovery.
MACS 1.4.2, 1.4.2-1, 2.0.10, 2.1.0 Next generation parallel sequencing technologies made chromatin immunoprecipitation followed by sequencing (ChIP-Seq) a popular strategy to study genome-wide protein-DNA interactions, while creating challenges for analysis algorithms. We present Model-based Analysis of ChIP-Seq (MACS) on short reads sequencers such as Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites.
mafft 6.953, 7.130 MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <∼200 sequences), FFT-NS-2 (fast; for alignment of <∼30,000 sequences), etc.
maker 2.31.9 MAKER is a portable and easily configurable genome annotation pipeline.
mariadb 10.1.24 MariaDB Server is one of the most popular database servers in the world. It’s made by the original developers of MySQL and guaranteed to stay open source. Notable users include Wikipedia, WordPress.com and Google.
MaryGold 0.2 The package enables detection of sequence variation between metagenomic samples.
MaSuRCA 3.2.2_RC1 MaSuRCA is whole genome assembly software. It combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches
matlab r2010b, r2013a, r2013b, r2014b MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications.
mauve 2.3.1 Mauve is a system for efficiently constructing multiple genome alignments in the presence of large-scale evolutionary events such as rearrangement and inversion. Multiple genome alignment provides a basis for research into comparative genomics and the study of evolutionary dynamics. Aligning whole genomes is a fundamentally different problem than aligning short sequences.
MaxBin 2.0.1 MaxBin2 is the next generation of MaxBin (https://sourceforge.net/projects/maxbin/) that supports multiple samples at the same time. MaxBin is software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm.
maxcutpack 3.0 The software generates a phylogenetic tree using the Weighted Quartet MaxCut Algorithm.
mblast 1.4.2 MulticoreWare Inc. has implemented the BLAST algorithm with a parallelized, performance-optimized design. This parallelized implementation is called mBLAST.
mcl 12-068 The MCL algorithm is short for the Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs.
MCR r2013a The MATLAB Runtime is a standalone set of shared libraries that enables the execution of compiled MATLAB applications or components on computers that do not have MATLAB installed.
megahit 0.2.1 MEGAHIT is a single node assembler for large and complex metagenomics NGS reads, such as soil. It makes use of succinct de Bruijn graph (SdBG) to achieve low memory assembly.
MEGAN 4.70.4, 5.1.0 In metagenomics, the aim is to understand the composition and operation of complex microbial consortia in environmental samples through sequencing and analysis of their DNA. Similarly, metatranscriptomics and metaproteomics target the RNA and proteins obtained from such samples.
meme 4.10.0-1, 4.11.4_1 Motif-based sequence analysis tools
metabat 0.26.1 MetaBAT, An Efficient Tool for Accurately Reconstructing Single Genomes from Complex Microbial Communities
metageneannotator 1.0 MetaGeneAnnotator is a gene-finding program for prokaryote and phage.
MetaGeneMark 2010 Gene Prediction in Bacteria, Archaea, Metagenomes and Metatranscriptomes
metAMOS 1.2 metAMOS is an integrated assembly and analysis pipeline for metagenomic data. It is built around the Bambus2 metagenomic scaffolder and includes many current tools for assembly, gene finding, and taxonomic classification.
metaphlan 1.7.8 MetaPhlAn is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. MetaPhlAn relies on unique clade-specific marker genes identified from 3,000 reference genomes
metatrans 11-4-2014 Translate metagenomic reads
MetaVelvet-SL 1.0 MetaVelvet identifies shared nodes (named chimeric nodes) between two subgraphs and disconnects two subgraphs by splitting the shared nodes. To identify chimeric nodes, MetaVelvet uses a simple heuristics based on coverage difference and paired-end information.
MetaVelvet 1.2.02, 1.2.02-kmer245 MetaVelvet : An extension of Velvet assembler to de novo metagenome assembly from short sequence reads
mfinder 1.2 mfinder is a software tool for network motifs detection.
microbiome_helper 20170217 An assortment of scripts to help process and automate various microbiome and metagenomic bioinformatic tools. We provide workflows, tutorials and a virtual box image to help researchers process microbial data.
mira 3.2.1, 4.0.2 MIRA is a whole genome shotgun and EST sequence assembler for Sanger, 454, Solexa (Illumina), IonTorrent data and PacBio (the latter at the moment only CCS and error-corrected CLR reads)
miRDeep 20151029 miRDeep2 is a completely overhauled tool which discovers microRNA genes by analyzing sequenced RNAs. The tool reports known and hundreds of novel microRNAs with high accuracy in seven species representing the major animal clades. The low consumption of time and memory combined with user-friendly interactive graphic output makes miRDeep2 accessible for straightforward application in current research.
MOCAT 1.1 MOCAT is a package for analyzing metagenomics datasets. Currently MOCAT supports Illumina single- and paired-end reads in raw FastQ format. Using MOCAT you can, for example, generate taxonomic profiles of, and assemble, metagenomes.
ModalClust 0.2, 0.3 Modal Clust is a program that clusters genomic sequences to 3% similarity. Modal clust was originally a command in R and was modified by Maksim Sipos and Carl Yeoman for use in 454 pyrosequencing data processing.
mothur 1.8.0, 1.28.0, 1.31.2, 1.32.0, 1.32.1, 1.33.3, 1.38.1 This project seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community. We have incorporated the functionality of dotur, sons, treeclimber, s-libshuff, unifrac, and much more. In addition to improving the flexibility of these algorithms, we have added a number of other features including calculators and visualization tools.
MotifScan 6 Swan Motif Scanning Software
MP-EST 1.5 MP-EST estimates species trees from a set of gene trees by maximizing a pseudo-likelihood function. The input data of MP-EST are rooted binary gene trees produced by the maximum likelihood phylogenetic programs RAxML, PHYML, PHYLIP, and PAUP etc. In addition to the gene tree file, a control file must be generated for running MP-EST. The control file contains necessary parameters for running MP-EST.
mpich 3.0.4 MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard.
mpich2 1.4.1p1, 1.5 MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard.
mrmr 20090420 mRMR algorithm (minimum Redundancy Maximum Relevance)
msmbuilder 3.5.0 MSMBuilder is an application and python library. It builds statistical models for high-dimensional time-series. The particular focus of the package is on the analysis of atomistic simulations of biomolecular dynamics such as protein folding and conformational change.
msort 1.0 Msort: a better sort tool; sort text file rows by multiple fields
MSTMap 20151202 MSTMap is a software tool that is capable of constructing genetic linkage maps efficiently and accurately. It can handle various mapping populations including BC1, DH, Hap, and RIL, among others. The tool builds the genetic linkage map by first constructing a Minimum Spanning Tree (MST), and hence the name MSTMap.
MultiMarkdown 4.6 MultiMarkdown, or MMD, is a tool to help turn minimally marked-up plain text into well formatted documents, including HTML, PDF
multiqc 0.7 MultiQC is a tool to create a single report with interactive plots for multiple bioinformatics analyses across many samples.
multiz 2009-jan-21 Alignment software
MUMmer 3.23 MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. For example, MUMmer 3.0 can find all 20-basepair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer. MUMmer can also align incomplete genomes; it can easily handle the 100s or 1000s of contigs from a shotgun sequencing project, and will align them to another set of contigs or a genome using the NUCmer program included with the system. If the species are too divergent for a DNA sequence alignment to detect similarity, then the PROmer program can generate alignments based upon the six-frame translations of both input sequences.
muscle 3.8.31 MUSCLE is one of the best-performing multiple alignment programs according to published benchmark tests, with accuracy and speed that are consistently better than CLUSTALW. MUSCLE can align hundreds of sequences in seconds.
muTect 1.1.4 MuTect is a method developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes.
MutSigCV 1.4 MutSig stands for 'Mutation Significance'. MutSig analyzes lists of mutations discovered in DNA sequencing, to identify genes that were mutated more often than expected by chance given background mutation processes.
mytaxa 0.1.0 MyTaxa represents a new algorithm that extends the Average Amino Acid Identity (AAI) concept (Konstantinidis and Tiedje, PNAS 2005) to identify the taxonomic affiliation of a query genome sequence or a sequence of a contig assembled from a metagenome, including short sequences (e.g., 100-1,000nt long), and to classify sequences representing novel taxa at three levels (whenever possible), i.e., species, genus and phylum.
NAMD 2.10b2 NAMD, recipient of a 2002 Gordon Bell Award and a 2012 Sidney Fernbach Award, is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 200,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR. NAMD is distributed free of charge with source code. You can build NAMD yourself or download binaries for a wide variety of platforms. Our tutorials show you how to use NAMD and VMD for biomolecular modeling.
nanocorr 20160407 Error correction for Oxford Nanopore reads
nanocorrect 20150421 A prototype nanopore correction pipeline.
nanopolish 0.5.0, 0.6-dev, 0.6.0 A nanopore consensus algorithm using a signal-level hidden Markov model.
nco 4.4.2, 4.4.8 NCO manipulates data stored in netCDF-accessible formats, including DAP, HDF4, and HDF5. It also exploits the geophysical expressivity of many CF (Climate & Forecast) metadata conventions, the flexible description of physical dimensions translated by UDUnits, the network transparency of OPeNDAP, the storage features (e.g., compression, chunking, groups) of HDF (the Hierarchical Data Format), and many powerful mathematical and statistical algorithms of GSL (the GNU Scientific Library). NCO is fast, powerful, and free.
netcdf 4.3.1.1, 4.3.3.1 NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.
netlogo 5.3.1 NetLogo is a multi-agent programmable modeling environment.
networkx 1.9.1 networkX is a Python language software package for the creation, manipulation, and study of the structure, dynamics, and function of complex networks.
nextflow 0.15.6, 0.16.4, 0.17.3, 0.22.5, 0.22.6 Nextflow is a fluent DSL modelled around the UNIX pipe concept, that simplifies writing parallel and scalable pipelines in a portable manner.
ngsplot 2.61 ngs.plot is a program that allows you to easily visualize your next-generation sequencing (NGS) samples at functional genomic regions.
node 0.10.28 Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications.
novocraft 2.08, 2.08.02, 2.08.03, 3.00.01, 3.00.02, 3.00.05, 3.02 Powerful tool designed for mapping of short reads onto a reference genome from Illumina, Ion Torrent, and 454 NGS platforms.
numba 0.32.0 Numba is an Open Source NumPy-aware optimizing compiler for Python sponsored by Continuum Analytics, Inc. It uses the remarkable LLVM compiler infrastructure to compile Python syntax to machine code.
oases-kmer245 0.2.8 Oases is a de novo transcriptome assembler designed to produce transcripts from short read sequencing technologies, such as Illumina, SOLiD, or 454 in the absence of any genomic assembly.
oases 0.2.8 Oases is a de novo transcriptome assembler designed to produce transcripts from short read sequencing technologies, such as Illumina, SOLiD, or 454 in the absence of any genomic assembly.
OBITools 1.2.9 The OBITools package is a set of programs specifically designed for analyzing NGS data in a DNA metabarcoding context, taking into account taxonomic information.
ocaml 4.01 OCaml compiler
ODIN 0.4.1 ODIN is an HMM-based approach to detect and analyse differential peaks in pairs of ChIP-seq data. ODIN performs genomic signal processing, peak calling and p-value calculation in an integrated framework. ODIN is tailored for the comparison of two ChIP-seq signals without replicates.
OLB 1.9.4 adds environmental variables to qiime and qiime
oligotyping 1.4 a novel computational method that can help microbial ecologists to investigate concealed diversity at an extremely precise level within their closely related organisms by utilizing very subtle variations among 16S Ribosomal RNA gene pyro-tag sequences.
opam 1.1.1 Package manager for OCaml
openbabel 2.3.1 Open Babel is a chemical toolbox designed to speak the many languages of chemical data. It's an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.
OpenBLAS 0.2.14, 0.2.15, 0.2.19 OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13
openmpi-intel 1.6.3 The Open MPI Project is an open source MPI-2 implementation that is developed and maintained by a consortium of academic, research, and industry partners.
openmpi 1.4.3, 1.6.3, 1.8.8 The Open MPI Project is an open source MPI-2 implementation that is developed and maintained by a consortium of academic, research, and industry partners.
orthomcl 2.0.2, 2.0.7 OrthoMCL is a genome-scale algorithm for grouping orthologous protein sequences. It provides not only groups shared by two or more species/genomes, but also groups representing species-specific gene expansion families. So it serves as an important utility for automated eukaryotic genome annotation. OrthoMCL starts with reciprocal best hits within each genome as potential in-paralog/recent paralog pairs and reciprocal best hits across any two genomes as potential ortholog pairs. Related proteins are interlinked in a similarity graph. Then MCL (Markov Clustering algorithm, Van Dongen 2000; www.micans.org/mcl) is invoked to split mega-clusters. This process is analogous to the manual review in COG construction. MCL clustering is based on weights between each pair of proteins, so to correct for differences in evolutionary distance the weights are normalized before running MCL.
pagan 20150723 PAGAN is a general-purpose method for the alignment of sequence graphs. PAGAN is based on the phylogeny-aware progressive alignment algorithm and uses graphs to describe the uncertainty in the presence of characters at certain sequence positions. However, graphs also allow describing the uncertainty in input sequences and modelling e.g. homopolymer errors in Roche 454 reads, or representing inferred ancestral sequences against which other sequences can then be aligned. PAGAN is still under development and will hopefully evolve to an easy-to-use, general-purpose method for phylogenetic sequence alignment.
pagit 1.64 Tools to generate automatically high quality sequence by ordering contigs, closing gaps, correcting sequence errors and transferring annotation.
pal2nal 14 PAL2NAL is a program that converts a multiple sequence alignment of proteins and the corresponding DNA (or mRNA) sequences into a codon alignment. The program automatically assigns the corresponding codon sequence even if the input DNA sequence has mismatches with the input protein sequence, or contains UTRs, polyA tails.
paml 4.4, 4.8 PAML is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. It is maintained and distributed for academic use free of charge by Ziheng Yang.
pandaseq 2.8, 2.10 a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence.
panther 1.03 score protein sequences against the entire PANTHER HMM library and analyze your sequences
parallel-netcdf 1.4.1 Parallel netCDF (PnetCDF) is an I/O library that supports data access to netCDF files in parallel, a collaborative work of Northwestern University and Argonne National Laboratory.
parallel 20150322 GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input.
pathway-tools 16.5 Pathway Tools is a comprehensive systems-biology software system that is associated with the BioCyc database collection and supports several use cases in bioinformatics.
PBJelly 12.9.14 PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in fasta format) to high-confidence draft assemblies. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes. Each step in PBJelly’s workflow can be run on a cluster, thus parallelizing the gap filling process for rapid turn around, even for very large eukaryotic genomes.
pbzip2 1.1.12 PBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines. The output of this version is fully compatible with bzip2 v1.0.2 or newer (ie: anything compressed with pbzip2 can be decompressed with bzip2). PBZIP2 should work on any system that has a pthreads compatible C++ compiler (such as gcc). It has been tested on: Linux, Windows (cygwin & MinGW), Solaris, Tru64/OSF1, HP-UX, OS/2, OSX, and Irix.
pb_calibration 10.21, 10.22 Filter and calibration programs for Illumina sequencing data (in BAM files).
pcre 8.37 The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. PCRE has its own native API, as well as a set of wrapper functions that correspond to the POSIX regular expression API. The PCRE library is free, even for building proprietary software.
peaksplitter 1.0 Subdivision of ChIP-seq/ChIP-chip regions into discrete signal peaks
pear 0.9.5 an ultrafast, memory-efficient and highly accurate pair-end read merger. It is fully parallelized and can run with as low as just a few kilobytes of memory.
PennCNV 1.0.3 PennCNV is a free software tool for Copy Number Variation (CNV) detection from SNP genotyping arrays. Currently it can handle signal intensity data from Illumina and Affymetrix arrays. With appropriate preparation of file format, it can also handle other types of SNP arrays and oligonucleotide arrays.
penncnv 1.0.3 PennCNV is a free software tool for Copy Number Variation (CNV) detection from SNP genotyping arrays. Currently it can handle signal intensity data from Illumina and Affymetrix arrays. With appropriate preparation of file format, it can also handle other types of SNP arrays and oligonucleotide arrays.
perfsuite 1.0.0 PerfSuite is a collection of tools, utilities, and libraries for software performance analysis where the primary design goals are ease of use, comprehensibility, interoperability, and simplicity. This software can provide a good "entry point" for more detailed performance analysis and can help point the way towards selecting other tools and/or techniques using more specialized software if necessary (for example, tools/libraries from academic research groups or third-party commercial software).
perl 5.14.2, 5.16.1, 5.20.1 Perl 5 is a highly capable, feature-rich programming language with over 25 years of development. Perl 5 runs on over 100 platforms from portables to mainframes and is suitable for both rapid prototyping and large scale development projects.
pfamscan 1.0 PFamscan
PfamScan 1.5 search a FASTA file against a library of Pfam HMMs
phantompeakqualtools 1.1 This package computes quick but highly informative enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data. It can also be used to obtain robust estimates of the predominant fragment length or characteristic tag shift values in these assays.
phast 1.3 PHAST is a freely available software package for comparative and evolutionary genomics. It consists of about half a dozen major programs, plus more than a dozen utilities for manipulating sequence alignments, phylogenetic trees, and genomic annotations (see left panel). For the most part, PHAST focuses on two kinds of applications: the identification of novel functional elements, including protein-coding exons and evolutionarily conserved sequences; and statistical phylogenetic modeling, including estimation of model parameters, detection of signatures of selection, and reconstruction of ancestral sequences.
phispy 2.3 PhiSpy: A novel algorithm for finding prophages in microbial genomes that combines similarity-based and composition-based strategies
phred 0.020425.c Phred is a base-calling program for DNA sequence traces. Phred reads DNA sequence chromatogram files and analyzes the peaks to call bases, assigning quality scores (Phred scores) to each base call.
phyla_amphora 1.0 A Phylum-specific Automated Phylogenomic Inference Pipeline for Bacterial Sequences.
phylip 3.69 PHYLIP (the PHYLogeny Inference Package) is a package of programs for inferring phylogenies (evolutionary trees). Methods that are available in the package include parsimony, distance matrix, and likelihood methods, including bootstrapping and consensus trees. Data types that can be handled include molecular sequences, gene frequencies, restriction sites and fragments, distance matrices, and discrete characters.
phylocsf may-16-2014 Phylogenetic analysis of multi-species genome sequence alignments to identify conserved protein-coding region
phylophlan 1.1.0 PhyloPhlAn is a computational pipeline for reconstructing highly accurate and resolved phylogenetic trees based on whole-genome sequence information. The pipeline is scalable to thousands of genomes and uses the most conserved 400 proteins for extracting the phylogenetic signal. PhyloPhlAn also implements taxonomic curation, estimation, and insertion operations.
phylosift 1.0.1 PhyloSift is a suite of software tools to conduct phylogenetic analysis of genomes and metagenomes. Using a reference database of protein sequences, PhyloSift can scan new sequences against that database for homologs and identify the phylogenetic relationship of the new sequence to the database sequences. During this procedure, high quality alignments of codon and
phymmbl 4.0 Phymm, a new classification approach for metagenomics data which uses interpolated Markov models (IMMs) to taxonomically classify DNA sequences, can accurately classify reads as short as 100 bp. Its accuracy for short reads represents a significant leap forward over previous composition-based classification methods. PhymmBL (rhymes with "thimble"), the hybrid classifier included in this distribution which combines analysis from both Phymm and BLAST, produces even higher accuracy.
picard-tools 1.34, 1.73, 1.90, 1.131, 1.141, 2.1.0, 2.4.1 A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats.
picrust 1.0.0, devel PICRUSt (pronounced 'pie crust') is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.
pilon 1.21, 1.22 Pilon is a software tool which can be used to: Automatically improve draft assemblies and Find variation among strains, including large event detection
platanus 1.2.1 Platanus: PLATform for Assembling NUcleotide Sequences
plink 1.90beta3, 1.90beta This is a comprehensive update to Shaun Purcell's PLINK command-line program, developed by Christopher Chang with support from the NIH-NIDDK's Laboratory of Biological Modeling, the Purcell Lab at Mount Sinai School of Medicine, and others.
poa 2.0 POA is Partial Order Alignment, a fast program for multiple sequence alignment in bioinformatics. Its advantages are speed, scalability, sensitivity, and the superior ability to handle branching / indels in the alignment.
polyphen 2.2.2 PolyPhen-2 (Polymorphism Phenotyping v2) is a software tool which predicts possible impact of amino acid substitutions on the structure and function of human proteins using straightforward physical and evolutionary comparative considerations.
poretools 0.5.1, 0.6.0 a toolkit for working with nanopore sequencing data from Oxford Nanopore.
pplacer 1.0, 1.1 Pplacer places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment. Pplacer is designed to be fast, to give useful information about uncertainty, and to offer advanced visualization and downstream analysis.
prank 121002, 150803 PRANK is a probabilistic multiple alignment program for DNA, codon and amino-acid sequences. It’s based on a novel algorithm that treats insertions correctly and avoids over-estimation of the number of deletion events. In addition, PRANK borrows ideas from maximum likelihood methods used in phylogenetics and correctly takes into account the evolutionary distances between sequences. Lastly, PRANK allows for defining a potential structure for sequences to be aligned and then, simultaneously with the alignment, predicts the locations of structural units in the sequences.
prinseq 0.20.4 PRINSEQ can be used to filter, reformat, or trim your genomic and metagenomic sequence data. It generates summary statistics of your sequences in graphical and tabular format. It is easily configurable and provides a user-friendly interface.
prodigal 2.0, 2.2, 2.6.2 Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program developed at Oak Ridge National Laboratory and the University of Tennessee
proj 4.8.0 Program proj (release 3) is a standard Unix filter function which converts geographic longitude and latitude coordinates into cartesian coordinates, (λ,φ)→(x,y), by means of a wide variety of cartographic projection functions.
prokka 1.10 Prokka is a software tool for the rapid annotation of prokaryotic genomes. A typical 4 Mbp genome can be fully annotated in less than 10 minutes on a quad-core computer, and scales well to 32 core SMP systems. It produces GFF3, GBK and SQN files that are ready for editing in Sequin and ultimately submitted to GenBank/DDBJ/ENA.
pydnase 0.1.7 a library for analyzing DNase-seq data
pyfasta 0.5.2 Stores a flattened version of the fasta file without spaces or headers and uses either a mmap of numpy binary format or fseek/fread so the sequence data is never read into memory. Saves a pickle (.gdx) of the start, stop (for fseek/mmap) locations of each header in the fasta file for internal use.
pyfftw 0.10.4 pyFFTW is a pythonic wrapper around FFTW 3, the speedy FFT library. The ultimate aim is to present a unified interface for all the possible transforms that FFTW can perform.
pygraphviz 1.3rc1 PyGraphviz is a Python interface to the Graphviz graph layout and visualization package. With PyGraphviz you can create, edit, read, write, and draw graphs using Python to access the Graphviz graph data structure and layout algorithms.
pylearn2 201601 Pylearn2 is a machine learning library. Most of its functionality is built on top of Theano.
pypy 4.0.1 PyPy is a fast, compliant alternative implementation of the Python language (2.7.10 and 3.2.5). It has several advantages and distinct features:
pyqt 4.11.2 PyQt is the Python bindings for Digia's Qt cross-platform application development framework.
pyrad 3.0.66 The benefit of pyRAD over most alternative methods for analyzing RADseq-like data comes in its use of an alignment-clustering method (vsearch) that allows for the inclusion of indel variation which improves identification of homology across highly divergent samples. For this reason pyRAD is commonly employed for RADseq studies at deeper phylogenetic scales, however, it works equally well at shallow scales.
pysam 0.8.0, 0.8.4, 0.9.1.4 Pysam is a python module for reading and manipulating Samfiles. It's a lightweight wrapper of the samtools C-API
python 2.6.6, 2.7.3, 2.7.6, 2.7.9, 3.4.1 Python is a programming language that lets you work quickly and integrate systems more effectively.
qiime 1.3.0, 1.5.0, 1.6.0, 1.7.0, 1.8, 1.9.0, 1.9.1 QIIME is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. QIIME is designed to take users from raw sequencing data generated on the Illumina or other platforms through publication quality graphics and statistics. This includes demultiplexing and quality filtering, OTU picking, taxonomic assignment, and phylogenetic reconstruction, and diversity analyses and visualizations. QIIME has been applied to studies based on billions of sequences from tens of thousands of samples.
qt 4.8.6 Qt is a cross-platform application and UI framework for developers using C++ or QML, a CSS & JavaScript like language
quake 0.3.4 Quake is a package to correct substitution sequencing errors in experiments with deep coverage (e.g. >15X), specifically intended for Illumina sequencing reads.
quast 2.2, 2.3 QUAST evaluates genome assemblies. It can work both with and without a given reference genome. The tool accepts multiple assemblies, and is thus suitable for comparison.
quest alpha, devel adds environmental variables for EFI QUEST development branch
R 2.15.0, 2.15.1, 2.15.2, 2.15.3, 3.0.0, 3.0.2, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.3, 3.3.3, experimental R is a free software environment for statistical computing and graphics.
RACA 0.9.1.1 Reference-Assisted Chromosome Assembly
radtools 1.2.4 Tools for processing RAD Sequencing Illumina reads.
ragout 2.0 Ragout (Reference-Assisted Genome Ordering UTility) is a tool for chromosome assembly using multiple references. Given a set of assembly fragments (contigs/scaffolds) and one or multiple related references (complete or draft), it produces a chromosome-scale assembly (as a set of scaffolds).
rapsearch2 2.22 Reduced Alphabet based Protein similarity Search
RAxML 7.3.0 RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees. It was originally derived from fastDNAml, which in turn was derived from Joe Felsenstein’s dnaml, which is part of the PHYLIP package.
ray 2.20, 2.30 Ray -- Parallel genome assemblies for parallel DNA sequencing
Rcorrector 1.0.1 Rcorrector (RNA-seq error CORRECTOR) is a kmer-based error correction method for RNA-seq data.
rdp_classifier 2.5 The RDP Classifier is a naive Bayesian classifier that can rapidly and accurately provide taxonomic assignments from domain to genus, with confidence estimates for each assignment.
rdxplorer 3.2 The RDXplorer (Read Depth eXplorer) is a computational tool for copy number variants (CNV) detection in whole human genome sequence data using read depth (RD) coverage. CNV detection is based on the Event-Wise Testing (EWT) algorithm recently published by our group (see Publications). The read depth coverage is estimated in non-overlapping intervals (100bp Windows) across an individual genome based on the pileup generated by SAMTools.
repeatmasker 3.28, 4.0.5 RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences
RFMix 1.5.4 A discriminative method for local ancestry inference
rMATS 3.0.9 MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data. The statistical model of MATS calculates the P-value and false discovery rate that the difference in the isoform ratio of a gene between two conditions exceeds a given user-defined threshold.
rnaQUAST 1.1.0 Homepage:
rockhopper 1.2, 2.0.2 Rockhopper is a comprehensive and user-friendly system for computational analysis of bacterial RNA-seq data. As input, Rockhopper takes RNA sequencing reads output by high-throughput sequencing technology (FASTQ, QSEQ, FASTA, SAM, or BAM files)
root 6.06 A modular scientific software framework. It provides all the functionalities needed to deal with big data processing, statistical analysis, visualisation and storage.
rsa-tools 2012-10-09 RSAT: Regulatory Sequence Analysis Tools
RSEM 1.2.29, 1.2.31 RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package provides a user-friendly interface, supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads and RSPD estimation.
rstudio 0.97.312, 0.98.1102 RStudio is a set of integrated tools designed to help you be more productive with R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
rtg-tools 3.6.2 RTG Tools contains utilities to easily manipulate and accurately compare multiple VCF files, as well as utilities for processing other common NGS data formats.
ruby 1.9.3, 2.3.1 Ruby is a language of careful balance. Its creator, Yukihiro “Matz” Matsumoto, blended parts of his favorite languages (Perl, Smalltalk, Eiffel, Ada, and Lisp) to form a new language that balanced functional programming with imperative programming.
s3cmd 1.5.2 S3cmd is a free command line tool and client for uploading, retrieving and managing data in Amazon S3 and other cloud storage service providers that use the S3 protocol, such as Google Cloud Storage or DreamHost DreamObjects. It is best suited for power users who are familiar with command line programs. It is also ideal for batch scripts and automated backup to S3, triggered from cron, etc.
salmon 0.4.2, 0.6.0, 0.7.1, 0.7.2, 0.8.2 Salmon is a wicked-fast program to produce highly accurate, transcript-level quantification estimates from RNA-seq data.
sambamba 0.6.3 Sambamba is a high-performance, modern, robust and fast tool (and library), written in the D programming language, for working with SAM and BAM files. Current parallelised functionality is an important subset of samtools functionality, including view, index, sort, markdup, and depth.
samblaster 0.1.22 samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. When marking duplicates, samblaster will require approximately 20MB of memory per 1M read pairs.
samtools 0.1.16, 0.1.18, 0.1.19, 1.0, 1.2, 1.3.1 Samtools is a suite of programs for interacting with high-throughput sequencing data. It consists of three separate repositories
sam_comp 0.7 This is a simple arithmetic coding based compressor for the SAM and BAM (DNA sequence alignment) file format.
scaffold-builder 2.1 The abundance of repeat elements in genomes can impede the assembly of a single sequence. The tool Scaffold_builder was designed to generate scaffolds (super contigs of sequences joined by N-bases) using the homology provided by a closely related reference sequence.
scipy 0.15.1 Python-based ecosystem of open-source software for mathematics, science, and engineering.
scons 2.3.4 SCons is an Open Source software construction tool, that is, a next-generation build tool.
selscan 1.1.0 selscan currently implements EHH, iHS, XP-EHH, and nSL, and requires phased data. It should be run separately for each chromosome and population (or population pair for XP-EHH).
SHAPEIT v2r837 SHAPEIT is a fast and accurate method for estimation of haplotypes (aka phasing) from genotype or sequencing data.
shortbred 0.9.4 ShortBRED is a pipeline to take a set of protein sequences, reduce them to a set of unique identifying strings ('markers'), and then search for these markers in metagenomic data and determine the presence and abundance of the protein families of interest.
shotmap 11-4-2014 Shotmap is a software workflow that functionally annotates and compares shotgun metagenomes
SICER 1.1 A clustering approach for identification of enriched domains from histone modification ChIP-Seq data
signalp 4.1 SignalP 4.1 server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks.
silix 1.2.9 The software package SiLiX implements a new algorithm for the clustering of homologous sequences, based on single transitive links (single linkage) with alignment coverage constraints.
sip 4.16.3 SIP is a tool that makes it very easy to create Python bindings for C and C++ libraries. It was originally developed to create PyQt, the Python bindings for the Qt toolkit, but can be used to create bindings for any C or C++ library.
siphy 0.5 SiPhy implements rigorous statistical tests to detect bases under selection from multiple alignment data
smrtanalysis 1.4.0, 2.0.1, 2.3.0 The SMRT Analysis software suite performs assembly and variant detection analysis of sequencing data generated by the Pacific Biosciences instrument.
snap 1.0beta.18, 1.0dev.96 SNAP is a new sequence aligner that is 3-20x faster and just as accurate as existing tools like BWA-mem, Bowtie2 and Novoalign. It runs on commodity x86 processors, and supports a rich error model that lets it cheaply match reads with more differences from the reference than other tools.
SNAP 20131129 (Semi-HMM-based Nucleic Acid Parser) gene prediction tool
SnpEff 3.2, 3.3e, 4.2 Genetic variant annotation and effect prediction toolbox. It annotates and predicts the effects of variants on genes (such as amino acid changes).
snphylo 20160204 A phylogenetic tree is a useful tool for inferring evolutionary relationships among organisms, so such trees have been used in many evolutionary studies. Consequently, phylogenetic trees based on SNP data have been determined in resequencing projects. However, there was no simple way to determine a phylogenetic tree from the huge number of variants found in resequencing data. The SNPhylo pipeline was therefore developed to construct phylogenetic trees based on SNP data. With this pipeline, a user can construct a phylogenetic tree from a file containing a large amount of SNP data.
SOAP2 2.2 SOAPaligner/soap2 is a member of the SOAP (Short Oligonucleotide Analysis Package). It is an updated version of SOAP software for short oligonucleotide alignment. The new program features super fast and accurate alignment for huge amounts of short reads generated by Illumina/Solexa Genome Analyzer. Compared to soap v1, it is one order of magnitude faster.
SOAPdenovo-Trans 1.02, 1.03 SOAPdenovo-Trans is a de novo transcriptome assembler based on the SOAPdenovo framework, adapted to alternative splicing and different expression levels among transcripts. The assembler provides a more accurate, complete and faster way to construct the full-length transcript sets.
SOAPdenovo 2.04 SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads. It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way. The new version, SOAPdenovo2, has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes.
SOAPdenovo2 r240 SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads. It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way. The new version, SOAPdenovo2, has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes.
SOAPec 2.01, 2.02 The read correction package is a short-read correction tool and part of SOAPdenovo. It is specially designed to correct Illumina GA short reads.
SOAPErrorCorrection 0.04 Another correction tool for SOAPdenovo
SOAPGapCloser 1.12 The GapCloser is designed to close the gaps emerging during the scaffolding process by SOAPdenovo or other assemblers, using the abundant pair relationships of short reads.
SOAPprepare 0.1 The Data Preparation Module generates the data SOAPdenovo needs to run the 'map' and 'scaff' steps from contigs generated by SOAPdenovo or other assemblers with various k-mer lengths.
SolexaQA 3.1.2 SolexaQA calculates sequence quality statistics and creates visual representations of data quality for second-generation sequencing data.
SPADA 20150512 SPADA is a computational pipeline that, provided a multiple sequence alignment for a gene/protein family of interest, identifies all members of this family in a target genome sequence. This pipeline comes with manually-curated protein sequence alignments for all Cysteine-Rich Peptide families in plant genomes.
spades 3.0.0, 3.1.1, 3.5.0, 3.6.2, 3.7.1, 3.10.0, 3.10.1 SPAdes (St. Petersburg genome assembler) is intended for both standard isolates and single-cell MDA bacteria assemblies.
speciateit 184 This is a beta version of a speciation pipeline for 16S rRNA amplicon data. In principle, the pipeline can be used for speciation based on any highly preserved gene.
speciate_it 184 This is a beta version of a speciation pipeline for 16S rRNA amplicon data. In principle, the pipeline can be used for speciation based on any highly preserved gene.
spine 0.1.2 Spine identifies a core genome from input genomic sequences. Sequences are aligned using Nucmer and regions found to be in common between all or a user-defined subset of genomes will be returned.
SPINGO 1.1 SPecies level IdentificatioN of metaGenOmic amplicons (alternatively: an olde English word meaning ‘strong beer’)
sratoolkit 2.1.16, 2.3.5-2, 2.5.1, 2.5.7, 2.8.1 The NCBI SRA Toolkit enables reading (dumping) of sequencing files from the SRA database and writing (loading) files into the .sra format (Note that this is not required for submission).
srna-tools 20130118 A variety of tools for the analysis of high-throughput small RNA data.
ssaha2 2.5.5 SSAHA is a software tool for very fast matching and alignment of DNA sequences. It achieves its fast search speed by converting sequence information into a 'hash table' data structure, which can then be searched very rapidly for matches.
SSPACE-premium 2.3.1 SSPACE is a script able to extend and scaffold pre-assembled contigs using one or more mate pairs or paired end libraries, or even a combination.
SSPACE 1.1 SSPACE is a script able to extend and scaffold pre-assembled contigs using one or more mate pairs or paired end libraries, or even a combination.
stacks 0.9996, 0.99994, 1.2.0, 1.40, 1.44 Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.
STAR-MP 1.0 Species Tree informed Architecture Reconstruction - Maximum Parsimony
STAR 2.4.0h1, 2.4.2a, 2.5.0a, 2.5.1b STAR: Spliced Transcripts Alignment to a Reference
STEM-hy 1.0 STEM-hy is a program for inferring maximum likelihood species trees from a collection of estimated gene trees under the coalescent model.
stringtie 1.2.4, 1.3.3 StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. In order to identify differentially expressed genes between experiments, StringTie's output can be processed by specialized software like Ballgown, Cuffdiff or other programs (DESeq2, edgeR, etc.).
structure 2.3.4 The program structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed.
stubb 2.1
subread 1.3.6, 1.4.5-p1, 1.4.6-p1, 1.4.6-p4, 1.5.0 The Subread package comprises a suite of software programs for processing next-gen sequencing read data including:
Sunflower 1.1.0 Sunflower models the simultaneous binding of transcription factors to DNA. It uses a hidden Markov model that resembles a sunflower.
supernova 1.0.0, 1.1.0, 1.1.2, 1.1.5, 1.2.0 Supernova is a software package for de novo assembly from Chromium Linked-Reads that are made from a whole-genome library of an individual DNA source.
supfam 1.75 SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes.
SVDetect r0.7m SVDetect is an application for the isolation and type prediction of intra- and inter-chromosomal rearrangements from paired-end/mate-pair sequencing data provided by high-throughput sequencing technologies
svtools 0.2.0 svtools is a suite of utilities designed to help bioinformaticians construct and explore cohort-level structural variation calls. It is designed to efficiently merge and genotype calls from speedseq sv across thousands to tens of thousands of genomes
svtyper 0.0.4 Bayesian genotyper for structural variants
swig 3.0.8 SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages. SWIG is used with different types of target languages including common scripting languages such as Javascript, Perl, PHP, Python, Tcl and Ruby.
tabix 0.2.6 Tabix indexes a TAB-delimited genome position file in.tab.bgz and creates an index file in.tab.bgz.tbi when region is absent from the command-line.
tabview 1.4.2 View a CSV file in a spreadsheet-like display.
targetp 1.1 Secretory signal peptides, mitochondrial targeting peptides and chloroplast transit peptides in eukaryotes.
tasr 1.0 TASR is a bioinformatic pipeline that can annotate transposable elements using siRNA mapping
tassel 3.0, 4.0, 5.0 While TASSEL has changed considerably since its initial public release in 2001, its primary function continues to be providing tools to investigate the relationship between phenotypes and genotypes
tbb 4.2-3, 4.3, 4.4 Intel Threading Building Blocks (Intel TBB) lets you easily write parallel C++ programs that take full advantage of multicore performance, that are portable and composable, and that have future-proof scalability.
tbl2asn 20170612 Tbl2asn is a command-line program that automates the creation of sequence records for submission to GenBank. It uses many of the same functions as Sequin but is driven generally by data files. Tbl2asn generates .sqn files for submission to GenBank. Additional manual editing is not required before submission.
tcoffee 10-r1613 A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures
texinfo 6.1 Texinfo is the official documentation format of the GNU project. It was invented by Richard Stallman and Bob Chassell many years ago, loosely based on Brian Reid's Scribe and other formatting languages of the time. It is used by many non-GNU projects as well.
texlive 2015 TeX Live is an easy way to get up and running with the TeX document production system. It provides a comprehensive TeX system with binaries for most flavors of Unix, including GNU/Linux, and also Windows. It includes all the major TeX-related programs, macro packages, and fonts that are free software, including support for many languages around the world.
Theano 0.7.0 Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
THOR 0.1 THOR is an HMM-based approach to detect and analyze differential peaks in two sets of ChIP-seq data from distinct biological conditions with replicates. THOR performs genomic signal processing, peak calling and p-value calculation in an integrated framework.
tmhmm 2.0 Prediction of transmembrane helices in proteins
tophat 1.4.1 TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
tophat2 2.0.4, 2.0.5, 2.0.6, 2.0.7, 2.0.8, 2.0.10, 2.0.13, 2.1.1 TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
transdecoder 2.0.1 TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
transrate 1.0.1, 1.0.3 Transrate analyses a transcriptome assembly in three key ways:
treemix 1.12 TreeMix is a method for inferring the patterns of population splits and mixtures in the history of a set of populations. In the underlying model, the modern-day populations in a species are related to a common ancestor via a graph of ancestral populations. We use the allele frequencies in the modern populations to infer the structure of this graph.
trf 4.04 Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences.
trimmomatic 0.22, 0.27, 0.30, 0.32, 0.33, 0.36 Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-ended data.
trim_galore 0.3.7, 0.4.1 A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisulfite-Seq) libraries.
trinityrnaseq-intel r2013-02-25, r2014-04-13, r2014-07-17 Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.
trinityrnaseq 2.0.6, 2.1.1, 2.2.0, 2.4.0, r2012-06-08, r2013-02-25, r2013-08-14, r2014-04-13, r2014-07-17 Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.
trinotate 2.0.2, 3.0.1, r20130826, r20140708 Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms.
tRNAscan-SE 1.23 tRNAscan-SE detects ~99% of eukaryotic nuclear or prokaryotic tRNA genes, with a false positive rate of less than one per 15 gigabases, and with a search speed of about 30 kb/second.
truesight 0.06 Self-training Algorithm for Splice Junction Detection using RNA-seq.
uchime 4.2.40 UCHIME is an algorithm for detecting chimeric sequences. It is implemented in the uchime_ref and uchime_denovo commands.
ucsc 20130806, v312 This directory contains Genome Browser and Blat application binaries built for standalone command-line use on various supported Linux and UNIX platforms.
udunits 2.1.24 The UDUNITS package supports units of physical quantities. Its C library provides for arithmetic manipulation of units and for conversion of numeric values between compatible units. The package contains an extensive unit database, which is in XML format and user-extendable. The package also contains a command-line utility for investigating units and converting values.
usearch 4.2.66, 5.2.32, 5.2.236, 6.0.307, 6.1.544, 7.0.959, 7.0.1090, 8.0.1517 USEARCH is a unique sequence analysis tool with thousands of users world-wide. USEARCH offers search and clustering algorithms that are often orders of magnitude faster than BLAST.
USeq 8.5.1 USeq is a collection of software tools for both low and high level analysis of next generation, ultra high throughput signature sequencing data from the Solexa, SOLiD, and 454 platforms.
vcftools 0.1.7, 0.1.11, 0.1.12b, 0.1.13, 0.1.14 VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.
velvet-kmer245 1.2.10 Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.
velvet 1.1.04, 1.2.08, 1.2.10 Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.
VelvetOptimiser 2.2.5 VelvetOptimiser is a multi-threaded Perl script for automatically optimising the three primary parameter options (K, -exp_cov, -cov_cutoff) for the Velvet de novo sequence assembler.
ViennaRNA 2.1.9 The ViennaRNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.
vim 7.4 Vim is an advanced text editor that seeks to provide the power of the de-facto Unix editor 'Vi', with a more complete feature set. It's useful whether you're already using vi or using a different editor.
virsorter 1.0.3 VIRSorter is a pipeline designed to mine microbial draft genomes for viral signal (complete viral contigs or viral regions within microbial contigs)
virushunting 0.1, 0.2, 0.3, 0.4 Virus Hunting Pipeline Homepage:
vsearch 1.0.7, 2.0.4, 2.4.0 VSEARCH stands for vectorized search, as the tool takes advantage of parallelism in the form of SIMD vectorization as well as multiple threads to perform accurate alignments at high speed.
weaver 0.20 Allele-specific, base-pair resolution quantification of structural variations in cancer genomes
weblogo 2.8.2, 3.3 WebLogo is a web based application designed to make the generation of sequence logos as easy and painless as possible.
wessim 1.0 A whole-exome sequencing simulator based on in silico exome capture
wgs 7.0 This tool provides sequence similarity searching against the EMBL (WGS) database using the FASTA suite of programs.
wise 2.2.3-rc7 Wise2 has four main executable programs using sequence inputs which are designed to provide access to the main algorithms sensibly. The algorithms you are most likely interested in are genewise, which compares protein information to genomic DNA, and estwise, which compares protein information to EST/cDNA sequences.
xenome 1.0.1-r A tool for classifying reads from xenograft samples. Xenograft sequencing has many associated difficulties. Shotgun sequence read data derived from xenograft material contains a mixture of reads arising from the host and reads arising from the graft. Xenome is an application for classifying the read mixture to separate the two, allowing for more precise analysis to be performed.
xerces-c 3.1.2 Xerces-C++ is a validating XML parser written in a portable subset of C++. Xerces-C++ makes it easy to give your application the ability to read and write XML data. A shared library is provided for parsing, generating, manipulating, and validating XML documents. Xerces-C++ is faithful to the XML 1.0 and 1.1 recommendations and many associated standards.
xz 5.2.2 XZ Utils is free general-purpose data compression software with a high compression ratio. XZ Utils were written for POSIX-like systems, but also work on some not-so-POSIX systems. XZ Utils are the successor to LZMA Utils.
yasm 1.2.0 Yasm is a complete rewrite of the NASM assembler under the 'new' BSD License (some portions are under other licenses, see COPYING for details). Yasm currently supports the x86 and AMD64 instruction sets, accepts NASM and GAS assembler syntaxes, outputs binary, ELF32, ELF64, 32 and 64-bit Mach-O, RDOFF2, COFF, Win32, and Win64 object formats, and generates source debugging information in STABS, DWARF 2, and CodeView 8 formats
zlib 1.2.8 zlib is designed to be a free, general-purpose, legally unencumbered -- that is, not covered by any patents -- lossless data-compression library for use on virtually any computer hardware and operating system. The zlib data format is itself portable across platforms.