Here is the list of available command-line software. Most of these tools are installed on GenoToul under the path /usr/local/bioinfo/src.
alignAceAlignACE (Aligns Nucleic Acid Conserved Elements) is a program which finds sequence elements conserved in a set of DNA sequences.
blatThe BLAST-Like Alignment Tool: similarity search in databanks. BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 25 bases or more. BLAT on proteins finds sequences of 80% and greater similarity of length 20 amino acids or more.
bowtieBowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
Bowtie2Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
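As a rough illustration of the Burrows-Wheeler / FM-index idea mentioned for Bowtie and Bowtie 2, the toy Python sketch below builds the transform of a short text by sorting its rotations and counts exact occurrences of a pattern with backward search. It is not Bowtie's implementation: the function names (`bwt`, `count_occurrences`) are illustrative only, and real indexes use precomputed, sampled occurrence tables instead of the linear scans used here.

```python
def bwt(text):
    """Return the Burrows-Wheeler transform of text (a '$' sentinel is appended)."""
    text += "$"
    # Sort all cyclic rotations and keep the last character of each.
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_text, pattern):
    """Count exact matches of pattern using FM-index style backward search."""
    # first_pos[c] = number of characters in the text strictly smaller than c.
    first_pos = {}
    total = 0
    for c in sorted(set(bwt_text)):
        first_pos[c] = total
        total += bwt_text.count(c)

    def occ(c, i):
        # Occurrences of c in bwt_text[:i]; a real index precomputes this.
        return bwt_text[:i].count(c)

    lo, hi = 0, len(bwt_text)
    # Extend the match one character at a time, from the pattern's last
    # character to its first, narrowing the range of matching suffixes.
    for c in reversed(pattern):
        if c not in first_pos:
            return 0
        lo = first_pos[c] + occ(c, lo)
        hi = first_pos[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count_occurrences(bwt("banana"), "ana")` counts the two overlapping occurrences of "ana" without ever scanning the original text.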
BRATBRAT is an accurate and efficient tool for mapping short bisulfite-treated reads obtained from the Solexa-Illumina Genome Analyzer. BRAT supports single-end and paired-end short-read mapping and allows alignment of reads/mates of different lengths. BRAT-BW is a fast, accurate and memory-efficient tool that maps bisulfite-treated short reads (BS-seq) to a reference genome using the FM-index (Burrows-Wheeler transform). The package includes tools to trim low-quality read ends and to report A, C, G, T counts at each base for the forward and reverse strands of the references.
BSMAPBSMAP is short-read mapping software for bisulfite sequencing reads. Bisulfite treatment converts unmethylated cytosines into uracils (sequenced as thymine) and leaves methylated cytosines unchanged, which provides a way to study DNA cytosine methylation at single-nucleotide resolution. BSMAP aligns the Ts in the reads to both Cs and Ts in the reference.
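The asymmetric matching rule described for bisulfite reads (a T in the read may pair with either a C or a T in the reference) can be sketched in a few lines of Python. This is only a brute-force illustration on the forward strand, not BSMAP's algorithm; `bisulfite_match` and `find_bisulfite_hits` are hypothetical names.

```python
def bisulfite_match(read, ref_window):
    """Return True if read could derive from ref_window after bisulfite conversion."""
    if len(read) != len(ref_window):
        return False
    for r, g in zip(read.upper(), ref_window.upper()):
        if r == g:
            continue
        if r == "T" and g == "C":
            # An unmethylated C in the reference is sequenced as T.
            continue
        return False
    return True


def find_bisulfite_hits(read, reference):
    """Return every 0-based reference position where the read matches."""
    n = len(read)
    return [
        i
        for i in range(len(reference) - n + 1)
        if bisulfite_match(read, reference[i:i + n])
    ]
```

Note that the rule is one-directional: a C in the read does not match a T in the reference, because bisulfite treatment never creates cytosines.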
bwaBurrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome. It implements two algorithms, bwa-short and BWA-SW. The former works for query sequences shorter than 200bp and the latter for longer sequences up to around 100kbp. Both algorithms do gapped alignment. They are usually more accurate and faster on queries with low error rates.
Clustal OmegaClustal Omega is the latest addition to the Clustal family. It offers a significant increase in scalability over previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours. It also makes use of multiple processors, where present. In addition, the quality of alignments is superior to previous versions, as measured by a range of popular benchmarks.
clustalwMultiple sequence alignment program for DNA or proteins.
CRACAn integrated RNA-Seq read analysis.
CUSHAW2CUSHAW2 (the second distribution of the CUSHAW software package for next-generation sequencing read alignment) is a fast, parallel gapped read aligner for large genomes, such as the human genome.
CUSHAW3CUSHAW3 (the third distribution of CUSHAW software package for next-generation sequencing read alignment) is an open-source parallelized, sensitive and accurate short-read aligner.
DALIGNERDALIGNER finds all significant local alignments between reads encoded in a Dazzler database. The assumption is that the reads are from a PacBio RS II long-read sequencer.
e-PCRe-PCR identifies sequence-tagged sites (STSs) within DNA sequences. Using e-PCR, you can search for sub-sequences that closely match the PCR primers and have the correct order, orientation, and spacing.
ecoPrimersecoPrimers is barcoding software written in C. It finds universal primers from a set of input DNA sequences by finding conserved regions, without a priori knowledge of candidate sequences.
It also evaluates the quality of the primers and barcode regions by measuring the "barcode specificity" and "barcode coverage" indices.
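The "correct order, orientation, and spacing" criterion can be illustrated with a small Python sketch. It is a simplification that only accepts exact primer matches on the forward strand, whereas e-PCR also tolerates close matches; the names `find_sts`, `expected_size` and `margin` are assumptions made for this example, not e-PCR options.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sts(template, fwd_primer, rev_primer, expected_size, margin=50):
    """Return (start, product_size) for each primer pair hit on the template.

    The forward primer must occur first, the reverse primer must occur
    downstream as its reverse complement, and the product size must lie
    within `margin` bases of `expected_size`.
    """
    hits = []
    rev_site = reverse_complement(rev_primer)
    start = template.find(fwd_primer)
    while start != -1:
        end = template.find(rev_site, start + len(fwd_primer))
        while end != -1:
            size = end + len(rev_site) - start
            if abs(size - expected_size) <= margin:
                hits.append((start, size))
            end = template.find(rev_site, end + 1)
        start = template.find(fwd_primer, start + 1)
    return hits
```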
exonerateA generic tool for sequence alignment.
fastaFASTA is a sequence similarity search tool which uses heuristics for fast local alignment searching.
gassstGASSST (Global Alignment Short Sequence Search Tool) finds global alignments of short DNA sequences against large DNA banks. GASSST's strong point is its ability to perform fast gapped alignments. It works well for both short and longer reads. It has currently been tested for reads up to 500 bp.
glintComplete genome alignment tool
GMAP / GSNAPGMAP: A Genomic Mapping and Alignment Program for mRNA and EST Sequences, and
GSNAP: Genomic Short-read Nucleotide Alignment Program
KlastKLAST is a fast, accurate and NGS-scalable bank-to-bank sequence similarity search tool that significantly accelerates seed-based heuristic comparison methods, such as the BLAST suite of algorithms.
LAGANThe LAGAN Toolkit is a set of alignment programs for comparative genomics. The three main components are a pairwise aligner (LAGAN), a multiple aligner (M-LAGAN), and a glocal aligner (Shuffle-LAGAN). All three are based on the CHAOS local alignment tool and combine speed (regions up to several megabases can be aligned in minutes) with high accuracy.
LastLAST finds similar regions between sequences.
MAFFTMAFFT is a multiple sequence alignment program for Unix-like operating systems. It offers a range of multiple alignment methods: L-INS-i (accurate; for alignment of <~200 sequences), FFT-NS-2 (fast; for alignment of <~10,000 sequences), etc.
mapSpliceAccurate mapping of RNA-seq reads for splice junction discovery.
MauveMauve is a system for efficiently constructing multiple genome alignments in the presence of large-scale evolutionary events such as rearrangement and inversion. Multiple genome alignment provides a basis for research into comparative genomics and the study of evolutionary dynamics. Aligning whole genomes is a fundamentally different problem than aligning short sequences.
MOSAIKMOSAIK is a reference-guided assembler comprising two main modular programs.
MugsyMugsy is a multiple whole genome aligner. Mugsy uses Nucmer for pairwise alignment, a custom graph based segmentation procedure for identifying collinear regions, and the segment-based progressive multiple alignment strategy from Seqan::TCoffee. Mugsy accepts draft genomes in the form of multi-FASTA files and does not require a reference genome.
multalinMultiple sequence alignment with hierarchical clustering.
mummerMUMmer is a package for rapidly aligning entire genomes, whether in complete or draft form.
muscleMultiple sequence alignment (nucleic acid or protein).
ncbi-blastSimilarity search against databanks.
NextGenMapNextGenMap is a flexible and fast read mapping program that is more than twice as fast as BWA while achieving a mapping sensitivity similar to Stampy.
PAL2NALPAL2NAL is a program that converts a multiple sequence alignment of proteins and the corresponding DNA (or mRNA) sequences into a codon alignment. The program automatically assigns the corresponding codon sequence even if the input DNA sequence has mismatches with the input protein sequence, or contains UTRs, polyA tails. It can also deal with frame shifts in the input alignment, which is suitable for the analysis of pseudogenes. The resulting codon alignment can further be subjected to the calculation of synonymous (dS) and non-synonymous (dN) substitution rates.
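The core of the protein-to-codon conversion can be sketched as follows. This minimal Python example assumes that the coding sequence is in frame and corresponds exactly to the ungapped protein; it does none of the mismatch, UTR or frameshift handling described for PAL2NAL, and `codon_alignment` is a hypothetical name.

```python
def codon_alignment(aligned_protein, cds):
    """Thread a coding DNA sequence onto one gapped protein sequence.

    Each residue consumes the next codon of `cds`; each gap character
    '-' in the protein becomes a three-base gap '---'.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the protein")
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)
```

Applying the function to every row of a protein alignment, each with its own coding sequence, gives rows of equal length in which codons stay aligned column by column.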
paleomixThe PALEOMIX pipeline is a set of free and open-source pipelines and tools designed to enable the rapid processing of Next Generation Sequencing (NGS) data, starting from de-multiplexed reads from one or more samples, through sequence processing and alignment, and ending with genotyping, phylogenetic inference on the samples, as well as metagenomic analysis of the samples.
ParsnpParsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. Input can be both draft assemblies and finished genomes, and output includes variant (SNP) calls, core genome phylogeny and multi-alignments.
PBJellyPBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in FASTA format) to high-confidence draft assemblies.
picard-toolsPicard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (SAM-JDK) for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported.
PLAST is a parallel alignment search tool for comparing large protein banks.
PLAST runs 3 to 5 times faster than the NCBI-BLAST software when processing large amounts of data.
pysamPysam is a Python module for reading and manipulating Samfiles. It's a lightweight wrapper of the samtools C-API.
QualiMapQualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface (GUI) and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
quickdistCalculates a matrix of pairwise distances between sequences in a multiple sequence alignment.
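As an illustration of what a pairwise distance matrix over a multiple sequence alignment looks like, the Python sketch below uses the simple p-distance (proportion of differing sites). The distance measure actually used by quickdist is not stated here, so the choice of p-distance and the names `p_distance` and `distance_matrix` are assumptions.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns with a gap in either row."""
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    # No comparable column means the distance is undefined.
    return differing / compared if compared else float("nan")


def distance_matrix(alignment):
    """Return a nested dict of pairwise p-distances.

    `alignment` maps sequence names to aligned (equal-length) sequences.
    """
    names = list(alignment)
    return {
        a: {b: p_distance(alignment[a], alignment[b]) for b in names}
        for a in names
    }
```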
realignerReAligner is used to realign multi-alignments of DNA fragments. The accompanying converter utility reformats multi-alignments.
RUMRUM is an alignment, junction calling, and feature quantification pipeline specifically designed for Illumina RNA-Seq data. RUM can also be used effectively for DNA sequencing (e.g. ChIP-Seq) and microarray probe mapping. RUM also has a strand-specific mode.
samstatSAMStat is an efficient C program to quickly display statistics in HTML format of large sequence files from next generation sequencing projects.
samtoolsSAM (Sequence Alignment/Map). SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
SeaViewSeaView is a multiplatform, graphical user interface for multiple sequence alignment and molecular phylogeny.
SEGEMEHLsegemehl is software for mapping short sequencer reads to reference genomes. Unlike other methods, segemehl is able to detect not only mismatches but also insertions and deletions. Furthermore, segemehl is not limited to a specific read length and is able to map primer- or polyadenylation-contaminated reads correctly. segemehl implements a matching strategy based on enhanced suffix arrays (ESA). segemehl now supports the SAM format, reads gzipped queries to save both disk and memory space, and allows bisulfite sequencing mapping and split-read mapping.
seqtools (dotter belvu blixem blixemh)A suite of tools for visualising sequence alignments. Blixem is an interactive browser of pairwise alignments that have been stacked up in a "master-slave" multiple alignment; it is not a 'true' multiple alignment but a 'one-to-many' alignment. It displays an overview section showing the positions of genes and alignments around the alignment window, and a detail section showing the actual alignment of protein or nucleotide sequences to the genomic DNA sequence. Dotter is a graphical dot-matrix program for detailed comparison of two sequences. Every residue in one sequence is compared to every residue in the other, with one sequence plotted on the x-axis and the other on the y-axis. Noise is filtered out so that alignments appear as diagonal lines. Belvu is a multiple sequence alignment viewer and phylogenetic tool. It has an extensive set of user-configurable modes to color residues by conservation or by residue type, and some basic alignment editing capabilities. It can generate distance matrices between sequences and construct distance-based trees, either graphically or as part of a phylogenetic software pipeline.
sim4sim4 is a program designed to align an expressed DNA sequence with a genomic sequence, allowing for introns.
smaltSMALT efficiently aligns DNA sequencing reads with genomic reference sequences. Reads from a range of sequencing platforms, for example Illumina-Solexa, Roche-454, PacBio or ABI-Sanger, can be processed, including paired-end reads.
StampyStampy is a package for the mapping of short reads from Illumina sequencing machines onto a reference genome.
SubreadA tool kit for processing next-gen sequencing data.
T-CoffeeT-Coffee is a multiple sequence alignment package. You can use T-Coffee to align sequences or to combine the output of your favorite alignment methods (Clustal, Mafft, Probcons, Muscle...) into one unique alignment.
TFM_PvalueTFM-Pvalue is a software suite providing tools for computing the score threshold associated with a given P-value and the P-value associated with a given score threshold. It uses Position Weight Matrices, such as those available in the Transfac or Jaspar databases.
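The relationship between a score threshold and its P-value can be sketched with a small dynamic program over the score distribution of a Position Weight Matrix. The Python example below assumes integer scores and an independent background model so that the result is exact; it is not the algorithm used by TFM-Pvalue, and all function names are hypothetical.

```python
from collections import defaultdict


def score_distribution(matrix, background):
    """Return {score: probability} for a random sequence under `background`.

    `matrix` is a list of columns, each mapping a base to an integer score.
    """
    dist = {0: 1.0}
    for column in matrix:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for base, s in column.items():
                nxt[score + s] += prob * background[base]
        dist = dict(nxt)
    return dist


def pvalue(matrix, background, threshold):
    """Probability that a random sequence scores at least `threshold`."""
    dist = score_distribution(matrix, background)
    return sum(p for s, p in dist.items() if s >= threshold)


def threshold_for_pvalue(matrix, background, target):
    """Lowest score whose P-value does not exceed `target` (None if none)."""
    dist = score_distribution(matrix, background)
    tail = 0.0
    best = None
    # Accumulate the upper tail from the highest score downwards.
    for s in sorted(dist, reverse=True):
        tail += dist[s]
        if tail > target:
            break
        best = s
    return best
```

With a two-column matrix and a uniform background, for instance, only one of the 16 possible dinucleotides reaches the maximum score, so the P-value of that score is 1/16.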
TopHatTopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
trimAltrimAl: a tool for automated alignment trimming.
USEARCHHigh-throughput biological sequence analysis. It is distributed as a single binary program that implements a suite of algorithms comparable to BLASTN, BLASTP, BLASTX, BLASTCLUST, CD-HIT, CD-HIT-EST, CD-HIT-2D, CD-HIT-EST-2D, CD-HIT-OTU, CD-HIT-454, ChimeraSlayer, Perseus, RAPsearch and more. It supports a rich set of sequence matching options, including E-values, identity, coverage (fraction of query or target sequence covered by the alignment) and maximum gap length, and a range of output file formats including FASTA, BLAST-like, user-defined tabbed text and a native format designed for clustering applications. Supported alignment styles include local (gapped and ungapped), like BLAST, and global, which is most often used in clustering applications. User-settable parameters allow tuning of substitution scores, gap penalties and Karlin-Altschul statistics.
Wise2Wise2 is a package focused on comparisons of biopolymers, commonly DNA sequence and protein sequence. These are the programs which you might use for this:
genewise: a single protein vs a single genomic DNA sequence.
genewisedb: a database of proteins vs a database of genomic DNA sequences.
estwise: a single protein vs a single EST/cDNA sequence.
estwisedb: a database of proteins vs a database of EST/cDNA sequences.
wu-blastSimilarity search against databanks, Washington University Blast.