msa: An R Package for Multiple Sequence Alignment
The msa package provides a unified R/Bioconductor interface to the multiple sequence alignment algorithms ClustalW, ClustalOmega, and MUSCLE. All three algorithms are integrated into the package, so they do not depend on any external software tools and are available on all major platforms. The alignment algorithms are complemented by a function for pretty-printing multiple sequence alignments using the LaTeX package TeXshade.
Installation
The R package msa is available from Bioconductor. The first version of the package was released as part of Bioconductor 3.1 on April 17, 2015. The current version of the package is 1.10.0 (released on October 31, 2017, as part of Bioconductor 3.6). Please note that Bioconductor 3.6 requires R version 3.4.2. To install msa, follow the standard procedure for installing Bioconductor packages, i.e. enter the following into your R session:
    source("http://www.bioconductor.org/biocLite.R")
    biocLite("msa")
The current development version of the package is 1.11.0.
Documentation
Getting started
- To load the package, enter "library(msa)" in your R session.
- To view the user manual, enter "vignette("msa")".
- To do a first example, enter "example(msa)".
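The steps above can be extended into a short first session. The following is a minimal sketch, assuming that msa and its dependency Biostrings are already installed; it uses the example protein sequences that ship with the package, and the variable names are chosen here only for illustration.

```r
# Assumption: the msa package (and Biostrings) is already installed.
library(msa)

# Locate and read the example amino acid sequences shipped with msa
seqFile <- system.file("examples", "exampleAA.fasta", package = "msa")
seqs <- readAAStringSet(seqFile)

# Align with the default algorithm (ClustalW)
alnClustalW <- msa(seqs)

# Select a different algorithm through the 'method' argument
alnMuscle <- msa(seqs, method = "Muscle")

# Display the alignment in its entirety rather than a truncated preview
print(alnClustalW, show = "complete")
```

The same input can also be passed to msa() with method = "ClustalOmega", so all three algorithms can be compared on one data set.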
User support
If you encounter any issues or have a question that might also be of interest to other users, please consider posting on Bioconductor Support or StackOverflow before writing a private message to the package developers/maintainers. For all other matters regarding the package, please contact [email protected].
Citing this package
If you use this package for research that is published later, you are kindly asked to cite it as follows:
U. Bodenhofer, E. Bonatesta, C. Horejš-Kainrath, and S. Hochreiter (2015). msa: an R package for multiple sequence alignment. Bioinformatics 31(24):3997-3999. DOI: 10.1093/bioinformatics/btv494.
R source code for example alignment presented in paper: Example.R (0.7 KB)
Moreover, we insist that, whenever you use or cite the package, you also cite the original paper that introduced the algorithm, method, or package you have been using:
- ClustalW:
- J. D. Thompson, D. G. Higgins, and T. J. Gibson (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22(22):4673-4680. DOI: 10.1093/nar/22.22.4673.
- ClustalOmega:
- F. Sievers, A. Wilm, D. Dineen, T. J. Gibson, K. Karplus, W. Li, R. Lopez, H. McWilliam, M. Remmert, J. Söding, J. D. Thompson, and D. G. Higgins (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol., 7:539. DOI: 10.1038/msb.2011.75.
- MUSCLE:
- R. C. Edgar (2004). MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5(5):113. DOI: 10.1186/1471-2105-5-113.
- R. C. Edgar (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32(5):1792-1797. DOI: 10.1093/nar/gkh340.
- TeXshade:
- E. Beitz (2000). TeXshade: shading and labeling of multiple sequence alignments using LaTeX2e. Bioinformatics, 16(2):135-139. DOI: 10.1093/bioinformatics/16.2.135.
Change log
- Version 1.10.0:
- release as part of Bioconductor 3.6
- Version 1.9.0:
- devel branch created from version 1.8.0
- Version 1.8.0:
- release as part of Bioconductor 3.5
- Version 1.7.1:
- additional conversions implemented for msaConvert() function
- added a new method msaConsensusSequence() that extends the functionality provided by the consensusString() method of the Biostrings package
- added a new method msaConservationScore()
- print() method extended such that it now also allows for customization of the consensus sequence (via the new msaConsensusSequence() method)
- package now depends on Biostrings version ≥ 2.40.0 in order to make sure that consensusMatrix() also works correctly for masked alignments
- corresponding changes in documentation and vignette
- Version 1.7.0:
- new branch for Bioconductor 3.5 devel
- Version 1.6.0:
- release as part of Bioconductor 3.4
- Version 1.5.5:
- fixes in ClustalOmega source code to ensure Windows compatibility of GCC6 compatibility fix
- Version 1.5.4:
- bug fix in msaClustalW(): unsupported parameter 'tree' deactivated
- fixes in ClustalOmega source code to ensure GCC6 compatibility
- fix in msaConvert() function to improve safety of call to suggested package 'phangorn'
- Version 1.5.3:
- additional conversions implemented for msaConvert() function
- corresponding changes in documentation
- Version 1.5.1 / 1.5.2:
- version number bumps for technical reasons related to Bioconductor build servers
- Version 1.5.0:
- new branch for Bioconductor 3.4 devel
- Version 1.4.0:
- release as part of Bioconductor 3.3
- Version 1.3.7:
- fixes in msaPrettyPrint() function
- Version 1.3.6:
- msaPrettyPrint() now also accepts dashes in file names
- added section about pretty-printing wide alignments to package vignette
- Version 1.3.5:
- adaptation of displaying help text by msa() function
- Version 1.3.4:
- added function for checking and fixing sequence names for possibly problematic characters that could lead to LaTeX errors when using msaPrettyPrint()
- corresponding changes in documentation
- minor namespace fix
- Version 1.3.3:
- added function for converting multiple sequence alignments for use with other sequence alignment packages
- corresponding changes in documentation
- Version 1.3.2:
- further fixes in Makefiles and Makevars files to account for changes in build system
- update of citation information
- Version 1.3.1:
- fixes in Makefiles and Makevars files to account for changes in build system
- Version 1.3.0:
- new branch for Bioconductor 3.3 devel
- Version 1.2.0:
- release as part of Bioconductor 3.2
- Version 1.1.3:
- bug fix related to custom substitution matrices in the MUSCLE interface
- correction and updates of documentation
- Version 1.1.2:
- new print() function for multiple alignments that also allows for displaying alignments in their entirety (plus additional customizations)
- strongly improved handling of custom substitution matrices by msaClustalW(): custom matrices can now also be supplied for nucleotide sequences, and these can be passed via the "substitutionMatrix" argument. The "dnamatrix" argument is still available for the sake of backwards compatibility.
- strongly improved handling of custom substitution matrices by msaMuscle()
- fix of improperly aligned sequence logos produced by msaPrettyPrint()
- updated citation information
- Version 1.1.1:
- fix of msa() function
- Version 1.1.0:
- new branch for Bioconductor 3.2 devel
- Version 1.0.0:
- first official release as part of Bioconductor 3.1