The Smith-Waterman algorithm is used to perform local sequence alignment of strings. The strings mostly represent DNA strands or protein sequences.
This article discusses the implementation of the Smith-Waterman algorithm in Python.
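Before turning to a library, it may help to see the core recurrence in a few lines. The following is a minimal sketch of the scoring step only, not the implementation used by any particular module. The function name `smith_waterman_score` and the linear gap penalty of -1 are assumptions made for illustration.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of seq_a and seq_b.

    Assumes a simple linear gap penalty (hypothetical default of -1).
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # scores[i][j] is the best score of a local alignment that ends at
    # seq_a[i - 1] and seq_b[j - 1]; row 0 and column 0 stay at zero.
    scores = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            diagonal = scores[i - 1][j - 1] + (match if same else mismatch)
            up = scores[i - 1][j] + gap      # gap in seq_b
            left = scores[i][j - 1] + gap    # gap in seq_a
            # The floor of 0 is what makes the alignment local:
            # a poor prefix is dropped instead of carried forward.
            scores[i][j] = max(0, diagonal, up, left)
            best = max(best, scores[i][j])
    return best


print(smith_waterman_score("ATCCACAGC", "ATGCAGCGC"))  # 11
```

This sketch only fills the score matrix. Recovering the aligned strings needs an additional traceback step from the highest-scoring cell.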
Smith-Waterman Algorithm in Python
The swalign module contains several functions to implement the Smith-Waterman algorithm in Python. You can install the swalign module using pip by executing the following command in the command line.
pip3 install swalign
The above command installs the module for Python 3. To install the module for Python 2, you can use the following command.
pip install swalign
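To confirm that the installation worked for the interpreter you intend to use, one option is to check whether the module can be found. This is a small sketch using only the standard library. The helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


# Prints True only if swalign is installed for the running interpreter.
print(is_installed("swalign"))
```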
After installing the swalign module, we will use the following steps to implement the Smith-Waterman algorithm in our Python program.
- First, we will import the swalign module using the import statement.
- To perform the alignment, we must create a nucleotide scoring matrix. In the matrix, we provide a score for each match and mismatch.
Commonly, we use 2 for a match score and -1 for a mismatch.
To create the nucleotide scoring matrix, we will use the NucleotideScoringMatrix() method. The NucleotideScoringMatrix() method takes the match score as its first input argument and the mismatch score as its second input argument.
After execution, it returns a scoring matrix object.
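Conceptually, a match/mismatch scoring matrix only has to answer one question: what score does a pair of characters get? The class below is a hypothetical stand-in written for illustration, not the swalign class itself. The name `SimpleScoring` and its `score()` method are assumptions.

```python
class SimpleScoring:
    """Hypothetical identity-style scoring: one score for a match, one for a mismatch."""

    def __init__(self, match=2, mismatch=-1):
        self.match = match
        self.mismatch = mismatch

    def score(self, one, two):
        # Equal characters earn the match score; anything else is a mismatch.
        return self.match if one == two else self.mismatch


scoring = SimpleScoring(2, -1)
print(scoring.score("A", "A"))  # 2
print(scoring.score("A", "G"))  # -1
```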
Once we get the nucleotide matrix, we will create a LocalAlignment object using the LocalAlignment() method. The LocalAlignment() method takes the nucleotide scoring matrix as its input and returns a LocalAlignment object.
Once we get the LocalAlignment object, we can execute the Smith-Waterman algorithm using the align() method. The align() method, when invoked on a LocalAlignment object, takes the two strings to align as its input arguments. swalign treats the first string as the reference sequence and the second string as the query sequence, which is why the example output labels the first string Ref and the second string Query.
After execution, the align() method returns an Alignment object. The Alignment object contains the match and mismatch details of the input strings, along with several other details.
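The match and mismatch details can be derived from the two gapped strings of an alignment. The sketch below shows one way to do that, assuming "-" marks a gap. The function name `summarize_alignment` and the returned keys are hypothetical. It counts gap columns separately from mismatches, whereas the dump() output later in this article appears to report both together under Mismatches.

```python
def summarize_alignment(query_aligned, ref_aligned):
    """Count matches, mismatches, and gaps in two equal-length gapped strings."""
    if len(query_aligned) != len(ref_aligned):
        raise ValueError("aligned strings must have equal length")
    markers = []
    matches = mismatches = gaps = 0
    for q, r in zip(query_aligned, ref_aligned):
        if q == "-" or r == "-":
            gaps += 1
            markers.append(" ")   # gap column
        elif q == r:
            matches += 1
            markers.append("|")   # identical characters
        else:
            mismatches += 1
            markers.append(".")   # substitution
    return {
        "matches": matches,
        "mismatches": mismatches,
        "gaps": gaps,
        "markers": "".join(markers),
    }


summary = summarize_alignment("ATGCAGC-GC", "ATCCA-CAGC")
print(summary["matches"], summary["mismatches"], summary["gaps"])  # 7 1 2
print(summary["markers"])
```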
You can observe the entire process in the following example.
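import swalign

dna_string = "ATCCACAGC"
reference_string = "ATGCAGCGC"
match_score = 2
mismatch_score = -1

# Build the scoring matrix and the local aligner.
matrix = swalign.NucleotideScoringMatrix(match_score, mismatch_score)
lalignment_object = swalign.LocalAlignment(matrix)

# Align the two strings and print the result.
# Note: the output labels the first argument Ref and the second Query.
alignment_object = lalignment_object.align(dna_string, reference_string)
alignment_object.dump()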
Output:
Query: 1 ATGCAGC-GC 9
         ||.|| | ||
Ref  : 1 ATCCA-CAGC 9

Score: 11
Matches: 7 (70.0%)
Mismatches: 3
CIGAR: 5M1I1M1D2M
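The CIGAR string in the output is a compact encoding of the alignment: each number is a run length, and each letter is an operation (M for an aligned column, I for an insertion in the query, D for a deletion from the query). The following sketch parses such a string, assuming the standard SAM operation letters. The helper names `parse_cigar` and `cigar_lengths` are hypothetical.

```python
import re


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs.

    Characters that do not form a valid length/operation pair are ignored.
    """
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def cigar_lengths(cigar):
    """Return how many query and reference characters the CIGAR consumes."""
    ops = parse_cigar(cigar)
    # SAM convention: M, I, S, =, X consume the query; M, D, N, =, X the reference.
    query_len = sum(n for n, op in ops if op in "MIS=X")
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    return query_len, ref_len


print(parse_cigar("5M1I1M1D2M"))
# [(5, 'M'), (1, 'I'), (1, 'M'), (1, 'D'), (2, 'M')]
print(cigar_lengths("5M1I1M1D2M"))  # (9, 9)
```

Both lengths come out as 9, which agrees with the positions 1 to 9 shown for the query and the reference in the output above.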
This article discussed how we can implement the Smith-Waterman algorithm using Python's swalign module.
You can also use the functions defined in the scikit-bio module for other implementations of the Smith-Waterman algorithm in Python.