Smith-Waterman Algorithm in Python

The Smith-Waterman algorithm is used to perform local sequence alignment of strings, that is, to find the pair of substrings of two strings that align with the highest score. The strings typically represent DNA strands or protein sequences.
This article discusses the implementation of the Smith-Waterman algorithm in Python.
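As a rough illustration of what "local" means here, the following sketch fills the Smith-Waterman scoring matrix in plain Python. Each cell holds the best score of an alignment ending at that pair of positions, and any negative value is reset to 0, so an alignment can start fresh anywhere in either string. The function name smith_waterman_score and the linear gap penalty of -1 per gap position are assumptions for illustration; this is not swalign's code.

```python
def smith_waterman_score(ref, query, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score and the (i, j) cell where it ends.

    Hypothetical helper for illustration. H[i][j] holds the best score of a
    local alignment ending at ref[i-1] and query[j-1]; scores never drop
    below 0, which is what makes the alignment local rather than global.
    """
    rows, cols = len(ref) + 1, len(query) + 1
    H = [[0] * cols for _ in range(rows)]
    best_score, best_cell = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if ref[i - 1] == query[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # ref[i-1] aligned against a gap in the query
            left = H[i][j - 1] + gap  # query[j-1] aligned against a gap in the reference
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best_score:
                best_score, best_cell = H[i][j], (i, j)
    return best_score, best_cell


score, end_cell = smith_waterman_score("ATCCACAGC", "ATGCAGCGC")
print(score, end_cell)  # 11 (9, 9)
```

The best score and the cell where it occurs are the starting point for the traceback step, which walks back through the matrix to recover the aligned substrings.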
Smith-Waterman Algorithm in Python
The swalign module implements the Smith-Waterman algorithm in Python. You can install the swalign module using pip by executing the following command in the command line.
pip3 install swalign
The above command installs the module for Python 3. On systems where pip is tied to Python 2, you can install the module for Python 2 with the following command.
pip install swalign
After installing the swalign module, we will use the following steps to implement the Smith-Waterman algorithm in our Python program.
- First, we will import the swalign module using the import statement.
- To perform the alignment, we must create a nucleotide scoring matrix, which assigns a score to each match and each mismatch. A match score of 2 and a mismatch score of -1 are common choices.
- To create the nucleotide scoring matrix, we will call NucleotideScoringMatrix(). It takes the match score as its first input argument and the mismatch score as its second input argument, and it returns an IdentityScoringMatrix object.
- Once we have the scoring matrix, we will create a LocalAlignment object by calling LocalAlignment() with the scoring matrix as its input. Optional gap-penalty settings can also be passed; this example leaves them at their defaults.
- Once we have the LocalAlignment object, we can run the Smith-Waterman algorithm by calling its align() method.
- When invoked on a LocalAlignment object, the align() method takes the reference sequence as its first input argument and the query sequence as its second input argument. In the printed output, these are labeled Ref and Query, respectively.
- After execution, the align() method returns an Alignment object. The Alignment object holds the aligned sequences, the alignment score, the number of matches and mismatches, and the CIGAR string. Calling its dump() method prints these details.
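To show what happens inside the aligner during these steps, here is a small from-scratch sketch. The classes SimpleScoring, SimpleLocalAligner, and SimpleAlignment are hypothetical and only loosely mirror the roles of swalign's scoring matrix, LocalAlignment, and Alignment objects. The sketch assumes a linear gap penalty of -1 per gap position, and its align() takes the reference first and the query second.

```python
from dataclasses import dataclass


@dataclass
class SimpleScoring:
    """Hypothetical identity-style scoring: one score for a match, another for a mismatch."""
    match: int = 2
    mismatch: int = -1

    def score(self, a: str, b: str) -> int:
        return self.match if a == b else self.mismatch


@dataclass
class SimpleAlignment:
    """Hypothetical result holder: the score plus the two gapped, aligned rows."""
    score: int
    ref_aligned: str
    query_aligned: str


class SimpleLocalAligner:
    """Hypothetical Smith-Waterman aligner with a linear gap penalty (assumed -1)."""

    def __init__(self, scoring: SimpleScoring, gap: int = -1):
        self.scoring = scoring
        self.gap = gap

    def align(self, ref: str, query: str) -> SimpleAlignment:
        # Fill the DP matrix; H[i][j] is the best local score ending at ref[i-1], query[j-1].
        rows, cols = len(ref) + 1, len(query) + 1
        H = [[0] * cols for _ in range(rows)]
        best, bi, bj = 0, 0, 0
        for i in range(1, rows):
            for j in range(1, cols):
                H[i][j] = max(
                    0,
                    H[i - 1][j - 1] + self.scoring.score(ref[i - 1], query[j - 1]),
                    H[i - 1][j] + self.gap,  # gap in the query
                    H[i][j - 1] + self.gap,  # gap in the reference
                )
                if H[i][j] > best:
                    best, bi, bj = H[i][j], i, j

        # Trace back from the best cell until a zero cell is reached, preferring a
        # diagonal step, then a gap in the query, then a gap in the reference.
        ref_out, query_out = [], []
        i, j = bi, bj
        while i > 0 and j > 0 and H[i][j] > 0:
            if H[i][j] == H[i - 1][j - 1] + self.scoring.score(ref[i - 1], query[j - 1]):
                ref_out.append(ref[i - 1])
                query_out.append(query[j - 1])
                i, j = i - 1, j - 1
            elif H[i][j] == H[i - 1][j] + self.gap:
                ref_out.append(ref[i - 1])
                query_out.append("-")
                i -= 1
            else:
                ref_out.append("-")
                query_out.append(query[j - 1])
                j -= 1
        return SimpleAlignment(best, "".join(reversed(ref_out)), "".join(reversed(query_out)))


aligner = SimpleLocalAligner(SimpleScoring(match=2, mismatch=-1))
result = aligner.align("ATCCACAGC", "ATGCAGCGC")
print(result.score)          # 11
print(result.ref_aligned)    # ATCCA-CAGC
print(result.query_aligned)  # ATGCAGC-GC
```

With the strings used in the following example, this sketch yields the same score and aligned rows as the swalign output shown below. On other inputs, swalign's gap penalties and tie-breaking may lead to different results, so treat it as an explanation rather than a replacement.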
You can observe the entire process in the following example.
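import swalign
# align() treats its first argument as the reference ("Ref" in the output)
# and its second argument as the query ("Query" in the output)
reference_string = "ATCCACAGC"
query_string = "ATGCAGCGC"
# +2 for each matching base, -1 for each mismatching base
match_score = 2
mismatch_score = -1
matrix = swalign.NucleotideScoringMatrix(match_score, mismatch_score)
# Create the local aligner (gap penalties left at their defaults) and align the strings
lalignment_object = swalign.LocalAlignment(matrix)
alignment_object = lalignment_object.align(reference_string, query_string)
# Print the aligned sequences, score, match/mismatch counts, and CIGAR string
alignment_object.dump()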
Output:
Query: 1 ATGCAGC-GC 9
||.|| | ||
Ref : 1 ATCCA-CAGC 9
Score: 11
Matches: 7 (70.0%)
Mismatches: 3
CIGAR: 5M1I1M1D2M
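To see where the numbers in this output come from, the following sketch recomputes them from the two aligned rows. It assumes each gap column costs -1 and counts gap columns as mismatches, which together with +2 per match and -1 for the single base mismatch reproduces the values above. The CIGAR string describes the query relative to the reference: M for an aligned pair (match or mismatch), I where the reference row has a gap, and D where the query row has a gap. The function alignment_stats is hypothetical, not part of swalign.

```python
from itertools import groupby


def alignment_stats(ref_aligned, query_aligned, match=2, mismatch=-1, gap=-1):
    """Hypothetical helper: recompute score, matches, mismatches, identity %, and CIGAR.

    Gap columns count as mismatches; the gap penalty of -1 is an assumption.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned rows must have the same length")
    score = matches = 0
    ops = []
    for r, q in zip(ref_aligned, query_aligned):
        if r == "-":    # base present only in the query: insertion
            ops.append("I")
            score += gap
        elif q == "-":  # base present only in the reference: deletion
            ops.append("D")
            score += gap
        else:           # aligned pair, either a match or a mismatch
            ops.append("M")
            if r == q:
                matches += 1
                score += match
            else:
                score += mismatch
    mismatches = len(ops) - matches
    identity = 100 * matches / len(ops) if ops else 0.0
    # Collapse runs of identical operations into "<count><op>" pairs.
    cigar = "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))
    return score, matches, mismatches, identity, cigar


print(alignment_stats("ATCCA-CAGC", "ATGCAGC-GC"))
# (11, 7, 3, 70.0, '5M1I1M1D2M')
```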
Conclusion
This article discusses how we can implement the Smith-Waterman algorithm using Python’s swalign module.