Smith-Waterman Algorithm in Python

The Smith-Waterman algorithm performs local sequence alignment: it finds the most similar regions of two strings rather than aligning them end to end. The strings usually represent DNA strands or protein sequences.

This article discusses the implementation of the Smith-Waterman algorithm in Python.

Smith-Waterman Algorithm in Python

The swalign module provides a ready-made implementation of the Smith-Waterman algorithm in Python. You can install it with pip by running the following command in the command line.

pip3 install swalign

The above command installs the module for Python 3. On systems where pip points to a Python 2 installation, you can install the module for Python 2 with the following command.

pip install swalign
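
Before going further, it can help to confirm that the interpreter you plan to use can actually see the module. The following is only a small sketch of such a check; is_installed is a hypothetical helper name, not part of swalign.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Prints True or False depending on the current environment.
print("swalign installed:", is_installed("swalign"))
```

If this prints False, the install command was most likely run for a different Python interpreter than the one executing the script.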

After installing the swalign module, we will use the following steps to implement the Smith-Waterman algorithm in our Python program.

  1. First, import the swalign module using the import statement.
  2. To perform the alignment, create a nucleotide scoring matrix, which assigns a score to each match and each mismatch. Commonly, we use 2 for a match and -1 for a mismatch.
  3. To create the nucleotide scoring matrix, call NucleotideScoringMatrix(). It takes the match score as its first input argument and the mismatch score as its second input argument. After execution, it returns an IdentityScoringMatrix object.
  4. Once we get the scoring matrix, create a LocalAlignment object by passing the matrix to LocalAlignment().
  5. Once we get the LocalAlignment object, we can execute the Smith-Waterman algorithm using its align() method.
  6. The align() method takes the two strings to compare as its input arguments. Judging by the labels in the output below, it treats the first string as the reference and the second string as the query.
  7. After execution, the align() method returns an Alignment object. The Alignment object contains the matches and mismatches of the input strings along with several other details, such as the score.
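
To give a feel for what the scoring matrix and the alignment step compute, the steps above can be sketched in plain Python. This is a minimal illustration of the Smith-Waterman recurrence, not the code that swalign runs. The function name smith_waterman_score is hypothetical, and the linear gap penalty of -1 is an assumption chosen because it agrees with the score reported in this article's output.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score, using a linear gap penalty."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # table[i][j] is the best score of any local alignment that ends at
    # seq_a[i - 1] and seq_b[j - 1]; row 0 and column 0 stay at zero.
    table = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair_score = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            table[i][j] = max(
                0,                                 # start a new local alignment
                table[i - 1][j - 1] + pair_score,  # align the two characters
                table[i - 1][j] + gap,             # gap in seq_b
                table[i][j - 1] + gap,             # gap in seq_a
            )
            best = max(best, table[i][j])
    return best


# The two strings used in this article's example.
print(smith_waterman_score("ATCCACAGC", "ATGCAGCGC"))  # 11
```

The zero in the max() call is what makes the alignment local: a poorly scoring prefix is dropped instead of dragging the score down. This sketch returns only the score; recovering the aligned strings would need a traceback through the table, which the library handles for you.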

You can observe the entire process in the following example.

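import swalign

dna_string = "ATCCACAGC"
reference_string = "ATGCAGCGC"

# Score +2 for each matching base and -1 for each mismatch.
match_score = 2
mismatch_score = -1
matrix = swalign.NucleotideScoringMatrix(match_score, mismatch_score)

# Build the local aligner and align the two strings.
# In the output, the first argument is labeled Ref and the second Query.
local_aligner = swalign.LocalAlignment(matrix)
alignment_object = local_aligner.align(dna_string, reference_string)

# Print the alignment, score, and CIGAR string.
alignment_object.dump()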

Output:

Query:  1 ATGCAGC-GC 9
          ||.|| | ||
Ref  :  1 ATCCA-CAGC 9

Score: 11
Matches: 7 (70.0%)
Mismatches: 3
CIGAR: 5M1I1M1D2M
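
The CIGAR line is a compact encoding of the alignment columns. As a rough illustration, the sketch below expands it back into the two gapped strings. It assumes that M marks an aligned column (match or mismatch), I marks a base present only in the query, and D marks a base present only in the reference, which is consistent with the output above. It also assumes the alignment starts at the first base of both strings, as the 1 and 9 coordinates indicate. The names parse_cigar and apply_cigar are hypothetical helpers, not part of swalign.

```python
import re


def parse_cigar(cigar):
    """Split a CIGAR string such as '5M1I1M1D2M' into (length, op) pairs."""
    return [(int(length), op) for length, op in re.findall(r"(\d+)([MID])", cigar)]


def apply_cigar(query, ref, cigar):
    """Rebuild the gapped query and reference strings from a CIGAR string.

    Assumes the alignment starts at index 0 of both strings.
    """
    gapped_query, gapped_ref = [], []
    qi = ri = 0
    for length, op in parse_cigar(cigar):
        if op == "M":    # aligned column: consume bases from both strings
            gapped_query.append(query[qi:qi + length])
            gapped_ref.append(ref[ri:ri + length])
            qi += length
            ri += length
        elif op == "I":  # base only in the query: gap in the reference
            gapped_query.append(query[qi:qi + length])
            gapped_ref.append("-" * length)
            qi += length
        elif op == "D":  # base only in the reference: gap in the query
            gapped_query.append("-" * length)
            gapped_ref.append(ref[ri:ri + length])
            ri += length
    return "".join(gapped_query), "".join(gapped_ref)


gapped_query, gapped_ref = apply_cigar("ATGCAGCGC", "ATCCACAGC", "5M1I1M1D2M")
print(gapped_query)  # ATGCAGC-GC
print(gapped_ref)    # ATCCA-CAGC
```

Of the 10 columns, 7 are matches, 1 is a mismatch, and 2 are gaps. With +2 per match, -1 per mismatch, and an assumed penalty of -1 per gap, that gives 14 - 1 - 2 = 11, which agrees with the reported score. The Mismatches count of 3 appears to include the two gap columns.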

Conclusion

This article discusses how we can implement the Smith-Waterman algorithm using Python’s swalign module.

You can also use the alignment functions in the scikit-bio module, which offers other implementations of local alignment based on the Smith-Waterman algorithm in Python.
