How to Remove Stop Words in Python

Samyak Jain Feb 02, 2024
  1. Use the NLTK Package to Remove Stop Words in Python
  2. Use the stop-words Package to Remove Stop Words in Python
  3. Use the remove_stpwrds Method in the textcleaner Library to Remove Stop Words in Python

Stop words are common words, such as the, a, and an, that search engines generally ignore. Removing them saves space in the database and reduces processing time. For example, without stop words, the sentence "There is a snake in my boot" becomes just "snake boot".
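
The idea can be sketched in plain Python before bringing in any library. The tiny SAMPLE_STOP_WORDS set below is a hand-written assumption made for this illustration only; real stop word lists are much longer.

```python
# Hand-picked stop words for this illustration only (an assumption, not a real list).
SAMPLE_STOP_WORDS = {"there", "is", "a", "in", "my"}

sentence = "There is a snake in my boot"

# Lowercase the sentence, split it on whitespace, and keep the non-stop words.
kept = [word for word in sentence.lower().split() if word not in SAMPLE_STOP_WORDS]
result = " ".join(kept)
print(result)  # snake boot
```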

In this tutorial, we will discuss how to remove stop words in Python.

Use the NLTK Package to Remove Stop Words in Python

The nltk (Natural Language Toolkit) package can be used to remove stop words from text in Python. It provides stop word lists for many languages, which must be downloaded once with nltk.download("stopwords") before first use.

We can iterate through a list and check if a word is a stop word or not using the list from this library.

For example,

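import nltk
from nltk.corpus import stopwords

# Fetch the stop word corpus if it is not already installed locally.
nltk.download("stopwords", quiet=True)

# Build the set once instead of reloading the list for every word.
stop_words = set(stopwords.words("english"))

dataset = ["This", "is", "just", "a", "snake"]
A = [word for word in dataset if word not in stop_words]
print(A)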

Output:

['This', 'snake']
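
Note that This survives in the output above because the NLTK stop words are stored in lowercase and the membership test is case-sensitive. A minimal sketch of a case-insensitive filter follows. The remove_stop_words helper and the small stop word set passed to it are hypothetical, used only for illustration; in practice the set would come from stopwords.words("english").

```python
def remove_stop_words(words, stop_words):
    """Return the words whose lowercase form is not in stop_words."""
    # Normalize the stop words once so lookups are case-insensitive and fast.
    lowered = {stop.lower() for stop in stop_words}
    return [word for word in words if word.lower() not in lowered]


# Illustrative subset only; in practice pass stopwords.words("english") instead.
sample_stop_words = {"this", "is", "just", "a"}

dataset = ["This", "is", "just", "a", "snake"]
filtered = remove_stop_words(dataset, sample_stop_words)
print(filtered)  # ['snake']
```

The original casing of the kept words is preserved; only the comparison is done in lowercase.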

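The following code prints the English stop words that NLTK provides:

import nltk
from nltk.corpus import stopwords

# stopwords.words() returns a list; it is wrapped in set() here, so the printed order is arbitrary.
print(set(stopwords.words("english")))

Output (the order and exact contents may vary with the NLTK version):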

{'ourselves', 'hers', 'between', 'yourself', 'but', 'again', 'there', 'about', 'once', 'during', 'out', 'very', 'having', 'with', 'they', 'own', 'an', 'be', 'some', 'for', 'do', 'its', 'yours', 'such', 'into', 'of', 'most', 'itself', 'other', 'off', 'is', 's', 'am', 'or', 'who', 'as', 'from', 'him', 'each', 'the', 'themselves', 'until', 'below', 'are', 'we', 'these', 'your', 'his', 'through', 'don', 'nor', 'me', 'were', 'her', 'more', 'himself', 'this', 'down', 'should', 'our', 'their', 'while', 'above', 'both', 'up', 'to', 'ours', 'had', 'she', 'all', 'no', 'when', 'at', 'any', 'before', 'them', 'same', 'and', 'been', 'have', 'in', 'will', 'on', 'does', 'yourselves', 'then', 'that', 'because', 'what', 'over', 'why', 'so', 'can', 'did', 'not', 'now', 'under', 'he', 'you', 'herself', 'has', 'just', 'where', 'too', 'only', 'myself', 'which', 'those', 'i', 'after', 'few', 'whom', 't', 'being', 'if', 'theirs', 'my', 'against', 'a', 'by', 'doing', 'it', 'how', 'further', 'was', 'here', 'than'} 

Use the stop-words Package to Remove Stop Words in Python

The stop-words package can also be used to remove stop words from text in Python. It includes stop word lists for many languages, such as English, Danish, French, and Spanish.

For example,

from stop_words import get_stop_words

# Load the English list once and store it as a set for fast lookups.
stop_words = set(get_stop_words("english"))

dataset = ["This", "is", "just", "a", "snake"]
A = [word for word in dataset if word not in stop_words]
print(A)

Output:

['This', 'just', 'snake']

The above code filters the dataset by removing every word that appears in the package's English stop word list. The comparison is case-sensitive, which is why the capitalized This is kept.
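
The examples so far start from a list that is already split into words. Raw text usually arrives as a single string with punctuation, so it has to be tokenized first. The sketch below uses a simple regular expression for that. The filter_text helper, its regex, and the sample stop word list are assumptions for illustration; the list returned by get_stop_words("english") could be passed in instead.

```python
import re


def filter_text(text, stop_words):
    """Tokenize text into lowercase words and drop the stop words."""
    # Letters and apostrophes form a token; punctuation and digits are discarded.
    tokens = re.findall(r"[a-z']+", text.lower())
    stop_set = set(stop_words)
    return [token for token in tokens if token not in stop_set]


# Illustrative subset only, not the package's real list.
sample_stop_words = ["this", "is", "just", "a", "in", "my"]

tokens = filter_text("This is just a snake, in my boot!", sample_stop_words)
print(tokens)  # ['snake', 'boot']
```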

Use the remove_stpwrds Method in the textcleaner Library to Remove Stop Words in Python

The remove_stpwrds() method in the textcleaner library is used to remove stop words from the text in Python.

For example,

import textcleaner as tc

dataset = ["This", "is", "just", "a", "snake"]
data = tc.document(dataset)
print(data.remove_stpwrds())

Output:

This
snake