One-Hot Encoding on NumPy Array in Python

Manav Narula Apr 29, 2021
  1. Use the NumPy Module to Perform One-Hot Encoding on a NumPy Array in Python
  2. Use the sklearn Module to Perform One-Hot Encoding on a NumPy Array in Python
  3. Use the pandas Module to Perform One-Hot Encoding on a NumPy Array in Python
  4. Use the keras Module to Perform One-Hot Encoding on a NumPy Array in Python

Python has a large ecosystem of machine learning libraries that make it easy to train and test models. However, many algorithms cannot work directly with categorical labels and require numeric input instead.

One-hot encoding is a widely used technique for converting such data before passing it to an algorithm. It represents each category as a binary vector that has a 1 in the position for that category and 0 everywhere else.

In this tutorial, we will learn how to perform one-hot encoding on NumPy arrays.

Use the NumPy Module to Perform One-Hot Encoding on a NumPy Array in Python

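In this method, we generate a new array that contains the encoded data. We first use the numpy.zeros() function to create an array of 0s with one row per element and one column per class (the largest value plus one). We then set one entry in each row to 1 using integer array indexing: numpy.arange() supplies the row indices, and the original array supplies the column indices. This approach assumes the values are non-negative integers.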

For example,

import numpy as np

a = np.array([1, 0, 3])
b = np.zeros((a.size, a.max() + 1))
b[np.arange(a.size), a] = 1
print(b)

Output:

[[0. 1. 0. 0.]
 [1. 0. 0. 0.]
 [0. 0. 0. 1.]]
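
The steps above can be wrapped in a small reusable function. The following is a minimal sketch, not a library API: the name one_hot and its num_classes parameter are hypothetical, and it assumes a non-empty 1-D input of non-negative integers.

```python
import numpy as np


def one_hot(labels, num_classes=None):
    """Hypothetical helper: one-hot encode a 1-D array of non-negative integers."""
    labels = np.asarray(labels)
    if labels.ndim != 1 or not np.issubdtype(labels.dtype, np.integer):
        raise ValueError("labels must be a 1-D array of integers")
    if labels.min() < 0:
        raise ValueError("labels must be non-negative")
    if num_classes is None:
        # Default to just enough columns to hold the largest label.
        num_classes = int(labels.max()) + 1
    elif labels.max() >= num_classes:
        raise ValueError("num_classes must be greater than the largest label")
    encoded = np.zeros((labels.size, num_classes), dtype=int)
    # Row i gets a 1 in column labels[i].
    encoded[np.arange(labels.size), labels] = 1
    return encoded


encoded = one_hot([1, 0, 3], num_classes=5)
print(encoded)

# argmax along each row recovers the original labels.
decoded = encoded.argmax(axis=1)
print(decoded)
```

Passing num_classes explicitly is useful when some classes may be missing from a particular batch, since the number of columns then stays the same across batches.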

We can also use the numpy.eye() function to perform one-hot encoding on arrays. By default, it returns a 2-dimensional array with 1s on the main diagonal and 0s elsewhere, that is, an identity matrix. Row i of this matrix is the one-hot vector for class i, so indexing the matrix with the values selects the encoded row for each element, as shown below.

import numpy as np

values = [1, 0, 3]
n_values = np.max(values) + 1
print(np.eye(n_values)[values])

Output:

[[0. 1. 0. 0.]
 [1. 0. 0. 0.]
 [0. 0. 0. 1.]]
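
Both NumPy approaches expect integer class indices. When the categories are strings, one possible way to obtain such indices is numpy.unique() with return_inverse=True. The sketch below assumes that sorted order is an acceptable column order; the sample labels are made up for illustration.

```python
import numpy as np

# Hypothetical sample data: string categories instead of integers.
labels = np.array(["cat", "dog", "bird", "dog"])

# categories holds the sorted unique values; indices maps each label
# to its position in categories.
categories, indices = np.unique(labels, return_inverse=True)

# Select one identity-matrix row per label.
encoded = np.eye(len(categories), dtype=int)[indices]

print(categories)
print(encoded)
```

The categories array records which column belongs to which label, so it should be kept alongside the encoded data.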

Use the sklearn Module to Perform One-Hot Encoding on a NumPy Array in Python

The sklearn.preprocessing.LabelBinarizer class from scikit-learn can perform this encoding. It binarizes multi-class labels in a one-vs-all fashion, producing one column per class. We first call the fit() function to learn the set of classes and then use the transform() function to convert the data using an object of this class.

The following code demonstrates this.

import sklearn.preprocessing
import numpy as np

a = np.array([1, 0, 3])
label_binarizer = sklearn.preprocessing.LabelBinarizer()
label_binarizer.fit(range(max(a) + 1))
b = label_binarizer.transform(a)
print(b)

Output:

[[0 1 0 0]
 [1 0 0 0]
 [0 0 0 1]]
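
LabelBinarizer is not limited to integers. As a hedged illustration, the sketch below fits it on made-up string labels with fit_transform(), inspects the learned classes_ attribute, and reverses the encoding with inverse_transform().

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Hypothetical sample data for illustration.
colors = np.array(["red", "green", "blue", "green"])

binarizer = LabelBinarizer()
# fit_transform() learns the classes and encodes the data in one step.
encoded = binarizer.fit_transform(colors)

print(binarizer.classes_)  # learned classes in sorted order, one per column
print(encoded)

# inverse_transform() maps each encoded row back to its label.
decoded = binarizer.inverse_transform(encoded)
print(decoded)
```

Note that when the data contains only two classes, LabelBinarizer returns a single column rather than two, so the result is not a one-hot matrix in that case.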

Use the pandas Module to Perform One-Hot Encoding on a NumPy Array in Python

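Datasets for machine learning are often stored in a pandas DataFrame, and the pandas module is well equipped to perform data encoding. The get_dummies() function converts categorical data into indicator columns, thereby performing one-hot encoding. The final result is a DataFrame with one column for each distinct value that appears in the data, so a value that never occurs gets no column. In pandas 2.0 and later, the columns are boolean by default unless a dtype argument such as dtype=int is passed.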

For example,

import pandas as pd
import numpy as np

a = np.array([1, 0, 3])
# dtype=int yields 0/1 columns; pandas 2.0+ would otherwise return True/False
b = pd.get_dummies(a, dtype=int)
print(b)

Output:

   0  1  3
0  0  1  0
1  1  0  0
2  0  0  1
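
The output above has no column for class 2 because that value does not occur in the data. If a column for every class is needed, one option is to declare the full set of categories first. This is a sketch under the assumption that the classes are the integers from 0 to the largest value in the array.

```python
import numpy as np
import pandas as pd

a = np.array([1, 0, 3])
n_classes = int(a.max()) + 1  # assumption: classes are 0..max(a)

# Declaring all categories up front keeps a column for class 2,
# even though 2 never appears in the data.
categorical = pd.Series(pd.Categorical(a, categories=list(range(n_classes))))
b = pd.get_dummies(categorical, dtype=int)

print(b)
print(b.to_numpy())  # plain NumPy array, if a DataFrame is not needed
```

The resulting DataFrame has four columns, and to_numpy() converts it back to a NumPy array with the same layout as the earlier examples.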

Use the keras Module to Perform One-Hot Encoding on a NumPy Array in Python

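The keras module is widely used for machine learning in Python. The to_categorical() function from this module can perform one-hot encoding on integer class labels. Its num_classes argument sets the number of columns and should be at least the largest label plus one.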

The code snippet below shows how.

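# Recent Keras releases no longer provide keras.utils.np_utils
from keras.utils import to_categorical
import numpy as np

a = np.array([1, 0, 3])
# num_classes depends on the largest label, not on the length of the array
b = to_categorical(a, num_classes=a.max() + 1)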
print(b)

Output:

[[0. 1. 0. 0.]
 [1. 0. 0. 0.]
 [0. 0. 0. 1.]]
Author: Manav Narula

Manav is an IT professional with extensive experience as a core developer on many live projects. He is an avid learner who enjoys learning new things and sharing his findings whenever possible.
