One-Hot Encoding on a NumPy Array in Python
Use the NumPy Module to Perform One-Hot Encoding on a NumPy Array in Python
Use the sklearn Module to Perform One-Hot Encoding on a NumPy Array in Python
Use the pandas Module to Perform One-Hot Encoding on a NumPy Array in Python
Use the keras Module to Perform One-Hot Encoding on a NumPy Array in Python
Python has a large ecosystem of machine learning frameworks, so we can train and test models easily. However, many algorithms cannot work directly with categorical labels and require numeric values instead.
One-hot encoding is therefore a widely used technique for preparing such data before passing it to an algorithm. Each category gets its own column, and every sample has a 1 in the column of its category and 0 everywhere else.
In this tutorial, we will learn how to perform one-hot encoding on NumPy arrays.
Use the NumPy Module to Perform One-Hot Encoding on a NumPy Array in Python
In this method, we generate a new array that holds the encoded data. We use the numpy.zeros() function to create an array of 0s with one row per element and one column per possible value (a.max() + 1 columns). We then set the entry at each element's label position to 1, using numpy.arange() to supply the row indices and the label array itself to supply the column indices.
For example,
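import numpy as np
a = np.array([1, 0, 3])
# One row per element, one column per possible value (0 to a.max())
b = np.zeros((a.size, a.max() + 1))
# Row i gets a 1 in column a[i]
b[np.arange(a.size), a] = 1
print(b)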
Output:
[[0. 1. 0. 0.]
 [1. 0. 0. 0.]
 [0. 0. 0. 1.]]
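The same zeros-plus-indexing approach can be wrapped in a small reusable helper. The sketch below is illustrative only: the function name one_hot and its num_classes parameter are hypothetical, not part of NumPy. Passing num_classes explicitly keeps the output width fixed even when the largest class is missing from a particular batch, and the range check rejects labels that would not fit in the array.

```python
import numpy as np

def one_hot(labels, num_classes=None, dtype=int):
    """Hypothetical helper: one-hot encode a 1-D array of non-negative integer labels."""
    labels = np.asarray(labels)
    if num_classes is None:
        # Infer the width from the largest label, as in the example above
        num_classes = labels.max() + 1
    if labels.min() < 0 or labels.max() >= num_classes:
        raise ValueError("labels must lie in the range [0, num_classes)")
    # One row per label, one column per class
    encoded = np.zeros((labels.size, num_classes), dtype=dtype)
    # Row i gets a 1 in column labels[i]
    encoded[np.arange(labels.size), labels] = 1
    return encoded

print(one_hot([1, 0, 3]))
print(one_hot([1, 0], num_classes=4))
```

The second call produces four columns even though no label is larger than 1, which is useful when several batches must share the same encoding.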
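We can also use the eye() function to perform one-hot encoding on arrays. By default, numpy.eye(n) returns a 2-dimensional n-by-n array with 1s on the main diagonal and 0s elsewhere, so row i of this identity matrix is exactly the one-hot vector for label i. Indexing the identity matrix with the label array picks out the matching row for each element, as shown below.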
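import numpy as np
values = [1, 0, 3]
n_values = np.max(values) + 1  # number of classes: 0 through max(values)
# Row i of the identity matrix is the one-hot vector for label i
print(np.eye(n_values)[values])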
Output:
[[0. 1. 0. 0.]
 [1. 0. 0. 0.]
 [0. 0. 0. 1.]]
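Because np.eye(n)[labels] is ordinary NumPy integer-array indexing, it also works when the labels are stored in a multi-dimensional array: the result gains one trailing axis of length n. The sketch below assumes a small made-up 2 x 3 batch of labels and passes dtype=int to np.eye so the result holds integers instead of floats.

```python
import numpy as np

# A made-up batch of labels: 2 sequences of 3 labels each
labels = np.array([[1, 0, 3],
                   [2, 2, 0]])
n_values = labels.max() + 1

# Each label is replaced by the matching row of the identity matrix,
# so the shape goes from (2, 3) to (2, 3, n_values)
encoded = np.eye(n_values, dtype=int)[labels]

print(encoded.shape)
print(encoded[1])
```

Note that this approach first builds a full n-by-n identity matrix, so with a very large number of classes the np.zeros() approach avoids that extra memory.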
Use the sklearn Module to Perform One-Hot Encoding on a NumPy Array in Python
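The sklearn.preprocessing.LabelBinarizer class from scikit-learn can perform this encoding efficiently. It binarizes labels in a one-vs-all fashion, turning each label into a row of 0s and 1s. We first call fit() so that the binarizer learns the set of classes, and then use its transform() method to convert the data. Fitting on range(max(a)+1) rather than on a itself ensures that class 2 gets its own column even though it does not occur in a.
The following code demonstrates this.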
import sklearn.preprocessing
import numpy as np
a = np.array([1, 0, 3])
label_binarizer = sklearn.preprocessing.LabelBinarizer()
# Learn the classes 0 to max(a), i.e. 0, 1, 2, 3
label_binarizer.fit(range(max(a) + 1))
b = label_binarizer.transform(a)
print(b)
Output:
[[0 1 0 0]
 [1 0 0 0]
 [0 0 0 1]]
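LabelBinarizer is not limited to labels 0, 1, 2, and so on: fit() records the sorted unique classes, and transform() puts each label's 1 in the column of its class. The NumPy sketch below imitates that fit-then-transform idea for the made-up labels 10, 20, and 40. It is a simplified illustration, not scikit-learn's implementation, and the helper names fit_classes and transform_labels are hypothetical.

```python
import numpy as np

def fit_classes(labels):
    """Hypothetical helper: learn the sorted, unique class values."""
    return np.unique(labels)

def transform_labels(labels, classes):
    """Hypothetical helper: one-hot encode labels against a fixed, sorted class array."""
    labels = np.asarray(labels)
    # This sketch simply rejects labels that were not seen during fitting
    if not np.isin(labels, classes).all():
        raise ValueError("found labels that were not seen during fitting")
    # Column index of each label within the sorted class array
    columns = np.searchsorted(classes, labels)
    encoded = np.zeros((labels.size, classes.size), dtype=int)
    encoded[np.arange(labels.size), columns] = 1
    return encoded

classes = fit_classes([10, 20, 40])
print(classes)
print(transform_labels([20, 10, 40, 20], classes))
```

How scikit-learn itself treats labels that were not seen during fitting may differ from this sketch. Also keep in mind that LabelBinarizer returns a single column, not two, when there are exactly two classes.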
Use the pandas Module to Perform One-Hot Encoding on a NumPy Array in Python
Datasets for machine learning algorithms are usually stored in a pandas DataFrame, so the pandas module is well equipped to perform data encoding. The get_dummies() function converts categorical data into numerical indicator columns, thereby performing one-hot encoding. The result is a DataFrame with one column for each distinct value in the data; because 2 does not occur in a, it gets no column.
For example,
import pandas as pd
import numpy as np
a = np.array([1, 0, 3])
# dtype=int gives 0/1 columns (pandas 2.0 and later return booleans by default)
b = pd.get_dummies(a, dtype=int)
print(b)
Output:
   0  1  3
0  0  1  0
1  1  0  0
2  0  0  1
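If every possible value should get a column, including values missing from the data, one common approach is to wrap the array in a pandas.Categorical with an explicit list of categories before calling get_dummies(). The sketch below assumes the classes are the integers 0 to 3, as in the NumPy examples, and passes dtype=int to request 0/1 columns instead of booleans.

```python
import numpy as np
import pandas as pd

a = np.array([1, 0, 3])

# Declare all possible classes up front, including 2, which does not occur in a
categorical = pd.Categorical(a, categories=[0, 1, 2, 3])

# get_dummies creates one column per declared category, observed or not
b = pd.get_dummies(categorical, dtype=int)
print(b)
```

Fixing the categories this way also keeps the column layout identical across separate batches of data.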
Use the keras Module to Perform One-Hot Encoding on a NumPy Array in Python
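The keras module is widely used for machine learning in Python. Its to_categorical() function converts a vector of integer class labels into a one-hot matrix. The num_classes argument sets the number of columns, so it must be greater than the largest label.
The code snippet below shows how.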
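# Recent Keras versions expose to_categorical directly in keras.utils
from keras.utils import to_categorical
import numpy as np
a = np.array([1, 0, 3])
# num_classes must cover every label from 0 to a.max()
b = to_categorical(a, num_classes=a.max() + 1)
print(b)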
Output:
[[0. 1. 0. 0.]
 [1. 0. 0. 0.]
 [0. 0. 0. 1.]]
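For NumPy input, to_categorical() returns a plain NumPy array, so the encoding can be undone with np.argmax(), which returns the column index of the 1 in each row. The sketch below hard-codes the output shown above, so it runs without Keras installed.

```python
import numpy as np

# The one-hot matrix printed by the to_categorical() example above
b = np.array([[0., 1., 0., 0.],
              [1., 0., 0., 0.],
              [0., 0., 0., 1.]])

# The position of the 1 in each row is the original class label
labels = np.argmax(b, axis=1)
print(labels)
```

The same decoding works for the NumPy and scikit-learn outputs in this tutorial. For the get_dummies() DataFrame, whose columns skip the value 2, the argmax position has to be mapped back through the column labels instead.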
Manav is an IT professional with extensive experience as a core developer on many live projects. He is an avid learner who enjoys learning new things and sharing his findings whenever possible.