# One-Hot Encoding on NumPy Array in Python

Manav Narula Apr 29, 2021 Apr 19, 2021

Python has a rich ecosystem of machine learning libraries that make it easy to train and test models. However, many algorithms cannot work directly with categorical labels and require numeric values instead.

One-hot encoding is a widely used technique for converting such data before passing it to an algorithm. It represents each category as a binary vector that has a single 1 at the position of that category and 0s everywhere else.

In this tutorial, we will learn how to perform one-hot encoding on NumPy arrays.

## Use the NumPy Module to Perform One-Hot Encoding on a NumPy Array in Python

In this method, we generate a new array that holds the encoded data. We first use the `numpy.zeros()` function to create an array of 0s with one row per element and one column per class. We then set the appropriate entries to 1 with integer array indexing: `numpy.arange()` supplies the row indices and the original array supplies the column indices.

For example,

``````
import numpy as np

a = np.array([1, 0, 3])
# One row per element, one column per class (0 .. a.max())
b = np.zeros((a.size, a.max() + 1))
# Row i gets a 1 in column a[i]
b[np.arange(a.size), a] = 1
print(b)
``````

Output:

``````
[[0. 1. 0. 0.]
 [1. 0. 0. 0.]
 [0. 0. 0. 1.]]
``````
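The indexing approach above can be wrapped in a small reusable function. The following is a minimal sketch, not a NumPy API: the name `one_hot` and its `num_classes` parameter are hypothetical, and it assumes the labels are non-negative integers numbered from 0.

```python
import numpy as np


def one_hot(labels, num_classes=None):
    """Hypothetical helper: one-hot encode a 1-D array of non-negative integers."""
    labels = np.asarray(labels)
    if labels.ndim != 1 or not np.issubdtype(labels.dtype, np.integer):
        raise ValueError("labels must be a 1-D array of integers")
    if labels.size and labels.min() < 0:
        raise ValueError("labels must be non-negative")
    if num_classes is None:
        # Assumption: classes are numbered 0 .. max(labels)
        num_classes = int(labels.max()) + 1
    encoded = np.zeros((labels.size, num_classes), dtype=int)
    # Row i gets a 1 in column labels[i]
    encoded[np.arange(labels.size), labels] = 1
    return encoded


print(one_hot([1, 0, 3]))
print(one_hot([1, 0, 3], num_classes=5).shape)
```

Passing `num_classes` explicitly is useful when the data at hand does not contain the largest class, because the width of the result then no longer depends on which labels happen to appear.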

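We can also use the `numpy.eye()` function to perform one-hot encoding on arrays. By default, it returns a 2-dimensional array with 1s on the main diagonal and 0s elsewhere, that is, an identity matrix. Row `i` of this matrix is exactly the one-hot vector for class `i`, so indexing the matrix with our values selects the encoded row for each of them, as shown below.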

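``````
import numpy as np

values = [1, 0, 3]
# Number of classes, assuming they are numbered 0 .. max(values)
n_values = np.max(values) + 1
# Pick row values[i] of the identity matrix for each element
print(np.eye(n_values)[values])
``````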

Output:

``````
[[0. 1. 0. 0.]
 [1. 0. 0. 0.]
 [0. 0. 0. 1.]]
``````
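Both NumPy approaches expect integer class indices. When the categories are strings, one option is to map them to integer codes first. The sketch below does this with `numpy.unique()` and its `return_inverse` argument; the `colors` data is made up for illustration.

```python
import numpy as np

# Illustrative data, not from the examples above
colors = np.array(["red", "green", "blue", "green"])

# np.unique returns the sorted distinct values; with return_inverse=True it
# also returns, for each element, its index in that sorted array.
categories, codes = np.unique(colors, return_inverse=True)

# Select one row of the identity matrix per integer code
encoded = np.eye(categories.size, dtype=int)[codes]

print(categories)
print(encoded)
```

The columns of `encoded` follow the sorted order in `categories`, so that array is needed later to tell which column stands for which label.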

## Use the `sklearn` Module to Perform One-Hot Encoding on a NumPy Array in Python

`sklearn.preprocessing.LabelBinarizer` is a class from the scikit-learn library that can perform this encoding. It binarizes labels in a one-vs-all fashion, producing one column per class. We first call the `fit()` function so that the object learns the set of classes, and then use the `transform()` function to convert the data.

The following code explains this.

``````
import sklearn.preprocessing
import numpy as np

a = np.array([1, 0, 3])
label_binarizer = sklearn.preprocessing.LabelBinarizer()
# Fit on every class from 0 to the maximum so that class 2 also gets a column
label_binarizer.fit(np.arange(a.max() + 1))
b = label_binarizer.transform(a)
print(b)
``````

Output:

``````
[[0 1 0 0]
 [1 0 0 0]
 [0 0 0 1]]
``````
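`LabelBinarizer` also works with non-numeric labels and can map the encoded rows back to the original labels. The following sketch uses made-up string labels to show the `classes_` attribute, which records the column order, and the `inverse_transform()` function.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Illustrative labels, not from the example above
labels = np.array(["cat", "dog", "bird", "dog"])

binarizer = LabelBinarizer()
# fit_transform learns the classes and encodes the data in one step
encoded = binarizer.fit_transform(labels)

print(binarizer.classes_)  # column order of the encoding (sorted labels)
print(encoded)

# Recover the original labels from the encoded rows
decoded = binarizer.inverse_transform(encoded)
print(decoded)
```

One caveat: when only two classes are present, `LabelBinarizer` returns a single column of 0s and 1s rather than two columns, so this sketch uses three classes.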

## Use the `pandas` Module to Perform One-Hot Encoding on a NumPy Array in Python

Datasets for machine learning algorithms are often stored in a `pandas` DataFrame, so the `pandas` module is well equipped for data encoding. The `get_dummies()` function converts categorical data into numerical indicator columns, thus performing one-hot encoding. The final result is a DataFrame. Only values that actually occur in the data get a column, so the output below has columns for 0, 1, and 3 but not for 2.

For example,

``````
import pandas as pd
import numpy as np

a = np.array([1, 0, 3])
# dtype=int gives 0/1 columns; recent pandas versions default to True/False
b = pd.get_dummies(a, dtype=int)
print(b)
``````

Output:

``````
   0  1  3
0  0  1  0
1  1  0  0
2  0  0  1
``````
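If the result should have a column for every possible class, including classes that do not occur in the data, one option is to declare the full set of categories up front with `pandas.Categorical`. The sketch below assumes the classes are 0 through 3; that list is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

a = np.array([1, 0, 3])

# Assumption: the complete set of classes is 0..3, although 2 never occurs in `a`
categorical = pd.Series(pd.Categorical(a, categories=[0, 1, 2, 3]))

# get_dummies creates one column per declared category, present or not
encoded = pd.get_dummies(categorical, dtype=int)

print(encoded)
# Convert back to a plain NumPy array if needed
print(encoded.to_numpy())
```

With the categories declared this way, the DataFrame has four columns and matches the shape produced by the NumPy and `sklearn` examples.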

## Use the `keras` Module to Perform One-Hot Encoding on a NumPy Array in Python

The `keras` module is widely used for deep learning in Python. Its `to_categorical()` function performs one-hot encoding on integer class labels, and the `num_classes` argument sets the number of columns in the result.

The code snippet below shows how.

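``````
# keras.utils.np_utils is not available in newer Keras releases
from keras.utils import to_categorical
import numpy as np

a = np.array([1, 0, 3])
# num_classes is the largest label plus one, not the length of the array
b = to_categorical(a, num_classes=int(a.max()) + 1)
print(b)
``````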

Output:

``````
[[0. 1. 0. 0.]
 [1. 0. 0. 0.]
 [0. 0. 0. 1.]]
``````