How to Randomly Shuffle DataFrame Rows in Pandas

Suraj Joshi Feb 02, 2024
  1. pandas.DataFrame.sample() Method to Shuffle DataFrame Rows in Pandas
  2. numpy.random.permutation() to Shuffle Pandas DataFrame Rows
  3. sklearn.utils.shuffle() to Shuffle Pandas DataFrame Rows

We can use the sample() method of Pandas DataFrame objects, the permutation() function from the NumPy module, or the shuffle() function from the sklearn package to randomly shuffle DataFrame rows in Pandas.

pandas.DataFrame.sample() Method to Shuffle DataFrame Rows in Pandas

pandas.DataFrame.sample() returns a random sample of items from an axis of a DataFrame object. We want to sample rows (axis 0), which is what sample() does by default for a DataFrame, so the axis parameter can be left unset.

The frac parameter determines what fraction of the rows is returned. To shuffle the whole DataFrame, we set frac to 1, so that every row is returned in random order.
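As a quick illustration of how frac controls the size of the result, the minimal sketch below compares frac=0.5 with frac=1. The small two-column frame is made up for this illustration only.

```python
import pandas as pd

# Hypothetical four-row frame, used only for this illustration.
df = pd.DataFrame(
    {"Fruit": ["Apple", "Papaya", "Banana", "Mango"], "Price": [3, 1, 2, 4]}
)

half = df.sample(frac=0.5)  # 2 of the 4 rows, picked at random
full = df.sample(frac=1)    # all 4 rows, in random order

print(len(half), len(full))
```

With frac=1 nothing is dropped, so the result is a pure reordering of the original rows.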

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]

df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})
print(df)

df_shuffled = df.sample(frac=1)
print(df_shuffled)

Output:

       Date   Fruit  Price
0  April-10   Apple      3
1  April-11  Papaya      1
2  April-12  Banana      2
3  April-13   Mango      4
       Date   Fruit  Price
3  April-13   Mango      4
2  April-12  Banana      2
0  April-10   Apple      3
1  April-11  Papaya      1

The DataFrame.sample() method shuffles the rows of the Pandas DataFrame, as shown above. Each row keeps its original index label.

We can chain the reset_index() method to reset the DataFrame index.

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]

df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})
print(df)

df_shuffled = df.sample(frac=1).reset_index(drop=True)
print(df_shuffled)

Output:

       Date   Fruit  Price
0  April-10   Apple      3
1  April-11  Papaya      1
2  April-12  Banana      2
3  April-13   Mango      4
       Date   Fruit  Price
0  April-11  Papaya      1
1  April-13   Mango      4
2  April-10   Apple      3
3  April-12  Banana      2

Here, the drop=True option prevents the old index from being added as a new column.
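If the shuffle needs to be repeatable, sample() also accepts a random_state argument. The sketch below reuses the same data and picks the seed 42 arbitrarily. It also shows ignore_index=True as a possible alternative to reset_index(drop=True); that argument is assumed to be available, which requires pandas 1.3 or newer.

```python
import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]
df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})

# The same seed gives the same row order on every call.
first = df.sample(frac=1, random_state=42)
second = df.sample(frac=1, random_state=42)
print(first.equals(second))

# ignore_index=True relabels the shuffled rows 0..n-1 in a single call
# (assumption: pandas 1.3 or newer).
compact = df.sample(frac=1, random_state=42, ignore_index=True)
print(compact)
```

Fixing the seed is mostly useful for tests and reproducible experiments; leave random_state unset when a fresh order is wanted on each run.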

numpy.random.permutation() to Shuffle Pandas DataFrame Rows

We can use numpy.random.permutation() to shuffle the row positions of the DataFrame. When the shuffled positions are used to select rows with the iloc indexer, we get randomly shuffled rows.

import pandas as pd
import numpy as np

dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]

df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})

# Permute positions 0..n-1 so this also works when the index is not the default 0..n-1
df_shuffled = df.iloc[np.random.permutation(len(df))].reset_index(drop=True)
print(df_shuffled)

Output:

       Date   Fruit  Price
0  April-13   Mango      4
1  April-12  Banana      2
2  April-10   Apple      3
3  April-11  Papaya      1

You might get a different result each time you run the same code, because np.random.permutation() generates a new random permutation on every call.
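When a repeatable order is needed, one option is to draw the permutation from a seeded NumPy Generator instead of the global random state. This is a minimal sketch rather than part of the original example, and the seed value 0 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]
df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})

# A seeded Generator yields the same permutation on every run.
rng = np.random.default_rng(seed=0)
order = rng.permutation(len(df))  # row positions 0..n-1 in random order

df_shuffled = df.iloc[order].reset_index(drop=True)
print(df_shuffled)
```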

sklearn.utils.shuffle() to Shuffle Pandas DataFrame Rows

We can also use sklearn.utils.shuffle() to shuffle rows of Pandas DataFrame.

import pandas as pd
import sklearn.utils  # import the submodule explicitly; NumPy is not needed here

dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]

df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})

df_shuffled = sklearn.utils.shuffle(df)
print(df_shuffled)

Output:

       Date   Fruit  Price
3  April-13   Mango      4
0  April-10   Apple      3
1  April-11  Papaya      1
2  April-12  Banana      2
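sklearn.utils.shuffle() can also take several objects at once and apply the same permutation to each of them, which helps when rows and labels are stored separately. In the sketch below, the in_stock Series is a hypothetical label column invented for illustration, and random_state=0 is an arbitrary seed.

```python
import pandas as pd
from sklearn.utils import shuffle

dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]
df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})

# Hypothetical labels aligned row by row with df.
in_stock = pd.Series([True, False, True, True], name="InStock")

# Both objects are shuffled with the same permutation, so rows stay aligned.
df_shuffled, in_stock_shuffled = shuffle(df, in_stock, random_state=0)

print(df_shuffled)
print(in_stock_shuffled)
```

As with sample(), the original index labels are kept, so reset_index(drop=True) can be chained if a fresh index is wanted.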

If you do not have the sklearn package installed on your system, you can install it with the following command:

pip install -U scikit-learn
Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.