Get Index of Rows Whose Column Matches Specific Value in Pandas

Get Index of Rows Whose Column Matches Specific Value in Pandas

  1. Get Indices of Rows Containing Integers/Floats in Pandas
  2. Get Indices of Rows Containing Strings in Pandas

This article demonstrates how to to get the index of rows that matches certain criteria in Pandas.

The necessity to find the indices of the rows is important in feature engineering. These skills can be useful to remove the outliers or abnormal values in a Dataframe. The indices, also known as the row labels, can be found in Pandas using several functions. In the following examples, we will be working on the dataframe created using the following snippet.

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randint(1,20,size=(20, 4)), columns=list('ABCD'))

print (df)

Output:

     A   B   C   D
0   13  16   1   4
1    4   8  10  19
2    5   7  13   2
3    7   8  15  18
4    6  14   9  10
5   17   6  16  16
6    1  19   4  18
7   15   8   1   2
8   10   1  11   4
9   12  19   3   1
10   1   5   6   7
11   9  18  16   5
12  10  11   2   2
13   8  10   4   7
14  12  15  19   1
15  15   4  13  11
16  12   5   7   5
17  16   4  13   5
18   9  15  16   4
19  16  14  17  18

Get Indices of Rows Containing Integers/Floats in Pandas

The pandas.DataFrame.loc function can access rows and columns by its labels/names. It is straight forward in returning the rows matching the given boolean condition passed as a label. Notice the square brackets next to df.loc in the snippet.

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randint(1,20,size=(20, 4)), columns=list('ABCD'))

print (df.loc[df['B'] == 19])

The rows corresponding to the boolean condition is returned as an output in the format of a Dataframe.

Output:

    A   B  C   D
6   1  19  4  18
9  12  19  3   1

Multiple conditions can be chained and applied together to the function, as shown below. This helps in isolating the rows based on specific conditions.

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randint(1,20,size=(20, 4)), columns=list('ABCD'))

print (df.loc[(df['B'] == 19) | (df['C'] == 19)])

Output:

     A   B   C   D
6    1  19   4  18
9   12  19   3   1
14  12  15  19   1

Get Index of Rows With pandas.DataFrame.index()

If you would like to find just the matched indices of the dataframe that satisfies the boolean condition passed as an argument, pandas.DataFrame.index() is the easiest way to achieve it.

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randint(1,20,size=(20, 4)), columns=list('ABCD'))

print (df.index[df['B'] == 19].tolist())

In the above snippet, the rows of column A matching the boolean condition == 1 is returned as output as shown below.

Output:

[6, 9]

The reason why we put tolist() behind the index() method is to convert the Index to the list; otherwise, the result is of Int64Index data type.

Int64Index([6, 9], dtype='int64'

Retrieving just the indices can be done based on multiple conditions too. The snippet can be written as follows:

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randint(1,20,size=(20, 4)), columns=list('ABCD'))

print (df.index[(df['B'] == 19) | (df['C'] == 19)].tolist())

Output:

[6, 9, 14]

Get Indices of Rows Containing Strings in Pandas

The string values can be matched based on two methods. Both the methods shown in the previous section will work, except for the condition change.

In the following examples, we will use the following snippet.

import pandas as pd

df = pd.DataFrame({"Name": ["blue", 
                            "delta", 
                            "echo", 
                            "charlie", 
                            "alpha"], 
                    "Type": ["Raptors",
                            "Raptors",
                            "Raptors",
                            "Raptors",
                            "Tyrannosaurus rex"]
                    })

print (df)

Output:

      Name               Type
0     blue            Raptors
1    delta            Raptors
2     echo            Raptors
3  charlie            Raptors
4    alpha  Tyrannosaurus rex

Get Index of Rows With the Exact String Match

The equality condition used in the previous section can be used in finding the exact string match in the Dataframe. We will look for the two strings.

import pandas as pd

df = pd.DataFrame({"Name": ["blue", 
                            "delta", 
                            "echo", 
                            "charlie", 
                            "alpha"], 
                    "Type": ["Raptors",
                            "Raptors",
                            "Raptors",
                            "Raptors",
                            "Tyrannosaurus rex"]
                    })

print (df.index[(df['Name'] == 'blue')].tolist())
print ('\n')
print (df.loc[df['Name'] == 'blue'])
print ('\n')
print (df.loc[(df['Name'] == 'charlie') & (df['Type'] =='Raptors')])

Output:

[0]

   Name     Type
0  blue  Raptors

      Name     Type
3  charlie  Raptors

As seen above, both the index and the rows matching the condition can be received.

Get Index of Rows With the Partial String Match

The string values can be partially matched by chaining the dataframe to the str.contains function. In the following example, we will be looking for the string ha in charlie and alpha.

import pandas as pd

df = pd.DataFrame({"Name": ["blue", 
                            "delta", 
                            "echo", 
                            "charlie", 
                            "alpha"], 
                    "Type": ["Raptors",
                            "Raptors",
                            "Raptors",
                            "Raptors",
                            "Tyrannosaurus rex"]
                    })

print (df.index[df['Name'].str.contains('ha')].tolist())
print ('\n')
print (df.loc[df['Name'].str.contains('ha')])
print ('\n')
print (df.loc[(df['Name'].str.contains('ha')) & (df['Type'].str.contains('Rex'))])

Output:

[3, 4]

      Name               Type
3  charlie            Raptors
4    alpha  Tyrannosaurus rex


    Name               Type
4  alpha  Tyrannosaurus rex

This function can be very useful in performing a partial string matching across multiple columns of the dataframe.

Related Article - Pandas DataFrame

  • Get Pandas DataFrame Column Headers as a List
  • Delete Pandas DataFrame Column
  • Convert Pandas Column to Datetime
  • Convert a Float to an Integer in Pandas DataFrame
  • Sort Pandas DataFrame by One Column's Values
  • Get the Aggregate of Pandas Group-By and Sum
  • Related Article - Pandas DataFrame Row

  • Get the Row Count of a Pandas DataFrame
  • Randomly Shuffle DataFrame Rows in Pandas
  • Filter Dataframe Rows Based on Column Values in Pandas
  • Iterate Through Rows of a DataFrame in Pandas
  • Get Index of All Rows Whose Particular Column Satisfies Given Condition in Pandas
  • Find Duplicate Rows in a DataFrame Using Pandas