Mask in Pandas

Preet Sanghavi Feb-23, 2022 Feb-15, 2022
  1. Use the dates_data to Create a Dummy Dataframe in Pandas
  2. Use Masking to Filter Data in Pandas

Pandas is a data analysis library for Python. Many companies and organizations that need high-quality data analysis use it on a large scale.

A data analyst must decide whether to use Pandas based on the kind of data at hand. Pandas is highly recommended when the data is tabular, such as a SQL table or a spreadsheet, with heterogeneously typed columns.

The data can be ordered or unordered, and time-series data is also supported. In this tutorial, let us understand how to mask data in Pandas.

Masking is essentially a way to filter data based on one or more conditions. The mask itself is generally a Boolean object, such as a Series, that holds True or False for each row depending on whether that row satisfies the condition.
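As a minimal sketch of this idea, the snippet below builds a mask from a single comparison and applies it. The Series named scores and its values are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical data, used only for illustration
scores = pd.Series([12, 7, 30, 5], index=['a', 'b', 'c', 'd'])

# Comparing a Series with a scalar yields one True/False per entry (the mask)
mask = scores > 10
print(mask.tolist())       # [True, False, True, False]

# Indexing with the mask keeps only the entries where it is True
filtered = scores[mask]
print(filtered.tolist())   # [12, 30]
```

The mask keeps the same index as the data it was built from, which is why it can be laid over the original object to select rows.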

Use the dates_data to Create a Dummy Dataframe in Pandas

Masking can be understood as an advanced If-Else scheme for a data frame. Before applying it, we will first create a dummy data frame named dates_data with 100 rows.

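import pandas as pd

# 100 timestamps spaced 30 minutes apart, starting at 2013-01-01 00:00
index = pd.date_range('2013-1-1', periods=100, freq='30min')
dates_data = pd.DataFrame(data=list(range(100)), columns=['value'], index=index)
dates_data['value2'] = 'Alpha'
# Label the first 10 rows 'Beta'; a single .loc call avoids chained assignment
dates_data.loc[dates_data.index[:10], 'value2'] = 'Beta'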

The code block creates a data frame indexed by timestamps, with two columns named value and value2. To view the entries in the data, we use the following code:

print(dates_data)

Output:

                     value value2
2013-01-01 00:00:00      0   Beta
2013-01-01 00:30:00      1   Beta
2013-01-01 01:00:00      2   Beta
2013-01-01 01:30:00      3   Beta
2013-01-01 02:00:00      4   Beta
...                    ...    ...
2013-01-02 23:30:00     95  Alpha
2013-01-03 00:00:00     96  Alpha
2013-01-03 00:30:00     97  Alpha
2013-01-03 01:00:00     98  Alpha
2013-01-03 01:30:00     99  Alpha

As we can see, we have 100 entries whose timestamps are spaced at equal intervals of 30 minutes.

The two columns, value and value2, hold an integer and a label that is either Alpha or Beta, respectively.
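The structure described above can be checked with a few lines. This is a sketch that rebuilds the same frame so it runs on its own; the helper names counts and step are introduced here purely for illustration.

```python
import pandas as pd

# Rebuild the tutorial's frame so this snippet is self-contained
index = pd.date_range('2013-1-1', periods=100, freq='30min')
dates_data = pd.DataFrame({'value': list(range(100))}, index=index)
dates_data['value2'] = 'Alpha'
dates_data.loc[dates_data.index[:10], 'value2'] = 'Beta'

# 100 rows and 2 columns
print(dates_data.shape)                   # (100, 2)

# How many rows carry each label in value2
counts = dates_data['value2'].value_counts()
print(counts['Alpha'], counts['Beta'])    # 90 10

# Gap between two consecutive timestamps in the index
step = dates_data.index[1] - dates_data.index[0]
print(step)                               # 0 days 00:30:00
```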

Use Masking to Filter Data in Pandas

Masking is a technique in Pandas where the analyst filters data based on a particular condition.

It is possible to filter this data based on one or more conditions. We will explore both cases here.

Let us begin by filtering the data such that we fetch only the entries from our data frame dates_data whose value2 column is Beta.

mask = dates_data['value2'] == 'Beta'
print(dates_data[mask])

Output:

                     value value2
2013-01-01 00:00:00      0   Beta
2013-01-01 00:30:00      1   Beta
2013-01-01 01:00:00      2   Beta
2013-01-01 01:30:00      3   Beta
2013-01-01 02:00:00      4   Beta
2013-01-01 02:30:00      5   Beta
2013-01-01 03:00:00      6   Beta
2013-01-01 03:30:00      7   Beta
2013-01-01 04:00:00      8   Beta
2013-01-01 04:30:00      9   Beta

We have entries related to only the Beta values in the value2 column of the dates_data data frame.

In this way, we can create a mask and then superimpose that mask on our data to filter data. This mask can also be understood as a stencil to filter out certain data.
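To make the stencil idea concrete, the following sketch rebuilds the same frame and inspects the mask itself before applying it. The .loc[mask] form is shown here as an equivalent alternative to the bracket form; the names by_brackets and by_loc are illustrative only.

```python
import pandas as pd

# Rebuild the tutorial's frame so this snippet is self-contained
index = pd.date_range('2013-1-1', periods=100, freq='30min')
dates_data = pd.DataFrame({'value': list(range(100))}, index=index)
dates_data['value2'] = 'Alpha'
dates_data.loc[dates_data.index[:10], 'value2'] = 'Beta'

mask = dates_data['value2'] == 'Beta'

# The mask holds one True/False per row and shares the frame's index
print(mask.head(3).tolist())          # [True, True, True]

# True counts as 1, so the sum is the number of matching rows
print(mask.sum())                     # 10

# Bracket indexing and .loc accept the same mask and return the same rows
by_brackets = dates_data[mask]
by_loc = dates_data.loc[mask]
print(by_brackets.equals(by_loc))     # True
```

Because the mask is an ordinary Series, it can be stored in a variable and reused, or combined with other masks.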

Next, we will filter on two conditions at once: values greater than 3 in the value column and only the Beta value in the value2 column of the dates_data data frame.

mask = (dates_data['value2'] == 'Beta') & (dates_data['value'] > 3)
print(dates_data[mask])

Output:

                     value value2
2013-01-01 02:00:00      4   Beta
2013-01-01 02:30:00      5   Beta
2013-01-01 03:00:00      6   Beta
2013-01-01 03:30:00      7   Beta
2013-01-01 04:00:00      8   Beta
2013-01-01 04:30:00      9   Beta

As we can see in the output above, we have successfully filtered the data such that only rows with values greater than 3 in the value column and the value Beta in the value2 column remain.
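Conditions can be combined in other ways as well. The sketch below, which assumes the same frame as above, uses | for element-wise OR and ~ for element-wise NOT; the threshold 98 and the names is_beta and is_late are chosen only for illustration.

```python
import pandas as pd

# Rebuild the tutorial's frame so this snippet is self-contained
index = pd.date_range('2013-1-1', periods=100, freq='30min')
dates_data = pd.DataFrame({'value': list(range(100))}, index=index)
dates_data['value2'] = 'Alpha'
dates_data.loc[dates_data.index[:10], 'value2'] = 'Beta'

is_beta = dates_data['value2'] == 'Beta'
is_late = dates_data['value'] >= 98

# | keeps rows where at least one condition holds
either = dates_data[is_beta | is_late]
print(len(either))        # 12

# ~ inverts a mask: every row that is not Beta
not_beta = dates_data[~is_beta]
print(len(not_beta))      # 90
```

When the comparisons are written inline, each one needs its own parentheses, because & and | bind more tightly than == and >. Python's and/or keywords do not work element-wise on a Series.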

Therefore, with the help of the masking technique in Pandas, we can efficiently filter data according to our requirements, based on one or more conditions.


Preet writes his thoughts about programming in a simplified manner to help others learn better. With thorough research, his articles offer descriptive and easy to understand solutions.
