Pandas DataFrame DataFrame.sample() Function

Minahil Noor Jan 30, 2023 Pandas Pandas DataFrame

Syntax of pandas.DataFrame.sample()
Example Codes: DataFrame.sample()
Example Codes: DataFrame.sample() to Extract the Columns
Example Codes: DataFrame.sample() to Generate a Fraction of Data
Example Codes: DataFrame.sample() to Oversample the DataFrame
Example Codes: DataFrame.sample() With weights

Python Pandas DataFrame.sample() function returns a random sample of rows or columns from a DataFrame. The sample can contain one or more rows or columns.

Syntax of `pandas.DataFrame.sample()`

DataFrame.sample(
    n=None, frac=None, replace=False, weights=None, random_state=None, axis=None
)

Parameters


`n`	It is an integer. It specifies how many rows or columns to select at random from the `DataFrame`. It cannot be combined with `frac`. If neither `n` nor `frac` is given, one item is returned.
`frac`	It is a `float` value. It specifies the fraction of rows or columns to return. For example, `frac=0.45` returns 45% of the rows or columns of the original data, rounded to a whole number. It cannot be combined with `n`.
`replace`	It is a boolean value. If it is set to `True`, sampling is done with replacement, so the same row or column can be selected more than once. The default is `False`.
`weights`	It is a column name (string) or a one-dimensional `array-like` of weights with the same length as the axis being sampled. When called on a `DataFrame` with `axis=0`, it accepts the name of a column. Rows with larger weights are more likely to be selected. Weights that do not sum to 1 are normalized, and missing values are treated as zero.
`random_state`	It is an integer or a `numpy.random.RandomState` object. An integer is used as a seed, so the same rows or columns are returned every time the code runs. A `RandomState` object is used directly as the random number generator.
`axis`	It is an integer or a string. It specifies the axis to sample: 0 or `index` for rows, 1 or `columns` for columns. The default, `None`, samples rows.
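The `n` and `frac` parameters are alternatives. The short sketch below is our own and reuses the article's data as plain lists. It shows that passing both raises a `ValueError`, while passing just one works.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 75, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [56, 75, 82, 64, 67],
    }
)

# n and frac are mutually exclusive, so passing both raises a ValueError.
try:
    df.sample(n=2, frac=0.5)
    both_allowed = True
except ValueError as err:
    both_allowed = False
    print("ValueError:", err)

# Passing only one of them is fine (the rows shown vary between runs).
print(df.sample(n=2))
```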

Return

It returns a new object of the same type as the caller: a Series when called on a Series, or a DataFrame when called on a DataFrame. It contains n items (or the requested fraction of items) selected randomly from the original data.
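A minimal sketch of the return type follows. It uses the article's data as plain lists and samples both the whole DataFrame and a single column (a Series).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 75, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [56, 75, 82, 64, 67],
    }
)

# Sampling a DataFrame returns a DataFrame.
rows = df.sample(n=2)
# Sampling a Series (a single column) returns a Series.
names = df["Name"].sample(n=2)

print(type(rows).__name__, rows.shape)    # DataFrame (2, 3)
print(type(names).__name__, names.shape)  # Series (2,)
```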

Example Codes: `DataFrame.sample()`

By default, the function returns a sample of rows, i.e., axis=0.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
print(dataframe)

The DataFrame looks like this:

   Attendance    Name  Obtained Marks
0          60  Olivia              56
1         100    John              75
2          80   Laura              82
3          75     Ben              64
4          95   Kevin              67

All the parameters of this function are optional. If we call the function without any arguments, it returns a single random row.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample()
print(dataframe1)

Output1:

   Attendance Name  Obtained Marks
3          75  Ben              64

Output2:

   Attendance   Name  Obtained Marks
4          95  Kevin              67

Output1 and Output2 come from running the same program twice. Because the sample is random, each run can return a different row.
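To get the same sample on every run, pass an integer seed as random_state. The sketch below uses the article's data as plain lists. It draws two samples with the same seed and checks that they are identical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 75, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [56, 75, 82, 64, 67],
    }
)

# The same integer seed always selects the same rows.
first = df.sample(n=2, random_state=42)
second = df.sample(n=2, random_state=42)
print(first.equals(second))  # True
print(first)
```

The seed value 42 is arbitrary. Any fixed integer makes the result reproducible.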

Example Codes: `DataFrame.sample()` to Extract the Columns

To sample columns instead of rows, set axis to 1 (or "columns").

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample(n=1, axis=1)
print(dataframe1)

Output:

     Name
0  Olivia
1    John
2   Laura
3     Ben
4   Kevin

The function returned one randomly chosen column, because the parameter n=1 sets the number of columns to select. Which column appears can change from run to run.
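The string form axis="columns" behaves the same as axis=1. This sketch, again using the article's data as plain lists, draws two of the three columns and keeps all five rows.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 75, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [56, 75, 82, 64, 67],
    }
)

# axis="columns" is equivalent to axis=1: pick 2 random columns, keep every row.
two_cols = df.sample(n=2, axis="columns")
print(two_cols.columns.tolist())
print(two_cols.shape)  # (5, 2)
```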

Example Codes: `DataFrame.sample()` to Generate a Fraction of Data

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample(frac=0.5)
print(dataframe1)

Output:

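   Attendance   Name  Obtained Marks
3          75    Ben              64
4          95  Kevin              67

With 5 rows, frac=0.5 asks for 2.5 rows. pandas rounds this to a whole number (2 here), so the sample contains 2 randomly chosen rows, about half of the original data.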
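The following sketch makes the rounding visible. It assumes, as in the pandas source we have checked, that the sample size is computed as round(frac × number of rows) with Python's built-in round(), which rounds halves to the nearest even number.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 75, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [56, 75, 82, 64, 67],
    }
)

half = df.sample(frac=0.5)
print(len(half))  # 2, because round(0.5 * 5) == round(2.5) == 2

sixty = df.sample(frac=0.6)
print(len(sixty))  # 3, because round(0.6 * 5) == 3
```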

Example Codes: `DataFrame.sample()` to Oversample the DataFrame

If frac > 1, the sample is larger than the DataFrame, so some rows must be sampled more than once. In that case the parameter replace must be True; otherwise, a ValueError is raised.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample(frac=1.5, replace=True)
print(dataframe1)

Output:

   Attendance   Name  Obtained Marks
3          75     Ben              64
0          60  Olivia              56
1         100    John              75
2          80   Laura              82
1         100    John              75
2          80   Laura              82
0          60  Olivia              56
4          95   Kevin              67
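Because rows are drawn with replacement, the oversampled result reuses index labels, as with 0, 1, and 2 in the output above. The sketch below uses the article's data as plain lists. It checks the size and the duplicate labels, then shows one common way to get a clean index with reset_index(drop=True).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 75, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [56, 75, 82, 64, 67],
    }
)

up = df.sample(frac=1.5, replace=True)
print(len(up))  # 8, because round(1.5 * 5) == round(7.5) == 8

# 8 draws from 5 rows must repeat at least one row, so some index labels repeat.
had_duplicates = up.index.has_duplicates
print(had_duplicates)  # True
print(up.index.value_counts())

# Give the result a fresh 0..n-1 index if the duplicate labels are a problem.
up = up.reset_index(drop=True)
print(up.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
```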

If replace is False while frac is larger than 1, the function raises a ValueError.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample(frac=1.5, replace=False)
print(dataframe1)

Output:

Traceback (most recent call last):
  File "..\test.py", line 6, in <module>
    dataframe1 = dataframe.sample(frac=1.5, replace=False)
  File "..\lib\site-packages\pandas\core\generic.py", line 5044, in sample
    raise ValueError(
ValueError: Replace has to be set to `True` when upsampling the population `frac` > 1.
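A similar ValueError appears when n is larger than the number of rows and replace is left at False. The sketch below uses the article's data as plain lists. It triggers both errors, catches them, and then repeats the same requests with replace=True. The exact error messages may differ between pandas and NumPy versions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 75, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [56, 75, 82, 64, 67],
    }
)

failed = []
for kwargs in ({"frac": 1.5}, {"n": 10}):
    try:
        df.sample(**kwargs)  # replace defaults to False
    except ValueError as err:
        failed.append(kwargs)
        print(kwargs, "->", err)

# With replace=True the same requests succeed.
print(len(df.sample(frac=1.5, replace=True)))  # 8
print(len(df.sample(n=10, replace=True)))      # 10
```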

Example Codes: `DataFrame.sample()` With `weights`

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample(n=2, weights="Attendance")
print(dataframe1)

Output:

   Attendance   Name  Obtained Marks
1         100   John              75
4          95  Kevin              67

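Here, rows with larger values in the Attendance column are more likely to appear in the sample, because each draw favors rows in proportion to their weight. A row with a smaller weight can still be selected; it is just less likely.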
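Weights can also be passed as a list with one value per row. The weights in this sketch are hypothetical and chosen only for illustration. Only Laura and Kevin get a non-zero weight, so a sample of two rows without replacement must return exactly those two.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 75, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [56, 75, 82, 64, 67],
    }
)

# Hypothetical weights: rows with weight 0 are never selected.
# pandas normalizes the weights, so they need not sum to 1.
w = [0, 0, 1, 0, 3]
picked = df.sample(n=2, weights=w)
print(sorted(picked["Name"]))  # ['Kevin', 'Laura']
```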
