How to Sample With Replacement in Python

Manav Narula Feb 12, 2024
  1. Python Sample With Replacement
  2. Use the random.choices() Function to Sample With Replacement in Python
  3. Use the random.choice() Function to Sample With Replacement in Python
  4. Use the numpy.random.choice() Function to Sample With Replacement in Python
  5. Python Sample With Replacement Using the numpy.random.randint Function
  6. Conclusion

Sampling with replacement is a statistical technique where elements are selected from a dataset, and after each selection, the chosen element is put back into the dataset. This process allows for the possibility of selecting the same element multiple times.

In Python, there are several ways to sample with replacement, each with its own advantages and use cases. This article walks through them with example code and explanations.

Python Sample With Replacement

Sampling is the process of selecting a subset of elements from a given sequence. Python's built-in random module provides several functions for this, and the numpy package has its own random submodule for working with random numbers in arrays.

The random.choice() function selects a single random element. The random.sample() function samples without replacement, so no position in the sequence is picked more than once.

The random.choices() function samples with replacement.
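The difference between the two modes is easiest to see when the requested sample is larger than the population. The sketch below uses a small made-up list (not taken from the examples in this article) and shows random.sample() rejecting the request while random.choices() fulfils it by repeating elements.

```python
import random

population = [5, 8, 9]  # illustrative data only

# Without replacement: asking for more items than exist raises ValueError.
try:
    random.sample(population, k=5)
    oversample_allowed = True
except ValueError:
    oversample_allowed = False

# With replacement: any k works because elements can repeat.
with_replacement = random.choices(population, k=5)

print(oversample_allowed)  # False
print(with_replacement)    # five values from population; varies per run
```

Since the population has only three distinct values, a sample of five drawn with replacement must contain at least one repeat.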

This tutorial demonstrates how to sample with replacement in Python. Each example draws the sample from a list of integers.

Use the random.choices() Function to Sample With Replacement in Python

The random.choices() function, introduced in Python 3.6, provides a simple way to sample with replacement using only the standard library.

It takes a population and a keyword argument k that specifies the number of elements to sample.

Syntax

The syntax of the random.choices() function is as follows:

random.choices(population, weights=None, *, cum_weights=None, k=1)

Parameters

  • population: This is a required parameter and represents the population from which the elements are chosen.
  • weights: An optional sequence of relative weights, one per element, so it must be the same length as the population. An element with a larger weight is more likely to be selected.
  • cum_weights: An optional alternative to weights that gives the weights as running totals. For example, the relative weights [10, 5, 30, 5] correspond to the cumulative weights [10, 15, 45, 50]. It must be the same length as the population, and passing both weights and cum_weights raises a TypeError.
  • k: An optional keyword-only parameter giving the number of elements to choose. It defaults to 1.

Return Value

The random.choices() function returns a list containing the sampled elements.

Example

We pass the list and the number of elements to draw as k. The result is returned as a list.

import random

lst = [5, 8, 9, 6, 2, 3, 1, 0, 11, 12, 10]
print(random.choices(lst, k=5))

Output:

[1, 11, 10, 5, 10]

The example above draws a sample of length 5, with replacement, from a list. Because the draw is random, the output changes from run to run, and the same element can appear more than once (10 appears twice here). We can also bias the selection by passing weights through the weights parameter.

Alternatively, the cum_weights parameter accepts cumulative weights. Relative weights passed through weights are converted to cumulative weights internally before the selection is made.
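To make the relationship between weights and cum_weights concrete, here is a minimal sketch. The population and weight values are hypothetical, and the fixed seed is used only so that both calls consume the same random numbers.

```python
import random

colors = ["red", "green", "blue", "black"]  # hypothetical population
weights = [10, 5, 30, 5]                    # relative weights
cum_weights = [10, 15, 45, 50]              # running totals of the weights above

random.seed(42)  # same seed before each call, so both see the same random numbers
by_weights = random.choices(colors, weights=weights, k=8)

random.seed(42)
by_cum_weights = random.choices(colors, cum_weights=cum_weights, k=8)

print(by_weights == by_cum_weights)  # True
```

Both calls describe the same distribution, so with identical random state they return identical samples. Supplying cum_weights directly simply skips the internal conversion step.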

Use the random.choice() Function to Sample With Replacement in Python

The random.choice() function returns one randomly selected element from a sequence. Because each call is independent of the previous ones, calling it repeatedly is equivalent to sampling with replacement.

Syntax

The syntax of the random.choice() function is straightforward:

random.choice(sequence)

Parameters

  • sequence: This is a mandatory parameter representing the sequence from which an element is randomly chosen.

Return Value

The random.choice() function returns a single randomly selected element from the specified sequence.

Example

We can call the function inside a for loop to build a list of randomly selected elements. Since each call is independent, an element can be picked regardless of what was selected before.

The example below samples with replacement by combining a list comprehension with the random.choice() function.

import random

lst = [5, 8, 9, 6, 2, 3, 1, 0, 11, 12, 10]
result = [random.choice(lst) for _ in range(5)]
print(result)

Output:

[2, 0, 0, 12, 6]

The list comprehension calls random.choice() five times and collects the results in a new list.

The underscore _ is a conventional name for a loop variable that is not used in the loop body.
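If the same sample needs to be reproduced later, one option is to wrap the comprehension in a small helper that uses its own random.Random instance. The function name sample_with_replacement and its seed parameter are hypothetical names chosen for this sketch, not part of the standard library.

```python
import random

def sample_with_replacement(population, k, seed=None):
    """Draw k elements from population, allowing repeats."""
    rng = random.Random(seed)  # private generator; the global state is untouched
    return [rng.choice(population) for _ in range(k)]

lst = [5, 8, 9, 6, 2, 3, 1, 0, 11, 12, 10]
first = sample_with_replacement(lst, 5, seed=0)
second = sample_with_replacement(lst, 5, seed=0)

print(first == second)  # True: the same seed reproduces the same sample
```

Leaving seed as None lets the generator seed itself from the operating system, so each call then produces a different sample.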

Use the numpy.random.choice() Function to Sample With Replacement in Python

The numpy package has its own random submodule, and its numpy.random.choice() function can sample with replacement.

The numpy.random.choice() function selects a given number of elements from a one-dimensional array or array-like object, such as a list. When a size is given, the result is returned as a NumPy array.

This function accepts a parameter called replace (True by default). If this parameter is set to False, the sample is drawn without replacement.
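The effect of replace can be checked with a short experiment. The sketch below uses a small illustrative array and assumes numpy is installed and imported as np.

```python
import numpy as np

arr = np.array([5, 8, 9, 6, 2])  # illustrative data

# replace=False: every element can appear at most once.
unique_draw = np.random.choice(arr, size=5, replace=False)
print(sorted(unique_draw.tolist()))  # [2, 5, 6, 8, 9]

# Requesting more elements than exist is only possible with replacement.
try:
    np.random.choice(arr, size=6, replace=False)
except ValueError as err:
    print("ValueError:", err)

repeated_draw = np.random.choice(arr, size=6, replace=True)
print(repeated_draw.shape)  # (6,)
```

Drawing all five elements without replacement yields a permutation of the array, which is why sorting the result gives back every original value exactly once.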

Syntax

The syntax of the numpy.random.choice() function is as follows:

numpy.random.choice(a, size=None, replace=True, p=None)

Parameters

  • a: This is a required parameter and represents the population from which the elements are chosen. It can be a 1-D array-like object or an int; if an int n is given, the sample is drawn from numpy.arange(n).
  • size: An optional parameter, an int or a tuple of ints, that specifies the output shape. If None (default), a single value is returned.
  • replace: An optional Boolean parameter. If True (default), sampling is done with replacement. If False, it is done without replacement.
  • p: An optional parameter that assigns a probability to each element in the population. It must be a 1-D array-like object of the same length as a, and its entries must sum to 1. If omitted, all elements are equally likely.

Return Value

The numpy.random.choice() function returns a single element when size is None; otherwise, it returns a NumPy array containing the sampled elements.

Example

We will generate a sample with replacement using this function in the example below.

import numpy

lst = [5, 8, 9, 6, 2, 3, 1, 0, 11, 12, 10]
arr = numpy.array(lst)
print(numpy.random.choice(arr, 5))

Output:

[11 10  6  9  3]

This snippet first creates a list lst of integers and converts it into a NumPy array named arr. The conversion is optional here, since numpy.random.choice() also accepts a list directly.

The numpy.random.choice() function then draws 5 elements from arr with replacement. Each draw is independent, so the same element can be selected more than once.

The result is printed as a NumPy array of 5 elements.
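The size and p parameters can be combined with the default replace=True. The following sketch uses made-up probabilities to show a two-dimensional result and the effect of assigning a probability of 0 to some elements.

```python
import numpy as np

arr = np.array([5, 8, 9, 6])
probabilities = [0.1, 0.2, 0.3, 0.4]  # hypothetical; one entry per element, sums to 1

# size may be a tuple, in which case the result has that shape.
grid = np.random.choice(arr, size=(2, 3), p=probabilities)
print(grid.shape)  # (2, 3)

# A probability of 0 excludes an element entirely.
only_some = np.random.choice(arr, size=10, p=[0.5, 0.5, 0.0, 0.0])
print(set(only_some.tolist()) <= {5, 8})  # True
```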

Python Sample With Replacement Using the numpy.random.randint Function

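The numpy.random.randint function generates random integers, which can serve as indices into the population. Because the indices are drawn independently, the same index, and therefore the same element, can be chosen more than once. If the population itself consists of consecutive integers, the values returned by numpy.random.randint can be used as the sample directly.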

Syntax

The syntax of the numpy.random.randint function is as follows:

numpy.random.randint(low, high=None, size=None, dtype=int)

Parameters

  • low: This is a required parameter representing the inclusive lower boundary of the random integers to be generated.
  • high: An optional parameter that specifies the exclusive upper boundary of the random integers. If not provided, the integers are drawn from 0 (inclusive) to low (exclusive).
  • size: An optional parameter that represents the output shape. If not provided, a single integer is returned.
  • dtype: An optional parameter specifying the data type of the output. The default is int.
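The single-argument form is worth a quick check: when only low is passed, NumPy treats it as the exclusive upper bound and draws from 0 up to low - 1. A minimal sketch, assuming numpy is imported as np:

```python
import numpy as np

# Only low is given, so values are drawn from 0 (inclusive) to 5 (exclusive).
draws = np.random.randint(5, size=1000)

print(draws.min() >= 0, draws.max() <= 4)  # True True
```

This is what makes np.random.randint(len(population), size=k) a convenient way to generate valid list indices.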

Return Value

The numpy.random.randint function returns a single integer when size is None; otherwise, it returns a NumPy array of random integers from the specified range.

Example

import numpy as np

population = [1, 2, 3, 4, 5]
sample_size = 3

indices = np.random.randint(0, len(population), size=sample_size)
sampled_data = [population[i] for i in indices]

print("Sampled Data:", sampled_data)

Output:

Sampled Data: [2, 2, 1]

Here, numpy.random.randint generates random integer indices between 0 and len(population) - 1, and the list comprehension extracts the corresponding elements from the population.
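When the population is already a NumPy array, the list comprehension can be replaced with integer-array indexing. This variation is a sketch under that assumption and is otherwise equivalent to the example above.

```python
import numpy as np

population = np.array([1, 2, 3, 4, 5])
sample_size = 3

indices = np.random.randint(0, len(population), size=sample_size)
sampled_data = population[indices]  # integer-array indexing, no Python loop

print(sampled_data.shape)  # (3,)
```

Indexing with an array of indices returns a new array, and repeated indices simply produce repeated elements, which is exactly the with-replacement behavior.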

Conclusion

In conclusion, sampling with replacement selects elements at random from a dataset while allowing the same element to be selected again. Python offers several ways to do this, each suited to particular use cases.

The random.choices() function, introduced in Python 3.6, is the most direct option in the standard library and supports both relative and cumulative weights.

Additionally, the random.choice() function combined with a list comprehension offers an alternative approach. For users working with NumPy, the numpy.random.choice() and numpy.random.randint functions sample efficiently from arrays and return the result as a NumPy array.

Whether working with simple lists or NumPy arrays, these methods cover the common sampling needs of statistical analyses, simulations, and machine-learning tasks.
