Create a List With a Specific Size in Python

  1. Preallocate Storage for Lists
  2. Preallocate Storage for Other Sequential Data Structures

Preallocating storage for lists or arrays is a typical pattern among programmers when they know the number of elements ahead of time.

In Python, unlike C++ and Java, there is no way to allocate a list of a given size without choosing a value for every slot, so you have to initialize all of your preallocated storage explicitly. Usually, developers use falsy values for that purpose, such as None, '', False, and 0.
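A minimal sketch of the preallocation pattern, using the [None] * n form covered below: reserve the slots with a placeholder, then assign by index instead of appending. The helper name squares_preallocated is hypothetical and only for illustration.

```python
def squares_preallocated(n):
    """Hypothetical helper: build a list of squares by filling preallocated slots."""
    result = [None] * n      # reserve n slots, each holding the placeholder None
    for i in range(n):
        result[i] = i * i    # overwrite each slot in place; no append needed
    return result


# The usual placeholders are all falsy, so "slot not filled yet" is easy to test.
placeholders = [None, '', False, 0]
print([bool(p) for p in placeholders])  # [False, False, False, False]
print(squares_preallocated(5))          # [0, 1, 4, 9, 16]
```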

Python offers several ways to create a list of a given size, each with different performance characteristics.

To compare the performance of the different approaches, we will use Python's standard timeit module, which provides a handy way to measure the run time of small chunks of Python code. The snippets below assume it has been imported with from timeit import timeit.
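A self-contained sketch of how such a measurement can be reproduced as a script. The statement is passed as a string, number sets how many times it runs, and setup supplies any imports the statement needs. Absolute times depend on the machine and Python version, so they will not match the figures quoted in this article.

```python
from timeit import timeit

# timeit() runs the statement `number` times (default 1,000,000) and returns
# the total elapsed time in seconds as a float.
seconds = timeit("[None] * 10")
print(f"{seconds * 1000:.0f} ms per million runs")

# For a quicker check, lower `number`; names the statement needs go in `setup`.
seconds = timeit("array('i', (0,) * 10)", setup="from array import array", number=10_000)
print(f"{seconds * 1000:.1f} ms per ten thousand runs")
```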

Preallocate Storage for Lists

The first and fastest way is to use the * operator, which builds a new list by repeating the elements of an existing list a specified number of times.

>>> [None] * 10
[None, None, None, None, None, None, None, None, None, None]

A million iterations (the default number of iterations in timeit) take approximately 117 ms.

>>> timeit("[None] * 10")

Another approach is to use the range built-in function with a list comprehension.

>>> [None for _ in range(10)]
[None, None, None, None, None, None, None, None, None, None]

It's about five times slower, taking 612 ms per million iterations.

>>> timeit("[None for _ in range(10)]")
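The comprehension's extra cost does buy something when the placeholder is mutable. With *, every slot refers to the same object, whereas the comprehension evaluates its expression once per iteration. This makes no difference for None, which is immutable, but it matters for placeholders such as empty lists, as this short sketch shows:

```python
# With *, the list holds three references to ONE empty list.
shared = [[]] * 3
shared[0].append("x")
print(shared)       # [['x'], ['x'], ['x']]  (every slot sees the change)

# A comprehension evaluates [] on every iteration, so each slot gets its own list.
independent = [[] for _ in range(3)]
independent[0].append("x")
print(independent)  # [['x'], [], []]
```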

The third approach is to use a simple for loop together with list.append().

>>> a = []
>>> for _ in range(10):
...     a.append(None)
...
>>> a
[None, None, None, None, None, None, None, None, None, None]

Using loops is the slowest method and takes 842 ms to complete a million iterations.

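Note that the list has to be created inside the timed statement. The setup code runs only once per timeit() call, so with setup="a=[]" the same list would keep growing to ten million elements over the million iterations.

>>> timeit("a = []\nfor _ in range(10): a.append(None)")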
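A sketch that times all three list approaches side by side, assuming you want each run to build a fresh 10-element list. Wrapping each approach in a function (the names with_repetition, with_comprehension, and with_append_loop are hypothetical) adds the same call overhead to every variant, so compare the relative numbers rather than the absolute ones.

```python
from timeit import timeit

N = 10  # list length used throughout the article


def with_repetition():
    return [None] * N


def with_comprehension():
    return [None for _ in range(N)]


def with_append_loop():
    a = []  # a fresh list on every call, so nothing accumulates between runs
    for _ in range(N):
        a.append(None)
    return a


for fn in (with_repetition, with_comprehension, with_append_loop):
    assert fn() == [None] * N              # all three approaches build the same list
    seconds = timeit(fn, number=100_000)   # timeit also accepts a zero-argument callable
    print(f"{fn.__name__:<20} {seconds * 1000:.1f} ms per 100,000 calls")
```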

Preallocate Storage for Other Sequential Data Structures

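Since you're preallocating storage for a sequential data structure, it may make sense to use the array module from the standard library instead of a list. An array holds elements of a single numeric type, given by a typecode such as 'i' for signed integers, and stores them more compactly than a list.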

>>> from array import array
>>> array('i',(0,)*10)
array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Timing it shows that this approach is the second fastest, after [None] * 10.

>>> timeit("array('i',(0,)*10)", setup="from array import array")
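A short sketch of what the typecode buys you. Because 'i' stores signed C ints unboxed, the array rejects values of the wrong type. Arrays also support * repetition directly, an alternative to building a tuple first as in the snippet above; the exact itemsize is platform dependent.

```python
from array import array

# 'i' is the typecode for a signed C int; elements are stored as raw machine ints.
zeros = array('i', [0]) * 10   # arrays support * repetition, like lists
print(zeros)                   # array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
print(zeros.itemsize)          # bytes per element (platform dependent, commonly 4)

# Items are type-checked: an int fits, a float is rejected.
zeros[3] = 7
try:
    zeros[4] = 1.5
except TypeError as exc:
    print("rejected:", exc)
```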

Let's compare the pure Python approaches above with NumPy, the Python package for scientific computing.

>>> from numpy import empty
>>> empty(10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

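The NumPy way takes 589 ms per million iterations. Note that empty() allocates the array without initializing it, so its contents are arbitrary and the zeros shown above are not guaranteed. Use numpy.zeros() or numpy.full() when every element needs a defined starting value.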

>>> timeit("empty(10)", setup="from numpy import empty")

However, the NumPy way is much faster for large arrays. Compare the two with 10,000 elements:

>>> timeit("[None]*10000")
>>> timeit("empty(10000)", setup="from numpy import empty")

The conclusion: stick to [None] * 10 for small lists, but switch to NumPy's empty() when dealing with large amounts of sequential data.
