Create a List With a Specific Size in Python

  1. Preallocate Storage for Lists
  2. Preallocate Storage for Other Sequential Data Structures

Preallocating storage for lists or arrays is a typical pattern among programmers when they know the number of elements ahead of time.

Unlike in C++ and Java, in Python you have to initialize all of your preallocated storage with some value. Developers usually use falsy values for that purpose, such as None, '', False, and 0.
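Here is a minimal sketch of the pattern: create the list up front with a placeholder, then assign each slot by index instead of appending. The function name `squares_preallocated` is hypothetical and only serves as an illustration.

```python
def squares_preallocated(n):
    """Return [0, 1, 4, ...] by filling a preallocated list by index."""
    result = [0] * n  # every slot starts with the placeholder 0
    for i in range(n):
        result[i] = i * i  # overwrite the placeholder in place
    return result


print(squares_preallocated(5))  # [0, 1, 4, 9, 16]

# The common placeholders are all falsy, so bool() returns False for each.
for placeholder in (None, '', False, 0):
    slots = [placeholder] * 3
    print(slots, bool(placeholder))
```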

Python offers several ways to create a list of a fixed size, each with different performance characteristics.

To compare the performance of the different approaches, we will use timeit, a module from Python’s standard library. It provides a handy way to measure the run times of small chunks of Python code.
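The snippet below shows how the measurements work. timeit() runs a statement `number` times (one million by default) and returns the total elapsed time in seconds, so the millisecond figures in this article are totals for a million runs on the author's machine; yours will differ. The smaller `number` used here is an arbitrary choice to keep the run short.

```python
from timeit import repeat, timeit

# Total time in seconds for 100,000 executions of the statement.
total = timeit("[None] * 10", number=100_000)
print(f"{total * 1000:.1f} ms for 100,000 runs")

# repeat() returns one total per repetition. Taking the minimum is a
# common way to reduce noise from other processes on the machine.
best = min(repeat("[None] * 10", repeat=5, number=100_000))
print(f"best of 5: {best * 1000:.1f} ms")
```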

Preallocate Storage for Lists

The first and fastest way is to use the * operator, which repeats a list a specified number of times.

>>> [None] * 10
[None, None, None, None, None, None, None, None, None, None]

A million iterations (the default number of iterations in timeit) take approximately 117 ms.

>>> from timeit import timeit
>>> timeit("[None] * 10")
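One caveat with the * operator: it repeats references to the same object rather than making copies. This is harmless for immutable placeholders such as None, but a mutable placeholder like an empty list ends up shared by every slot, as this short sketch shows.

```python
# Every slot refers to the SAME inner list.
grid = [[]] * 3
grid[0].append("x")
print(grid)  # [['x'], ['x'], ['x']]

# With an immutable placeholder, assignment rebinds one slot only.
flags = [None] * 3
flags[0] = "x"
print(flags)  # ['x', None, None]
```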

Another approach is to use the built-in range function with a list comprehension.

>>> [None for _ in range(10)]
[None, None, None, None, None, None, None, None, None, None]

It’s roughly five times slower, taking about 612 ms per million iterations.

>>> timeit("[None for _ in range(10)]")
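The comprehension can still be worth its extra cost because its expression is evaluated once per element. Each slot can therefore get its own fresh mutable object or a value computed from the loop variable, as in this sketch.

```python
# A new empty list is created on every iteration, so rows are independent.
grid = [[] for _ in range(3)]
grid[0].append("x")
print(grid)  # [['x'], [], []]

# The loop variable can also compute each initial value.
offsets = [i * 10 for i in range(5)]
print(offsets)  # [0, 10, 20, 30, 40]
```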

The third approach is to use a simple for loop together with list.append().

>>> a = []
>>> for _ in range(10):
...     a.append(None)
...
>>> a
[None, None, None, None, None, None, None, None, None, None]

Using loops is the slowest method and takes 842 ms to complete a million iterations.

>>> timeit("for _ in range(10): a.append(None)", setup="a=[]")
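Because setup runs only once, a list created there keeps growing across all timed runs. If you want every run to start from an empty list, one option is to time callables, as sketched below. The function names are hypothetical, and the function-call overhead is included in both measurements.

```python
from timeit import timeit

N = 10


def build_with_append():
    a = []  # a fresh list on every call
    for _ in range(N):
        a.append(None)
    return a


def build_with_multiplication():
    return [None] * N


# Both builders produce the same list.
print(build_with_append() == build_with_multiplication())  # True

# timeit() also accepts a callable in place of a statement string.
for func in (build_with_multiplication, build_with_append):
    seconds = timeit(func, number=100_000)
    print(f"{func.__name__}: {seconds * 1000:.1f} ms per 100,000 runs")
```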

Preallocate Storage for Other Sequential Data Structures

Since you’re preallocating storage for a sequential data structure, it may make sense to use the array type from the standard library’s array module instead of a list.

>>> from array import array
>>> array('i',(0,)*10)
array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
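Arrays also support the * operator, and unlike lists they store raw values of a single C type, fixed by the type code. This sketch shows an equivalent way to preallocate and what the type code implies.

```python
from array import array

# Repeating a one-element array also preallocates ten zeros.
a = array('i', [0]) * 10
print(a == array('i', (0,) * 10))  # True

# Bytes per element for type code 'i' (a C int, typically 4 bytes).
print(a.itemsize)

# Unlike a list, the array only accepts values of its declared type.
try:
    a[0] = 1.5
except TypeError as exc:
    print("rejected:", exc)
```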

In our measurements, this approach is the second fastest, after [None] * 10.

>>> timeit("array('i',(0,)*10)", setup="from array import array")

Let’s compare the pure Python approaches above with NumPy, the Python package for scientific computing.

>>> from numpy import empty
>>> empty(10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

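The NumPy way takes 589 ms per million iterations. Keep in mind that empty() does not initialize the memory it allocates: the zeros printed above are not guaranteed, and the contents can be arbitrary values.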

>>> timeit("empty(10)", setup="from numpy import empty")
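When a known starting value matters, NumPy also provides zeros() and full(), which allocate and initialize in one step. The sketch below assumes NumPy is installed. The object-dtype full() call is just one way to get a NumPy analogue of [None] * 10.

```python
import numpy as np

# empty() only allocates, so overwrite its contents before reading them.
buf = np.empty(10)
buf[:] = 1.0

# zeros() and full() allocate and initialize in one call.
zeros = np.zeros(10)                      # float64 zeros
nones = np.full(10, None, dtype=object)   # ten None references
print(buf)
print(zeros)
print(nones)
```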

However, NumPy’s empty() becomes much faster than a list for larger sizes, such as 10,000 elements.

>>> timeit("[None]*10000")
>>> timeit("empty(10000)", setup="from numpy import empty")
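To see where the crossover happens on your own machine, you can time both approaches over several sizes. This sketch assumes NumPy is installed. The sizes and run count are arbitrary choices, and only the trend is meaningful, not the absolute numbers.

```python
from timeit import timeit

import numpy as np

for n in (10, 1_000, 100_000):
    # Each lambda is timed immediately, so it sees the current n.
    list_s = timeit(lambda: [None] * n, number=1_000)
    numpy_s = timeit(lambda: np.empty(n), number=1_000)
    print(f"n={n:>7}: list {list_s * 1e3:.2f} ms, "
          f"numpy.empty {numpy_s * 1e3:.2f} ms per 1,000 runs")
```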

The conclusion is that it’s best to stick to [None] * 10 for small lists, but switch to NumPy’s empty() when dealing with larger sequential data.
