How to Create a List with a Specific Size in Python

Preallocating storage for lists or arrays is a typical pattern among programmers when they know the number of elements ahead of time.

Unlike C++ and Java, in Python, you have to initialize all of your pre-allocated storage with some values. Usually, developers use false values for that purpose, such as None, '', False, and 0.

Python offers several ways to create a list of a fixed size, each with different performance characteristics.

To compare performances of different approaches, we will use Python’s standard module timeit. It provides a handy way to measure run times of small chunks of Python code.

Preallocating storage for lists

The first and fastest way to use the * operator, which repeats a list a specified number of times.

>>> [None] * 10
[None, None, None, None, None, None, None, None, None, None]

A million iterations (default value of iterations in timeit) take approximately 117 ms.

>>> timeit("[None] * 10")
0.11655918900214601

Another approach is to use the range built-in function with a list comprehension.

>>> [None for _ in range(10)]
[None, None, None, None, None, None, None, None, None, None]

It’s almost six times slower and takes 612 ms second per million iterations.

>>> timeit("[None for _ in range(10)]")
0.6115895550028654

The third approach is to use a simple for loop together with the list.append().

>>> a = []
>>> for _ in range(10):
...   a.append(None)
...
>>> a
[None, None, None, None, None, None, None, None, None, None]

Using loops is the slowest method and takes 842 ms to complete a million iterations.

>>> timeit("for _ in range(10): a.append(None)", setup="a=[]")
0.8420009529945673

Preallocating storage for other sequential data structures

Since you’re preallocating storage for a sequential data structure, it may make a lot of sense to use the array built-in data structure instead of a list.

>>> from array import array
>>> array('i',(0,)*10)
array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

As we see below, this approach is second fastest after [None] * 10.

>>> timeit("array('i',(0,)*10)", setup="from array import array")
0.4557597979946877

Let’s compare the above pure Python approaches to the NumPy Python package for scientific computing.

>>> from numpy import empty
>>> empty(10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

The NumPy way takes 589 ms per million iterations.

>>> timeit("empty(10)", setup="from numpy import empty")
0.5890094790011062

However, the NumPy way will be much faster for more massive lists.

>>> timeit("[None]*10000")
16.059584009999526
>>> timeit("empty(10000)", setup="from numpy import empty")
1.1065983309963485

The conclusion is that it’s best to stick to [None] * 10 for small lists, but switch to NumPy’s empty() when dealing with more massive sequential data.