Create a List With a Specific Size in Python

  1. Preallocate Storage for Lists
  2. Preallocate Storage for Other Sequential Data Structures

Preallocating storage for lists or arrays is a typical pattern among programmers when they know the number of elements ahead of time.

Unlike C++ and Java, Python has no way to reserve list storage without also filling it, so you have to initialize all of your preallocated storage with some value. Developers usually use falsy values for that purpose, such as None, '', False, and 0.
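A minimal sketch of the preallocate-then-fill pattern, assuming the final size is known up front. The function name fill_squares is hypothetical and used only for illustration.

```python
def fill_squares(n):
    # Preallocate n slots; None marks "not filled in yet".
    result = [None] * n
    for i in range(n):
        # Assign by index instead of calling append().
        result[i] = i * i
    return result


print(fill_squares(5))  # [0, 1, 4, 9, 16]
```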

Python offers several ways to create a list of a fixed size, each with different performance characteristics.

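To compare the performance of different approaches, we will use timeit, a module from Python's standard library that provides a handy way to measure the run time of small chunks of Python code. Absolute timings depend on your hardware and Python version, so focus on the relative differences between approaches.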
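The REPL snippets in this article call timeit as a bare function, which implies `from timeit import timeit`. A short sketch of the two calls most useful here, with a smaller iteration count so it finishes quickly:

```python
from timeit import repeat, timeit

STMT = "[None] * 10"

# timeit() executes STMT `number` times (default 1_000_000) and returns
# the total elapsed time in seconds.
total = timeit(STMT, number=100_000)

# repeat() performs several independent timing runs; the lowest value is
# usually the most useful figure, since higher ones mostly reflect noise
# from other processes on the machine.
best = min(repeat(STMT, number=100_000, repeat=5))

print(f"one run: {total:.4f} s, best of 5: {best:.4f} s")
```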

Preallocate Storage for Lists

The first and fastest way is to use the * operator, which repeats a list a specified number of times.

>>> [None] * 10
[None, None, None, None, None, None, None, None, None, None]

A million iterations (the default number of iterations in timeit) take approximately 117 ms.

>>> from timeit import timeit
>>> timeit("[None] * 10")
0.11655918900214601
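One behavior of the * operator is worth knowing: it repeats references rather than copying objects. With the immutable placeholders mentioned above this is harmless, but with a mutable placeholder such as an empty list, every slot refers to the same object. The list comprehension approach described next creates a new object per slot instead.

```python
# [x] * n copies the *reference* to x n times; it never copies x itself.
shared = [[]] * 3
shared[0].append(1)
print(shared)                  # [[1], [1], [1]]
print(shared[0] is shared[2])  # True: every slot is the same list

# Immutable fillers such as None, 0, '' and False are safe: they are never
# modified in place, and assigning to a slot just rebinds that one slot.
flags = [False] * 3
flags[0] = True
print(flags)                   # [True, False, False]
```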

Another approach is to use the range built-in function with a list comprehension.

>>> [None for _ in range(10)]
[None, None, None, None, None, None, None, None, None, None]

It's roughly five times slower, taking about 612 ms per million iterations.

>>> timeit("[None for _ in range(10)]")
0.6115895550028654
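Because the comprehension evaluates its expression once per element, it is the approach to reach for when each slot needs its own mutable object, despite being slower. A sketch, where make_list is a hypothetical helper name rather than a standard function:

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def make_list(size: int, factory: Callable[[], T]) -> List[T]:
    """Return a list of `size` elements, calling `factory` once per slot."""
    # The comprehension calls factory() on every iteration, so mutable
    # defaults (lists, dicts, sets) end up as distinct objects.
    return [factory() for _ in range(size)]


rows = make_list(3, list)
rows[0].append("x")
print(rows)  # [['x'], [], []]
```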

The third approach is to use a simple for loop together with list.append().

>>> a = []
>>> for _ in range(10):
...   a.append(None)
...
>>> a
[None, None, None, None, None, None, None, None, None, None]

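Using a loop is the slowest method, taking about 842 ms to complete a million iterations. Note that in the measurement below, setup="a=[]" runs only once, so the same list keeps growing by ten elements on every iteration instead of being rebuilt from scratch.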

>>> timeit("for _ in range(10): a.append(None)", setup="a=[]")
0.8420009529945673
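To time the complete "create an empty list, then append ten items" workflow, the list can be recreated inside the timed statement itself. This is a sketch of that variant; its timing will differ from the figure above, so measure it on your own machine. The exec() call is there only to sanity-check that the statement builds the expected list.

```python
from timeit import timeit

# Timed statement: the list is recreated on every run, so each iteration
# measures an empty list followed by ten appends.
STMT = """
a = []
for _ in range(10):
    a.append(None)
"""

# Execute the statement once to confirm it builds the expected list.
namespace = {}
exec(STMT, namespace)
assert namespace["a"] == [None] * 10

elapsed = timeit(STMT, number=1_000_000)
print(f"fresh list + 10 appends, 1M runs: {elapsed:.3f} s")
```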

Preallocate Storage for Other Sequential Data Structures

Since you're preallocating storage for a sequential data structure, it may make sense to use the array module from the standard library instead of a list, as long as all elements share a single numeric type.

>>> from array import array
>>> array('i',(0,)*10)
array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

As the timing below shows, this approach is the second fastest so far, after [None] * 10.

>>> timeit("array('i',(0,)*10)", setup="from array import array")
0.4557597979946877
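An array also supports sequence repetition, so a one-element array can be repeated directly instead of building a tuple of zeros first. Whether that is faster depends on your setup; the sketch below times both without assuming a winner. It also shows the main trade-off versus a list: an array stores only its declared type, so None is not a valid placeholder.

```python
from array import array
from timeit import timeit

# Two ways to build the same ten-element int array.
from_tuple = array('i', (0,) * 10)  # the approach shown above
repeated = array('i', [0]) * 10     # array supports * repetition like a list
assert from_tuple == repeated

# Unlike a list, an array only stores values matching its type code.
try:
    repeated[0] = None
except TypeError as exc:
    print("array('i') rejects None:", exc)

setup = "from array import array"
t_tuple = timeit("array('i', (0,) * 10)", setup=setup, number=200_000)
t_repeat = timeit("array('i', [0]) * 10", setup=setup, number=200_000)
print(f"tuple init: {t_tuple:.3f} s, repetition: {t_repeat:.3f} s")
```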

Let's compare the pure Python approaches above with NumPy, the Python package for scientific computing.

>>> from numpy import empty
>>> empty(10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

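The NumPy way takes about 589 ms per million iterations. Keep in mind that empty() does not initialize the array: its elements hold whatever happened to be in memory, so the zeros printed above are not guaranteed. Use numpy.zeros() or numpy.full() when you need defined starting values.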

>>> timeit("empty(10)", setup="from numpy import empty")
0.5890094790011062
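A short sketch contrasting numpy.empty() with the initializing constructors numpy.zeros() and numpy.full(). The int64 dtype and the -1 fill value are arbitrary choices for illustration.

```python
import numpy as np

# np.empty only allocates: the values are whatever was already in memory.
scratch = np.empty(10)
print(scratch.shape, scratch.dtype)  # (10,) float64

# np.zeros and np.full allocate and initialize.
zeros = np.zeros(10)
filled = np.full(10, -1, dtype=np.int64)
print(zeros)   # [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
print(filled)  # [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]

# Using empty() safely: allocate, then overwrite every element before reading.
scratch[:] = np.arange(10)
```

The empty-then-overwrite pattern at the end is where empty() makes sense: the cost of initialization is skipped only because every element is written before it is read.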

However, the NumPy way is much faster for larger sizes.

>>> timeit("[None]*10000")
16.059584009999526
>>> timeit("empty(10000)", setup="from numpy import empty")
1.1065983309963485
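Where the crossover point lies depends on your hardware and library versions. This sketch measures several sizes at once with a reduced iteration count; it also includes np.zeros, since np.empty skips initialization and is therefore not a like-for-like replacement for [None] * n. The helper name compare is hypothetical.

```python
from timeit import timeit

import numpy as np


def compare(sizes, number=1_000):
    """Time [None] * n, np.empty(n) and np.zeros(n) for each size n."""
    results = {}
    for n in sizes:
        # timeit calls each lambda right away, inside this iteration,
        # so every lambda sees the current value of n.
        results[n] = {
            "list": timeit(lambda: [None] * n, number=number),
            "np.empty": timeit(lambda: np.empty(n), number=number),
            "np.zeros": timeit(lambda: np.zeros(n), number=number),
        }
    return results


for size, timings in compare([10, 1_000, 100_000]).items():
    row = ", ".join(f"{name}: {t:.4f} s" for name, t in timings.items())
    print(f"n={size}: {row}")
```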

The conclusion is that it's best to stick to [None] * 10 for small lists, but to switch to NumPy's empty() when dealing with larger numeric data.
