Create a List With a Specific Size in Python

Jinku Hu Dec 10, 2020 Nov 09, 2019
  1. Preallocate Storage for Lists
  2. Preallocate Storage for Other Sequential Data Structures

Preallocating storage for lists or arrays is a typical pattern among programmers
when they know the number of elements ahead of time.

Unlike in C++ and Java, in Python you have to initialize all of your preallocated storage with some value. Developers usually pick a falsy placeholder for that purpose, such as None, '', False, or 0.

Python offers several ways to create a list of a fixed size, each with
different performance characteristics.

To compare the performance of the different approaches, we will use Python’s
standard timeit module, which provides a handy way to measure the run time of
small snippets of Python code.
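
As a minimal sketch of how the measurements in this article can be reproduced: timeit.timeit() runs a statement a given number of times (one million by default) and returns the total time in seconds as a float. The variable names below are illustrative, and the actual figures will vary with the machine and the Python version.

```python
from timeit import timeit

# Total seconds for the default 1,000,000 executions of the statement.
total_seconds = timeit("[None] * 10")

# A smaller run count is handy for slower statements; divide to get per-run cost.
runs = 100_000
per_run = timeit("[None for _ in range(10)]", number=runs) / runs

print(f"{total_seconds * 1000:.0f} ms per million runs")
print(f"{per_run * 1e9:.0f} ns per run")
```

Multiplying the returned seconds by 1000 gives a millisecond figure in the same units as those quoted in the text.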

Preallocate Storage for Lists

The first and fastest way is to use the * operator, which repeats a list a
specified number of times.

>>> [None] * 10
[None, None, None, None, None, None, None, None, None, None]

A million executions (the default in timeit) take approximately 117 ms. Note
that timeit() reports its result in seconds.

>>> from timeit import timeit
>>> timeit("[None] * 10")

Another approach is to use the range built-in function with a list comprehension.

>>> [None for _ in range(10)]
[None, None, None, None, None, None, None, None, None, None]

It’s about five times slower and takes 612 ms per million iterations.

>>> timeit("[None for _ in range(10)]")
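
One caveat is worth illustrating: the * operator copies references, not objects. That is harmless for immutable placeholders such as None or 0, but with a mutable placeholder every slot ends up pointing at the same object. A list comprehension evaluates its expression once per element, so each slot gets its own object. A small sketch (variable names are illustrative):

```python
# Repetition copies the reference: both rows are the same list object.
shared = [[0] * 3] * 2
shared[0][0] = 1
print(shared)       # [[1, 0, 0], [1, 0, 0]]

# The comprehension builds a new inner list on every iteration.
independent = [[0] * 3 for _ in range(2)]
independent[0][0] = 1
print(independent)  # [[1, 0, 0], [0, 0, 0]]
```

So the slower comprehension is still the right tool when the placeholder itself is a list, dict, or other mutable object.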

The third approach is to use a simple for loop together with list.append().

>>> a = []
>>> for _ in range(10):
...   a.append(None)
>>> a
[None, None, None, None, None, None, None, None, None, None]

Using loops is the slowest method and takes 842 ms to complete a million iterations.

>>> timeit("for _ in range(10): a.append(None)", setup="a=[]")
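
Note that with setup="a=[]" the list is created only once, so it keeps growing across all runs instead of being rebuilt each time. As an alternative sketch, the loop can be wrapped in a function (the name build_with_append is hypothetical) so that every run starts from an empty list. timeit() accepts a callable as well as a string; the function-call overhead is included in the measurement, so the result is not directly comparable to the figures quoted above.

```python
from timeit import timeit

def build_with_append(n=10):
    """Build a list of n None values by appending one element at a time."""
    a = []
    for _ in range(n):
        a.append(None)
    return a

# Each of the runs starts from a fresh, empty list.
elapsed = timeit(build_with_append, number=100_000)
print(f"{elapsed * 1000:.0f} ms per 100,000 runs")
```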

Preallocate Storage for Other Sequential Data Structures

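Since you’re preallocating storage for a sequential data structure, it may make sense to use the array type from the standard library’s array module instead of a list. An array stores values of a single numeric type, chosen by a type code such as 'i' for signed integers.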

>>> from array import array
>>> array('i',(0,)*10)
array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

As we see below, this approach is second fastest after [None] * 10.

>>> timeit("array('i',(0,)*10)", setup="from array import array")
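
A few related points can be sketched with the standard library alone. An array can be repeated with * just like a list, it accepts only values that match its type code, and bytearray(n) is another way to get n zero-initialized slots when each element is a single byte. The variable names are illustrative.

```python
from array import array

# Repeat a one-element array instead of building a tuple first.
ints = array('i', [0]) * 10
print(len(ints), ints.itemsize)  # itemsize is the size of one element in bytes

# Unlike a list, an array rejects values that don't match its type code.
try:
    ints[0] = "text"
except TypeError as exc:
    print("rejected:", exc)

# bytearray(n) creates n bytes, all initialized to zero.
buf = bytearray(10)
print(list(buf))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```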

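Let’s compare the above pure Python approaches to NumPy, the Python package for scientific computing. Note that numpy.empty() allocates the array without initializing it, so its contents are whatever happened to be in memory; the zeros in the output below are not guaranteed.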

>>> from numpy import empty
>>> empty(10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

The NumPy way takes 589 ms per million iterations.

>>> timeit("empty(10)", setup="from numpy import empty")
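
Because empty() skips initialization, code that needs defined starting values would typically use zeros() or full() instead. A brief sketch, assuming NumPy is installed (variable names are illustrative):

```python
import numpy as np

uninitialized = np.empty(10)   # contents are arbitrary; do not read before writing
zeros = np.zeros(10)           # ten float64 zeros
sevens = np.full(10, 7)        # ten copies of the fill value

print(uninitialized.shape)     # (10,)
print(zeros.tolist())
print(sevens.tolist())         # [7, 7, 7, 7, 7, 7, 7, 7, 7, 7]
```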

However, NumPy will be much faster for larger arrays.

>>> timeit("[None]*10000")
>>> timeit("empty(10000)", setup="from numpy import empty")
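
To see where the crossover lies on a given machine, a small harness can time each approach at several sizes. The sketch below is one way to do it; the helper name compare and the chosen sizes are hypothetical, and NumPy is included only if it can be imported. No particular timings are implied.

```python
from timeit import timeit

try:
    import numpy as np
except ImportError:  # NumPy is optional for this sketch
    np = None

def compare(n, runs=1_000):
    """Return seconds per run for each approach at size n."""
    candidates = {
        "[None] * n": lambda: [None] * n,
        "comprehension": lambda: [None for _ in range(n)],
    }
    if np is not None:
        candidates["numpy.empty(n)"] = lambda: np.empty(n)
    return {name: timeit(fn, number=runs) / runs for name, fn in candidates.items()}

for size in (10, 10_000):
    for name, seconds in compare(size).items():
        print(f"n={size:>6}  {name:<16} {seconds * 1e6:8.2f} us")
```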

The conclusion is that it’s best to stick to [None] * 10 for small lists, but
switch to NumPy’s empty() when dealing with large sequential data.

Author: Jinku Hu

Founder of Jinku has worked in the robotics and automotive industries for over 8 years. He sharpened his coding skills when he needed to automate testing, collect data from remote servers, and generate reports for endurance tests. He comes from an electrical/electronics engineering background but has expanded his interests to embedded electronics, embedded programming, and front-end and back-end programming.

