Python Dict vs Asdict

Abdullah Bukhari Oct 10, 2023
  1. the dataclasses Library in Python
  2. Why dict Is Faster Than asdict

The dataclasses library was introduced in Python 3.7. It lets us define structured classes meant specifically for storing data. These classes come with generated methods for handling the data and its representation.

the dataclasses Library in Python

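From Python 3.7 onward, dataclasses is part of the standard library, so nothing needs to be installed. Only Python 3.6 needs the backport package from PyPI, which is installed with the command below.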

pip install dataclasses

Unlike a normal Python class, a dataclass is created by applying the @dataclass decorator to a class. Attributes are declared with type hints, which specify the data type of each attribute in the dataclass.

Below is a code snippet that puts the concept into practice.

# A bare-bones data class
# Import the dataclass decorator from the dataclasses module
from dataclasses import dataclass


@dataclass
class Student:
    """A class which holds a student's data"""

    # Declaring attributes
    # Making use of type hints

    name: str
    id: int
    section: str
    classname: str
    fatherName: str
    motherName: str


# Below is a dataclass instance
student = Student("Muhammad", 1432, "Red", "0-1", "Ali", "Marie")
print(student)

Output:

Student(name='Muhammad', id=1432, section='Red', classname='0-1', fatherName='Ali', motherName='Marie')

There are two points to note in the code above. First, a dataclass object accepts arguments and assigns them to the matching data members even though we never wrote an __init__() constructor.

This is because the @dataclass decorator generates an __init__() method for us.

The second point is that the print statement neatly prints the data held in the object without any function written specifically for that. This means the class must also have a generated __repr__() method.
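To check this for yourself, the short sketch below lists which special methods the decorator added. The Pet class is a hypothetical example that does not appear elsewhere in this article, and the sketch assumes the decorator's default settings.

```python
from dataclasses import dataclass


@dataclass
class Pet:  # hypothetical class, used only for this illustration
    name: str
    age: int


# Methods written by the decorator live in the class's own namespace
generated = [m for m in ("__init__", "__repr__", "__eq__") if m in vars(Pet)]
print(generated)

a = Pet("Rex", 3)
b = Pet("Rex", 3)
print(a == b)  # the generated __eq__ compares the fields
```

With the defaults, the decorator also generates __eq__(), which is why two instances holding the same field values compare equal.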

Why dict Is Faster Than asdict

Here, "dict" means the instance's __dict__ attribute. In most cases where you would have used __dict__ without dataclasses, you should keep using __dict__.

asdict, by contrast, does extra work while building its copy that may not be useful in your case. That extra work carries overhead you would probably rather avoid.

Here is what asdict does according to the official documentation: each dataclass object is first converted to a dict of its fields as name: value pairs, and then dataclasses, dicts, lists, and tuples are recursed into.

So if you actually need recursive dataclass dictification, use asdict. Otherwise, all the extra work that goes into providing it is wasted.
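The overhead can be measured with a rough timing sketch like the one below. The Reading class is a hypothetical flat dataclass, and the absolute numbers depend on your machine and Python version, so treat the result as an indication only.

```python
import timeit
from dataclasses import dataclass, asdict


@dataclass
class Reading:  # hypothetical flat dataclass with no nested containers
    sensor: str
    value: float
    unit: str


r = Reading("t1", 21.5, "C")

# For a flat dataclass, both approaches produce an equal dict
shallow = dict(r.__dict__)
deep = asdict(r)
print(shallow == deep)

# Rough timing of the two approaches over the same number of calls
t_shallow = timeit.timeit(lambda: dict(r.__dict__), number=20_000)
t_deep = timeit.timeit(lambda: asdict(r), number=20_000)
print(f"dict(__dict__): {t_shallow:.4f}s  asdict: {t_deep:.4f}s")
```

Wrapping __dict__ in dict() gives a shallow copy, which is often all that is needed when the fields hold simple values.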

Note in particular that, with asdict, changing the implementation of a contained object so that it becomes a dataclass changes the result of asdict on the outer object. The example below shows the recursion at work.

from dataclasses import dataclass, asdict
from typing import List


@dataclass
class APoint:
    x1: int
    y1: int


@dataclass
class C:
    aList: List[APoint]


point_instance = APoint(10, 20)
assert asdict(point_instance) == {"x1": 10, "y1": 20}
c = C([APoint(30, 40), APoint(50, 60)])
assert asdict(c) == {"aList": [{"x1": 30, "y1": 40}, {"x1": 50, "y1": 60}]}
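The effect of turning a contained object into a dataclass can be seen in the following sketch. PlainPoint, DataPoint, and Holder are hypothetical names introduced only for this comparison.

```python
from dataclasses import dataclass, asdict


class PlainPoint:  # hypothetical point written as an ordinary class
    def __init__(self, x1, y1):
        self.x1 = x1
        self.y1 = y1


@dataclass
class DataPoint:  # the same point, rewritten as a dataclass
    x1: int
    y1: int


@dataclass
class Holder:  # hypothetical outer dataclass
    item: object


plain_result = asdict(Holder(PlainPoint(1, 2)))
data_result = asdict(Holder(DataPoint(1, 2)))

# The ordinary object is deep-copied and stays a PlainPoint
print(type(plain_result["item"]).__name__)
# The dataclass is converted into a nested dict instead
print(data_result)
```

The outer class is identical in both calls, yet the shape of the result differs depending on how the inner class is implemented.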

Moreover, the recursive logic cannot handle circular references. If you use dataclasses to represent a graph, or some other data structure with circular references, asdict will keep recursing until it raises a RecursionError.

import dataclasses


@dataclasses.dataclass
class GraphNode:
    name: str
    neighbors: list["GraphNode"]  # list[...] needs Python 3.9+


x = GraphNode("x", [])
y = GraphNode("y", [])
x.neighbors.append(y)
y.neighbors.append(x)
dataclasses.asdict(x)
# The call above raises RecursionError:
# asdict follows x -> y -> x -> ... until
# the maximum recursion depth is exceeded
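If all you need is the top-level attributes of such an object, one possible approach is to read __dict__ (here through vars()), which does not follow the cycle. This is a minimal sketch with a hypothetical Node class standing in for GraphNode. It is not a general way to serialize graphs.

```python
from dataclasses import dataclass, asdict, field


@dataclass
class Node:  # hypothetical stand-in for GraphNode
    name: str
    neighbors: list = field(default_factory=list)


a = Node("a")
b = Node("b")
a.neighbors.append(b)
b.neighbors.append(a)

# asdict recurses through the cycle and fails
try:
    asdict(a)
    outcome = "converted"
except RecursionError:
    outcome = "RecursionError"
print(outcome)

# vars() returns the instance's __dict__; copying it is shallow,
# so the neighbors are the original Node objects
shallow = dict(vars(a))
print(shallow["neighbors"][0] is b)
```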

Furthermore, asdict builds a new dict, whereas __dict__ simply returns the object's existing attribute dictionary.

As a result, the value returned by asdict is not affected when the original object's attributes are later reassigned.
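The difference between the snapshot and the live dictionary can be illustrated as follows. The Account class is a hypothetical example, and the sketch assumes a regular dataclass that does not use slots.

```python
from dataclasses import dataclass, asdict


@dataclass
class Account:  # hypothetical class for illustration
    owner: str
    balance: int


acct = Account("Sara", 100)
snapshot = asdict(acct)  # an independent copy
live = acct.__dict__     # the object's own attribute dictionary

acct.balance = 250
print(snapshot["balance"])  # still the old value
print(live["balance"])      # reflects the reassignment

# Writing through __dict__ changes the object itself
live["owner"] = "Omar"
print(acct.owner)
```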

Also, because asdict works from the declared fields, any attributes you add to a dataclass object that don't map to declared fields are left out of its result.
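A small sketch of this behavior, using a hypothetical Item class with one declared field and one attribute added afterwards:

```python
from dataclasses import dataclass, asdict


@dataclass
class Item:  # hypothetical class with a single declared field
    name: str


item = Item("pen")
item.price = 3  # extra attribute, not a declared field

from_asdict = asdict(item)
from_dict = dict(vars(item))

print(from_asdict)  # only the declared field
print(from_dict)    # every instance attribute
```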

Lastly, asdict calls copy.deepcopy() on every value that isn't a dataclass instance, dict, list, or tuple.

return copy.deepcopy(instance)  # a very costly operation !

Dataclass instances, dicts, lists, and tuples go through the recursive logic, which also builds a copy, just with the recursive dictification applied.

Deep copying is costly on its own, since it inspects every object to see what needs to be copied. On top of that, asdict does not share a memo between its deepcopy calls, so in a nontrivial object graph it can create multiple copies of an object that was originally shared.

Beware of such a scenario:

from dataclasses import dataclass, asdict


@dataclass
class PointClass:
    x1: object
    y1: object


obj_instance = object()
var1 = PointClass(obj_instance, obj_instance)
var2 = asdict(var1)
print(var1.x1 is var1.y1)  # prints True
print(var2["x1"] is var2["y1"])  # prints False
print(var2["x1"] is var1.x1)  # prints False

Output:

True
False
False
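For comparison, a single copy.deepcopy() call over __dict__ uses one memo for the whole structure, so a shared object stays shared in the copy. The sketch below assumes a hypothetical Pair class. It is meant to illustrate the memo difference, not to serve as a drop-in replacement for asdict.

```python
import copy
from dataclasses import dataclass, asdict


@dataclass
class Pair:  # hypothetical class holding the same object twice
    left: object
    right: object


shared = object()
pair = Pair(shared, shared)

via_asdict = asdict(pair)                    # one deepcopy call per value
via_deepcopy = copy.deepcopy(pair.__dict__)  # one call, one shared memo

print(via_asdict["left"] is via_asdict["right"])      # sharing is lost
print(via_deepcopy["left"] is via_deepcopy["right"])  # sharing is kept
```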
