How to Convert XML to Dictionary in Python

Hemank Mehtani Feb 02, 2024
  1. Use the xmltodict Module to Convert XML String Into a Dictionary in Python
  2. Use the ElementTree Library to Convert XML String Into Dictionary in Python
  3. Handling Attributes
  4. Using minidom Library (xml.dom.minidom) to Convert XML to Dictionary in Python
  5. Using xmljson Library to Convert XML to Dictionary in Python
  6. Conclusion

Working with XML data is a common task in programming, especially when dealing with web services, configuration files, or data interchange between systems.

XML (eXtensible Markup Language) provides a structured way to represent data, but it often needs to be converted into a more accessible format for processing.

In Python, there are several methods and libraries available to convert XML data into a dictionary, which is a versatile and widely used data structure.

In this article, we’ll explore four different methods for achieving this conversion, each with its own advantages and use cases.

Use the xmltodict Module to Convert XML String Into a Dictionary in Python

xmltodict is a Python library that allows you to parse XML data and convert it into a nested dictionary structure. It provides a straightforward and efficient way to work with XML data without having to write complex parsing code manually. The library is not a built-in Python module, so you’ll need to install it separately using pip.

pip install xmltodict

Once installed, you can import xmltodict in your Python script and start using it.

The core idea behind xmltodict is to convert the hierarchical structure of XML into a nested dictionary. Each XML element becomes a dictionary key, and its content, if any, becomes the associated value. If an XML element has child elements, they are represented as nested dictionaries.

Consider the following XML data as an example:

<student>
    <id>DEL</id>
    <name> Jack </name>
    <email>jack@example.com</email>
    <semester>8</semester>
    <class>CSE</class>
    <cgpa> 7.5</cgpa>
</student>

Using xmltodict, this XML data would be converted into a Python dictionary like this:

{
    "student": {
        "id": "DEL",
        "name": "Jack",
        "email": "jack@example.com",
        "semester": "8",
        "class": "CSE",
        "cgpa": "7.5",
    }
}

Let’s walk through a step-by-step example of using xmltodict to convert XML data into a dictionary.

import xmltodict

xml_data = """<student>
    <id>DEL</id>
    <name> Jack </name>
    <email>jack@example.com</email>
    <semester>8</semester>
    <class>CSE</class>
    <cgpa> 7.5</cgpa>
</student>"""

# Parse XML and convert it to a dictionary
data_dict = xmltodict.parse(xml_data)

# Accessing data in the dictionary
student = data_dict["student"]

# Printing student information
print(f"Student ID: {student['id']}")
print(f"Name: {student['name']}")
print(f"Email: {student['email']}")
print(f"Semester: {student['semester']}")
print(f"Class: {student['class']}")
print(f"CGPA: {student['cgpa']}")

In this example:

  1. We import the xmltodict library.
  2. We define an XML string called xml_data containing sample XML data representing a student’s information.
  3. We use xmltodict.parse(xml_data) to convert the XML data into a Python dictionary called data_dict.
  4. We access and print the student’s information from the dictionary.

When you run this code, it will output:

Student ID: DEL
Name: Jack
Email: jack@example.com
Semester: 8
Class: CSE
CGPA: 7.5
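
Every value in the parsed dictionary is a string, including numeric-looking fields such as semester and cgpa. The following is a minimal sketch of converting selected fields after parsing. The dictionary is hand-written to stand in for the parse() result, and FIELD_TYPES and convert_fields are hypothetical names introduced only for this illustration.

```python
# Hand-written stand-in for the dictionary returned by xmltodict.parse()
data_dict = {
    "student": {
        "id": "DEL",
        "name": "Jack",
        "email": "jack@example.com",
        "semester": "8",
        "class": "CSE",
        "cgpa": "7.5",
    }
}

# Hypothetical mapping of field names to the types we want
FIELD_TYPES = {"semester": int, "cgpa": float}


def convert_fields(record, field_types):
    """Return a copy of record with the listed fields converted."""
    converted = dict(record)
    for field, caster in field_types.items():
        if converted.get(field) is not None:
            # strip() guards against stray whitespace around the number
            converted[field] = caster(converted[field].strip())
    return converted


student = convert_fields(data_dict["student"], FIELD_TYPES)
print(student["semester"] + 1)  # 9
print(student["cgpa"])  # 7.5
```

Keeping the conversion in a separate step leaves the parsed dictionary untouched, which can help when some fields should stay as text.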

Handling Whitespace and Attributes

xmltodict takes care of whitespace and attributes for you. In the example XML data, some elements contain leading and trailing whitespace (for example, <name> Jack </name>). By default, parse() strips this whitespace from the text values; pass strip_whitespace=False if you need to preserve it.

Additionally, if an XML element has attributes, they are included as key-value pairs in the dictionary.

For instance, consider the following XML data:

<student id="DEL">
    <name> Jack </name>
    <email>jack@example.com</email>
    <semester>8</semester>
    <class>CSE</class>
    <cgpa> 7.5</cgpa>
</student>

Using xmltodict, the resulting Python dictionary would include attributes:

{
    "student": {
        "@id": "DEL",
        "name": "Jack",
        "email": "jack@example.com",
        "semester": "8",
        "class": "CSE",
        "cgpa": "7.5",
    }
}

As shown in the dictionary, attributes are represented with the @ symbol in the dictionary keys, and the attribute values are included as key-value pairs within the element’s dictionary.

The type of the object returned by parse() depends on the xmltodict version: older releases return a collections.OrderedDict, while more recent releases running on Python 3.7 or later return a regular dict. In both cases, the key-value pairs keep the order in which the elements appear in the XML.
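
To check how your installed xmltodict treats whitespace and attributes, you can parse the same snippet twice, once with the defaults and once with strip_whitespace=False. This is a small sketch rather than a complete reference; the XML is written without whitespace between the tags so that only the text inside the elements is affected.

```python
import xmltodict

# Compact XML (no whitespace between tags) keeps the comparison simple
xml_data = '<student id="DEL"><name> Jack </name><cgpa> 7.5</cgpa></student>'

# Default behaviour: surrounding whitespace is stripped from text values
stripped = xmltodict.parse(xml_data)

# strip_whitespace=False keeps the text exactly as written
raw = xmltodict.parse(xml_data, strip_whitespace=False)

print(repr(stripped["student"]["name"]))  # 'Jack'
print(repr(raw["student"]["name"]))  # ' Jack '
print(stripped["student"]["@id"])  # DEL
```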

Use the ElementTree Library to Convert XML String Into Dictionary in Python

ElementTree is a built-in library in Python that provides a simple and efficient way to parse XML data and work with it in a tree-like structure. It allows you to traverse and manipulate XML data by representing it as a hierarchy of elements, making it suitable for various XML processing tasks.

import xml.etree.ElementTree as ET

# Define the XML data
xml_data = """<student>
    <id>DEL</id>
    <name> Jack </name>
    <email>jack@example.com</email>
    <semester>8</semester>
    <class>CSE</class>
    <cgpa> 7.5</cgpa>
</student>"""

# Parse the XML data
root = ET.fromstring(xml_data)

# Initialize an empty dictionary
data_dict = {}

# Iterate through the XML elements
for child in root:
    # Remove leading and trailing whitespace from the text
    text = child.text.strip() if child.text is not None else None
    # Assign the element's text to the dictionary key
    data_dict[child.tag] = text

# Print the resulting dictionary
print(data_dict)

In this code:

  1. We import the xml.etree.ElementTree library as ET.
  2. We define the XML data as a string in the xml_data variable.
  3. We parse the XML data using ET.fromstring(xml_data), which returns the root Element of the document directly, and we store it in the root variable.
  4. We initialize an empty dictionary called data_dict to store the converted XML data.
  5. We iterate through the child elements of the root element using a for loop.
  6. For each child element, we read the text content from child.text and remove any leading and trailing whitespace using strip(). If the element has no text (child.text is None), we store None instead. The result is assigned to the dictionary under the element's tag name.
  7. Finally, we print the resulting dictionary, which contains the XML data converted into a key-value structure.

When you run this code, it will output the following dictionary:

{
    "id": "DEL",
    "name": "Jack",
    "email": "jack@example.com",
    "semester": "8",
    "class": "CSE",
    "cgpa": "7.5",
}
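
The loop above only handles one level of child elements. For nested XML, one option is a small recursive helper. The sketch below is an illustration under simple assumptions: element_to_dict is a hypothetical name, the <courses> data is made up for the example, repeated tags are collected into a list, and attributes and mixed content are ignored.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Recursively convert an Element into nested dicts, lists and strings."""
    children = list(element)
    if not children:
        # Leaf element: return its stripped text (None when empty)
        text = element.text.strip() if element.text else None
        return text or None

    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tag: collect the values in a list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


xml_data = """<student>
    <name> Jack </name>
    <courses>
        <course>Algorithms</course>
        <course>Databases</course>
    </courses>
</student>"""

root = ET.fromstring(xml_data)
data_dict = {root.tag: element_to_dict(root)}
print(data_dict)
# {'student': {'name': 'Jack', 'courses': {'course': ['Algorithms', 'Databases']}}}
```

Note that a tag appearing once yields a plain value while a repeated tag yields a list, so code that consumes the result may need to handle both shapes.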

Handling Attributes

If your XML data includes attributes, you can access them through the attrib dictionary of an element or with its get() method. Let’s consider XML data with attributes:

<student id="DEL">
    <name> Jack </name>
    <email>jack@example.com</email>
    <semester>8</semester>
    <class>CSE</class>
    <cgpa> 7.5</cgpa>
</student>

To access the id attribute, you can modify the code as follows:

# Accessing an attribute
student_id = root.get("id")
print(f"Student ID: {student_id}")

This code snippet retrieves the id attribute of the <student> element using the get() method and prints it:

Student ID: DEL
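
If you want the attributes inside the resulting dictionary rather than read one at a time, you can merge root.attrib into it. The sketch below prefixes attribute keys with "@", a naming convention borrowed from xmltodict so that attributes cannot collide with child tags. The year attribute is hypothetical and added only to show more than one attribute.

```python
import xml.etree.ElementTree as ET

xml_data = """<student id="DEL" year="4">
    <name> Jack </name>
    <semester>8</semester>
</student>"""

root = ET.fromstring(xml_data)

# Start with the attributes, prefixing each key with "@"
data_dict = {f"@{name}": value for name, value in root.attrib.items()}

# Then add the child elements as before
for child in root:
    data_dict[child.tag] = child.text.strip() if child.text else None

print(data_dict)
# {'@id': 'DEL', '@year': '4', 'name': 'Jack', 'semester': '8'}
```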

Using minidom Library (xml.dom.minidom) to Convert XML to Dictionary in Python

minidom is part of the Python standard library and is a lightweight, minimalistic implementation of the Document Object Model (DOM) for XML. It allows you to work with XML data as a tree-like structure, enabling you to traverse, manipulate, and extract information from XML documents.

Here’s a step-by-step guide on how to use minidom to convert the provided XML data into a dictionary:

import xml.dom.minidom as minidom

# Define the XML data
xml_data = """<student>
    <id>DEL</id>
    <name> Jack </name>
    <email>jack@example.com</email>
    <semester>8</semester>
    <class>CSE</class>
    <cgpa> 7.5</cgpa>
</student>"""

# Parse the XML data
dom = minidom.parseString(xml_data)

# Get the root element
root = dom.documentElement

# Initialize an empty dictionary
data_dict = {}

# Iterate through the child nodes of the root element
for node in root.childNodes:
    if node.nodeType == minidom.Node.ELEMENT_NODE:
        # Remove leading and trailing whitespace from the text content
        text = node.firstChild.nodeValue.strip() if node.firstChild else None
        # Assign the element's text content to the dictionary key
        data_dict[node.tagName] = text

# Print the resulting dictionary
print(data_dict)

In this code:

  1. We import the xml.dom.minidom library as minidom.
  2. We define the XML data as a string in the xml_data variable.
  3. We parse the XML data using minidom.parseString(xml_data) to create a Document object (dom), and we obtain the root element of the XML document using dom.documentElement.
  4. We initialize an empty dictionary called data_dict to store the converted XML data.
  5. We iterate through the child nodes of the root element using a for loop. We check if a node is an element node using node.nodeType == minidom.Node.ELEMENT_NODE.
  6. For each element node, we extract the text content using node.firstChild.nodeValue. We also remove any leading and trailing whitespace using strip(). If the node has no text content, we set the dictionary value to None.
  7. Finally, we print the resulting dictionary, which contains the XML data converted into a key-value structure.

When you run this code, it will output the following dictionary:

{
    "id": "DEL",
    "name": "Jack",
    "email": "jack@example.com",
    "semester": "8",
    "class": "CSE",
    "cgpa": "7.5",
}

Handling Attributes

If your XML data includes attributes, you can access them using the getAttribute() method of an element. Let’s consider XML data with attributes:

<student id="DEL">
    <name> Jack </name>
    <email>jack@example.com</email>
    <semester>8</semester>
    <class>CSE</class>
    <cgpa> 7.5</cgpa>
</student>

To access the id attribute, you can modify the code as follows:

# Accessing an attribute
student_id = root.getAttribute("id")
print(f"Student ID: {student_id}")

This code snippet retrieves the id attribute of the <student> element using the getAttribute() method and prints it as below.

Student ID: DEL
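
The same idea can be applied to collect every attribute at once and to read text more defensively. The sketch below is one possible approach, not the only one: get_text is a hypothetical helper that joins all text children of a node and returns None for empty elements, the "@" prefix is just a naming convention, and the year attribute is made up for the example.

```python
import xml.dom.minidom as minidom

xml_data = """<student id="DEL" year="4">
    <name> Jack </name>
    <email></email>
</student>"""

root = minidom.parseString(xml_data).documentElement


def get_text(node):
    """Join the text-like children of node; return None when there are none."""
    parts = [
        child.data
        for child in node.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    ]
    text = "".join(parts).strip()
    return text or None


# Attributes first, using the "@" prefix as a naming convention
data_dict = {f"@{name}": value for name, value in root.attributes.items()}

# Then the child elements
for node in root.childNodes:
    if node.nodeType == node.ELEMENT_NODE:
        data_dict[node.tagName] = get_text(node)

print(data_dict)
# {'@id': 'DEL', '@year': '4', 'name': 'Jack', 'email': None}
```

Compared with reading node.firstChild.nodeValue, this helper does not fail when the first child of an element is another element rather than text.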

Using xmljson Library to Convert XML to Dictionary in Python

xmljson is a Python library designed for converting XML data into JSON-style dictionaries. It provides flexibility by allowing you to choose from different conversion conventions, such as badgerfish, gdata, and more, depending on your specific requirements. This library is particularly useful when you need to handle XML data and want to work with it in a structured format like JSON or a dictionary.

Below is a step-by-step guide on how to use xmljson to convert the provided XML data into a dictionary:

  1. Install the xmljson Library:

    You can install the xmljson library using pip:

    
    pip install xmljson
    
  2. Parsing XML and Converting to a Dictionary:

    After installing the library, you can use it to parse the XML data and convert it into a dictionary. Here’s a Python script that demonstrates this process:

    from xml.etree.ElementTree import fromstring

    from xmljson import badgerfish as bf
    
    # Define the XML data
    xml_data = """<student>
       <id>DEL</id>
       <name> Jack </name>
       <email>jack@example.com</email>
       <semester>8</semester>
       <class>CSE</class>
       <cgpa> 7.5</cgpa>
    </student>"""
    
    # Parse the string into an Element, then convert it using the "badgerfish" style
    data_dict = bf.data(fromstring(xml_data))
    
    # Print the resulting dictionary
    print(data_dict)
    

    In this code:

    • We import fromstring from xml.etree.ElementTree and the badgerfish converter object from xmljson as bf.
    • We define the XML data as a string in the xml_data variable.
    • We parse the string with fromstring(xml_data), because bf.data() expects a parsed XML element rather than a raw string, and then convert the element into a dictionary using the “badgerfish” style. You can choose different styles based on your preference and requirements.
  3. Handling the Resulting Dictionary:

    After converting the XML data to a dictionary, you can easily access and manipulate the data as needed. For example, to access the student’s ID, you can use the following code:

    student_id = data_dict["student"]["id"]["$"]
    print(f"Student ID: {student_id}")
    

    This code retrieves the ID from the resulting dictionary and prints it:

    Student ID: DEL
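
In the BadgerFish convention, element text sits under a "$" key and attributes are prefixed with "@", which is why the lookup above ends in ["$"]. If those wrappers get in the way, they can be collapsed after conversion. The sketch below is pure Python and works on a hand-written dictionary in the BadgerFish shape; the sample values and the name flatten_badgerfish are assumptions made for illustration, not output copied from xmljson.

```python
# Hand-written dictionary in the BadgerFish shape, for illustration only
badgerfish_dict = {
    "student": {
        "@id": "DEL",
        "name": {"$": "Jack"},
        "semester": {"$": 8},
        "cgpa": {"$": 7.5},
    }
}


def flatten_badgerfish(node):
    """Collapse {"$": value} wrappers so leaves map directly to their values."""
    if isinstance(node, dict):
        if set(node) == {"$"}:
            return node["$"]
        return {key: flatten_badgerfish(value) for key, value in node.items()}
    if isinstance(node, list):
        return [flatten_badgerfish(item) for item in node]
    return node


flat = flatten_badgerfish(badgerfish_dict)
print(flat["student"]["name"])  # Jack
print(flat)
# {'student': {'@id': 'DEL', 'name': 'Jack', 'semester': 8, 'cgpa': 7.5}}
```

Because the helper checks with isinstance(node, dict), it should also accept OrderedDict instances, although it always returns plain dict objects.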
    

Conclusion

In this article, we’ve explored four different methods for converting XML data into dictionaries in Python. Each method offers its own advantages and is suitable for various scenarios, depending on your specific requirements and preferences.

  • Using xmltodict Library: xmltodict provides a straightforward way to parse XML data and convert it into a nested dictionary structure. It’s ideal when you want a quick and efficient solution for handling XML data.
  • Using ElementTree (xml.etree.ElementTree) Library: Python’s built-in ElementTree library offers a lightweight and efficient approach to parse and work with XML data in a tree-like structure. It’s a versatile choice for various XML processing tasks.
  • Using minidom (xml.dom.minidom) Library: minidom is part of the Python standard library and provides a minimalistic implementation of the DOM for XML. It’s useful for traversing, manipulating, and extracting information from XML documents.
  • Using xmljson Library: xmljson is designed for parsing and converting XML data into JSON or a dictionary-like format. It offers flexibility by supporting different conversion styles, making it valuable when you need to work with XML data in a structured format.

Depending on your project’s requirements and your familiarity with these methods, you can choose the one that best suits your needs. Converting XML data into dictionaries simplifies data processing and manipulation in Python, enabling you to work with XML data more efficiently in your applications.
