How to Parse XML in Bash

MD Aminul Islam Feb 02, 2024
  1. Use xmllint to Parse XML in Bash
  2. Use XMLStarlet to Parse XML in Bash

It is hard to find a developer who has never worked with XML, a popular markup language widely used to structure and transfer data.

This article shows how to parse XML in Bash.

We will cover two command-line tools: xmllint and XMLStarlet.

Both need to be installed before you can use them.
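If you are not sure whether the tools are already present, a small helper like the following can check before you install anything. It is a minimal sketch: the function name check_tool is hypothetical, and the install hints assume Debian/Ubuntu for xmllint and Fedora for XMLStarlet, matching the install commands used in this article.

```shell
#!/usr/bin/env bash
# Minimal sketch: report whether the XML tools used in this article are on PATH.
# check_tool is a hypothetical helper; the install hints are assumptions for
# Debian/Ubuntu (apt-get) and Fedora (dnf).

# check_tool NAME HINT
#   Prints "found: NAME" and returns 0 if NAME is an executable on PATH.
#   Otherwise prints "missing: NAME (HINT)" to stderr and returns 1.
check_tool() {
    local name=$1 hint=$2
    if command -v "$name" >/dev/null 2>&1; then
        echo "found: $name"
        return 0
    fi
    echo "missing: $name ($hint)" >&2
    return 1
}

# "|| true" keeps the script going when a tool is missing, so both get reported.
check_tool xmllint "sudo apt-get install -y libxml2-utils" || true
check_tool xmlstarlet "sudo dnf install xmlstarlet" || true
```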

Use xmllint to Parse XML in Bash

xmllint, which ships with the libxml2 library, is one of the most common tools for parsing XML files from the command line.

On Debian and Ubuntu, install it with the following commands:

sudo apt-get update -qq
sudo apt-get install -y libxml2-utils

On Debian-based systems, the xmllint command is provided by the libxml2-utils package.

Given an XML file named MyXML.xml, the following command parses it and prints the resulting document to standard output:

xmllint MyXML.xml

The output looks like this:

<?xml version="1.0"?>
<specification>
        <type>Laptop</type>
        <model>Macbook</model>
        <screenSizeInch>14</screenSizeInch>
</specification>
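Printing the whole document is rarely the end goal; usually you want individual values in shell variables. The sketch below does that with xmllint's --xpath option, assuming an xmllint build that supports it (reasonably recent libxml2 releases do). Wrapping an expression in string() makes xmllint print only the text content, and count() returns a number. The sample content is written to a temporary file so the script does not overwrite an existing MyXML.xml; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Minimal sketch: read single values from the sample document with xmllint --xpath.
# Assumes an xmllint build that supports --xpath.

# Recreate the article's MyXML.xml content in a temporary file.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<?xml version="1.0"?>
<specification>
        <type>Laptop</type>
        <model>Macbook</model>
        <screenSizeInch>14</screenSizeInch>
</specification>
EOF

# string(...) yields only the text content of the first matching node.
device_type=$(xmllint --xpath 'string(/specification/type)' "$xml")
model=$(xmllint --xpath 'string(/specification/model)' "$xml")
size=$(xmllint --xpath 'string(/specification/screenSizeInch)' "$xml")

# count(...) yields a number: how many child elements <specification> has.
children=$(xmllint --xpath 'count(/specification/*)' "$xml")

echo "$device_type $model, ${size}-inch screen ($children fields)"
rm -f "$xml"
```

Command substitution strips any trailing newline xmllint prints, so the variables hold just the values.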

xmllint accepts many options. Many of them are listed below.

  1. --auto - This flag is for generating a document for testing.
  2. --catalogs - This flag is for using the SGML catalogs from SGML_CATALOG_FILES. Otherwise, XML catalogs starting from /etc/xml/catalog are used by default.
  3. --chkregister - This flag is for turning on node registration.
  4. --compress - This flag is for turning on gzip compression of output.
  5. --copy - This flag is for testing the internal copy implementation.
  6. --c14n - This flag uses W3C XML Canonicalization (C14N) to serialize the result of parsing to stdout. It keeps comments in the result.
  7. --dtdvalid URL - This flag uses the DTD specified by the URL for validation.
  8. --dtdvalidfpi FPI - This flag uses the DTD specified by the Formal Public Identifier FPI for validation. Note that this requires a catalog that exports that Formal Public Identifier.
  9. --debug - This flag is for parsing a file. It also outputs an annotated tree that is the in-memory version of the document.
  10. --debugent - This flag is for debugging the entities defined in the document.
  11. --dropdtd - This flag is for removing DTD from the output.
  12. --dtdattr - This flag fetches the external DTD and populates the tree with inherited attributes.
  13. --encode ENCODING - This flag outputs the document in the given encoding.
  14. --format - This flag will reformat and reindent the output.
  15. --help - This flag will print out a summary of the usage for xmllint.
  16. --html - This flag is for using the HTML parser.
  17. --htmlout - This flag will show the result as an HTML file. It will output the necessary HTML tags surrounding the result tree output so that the results can be displayed/viewed in a browser.
  18. --insert - This flag is for testing valid insertions.
  19. --loaddtd - This flag is for fetching the external DTD.
  20. --load-trace - This flag prints every document loaded during processing to stderr.
  21. --maxmem NNBYTES - This flag is for testing the parser memory support. NNBYTES is the maximum number of bytes the library is allowed to allocate.
  22. --memory - This flag is for parsing from memory.
  23. --noblanks - This flag will drop ignorable blank spaces.
  24. --nocatalogs - This flag specifies not to use any catalogs.
  25. --nocdata - This flag replaces CDATA sections with equivalent text nodes.
  26. --noent - This flag will substitute entity values for entity references.
  27. --nonet - This flag specifies not to use the internet to fetch DTDs or entities.
  28. --noout - This flag will suppress the output. xmllint will show the output of the result tree by default.
  29. --nowarning - This flag specifies not to emit warnings from the validator and/or parser.
  30. --nowrap - This flag specifies not to output HTML doc wrapper.
  31. --noxincludenode - This flag does XInclude processing but does not generate the XInclude start and end nodes.
  32. --nsclean - This flag is to remove redundant namespace declarations.
  33. --output FILE - This flag defines a file path where xmllint saves the result of parsing.
  34. --path "PATH(S)" - This flag uses the (colon-separated or space-separated) list of filesystem paths specified by PATHS to load DTDs or entities. Space-separated lists must be enclosed in quotation marks.
  35. --pattern PATTERNVALUE - This flag is for exercising the pattern recognition engine that can be used with a reader interface. It is also used for debugging.
  36. --postvalid - This flag is for validating after parsing is completed.
  37. --push - This flag enables the push mode.
  38. --recover - This flag is for outputting any parsable portions of the invalid document.
  39. --relaxng SCHEMA - This flag will use a RelaxNG file named SCHEMA for validation.
  40. --repeat - This flag repeats the run 100 times for timing or profiling.
  41. --schema SCHEMA - This flag uses the W3C XML Schema file named SCHEMA for validation.
  42. --shell - Run a navigating shell.
  43. --stream - This flag uses the streaming API, which is useful with --relaxng or --valid for validating files too large to be held in memory.
  44. --testIO - This flag will test the user input/output support.
  45. --timing - This flag outputs information about the time xmllint takes to perform the various steps.
  46. --valid - This flag checks whether the document is a valid instance of its Document Type Definition (DTD).
  47. --version - This flag displays the version of libxml2 in use.
  48. --walker - This flag tests the walker module.
  49. --xinclude - This flag will do XInclude processing.
  50. --xmlout - This flag is used in conjunction with --html to save the document with the XML serializer instead of the HTML one. It is mainly used to convert HTML to XHTML.
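A few of these flags are often combined in scripts. The sketch below assumes xmllint is installed and works on throwaway temporary files. It uses --noout so that only the exit status reports whether a file is well-formed, and --format to re-indent a compact document. XMLLINT_INDENT is the environment variable xmllint reads for the indentation string; the function name is_well_formed is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch combining --noout and --format. Assumes xmllint is installed;
# is_well_formed is a hypothetical helper name.

good=$(mktemp)
bad=$(mktemp)
printf '<specification><type>Laptop</type><model>Macbook</model></specification>\n' > "$good"
printf '<specification><type>Laptop</specification>\n' > "$bad"   # <type> never closed

# --noout suppresses the result tree, so the exit status alone says whether
# the file parsed; error messages on stderr are discarded here.
is_well_formed() {
    xmllint --noout "$1" 2>/dev/null
}

if is_well_formed "$good"; then echo "good file: well-formed"; fi
if ! is_well_formed "$bad"; then echo "bad file: not well-formed"; fi

# --format re-indents the output; XMLLINT_INDENT sets the indent string.
formatted=$(XMLLINT_INDENT='  ' xmllint --format "$good")
echo "$formatted"

rm -f "$good" "$bad"
```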

Use XMLStarlet to Parse XML in Bash

XMLStarlet is another popular toolkit for working with XML documents. Its main command is xmlstarlet.

On Fedora, install it with root privileges (on Debian or Ubuntu, use sudo apt-get install xmlstarlet instead):

sudo dnf install xmlstarlet

XMLStarlet provides subcommands for validating, transforming, querying, and editing XML files. The simplest one, format, parses a file and prints it with consistent indentation:

xmlstarlet format MyXML.xml

The output looks like this:

<?xml version="1.0"?>
<specification>
        <type>Laptop</type>
        <model>Macbook</model>
        <screenSizeInch>14</screenSizeInch>
</specification>

All the code in this article is written in Bash and is intended for a Linux shell environment.


Aminul is an expert technical writer and full-stack developer. He has hands-on experience with numerous developer platforms and SaaS startups and is highly skilled in many programming languages and frameworks. He writes professional technical content such as reviews, programming articles, documentation, SOPs, user manuals, and whitepapers.
