XML Parsing Examples

XML parsing refers to reading XML documents and extracting meaningful data from them. XML parsers allow us to work with XML data by transforming it into an easily manageable format (like objects or data structures) in different programming languages.

This chapter walks through XML parsing in several popular programming languages, with examples in Python, Java, C#, JavaScript, PHP, and Go. By the end, you should understand the basic approach to parsing XML in each of them.

Types of XML Parsers

  1. DOM (Document Object Model) Parser: Reads the entire XML document into memory and represents it as a tree structure. It’s ideal for smaller XML files where memory usage isn’t a concern.
  2. SAX (Simple API for XML) Parser: Reads XML sequentially and is event-driven: the parser calls your handler as it encounters each element. It doesn’t load the whole document into memory, making it suitable for large documents.
  3. StAX (Streaming API for XML): Streams through the document like SAX, but uses a pull-based approach, where the programmer decides when to read the next event. It also supports writing XML.
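
To make the difference between push and pull parsing concrete, here is a minimal Python sketch. It assumes a small sample document (XML_DATA) and two hypothetical helper functions, titles_with_sax and titles_with_pull. Python has no StAX implementation, so xml.etree.ElementTree.XMLPullParser stands in for the pull style here; unlike a true streaming parser, it still builds elements as it goes.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Sample data for illustration only.
XML_DATA = b"""<bookstore>
  <book><title>XML Developer's Guide</title><price>44.95</price></book>
  <book><title>Learn XML</title><price>39.95</price></book>
</bookstore>"""


class TitleHandler(xml.sax.ContentHandler):
    """SAX (push) style: the parser calls these methods as it reads."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        if self._in_title:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._in_title = False


def titles_with_sax(data):
    """Collect titles through SAX callbacks."""
    handler = TitleHandler()
    xml.sax.parseString(data, handler)
    return handler.titles


def titles_with_pull(data):
    """Pull style: our code asks the parser for the events it has ready."""
    parser = ET.XMLPullParser(events=("end",))
    parser.feed(data)
    parser.close()
    titles = []
    for _event, elem in parser.read_events():
        if elem.tag == "title":
            titles.append(elem.text)
    return titles


print(titles_with_sax(XML_DATA))
print(titles_with_pull(XML_DATA))
```

In the SAX version the parser drives the program and the handler has to track its own state; in the pull version the loop stays in your code, which is often easier to follow.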

XML Parsing in Python

Python's standard library includes the xml.etree.ElementTree module for parsing XML.

Example

import xml.etree.ElementTree as ET

xml_data = '''
<bookstore>
  <book>
    <title>XML Developer's Guide</title>
    <author>Author Name</author>
    <price>44.95</price>
  </book>
  <book>
    <title>Learn XML</title>
    <author>Another Author</author>
    <price>39.95</price>
  </book>
</bookstore>
'''

# Parse the XML data
root = ET.fromstring(xml_data)

# Extract and print book titles and authors
for book in root.findall('book'):
    title = book.find('title').text
    author = book.find('author').text
    print(f"Title: {title}, Author: {author}")

Output:
Title: XML Developer's Guide, Author: Author Name
Title: Learn XML, Author: Another Author

Here:

  • ET.fromstring() parses the XML string into a tree structure.
  • findall() and find() are used to extract specific elements.
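
One caveat: find() returns None when a child element is missing, so calling .text on the result raises an AttributeError. A minimal sketch of a more defensive version follows, using findtext() with a default value. The second book (with no author and a non-numeric price) and the read_books helper are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# The second <book> is hypothetical data showing missing and malformed fields.
xml_data = """
<bookstore>
  <book>
    <title>XML Developer's Guide</title>
    <author>Author Name</author>
    <price>44.95</price>
  </book>
  <book>
    <title>Untitled Draft</title>
    <price>not available</price>
  </book>
</bookstore>
"""


def read_books(xml_text):
    """Return a list of dicts, tolerating missing or malformed children."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        # findtext() returns the default instead of failing when a child is absent.
        title = book.findtext("title", default="(no title)")
        author = book.findtext("author", default="(unknown)")
        try:
            price = float(book.findtext("price", default=""))
        except ValueError:
            price = None  # price missing or not a number
        books.append({"title": title, "author": author, "price": price})
    return books


books = read_books(xml_data)
for entry in books:
    print(entry)
```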

XML Parsing in Java

Java provides the javax.xml.parsers package, with support for both DOM and SAX parsers. Below is an example using DOM parsing.

Example

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;
import org.xml.sax.InputSource;

public class XMLParser {
    public static void main(String[] args) throws Exception {
        String xmlData = "<bookstore><book><title>XML Developer's Guide</title><author>Author Name</author><price>44.95</price></book></bookstore>";

        // Parse XML
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xmlData)));

        // Get elements by tag name
        NodeList books = doc.getElementsByTagName("book");

        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            String title = book.getElementsByTagName("title").item(0).getTextContent();
            String author = book.getElementsByTagName("author").item(0).getTextContent();
            System.out.println("Title: " + title + ", Author: " + author);
        }
    }
}

Output:
Title: XML Developer's Guide, Author: Author Name

Here:

  • DocumentBuilder.parse() parses the XML string (wrapped in an InputSource) into a DOM Document.
  • getElementsByTagName() retrieves elements by tag name, and getTextContent() extracts their text.
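
The DOM calls in the Java example come from the W3C DOM interfaces, which other languages also implement. As a rough comparison, the sketch below does the same lookup with Python's xml.dom.minidom, which offers getElementsByTagName() under the same name. The text_of helper is a hypothetical convenience function, not part of the library.

```python
from xml.dom import minidom

xml_data = (
    "<bookstore><book><title>XML Developer's Guide</title>"
    "<author>Author Name</author><price>44.95</price></book></bookstore>"
)

# Parse the whole document into an in-memory DOM tree.
doc = minidom.parseString(xml_data)


def text_of(element, tag):
    """Return the text of the first <tag> descendant, or None if it is missing."""
    matches = element.getElementsByTagName(tag)
    if not matches or matches[0].firstChild is None:
        return None
    return matches[0].firstChild.data


results = []
for book in doc.getElementsByTagName("book"):
    results.append((text_of(book, "title"), text_of(book, "author")))

for title, author in results:
    print(f"Title: {title}, Author: {author}")
```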

XML Parsing in C#

In C#, the System.Xml namespace provides tools for parsing XML, including the XmlDocument class.

Example

using System;
using System.Xml;

class Program
{
    static void Main()
    {
        string xmlData = "<bookstore><book><title>XML Developer's Guide</title><author>Author Name</author><price>44.95</price></book></bookstore>";

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xmlData);

        XmlNodeList books = doc.GetElementsByTagName("book");

        foreach (XmlNode book in books)
        {
            string title = book["title"].InnerText;
            string author = book["author"].InnerText;
            Console.WriteLine($"Title: {title}, Author: {author}");
        }
    }
}

Output:
Title: XML Developer's Guide, Author: Author Name

Here:

  • XmlDocument.LoadXml() parses the XML string.
  • GetElementsByTagName() retrieves the book elements; the indexer (for example, book["title"]) selects a child element, and InnerText returns its text.

XML Parsing in JavaScript (Browser)

In the browser, JavaScript provides the DOMParser interface for parsing XML strings.

Example

const xmlData = `
<bookstore>
  <book>
    <title>XML Developer's Guide</title>
    <author>Author Name</author>
    <price>44.95</price>
  </book>
</bookstore>`;

const parser = new DOMParser();
const xmlDoc = parser.parseFromString(xmlData, "application/xml");

const books = xmlDoc.getElementsByTagName("book");

for (let i = 0; i < books.length; i++) {
  const title = books[i].getElementsByTagName("title")[0].textContent;
  const author = books[i].getElementsByTagName("author")[0].textContent;
  console.log(`Title: ${title}, Author: ${author}`);
}

Output:
Title: XML Developer's Guide, Author: Author Name

Here:

  • DOMParser.parseFromString() converts the XML string into a document object.
  • getElementsByTagName() retrieves XML elements, and textContent extracts the value.
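
Malformed input deserves explicit handling in any language. In browsers, DOMParser does not throw on invalid XML; it returns a document containing a parsererror element, so it is worth checking for one. The sketch below shows the same idea in Python, where ElementTree raises ParseError instead. The parse_or_none helper is a hypothetical name used only for this illustration.

```python
import xml.etree.ElementTree as ET


def parse_or_none(xml_text):
    """Parse XML text, returning None (and reporting where) if it is malformed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as exc:
        line, column = exc.position  # location of the error in the input
        print(f"Invalid XML at line {line}, column {column}: {exc}")
        return None


good = parse_or_none("<bookstore><book/></bookstore>")
bad = parse_or_none("<bookstore><book></bookstore>")  # <book> is never closed
```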

XML Parsing in PHP

In PHP, the SimpleXML extension provides an easy way to parse XML.

Example

<?php
$xmlData = '
<bookstore>
  <book>
    <title>XML Developer\'s Guide</title>
    <author>Author Name</author>
    <price>44.95</price>
  </book>
</bookstore>';

$xml = simplexml_load_string($xmlData);

foreach ($xml->book as $book) {
    echo "Title: " . $book->title . ", Author: " . $book->author . "\n";
}
?>

Output:
Title: XML Developer's Guide, Author: Author Name

Here:

  • simplexml_load_string() parses the XML string into an object.
  • You can access XML elements directly as properties of the object.

XML Parsing in Go (Golang)

Go provides the encoding/xml package for parsing XML data.

Example

package main

import (
    "encoding/xml"
    "fmt"
)

type Book struct {
    Title  string `xml:"title"`
    Author string `xml:"author"`
    Price  string `xml:"price"`
}

type Bookstore struct {
    Books []Book `xml:"book"`
}

func main() {
    xmlData := `
    <bookstore>
        <book>
            <title>XML Developer's Guide</title>
            <author>Author Name</author>
            <price>44.95</price>
        </book>
    </bookstore>`

    var bookstore Bookstore
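    // Unmarshal returns an error for malformed XML, so check it.
    if err := xml.Unmarshal([]byte(xmlData), &bookstore); err != nil {
        fmt.Println("Error parsing XML:", err)
        return
    }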

    for _, book := range bookstore.Books {
        fmt.Printf("Title: %s, Author: %s\n", book.Title, book.Author)
    }
}

Output:
Title: XML Developer's Guide, Author: Author Name

Here:

  • xml.Unmarshal() parses the XML string into Go structs.
  • You can define structs (Book, Bookstore) with tags to match XML elements.
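
The struct-mapping idea is not specific to Go. As a rough comparison, the sketch below maps the same document onto Python dataclasses by hand. The Book and Bookstore classes mirror the Go structs, and parse_bookstore is a hypothetical helper; Python's standard library has no direct equivalent of xml.Unmarshal, so the mapping is written out explicitly.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Book:
    title: str
    author: str
    price: str


@dataclass
class Bookstore:
    books: List[Book]


def parse_bookstore(xml_text: str) -> Bookstore:
    """Map each <book> element onto a Book instance."""
    root = ET.fromstring(xml_text)
    books = [
        Book(
            # Missing children become empty strings, like Go's zero values.
            title=node.findtext("title", default=""),
            author=node.findtext("author", default=""),
            price=node.findtext("price", default=""),
        )
        for node in root.findall("book")
    ]
    return Bookstore(books=books)


xml_data = """
<bookstore>
    <book>
        <title>XML Developer's Guide</title>
        <author>Author Name</author>
        <price>44.95</price>
    </book>
</bookstore>"""

store = parse_bookstore(xml_data)
for book in store.books:
    print(f"Title: {book.title}, Author: {book.author}")
```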

XML parsing is essential for working with structured data in many programming languages. Whether you are reading small XML files or processing large datasets, each language covered here offers libraries for reading and manipulating XML. From Python’s ElementTree to Go’s encoding/xml, you have now seen how to parse XML in six widely used languages. Happy coding! ❤️
