XML Parsing Using LINQ

Demac Media has an exceptionally diverse set of clients, and their business requirements vary. XML files are one way to store large volumes of data, and we often need to parse them to update and load products. I have written about parsing before, but this time I will discuss XML-formatted files and the general approach to parsing them.

Parsing XML Files Methodology

There are various ways to parse XML files in C#. Our preferred method is to turn the XML data into objects with a Language-Integrated Query (LINQ) query. The other is the old-fashioned iterative method, where the program loops over each element. The LINQ method looks a bit intimidating at first, but once you understand how the two methods relate to each other, the resulting code is much cleaner and easier to understand. It also speeds up development once you get the hang of it.
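To make the contrast concrete, here is a minimal sketch that pulls every sku out of a small in-memory document twice: once with an explicit loop and once with a LINQ query. The inline XML string and the SkuDemo class and method names are illustrative assumptions, not part of the original code.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SkuDemo
{
    // Hypothetical sample: two products under a single root element
    const string Xml =
        "<products><product><sku>1</sku></product><product><sku>2</sku></product></products>";

    // Iteration: walk each <product> and append its <sku> text by hand
    public static List<string> SkusByLoop()
    {
        var doc = XDocument.Parse(Xml);
        var skus = new List<string>();
        foreach (var product in doc.Descendants("product"))
        {
            skus.Add(product.Element("sku").Value);
        }
        return skus;
    }

    // LINQ: describe the same result as one query
    public static List<string> SkusByLinq()
    {
        var doc = XDocument.Parse(Xml);
        return (from product in doc.Descendants("product")
                select product.Element("sku").Value).ToList();
    }
}
```

Both SkuDemo.SkusByLoop() and SkuDemo.SkusByLinq() return the list "1", "2"; the LINQ version simply leaves the list-building to the query.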

<?xml version="1.0" encoding="utf-8" ?>
<products>
  <product>
    <sku>1</sku>
    <description>Red T-shirt</description>
    <price>45</price>
    <quantity>4</quantity>
    <colour>red</colour>
    <size>
      <size1>medium</size1>
      <size2>large</size2>
      <size3>small</size3>
    </size>
  </product>
  <product>
    <sku>1</sku>
    <description>Red T-shirt</description>
    <price>45</price>
    <quantity>4</quantity>
    <colour>red</colour>
    <size>
      <size1>medium</size1>
      <size2>large</size2>
      <size3>small</size3>
    </size>
  </product>
  <product>
    <sku>1</sku>
    <description>Red T-shirt</description>
    <price>45</price>
    <quantity>4</quantity>
    <colour>red</colour>
    <size>
      <size1>medium</size1>
      <size2>large</size2>
      <size3>small</size3>
    </size>
  </product>
</products>
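Keep in mind that a well-formed XML document has exactly one root element. If several product elements sit side by side at the top level, XDocument.Parse (and XDocument.Load) throws an XmlException, so the products need a wrapper element such as products. The name of the wrapper is our assumption; any name works because both parsers below search with Descendants. A minimal check, with a hypothetical RootCheck helper:

```csharp
using System.Xml;
using System.Xml.Linq;

public static class RootCheck
{
    // Hypothetical helper: true if the text parses as a complete XML document
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            // Raised for malformed input, including more than one top-level element
            return false;
        }
    }
}
```

RootCheck.IsWellFormed("<product/><product/>") returns false, while wrapping the same elements in <products>...</products> returns true.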

Designing an Algorithm to Help Parse Files

Let's start by designing an algorithm that parses the above file with a traditional loop. But first, let me create a Product class to store the information for each product.

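using System.Collections.Generic;

// Holds the values read from one <product> element
public class Product
{
    public string Sku { get; set; }
    public string Price { get; set; }
    public string Quantity { get; set; }
    public string Colour { get; set; }
    public List<string> Size { get; set; }  // text of every child of <size>
}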

Method 1: foreach Loop

using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;

public List<Product> ParseXml(string inputFilePath)
{
    var productList = new List<Product>();
    using (var xmlReader = new StreamReader(inputFilePath))
    {
        var doc = XDocument.Load(xmlReader);
        XNamespace nonamespace = XNamespace.None;
        var xmlProducts = doc.Descendants(nonamespace + "product");

        // Visit each <product> element in turn
        foreach (var item in xmlProducts)
        {
            var product = new Product
            {
                Sku = item.Element("sku").Value,
                Price = item.Element("price").Value,
                Colour = item.Element("colour").Value,
                Quantity = item.Element("quantity").Value
            };

            // Inner loop: collect the text of every child of <size>
            var sizeList = new List<string>();
            foreach (var size in item.Elements("size").Elements())
            {
                if (!string.IsNullOrEmpty(size.Value))
                {
                    sizeList.Add(size.Value);
                }
            }
            product.Size = sizeList;
            productList.Add(product);
        }
    }
    return productList;
}
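Calling .Value directly, as the loop does, throws a NullReferenceException as soon as one product is missing an element such as colour. A more forgiving sketch uses LINQ to XML's explicit string conversion, which yields null for a missing element instead of throwing. The SafeParse class and its ReadProduct helper are hypothetical names; the Product class is repeated so the block compiles on its own.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class Product
{
    public string Sku { get; set; }
    public string Price { get; set; }
    public string Quantity { get; set; }
    public string Colour { get; set; }
    public List<string> Size { get; set; }
}

public static class SafeParse
{
    // Hypothetical helper: builds a Product from one <product> element
    public static Product ReadProduct(XElement item)
    {
        return new Product
        {
            // (string) on a missing (null) element returns null rather than throwing
            Sku = (string)item.Element("sku"),
            Price = (string)item.Element("price"),
            Quantity = (string)item.Element("quantity"),
            Colour = (string)item.Element("colour"),
            // Empty (not null) sequence when <size> is absent
            Size = item.Elements("size").Elements().Select(s => s.Value).ToList()
        };
    }
}
```

A product with no price element now simply ends up with Price set to null, which the calling code can check for.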

Method 2: LINQ

using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public List<Product> ParseXml(string inputFilePath)
{
    List<Product> productList;
    using (var xmlReader = new StreamReader(inputFilePath))
    {
        var doc = XDocument.Load(xmlReader);
        XNamespace nonamespace = XNamespace.None;
        // One query builds every Product; a nested query gathers the sizes
        productList = (from product in doc.Descendants(nonamespace + "product")
                       select new Product
                       {
                           Sku = product.Element("sku").Value,
                           Price = product.Element("price").Value,
                           Quantity = product.Element("quantity").Value,
                           Colour = product.Element("colour").Value,
                           Size = (from size in product.Elements("size").Elements()
                                   select size.Value).ToList()
                       }).ToList();
    }
    return productList;
}
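To try the LINQ approach without a file on disk, the same query can run against an XML string loaded with XDocument.Parse. This is a sketch; the LinqParser class and ParseXmlString method are hypothetical names, and Product is repeated so the block stands alone.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class Product
{
    public string Sku { get; set; }
    public string Price { get; set; }
    public string Quantity { get; set; }
    public string Colour { get; set; }
    public List<string> Size { get; set; }
}

public static class LinqParser
{
    // Hypothetical variant of the LINQ method that takes XML text instead of a path
    public static List<Product> ParseXmlString(string xml)
    {
        var doc = XDocument.Parse(xml);
        return (from product in doc.Descendants("product")
                select new Product
                {
                    Sku = product.Element("sku").Value,
                    Price = product.Element("price").Value,
                    Quantity = product.Element("quantity").Value,
                    Colour = product.Element("colour").Value,
                    // Nested query: one string per child of <size>
                    Size = (from size in product.Elements("size").Elements()
                            select size.Value).ToList()
                }).ToList();
    }
}
```

Passed a document containing one of the sample products, it returns a single Product whose Size list holds "medium", "large" and "small", in document order.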

The opening statements of the two ParseXml methods are identical: both take the file path as a string and open a StreamReader to read the XML data. Since our XML is not in a namespace, we set the namespace to XNamespace.None.
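The no-namespace setup only works when the file really has no default namespace. If the root element declared one, for example xmlns="http://example.com/products" (a hypothetical URI), every element name would live in that namespace and Descendants("product") would find nothing. A sketch of the difference, with a hypothetical NamespaceDemo class:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Hypothetical document whose elements live in a default namespace
    const string Xml =
        "<products xmlns=\"http://example.com/products\">" +
        "<product><sku>1</sku></product></products>";

    public static int CountWithoutNamespace()
    {
        // Matches nothing: "product" here means the name with no namespace
        return XDocument.Parse(Xml).Descendants("product").Count();
    }

    public static int CountWithNamespace()
    {
        XNamespace ns = "http://example.com/products";
        // ns + "product" builds the fully qualified XName
        return XDocument.Parse(Xml).Descendants(ns + "product").Count();
    }
}
```

The first method returns 0 and the second returns 1. Child elements inherit the default namespace too, so in such a file each lookup such as Element("sku") would also need ns + "sku".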

From there on, the two methods look very different.

Each product can have multiple sizes. Hence, the first method loops over each product element and then runs a nested loop over the children of its "size" element. The LINQ method reads more like a SQL statement than an iterative procedure: instead of filling a list one item at a time, it describes the result and lets LINQ build it. The size list is also assigned inside the same object initializer as the other properties, rather than built in a separate loop and assigned afterwards.
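Part of why the query reads like SQL is that the C# compiler translates query syntax into ordinary method calls; a simple from ... select becomes a call to Select with a lambda. A sketch showing the sku query written both ways (SyntaxDemo is a hypothetical class name):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SyntaxDemo
{
    // Query syntax: from ... select ...
    public static List<string> QuerySyntax(XDocument doc) =>
        (from product in doc.Descendants("product")
         select product.Element("sku").Value).ToList();

    // Method syntax: the equivalent chain of extension-method calls
    public static List<string> MethodSyntax(XDocument doc) =>
        doc.Descendants("product")
           .Select(product => product.Element("sku").Value)
           .ToList();
}
```

Both return the same list for any document, so choosing between them is purely a matter of readability.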

In plain terms, the query says: for every product element, select its sku, price, quantity and colour. Since the size element has several children, we need to collect all of them into the Size list. To do this, we write a nested LINQ query that outputs a list of strings, one for each child of the size element.
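Here is that nested query on its own, as a sketch with a hypothetical SizeDemo class. Selecting size.Value for each child keeps the result a flat List<string>; selecting a new List<string> per size element instead would produce a List<List<string>>. Because Elements() returns every child, a fourth size such as size4 is picked up without changing the code.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SizeDemo
{
    // Collects the text of every child of <size>, whatever the children are called
    public static List<string> ReadSizes(XElement product)
    {
        return (from size in product.Elements("size").Elements()
                select size.Value).ToList();
    }
}
```

For a product whose size element holds size1 through size4, ReadSizes returns all four values in document order.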

Overall, Method 2 looks more elegant and neater. It is especially useful when parsing files that have hundreds of elements. Although Method 1 is easy to program and understand, writing a loop body for each of a couple of hundred elements makes it slow to develop with.

So there you have it: a nice, clean way to parse an XML file. A quick note before I sign off: there are cases where you might need to use Method 1 instead of Method 2. Development is sometimes a trade-off between best practice and efficiency, so choose wisely.