[Mini Tutorial] – Parsing Data Files Using Reflection

Data integration is an integral part of any eCommerce website that needs an existing data model imported into its database. Demac Media’s clients require different kinds of data integration depending on their existing data structure. Some clients have an existing ERP system and need data exported from it and imported into Magento. In other cases clients provide Demac with a flat file, generally a CSV, which needs to be parsed efficiently and stored in a data structure that integrates well with Magento.

In this post I will share a really cool way of parsing such files using reflection. In all fairness, I despised reflection when it was first introduced to me in school, but after working with Demac’s integration team I have realised how big an asset it can be.

Essentially, reflection allows programs to examine themselves and their objects at runtime. This means that at runtime you can access the names of the public members of a class, such as its properties and methods, and this is the feature we are going to exploit when parsing these files.
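As a minimal sketch of what "examining a class at runtime" means, the snippet below lists the public property names of a type. SampleItem and ReflectionDemo are hypothetical names made up for this illustration; only typeof, GetProperties and the LINQ calls come from .NET itself.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical class, used only for this illustration.
public class SampleItem
{
    public string Name { get; set; }
    public string Colour { get; set; }
}

public static class ReflectionDemo
{
    // Returns the names of the public instance properties of T,
    // discovered at runtime rather than hard-coded.
    public static string[] PropertyNames<T>()
    {
        return typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Select(p => p.Name)
            .ToArray();
    }
}
```

Calling ReflectionDemo.PropertyNames<SampleItem>() yields the two names "Name" and "Colour". The order in which GetProperties returns them is not guaranteed, so code should not rely on it.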

For the sake of simplicity, I will use as an example a product file of the kind our clients provide.

A typical product file looks like the following:

Product.csv
Sku, Description, Price, Quantity 
1001, T-Shirt, 10.00, 2
1002, Cap, 5.00, 2
1003, Jacket,1

The first line of the CSV file contains the attributes, usually called headers. A collection of these attributes with their associated values is called a product. For example, a particular product will look like the following:

Product1:
Sku: 1001
Description: T-Shirt
Price: 10.00
Quantity: 2

The aim here is to somehow parse the CSV and pick the values for the appropriate attributes so that each product looks like Product1.

First we need to know how to read a CSV file so that we can access this data. You will also need to define a class that models your products. As mentioned before, a product is a collection of all the attributes (headers) and their values. A typical product class will look like the following:

public class DemacProduct
{
   public string Sku { get; set; }
   public string Description { get; set; }
   public string Price { get; set; }
   public string Quantity { get; set; }
   //Parsing function
   //Reflection function
}

Here’s what a typical parsing method looks like. The following lines of code go inside a method of the above class.

// Requires: using System.IO;
var filePath = @"D:\eMac\Media\products.csv";
// The using block closes the file once we are done reading
using (var readFile = new StreamReader(File.OpenRead(filePath)))
{
    var getHeaders = readFile.ReadLine().Split(','); //Read headers
    while (!readFile.EndOfStream)
    {
        var line = readFile.ReadLine(); //read each line
        //and split each line to get the values of the
        //respective headers
        var values = line.Split(',');
    }
}

Is it really that easy? Nope, not really. Remember, we need to get all the products in the CSV. The above code simply reads the file and splits each line. The next step is to associate these values with the headers we discussed above. If you look carefully at the code, the getHeaders variable stores the headers of the file in an array. This is done by reading the first line of the file and separating the string on commas.

The challenge now is to associate each value of a product with the corresponding header. You can do that in a number of ways. I created a dictionary for each product that stores each header together with its respective value. I then stored these dictionaries in a list so that I could hold all the products.

Too much to take in? Let me elaborate. Take a look below.

// Requires: using System; using System.Collections.Generic; using System.IO;
var filePath = @"D:\eMac\Media\products.csv";
// List to store products.
var listOfItems = new List<Dictionary<String, String>>();
using (var readFile = new StreamReader(File.OpenRead(filePath)))
{
    var getHeaders = readFile.ReadLine().Split(','); //Read headers
    while (!readFile.EndOfStream)
    {
        var line = readFile.ReadLine(); //read each line
        //and split each line to get the values of the respective headers
        var values = line.Split(',');
        //one dictionary per product: header -> value
        var addItem = new Dictionary<String, String>();
        int i = 0;
        foreach (String val in values)
        {
            //trim so that " Description" matches the Description property
            var currHeader = getHeaders[i].Trim();
            addItem.Add(currHeader, val.Trim());
            i++;
        }
        listOfItems.Add(addItem); //list of products
    }
}

A dictionary is a data structure that stores data as key-value pairs. In our case the key is currHeader (Sku, Price, etc.), which gets its value from the array of headers we stored in getHeaders. This header is stored as a key in the dictionary.

The foreach loop iterates over the values array and stores each value in the dictionary under its header. For example, Product1 will look like this in a dictionary:

Product1 Dictionary (key -> value):
Sku -> 1001, Description -> T-Shirt, Price -> 10.00, Quantity -> 2

Once all these products are pushed into the list of dictionaries (“listOfItems”) the resulting list of all the products will look something like the following:

listOfItems: Product1,Product2,Product3,.... ProductN
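The header-to-value pairing can also be pulled out into a small helper. The following is only a sketch: ToRecord and CsvRowDemo are hypothetical names, and the trimming, the case-insensitive keys and the guard for short rows are my own assumptions, not something the file format guarantees.

```csharp
using System;
using System.Collections.Generic;

public static class CsvRowDemo
{
    // Hypothetical helper: pairs each header with the value in the same column.
    // Whitespace around headers and values is trimmed, and keys compare
    // case-insensitively, so "sku" and "Sku" refer to the same entry.
    public static Dictionary<string, string> ToRecord(string[] headers, string line)
    {
        var record = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        var values = line.Split(',');
        // Only walk the columns that have both a header and a value,
        // so a short or long row cannot cause an index error.
        var count = Math.Min(headers.Length, values.Length);
        for (var i = 0; i < count; i++)
        {
            // The indexer overwrites instead of throwing on a duplicate header.
            record[headers[i].Trim()] = values[i].Trim();
        }
        return record;
    }
}
```

Note that a row with a missing field, such as the 1003 row in the sample file, still produces a dictionary, but its values shift one column to the left: the helper cannot tell which field was left out. Rows like that are better rejected or logged in a real import.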

You might be wondering why this “noob” is trying to use reflection when he could create a list of DemacProducts and set the value of each attribute directly from the values array. As you will soon realize, reflection gives us a more elegant solution and much cleaner code. Furthermore, if a client adds a new header to the flat file, we just need to add a property with that header’s name to our DemacProduct class. So here’s how the cool kids do it:

// Requires: using System; using System.Collections.Generic; using System.Linq;
public static List<DemacProduct> BuildProducts(
    List<Dictionary<String, String>> listOfItems)
{
    var results = new List<DemacProduct>();
    var makeProduct = new DemacProduct();
    //Reflection
    //Gets the runtime type of the current instance (DemacProduct)
    var getType = makeProduct.GetType();
    //Returns all the public properties of the class (DemacProduct)
    var getProperty = getType.GetProperties();
    foreach (var items in listOfItems)
    {
        makeProduct = new DemacProduct();
        foreach (var dict in items)
        {
            var key = dict.Key;
            var value = dict.Value;
            //Look for a property whose name matches the header, ignoring case
            var findProductProperty =
                getProperty.Where(
                x => x.Name.ToLowerInvariant() == key.ToLowerInvariant()
                ).FirstOrDefault();
            //Set the value only if a matching property exists in the class
            if (findProductProperty != null)
            {
                findProductProperty.SetValue(makeProduct, value, null);
            }
        }
        results.Add(makeProduct);
    }
    return results;
}

Overwhelmed? Let me explain. The BuildProducts method takes as input the list of dictionaries we created before and returns a list of DemacProducts.

As you can see, I initialized a new instance of DemacProduct. I then used the methods below to first get the type of makeProduct and then its properties. I used the GetType method to get the type of makeProduct, which is DemacProduct.

To access DemacProduct’s public properties I used GetProperties, which returns all the public properties of the DemacProduct class.

var getType = makeProduct.GetType();
var getProperty = getType.GetProperties(); 

The outer loop iterates over the products in the list of dictionaries. The inner loop iterates over all the entries in a product’s dictionary (Sku, Price, Description, etc.).

The real gist of the function lies in the following LINQ statement:

var findProductProperty = getProperty.Where(
    x => x.Name.ToLowerInvariant() == key.ToLowerInvariant()
    ).FirstOrDefault();

This statement looks for a property whose name matches the key. If we consider our DemacProduct class, the property names of interest would be:

Sku, Description, Price, Quantity

Recall that our dictionary also contains keys with the same names. If a key matches the name of a public property we have defined in our DemacProduct class (the ones with getters and setters), then its value in the dictionary is the one we are interested in.

The LINQ statement assigns a property to findProductProperty only when there is a match between the property names and the key (header) in the dictionary; otherwise FirstOrDefault returns null.
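To make the two outcomes of the lookup concrete, here is a small sketch. DemoProduct is a hypothetical cut-down stand-in for DemacProduct, and Find is a made-up helper; it uses string.Equals with StringComparison.OrdinalIgnoreCase, which is an alternative way of writing the same case-insensitive comparison as the ToLowerInvariant calls above.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for DemacProduct, reduced to two properties.
public class DemoProduct
{
    public string Sku { get; set; }
    public string Price { get; set; }
}

public static class PropertyLookupDemo
{
    // Returns the public property whose name matches the header
    // (ignoring case), or null when the class has no such property.
    public static PropertyInfo Find(Type type, string header)
    {
        return type.GetProperties()
            .FirstOrDefault(p => string.Equals(
                p.Name, header, StringComparison.OrdinalIgnoreCase));
    }
}
```

With this helper, Find(typeof(DemoProduct), "SKU") returns the Sku property, while Find(typeof(DemoProduct), "Weight") returns null. That null is why the caller has to check the result before using it, and it is also what lets unknown headers in the file be skipped silently.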

By now we know each value of a product with respect to its header. What we need to do next is create a DemacProduct and push it into the list so that we can satisfy the function’s return type.

To do this I used the SetValue method, which sets the value of a specific property. In our case this property is findProductProperty.

findProductProperty.SetValue(makeProduct, value, null);

As an example, let’s assume that the property in context (findProductProperty) is Sku. The above line of code sets this property on the makeProduct object to the contents of the variable value.

By the end of the inner loop you will have a DemacProduct similar to Product1, which can be pushed into a list of DemacProducts, and voila: you have successfully managed to get a list of all the products.
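SetValue works directly here because every property of DemacProduct is a string. If a property were declared as a number, passing it a string would throw an ArgumentException. The sketch below shows one possible way around that, converting the raw text with Convert.ChangeType first. TypedProduct and Assign are hypothetical names, and using the invariant culture for numbers is my assumption about the file format.

```csharp
using System;
using System.Globalization;
using System.Reflection;

// Hypothetical variant of the product class with non-string properties.
public class TypedProduct
{
    public string Sku { get; set; }
    public decimal Price { get; set; }
    public int Quantity { get; set; }
}

public static class SetValueDemo
{
    // Looks up a public property by its exact name and sets it on target,
    // converting the raw CSV text to the property's declared type first.
    public static void Assign(object target, string propertyName, string rawValue)
    {
        PropertyInfo property = target.GetType().GetProperty(propertyName);
        if (property == null)
        {
            return; // no such property: nothing to set
        }
        // Assumes numbers in the file use '.' as the decimal separator.
        object converted = Convert.ChangeType(
            rawValue, property.PropertyType, CultureInfo.InvariantCulture);
        property.SetValue(target, converted, null);
    }
}
```

For example, after calling Assign(product, "Price", "10.00") and Assign(product, "Quantity", "2") on a new TypedProduct, product.Price holds the decimal 10.00 and product.Quantity holds the integer 2. Text that cannot be converted, such as an empty price, raises a FormatException, so a real import would need to decide how to handle those rows.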

The whole process might be a tad cumbersome, but once you really grasp the concept it becomes quite addictive. So don’t get carried away with it!

I am pretty sure there are more efficient ways of using reflection, but the objective of this post was to give the coder/reader an insight into how reflection can be used as a tool when programming.

HAPPY CODING!