Dealing with Duplicate Content in Magento

Throughout the development process, site and business owners have an overwhelming number of items in the project scope to attend to: design and UX, integration, development, and so on. You get the point. Smartly, though, our customers also bring up questions around SEO. The one question you can’t ask me is “how will the site perform SEO-wise?” Sorry everybody, I do not have the golden answer to that very broad question.

The great thing is, our clients usually do a fair amount of homework beforehand, so they know what to ask me. Our customers chose Magento as their platform, so they need a basic understanding of how the platform is designed to meet the requirements of a search-friendly site. We talk about basic stuff like how to set up Google Sitemaps and Google Analytics, how to enter meta descriptions, titles and page content, how to name images (alt text), and how to connect Analytics accounts to Magento. Once the site is submitted to a search engine index, it’s time to get crawled. The unfortunate reality of a full-scale eCommerce website is that crawlers are going to find a certain amount of duplicate content across the site. For an eCommerce website this is a standard issue, and once you recognize the problem, finding a solution does not require pulling out all of your hair. Let’s take a look.


What causes so much duplicate content?

The Google Webmaster Tools blog gives some pretty clear-cut examples of what causes duplicate content, along with some great solutions. What I’ve noticed is that in Magento, a large share of the reported duplicate content comes from product pages. Even though a product page is typically unique to that specific product, there are a variety of ways to get to it through the site’s navigation. Each additional path to a product page changes the URL, but the end location is the same. For example:

User experience of top level categories is designed for visitors to access pages quickly and efficiently. Below is an example of categories usually placed above a main slider:

While users can click any top level category, most eCommerce sites then drop down into a sub menu.

What does this look like in your source code?

I have had clients ask “how will search engines identify those sub categories within the drop down menu?” When you submit a sitemap to a webmaster program such as Google or Bing Webmaster Tools, you are giving them the architecture of the site so that crawlers can access it. Crawl bots use this map to explore the site, with the exception of anything you have disallowed in your robots.txt file. (For those who need clarification, robots.txt files tell crawlers what they can and cannot crawl.)
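If you want to see how a crawler reads those rules, here is a minimal sketch using Python’s standard library. The robots.txt content and the paths are made up for illustration; your store’s file will look different.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /customer/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def can_crawl(path, user_agent="*"):
    """Return True if the given user agent may crawl the path."""
    return parser.can_fetch(user_agent, path)


print(can_crawl("/en/accessories/hosiery.html"))  # category page, not disallowed
print(can_crawl("/checkout/cart/"))               # falls under Disallow: /checkout/
```

A well-behaved crawler makes the same kind of check before it requests a page, so anything under a disallowed prefix stays out of the crawl.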

That top level navigation with its drop down menu is a list. In Magento, the main category is labelled with class="level-top parent", while the rest sit inside div class="subnavcontainer".

As you can see, <a href> links are inserted into the list, and those links are the avenue crawlers take to each page, so the drop down menu does not affect whether or not those links are crawlable.
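To make that concrete, here is a rough sketch of what a crawler does with such a menu. The markup is hypothetical, loosely modeled on the class names mentioned above, and the category URLs are invented; the point is only that every href in the list gets picked up, whether or not it sits inside a drop down.

```python
from html.parser import HTMLParser

# Hypothetical navigation markup; a real theme's output will differ.
NAV_HTML = """
<ul id="nav">
  <li class="level-top parent">
    <a href="/en/accessories.html">Accessories</a>
    <div class="subnavcontainer">
      <ul>
        <li><a href="/en/accessories/hosiery.html">Hosiery</a></li>
        <li><a href="/en/accessories/hats.html">Hats</a></li>
      </ul>
    </div>
  </li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


collector = LinkCollector()
collector.feed(NAV_HTML)
print(collector.links)
```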

Now, let’s go find some duplicate content, shall we?

In this first example, I’m going to head over to the Home Page, hover over the category Accessories and choose Hosiery (not that I am currently in need of fashionable new tights).

After browsing around for a bit, I choose a product I want to view and get directed to the landing page.

Note the URL: /en/accessories/hosiery/knee-high-1pr-pk-3-stripes.html

Now let’s just start right from the main category and find the same product.

Note the URL: /en/accessories/knee-high-1pr-pk-3-stripes.html

What we have here is a clear-cut example of a canonicalization problem (now say it really fast!). In a nutshell, there are multiple pages that have the same content but different URLs. Even though the product page was intended to be one page for that specific product, it now shows up in multiple locations. The reason is how you are categorizing your products. Whether you upload them manually or use an inventory management system to import new products, they are being dropped into specific categories. Because of the way the navigation works on the front end, the URLs and breadcrumbs follow the path the user takes to the page, creating multiple pages with the same content.
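If you have a list of crawled URLs, a quick way to spot this pattern is to group them by the last segment of the path. This is only a heuristic sketch: it assumes that two URLs ending in the same product slug point at the same product, which holds for the example above but is not guaranteed everywhere. The function name is my own.

```python
from collections import defaultdict
from urllib.parse import urlparse
import posixpath


def group_by_product_slug(urls):
    """Return {slug: [urls]} for every slug reachable via more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        # The slug is the final path segment, e.g. "knee-high-1pr-pk-3-stripes.html".
        slug = posixpath.basename(urlparse(url).path)
        groups[slug].append(url)
    return {slug: paths for slug, paths in groups.items() if len(paths) > 1}


crawled = [
    "/en/accessories/hosiery/knee-high-1pr-pk-3-stripes.html",
    "/en/accessories/knee-high-1pr-pk-3-stripes.html",
    "/en/accessories/hosiery.html",
]

duplicates = group_by_product_slug(crawled)
for slug, paths in duplicates.items():
    print(slug, "is reachable via", len(paths), "URLs")
```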

How does Magento Handle Canonicalization?

Google gives a nice overview of dealing with canonicalization on the Webmasters blog, but Magento is prepared for such dilemmas. Simply go to System –> Configuration and scroll to the Catalog –> Search Engine Optimization menu.

In this menu, the options for enabling canonical URLs are:

1. Use Canonical Link Meta Tag For Categories
2. Use Canonical Link Meta Tag For Products

Select Yes for both, then Save Config.

We can even take it one step further by removing the category path from the product URL: switch the setting “Use Categories Path for Product URLs” to No.
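Once the settings are saved, it is worth checking that product pages actually carry a canonical link in their head. Here is a small sketch that pulls the canonical URL out of a page’s HTML. The sample page and the example.com address are placeholders, not output from a real store.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the last <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Placeholder page source, for illustration only.
SAMPLE_PAGE = """<html><head>
<title>Knee High Socks</title>
<link rel="canonical" href="https://www.example.com/en/knee-high-1pr-pk-3-stripes.html" />
</head><body></body></html>"""

print(find_canonical(SAMPLE_PAGE))
```

If both of the product URLs from the earlier example return the same canonical address, crawlers have what they need to treat them as one page.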

Re-crawl the site with a tool such as Moz, and the drop in reported duplicate content should be drastic.

Which URL will Search Engines Use?

The answer is simple: whichever they deem the most relevant to the search query. How? Ask the experts behind search engine algorithms. It is fair to assume that if a URL string contains question marks, squiggles, equals signs and other on-site-search-related sloppiness, search engines will prefer the clean URL path to display in their search results.

Conclusion

Unfortunately we are not done yet. In another post I will talk about handling duplicate page titles, which can also be very confusing to search crawlers. The URL path setting above is entirely up to you. If I were to make a recommendation, it comes down to the products you are selling. Knee high socks are accessories, but the sub category hosiery identifies what kind of accessories they are. A final, complete product like a whole car seat can probably do without the category path. Either way, ensure that your back end settings are enabled and that your product pages have unique content (product videos, descriptions, reviews) to separate them from other products that may be very similar.