Products come in all kinds of shapes, sizes, colors, and configurations. In many ecommerce systems, each variation counts as a unique product with its own SKU, which means a lot of unique URLs for what is essentially a single product. That can add up to a lot of pages of duplicate content.
Oh noes!
Whether you’re an SEO or a developer, eliminating duplicate content can be one of the biggest challenges you face. Sure, we can apply band-aid solutions (see below), but if you’re anything like me, you want to cure the problem, not treat the symptom.
5,000 Pages For One Product?!?
Here is an example of a website that offers 52 different styles, 19 colors, and 5 sizes, all for the same “t-rex hates” shirt design.
Assuming that each of these variations has a unique SKU (and they do on this site), let’s do the math: 52 styles × 19 colors × 5 sizes = 4,940 unique URLs! All for what is essentially the same product in multiple variations.
We can trim that number back a bit because, in some cases, the different styles do (or should) constitute unique products. How do we decide when that’s the case? Think about the shopper. Is there a strong chance they will search for “t-rex hates jersey” versus “t-rex hates golf shirt”? Quite possibly. So we want those searchers to land on a page where that style is already prominent, and this site, smartly, already does that.
But we still have the color/size issue. Searchers are far less likely to search using those specific criteria, and even if they do, a single landing page/URL will suffice. Yet this site produces a unique URL for every color and size combination. It may not be 5,000 URLs per shirt style, but 95 (19 colors × 5 sizes) is still a lot.
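The arithmetic behind these numbers is easy to check. This small Python sketch uses placeholder option names (only the counts come from the example site) and also shows how many URLs remain duplicates if each style keeps its own landing page.

```python
from itertools import product

# Hypothetical option names; only the counts (52 styles, 19 colors,
# 5 sizes) come from the example site.
styles = [f"style-{i}" for i in range(52)]
colors = [f"color-{i}" for i in range(19)]
sizes = [f"size-{i}" for i in range(5)]

# One SKU, and on this site one URL, per style/color/size combination.
all_variants = list(product(styles, colors, sizes))

# Color/size combinations sitting behind each style's landing page.
per_style = len(colors) * len(sizes)

# If every style keeps its own landing page, the rest are duplicates.
duplicates = len(all_variants) - len(styles)

print(len(all_variants), per_style, duplicates)  # 4940 95 4888
```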
An OK Solution
This site in particular got around the unique URL issue by putting all the unimportant variables after a hash mark (#), in what’s called the URL fragment:
- www.site.com/trex-hates_tshirt?productId=1321218904#color=navy/white&size=medium
- www.site.com/trex-hates_tshirt?productId=1321218904#color=green&size=x-large
- www.site.com/trex-hates_tshirt?productId=1321218904#color=red&size=3x-large tall
This causes search engines to disregard everything after the #. Everything before it is the indexable URL, which is the same for every variation:
- www.site.com/trex-hates_tshirt?productId=1321218904
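To see why the hash works, here is a minimal Python sketch using the standard library’s `urllib.parse.urldefrag`. The `https://` scheme and the percent-encoded space in “3x-large tall” are assumptions added so the strings are valid URLs; stripping the fragment collapses all three variants into a single URL.

```python
from urllib.parse import urldefrag

# The variant URLs listed above, with an assumed https:// scheme
# and the space in "3x-large tall" percent-encoded.
variant_urls = [
    "https://www.site.com/trex-hates_tshirt?productId=1321218904#color=navy/white&size=medium",
    "https://www.site.com/trex-hates_tshirt?productId=1321218904#color=green&size=x-large",
    "https://www.site.com/trex-hates_tshirt?productId=1321218904#color=red&size=3x-large%20tall",
]

# urldefrag() splits off everything from the '#' onward, much as crawlers
# drop the fragment before deciding which URL to index.
indexable = {urldefrag(url).url for url in variant_urls}
print(indexable)
# {'https://www.site.com/trex-hates_tshirt?productId=1321218904'}
```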
A Better Solution
This is a decent solution, but not necessarily the best one. Ideally, you can handle all the variable selections without encoding them in the URL at all. There’s no reason for the URL to change, even if each variation requires a separate SKU. The database should track which options are selected and pass the appropriate SKU through with the order.
Here is a site that does it this way. Notice how the SKU changes when you select a different shirt size.
The URL stays the same, but the system is tracking the selections being made by the visitor and passes that along once the order is placed.
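A minimal sketch of that approach, assuming a Python backend: a plain dictionary stands in for the database, and the SKU codes and the `add_to_cart` function are hypothetical. The point is that color and size travel with the add-to-cart request, not in the page URL.

```python
from dataclasses import dataclass

# Hypothetical SKU table standing in for the database: one row per
# color/size variant of a single product page.
SKUS = {
    ("trex-hates_tshirt", "navy/white", "medium"): "TRX-NVW-M",
    ("trex-hates_tshirt", "green", "x-large"): "TRX-GRN-XL",
    ("trex-hates_tshirt", "red", "3x-large tall"): "TRX-RED-3XLT",
}


@dataclass
class OrderLine:
    sku: str
    quantity: int


def add_to_cart(product: str, color: str, size: str, quantity: int = 1) -> OrderLine:
    """Resolve the shopper's selections to a SKU on the server.

    The page URL (e.g. /trex-hates_tshirt) never changes; color and size
    arrive as form data with the add-to-cart request instead.
    """
    try:
        sku = SKUS[(product, color, size)]
    except KeyError:
        raise ValueError(f"no SKU for {product} in {color}, size {size}") from None
    return OrderLine(sku=sku, quantity=quantity)


line = add_to_cart("trex-hates_tshirt", "green", "x-large")
print(line)  # OrderLine(sku='TRX-GRN-XL', quantity=1)
```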
Not all systems work this way, and some may require extensive reprogramming to make it happen. You’ll have to decide whether this fix is feasible for you and whether it’s worth the cost.
Alternative Solutions
Should you need alternative (and hopefully temporary) solutions, here are two:
Alternative #1: Implement the canonical tag. The canonical tag on each of these size/color variations would consistently point to the “primary” URL. That’s the solution this site uses; its canonical tag points to:
- www.site.com/trex-hates_tshirt
The canonical only changes when you click on a different shirt style, which, as noted above, makes sense because you do want those as unique landing pages.
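Here is one way a page template might produce that tag, sketched in Python. It assumes, as on the example site, that the style is part of the path, so dropping the query string and fragment leaves the “primary” URL; `canonical_link` is a hypothetical helper, not part of any ecommerce platform.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(variant_url: str) -> str:
    """Build a canonical tag pointing a color/size variant at its style page.

    Assumes the style lives in the path, so removing the query string and
    fragment leaves the primary URL.
    """
    parts = urlsplit(variant_url)
    primary = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(primary)}">'


print(canonical_link(
    "https://www.site.com/trex-hates_tshirt?productId=1321218904#color=green&size=x-large"
))
# <link rel="canonical" href="https://www.site.com/trex-hates_tshirt">
```

If a site carried the style in a query parameter instead, this helper would point every style at the same page, so that parameter would need to be kept.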
Alternative #2: Use Google Search Console to ignore the parameters on the end of the URLs. What are the parameters? The name=value pairs that follow the “?”. On a site that puts color and size in the query string instead of after the #, the URLs would carry three parameters: product ID, color, and size. You can tell Google Search Console to ignore any or all of these.
By doing this, you’re essentially telling Google to consider only what should be the canonical URL and nothing more. The downside is that this setting is specific to Google, so you’ll have to do the same thing in Bing Webmaster Tools.
You also need to be very careful not to ignore parameters that are necessary. For example, if the system also uses a parameter to determine shirt style, make sure you don’t exclude it in Search Console. Otherwise, you’ll keep all of these great landing pages from being found in search.
Likewise, if the product ID is important, you don’t want to tell search engines to ignore it, either. Looking at the URLs above, I would have assumed the product ID was an essential parameter, but it’s not. The same product shows with or without it, which means you do want it excluded from consideration.
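Before changing any parameter settings, it can help to model which URLs would collapse together. The Python sketch below does that locally; it is not a Search Console or Bing API, and the `style` parameter and the query-string URLs are hypothetical, for a site that keeps every variable in the query string. Ignoring `style` along with the rest collapses the style landing pages into a single URL, which is exactly the mistake to avoid.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of parameters you are thinking of ignoring.
IGNORED: frozenset[str] = frozenset({"productId", "color", "size"})


def collapse(url: str, ignored: frozenset[str] = IGNORED) -> str:
    """Model the URL left over once the `ignored` parameters are disregarded.

    This is a local audit helper, not a Search Console or Bing API.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Hypothetical site that keeps every variable, including style, in the query string.
urls = [
    "https://www.site.com/tshirt?productId=1321218904&style=jersey&color=green&size=x-large",
    "https://www.site.com/tshirt?productId=1321218904&style=jersey&color=red&size=medium",
    "https://www.site.com/tshirt?productId=1321218904&style=golf&color=green&size=x-large",
]

# Ignoring productId, color, and size leaves one URL per style.
print(sorted({collapse(u) for u in urls}))
# ['https://www.site.com/tshirt?style=golf', 'https://www.site.com/tshirt?style=jersey']

# Ignoring style as well wipes out the separate style landing pages.
print({collapse(u, IGNORED | {"style"}) for u in urls})
# {'https://www.site.com/tshirt'}
```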
Always be careful when messing with search engines and parameter indexing. You have to know what you’re doing or you can really screw up your ability to get key pages indexed.