As technologically advanced as search engines are, there are a few areas where they still don’t quite measure up. It’s well known that search engines don’t like duplicate content on web sites, but did you know you could unknowingly be serving them duplicate content? It’s quite common, really.
The problem occurs when search engines spider your site or find it through links that use differing URLs to reach the same page. For example, the following URLs can all represent your home page:
- www.yoursite.com
- www.yoursite.com/index.html
- yoursite.com
- yoursite.com/index.html
To a search engine, however, each of these URLs can look like a different page on your site, each duplicating the same content. In rare cases this could trigger serious duplicate content penalties. The more likely result is that your search engine rankings will suffer, at least initially, until the search engines work out that these are the same page and treat them as such.
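To make the idea concrete, here is a minimal Python sketch of the kind of normalization a search engine has to do before it can see these four URLs as one page. The host name, the list of index file names, and the function name canonical_url are all assumptions for illustration; real search engines are certainly more involved than this.

```python
from urllib.parse import urlsplit

# Assumptions for illustration: the preferred host and the file names
# a server might serve as the default document.
CANONICAL_HOST = "www.yoursite.com"
BARE_HOST = "yoursite.com"
INDEX_NAMES = {"index.html", "index.htm", "index.php"}


def canonical_url(url: str) -> str:
    """Collapse the www./non-www. and index-file variants into one URL."""
    # URLs written without a scheme would be parsed as a bare path,
    # so add a scheme first.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)

    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST

    path = parts.path or "/"
    directory, _, filename = path.rpartition("/")
    if filename in INDEX_NAMES:
        path = directory + "/"

    return f"{parts.scheme}://{host}{path}"


variants = [
    "www.yoursite.com",
    "www.yoursite.com/index.html",
    "yoursite.com",
    "yoursite.com/index.html",
]
for variant in variants:
    print(variant, "->", canonical_url(variant))
```

All four variants come out as the same URL. Until a search engine reaches that conclusion on its own, it may treat them as four pages.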
The effect on rankings largely stems from your internal linking structure as well as external inbound links. If multiple sites each link to the various “versions” of your home page (though it’s really the same page), search engines will view those links as leading to different pages. So if you have ten sites linking to you, you may not get the full benefit of those ten links because they may have been divided between pages. This is why your rankings could suffer.
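The arithmetic behind that ten-link example can be sketched in a few lines of Python. The distribution of links below is entirely made up for illustration; the point is only that the strongest single “version” ends up with a fraction of the total.

```python
from collections import Counter

# Hypothetical example: ten sites link to the home page, each using
# whichever variant of the URL its author happened to type.
inbound_links = [
    "www.yoursite.com", "www.yoursite.com",
    "www.yoursite.com", "www.yoursite.com",
    "yoursite.com", "yoursite.com", "yoursite.com",
    "www.yoursite.com/index.html", "www.yoursite.com/index.html",
    "yoursite.com/index.html",
]

# How the links are counted if each URL is treated as its own page.
links_per_variant = Counter(inbound_links)

# How they are counted once every variant is understood to be one page.
consolidated = sum(links_per_variant.values())

best_variant, best_count = links_per_variant.most_common(1)[0]
print(f"Strongest single variant: {best_variant} "
      f"with {best_count} of {consolidated} links")
```

In this made-up split, no single URL is credited with even half of the links that actually point to the home page.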
Google is pretty good about figuring out that these URLs are really one page rather than duplicates, and it then combines the link attribution for your home page. But the other search engines are not and, even with Google, it may take a while. Until that happens, your rankings may remain understated in the search results.
The easiest way to determine if the search engines are finding “multiple” home pages is to run a backward link check. Put in each of the variations and record your numbers. Here is what I found for Pole Position Marketing:
www.polepositionmarketing.com
Google: 1,030
MSN: 10,219
Yahoo: 15,800
www.polepositionmarketing.com/index.php
Google: 1,030
MSN: 3
Yahoo: 51
polepositionmarketing.com
Google: 0
MSN: 10,283
Yahoo: 2
polepositionmarketing.com/index.php
Google: 0
MSN: 3
Yahoo: 0
You’ll notice that Google records the same number of links for the first two URLs recorded. This is because Google has already determined that these are the same. Take out the ‘www.’ and you’ll see that Google records zero links. I forced this to happen intentionally. I’ll explain that in a bit.
Let’s look at MSN. It appears that MSN records the www. and non-www. version of the home page as almost the same, but separately from the index.php versions. Only a few links are pointed to the index.php page and I’m able to click in from the link check results page to find out why. It looks like MSN found one other website linking to that page and then some internal site pages link to my index.php. I thought I had eliminated all links to my index.php page but it looks like I missed a few. I’ll fix them now and explain what I did in a bit as well.
Yahoo appears to be the worst offender in treating home pages as duplicates. Every variation results in a different number, though it’s comforting that the non-www. index.php version shows a zero as expected. As for the others, these are linking issues that I can resolve on my end and then just wait for the search engine to catch up.
Implementing a Solution
On most servers (those running Apache with mod_rewrite enabled), the www. versus non-www. issue can be resolved by adding a piece of code to your .htaccess file. It’s best to consult with your web host before editing this file unless you know what you’re doing. Making incorrect changes here can cause your site to crash. Here is what we added:
# Send requests for the bare domain to the www. host with a permanent (301) redirect
RewriteEngine On
RewriteCond %{HTTP_HOST} ^polepositionmarketing\.com [NC]
RewriteRule (.*) https://www.polepositionmarketing.com/$1 [R=301,L]
What this does is restrict both the search engines and users to the www. version of my site. In fact, this not only takes care of the home page but all other pages on the site as well. Give it a try; try to access polepositionmarketing.com/about-us.php. See what happens? You’re automatically redirected to www.polepositionmarketing.com/about-us.php. The search engines are redirected too. The 301 status tells them that the non-www. version has permanently moved to the www. version, which is what we want, and what causes the backlinks to report as zero.
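If the rewrite rule looks cryptic, the following Python sketch models roughly what it does for each request. It is only a model for illustration, not how Apache works internally, and the function name redirect_for is my own invention.

```python
import re

# Rough Python model of the rewrite rule, for illustration only.
CANONICAL_ORIGIN = "https://www.polepositionmarketing.com"

# The RewriteCond: the requested host starts with the bare domain
# (compared without regard to case).
BARE_HOST = re.compile(r"^polepositionmarketing\.com", re.IGNORECASE)


def redirect_for(host: str, path: str):
    """Return (status, location) if the request is redirected, else None."""
    if BARE_HOST.match(host):
        # The RewriteRule: keep the requested path, swap in the www. host,
        # and answer with a permanent redirect.
        return 301, f"{CANONICAL_ORIGIN}/{path.lstrip('/')}"
    # Requests already on the www. host are served as usual.
    return None


print(redirect_for("polepositionmarketing.com", "/about-us.php"))
print(redirect_for("www.polepositionmarketing.com", "/about-us.php"))
```

The first call returns a 301 pointing at the www. address for the same page; the second returns None because no redirect is needed.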
While this doesn’t prevent people from linking to yoursite.com instead of www.yoursite.com, it does let the search engine know that they really meant to link to the “correct” version: www.yoursite.com.
As for the index.html (or in my case index.php) page, this can largely be fixed by how you manage your own internal linking. The solution is to make sure that all of your internal site links to your home page simply point to http://www.site.com as opposed to http://www.site.com/index.html. When using WYSIWYG (what you see is what you get) HTML editors such as DreamWeaver or FrontPage, it’s quite common to attach your links to the .html page. You’ll have to consciously remember not to do that. If you use absolute links, you need the full URL as shown above. If you use relative links, then to link to your home page you want to use “/” rather than index.html. That will do the trick.
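Stray links are easy to miss, so it can help to check your pages automatically. Here is a minimal Python sketch, using only the standard library, that lists the links in a chunk of HTML that point at an index file. The list of file names and the function name find_index_links are assumptions for illustration, and note that it will also flag index files in subdirectories, not just the home page.

```python
from html.parser import HTMLParser

# Assumption: the file names your server might use as the home page.
INDEX_NAMES = ("index.html", "index.htm", "index.php")


class IndexLinkFinder(HTMLParser):
    """Collect href values of <a> tags that point at an index file."""

    def __init__(self):
        super().__init__()
        self.offending = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Ignore any fragment or query string when checking the file name.
        target = href.split("#")[0].split("?")[0]
        if target.rsplit("/", 1)[-1] in INDEX_NAMES:
            self.offending.append(href)


def find_index_links(html: str) -> list:
    finder = IndexLinkFinder()
    finder.feed(html)
    return finder.offending


sample = """
<a href="/">Home</a>
<a href="/index.php">Home</a>
<a href="http://www.site.com/index.htm">Home</a>
<a href="/about-us.php">About Us</a>
"""
print(find_index_links(sample))
```

Run something like this over each page of your site and change any link it reports to point at “/” or the full home page URL instead.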
The effect of this is to 1) not provide an index.html version of the page that is spiderable by the search engines, and 2) not provide that option for other sites to use in their links. The page www.yoursite.com/index.html will still be accessible on the server, but if you have no links to that page, users will never find it. In fact, they won’t know whether your home page is index.html, index.php or homepage.html. That really leaves them only one option for the link, which is www.yoursite.com.
Implementing these settings provides a means to avoid duplication issues or waiting for search engines to resolve the variations, while also allowing you to capture the full link value of every link to your home page. It’s a small amount of work for a big payoff in the long run.