Last month I posted some of my thoughts and theories on duplicate content where I explained the different types of duplicate content that the search engines find. I wanted to expand a bit on the in-site duplicate content that we often see with various websites. I’ll take these one at a time over the course of the next few days or weeks, depending on how often I post.
www. vs. no www.
Real quick, go to your browser and type in yoursite.com. Does the URL in the browser’s address bar change to a) http://yoursite.com or b) http://www.yoursite.com?
Now type in www.yoursite.com. Does the URL in your browser change to a) http://www.yoursite.com or b) http://yoursite.com?
In both of those instances, if you answered A then you have potential duplication issues. Here is an example of one of my articles on Gooruze.com which shows the potential duplication:
Take away the www. from the URL and, lo and behold, you see the exact same article:
You can see how this can become a problem, with virtually every article having its www. or non-www. twin.
Which version gets accessed depends on how each person typed the website into the address bar to begin with (or which link they followed). Did they type in the www. or not? You may have; I may not have. If I then bookmark the site or provide a link to it from another site, and you do the same, we’re sending the search engines to two different pages (URLs), both of which have the same content. If the search engines start spidering from both of those links, then hundreds of articles can be indexed, half of which are pure duplicates.
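To make the "twin" idea concrete, here is a rough sketch in Python that takes a list of links and reports which ones differ only by the www. prefix. The function names and the example.com URLs are made up for illustration, and a real spider does far more than this; it only shows why two URLs can count as two pages.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def strip_www(url):
    """Return (host without a leading 'www.', path) for a URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host, parts.path or "/"


def find_twins(urls):
    """Group URLs that point at the same page once 'www.' is ignored."""
    groups = defaultdict(set)
    for url in urls:
        groups[strip_www(url)].add(url)
    # Keep only pages reachable under more than one URL.
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}


# Hypothetical links collected from bookmarks and other sites.
links = [
    "http://www.example.com/articles/one",
    "http://example.com/articles/one",
    "http://www.example.com/articles/two",
]
twins = find_twins(links)
for page, variants in twins.items():
    print(page, "is reachable as", variants)
```

In this made-up list, only the first article shows up under both hostnames, so it is the only one reported as having a twin.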
Fortunately, this issue isn’t as bad as a lot of duplicate content issues because most of the search engines have gotten pretty good at figuring out that those pages are the same, after a bit of time. In most cases the search engines will equate the two versions, with or without the www., as being the same page(s). But it doesn’t happen right away. In fact it can take several months or perhaps more, depending on the site, for the engines to tie the two together. While some are content to wait it out, the real danger is that, while you wait, incoming links are split between two URLs instead of all counting toward one, which potentially handicaps your link flow and incoming link juice.
The less you make the search engines think, the better. Even if you’re confident that the search engines have already made the connection between the www. and non-www. versions being one and the same, you never know what might change that in the future. The best strategy then is to be proactive in “fixing” this kind of duplication.
If you’re running on an Apache server (with mod_rewrite enabled) then the solution is relatively simple. Add this bit of code to your .htaccess file in the root directory of your server, swapping in your own domain for site.com:
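# Turn on URL rewriting
RewriteEngine On
# Match only the bare hostname, in any letter case
RewriteCond %{HTTP_HOST} ^site\.com$ [NC]
# Permanently (301) redirect to the same path on the www. hostname
RewriteRule ^(.*)$ http://www.site.com/$1 [R=301,L]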
Don’t ask me to explain it, all I know is that it works! If your site is on any other kind of server, then you’ll have to contact your web host for a fix. The .htaccess file is pretty finicky so be sure to back it up before making any changes. Once you get the updated version uploaded, give it a shot. If you type in site.com the address should redirect to http://www.site.com. Now do the same thing but with an inner page of your site. Type in site.com/page and you should be redirected to http://www.site.com/page. There you go. All set.
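If you’d rather not test by hand, the same check can be sketched in Python. This is only a rough sketch under a few assumptions: site.com is the placeholder domain from above, and the two function names are my own invention. The first function works out where a bare-hostname URL should end up under the rule above; the second asks the server for a page without following the redirect, so you can compare what comes back.

```python
import http.client
from urllib.parse import urlsplit


def expected_location(url, bare_host="site.com"):
    """Return the www. URL the redirect should send us to, or None
    if the URL is not on the bare hostname (the rule does not apply)."""
    parts = urlsplit(url)
    if (parts.hostname or "").lower() != bare_host:
        return None
    target = "http://www." + bare_host + (parts.path or "/")
    if parts.query:
        # Apache carries the query string over by default.
        target += "?" + parts.query
    return target


def fetch_redirect(host, path="/"):
    """Send a HEAD request and return (status code, Location header)
    without following the redirect."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


print(expected_location("http://site.com/page"))
```

Against your own live site, calling fetch_redirect("site.com", "/page") should come back with a 301 status and a Location matching what expected_location gives for the same address, assuming the .htaccess change took effect.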