Tuesday, November 5, 2024

Fixing Duplicate Content Issues

Not all webmasters are aware that search engines treat each URL as a separate page, even when the content is the same and the URLs differ only by a trailing slash or by a leading www. To serve only one version of each page and get the traffic it deserves, you need to keep a few things in mind when fixing URL issues.

www and non-www

The simplest issue is having both versions of a domain, one with www and one without (developers call this a URL canonicalization issue, but I believe, or at least hope, there is an easier term for it). Search engines view them as two separate sites with identical content and count the links pointing to each one separately. Each version gets its own share of traffic, but having just one version can bring you more, because all links then point to a single site, which boosts its search engine rankings.

To redirect a site from the non-www version to the www version, you can insert the following code in your .htaccess file:

# Permanently (301) redirect yoursite.com to www.yoursite.com
RewriteEngine On
RewriteCond %{HTTP_HOST} ^yoursite\.com$ [NC]
RewriteRule (.*) http://www.yoursite.com/$1 [R=301,L]

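  • replace yoursite.com with your domain (keep the backslash before the dot in the RewriteCond line)
  • copy the code into the .htaccess file in the root (main) folder of your server
  • load the non-www version of your page (or refresh it with the “Refresh” button, F5, Ctrl+R or anything that works for you) and check that you end up on the www version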

This only works on Apache servers with mod_rewrite enabled, which most Unix/Linux hosting provides. If the redirect doesn’t work, make sure your host runs Apache, and ask support whether mod_rewrite is enabled and allowed in .htaccess files. If it isn’t, I’d suggest switching hosts.

Read more about using 301 (permanent) redirects with .htaccess.
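To see what the rule decides for a given request, here is a minimal Python sketch of the same logic. The function name www_redirect is hypothetical; the sketch assumes the host arrives without a port and that the host pattern is anchored at both ends (^yoursite\.com$), so that a host like yoursite.com.au is left alone. It models the rule for understanding only and does not replace the .htaccess code.

```python
import re
from typing import Optional

# Placeholder domain from the .htaccess example; replace with your own.
CANONICAL_HOST = "www.yoursite.com"
# Anchored at both ends and case-insensitive, like ^yoursite\.com$ [NC].
BARE_HOST = re.compile(r"^yoursite\.com$", re.IGNORECASE)


def www_redirect(host: str, path: str) -> Optional[str]:
    """Return the 301 target for a request, or None if no redirect is needed.

    host is the Host header without a port; path is the request path,
    starting with "/".
    """
    if BARE_HOST.match(host):
        # Keep the requested path so every page maps to its www twin.
        return f"http://{CANONICAL_HOST}{path}"
    return None
```

For example, a request for yoursite.com/about/ gets http://www.yoursite.com/about/ as its redirect target, while a request that already uses www.yoursite.com gets None, meaning no redirect.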

URLs with and without slashes

While it is fairly easy to see why www and non-www sites count as different, it is less obvious that search engines also treat URLs with and without a trailing slash as different pages. That’s why you should keep only one version of each.

If you have a WordPress blog, the Permalink Redirect plugin will help. I recently installed it on my site and it works like a charm.

For other websites, check which Apache version your host runs and read the corresponding section of the Apache manual.

To me, the code samples look the same in every version of the manual, and I didn’t manage to make them work either, probably because of a conflict with WordPress. You can discuss the topic or ask questions about the trailing slash issue at Cre8asite Forums.
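If you do end up writing your own rule, it has to make one decision for every request. Here is a minimal Python sketch of that decision, assuming you settle on the with-slash version as canonical and treat any last path segment containing a dot (like logo.png) as a file that keeps its slash-less URL. The function name slash_redirect is hypothetical, and query strings are ignored for simplicity.

```python
from typing import Optional


def slash_redirect(path: str) -> Optional[str]:
    """Return the with-slash path to 301-redirect to, or None if path is fine.

    Assumes the with-slash version is canonical and that a last segment
    containing a dot (e.g. logo.png) is a file, which keeps its URL.
    """
    if path.endswith("/"):
        return None  # already canonical (this includes the root "/")
    last_segment = path.rsplit("/", 1)[-1]
    if "." in last_segment:
        return None  # looks like a file, leave it alone
    return path + "/"
```

With these assumptions, /blog/my-post is redirected to /blog/my-post/, while /blog/my-post/ and /img/logo.png are left as they are.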

Having unique, accessible pages

Suppose your site has several pages with the same content and you want to keep only one of them, without losing link weight or hurting your visitors’ experience. Then you need to 301 redirect the other pages to the one you keep (for example, with a PHP redirect).

A 301 (permanent) redirect, whether done in PHP or in .htaccess, is very helpful for fixing duplicate content and broken links (links pointing to a page that no longer exists), and is often the only cure for the problem.
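The post suggests a PHP redirect; as a language-neutral illustration of the same idea, here is a minimal sketch of a Python WSGI application that 301-redirects known duplicates and serves everything else. The REDIRECTS table, its paths and the placeholder page body are hypothetical; in practice you would put the same mapping in front of your real pages.

```python
from typing import Callable, Iterable

# Hypothetical map of duplicate URLs to the single page you want to keep.
REDIRECTS = {
    "/index.php": "/",
    "/articles/duplicate-content-old/": "/articles/duplicate-content/",
}


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI app: 301-redirect known duplicates, serve everything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 marks the move as permanent, so search engines can credit
        # the old URL's link weight to the page you kept.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    # Stand-in for your real page rendering.
    body = f"Page: {path}".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, you could serve it with the standard library’s wsgiref module: from wsgiref.simple_server import make_server, then make_server("", 8000, app).serve_forever().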

The feeds

If you have a blog or just serve RSS feeds to your visitors, consider blocking the feeds from search engine robots in your robots.txt file, because this helps keep duplicate content out of their indices (feeds have the same content as your main site, remember?).

A good start is adding the following rules to your robots.txt file (the /wp- line applies to WordPress blogs):

User-agent: *
Disallow: /feed/
Disallow: /feed/atom/
Disallow: /feed/rss/
Disallow: /wp-
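You can sanity-check rules like these with Python’s standard urllib.robotparser module before uploading them. This is a sketch with a hypothetical helper, is_blocked, and yoursite.com as a placeholder domain. Keep in mind that robotparser only does plain prefix matching (it does not understand the * and $ wildcards), and that, like the robots.txt standard, it treats everything after # as a comment, so a line such as Disallow: /#comment is read as Disallow: / and blocks the whole site.

```python
from urllib.robotparser import RobotFileParser

# The feed rules from the post; yoursite.com below is a placeholder domain.
FEED_RULES = """\
User-agent: *
Disallow: /feed/
Disallow: /feed/atom/
Disallow: /feed/rss/
Disallow: /wp-
"""


def is_blocked(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if robots_txt forbids agent from crawling url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://www.yoursite.com/feed/",
                "http://www.yoursite.com/wp-admin/",
                "http://www.yoursite.com/about/"):
        print(url, "blocked" if is_blocked(FEED_RULES, url) else "allowed")
```

Running it should report the feed and /wp-admin/ URLs as blocked and the normal /about/ page as allowed; if a page you care about shows up as blocked, fix the rules before uploading them.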

You can also try the following rules, placed under the same User-agent line, to block spiders from crawling the feeds and trackbacks of every individual post:

Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$

Read more about duplicate content and feeds at The Van Blog – be sure to check the comments, too.

Note: the wildcard rules above seem to work only for Googlebot and other bots that support the * and $ wildcards.
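Since Python’s robotparser cannot check the wildcard rules, here is a minimal sketch of how a crawler that supports them might match a rule against a URL path, following the semantics Google documents: * matches any sequence of characters and a trailing $ anchors the end of the path. The function names are hypothetical, and real crawlers also resolve conflicts between competing Allow and Disallow rules, which this sketch ignores.

```python
import re


def rule_to_regex(rule_path: str):
    """Translate a robots.txt path rule with * and $ into a compiled regex."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]  # drop the $ before escaping
    # Escape each literal piece and join the pieces with ".*" for every "*".
    body = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    # \Z anchors the very end of the path; without $, the rule is a prefix.
    return re.compile(body + (r"\Z" if anchored else ""))


def rule_matches(rule_path: str, url_path: str) -> bool:
    """True if the rule applies to url_path (re.match anchors at the start)."""
    return rule_to_regex(rule_path).match(url_path) is not None
```

For example, rule_matches("/*/feed/$", "/my-post/feed/") is true, while rule_matches("/*/feed/$", "/my-post/feed/atom/") is false because of the $ anchor. Note that /*/feed/$ does not match the top-level /feed/ either, which is why the first set of rules is still needed.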

Let me know if you have a better solution for blocking duplicate content with robots.txt.

How to avoid duplicate content

When developing your website, keep a few things in mind to keep your content unique (both for people and for search engines):

  • it is better to have fewer pages of higher quality, because they interest more people, attract links to the same place and drive more visitors from the search engines
  • have clean URLs: no dynamic parameters, fewer folders (whether your URLs contain words doesn’t matter much, though it helps a tiny, weeny bit)
  • always write unique content and don’t copy it from others: copied content gives you no credibility, won’t drive visitors to your site and, without a credit link, is theft of copyrighted material
  • when you link to a page, pick one URL for it and use that URL wherever you can (for example, always link to domain.com/ rather than sometimes to domain.com or domain.com/index.php)
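The last point is easier to keep if every internal link goes through one normalizing helper. Here is a minimal Python sketch; the name canonical_url is hypothetical, and it assumes you prefer the www host, drop default index files such as index.php, ignore any port and keep query strings, since those may select different content.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical choice of default index files to strip from paths.
INDEX_FILES = ("index.php", "index.html", "index.htm")


def canonical_url(url: str) -> str:
    """Map the many spellings of a page's URL onto one canonical form."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # lowercased by urlsplit; any port is dropped
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path or "/"
    for index in INDEX_FILES:
        if path.endswith("/" + index):
            path = path[: -len(index)]  # keep the trailing slash
            break
    # Keep the query string, drop the fragment (it never reaches the server).
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))
```

With this helper, domain.com, domain.com/ and domain.com/index.php all come out as http://www.domain.com/ when written with the http scheme.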

Read more about getting rid of duplicate content on Google’s official blog.

By keeping your site’s content unique, you will get more visitors from the search engines, because all your incoming link weight is associated with a single page, which makes it more visible. Correct indexing can also help get your real content pages out of the supplemental index. That being said, don’t obsess over supplemental results. Focus on your customers.

Conclusion

By knowing how to keep your site’s content unique, you can easily increase your chances of getting more visitors from the search engines. The process is fairly easy and, once you know what to look for, can be done within a day. So, still not looking at your site? Here’s what to check:

  • whether your site has feeds
  • whether your site has pages with the same content but different URLs
  • whether visiting the non-www version takes you to the www version (or vice versa); if not, check the .htaccess code above
  • your .htaccess file (in the root folder)
  • your robots.txt file (also in the root folder)
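The second check can be partly automated. Here is a minimal Python sketch, with a hypothetical find_duplicates function, that groups URLs whose page text is identical after collapsing whitespace. It assumes you have already fetched the pages into a dict mapping each URL to its text, and it only catches exact duplicates; pages that differ by a date or a sidebar would need a fuzzier comparison.

```python
import hashlib
import re
from collections import defaultdict


def find_duplicates(pages: dict) -> list:
    """Group URLs whose page text is identical after collapsing whitespace.

    pages maps URL -> page text. Returns sorted groups of URLs that share
    the same content; pages with unique content are left out.
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        normalized = re.sub(r"\s+", " ", text).strip()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return sorted(sorted(urls) for urls in groups.values() if len(urls) > 1)
```

If the www and non-www versions of your home page land in the same group, the redirect from the first section is not in place yet.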

Enjoy, and may Google be with you.
