Most of us understand how damaging duplicate content can be to a search marketing campaign, and because of that, most search marketers do what they can to avoid the penalties, which can be pretty severe.
There have been accidental cases involving session IDs and other dynamic-content issues, but for the most part, duplicate content is easy to avoid. That is, until someone figured out a hole in the MSN Live Search algorithm. A post at an online marketing blog with a peculiar name, BoogyBonBon.com, revealed an algorithm anomaly that could wreak havoc with MSN's search index, not to mention with the webmasters and site owners who fall victim to this exploit.
Essentially, the hole works by manipulating a site's URL via its query string, which PHP exposes through the $_GET superglobal. By appending a parameter after the "?" (for example, ?test=dupecontent) and saving these additional URLs into an .html file that Live's search bot will crawl, you can make it look like a competitor's site has duplicate content issues.
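To make the mechanics concrete, here is a minimal sketch in PHP (the same language as the fix below) of how such URL variants could be generated. The function name and the ?test=dupecontent parameter are illustrative; any meaningless parameter would serve the same purpose:

```php
<?php
// Build a list of URLs that all resolve to the same page on the target
// site, differing only in a meaningless query-string parameter.
function build_dupe_urls(string $base, int $count): array {
    $urls = [];
    for ($i = 1; $i <= $count; $i++) {
        // e.g. http://example.com/page.html?test=dupecontent1
        $urls[] = $base . '?test=dupecontent' . $i;
    }
    return $urls;
}

// Every one of these serves content identical to /page.html. Dropping
// them into a crawlable .html file is then just one <a href="..."> per URL.
$urls = build_dupe_urls('http://example.com/page.html', 3);
?>
```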
There is one additional step to take before you can drop this depth charge: run the manipulated URLs through an HTTP status code checker to confirm they return a 200 OK response rather than an error or a redirect.
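That verification can also be sketched in a few lines of PHP using the built-in get_headers() function, whose first array element is the response's status line. The helper below is a hypothetical illustration, not part of the original exploit write-up:

```php
<?php
// Extract the numeric status code from an HTTP status line,
// e.g. "HTTP/1.1 200 OK" -> 200. Returns 0 if the line doesn't parse.
function status_code(string $statusLine): int {
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return 0;
}

// get_headers() fetches the response headers for a URL; index 0 is the
// status line. If the bogus-parameter URL answers 200 rather than a
// redirect or 404, the page is susceptible to the trick.
// $headers = get_headers('http://example.com/page.html?test=dupecontent');
// $vulnerable = (status_code($headers[0]) === 200);
?>
```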
Under normal circumstances, these extra URLs should be treated as merely additional addresses for the same page, not as duplicate content. However, there is an apparent issue with MSN's anti-spam algorithm, which the post elaborates on:
Once MSN has managed to find the new URLs it will start to index the site's content. Unfortunately for your competitor, MSN's anti-spam algo is so bad that it does not have the brains to simply not count the new URLs because of dupe content, but instead just removes the page or entire website from its index.
Now, I haven't tested this theory, and I'm not about to. But if this is indeed accurate, and judging by the responses at Threadwatch (which pointed this out) there appears to be some merit to it, the Live.com developers need to address it quickly before it gets severely exploited.
Until Live.com's algorithm is refined to ignore these attempts, there is a developer-side fix available, but your pages have to be served by PHP for it to work. BoogyBonBon has more:
At the top of every page that has been attacked, you will need to add the following code, changing the yourdomain.com part to the page all users should be at.
<?php
// If any query-string parameter is present, this is not the canonical
// URL: send the crawler a permanent redirect to the real page.
if ($_GET) {
    ignore_user_abort(true);
    header("Pragma: no-cache");
    header("Cache-Control: no-store, no-cache, must-revalidate");
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: http://www.yourdomain.com/");
    header("Connection: close");
    exit;
}
?>