Keeping track of your web server stats is an important responsibility for webmasters and SEOs alike. The stats provide valuable marketing information and show which search engine crawlers have hit your site and which haven’t. One thing they don’t show is whether a visiting bot is legitimate.
Malicious programmers have been known to disguise the true intentions of their bots, often making them appear friendly. Because of this, the Live.com (MSN Search) blog posted a guide to help webmasters identify which visitors are actual MSN Search bots. The first part of the post lists the bot names used by MSN and what each is responsible for:
MSNBot – Main web crawler
MSNBot-Media – Images & all other media
MSNBot-NewsBlogs – News and blogs
MSNBot-Products – Products & shopping
MSNBot-Academic – Academic search
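As a rough illustration of how the names above might be used when reading a log, the following Python sketch maps a User-Agent string to the MSN bot it claims to be. The function name `claimed_msn_bot` and the sample User-Agent strings are assumptions made for this example, not anything published in the blog post. A match only tells you what the visitor claims to be, since a User-Agent header is trivial to fake.

```python
# Bot names and roles as listed in the Live Search blog post.
MSN_BOTS = {
    "msnbot": "Main web crawler",
    "msnbot-media": "Images and all other media",
    "msnbot-newsblogs": "News and blogs",
    "msnbot-products": "Products and shopping",
    "msnbot-academic": "Academic search",
}


def claimed_msn_bot(user_agent):
    """Return (bot_name, role) if the User-Agent names an MSN bot, else None.

    Hypothetical helper: it reports the claimed identity only and does
    not prove that the request came from MSN.
    """
    ua = user_agent.lower()
    # Check longer names first so "msnbot-media" is not reported as "msnbot".
    for name in sorted(MSN_BOTS, key=len, reverse=True):
        if name in ua:
            return name, MSN_BOTS[name]
    return None
```

For example, calling `claimed_msn_bot` on a User-Agent that begins with `msnbot-media/` would return the media crawler entry, while an ordinary browser string would return `None`.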
The blog entry not only supplies the MSN bot names, it also provides an additional method of identification using the IP address and a reverse DNS lookup. Once you have resolved the unknown bot’s IP address to a host name, you can tell whether or not MSN is responsible for it. The blog entry explains:
Once you have the host name (in this case, livebot-207-46-98-149.search.live.com), you can check that it really is coming from Live Search. The name of all live search crawlers will end with ‘search.live.com’. If the name doesn’t end with ‘search.live.com’, you know it’s not really our crawler.
Finally, you need to verify that the name is accurate. In order to do this, you can use Forward DNS to see the IP address associated with the host name. This should match the IP address you used in Step 2 – if it doesn’t, it means the name was fake.
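The check described in the quoted steps can be sketched in Python with the standard `socket` module. This is a minimal sketch, not the blog's own code: the function name `is_live_search_crawler` and the optional `reverse_lookup` and `forward_lookup` parameters are assumptions introduced here so the logic can be exercised without touching real DNS. It also assumes IPv4 addresses and matches on `.search.live.com` with a leading dot, which is slightly stricter than the wording in the post.

```python
import socket

# Suffix every genuine Live Search crawler host name should end with,
# per the blog post. The leading dot is an assumption of this sketch.
LIVE_SEARCH_SUFFIX = ".search.live.com"


def is_live_search_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip passes the reverse-then-forward DNS check."""
    if reverse_lookup is None:
        # Reverse DNS: IP address -> primary host name.
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # Forward DNS: host name -> list of IPv4 addresses.
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No reverse record, so the visitor cannot be verified.
        return False

    # The host name must belong to search.live.com.
    if not host.lower().rstrip(".").endswith(LIVE_SEARCH_SUFFIX):
        return False

    try:
        addresses = forward_lookup(host)
    except OSError:
        return False

    # The forward lookup must lead back to the original IP address;
    # otherwise the reverse record was faked.
    return ip in addresses
```

To check a visitor from your logs, call `is_live_search_crawler` with the IP address as a string and leave the lookup parameters at their defaults; a `False` result means the visitor either has no reverse record, resolves to a host outside search.live.com, or fails the forward match.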
Once you compile a list of the fake bots hammering your site, you can use your site’s robots.txt file to try to keep them out.