Tuesday, November 5, 2024

Craigslist Stomps Out Spiders

A tweak to the craigslist.org robots.txt file now bans all spidering, including that done by Google or Yahoo.

The popular online classifieds site Craigslist has apparently taken a harder line against having its content used by other sites. A thread on Search Engine Roundtable’s boards showed a copy of the robots.txt file for craigslist.org, where by default all user-agents have been restricted from delving into parts of the site.

Craigslist founder Craig Newmark hasn’t given a reason for the change on his personal blog. One poster on the SER board, called ‘gemini’, suggested it could be meant to fight a particular type of abuse:

I can guess – people started using Craiglist like PRWeb – getting free links form an authority site. This won’t stop the real customers, but surely scare away link hunters.

With Craigslist being well-known and covered in the press internationally, they may not need the traffic search engines can bring. This isn’t the first time the site has banned outsiders from indexing its content. A newer classifieds site, Oodle.com, was told by Craigslist to quit pulling its listings into Oodle.

UPDATE! – It turns out that Craigslist isn’t blocking crawlers, spiders, search engines, etc. The robots.txt file is only blocking spiders from the sectional headers of Craigslist, and makes the site more effectively crawlable by the engines.
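
To illustrate how a robots.txt file can keep all user-agents out of some sections while leaving the rest of a site crawlable, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, paths, and example.org URLs below are hypothetical and chosen only for illustration; they are not the actual contents of the craigslist.org file, which this article does not reproduce.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real craigslist.org file.
# "User-agent: *" applies the rule to every crawler without its own entry.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /section-headers/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Both URLs are made up; the first falls under the disallowed prefix.
    for url in (
        "https://example.org/section-headers/for-sale",
        "https://example.org/listings/12345.html",
    ):
        allowed = can_crawl(HYPOTHETICAL_ROBOTS_TXT, "Googlebot", url)
        print(url, "allowed" if allowed else "blocked")
```

Under these assumed rules, a crawler is turned away only from URLs beneath the disallowed prefix, which matches the situation described in the update: some sections are off limits while individual pages remain open to the engines.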


David Utter is a staff writer for Murdok covering technology and business.
