Saturday, December 14, 2024

Did Google Unleash Additional Googlebots?


Apparently, Google has begun using another spider in its scanning and indexing of web sites. News of a second Googlebot came from a number of site owners who, while studying their site logs, noticed two Google spiders, with different IP address ranges, visiting and scanning their respective sites.


News of the additional Googlebot first surfaced on the DigitalPoint forums, in a post by the member digitalpoint himself. He noticed that two Googlebots had visited his site, each from a different IP address:

“The normal one:

66.249.64.47 - - [15/Sep/2004:18:59:12 -0700] "GET /robots.txt HTTP/1.0" 404 1227 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

and also this one:

66.249.66.129 - - [15/Sep/2004:18:12:51 -0700] "GET / HTTP/1.1" 200 38358 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Aside from the slightly different user agent, it’s also using HTTP 1.1. The IP address it uses is in a block normally reserved for Mediapartners (the AdSense spider), but it’s spidering a site without any AdSense.”
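The two log lines above are standard Apache combined-format entries, and the bots can be told apart by their user-agent strings alone. As a rough sketch (the regex and function names here are illustrative, not anything from the forum posts), a site owner could classify each hit like this:

```python
import re

# Matches Apache combined-log lines like the two samples quoted above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def classify_googlebot(line):
    """Return (bot_kind, ip) for a combined-format log line.

    bot_kind is 'classic' for the original Googlebot/2.1 user agent,
    'mozilla' for the newer Mozilla/5.0-compatible variant, or None.
    """
    m = LOG_RE.match(line)
    if not m:
        return None, None
    agent, ip = m.group("agent"), m.group("ip")
    if agent.startswith("Mozilla/5.0 (compatible; Googlebot"):
        return "mozilla", ip   # the newer HTTP/1.1 crawler
    if agent.startswith("Googlebot"):
        return "classic", ip   # the original Googlebot/2.1
    return None, ip

classic = ('66.249.64.47 - - [15/Sep/2004:18:59:12 -0700] '
           '"GET /robots.txt HTTP/1.0" 404 1227 "-" '
           '"Googlebot/2.1 (+http://www.google.com/bot.html)"')
newer = ('66.249.66.129 - - [15/Sep/2004:18:12:51 -0700] '
         '"GET / HTTP/1.1" 200 38358 "-" '
         '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
print(classify_googlebot(classic))  # ('classic', '66.249.64.47')
print(classify_googlebot(newer))    # ('mozilla', '66.249.66.129')
```

Note that user agents can be spoofed, so in practice IP range (or a reverse DNS lookup) is the more reliable signal.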

Once this thread was launched, scores of other posters shared their encounters with the second Googlebot. A DigitalPoint member named Redleg noticed several visits from the new spider and recorded the IP ranges of the new visitors: “Don’t remember the exact IP addresses (about 15-20 of them) but here’s the IP ranges: 66.249.78.* 66.249.64.* 66.249.79.*”
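Tallying ranges the way Redleg did by hand is easy to automate. A minimal sketch (the sample IPs below are made up for illustration) groups hits by their first three octets:

```python
from collections import Counter

def range_counts(ips):
    """Count hits per wildcard range (first three octets, e.g. '66.249.78.*')."""
    return Counter(".".join(ip.split(".")[:3]) + ".*" for ip in ips)

# Hypothetical sample of visiting IPs pulled from a server log.
hits = ["66.249.78.5", "66.249.64.47", "66.249.78.12", "66.249.79.3"]
for rng, n in range_counts(hits).most_common():
    print(rng, n)
```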

Many who checked their server logs noticed a number of visits from both Googlebots, with various IP ranges. Not only were there numerous visits, but also each bot performed a different kind of crawl than its “partner”. Over at the WebmasterWorld forums, a poster named Gomer noticed that one bot performed a complete site crawl while the other did more of a surface-type crawl. According to Gomer:

“The 66.249.64.X series was requesting pages that were fully indexed i.e., they have a page title and description. The 66.249.65.X series was requesting pages that were only partially indexed. In my case, the 66.249.65.X were pages that exist on my server but I am trying to get Googlebot to stop indexing.”

As the realization of an additional Googlebot set in, speculation began concerning the motive for having two bots perform site scans. Because Google keeps the business of its search index, spiders, and anything else having to do with its search engine under tight wraps, educated guesses are all anyone can offer.

Brett Tabke posted an interesting thought concerning Google’s extensive crawling: “looks like “panic” based spidering as if an index needs to be rebuilt from the ground up in a short time period (aka: the old index didn’t work).” Another member believed these scans are part of the PR re-calculation for the next PageRank update. Another poster, idoc, offered an equally intriguing take on Google’s actions:

“I expect a lot of cloaking and redirect sites will be dropped soon from these new bot IPs and this crawl. It’s what I had in mind in the post about hijacks when I said I think Google is on it. They have been asking for file paths and filenames with extensions I have never used before. I am hopeful anyway.”

Longtime WMW poster claus suggested that these events might mean Google is preparing a new datacenter, while others thought the index might contain a glitch. Liane, however, agreed with Brett that these deep crawls were out of the ordinary. She stated, “Something must be causing this feeding frenzy and it wouldn’t surprise me if there was a glitch with the index. Google went nuts every day this past week on my site, but in the last 24 hours only one hit. Never had that before. Not that I can remember anyway. I smell a “major” update in the offing… once they get things sorted.”

As it stands, the reasons behind Google’s scanning efforts are unknown. The only certainties are that Google is using more than one crawler and that at least one of them performs a complete site scan. Is Google repopulating its index, or is it hunting out cloaked/doorway pages? Or is it finally getting around to doing another PR update? As so many others have said, time will tell.

Chris Richardson is a search engine writer and editor for Murdok. Visit Murdok for the latest search news.
