Friday, October 18, 2024

Quoth The Googlebot, “304!”

The latest news from Google Webmaster Central disclosed how the search engine’s spider saves a webmaster from unnecessary bandwidth costs.

Vanessa Fox discussed more details about Googlebot and how it retrieves, or does not retrieve, a page from a web server.

If a server answers Googlebot’s ‘If-Modified-Since’ request for a page with a ‘304 Not Modified’ response, the page will not be downloaded.

“This reduces the bandwidth consumed on your web server,” Fox wrote.
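The exchange Fox describes can be sketched from the server's side. This is a minimal illustration of a conditional GET, not Googlebot's or any particular web server's actual code; the function name `respond` and the constant `PAGE_BODY` are hypothetical, chosen only for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical page content, used only for this illustration.
PAGE_BODY = b"<html><body>Hello, crawler.</body></html>"


def respond(last_modified, if_modified_since=None):
    """Return (status, body) for a GET that may carry If-Modified-Since.

    last_modified: timezone-aware datetime of the page's last change.
    if_modified_since: raw header value sent by the client, or None.
    """
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall back to a full response
        if since is not None:
            if since.tzinfo is None:
                # Treat a date without a usable zone as UTC.
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds.
            if last_modified.replace(microsecond=0) <= since:
                return 304, b""  # Not Modified: no body is sent
    return 200, PAGE_BODY


# First visit: no conditional header, so the full page is downloaded.
changed = datetime(2006, 4, 12, 20, 2, 6, tzinfo=timezone.utc)
first_status, first_body = respond(changed)

# Later visit: the crawler sends back the date it last saw.
header = format_datetime(changed, usegmt=True)
later_status, later_body = respond(changed, header)
```

On the later visit the server sends only a status line and headers, which is where the bandwidth saving comes from.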

Until now, Google’s cache: operator showed the date when a page was last retrieved by Google.

Google has now changed this information to assist webmasters:

This meant that even if we visited a page very recently, the cache date might be quite a bit older if the page hadn’t changed since the previous visit. This made it difficult for webmasters to use the cache date we display to determine Googlebot’s most recent visit. Consider the following example:

•  Googlebot crawls a page on April 12, 2006.

•  Our cached version of that page notes that “This is G o o g l e’s cache of http://www.example.com/ as retrieved on April 12, 2006 20:02:06 GMT.”

•  Periodically, Googlebot checks to see if that page has changed, and each time, receives a Not-Modified response. For instance, on August 27, 2006, Googlebot checks the page, receives a Not-Modified response, and therefore, doesn’t download the contents of the page.

•  On August 28, 2006, our cached version of the page still shows the April 12, 2006 date — the date we last downloaded the page’s contents, even though Googlebot last visited the day before.

We’ve recently changed the date we show for the cached page to reflect when Googlebot last accessed it (whether the page had changed or not). This should make it easier for you to determine the most recent date Googlebot visited the page.

As Google reindexes pages, the date shown in the Google cache will switch from the last-retrieved date to the last-accessed date.
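The difference between the two dates can be sketched with the example from Google's post. This is only an illustration of the bookkeeping involved; the class `CacheEntry` and its field names are hypothetical and say nothing about how Google actually stores cache metadata.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CacheEntry:
    # Date the page contents were last downloaded (the old cache date).
    last_retrieved: Optional[date] = None
    # Date the crawler last checked the page (the new cache date).
    last_accessed: Optional[date] = None

    def record_visit(self, when: date, status: int) -> None:
        """Update both dates after a crawler visit with the given HTTP status."""
        self.last_accessed = when
        if status == 200:
            # Only a full download refreshes the retrieved date;
            # a 304 Not Modified leaves it unchanged.
            self.last_retrieved = when


entry = CacheEntry()
entry.record_visit(date(2006, 4, 12), 200)  # page downloaded
entry.record_visit(date(2006, 8, 27), 304)  # checked, not modified

print("last retrieved:", entry.last_retrieved)
print("last accessed: ", entry.last_accessed)
```

Under the old behavior the cache would display the April 12 date; under the new behavior it displays August 27.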

This will take some time to accomplish considering the billions of pages Google maintains in its Bigtable index.



David Utter is a staff writer for murdok covering technology and business.
