With all the back and forth going on about search index sizes, it would be nice to be able to test the claims made by the various engines. Well, there may be a way to check whether an engine's purported index size is accurate.
Danny Sullivan, writing on the SEW (Search Engine Watch) blog, has shown a nifty little method: a type of search query that, done correctly, should provide a fairly accurate index count. To begin the test, you first run a search that returns no results. For demonstration purposes, Danny used a query consisting of the letter string djfdkjkfjkdjdfk.
When the djfdkjkfjkdjdfk string is searched, none of Google, Yahoo or MSN returns any results. To test the size of a search index, you then run a search that returns the opposite set of results. Because the djfdkjkfjkdjdfk query returns nothing (meaning the index holds no documents matching it), a search asking for the opposite (every page that does not contain djfdkjkfjkdjdfk) should return every page in the index.
As Danny indicates, the way to run such a search is to repeat the djfdkjkfjkdjdfk query with a minus sign in front of the letters (-djfdkjkfjkdjdfk). The minus sign tells the engine to show all pages that do not contain the string. Because the first query returned zero results, the minus query should return everything, and it does, but only in MSN and Google. According to Danny, the technique did not work in Yahoo or Ask.
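The query pair can be sketched in a few lines of Python. This is a minimal illustration only, assuming Google's usual `https://www.google.com/search?q=` results URL; the helper names `query_pair`, `search_url` and `implied_index_size` are hypothetical, and nothing here fetches pages or reads result counts. The last function just restates the reasoning above: pages that match the string plus pages that exclude it should cover the whole index, so when the match count is zero, the negated count is the index size.

```python
from urllib.parse import urlencode

# The nonsense string from Danny Sullivan's demonstration.
NONSENSE = "djfdkjkfjkdjdfk"


def query_pair(term: str) -> tuple[str, str]:
    """Hypothetical helper: return the (positive, negated) queries for the test."""
    # A leading minus sign asks the engine to exclude pages containing the term.
    return term, "-" + term


def search_url(query: str, base: str = "https://www.google.com/search") -> str:
    """Hypothetical helper: build a results-page URL (base URL and 'q' are assumptions)."""
    return base + "?" + urlencode({"q": query})


def implied_index_size(negated_count: int, positive_count: int) -> int:
    """Matching pages plus non-matching pages should, in principle, equal the index."""
    return negated_count + positive_count


positive, negated = query_pair(NONSENSE)
print(search_url(positive))  # https://www.google.com/search?q=djfdkjkfjkdjdfk
print(search_url(negated))   # https://www.google.com/search?q=-djfdkjkfjkdjdfk
```

With MSN's figures, for example, `implied_index_size(5_304_186_736, 0)` gives back 5,304,186,736, because the positive query matched nothing.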
Because it works with MSN and Google, however, queries like these give what seems to be an accurate read on each engine's index. For Google, the negative search returns 9,570,000,000 pages, a hefty amount more than the 8,168,684,336 Google claims on its homepage. MSN's negative query returns 5,304,186,736, which is in line with the 5 billion MSN reports.
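The gap between each negated-query count and the engine's own figure is simple arithmetic. The short sketch below (the `index_gap` helper is hypothetical) uses only the numbers quoted above. It puts Google's excess at roughly 1.4 billion pages, about 17% over its homepage claim, and MSN's at about 6% over a flat 5 billion.

```python
def index_gap(observed: int, claimed: int) -> tuple[int, float]:
    """Hypothetical helper: how far an observed count exceeds a claimed one (pages, percent)."""
    diff = observed - claimed
    return diff, 100.0 * diff / claimed


# Counts reported in this article: negated-query result vs. the engine's own claim.
for engine, observed, claimed in [
    ("Google", 9_570_000_000, 8_168_684_336),
    ("MSN", 5_304_186_736, 5_000_000_000),  # MSN's claim is the round "5 billion"
]:
    pages, pct = index_gap(observed, claimed)
    print(f"{engine}: +{pages:,} pages ({pct:.1f}%)")
# Google: +1,401,315,664 pages (17.2%)
# MSN: +304,186,736 pages (6.1%)
```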
It's unfortunate that this method won't work on Yahoo; it would be nice to see how accurate its claimed 20 billion-plus count is. Danny also notes some issues with this method in his post:
Of course, even if all the search engines make this technique work, it doesn’t necessarily mean we’ve got apples-to-apples comparisons. What depths are the pages indexed to? How well are duplicates removed? Are these pages actually indexed or just links to pages you know about? Those are just some of the issues.
Chris Richardson is a search engine writer and editor for Murdok. Visit Murdok for the latest search news.