Retaining search data helps Google improve its search results, despite privacy concerns that the retained data could end up in the hands of the government via subpoena.
Google tweaks its algorithm to improve the answers it delivers in response to search queries. Though webmasters obsess about PageRank, more than 200 “signals” determine how relevant the algorithm considers a particular website for a given search.
The search ad company’s chief economist, Hal Varian, returned to the official Google blog to talk about data. Varian said Google tweaks the algorithm on a weekly basis to improve the results that people receive.
He then launched into a long justification for Google keeping all the data it receives:
“But in order to come up with new ranking techniques and evaluate if users find them useful, we have to store and analyze search logs. (Watch our videos to see exactly what data we store in our logs.)
“What results do people click on? How does their behavior change when we change aspects of our algorithm? Using data in the logs, we can compare how well we’re doing now at finding useful information for you to how we did a year ago. If we don’t keep a history, we have no good way to evaluate our progress and make improvements.”
Varian’s post arrives close to the European Commission’s decision on whether Google’s DoubleClick acquisition should be permitted. In America, though the Federal Trade Commission has already approved the deal, the FTC also recommended limited retention of customer data with regard to online behavioral targeting.
The FTC’s unspoken message of “respect privacy or we’ll make you do it” holds out the possibility of federal regulation in response to privacy concerns. Like any business, Google chafes at the prospect of extra laws hindering its use of the data it receives. Varian’s post may be an argument against such additional oversight.