Saturday, December 14, 2024

Google Co-Op Topics – Annotating Web Content


Among other announcements in May of 2006, Google introduced Google Co-op.

This article follows an earlier one, “Google Co-Op Overview”, which gave a high-level overview of Google Co-op. It covers one component of Google Co-Op, Topics, in more detail than the earlier article did.

Google Co-Op matters to users for two reasons. First, it lets users contribute information that helps Google improve search results for everyone. Second, it lets an end user customize their own search experience so that more relevant and trusted information appears at the top of their search results. Users do this by subscribing to “trusted” sources of information; content from those sources then appears at the top of the user’s results for relevant searches.

Google Co-Op is a beta-test service now being offered by Google. Anyone with a Google account may participate. While still in its infancy, Google Co-Op represents Google’s efforts to embrace social web and social search concepts in a major way to help improve Google search results. Google Co-op consists of two things:

1. Topics, which are simply a means of labeling web content
2. Subscribed links, which are a means for users to subscribe to a particular web site’s content

Topics can further be sub-divided into two things:

1. The ability to create an entire categorization or labeling scheme
2. The ability to simply provide labels for web content, which Google calls annotations

The remainder of this article focuses on the annotations aspect of Google Co-Op Topics.

Annotations to URLs

Annotating URLs is perhaps the easiest part of Google Co-Op to understand. It also requires the least technical expertise to implement. A “topic” is simply Google’s way of saying “area of interest”. Topics are a labeling or categorization scheme: they give users a way to provide labels (also called tags or categories) for information on the web, which is represented by URLs. Labels may be provided for an entire web site, portions of a web site, or even a specific web page. These labels indicate the topic or topics of a given web site or page. In essence, they provide additional information about what the web site covers.

Anyone with a Google account can label web sites. Google refers to the process of providing labels for web sites as “Annotating URLs”. An annotation is simply the association of a label, or multiple labels, with a URL. For example, a travel site might get the label “destination_guide”.

Users may use labels for topics that Google already has under development, which include: health, destination guides, autos, computer & video games, photo & video equipment, and stereo & home theater. Users may also develop their own labels for topics. For example, a user with an interest in wine may develop labels for that topic, such as “wine_regions” and “wine_types”, and then use these labels to annotate sites that deal with wine.

An end user may submit their annotations to Google in one of two formats: 1) in a tab-delimited format (which can be created using Microsoft Excel or any spreadsheet); or 2) in an XML file. Perhaps the easiest format for most users to deal with is simply to create a spreadsheet where the first column contains a URL or URL pattern, and the subsequent columns contain labels, one label to a column. Further information that may be associated with a URL in subsequent columns includes:

  • Score – a ranking of relevance from 0 to 1 (0 to 100%)
  • Comment
  • Attributes – user-defined attributes, which may only be included in the tab-delimited file format

Annotation Examples

URL                           Label        Label    Label     Score  Comment
http://www.travelsite.com/*   sightseeing  museums  shopping  1      Detailed destination information
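As a minimal sketch, a row like the one above could also be produced with a short Python script instead of a spreadsheet. The column order (URL pattern, labels, score, comment) follows the description in this article and has not been checked against Google's own specification; the helper name to_tab_delimited is purely illustrative.

```python
import csv
import io

# Hypothetical annotation rows, mirroring the travel-site example above.
# Column order (URL pattern, labels, score, comment) is an assumption
# taken from this article's description of the format.
rows = [
    ["http://www.travelsite.com/*", "sightseeing", "museums", "shopping",
     "1", "Detailed destination information"],
]


def to_tab_delimited(rows):
    """Return the rows as tab-delimited text, one annotation per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


tsv_text = to_tab_delimited(rows)
print(tsv_text, end="")
```

The resulting text can be saved to a file and opened in any spreadsheet program for review before it is submitted.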

If I were using an XML file to annotate the same travel-related web site, it might look something like this:

<Annotations>
  <file>travelsite-annotations.xml</file>

  <Annotation>
    <about>http://www.travelsite.com/*</about>

    <Label>
      <name>sightseeing</name>
      <score>1</score>
      <Comment>Detailed destination information</Comment>
    </Label>

    <Label>
      <name>museums</name>
      <score>1</score>
      <Comment>Detailed destination information</Comment>
    </Label>

    <Label>
      <name>shopping</name>
      <score>1</score>
      <Comment>Detailed destination information</Comment>
    </Label>

  </Annotation>

</Annotations>
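For sites with many labels, writing this XML by hand gets repetitive. The sketch below builds the same structure with Python's standard xml.etree.ElementTree module. The element names are copied from the example above, not from Google's documentation, so treat the layout as an assumption; build_annotations is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def build_annotations(file_name, url_pattern, labels, score, comment):
    """Build an <Annotations> tree shaped like the example in this article.

    The element names mirror the article's sample and are an assumption,
    not a verified schema.
    """
    root = ET.Element("Annotations")
    ET.SubElement(root, "file").text = file_name
    annotation = ET.SubElement(root, "Annotation")
    ET.SubElement(annotation, "about").text = url_pattern
    # One <Label> block per label, each carrying the same score and comment.
    for label_name in labels:
        label = ET.SubElement(annotation, "Label")
        ET.SubElement(label, "name").text = label_name
        ET.SubElement(label, "score").text = str(score)
        ET.SubElement(label, "Comment").text = comment
    return root


root = build_annotations(
    "travelsite-annotations.xml",
    "http://www.travelsite.com/*",
    ["sightseeing", "museums", "shopping"],
    1,
    "Detailed destination information",
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Generating the file this way keeps the score and comment consistent across labels and avoids mismatched tags.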

Conventions for Labels

There are some simple conventions that should be followed when labeling content. First, it is important to understand that labels may be applied to exact URLs or to wildcard URLs. Using wildcards makes it much easier to label a lot of content with a few statements. For example:

  • Labels applied to www.mywebsite.com/ would only apply to that specific page of the web site
  • Labels applied to www.mywebsite.com/* would apply to all URLs that start with the URL “www.mywebsite.com”
  • Labels applied to www.mywebsite.com/*tips would apply to all URLs that start with the URL “www.mywebsite.com” and contain the word “tips”
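To make the three cases above concrete, here is a minimal Python sketch of how such patterns could be interpreted. It is only one reading of the bullets above, not Google's actual matching logic: a pattern without “*” must match the URL exactly, and a pattern with “*” requires the text before the wildcard to be a prefix of the URL and any text after it to appear later in the URL. The function name matches is hypothetical.

```python
def matches(pattern, url):
    """Return True if url fits pattern, reading '*' as 'any run of characters'.

    Assumed rules, taken from the three bullets above:
      - no '*': the URL must equal the pattern exactly
      - with '*': the text before the first '*' must be a prefix of the URL,
        and each later piece must appear somewhere after it, in order
    """
    if "*" not in pattern:
        return url == pattern
    pieces = pattern.split("*")
    if not url.startswith(pieces[0]):
        return False
    position = len(pieces[0])
    for piece in pieces[1:]:
        found = url.find(piece, position)
        if found == -1:
            return False
        position = found + len(piece)
    return True


# Exact URL: only that page matches.
print(matches("www.mywebsite.com/", "www.mywebsite.com/"))
# Trailing wildcard: every URL under the site matches.
print(matches("www.mywebsite.com/*", "www.mywebsite.com/about.html"))
# Wildcard followed by text: the URL must also contain "tips".
print(matches("www.mywebsite.com/*tips", "www.mywebsite.com/travel/tips.html"))
```

Checking a handful of real URLs against each pattern this way, before submitting annotations, helps catch patterns that are broader or narrower than intended.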

A single URL may have multiple labels. If using a tab-delimited file, each label must appear in its own column.

Labels should be all lower case, with punctuation and conjunctions (and, or) removed and the remaining words joined by underscores. For example, “hardware and software” would become “hardware_software”.
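The convention above is easy to apply automatically. The following Python sketch lowercases a phrase, strips punctuation, drops the conjunctions “and” and “or”, and joins what is left with underscores, as in the “hardware_software” example. The function name normalize_label is hypothetical, and joining with underscores is inferred from the examples in this article.

```python
import string

# Conjunctions the article says to remove from labels.
CONJUNCTIONS = {"and", "or"}


def normalize_label(text):
    """Turn a free-form phrase into a label following the article's convention."""
    lowered = text.lower()
    # Replace punctuation with spaces so "photo & video" splits cleanly.
    cleaned = "".join(" " if ch in string.punctuation else ch for ch in lowered)
    words = [word for word in cleaned.split() if word not in CONJUNCTIONS]
    # Joining with underscores is an assumption based on the article's examples.
    return "_".join(words)


print(normalize_label("hardware and software"))
print(normalize_label("Photo & Video Equipment"))
```

Because underscores count as punctuation here, running the function on an existing label such as “wine_regions” returns it unchanged.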

Labels should be as short as possible and as unambiguous as possible. Watch out for words that can mean multiple things.

Additional Information

There are several good places to find additional information. The first is the Google Co-Op site (http://www.google.com/coop), where Google has posted a Topics Developers Guide. The Google Co-Op FAQ is also helpful. There is also a good article entitled “How to Use Google Co-op” at Google Blogoscoped (http://blog.outer-court.com/archive/2006-05-11-n40.html).

Why is Labeling Content Important?

Labeling content benefits everyone in several ways. Labels provide Google with a vast amount of information about web sites, potentially down to a very granular, individual-page level. If an individual’s annotations are found to improve the quality of search results, those annotations will be applied to the results shown to everyone. In essence, over time, Google will use annotations and other aspects of Google Co-Op to improve search results.

Conclusion

Annotating URLs is a relatively low-effort task for individuals that can reap benefits for everyone: better and more relevant search results. While still in its infancy, and going through the growing pains that are normal for a beta service, Google Co-op clearly has a lot of promise to help Google provide much more powerful and relevant search results to users.

Rob Pirozzi is a contract writer for CityTownInfo.com. CityTownInfo is a quick reference web site that provides statistics and indexes on thousands of cities and towns across the US, as well as articles, comments from local residents, and more.
