SY: Hi Gideon, thanks a lot for sparing some time for us. Can you please tell us a bit about Google Alert? How did the concept develop?
GG: I originally had the idea for Google Alert in January 2003. I was regularly using Google to see what people on the web were saying about my Macintosh shareware (www.sigsoftware.com). But when looking at a page of Google results, I could never remember which ones I’d seen before and which were new, and I wasted a lot of time revisiting pages. So I thought it would be great if there was a service that automatically tracked Google searches for new results.
I remembered having heard about the Google Web APIs when they were first launched. Once I realized that there was a legitimate way to automate Google querying I got excited about the idea and developed Googlert (as it was then called). Everything grew from there – the service’s popularity really took off and I took on a partner for the project. We garnered a lot of positive press coverage and feedback and kept building the service, implementing new features such as the online browser, advanced searches and RSS feeds.
Throughout this process, Google have been very helpful. They quickly provided us with a high-capacity Web APIs key so that each Google Alert user did not need to provide their own. Google also encouraged us to commercialize the service, granting us permission to provide the paid-for advanced Google Alert services we now offer.
SY: Why did you choose Google as the only search engine to build alerts on? Why not a number of search engines, or a completely different one?
GG: First and foremost, Google is by far the most popular search engine and provides the most relevant and up-to-date results for our users. As far as I’m aware, it’s also the only general search engine with an API which provides a documented and legitimate route for building automated value-added services. While it would be technically easy to incorporate other search engines into our alerting system, we aren’t seeing demand from our users to do so, since the results we obtain from Google are so good.
SY: I’ve been using Google Alert for almost a year now, and to be honest, I’ve never missed a single change in my targeted keywords.
What is your current focus? Now that you’ve earned the trust of so many users, how do you plan to improve the service?
GG: To achieve this degree of maturity, we had to develop a number of algorithms to ensure that the results we send in the alerts are relevant. Google Alert does a lot more than just examine the URL to determine whether a result has been seen before. For example, it reliably handles content that moves around or appears in multiple places on a site.
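As a rough illustration of recognizing a result by its content rather than its URL, here is a minimal Python sketch. It is not Google Alert's actual algorithm, which this interview does not describe; the function names `fingerprint` and `new_results`, and the whitespace and case normalization, are assumptions made for the example.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash the normalized page text so identical content maps to one key."""
    # Collapse whitespace and ignore case so trivial layout changes do not matter.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def new_results(results, seen):
    """Return the results whose content is new, updating `seen` in place.

    `results` is an iterable of (url, text) pairs (hypothetical input shape);
    `seen` is a set of fingerprints from earlier runs.
    """
    fresh = []
    for url, text in results:
        key = fingerprint(text)
        if key not in seen:
            seen.add(key)
            fresh.append((url, text))
    return fresh
```

With this approach, a page that moves to a new URL with the same text is not reported again, because the lookup key is derived from the text and not from the address. An exact hash only catches identical text; handling near-duplicates would need a fuzzier comparison.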
But there’s still work to be done. We’ve begun developing more complex textual analysis methods to further optimize the relevance of the results we report. Two technologies we recently released – SightPoint and FreshSearch – provide a taste of things to come. SightPoint rates the relevance of new results according to what the user has clicked on in the past. It uses Bayesian statistics, made popular by spam email filters, to learn the user’s personal interests. FreshSearch automatically filters out pages which newly appear on Google but are more than a month old by analyzing content on the reported page. We expect to continue honing and adding to these features in the future to make the service even more useful.
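To make the Bayesian idea concrete, the following is a minimal naive Bayes sketch in Python that learns from which results a user clicked. It is only an illustration under stated assumptions, not SightPoint's implementation: the `ClickModel` class, its method names, the word-level tokenizer and the add-one smoothing are all hypothetical choices.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ClickModel:
    """Naive Bayes over words, trained on clicked vs. ignored results."""

    def __init__(self):
        self.words = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def observe(self, text, clicked):
        """Record one result and whether the user clicked it."""
        self.docs[clicked] += 1
        self.words[clicked].update(tokens(text))

    def click_probability(self, text):
        """Estimate P(clicked | text) with add-one (Laplace) smoothing."""
        vocab = len(set(self.words[True]) | set(self.words[False])) or 1
        all_docs = self.docs[True] + self.docs[False]
        log_score = {}
        for label in (True, False):
            total = sum(self.words[label].values())
            score = math.log((self.docs[label] + 1) / (all_docs + 2))
            for word in tokens(text):
                score += math.log((self.words[label][word] + 1) / (total + vocab))
            log_score[label] = score
        # Normalize the two log scores into a probability without overflow.
        top = max(log_score.values())
        p_true = math.exp(log_score[True] - top)
        p_false = math.exp(log_score[False] - top)
        return p_true / (p_true + p_false)
```

A caller would invoke `observe` each time a result is shown and then clicked or ignored, and sort new results by `click_probability` before sending an alert. Spam filters use the same structure, with "spam" and "not spam" in place of "clicked" and "ignored".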
SY: I’ve seen that you guys have launched another service called “Copyscape”. Many content publishers (including myself) find it extremely useful for detecting plagiarism and making sure our content remains, as I said, ours.
This is a unique and useful prototype. How did this concept develop?
GG: The idea for Copyscape began when several website owners told us they discovered instances of web plagiarism using Google Alert. In these cases, the text that had been plagiarized from their site still contained the site’s name, which they happened to be tracking with Google Alert. Google Alert emails have even been cited as evidence in court.
But a simple Google search for a site or company name is not enough – in most cases, plagiarized text excludes all references to the original source. So we decided to create a new service aimed specifically at detecting plagiarism – Copyscape is a prototype of the technology. At just one week old, it seems to have received a very warm reception from the web community, though I’ve no doubt that those engaging in plagiarism will feel a little differently!
SY: OK, Copyscape and Google Alert are two extremely useful services you guys have built. Can we expect anything else?
GG: We’re working on a number of different projects using the Google Web APIs, and will publicly release those which have commercial potential like Google Alert and Copyscape. By providing a programmatic interface to their comprehensive web index, Google has made it economically viable to develop innovative applications that require a view of the entire web.
SY: Thanks a lot, Gideon, for sparing some time for us. I wish you and your team the best of luck for the future, and I sure hope you guys go far!
GG: Thank you. It has been a pleasure.
Sid Yadav is the CEO of HoverScore, Inc. (www.hoverscore.com), a web development firm founded in mid-2004. An expert in the search engine field, Sid has been practicing SEO consulting since the late 90s. He also co-founded FeedPlex (www.feedplex.com) in April 2004 and currently serves as the company's CEO and President.
Along with his corporate achievements, he is the all-time editor of Daily Rundown (www.dailyrundown.com) and is currently writing a book titled “Dollar Dreamz”, which is set for release at the end of 2004.