Friday, September 20, 2024

Yahoo Search Subtly Nudges Webmasters

One of the biggest thorns in the side of any search engine is irrelevant search results. Whether the cause is spammers or webmasters who aren't conscientious enough to check their work, poor SERPs can damage a search engine's reputation almost irrevocably (who, exactly, would want to use an engine that returns irrelevant results?).

The big four have taken various steps to address this, usually in the form of webmaster guidelines and FAQs. Yahoo Search, however, is trying a different approach. On the Yahoo Search blog, it was revealed that Yahoo now parses the wildcard characters * and $ in robots.txt files. The request for Yahoo to do this was made at an SES conference (who says search engines don't listen to their audience?).

The blog entry reveals the details of the wildcard allowances:

‘*’ – matches a sequence of characters

You can now use ‘*’ in robots directives for Yahoo! Slurp to wildcard match a sequence of characters in your URL. You can use this symbol in any part of the URL string you provide in the robots directive. For example,

User-Agent: Yahoo! Slurp
Allow: /public*/
Disallow: /*_print*.html
Disallow: /*?sessionid
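
To make that prefix matching concrete, here is a minimal Python sketch of the behavior described above. It is my own illustration rather than anything Yahoo has published: the directive_to_regex helper and the sample URLs are invented for the example.

import re

def directive_to_regex(path):
    # Sketch only: escape regex metacharacters, then let '*' match any
    # run of characters. No trailing anchor, so the directive is treated
    # as a prefix, as the Yahoo post describes.
    return re.compile("^" + re.escape(path).replace(r"\*", ".*"))

rules = ["/public*/", "/*_print*.html", "/*?sessionid"]   # the directives quoted above

sample_urls = [
    "/public_html/index.html",   # caught by /public*/
    "/docs/page_print.html",     # caught by /*_print*.html
    "/cart?sessionid=abc123",    # caught by /*?sessionid
    "/docs/page.html",           # caught by none of them
]

for url in sample_urls:
    matches = [r for r in rules if directive_to_regex(r).match(url)]
    print(url, "->", matches or "no wildcard rule matches")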

While the $ informs Slurp to:

anchor the match to the end of the URL string. Without this symbol, Yahoo! Slurp would match all URLs against the directives, treating the directives as a prefix. For example:

User-Agent: Yahoo! Slurp
Disallow: /*.gif$
Allow: /*?$

The first directive tells Slurp to disallow any URL that ends with the .gif extension. The second allows URLs that end with "?" to be included in the crawl.
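
To illustrate the difference the $ anchor makes, here is another small Python sketch of my own (the regexes and sample URLs are assumptions, not Yahoo's code): without the anchor, /*.gif behaves as a prefix pattern and also catches /img/logo.gif?v=2, while /*.gif$ matches only URLs that truly end in .gif, and /*?$ matches only URLs ending with a bare "?".

import re

prefix_gif   = re.compile(r"^/.*\.gif")    # Disallow: /*.gif  -- prefix match, no anchor
anchored_gif = re.compile(r"^/.*\.gif$")   # Disallow: /*.gif$ -- must end in .gif
anchored_q   = re.compile(r"^/.*\?$")      # Allow: /*?$       -- must end with '?'

for url in ["/img/logo.gif", "/img/logo.gif?v=2", "/products?", "/products?page=2"]:
    print(url,
          "| /*.gif:",  bool(prefix_gif.match(url)),
          "| /*.gif$:", bool(anchored_gif.match(url)),
          "| /*?$:",    bool(anchored_q.match(url)))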

As you may have noticed, Priyank Garg used session IDs as an example in the * wildcard explanation, and this is what led to my opening statement about irrelevant search results. Obviously, Yahoo wants to rid its index of as many erroneous URLs as possible, and accepting the * wildcard in robots.txt files helps with this, provided webmasters are willing to implement these directives.

In fact, I asked Yahoo senior PR manager Shelia Tran if this was their intention when they introduced this change. She replied:

We try our best to crawl and find the best unique content on webmaster’s sites. However, any help that webmasters can give in terms of hints and directives to prevent the crawler from crawling duplicate pages or crawler traps is welcome.

In other words, Yahoo wants webmasters to take an active role in preventing Slurp traps. Putting the appropriate directives in your site's robots.txt file will also help Yahoo keep its SERPs relatively clean, which benefits all search engine users.

Chris Richardson
Staff Writer | Murdok Blog
