Question: Will search engines crawl/index pages in subWebs? An example of this is http://www.companyname.com/subWeb/pagename.htm.
Answer: The quick-and-dirty answer to this question is yes. All search engines will crawl subdirectories (what the reader is referring to as “subWebs”) on a site, as long as the links to those subdirectories are part of a navigation scheme and URL structure the search engines can follow.
Directory structure
Ideally, especially for a smaller site, the directory structure should be flat, with one subdirectory, or none at all, for the actual Web pages. For a larger site, it is reasonable to have two or three subdirectory levels. From a search engine standpoint, a flat directory structure is best.
The exceptions are images, scripts, CGI-BIN, and style sheets. These should all be placed in subdirectories.
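To illustrate, a flat directory structure for a small site might look something like the following (the page and folder names here are hypothetical examples, not requirements):

http://www.companyname.com/
    index.htm
    products.htm
    contact.htm
    images/      (site graphics)
    scripts/     (JavaScript files)
    cgi-bin/     (CGI programs)
    css/         (style sheets)

All of the actual Web pages sit at the top level, while the supporting files are tucked into their own subdirectories.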
The URL structure communicates to both the search engines and your visitors which pages you believe are the most important pages on your site. In other words, if you think a page is important, its URL will be a “top-level” URL, with no subdirectories. A top-level Web page will generally have the following URL structure:
http://www.companyname.com/pagename.htm
A URL with a single subdirectory (i.e. secondary level) will generally have the following URL structure:
http://www.companyname.com/subWeb1/pagename.htm
Where:
- companyname.com is the domain name.
- subWeb1 is the name of subdirectory 1.
- pagename.htm is the name of the Web page.
A URL with two subdirectories will generally have the following URL structure:
http://www.companyname.com/subWeb1/subWeb2/pagename.htm
Where:
- companyname.com is the domain name.
- subWeb1 is the name of subdirectory 1.
- subWeb2 is the name of subdirectory 2.
- pagename.htm is the name of the Web page.
And so on and so forth.
As a general rule, the search engines will crawl at least three subdirectory levels as long as your site has a navigation scheme and URL structure the spiders can follow. What is more important than the number of subdirectories, however, is whether or not other sites link to the content in your subdirectories. Therefore, if your site has great content in subdirectory #4 and a lot of other sites link to that content, then the search engines will crawl it.
Search engine marketing gimmicks
Here’s a trick that many search engine marketers like to use but that, I believe, is rather ineffective. Because search engine marketers know that search engines will crawl multiple subdirectories, they will purposely create a single subdirectory named with a hyphenated keyword phrase to ensure that the search engines see this targeted keyword phrase.
For example, a company that sells organic teas might have the following URL and directory structure utilizing this strategy:
http://www.tranquiliteasorganic.com/Oolong-tea/Oolong.html
Where:
- tranquiliteasorganic.com is the domain name.
- Oolong-tea is the name of subdirectory 1, with the keyword phrase “oolong tea” separated by a hyphen.
- Oolong.html is the name of the Web page.
Which URL structure would I recommend? Either:
http://www.tranquiliteasorganic.com/Oolong-tea/Oolong.html
or
http://www.tranquiliteasorganic.com/Oolong.html
In my opinion, I would not modify a subdirectory structure purely for search engine positioning, because the benefit of using keywords in the URL or domain name is either (a) nonexistent or (b) very minor.
So my answer? It depends on the site. If there are numerous types of organic oolong teas, and if the site dedicates a considerable number of pages of unique, high-quality content to oolong tea, then I would recommend creating a subdirectory. Likewise, I would expect the site to have subdirectories for ALL of the teas it offers, purely for consistency and usability reasons.
However, since I have a very difficult time believing there is a tremendous amount of unique, high-quality content about oolong teas, I doubt that an additional subdirectory is necessary.
Using the Robots Exclusion Protocol
On a database-driven site, it is quite common to put similar (or the same) content in different subdirectories because it enhances the user experience.
For example, let’s use the fictional tea site again. Suppose the site contains enough unique, high-quality content to have subdirectories for each type of tea. So the URL structures for oolong tea, green tea, and tea accessories can look like the following, respectively:
http://www.tranquiliteasorganic.com/Oolong-tea/Oolong.html
http://www.tranquiliteasorganic.com/Green-tea/Green.html
http://www.tranquiliteasorganic.com/Tea-accessories/accessories.html
If oolong tea and green tea are available as loose tea, then it’s logical for a page about a tea infuser (that’s the mesh thing that holds the loose tea in the hot water) to be in the Oolong-tea and Green-tea subdirectories as well as in the Tea-accessories subdirectory. From a usability and user-experience perspective, this is a good strategy.
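For example, the same tea infuser page might appear at all three of the following URLs (the page name infuser.html is hypothetical):

http://www.tranquiliteasorganic.com/Oolong-tea/infuser.html
http://www.tranquiliteasorganic.com/Green-tea/infuser.html
http://www.tranquiliteasorganic.com/Tea-accessories/infuser.html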
However, from a search engine perspective, this content would be considered redundant. One reason search engines don’t “like” many database-driven sites is that the same content is often delivered over and over again.
So if the tea infuser page is available in all three subdirectories, will the search engines consider that redundant content and possibly penalize the site for delivering it? In all likelihood, the search engines will display the copy that has the most links pointing to it and not display the other copies.
At the same time, there are plenty of unethical search engine marketers who will take this strategy to the extreme and produce a lot of redundant content for the same information. So there is always the possibility of a spam penalty.
To be 100 percent safe, I would use the robots.txt file (Robots Exclusion Protocol) to block the redundant copies. I would carefully analyze site statistics data to see which subdirectory is used most often. That is the subdirectory whose copy I would NOT block in the robots.txt file.
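For instance, if the site statistics showed that visitors reach the tea infuser page most often through the Tea-accessories subdirectory, the robots.txt file at the site root might contain entries along these lines (again, the page name infuser.html is hypothetical):

User-agent: *
Disallow: /Oolong-tea/infuser.html
Disallow: /Green-tea/infuser.html

These two lines ask the spiders not to crawl the redundant copies, while the copy in the Tea-accessories subdirectory remains available for crawling and indexing.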
In this situation, using the robots.txt file solves two problems. First, it communicates to the search engines that you are not deliberately delivering redundant content. Second, the user experience is not compromised because the relevant content is still available in the appropriate subdirectories.
Conclusion
Generally speaking, search engines do not have any problems crawling sites with subdirectories. If you find that dividing your site into subdirectories benefits the user experience, then, by all means, create them.
But don’t create subdirectories purely for search engine visibility. There are far more effective strategies that are less time-consuming and that will deliver a better ROI (return on investment) for your site.
This reader question raises issues that are hotly debated in the search engine marketing industry – when should a site use subdirectories, subdomains, or mini-sites? Should Web site owners create URLs utilizing targeted keyword phrases? Should subdirectory names contain keyword phrases? That’s a whole other article.
Shari Thurow is Marketing Director at Grantastic Designs, Inc., a full-service search engine marketing, web and graphic design firm. This article is excerpted from her book, Search Engine Visibility (http://www.searchenginesbook.com) published in January 2003 by New Riders Publishing Co. Shari can be reached at shari@grantasticdesigns.com.