Thursday, September 19, 2024

Google Indexes Document’s First 101k

How big are your web pages? If you’re creating especially long, text-heavy pages, you might consider breaking your site up into smaller pieces: 101K pieces, to be exact, according to GoogleGuy.

Mark Carey reported that GoogleGuy said, “we’ll typically index the first 101K of a web page — in practice, more content of a page can be indexed (e.g. PDFs), but if you keep your main content under 100K or so, that’s the safest.”

Remember that images don’t count toward that limit. Google does index images, but in a separate index from its web pages, so a page whose text and markup alone run over 101K is enormous.

If your pages run over 101K without images, you should find a way to break them up. There’s a good chance they’re hard for your visitors to navigate anyway.

If you absolutely have to have more than 101K on a page, make sure the content you want indexed appears within the first 101K.
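One rough way to check a page against this advice is to measure its raw HTML in bytes and see where the text you care about falls. This is a minimal sketch, not Google’s method: it assumes “101K” means 101 × 1024 bytes of UTF-8 encoded HTML, which Google hasn’t spelled out, and `check_page` and `LIMIT_BYTES` are hypothetical names.

```python
# Assumption: "101K" = 101 * 1024 bytes of raw UTF-8 HTML. Google has not
# published exactly how it counts, so treat this cutoff as approximate.
LIMIT_BYTES = 101 * 1024


def check_page(html: str, must_index: str, limit: int = LIMIT_BYTES) -> dict:
    """Report the page's byte size and whether `must_index` fits before the cutoff."""
    data = html.encode("utf-8")
    snippet = must_index.encode("utf-8")
    pos = data.find(snippet)  # byte offset of the snippet, or -1 if absent
    return {
        "size_bytes": len(data),
        "over_limit": len(data) > limit,
        "snippet_offset": pos,
        # The whole snippet must end at or before the cutoff to count as "safe".
        "snippet_within_limit": 0 <= pos and pos + len(snippet) <= limit,
    }
```

For example, a page with its key text at the top followed by 120,000 bytes of filler reports `over_limit` as True but `snippet_within_limit` as True; move the key text to the bottom and `snippet_within_limit` becomes False.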

Mark Carey widened the 101K discussion and brought up a question that some of you may be able to answer: does Googlebot continue crawling a page’s links after the 101K mark?

    Suppose a page is 150K in size consisting of mostly links. Will Google simply stop crawling the page after 101K, thus not following the links at the bottom of the page? Or, does Google index only the first 101K, but continue to follow the remainder of the links on the page?

    I have read claims on both sides of the debate, but never tried to test it myself. The answer can have a significant impact on large sitemaps. Nobody cares if the entire 200K sitemap is indexed, but we certainly care that all of the links are crawled.

Mark also said that, “the 101K limit has been known for some time.”
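This doesn’t answer Mark’s question, but it does show which links on a large sitemap would be at risk if Googlebot stops at the cutoff. The sketch below makes the same assumption (101 × 1024 bytes of UTF-8 HTML), finds links with a rough regular expression rather than a real HTML parser, and uses hypothetical names (`split_links`, `LINK_RE`).

```python
import re

# Assumption: "101K" = 101 * 1024 bytes of raw UTF-8 HTML.
LIMIT_BYTES = 101 * 1024

# Rough pattern for <a ... href="..."> links; not a full HTML parser.
LINK_RE = re.compile(rb"""<a\s[^>]*?href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def split_links(html: str, limit: int = LIMIT_BYTES) -> tuple[list[str], list[str]]:
    """Split link URLs by whether their <a> tag starts before or after the cutoff."""
    data = html.encode("utf-8")
    before, after = [], []
    for m in LINK_RE.finditer(data):
        url = m.group(1).decode("utf-8")
        # m.start() is the byte offset of the opening "<a" tag.
        (before if m.start() < limit else after).append(url)
    return before, after
```

On a sitemap page of 3,000 short links (about 115,000 bytes), the last few hundred links land in `after`; those are the ones worth linking from a second, smaller sitemap page if you’d rather not depend on the answer.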

Garrett French is the editor of murdok’s eBusiness channel. You can talk to him directly at WebProWorld, the eBusiness Community Forum.
