Crawl Budget
Every site is allocated a crawl budget, which determines how many pages and resources search engines can crawl. Our SEO Office Hours Notes below cover recommendations for optimizing crawl budget and provide insights from Google about how it is controlled.
For more on this topic, see our Guide to Crawl Budget.
Use Log Files to Identify Crawl Budget Wastage & Issues With URL Structure
When auditing eCommerce sites, John recommends first looking at which URLs Googlebot is crawling, then identifying crawl budget wastage and, if necessary, changing the site’s URL structure to stop Googlebot crawling unwanted URLs such as those generated by parameters and filters.
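As a starting point for this kind of audit, Googlebot requests can be pulled from the server access logs and grouped by path and query string. A rough sketch, assuming logs in the combined format; the file name and the simple user-agent check are illustrative, and production checks should verify Googlebot via reverse DNS:

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the requested URL in a combined-format access log line
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

crawled = Counter()
parameterised = Counter()

# "access.log" is an illustrative file name; point this at your own logs
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        # Naive user-agent check, for sketch purposes only
        if "Googlebot" not in line:
            continue
        match = REQUEST_RE.search(line)
        if not match:
            continue
        url = urlsplit(match.group(1))
        crawled[url.path] += 1
        if url.query:  # parameterised URLs are a common source of wastage
            parameterised[url.path] += 1

print("Most-crawled paths:", crawled.most_common(10))
print("Paths crawled with parameters:", parameterised.most_common(10))
```

If parameterised or filtered URLs dominate the output, that is a sign the URL structure is sending Googlebot into low-value pages.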
Small to Medium-Sized Sites Don’t Have to Worry About Crawl Budget
Sites with ‘a couple hundred thousand pages’ or fewer don’t need to worry about crawl budget; Google will be able to crawl them just fine.
4xx Errors Don’t Mean Your Crawl Budget is Being Wasted
Seeing Googlebot crawl old 404/410 pages doesn’t mean your crawl budget is being wasted. Google revisits these URLs when there is nothing else left on the site to crawl, so it is actually a sign of spare capacity to crawl more.
Google AdsBot Crawling Doesn’t Impact Crawl Budget For Organic Search
If Google AdsBot is crawling millions of ad landing pages, this won’t eat into your crawl budget for organic search. John recommends checking for tagged URLs in any ad campaigns to reduce unnecessary ad crawling.
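One way to surface tagged URLs is to scan a campaign’s landing-page list for tracking parameters. A minimal sketch; the parameter list and the example URLs are illustrative assumptions, not from the source:

```python
from urllib.parse import parse_qs, urlsplit

# Common tracking parameters that create duplicate, crawlable ad URLs
TRACKING_PARAMS = {"gclid", "utm_source", "utm_medium", "utm_campaign"}

landing_pages = [  # illustrative campaign export
    "https://example.com/product?utm_source=ads&utm_campaign=summer",
    "https://example.com/product",
]

for url in landing_pages:
    params = set(parse_qs(urlsplit(url).query))
    if params & TRACKING_PARAMS:
        print(f"Tagged URL, consider untagging or canonicalising: {url}")
```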
A Noindex Reduces Crawl Rate
A page with a noindex tag will be crawled less frequently.
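For reference, a noindex can be applied either as a robots meta tag in the HTML or as an HTTP response header. A minimal sketch using Flask purely for illustration; the route is hypothetical:

```python
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/internal-search")  # hypothetical low-value page
def internal_search():
    # Option 1: robots meta tag in the HTML head
    html = '<html><head><meta name="robots" content="noindex"></head><body>...</body></html>'
    resp = make_response(html)
    # Option 2: X-Robots-Tag header, also usable for non-HTML resources
    resp.headers["X-Robots-Tag"] = "noindex"
    return resp
```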
URL Duplication Is More of an Issue on Larger Sites as Google Is More Likely to Miss New Content
URL duplication is more of an issue on larger sites than on small or medium ones. This is because Google is less likely to be able to crawl the whole site, and new content is more likely to be missed if there is a lot of duplication.
Adding Noindex To Pages Further Down In Paginated Series Is Fine
It is fine to noindex pages further down in a paginated series, whether that means noindexing everything after the first couple of pages or after the first hundred; that is up to the webmaster. However, crawl budget will not be reduced, as Googlebot still crawls these noindexed pages.
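If you do take this approach, the cut-off is just a threshold on the page number. A minimal sketch; the function name and default threshold are illustrative:

```python
def robots_meta_for_page(page_number: int, noindex_after: int = 2) -> str:
    """Return a robots meta tag for a page in a paginated series.

    Pages beyond `noindex_after` are noindexed; where to draw the line
    (the first couple of pages, the first hundred) is up to the webmaster.
    """
    if page_number > noindex_after:
        return '<meta name="robots" content="noindex">'
    return '<meta name="robots" content="index, follow">'

# Example: page 3 of a category listing gets a noindex
print(robots_meta_for_page(3))  # <meta name="robots" content="noindex">
```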
Use Nofollow to Stop Googlebot Crawling Too Far
Google recommends using internal nofollow links to stop Googlebot crawling too far in one direction, e.g. endless calendar links, faceted navigation and deep pagination.
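In practice this means adding rel="nofollow" to the links that lead into effectively infinite URL spaces. A small sketch; the helper function and its arguments are hypothetical:

```python
def internal_link(url: str, label: str, crawl_trap: bool = False) -> str:
    """Render an internal link, adding rel="nofollow" where the link leads
    into an effectively infinite URL space (calendar pagination, faceted
    filter combinations, very deep pagination)."""
    rel = ' rel="nofollow"' if crawl_trap else ""
    return f'<a href="{url}"{rel}>{label}</a>'

# A "next month" calendar link that Googlebot could otherwise follow forever
print(internal_link("/events?month=2025-07", "Next month", crawl_trap=True))
# -> <a href="/events?month=2025-07" rel="nofollow">Next month</a>
```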
Google Establishes URL Patterns on Larger Sites to Focus Crawling
When crawling larger sites, Google tries to establish URL patterns so it can focus on crawling important pages and choose which ones to ignore. This is done on a per-site basis; Google doesn’t apply platform-specific rules, because platforms can be customised to behave differently.
Redirect Chains Impact Usability and Crawling Efficiency
Google won’t penalise websites for having redirect chains, but they can become a usability issue, particularly if they hop between hostnames, which takes longer. Googlebot only follows 5 redirects at a time and will continue crawling the chain on a later visit, but chains this long should be avoided for any important URLs.
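Chains are easy to check by following redirects one hop at a time. A minimal sketch using the Python requests library; the URL and hop limit are illustrative:

```python
from urllib.parse import urljoin

import requests

def redirect_chain(url: str, max_hops: int = 10) -> list[str]:
    """Follow redirects one hop at a time and return the full chain."""
    chain = [url]
    for _ in range(max_hops):
        resp = requests.get(url, allow_redirects=False, timeout=10)
        if resp.status_code not in (301, 302, 303, 307, 308):
            break
        url = urljoin(url, resp.headers["Location"])
        chain.append(url)
    return chain

chain = redirect_chain("https://example.com/old-page")  # illustrative URL
if len(chain) - 1 > 5:
    print("More than 5 hops: Googlebot will defer the rest to a later crawl")
print(" -> ".join(chain))
```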