Crawl Budget
Every site is allocated a crawl budget, which determines how many pages and resources search engines can crawl. Our SEO Office Hours Notes below cover recommendations for optimizing crawl budget and provide insights from Google about how it is controlled.
For more on this topic, see our Guide to Crawl Budget.
Use Log Files to Identify Crawl Budget Wastage & Issues With URL Structure
When auditing eCommerce sites, John recommends first looking at which URLs Googlebot is crawling, then identifying crawl budget wastage and, if necessary, changing the site’s URL structure to stop Googlebot crawling unwanted URLs such as those generated by parameters and filters.
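As a starting point for this kind of audit, Googlebot requests can be pulled from the server access logs and grouped by path and query string. A rough sketch, assuming logs in the combined format; the file name and the simple user-agent check are illustrative, and production checks should verify Googlebot via reverse DNS:

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the requested URL in a combined-format access log line
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

crawled = Counter()
parameterised = Counter()

# "access.log" is an illustrative file name; point this at your own logs
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        # Naive user-agent check, for sketch purposes only
        if "Googlebot" not in line:
            continue
        match = REQUEST_RE.search(line)
        if not match:
            continue
        url = urlsplit(match.group(1))
        crawled[url.path] += 1
        if url.query:  # parameterised URLs are a common source of wastage
            parameterised[url.path] += 1

print("Most-crawled paths:", crawled.most_common(10))
print("Paths crawled with parameters:", parameterised.most_common(10))
```

If parameterised or filtered URLs dominate the output, that is a sign the URL structure is sending Googlebot into low-value pages.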
Small to Medium-Sized Sites Don’t Have to Worry About Crawl Budget
Sites with ‘a couple hundred thousand pages’ or fewer don’t need to worry about crawl budget; Google will be able to crawl them just fine.
4xx Errors Don’t Mean Your Crawl Budget is Being Wasted
Seeing Googlebot crawl old 404/410 pages doesn’t mean your crawl budget is being wasted. Google revisits these URLs when there is nothing else left on the site to crawl, so it is actually a sign of spare capacity to crawl more.
Google AdsBot Crawling Doesn’t Impact Crawl Budget For Organic Search
If Google AdsBot is crawling millions of ad landing pages, this won’t eat into your crawl budget for organic search. John recommends checking for tagged URLs in any ad campaigns to reduce unnecessary ad crawling.
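One way to surface tagged URLs is to scan a campaign’s landing-page list for tracking parameters. A minimal sketch; the parameter list and the example URLs are illustrative assumptions, not from the source:

```python
from urllib.parse import parse_qs, urlsplit

# Common tracking parameters that create duplicate, crawlable ad URLs
TRACKING_PARAMS = {"gclid", "utm_source", "utm_medium", "utm_campaign"}

landing_pages = [  # illustrative campaign export
    "https://example.com/product?utm_source=ads&utm_campaign=summer",
    "https://example.com/product",
]

for url in landing_pages:
    params = set(parse_qs(urlsplit(url).query))
    if params & TRACKING_PARAMS:
        print(f"Tagged URL, consider untagging or canonicalising: {url}")
```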
A Noindex Reduces Crawl Rate
A page with a noindex tag will be crawled less frequently.
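For reference, a noindex can be applied either as a robots meta tag in the HTML or as an HTTP response header. A minimal sketch using Flask purely for illustration; the route is hypothetical:

```python
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/internal-search")  # hypothetical low-value page
def internal_search():
    # Option 1: robots meta tag in the HTML head
    html = '<html><head><meta name="robots" content="noindex"></head><body>...</body></html>'
    resp = make_response(html)
    # Option 2: X-Robots-Tag header, also usable for non-HTML resources
    resp.headers["X-Robots-Tag"] = "noindex"
    return resp
```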
URL Duplication Is More of an Issue on Larger Sites as Google Is More Likely to Miss New Content
URL duplication is more of an issue on larger sites than on small or medium ones. This is because Google is less likely to be able to crawl the whole site, and new content is more likely to be missed if there is a lot of duplication.
Adding Noindex To Pages Further Down In Paginated Series Is Fine
It is fine to noindex pages further down in a paginated series, whether that means noindexing everything after the first couple of pages or after the first hundred; that is up to the webmaster. However, crawl budget will not be reduced, as Googlebot still crawls these noindexed pages.
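If you do take this approach, the cut-off is just a threshold on the page number. A minimal sketch; the function name and default threshold are illustrative:

```python
def robots_meta_for_page(page_number: int, noindex_after: int = 2) -> str:
    """Return a robots meta tag for a page in a paginated series.

    Pages beyond `noindex_after` are noindexed; where to draw the line
    (the first couple of pages, the first hundred) is up to the webmaster.
    """
    if page_number > noindex_after:
        return '<meta name="robots" content="noindex">'
    return '<meta name="robots" content="index, follow">'

# Example: page 3 of a category listing gets a noindex
print(robots_meta_for_page(3))  # <meta name="robots" content="noindex">
```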
Use Nofollow to Stop Googlebot Crawling Too Far
Google recommends using internal nofollow links to stop Googlebot crawling too far in one direction, e.g. endless calendar links, faceted navigation and deep pagination.
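In practice this means adding rel="nofollow" to the links that lead into effectively infinite URL spaces. A small sketch; the helper function and its arguments are hypothetical:

```python
def internal_link(url: str, label: str, crawl_trap: bool = False) -> str:
    """Render an internal link, adding rel="nofollow" where the link leads
    into an effectively infinite URL space (calendar pagination, faceted
    filter combinations, very deep pagination)."""
    rel = ' rel="nofollow"' if crawl_trap else ""
    return f'<a href="{url}"{rel}>{label}</a>'

# A "next month" calendar link that Googlebot could otherwise follow forever
print(internal_link("/events?month=2025-07", "Next month", crawl_trap=True))
# -> <a href="/events?month=2025-07" rel="nofollow">Next month</a>
```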
Google Establishes URL Patterns on Larger Sites to Focus Crawling
When crawling larger sites, Google tries to establish URL patterns so it can focus on crawling important pages and choose which ones to ignore. This is done on a per-site basis; Google doesn’t apply platform-specific rules, because platforms can be customised to behave differently.
Redirect Chains Impact Usability and Crawling Efficiency
Google won’t penalise websites for having redirect chains, but they can become a usability issue, particularly if they hop between hostnames, which takes longer. Googlebot only follows 5 redirects at a time and will continue crawling the chain on a later visit, but chains this long should be avoided for any important URLs.
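Chains are easy to check by following redirects one hop at a time. A minimal sketch using the Python requests library; the URL and hop limit are illustrative:

```python
from urllib.parse import urljoin

import requests

def redirect_chain(url: str, max_hops: int = 10) -> list[str]:
    """Follow redirects one hop at a time and return the full chain."""
    chain = [url]
    for _ in range(max_hops):
        resp = requests.get(url, allow_redirects=False, timeout=10)
        if resp.status_code not in (301, 302, 303, 307, 308):
            break
        url = urljoin(url, resp.headers["Location"])
        chain.append(url)
    return chain

chain = redirect_chain("https://example.com/old-page")  # illustrative URL
if len(chain) - 1 > 5:
    print("More than 5 hops: Googlebot will defer the rest to a later crawl")
print(" -> ".join(chain))
```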