Crawl Budget
A crawl budget is allocated to every site and determines how many pages and resources can be crawled by search engines. Our SEO Office Hours Notes below cover recommendations for the optimization of crawl budget, and provide insights from Google about how crawl budget is controlled.
For more on this topic, see our Guide to Crawl Budget.
Google Crawl Budget is Limited to a Server
Google limits the crawl rate for sites on the same server so that it doesn’t overload the server when crawling these sites.
Google Shopping Crawling Counts Towards a Crawl Budget
Google Shopping verification crawling comes out of the site’s overall crawl budget.
Parameter Handling Improves Crawl Efficiency
Parameter handling prevents URLs from being crawled at all, so it is better for crawl efficiency than canonical tags, which Google only sees after it has crawled the page.
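To get a feel for how many crawlable URLs parameters can generate, the sketch below groups URL variants by what remains once presentation-only parameters are stripped. It is an illustration only: the parameter names in IGNORED_PARAMS and the example.com URLs are assumptions, and the code does not reproduce how Google itself handles parameters.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only change presentation or tracking.
IGNORED_PARAMS = {"sort", "sessionid", "utm_source"}


def strip_ignored_params(url, ignored=IGNORED_PARAMS):
    """Return the URL with the ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_variants(urls):
    """Map each stripped URL to the crawlable variants that collapse into it."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_ignored_params(url)].append(url)
    return dict(groups)
```

Running group_variants over the URLs found in a crawl or in server logs shows which pages are being fetched many times under different parameter combinations, and are therefore candidates for parameter handling.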
Increase Crawl Budget by Increasing Server Capacity
You can help to increase your crawl budget by making sure your server doesn’t return server errors when Google crawls it.
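One way to check this is to measure how often requests from Googlebot receive a 5xx response in your access logs. The following is a minimal sketch that assumes the common "combined" log format and identifies Googlebot by user agent string only (which can be spoofed); the function name is made up for this example.

```python
import re

# Assumes the "combined" access log format; adjust the pattern for your server.
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_error_rate(lines):
    """Return (errors, total, rate) for requests whose user agent mentions Googlebot."""
    total = 0
    errors = 0
    for line in lines:
        match = LOG_PATTERN.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue  # not parseable, or not a Googlebot request
        total += 1
        if match.group("status").startswith("5"):
            errors += 1  # server error (5xx)
    rate = errors / total if total else 0.0
    return errors, total, rate
```

The function accepts any iterable of log lines, so an open log file can be passed to it directly. A rising rate is a signal to look at server capacity before expecting Google to crawl more.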
Orphaned Pages may Fall Out of the Index
Orphaned pages will still be re-crawled, which takes up some crawl budget. They usually remain indexed and can still show up in search results, but without internal links they will be considered unimportant and might eventually fall out of the index.
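Orphaned pages can be found by comparing the URLs you know about (for example from an XML sitemap) with the URLs that receive at least one internal link. A minimal sketch, assuming both inputs have already been collected by a crawler; the function name and data shapes are chosen for this example.

```python
def find_orphaned_pages(known_urls, internal_links):
    """Return known URLs that no other page links to.

    known_urls: iterable of URLs, e.g. taken from a sitemap.
    internal_links: dict mapping a source URL to the URLs it links to.
    """
    linked = set()
    for source, targets in internal_links.items():
        # A page linking to itself does not stop it from being orphaned.
        linked.update(target for target in targets if target != source)
    return sorted(set(known_urls) - linked)
```

Note that an entry page such as the homepage will be reported too if nothing in the link data points to it, so the result is a list to review rather than a list to delete.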
Prevent Excessive Crawling on Filters, Sort Orders and Pagination with Nofollow
Add nofollow to links that point to filtered, sorted and paginated results pages to prevent excessive crawling.
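In a template, this usually comes down to deciding per link whether the target is a filtered, sorted or paginated view. The sketch below shows one way to do that; the parameter names in CRAWL_TRAP_PARAMS and both helper functions are assumptions for illustration, so substitute whatever your site actually uses.

```python
import html
from urllib.parse import parse_qs, urlsplit

# Hypothetical query parameters that produce filtered, sorted or paginated views.
CRAWL_TRAP_PARAMS = {"filter", "sort", "page"}


def link_rel(href):
    """Return "nofollow" if the link target carries one of the listed parameters."""
    params = parse_qs(urlsplit(href).query, keep_blank_values=True)
    if any(name in params for name in CRAWL_TRAP_PARAMS):
        return "nofollow"
    return None


def render_link(href, text):
    """Render an anchor tag, adding rel="nofollow" only where link_rel asks for it."""
    rel = link_rel(href)
    rel_attr = f' rel="{rel}"' if rel else ""
    return f'<a href="{html.escape(href, quote=True)}"{rel_attr}>{html.escape(text)}</a>'
```

For example, render_link("/shoes?sort=price", "By price") returns an anchor with rel="nofollow", while render_link("/shoes", "Shoes") returns a plain anchor.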
Redirect Chains Slow Crawling
Redirect chains add latency, which can slow down crawling, particularly chains with more than 5 steps, which are rescheduled to be crawled later.
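Long chains can be spotted offline from a redirect map exported from your server configuration or a crawl. A minimal sketch, assuming the redirects are available as a dictionary of source to target; the function names are invented for this example, and the threshold of 5 is taken from the note above.

```python
MAX_HOPS = 5  # threshold mentioned in the note above


def resolve_chain(start_url, redirects):
    """Follow the redirect map from start_url and return every URL visited."""
    chain = [start_url]
    seen = {start_url}
    while chain[-1] in redirects:
        next_url = redirects[chain[-1]]
        chain.append(next_url)
        if next_url in seen:
            break  # redirect loop: stop instead of following it forever
        seen.add(next_url)
    return chain


def long_chains(redirects, max_hops=MAX_HOPS):
    """Return {start_url: hop_count} for chains with more than max_hops redirects."""
    result = {}
    for start_url in redirects:
        hops = len(resolve_chain(start_url, redirects)) - 1
        if hops > max_hops:
            result[start_url] = hops
    return result
```

Each URL reported by long_chains is a candidate for pointing directly at its final destination, which removes the intermediate hops.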
Crawl Rate is Based on Pages Google Wants to Update
Crawl rate falls somewhere between the minimum set of pages Google wants to update and the maximum number of pages it thinks it’s safe to crawl without impacting the site’s performance. Any new pages discovered can be crawled provided there is some remaining budget, but they might get queued up for the next day.
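The note above can be pictured with a toy model: pages due for an update are fetched first, newly discovered pages fill whatever budget remains, and the rest waits for the next day. This is only a simplified illustration of the description, not Google's actual scheduler; the function and parameter names are made up.

```python
def plan_daily_crawl(pages_to_update, new_pages, max_safe_fetches):
    """Split URLs into today's crawl and a queue carried over to the next day.

    max_safe_fetches is assumed to be a non-negative integer.
    """
    # Pages Google already wants to refresh take priority.
    today = list(pages_to_update[:max_safe_fetches])
    remaining = max_safe_fetches - len(today)
    # New pages only use the budget left over after the updates.
    today.extend(new_pages[:remaining])
    queued = list(pages_to_update[max_safe_fetches:]) + list(new_pages[remaining:])
    return today, queued
```

With two pages to update, three new pages and a limit of four fetches, the model crawls both updates plus two new pages and queues the third.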
Google Queues Large Volumes of New URLs
If Google discovers a part of your site with a large number of new URLs, it may queue the URLs and generate a Search Console error, but it will continue to crawl the queued URLs over an extended period.
Small Sites Don’t Need to Worry About Crawl Budget
If you have a ‘reasonably’ sized site (under several thousand pages), you don’t need to worry about crawl budget.