Crawl Errors
A crawl error occurs when a search engine crawler is unable to reach a page on a website, which prevents the page from appearing in search results. These errors can be site-wide or specific to individual URLs, and may arise for several reasons. Our SEO Office Hours Notes below cover how Google Search handles crawl errors, along with Google’s best practice guidance for dealing with them.
Canonicalizing Paginated Pages Back to Main Page Can Cause Crawling & Indexing Issues
Canonicalizing pages in a paginated set back to the first page can be problematic because Google may see that these are different pages and ignore the rel canonical. Alternatively, if Google does follow the rel canonical to the main page, links and content on the other pages in the set may be missed.
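To spot the pattern described above, you could check whether a paginated URL declares a canonical that points somewhere other than itself. The following is a minimal sketch using only the Python standard library; the function name `canonical_points_elsewhere` and the example URLs are hypothetical, and it assumes you already have the page HTML as a string.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attr_map.get("href")


def canonical_points_elsewhere(page_url, html):
    """Return True if the page declares a canonical URL other than itself.

    Hypothetical helper: compares the strings after trimming a trailing
    slash, so it does not resolve relative URLs or normalise parameters.
    """
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if finder.canonical is None:
        return False
    return finder.canonical.rstrip("/") != page_url.rstrip("/")
```

A page such as https://example.com/blog/?page=2 whose canonical is https://example.com/blog/ would return True here, which flags it for review rather than proving there is a problem.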
GSC New Performance Report & Old Crawl Errors Report Show Googlebot Data at Different Stages
The Performance report in the new Search Console shows the end of the search analytics pipeline, taking into account errors that have been reprocessed. It therefore shows different data from the Crawl Errors report in the old Search Console, which shows unprocessed errors.
Google Doesn’t Cache Pages With Server Errors
Googlebot doesn’t cache pages that return a server error, so you can’t use a cached copy to see what Googlebot saw for URLs listed in the GSC errors report.
Crawl Errors May Take Months to Clear
Broken URLs shown as crawl errors may be crawled infrequently, so it can take many months for them to be recrawled and dropped. Unless they are important pages that should be indexed, they can be ignored.
GSC Crawl Error Priority Ranked According to User Relevancy
Crawl error priority ranks errors by how relevant they are to users, for example favoring URLs that users are more likely to find on the site.
Temporary Performance Issues Generate Crawl Errors for Working Pages
Google sometimes reports crawl errors for pages that appear to be working when tested later, which makes the errors difficult to reproduce. You can check the server logs to see whether this was a temporary issue with server performance and whether it can be avoided in the future.
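As a rough illustration of the log check described above, the sketch below counts server errors (5xx responses) served to requests identifying as Googlebot, grouped by hour and path. It assumes the Apache/Nginx "combined" access log format, and the function name `googlebot_server_errors` is hypothetical. The user agent string can be spoofed, so treat the results as a starting point rather than proof of what Googlebot received.

```python
import re
from collections import Counter

# Assumed "combined" log format; adjust the pattern to match your server.
LOG_PATTERN = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_server_errors(lines):
    """Count 5xx responses to Googlebot-identified requests by (hour, path)."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        if "Googlebot" not in match["agent"]:
            continue
        if not match["status"].startswith("5"):
            continue
        # Timestamps look like 10/Oct/2023:13:55:36 +0000; keep date and hour.
        hour = match["time"][:14]
        counts[(hour, match["path"])] += 1
    return counts
```

A cluster of errors within a single hour, followed by normal 200 responses, would be consistent with a temporary performance problem rather than a broken page.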
Search Console Crawl Errors May Take a Year to Update
It takes time for Search Console to report crawl errors and to detect when they have changed, potentially up to a year for some URLs.
Google Periodically Recrawls Pages with Crawl Errors
Google will sometimes retry pages which have previously thrown up crawl errors, even over a number of years, to make sure they are not missing anything new. If you see old URLs showing up as crawl errors, it’s not something you need to resolve.
Google Supports URLs up to 2000 Characters
Google supports URLs up to 2000 characters.
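A quick way to apply this is to flag any URLs in a crawl export that exceed the limit. The sketch below is a minimal example; the constant `MAX_URL_LENGTH` and the function `overlong_urls` are hypothetical names, and the check simply counts characters in the URL string as given.

```python
MAX_URL_LENGTH = 2000  # the limit quoted in the note above


def overlong_urls(urls, limit=MAX_URL_LENGTH):
    """Return the URLs whose character count exceeds the limit."""
    return [url for url in urls if len(url) > limit]
```

Long URLs usually come from stacked query parameters such as filters or tracking codes, so those are the first places to look if this returns anything.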
Broken Schema Markup Will Be Ignored
If Google can’t recognise markup because it contains errors, the markup won’t be used. Markup isn’t used for rankings, though, so broken markup doesn’t affect them.
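One basic kind of markup error is JSON-LD that fails to parse. The sketch below finds script blocks of type application/ld+json in a page and reports which ones are not valid JSON. It uses only the Python standard library, the function name `broken_json_ld_blocks` is hypothetical, and it checks syntax only, so it says nothing about whether the schema types or properties are ones Google recognises.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of each <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_json_ld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False


def broken_json_ld_blocks(html):
    """Return the indexes of JSON-LD blocks that fail to parse as JSON."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    broken = []
    for index, block in enumerate(collector.blocks):
        try:
            json.loads(block)
        except json.JSONDecodeError:
            broken.append(index)
    return broken
```

A stray trailing comma or an unescaped quote is enough to make a block fail this check.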