Crawling
Before a page can be indexed (and therefore appear in search results), it must first be crawled by search engine crawlers such as Googlebot. There are many factors to consider in getting pages crawled and ensuring they adhere to the relevant guidelines. These are covered in our SEO Office Hours notes below, along with further research and recommendations.
For more SEO knowledge on crawling and to optimize your site’s crawlability, check out Lumar’s additional resources.
Crawling Might Be Blocked by IP Blacklisting, Network Configuration, or Bot Protection
Google doesn’t block the crawling of specific domains, but your site could be hosted on an IP address that has been blacklisted, or there may be a network issue. Check whether other sites hosted on the same server are experiencing similar problems. Bot protection on the server can also cause issues by blocking or misidentifying crawlers.
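If you suspect bot protection is misidentifying Googlebot, one documented way to confirm that a request really comes from Googlebot is a forward-confirmed reverse DNS lookup. Below is a minimal Python sketch of that check; the example IP address is a placeholder, so substitute an address from your own server logs.

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Forward-confirmed reverse DNS check for Googlebot.

    1. Reverse-resolve the IP address to a hostname.
    2. Require a googlebot.com / google.com hostname.
    3. Forward-resolve that hostname and confirm it maps back to the IP.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        _, _, forward_ips = socket.gethostbyname_ex(hostname)
        return ip in forward_ips
    except (socket.herror, socket.gaierror):
        return False

# Placeholder address; replace with one seen in your own logs.
print(is_verified_googlebot("66.249.66.1"))
```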
Google Sometimes Makes Requests with If-Modified-Since Headers
Googlebot sometimes makes requests with an If-Modified-Since header, in which case responding with a 304 Not Modified is fine.
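As an illustration, here is a minimal sketch of a handler that honours If-Modified-Since, using only Python’s standard library; the timestamp and page body are placeholders.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder: when the page content last changed.
LAST_MODIFIED = datetime(2024, 1, 15, tzinfo=timezone.utc)

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        ims = self.headers.get("If-Modified-Since")
        if ims:
            try:
                # Unchanged since the client's copy: an empty 304 is fine.
                if parsedate_to_datetime(ims) >= LAST_MODIFIED:
                    self.send_response(304)
                    self.end_headers()
                    return
            except (TypeError, ValueError):
                pass  # Malformed date header: fall through to a full 200.
        body = b"<html><body>Page content</body></html>"
        self.send_response(200)
        self.send_header("Last-Modified", format_datetime(LAST_MODIFIED, usegmt=True))
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8000), Handler).serve_forever()
```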
‘Hamburger’ Menus Don’t Affect Crawling
From a crawling perspective, ‘hamburger’-style menus are fine, since the links are still present in the HTML whether or not the menu is visually expanded.
Googlebot Doesn’t See Content Changes Based on Session Information
Googlebot loads every page without any session information, so content such as titles that are generated from session state (for example, via the HTML5 History API) won’t be seen.
Mobile Interstitial Penalty Is Calculated on Recrawl
The mobile interstitial penalty is calculated in real time as pages are crawled, so changes take effect when a page is recrawled.
Whitelisting Googlebot for First Click Free Isn’t Cloaking
With First Click Free, you can whitelist the Googlebot user agent so it can see full articles, and this won’t be considered cloaking.
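A hypothetical sketch of what that whitelisting might look like in a paywall handler follows; the preview length, click allowance, and user agent check are all illustrative assumptions, and for robust Googlebot detection you would combine this with the reverse DNS check shown earlier.

```python
PREVIEW_CHARS = 300   # hypothetical preview length for paywalled readers
FREE_CLICKS = 3       # hypothetical First Click Free allowance

def article_body(user_agent: str, full_text: str, clicks_from_search: int) -> str:
    """Serve the full article to Googlebot (and to a reader's first
    few clicks from search); serve a truncated preview otherwise."""
    if "Googlebot" in user_agent or clicks_from_search < FREE_CLICKS:
        return full_text
    return full_text[:PREVIEW_CHARS] + "..."

googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(len(article_body(googlebot_ua, "word " * 500, clicks_from_search=10)))   # full text
print(len(article_body("Mozilla/5.0", "word " * 500, clicks_from_search=10)))  # preview
```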
Google Doesn’t Crawl with a Referrer or Cookies
Googlebot doesn’t include a referrer URL when crawling, and doesn’t use cookies.
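A simple way to check what this means for your pages is to fetch them the same way: with no Referer header and no cookies. Below is a minimal sketch; the URL is a placeholder, and note that sending Googlebot’s user agent string does not make the request a real Googlebot fetch.

```python
import urllib.request

# Fetch a page the way Googlebot does: no Referer header, no cookies.
req = urllib.request.Request(
    "https://example.com/",  # placeholder URL
    headers={
        "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    },
)
with urllib.request.urlopen(req) as resp:
    html = resp.read().decode("utf-8", errors="replace")

# Anything that only appears when a cookie or referrer is present
# will be missing from this response.
print(html[:200])
```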
Use Nofollow on Links to Noindex Pages to Reduce Crawling
You can add nofollow to links pointing at noindex pages to reduce the likelihood of them being crawled.
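As one way to apply this, here is a small audit sketch using Python’s standard-library HTML parser; the set of noindex paths is a hypothetical example, and in practice you would build it from your own pages’ robots directives.

```python
from html.parser import HTMLParser

# Hypothetical: paths on your site whose pages carry a noindex directive.
NOINDEX_PATHS = {"/search", "/login", "/cart"}

class NofollowAudit(HTMLParser):
    """Flags internal links to noindex pages that lack rel="nofollow"."""
    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href", "")
        rel_tokens = (attr_map.get("rel") or "").split()
        if href in NOINDEX_PATHS and "nofollow" not in rel_tokens:
            print(f'Consider rel="nofollow" on the link to {href}')

# The second link is already nofollowed, so only the first is flagged.
NofollowAudit().feed('<a href="/search">Search</a> <a href="/cart" rel="nofollow">Cart</a>')
```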
404 Content Isn’t Seen by Google
Google doesn’t look at the content of a 404 page.
Don’t Prevent Embedded File Caching
If you prevent caching of JavaScript, CSS, and image files, for example with a no-cache header, Google will need to keep re-requesting those files for rendering, which may slow down crawling of the site.
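As an illustration, here is a minimal sketch of a static file server that allows caching of embedded assets rather than forbidding it; the max-age value and file extensions are placeholders to adjust for your own site.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

class CachingStaticHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory, adding a cache-friendly
    header to static assets instead of a no-cache directive."""

    def end_headers(self):
        if self.path.endswith((".js", ".css", ".png", ".jpg", ".webp")):
            # Let Googlebot (and browsers) reuse fetched subresources when
            # rendering, instead of re-requesting them on every page.
            self.send_header("Cache-Control", "public, max-age=86400")
        super().end_headers()

if __name__ == "__main__":
    ThreadingHTTPServer(("", 8000), CachingStaticHandler).serve_forever()
```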