Crawling
Before a page can be indexed (and therefore appear in search results), it must first be crawled by search engine crawlers like Googlebot. There are many things to consider in order to get pages crawled and to ensure they adhere to search engine guidelines. These are covered in our SEO Office Hours notes, along with further research and recommendations.
For more SEO knowledge on crawling and to optimize your site’s crawlability, check out Lumar’s additional resources.
Combine Separate CSS, JS & Tracking URLs to Increase Googlebot Requests to Your Server
To improve site speed and allow Googlebot to send requests to your server more frequently, reduce the number of separate resource URLs that need to be loaded. For example, combine your CSS files into a single URL, or into as few URLs as possible.
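As a rough sketch (the filenames here are hypothetical), the idea is simply to reference one combined file where several were referenced before:

```html
<!-- Before: three separate stylesheet requests for every page load -->
<link rel="stylesheet" href="/css/base.css">
<link rel="stylesheet" href="/css/layout.css">
<link rel="stylesheet" href="/css/theme.css">

<!-- After: one combined stylesheet, so fewer sub-resource URLs need to be fetched -->
<link rel="stylesheet" href="/css/bundle.css">
```

The same principle applies to JavaScript and tracking scripts: fewer distinct URLs per page means fewer requests spent on sub-resources.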
Test How Search Engines Can Crawl Internal Linking Using Crawling Tools
John recommends using tools like Lumar to test how your internal linking is set up and whether there are any technical issues which would prevent Googlebot from crawling certain pages on your website.
‘Discovered – Currently not indexed’ GSC Report Pages Have No Value for Crawling & Indexing
Google knows about pages in the ‘Discovered – currently not indexed’ report in Google Search Console but hasn’t prioritised them for crawling and indexing. This is usually due to internal linking and content duplication issues.
Update Last Modified Date in Sitemap & Use Validate Fix in GSC to Get Pages Crawled Sooner
If technical issues cause pages to show incorrectly (e.g. serving a blank page), you can get Googlebot to recrawl these pages sooner by submitting sitemap files with the last modification date set to when the affected pages were restored. You can also click ‘Validate Fix’ on pages with errors in Search Console to get Googlebot to recrawl them faster.
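A minimal sitemap sketch (the URL and date are hypothetical) showing the <lastmod> element you would update for each restored page:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- Hypothetical page that was serving a blank response and has since been fixed -->
    <loc>https://www.example.com/affected-page/</loc>
    <!-- Set to the date the page was restored, signalling it is worth recrawling -->
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>
```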
Googlebot is Limited to Crawling a Couple of Hundred MB for Each HTML Page
Most sites shouldn’t need to worry about their pages being too large for Google to crawl, as John explained that the cut-off size for each page’s HTML is a couple of hundred MB.
Googlebot Doesn’t Use Sites’ Internal Search Features to Find Pages
Googlebot doesn’t know what to search for on a site, so it doesn’t use a site’s internal search for content discovery. The rare exception is when a site isn’t otherwise crawlable and pages can only be discovered through internal search.
For Mobile-first, Ranking Fluctuations Are Caused by Google Recrawling and Reprocessing a Site
If a site experiences ranking fluctuations after being switched to mobile-first indexing, this is because Google will need to recrawl and reprocess the site to update the index.
Block Ads From Being Crawled to Avoid Ranking For Unintended Queries
Ads which are inline with the main text of a page can be picked up by Google as part of the content of that page. This could cause the page to rank for queries related to the text in the ad. John recommends preventing the ads from passing PageRank and serving them via JavaScript that is blocked in robots.txt, so the ad content isn’t crawled as part of the page.
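A minimal robots.txt sketch of this approach, assuming (hypothetically) that the ads are injected by scripts served from an /assets/ads/ directory on your own domain:

```txt
# Disallow the directory that serves the ad JavaScript, so Googlebot cannot fetch
# the ad content and fold its text into the page's indexed content.
User-agent: *
Disallow: /assets/ads/
```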
Google Indirectly Interprets Charts & Graphs to Understand Context
Google doesn’t interpret charts and graphs to check whether the numbers or information they show are useful and correct. However, it collects indirect signals (such as the text on the page, titles, descriptions, alt text, etc.) to understand the context of the page.
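For example (a hypothetical snippet), descriptive alt text and surrounding copy are the kind of indirect signals Google can use in place of reading the chart itself:

```html
<!-- Hypothetical chart image: the alt text describes what the chart shows,
     giving Google context it cannot extract from the graphic directly -->
<img src="/images/organic-traffic-2023.png"
     alt="Line chart of monthly organic traffic in 2023, rising from 10,000 to 25,000 sessions">
```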
Google Crawls Using Local IP Addresses For Countries Where US IPs Are Frequently Blocked
Google will crawl with local IP addresses, particularly for countries where US IP addresses are frequently blocked, e.g. South Korea.