Crawling
Before a page can be indexed (and therefore appear within search results), it must first be crawled by search engine crawlers like Googlebot. There are many things to consider in order to get pages crawled and to ensure they adhere to the correct guidelines. These are covered within our SEO Office Hours notes, along with further research and recommendations.
For more SEO knowledge on crawling and to optimize your site’s crawlability, check out Lumar’s additional resources:
Use a Crawling Tool to Assess & Compare a Site Before & After a Migration
When planning a site migration, John recommends using a crawling tool to get a full picture of your site's status and signals (such as internal linking and canonicals) both before and after the migration, so the two states can be compared.
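As a rough illustration of that before/after comparison, the sketch below assumes you have two CSV exports from your crawling tool (the url, canonical_url and indexable column names are placeholders) and flags URLs whose signals changed after the migration:

```python
import csv

def load_crawl(path):
    """Load a crawl export as a dict keyed by URL (column names are assumptions)."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["url"]: row for row in csv.DictReader(f)}

# Placeholder filenames; adjust column names to match your crawling tool's export.
before = load_crawl("crawl_before_migration.csv")
after = load_crawl("crawl_after_migration.csv")

for url, old in before.items():
    new = after.get(url)
    if new is None:
        print(f"Missing after migration: {url}")
    elif new.get("canonical_url") != old.get("canonical_url"):
        print(f"Canonical changed: {url}")
    elif new.get("indexable") != old.get("indexable"):
        print(f"Indexability changed: {url}")
```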
Videos Blocking Googlebot May Still be Crawled and Indexed
Even if you block Googlebot from crawling a video, a video snippet may still appear in search if the video file is embedded from a different location, if some Google datacentres haven't yet seen the updated version, or if the video URL has parameters attached.
Internal Linking Causes Google to Crawl Canonicalised Pages
If you see Google crawling canonicalised pages, check your internal linking, as internal links pointing at those URLs can cause Google to keep crawling them.
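One hedged way to spot this is to fetch a page's internal links and check whether any destination declares a different canonical URL. The sketch below uses the requests and BeautifulSoup libraries; the start URL is a placeholder and only a single page is audited:

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

START_URL = "https://www.example.com/"  # placeholder page to audit

host = urlparse(START_URL).netloc
html = requests.get(START_URL, timeout=10).text
links = {urljoin(START_URL, a["href"])
         for a in BeautifulSoup(html, "html.parser").find_all("a", href=True)}

for link in sorted(links):
    if urlparse(link).netloc != host:
        continue  # only audit internal links
    page = BeautifulSoup(requests.get(link, timeout=10).text, "html.parser")
    canonical = page.find("link", rel="canonical")
    if canonical and canonical.get("href") and urljoin(link, canonical["href"]) != link:
        print(f"Internal link to canonicalised page: {link} -> {canonical['href']}")
```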
It’s Normal to See Fluctuations in GSC ‘Time Spent Downloading a Page’ Report
Seeing fluctuations in the ‘Time Spent Downloading a Page’ report in GSC is perfectly normal, as Googlebot sometimes discovers new areas of a site to crawl and can decide to crawl more URLs.
Googlebot Doesn’t Replay Cookies
If you provide cookies to Googlebot, it won't send them back when it returns to crawl your site. Bear this in mind when using cookies to group users for A/B testing, and make sure Googlebot is always put in the same group.
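Because cookies won't persist between Googlebot's visits, one approach (a sketch only, not something prescribed in the session) is to detect Googlebot by user agent and always serve it the same variant, while regular users keep a cookie-based bucket:

```python
import random

def ab_variant(user_agent: str, cookies: dict) -> str:
    """Pick an A/B test variant, pinning Googlebot to a fixed group.

    Googlebot does not replay cookies between crawls, so a cookie-based
    bucket would reshuffle it on every visit.
    """
    if "Googlebot" in user_agent:
        return "A"  # always the same group for Googlebot
    if "ab_variant" in cookies:
        return cookies["ab_variant"]  # returning users keep their bucket
    return random.choice(["A", "B"])  # new users get a bucket; persist it in a cookie
```

In production you would also want to verify Googlebot (for example via reverse DNS lookup) rather than trusting the user-agent string alone.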
Use ‘Validate’ Option In The New Search Console to Get Pages Recrawled
You can get Google to recrawl your pages by going to the indexing report in the new Search Console and requesting validation of the issues; Google will then recrawl these pages to check that they have been fixed.
Google Will Crawl Sitemaps That Have Been Removed from GSC
Removing an old sitemap file from GSC isn't enough to prevent it from being crawled; you also need to remove it from the server so Google can't find and crawl it. John recommends fixing the sitemap file if possible, though.
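To confirm the old sitemap really is gone from the server, its URL should return a 404 or 410 rather than the stale file. A minimal check with the requests library (the sitemap URL is a placeholder):

```python
import requests

OLD_SITEMAP_URL = "https://www.example.com/old-sitemap.xml"  # placeholder

resp = requests.get(OLD_SITEMAP_URL, timeout=10)
if resp.status_code in (404, 410):
    print("Old sitemap no longer exists on the server.")
else:
    print(f"Still reachable (HTTP {resp.status_code}); Google can keep finding and crawling it.")
```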
Ensure All Product Pages Can be Crawled With Considered Use of Noindex
eCommerce sites with faceted navigation should be careful about which pages are noindexed, because this can make it difficult for Googlebot to crawl individual product pages (e.g. noindexing all category pages). Webmasters might instead consider noindexing specific facets, or deciding that everything after a certain number of pages in a paginated set should be noindexed.
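As one way the "noindex everything after a certain number of pages" idea might be implemented, the sketch below chooses a robots meta tag based on the page number within a paginated facet; the threshold of 5 pages is an arbitrary placeholder:

```python
MAX_INDEXABLE_PAGES = 5  # arbitrary placeholder threshold

def robots_meta_for_listing(page_number: int) -> str:
    """Return a robots meta tag for a faceted/paginated listing page.

    Early pages stay indexable so product pages remain reachable through
    them; deeper pages in the paginated set are noindexed.
    """
    if page_number > MAX_INDEXABLE_PAGES:
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'
```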
Small to Medium-Sized Sites Don’t Have to Worry About Crawl Budget
Sites with 'a couple hundred thousand pages' or fewer don't need to worry about crawl budget; Google will be able to crawl them just fine.
Google Will Remember & Recrawl Noindexed Pages
Google will remember noindexed pages and continue to recrawl them, so they should be removed from your sitemaps.
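A hedged way to audit a sitemap for noindexed URLs (so they can be removed) is sketched below, assuming a standard sitemap XML; the sitemap URL is a placeholder and the meta-tag check is deliberately crude:

```python
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
for loc in root.findall(".//sm:loc", NS):
    url = loc.text.strip()
    resp = requests.get(url, timeout=10)
    noindex_header = "noindex" in resp.headers.get("X-Robots-Tag", "").lower()
    noindex_meta = 'name="robots"' in resp.text.lower() and "noindex" in resp.text.lower()
    if noindex_header or noindex_meta:
        print(f"Noindexed URL listed in the sitemap (consider removing): {url}")
```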