Crawling
Before a page can be indexed (and therefore appear in search results), it must first be crawled by search engine crawlers like Googlebot. There are many factors to consider in order to get pages crawled and to ensure your site adheres to the relevant guidelines. These are covered in our SEO Office Hours notes, along with further research and recommendations.
For more SEO knowledge on crawling and to optimize your site’s crawlability, check out Lumar’s additional resources.
Google Only Needs to Crawl Facet Pages That Include Otherwise Unlinked Products
For ecommerce sites, if Google can access and crawl all of your products through the main category page, then it won’t need to crawl any of the facet pages. However, facets should be made crawlable if they contain products that aren’t linked to from anywhere else on the site.
Use Crawlers Like Lumar to Understand Which Pages Can be Crawled
John recommends using crawlers like Lumar or Screaming Frog to understand which product pages Google can crawl on an ecommerce site.
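As a rough illustration of what these tools do, the sketch below follows internal links from a starting category URL and records every page it can reach. It assumes the requests and beautifulsoup4 packages are installed, and the start URL and page cap are placeholder values; dedicated crawlers like Lumar handle rendering, robots.txt, and scale far more thoroughly.

```python
# Minimal link-following sketch: breadth-first crawl of internal links only.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def reachable_urls(start_url, max_pages=500):
    """Follow internal links from start_url and return every URL reached."""
    domain = urlparse(start_url).netloc
    seen, queue = {start_url}, deque([start_url])
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        soup = BeautifulSoup(response.text, "html.parser")
        for anchor in soup.find_all("a", href=True):
            link = urljoin(url, anchor["href"]).split("#")[0]
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

# Example use: compare the crawled set against your full product URL list to
# find products that a link-following crawler (and therefore Googlebot) misses.
```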
Personalization is Fine For Google But US Version Will be Indexed
It is fine to personalize content for your users, but it is important to be aware that Googlebot crawls from the US and will index the content it sees on the US version of the page. John recommends keeping a sizeable amount of content consistent across all versions of the page if possible.
Google Caches CSS & JS Files so It Doesn’t Need to Continuously Fetch Them
Google caches resources such as CSS and JavaScript files so that it doesn’t have to fetch them again in the future. Combining multiple CSS files into one can help Googlebot with this, as can minifying JavaScript.
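As a loose illustration of the “combine your CSS” advice, here is a minimal build-step sketch that concatenates stylesheets into a single bundle. The directory and file names are placeholders; real projects typically use a dedicated build tool that also minifies the output.

```python
# Sketch of a simple build step that merges several CSS files into one bundle,
# so visitors and crawlers fetch (and cache) a single stylesheet.
from pathlib import Path

def bundle_css(source_dir="static/css", bundle_name="bundle.css"):
    parts = []
    for css_file in sorted(Path(source_dir).glob("*.css")):
        if css_file.name == bundle_name:
            continue  # skip a previously generated bundle
        parts.append(f"/* {css_file.name} */\n{css_file.read_text()}")
    Path(source_dir, bundle_name).write_text("\n".join(parts))

if __name__ == "__main__":
    bundle_css()
```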
A Sitemap File Won’t Replace Normal Crawling
A sitemap will help Google crawl a website, but it won’t replace normal crawling, such as URL discovery through internal linking. Sitemaps are most useful for letting Google know about changes to the pages they contain.
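To illustrate how a sitemap communicates changes, here is a small sketch that generates a sitemap.xml with <lastmod> dates using Python’s standard library. The URLs and dates are placeholder examples.

```python
# Sketch: build a sitemap.xml whose <lastmod> values signal page changes.
import xml.etree.ElementTree as ET

PAGES = [
    ("https://www.example.com/", "2023-01-15"),
    ("https://www.example.com/category/widgets", "2023-01-10"),
]

def build_sitemap(pages, path="sitemap.xml"):
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    ET.ElementTree(urlset).write(path, encoding="utf-8", xml_declaration=True)

build_sitemap(PAGES)
```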
Pages Blocking US Access Also Need to Block Googlebot to Avoid Cloaking
If you need to block content from being accessed in the US or California, then you will need to block Googlebot as well; otherwise, Google may see this as cloaking. One option is to provide some general information that can be seen by visitors in the US.
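One hedged sketch of how that could be wired up, assuming a Flask application behind a CDN that supplies a visitor-country header (Cloudflare’s CF-IPCountry is used here as an example): US visitors and Googlebot are served the same general summary, so what Google indexes matches what US users can actually see.

```python
# Sketch (Flask assumed): serve the same general-information page to US
# visitors and to Googlebot, so the crawler never sees content US users can't.
from flask import Flask, request

app = Flask(__name__)

def is_googlebot(user_agent: str) -> bool:
    return "googlebot" in (user_agent or "").lower()

@app.route("/restricted-article")
def restricted_article():
    country = request.headers.get("CF-IPCountry", "")  # CDN-supplied header
    if country == "US" or is_googlebot(request.headers.get("User-Agent")):
        # US visitors and Googlebot see identical general content, so what
        # Google indexes matches what US users can access (no cloaking).
        return "General summary available to everyone, including US visitors."
    return "Full article content for regions where it can be shown."
```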
Block Staging Sites From Being Crawled by Google
You should block Google from crawling and indexing your staging site, as staging URLs appearing in search results can cause problems. You can block access based on Googlebot’s user agent, or by using robots.txt.
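A minimal sketch of both approaches, assuming a Flask application and an APP_ENV environment variable to identify the staging environment (the names are illustrative): the staging site serves a disallow-all robots.txt and also refuses requests identifying as Googlebot.

```python
# Sketch (Flask assumed): keep a staging site out of Google by serving a
# disallow-all robots.txt and refusing requests whose user agent is Googlebot.
import os
from flask import Flask, Response, request, abort

app = Flask(__name__)
IS_STAGING = os.environ.get("APP_ENV") == "staging"

@app.before_request
def block_googlebot_on_staging():
    user_agent = (request.headers.get("User-Agent") or "").lower()
    if IS_STAGING and "googlebot" in user_agent:
        abort(403)

@app.route("/robots.txt")
def robots():
    body = "User-agent: *\nDisallow: /\n" if IS_STAGING else "User-agent: *\nAllow: /\n"
    return Response(body, mimetype="text/plain")
```

Real setups often go further and put staging behind HTTP authentication or an IP allowlist, which keeps both crawlers and the public out.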
Crawling But Not Indexing Pages is Normal for Pages with Content Already on Other Indexed Pages
It’s normal for Google to crawl URLs but not index them if they aren’t considered useful for search, such as index and archive pages whose content is already indexed on other pages. This has been the case for a long time, but these pages have become more visible recently due to the ‘Crawled – currently not indexed’ report in Search Console.
Blocking Proxy IP Addresses is Fine for Google
Choosing to block proxy IP addresses from crawling or accessing a website won’t cause any problems for SEO as long as Googlebot can crawl the site, but you may lose out on additional users discovering your website.
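If you do maintain an IP blocklist, it can help to confirm that an address is not genuine Googlebot before blocking it. The sketch below uses the reverse-then-forward DNS check that Google documents for verifying Googlebot; the surrounding blocklist logic is left as a placeholder.

```python
# Sketch: verify whether an IP really belongs to Googlebot before blocking it,
# using a reverse DNS lookup followed by a confirming forward lookup.
import socket

def is_verified_googlebot(ip: str) -> bool:
    try:
        host = socket.gethostbyaddr(ip)[0]             # reverse DNS lookup
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in socket.gethostbyname_ex(host)[2]  # forward DNS must match
    except socket.error:
        return False

# Example: only block addresses that fail verification.
# if not is_verified_googlebot(candidate_ip):
#     add_to_blocklist(candidate_ip)  # hypothetical helper in your own code
```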
Pages With Long Download Times Reduce Googlebot’s Crawl Budget
If a page takes a long time to download, this uses up Googlebot’s crawl budget, meaning it has less time to crawl other pages on your site. Look at the ‘time spent downloading a page’ report in Google Search Console to spot these issues.
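A quick way to spot-check this outside of Search Console is to time a sample of fetches yourself. The sketch below assumes the requests package; the URL list and the 1.5-second threshold are arbitrary examples, not Google thresholds.

```python
# Sketch: time page downloads for a handful of URLs and flag the slow ones.
import time
import requests

URLS = [
    "https://www.example.com/",
    "https://www.example.com/category/widgets",
]

SLOW_THRESHOLD_SECONDS = 1.5  # arbitrary example cut-off

for url in URLS:
    start = time.monotonic()
    response = requests.get(url, timeout=30)
    elapsed = time.monotonic() - start
    flag = "SLOW" if elapsed > SLOW_THRESHOLD_SECONDS else "ok"
    print(f"{flag:4} {elapsed:.2f}s {len(response.content):>8} bytes {url}")
```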