Crawling
Before a page can be indexed (and therefore appear within search results), it must first be crawled by search engine crawlers like Googlebot. There are many factors to consider in getting pages crawled and ensuring they adhere to the correct guidelines. These are covered in our SEO Office Hours notes, along with further research and recommendations.
For more SEO knowledge on crawling and to optimize your site’s crawlability, check out Lumar’s additional resources:
404 Pages Crawled Less Than Noindex
For expired/removed content, John says that Google prefers a 404, as it results in less crawling than a noindex.
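As a rough illustration of the difference (the responses below are generic placeholders), a 404 tells Google the page is gone, while a noindex page still returns 200 and must keep being re-crawled for Google to see the directive:

    # Preferred for expired/removed content: the server answers with a 404
    HTTP/1.1 404 Not Found

    # Alternative: the page answers 200 OK but asks not to be indexed;
    # Google has to keep re-crawling it to see this directive
    HTTP/1.1 200 OK
    Content-Type: text/html

    <meta name="robots" content="noindex">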
Fetch & Render Shows Results for a Googlebot and Browser User Agent
The Fetch and Render tool shows you two different renders: one for Googlebot, fetched with the Googlebot user agent, and one for users, fetched with a browser user agent. If JS/CSS is disallowed for Googlebot, it may not be able to render all of the content in the same way.
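As a sketch of how this situation arises (the paths here are hypothetical), a robots.txt rule like the following stops Googlebot from fetching the scripts and styles a page depends on, so the Googlebot render can differ from the browser render:

    User-agent: Googlebot
    Disallow: /assets/js/
    Disallow: /assets/css/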
Clean HTML and Structured Data Helps Google Understand Content
Clean HTML and structured markup help Google better understand context.
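For example, structured data is commonly added as schema.org JSON-LD in the page head; this snippet is a generic illustration rather than a required format:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "Example Article Title",
      "author": { "@type": "Person", "name": "Jane Doe" },
      "datePublished": "2020-01-01"
    }
    </script>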
URLs in JavaScript May Be Crawled
JavaScript variables which look like URLs may be crawled, which can generate server errors. These errors can be safely ignored, or the paths can be blocked with robots.txt.
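For instance, a script may contain a string that merely looks like a crawlable path (the variable and path below are made up); if the resulting requests generate server errors, the path can be disallowed:

    // In page JavaScript: a URL-like string Googlebot may try to fetch
    var endpoint = "/api/v1/track";

    # In robots.txt: stop those crawl attempts
    User-agent: *
    Disallow: /api/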
HTML Crawling Faster Than JavaScript for Page Discovery
JavaScript processing takes longer than pure HTML crawling, so it isn’t suitable for fast discovery of pages. John says ‘it takes another cycle or two longer to process’.
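To illustrate the difference, a plain HTML link can be discovered on the first crawl pass, while a link injected by JavaScript is only found after the page is rendered, a cycle or two later:

    <!-- Discovered immediately from the raw HTML -->
    <a href="/new-article">New article</a>

    <!-- Only discovered after rendering -->
    <script>
      document.body.insertAdjacentHTML(
        "beforeend",
        '<a href="/new-article">New article</a>'
      );
    </script>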
Image Re-Crawling Takes Longer After a URL Change
Images are not crawled very frequently, so when you migrate them to new URLs/domains, re-crawling will take a lot longer than for pages, perhaps months.
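When migrating images, permanent redirects from the old URLs are the standard way to help Google pick up the change; a minimal Apache example (the domain and path are placeholders):

    Redirect 301 /images/product-photo.jpg https://cdn.example.com/images/product-photo.jpg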
Wildcard Subdomain Configuration Causes Crawl Issues
Using wildcard subdomains can make a site difficult to crawl, as every hostname resolves and crawlers can discover an effectively unbounded set of subdomains.
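A wildcard DNS record is what creates this situation: any subdomain a crawler encounters will resolve and return a page. A BIND-style example (the address is a documentation placeholder):

    ; shop.example.com, blog.example.com, and any typo all resolve here
    *.example.com.   3600   IN   A   192.0.2.1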
CSS and JS Crawling Is Important for Mobile Compatibility
Allowing your CSS and JavaScript files to be crawled does affect desktop pages, but it is more important for mobile pages, as Google needs to render them to test for mobile compatibility.
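One way to make sure CSS and JavaScript stay crawlable is to allow them explicitly in robots.txt; Google supports the * and $ wildcards used in this sketch:

    User-agent: Googlebot
    Allow: /*.css$
    Allow: /*.js$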
Noindex Pages Can’t Accumulate PageRank
Noindex pages can’t accumulate PageRank for the site, even though the pages can be crawled, so this isn’t an advantage over disallowing them.
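For reference, this is the directive in question; the page remains fetchable, but it is kept out of the index and, per John’s point, accumulates no PageRank for the site:

    <meta name="robots" content="noindex">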
Disallowed URLs Don’t Pass PageRank
If a URL is disallowed in robots.txt, it won’t be crawled, and therefore can’t pass any PageRank.
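In contrast to noindex, a disallow rule stops the fetch itself (the path below is hypothetical), so Google never sees the page or the links on it:

    User-agent: *
    Disallow: /private-section/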