Crawling
Before a page can be indexed (and therefore appear in search results), it must first be crawled by search engine crawlers like Googlebot. There are many things to consider in order to get pages crawled and to ensure they adhere to the correct guidelines. These are covered in our SEO Office Hours notes below, along with further research and recommendations.
For more SEO knowledge on crawling and to optimize your site’s crawlability, check out Lumar’s additional resources.
URLs in JavaScript May be Crawled
Google won’t see content that is only loaded via an onclick event, but it will find URLs inside the JavaScript code itself and try to crawl them. Content has to be loaded onto the page by default, without requiring an onclick, for Google to see it.
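As a rough sketch (the element IDs and the /fragments/more-info URL are purely illustrative), content fetched only inside a click handler is never seen by Googlebot, although the URL string embedded in the script may still be discovered and crawled, whereas content loaded by default on page load can be rendered and indexed.

```typescript
// Illustrative only: element IDs and the /fragments/more-info URL are placeholders.

// Content injected only after a click: Googlebot does not click, so this content
// is never part of the rendered page it sees. The "/fragments/more-info" string
// itself, however, may be found in the script and crawled as a URL.
document.getElementById("read-more")?.addEventListener("click", async () => {
  const res = await fetch("/fragments/more-info");
  document.getElementById("details")!.innerHTML = await res.text();
});

// Loading the same content by default (on page load, or better still server-side)
// means it is present in the rendered page and can be seen by Google.
window.addEventListener("DOMContentLoaded", async () => {
  const res = await fetch("/fragments/more-info");
  document.getElementById("details")!.innerHTML = await res.text();
});
```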
Google Identifies Boilerplate Content
John Mueller discusses how Google tries to understand the structure of pages in order to identify the standard boilerplate elements on a page.
Hidden Content Gets Less Weight
Google tries to detect content which isn’t visible when the page is rendered and gives it less weight than content which is visible.
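For example (a hypothetical tabbed product page; the element IDs are placeholders), content that exists in the DOM but stays hidden until the user interacts with the page is the kind of content this note refers to.

```typescript
// Hypothetical tab behaviour: the specs panel exists in the HTML but is hidden
// when the page is rendered, so Google may give its text less weight than the
// content that is visible on load.
const specsPanel = document.getElementById("specs-panel") as HTMLElement;
specsPanel.hidden = true; // not visible in the initially rendered page

document.getElementById("specs-tab")?.addEventListener("click", () => {
  specsPanel.hidden = false; // only becomes visible after user interaction
});
```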
Google Queues Large Volumes of New URLs
If Google discovers a section of your site with a large number of new URLs, it may queue those URLs and report an error in Search Console, but it will continue to crawl the queued URLs over an extended period.
Important URLs are Crawled Before Unimportant URLs
Google doesn’t start crawling unimportant URLs until it thinks it has crawled the important pages.
Google Ignores Content on 404 Pages But Recrawls Them
Google ignores all content on pages which return a 404 status, but will continue to crawl them periodically.
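One practical implication is that error pages should return a genuine 404 status rather than a 200 (a “soft 404”). A minimal sketch using Express (the framework, route, and page copy are assumptions, not part of the note):

```typescript
import express from "express";

const app = express();

// Catch-all for unknown URLs: return a real 404 status. Google ignores the
// body of this page for indexing, but may recrawl the URL from time to time
// to check whether it has come back.
app.use((req, res) => {
  res.status(404).send("<h1>Page not found</h1><p>Try our <a href='/'>homepage</a>.</p>");
});

app.listen(3000);
```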
HTML Sitemaps Help Indexing and Crawling
If you have a complicated website, providing an HTML sitemap of your category pages can help Google find pages and understand your site’s structure.
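A sketch of what such a mapping might look like (the categories and URLs below are invented): a plain HTML page of standard anchor links to your category pages, which crawlers can follow without executing any script.

```typescript
// Invented category data; in practice this would come from your CMS or database.
const categories = [
  { name: "Mens Shoes", url: "/mens/shoes/" },
  { name: "Womens Shoes", url: "/womens/shoes/" },
  { name: "Sale", url: "/sale/" },
];

// Render a simple HTML sitemap page: plain <a href> links that expose the
// site's category structure to crawlers.
const htmlSitemap = `<ul>
${categories.map((c) => `  <li><a href="${c.url}">${c.name}</a></li>`).join("\n")}
</ul>`;

console.log(htmlSitemap);
```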
Duplicate Content Makes Large Sites Harder to Crawl
On large websites, duplicate content makes the site harder to crawl, as crawl budget is spent fetching duplicate URLs rather than unique content.
Boilerplate Content Makes it Harder to Find Relevant Content
If your navigation is very large, it can add a lot of text to the page, which can make it harder for Google to identify the parts of the page that are relevant. Google tries to identify boilerplate elements that it can ignore, but the harder this is, the more likely it is that genuine content will not be classified as relevant.
Googlebot Doesn’t Support HTTP/2
Googlebot doesn’t support HTTP/2-only crawling, so your website still needs to work over HTTP/1.1.
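A minimal sketch of serving both protocols with Node’s http2 module (the certificate paths are placeholders): allowHTTP1 keeps the site reachable for clients that only speak HTTP/1.x, which, per this note, includes Googlebot.

```typescript
import { createSecureServer } from "node:http2";
import { readFileSync } from "node:fs";

// Placeholder certificate paths. allowHTTP1 lets the same server answer
// HTTP/1.1 requests as well as HTTP/2, so crawlers that don't negotiate
// HTTP/2 can still fetch the site.
const server = createSecureServer({
  key: readFileSync("./certs/example.com.key"),
  cert: readFileSync("./certs/example.com.crt"),
  allowHTTP1: true,
});

server.on("request", (req, res) => {
  res.writeHead(200, { "content-type": "text/html" });
  res.end("<h1>Hello</h1>");
});

server.listen(443);
```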