Crawling

Before a page can be indexed (and therefore appear within search results), it must first be crawled by search engine crawlers like Googlebot. There are many things to consider in order to get pages crawled and ensure they are adhering to the correct guidelines. These are covered within our SEO Office Hours notes, as well as further research and recommendations.

Google Doesn’t Take Into Account Browser When Crawling

Google looks at how pages are rendered by Googlebot from a desktop and mobile view but does not take into account browser type.

22 Aug 2017

JS Content Indexed Depending If Visible to Googlebot On Page Load

JavaScript content that Google can see when the page loads is indexable (e.g. tabbed content). If JavaScript only loads content after the initial page load, for example in response to a click, Googlebot will likely miss that content.

11 Aug 2017

Google Crawls Forms On Reference Sites to Discover Content

Google has ways of crawling forms on reference and document-repository sites, because it is important that it can crawl the results, which may include individual pages or documents. However, Google tends to avoid crawling through "calculator" forms (e.g. insurance rate forms) to find content, as it expects that the content behind the form can be found elsewhere on the site.

2 Aug 2017

Google Attempts to Map Non-Unicode Fonts to Unicode Version

Google tries to recognise instances where non-Unicode fonts are used and attempts to map them internally to a Unicode version of the page so that it can index a single page. This is the case for the most popular font in Burma. However, Google struggles to deal with non-Unicode fonts when they are entered in search queries.

2 Aug 2017

Place Links to Important Pages Higher in a Site Hierarchy

Make sure links to your most important pages are relatively high up in the site hierarchy, so that Googlebot and users can reach them quickly and Google can pass PageRank to them more quickly.
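
One way to check how high a page sits in the hierarchy is to measure its click depth from the homepage. The following is a minimal sketch, assuming the internal link graph has already been collected into a dictionary; the `site_links` data and the depth threshold of 3 are hypothetical and only illustrate the idea.

```python
from collections import deque

def click_depths(links, start):
    """Return the minimum number of clicks from `start` to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in a BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical site structure: page -> pages it links to.
site_links = {
    "/": ["/category", "/about"],
    "/category": ["/category/page-2", "/product-a"],
    "/category/page-2": ["/product-b"],
}

depths = click_depths(site_links, "/")
# Pages three or more clicks from the homepage are candidates for a more prominent link.
deep_pages = sorted(page for page, depth in depths.items() if depth >= 3)
```

If an important page shows up in `deep_pages`, linking to it from the homepage or a top-level category page would reduce its depth.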

7 Jul 2017

Use Nofollow to Stop Googlebot Crawling Too Far

Google recommends using internal nofollow links to stop Googlebot crawling too far in one direction, e.g. through endless calendar links, faceted navigation and pagination.
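
As a rough illustration, a template helper could add `rel="nofollow"` to internal links whose URLs look like crawl traps. This is only a sketch: the parameter names in `CRAWL_TRAP_PARAMS` and the `render_link` helper are hypothetical, and which links to nofollow depends on the site.

```python
from html import escape
from urllib.parse import urlparse, parse_qs

# Hypothetical query parameters that generate near-endless URL variations.
CRAWL_TRAP_PARAMS = {"color", "size", "sort", "month"}

def render_link(href, text):
    """Return an <a> tag, adding rel="nofollow" when the URL uses a crawl-trap parameter."""
    params = parse_qs(urlparse(href).query)
    safe_href = escape(href, quote=True)
    safe_text = escape(text)
    if CRAWL_TRAP_PARAMS.intersection(params):
        return f'<a href="{safe_href}" rel="nofollow">{safe_text}</a>'
    return f'<a href="{safe_href}">{safe_text}</a>'

faceted = render_link("/shoes?color=red", "Red shoes")
plain = render_link("/shoes", "Shoes")
```

Here `faceted` carries the nofollow attribute while `plain` is left as a normal link.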

7 Jul 2017

Google Can Take Time to See Pages with NoIndex Tag

Google can take time to drop pages with a noindex tag from its index, especially on larger sites or if the URL is blocked by robots.txt.
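
A URL that is blocked by robots.txt cannot be crawled, so a noindex tag on that page cannot be read. The sketch below uses Python's standard `urllib.robotparser` to check whether Googlebot is allowed to fetch a URL; the robots.txt content and URLs are made-up examples, and the helper name `noindex_can_be_seen` is an assumption for illustration.

```python
from urllib.robotparser import RobotFileParser

def noindex_can_be_seen(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots.txt allows `user_agent` to crawl `url`,
    which is a precondition for a noindex tag on that page being read."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt that blocks one directory for all crawlers.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

blocked = noindex_can_be_seen(robots_txt, "https://example.com/private/page.html")
allowed = noindex_can_be_seen(robots_txt, "https://example.com/public/page.html")
```

In this example `blocked` is `False`, so a noindex tag on that page would go unseen until the robots.txt rule is lifted.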

7 Jul 2017

Structured Data Shouldn’t Differ Between HTML and JavaScript

Google first crawls and indexes the raw HTML and then the rendered HTML. If the structured data differs between the two, this could be confusing for Google.
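
A simple consistency check is to extract the JSON-LD blocks from both versions of a page and compare them. The sketch below assumes the raw HTML and the rendered HTML have already been saved as strings (for example from a plain fetch and from a headless browser); the sample markup is hypothetical, and only JSON-LD is handled, not microdata or RDFa.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.in_json_ld = False
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_json_ld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_json_ld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_json_ld:
            self.blocks.append(json.loads("".join(self.buffer)))
            self.in_json_ld = False

def extract_json_ld(html):
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks

# Hypothetical raw and rendered versions of the same page.
raw_html = '<script type="application/ld+json">{"@type": "Article", "headline": "Old title"}</script>'
rendered_html = '<script type="application/ld+json">{"@type": "Article", "headline": "New title"}</script>'

raw_data = extract_json_ld(raw_html)
rendered_data = extract_json_ld(rendered_html)
mismatch = raw_data != rendered_data  # True means the two versions disagree
```

A `mismatch` of `True` flags a page whose JavaScript changes the structured data after load.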

30 Jun 2017

Google Follows 5 Steps in a Redirect Chain at a Time

Googlebot will follow up to five redirects at a time and crawl any further redirect steps at a later date to find where the chain ends up. However, Google recommends redirecting directly from the original URL to the final URL.
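
The behaviour described above can be simulated to spot chains that would need more than one crawl pass. This is a sketch of the idea only, not Googlebot's actual implementation; the `redirects` mapping is hypothetical, and in practice it would come from a crawl of the site.

```python
def resolve_redirects(redirects, url, max_hops=5):
    """Follow `redirects` from `url` for at most `max_hops` hops.

    Returns the chain of URLs visited and whether the final URL was reached.
    """
    chain = [url]
    while url in redirects and len(chain) - 1 < max_hops:
        url = redirects[url]
        chain.append(url)
    finished = url not in redirects
    return chain, finished

# Hypothetical chain with six redirect steps: /a -> /b -> ... -> /g.
redirects = {"/a": "/b", "/b": "/c", "/c": "/d", "/d": "/e", "/e": "/f", "/f": "/g"}

first_pass, done = resolve_redirects(redirects, "/a")
# The first pass stops at /f after five hops, so a later pass is needed.
second_pass, done_later = resolve_redirects(redirects, first_pass[-1])
```

Pointing `/a` straight at `/g` would let the final URL be reached in a single hop.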

30 Jun 2017

Links Within Primary Content Provide More Context but Less Weight than Sitewide Links

Googlebot differentiates between primary content and boilerplate content in headers, sidebars and footers for indexing. Links within the primary content provide more context than sitewide links, but sitewide links tend to pass more weight.

30 Jun 2017
