
Crawling

Before a page can be indexed (and therefore appear within search results), it must first be crawled by search engine crawlers like Googlebot. There are many things to consider in order to get pages crawled and ensure they are adhering to the correct guidelines. These are covered within our SEO Office Hours notes, as well as further research and recommendations.

For more SEO knowledge on crawling and to optimize your site’s crawlability, check out Lumar’s additional resources:

Use the Mobile-Friendly Test to Check if Googlebot Can Access a Page

Use the Mobile-Friendly Test as an easy check to see whether Googlebot can access a page. The test fetches the page with a Googlebot user agent and shows you a screenshot of what was found.
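
As a rough supplementary check from your own machine, you can request a page while sending a Googlebot user-agent string and confirm it comes back successfully rather than as a block page. This is only an approximation (the URL and function below are illustrative, and the real test also renders the page and crawls from Google's own IPs), so treat the Mobile-Friendly Test as the final word.

```python
import urllib.request

GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

def fetch_as_googlebot(url: str) -> int:
    """Request a URL with a Googlebot user-agent header and return the HTTP status.

    A quick sanity check only: it won't reveal robots.txt rules, rendering
    issues, or IP-based blocking the way the Mobile-Friendly Test does.
    """
    request = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


print(fetch_as_googlebot("https://example.com/"))  # expect 200 if the page is reachable
```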

20 Oct 2017

Show Paywalled Content to Googlebot Based on User Agent & IP Lookup

It’s OK to serve paywalled pages to Googlebot based on user agent, provided the paywalled sections are identified with class names and schema markup. You can also combine user agent detection with an IP lookup to recognise when it is really Googlebot looking at a page as opposed to another crawler.
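
A minimal sketch of that IP lookup, using the double DNS check Google describes for verifying Googlebot: reverse-resolve the requesting IP, check the hostname sits under googlebot.com or google.com, then forward-resolve it and confirm it points back to the same IP. The function names and the subscriber/paywall logic are illustrative.

```python
import socket

def is_verified_googlebot(client_ip: str) -> bool:
    """Confirm an IP really belongs to Googlebot via reverse + forward DNS lookups."""
    try:
        hostname, _, _ = socket.gethostbyaddr(client_ip)      # reverse lookup
    except socket.herror:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)   # forward lookup
    except socket.gaierror:
        return False
    return client_ip in addresses


def should_serve_full_article(user_agent: str, client_ip: str, is_subscriber: bool) -> bool:
    """Serve the full (normally paywalled) article to subscribers and to verified Googlebot."""
    claims_googlebot = "Googlebot" in user_agent
    return is_subscriber or (claims_googlebot and is_verified_googlebot(client_ip))
```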

20 Oct 2017

Google Mainly Uses GET Requests For Normal Crawling & Indexing

Google pretty much only uses GET requests for normal crawling and indexing. That doesn’t mean you’ll never see POST and HEAD requests in your server logs, but they are likely to be much rarer.
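
As an illustration, a small script like the one below can tally which HTTP methods Googlebot is actually using in your access logs. The log path and the combined-log-format pattern are assumptions about your setup.

```python
import re
from collections import Counter

# Assumes a combined-format access log line such as:
# 66.249.66.1 - - [17/Oct/2017:10:00:00 +0000] "GET /page HTTP/1.1" 200 1234 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) \S+ [^"]+" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"')

def googlebot_method_counts(log_path: str) -> Counter:
    """Count HTTP methods on requests whose user agent mentions Googlebot."""
    counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = LOG_LINE.search(line)
            if match and "Googlebot" in match.group("agent"):
                counts[match.group("method")] += 1
    return counts


print(googlebot_method_counts("access.log"))  # expect GET to dominate, with POST/HEAD much rarer
```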

17 Oct 2017

Google Does Some Scrolling on Pages

Google does some scrolling on a page to make sure there is nothing that would otherwise be missed.

17 Oct 2017

Blocking US IPs is Likely to Block Googlebot, so Having at Least Some US-Accessible Content is Recommended

Google crawls almost entirely from the US, plus only a handful of other countries. If you block US IP addresses you are probably blocking Googlebot, but you can test this with Fetch & Render or by checking your log files. John recommends keeping at least some content accessible from the US, so that Googlebot and US users can reach your site.

3 Oct 2017

Include a Shared Content Block For Pages That Vary Depending on Location

If the content served varies depending on location, John recommends including a shared content block across all variations, as Google primarily crawls from IP addresses geolocated to San Francisco.
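
A minimal illustration of that recommendation (the block names and country codes are hypothetical): every location-specific variant is composed from the same shared block plus a local block, so the US variant that Googlebot fetches still contains the content you want indexed.

```python
# Content that every regional variant includes, and therefore the content
# Googlebot will see when crawling from US-geolocated IPs.
SHARED_BLOCK = "<section>Product overview, specs and editorial copy shared by all regions.</section>"

LOCAL_BLOCKS = {
    "us": "<section>US pricing in USD and a US store locator.</section>",
    "gb": "<section>UK pricing in GBP and a UK store locator.</section>",
}

def render_page(country_code: str) -> str:
    """Compose the page from the shared block plus a location-specific block."""
    local = LOCAL_BLOCKS.get(country_code.lower(), LOCAL_BLOCKS["us"])
    return SHARED_BLOCK + local
```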

22 Sep 2017

Google May Implement HTTP/2 Crawling as Sites Start Adopting Functionality

Google doesn’t currently crawl with HTTP/2: Googlebot isn’t like a browser, so it wouldn’t see the same speed benefits, although it would be able to cache things differently. Google engineers may decide to implement HTTP/2 for Googlebot as more sites adopt HTTP/2 functionality such as server push.

19 Sep 2017

Panda is Continuous But Doesn’t Run On Crawl

Panda runs continuously rather than to a timetable, but it does take some time to collect the relevant signals. John assumes you would see its effects once Google has reprocessed the bulk of a website, and how often that happens varies from site to site.

19 Sep 2017

Only Change URLs When Absolutely Necessary as it Can Cause a Drop in SERPs

John recommends against removing old-fashioned URL suffixes, such as .html, as Google will treat the resulting URLs as new and will have to recrawl and reindex them while learning the new structure. This will lead to a significant dip in the SERPs for a period of time until the URLs have been recrawled and reindexed.

25 Aug 2017

For A/B Testing, Show Googlebot the Version Most Users Will See

When A/B testing, Google recommends showing Googlebot the version that most users are seeing. If running a 50/50 test, it is up to webmasters which version to show Googlebot, but Google recommends against randomly varying the displayed version, as this makes it difficult for Google to index the page.
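
One way that advice might look in practice (a sketch only; the variant names, traffic split, and user-agent-based Googlebot detection are assumptions): bucket real users deterministically so each visitor keeps seeing the same variant, and always give Googlebot the majority variant instead of re-rolling on every request.

```python
import hashlib

TRAFFIC_SHARE_B = 0.10        # 10% of users see variant "b", so "a" is what most users see
GOOGLEBOT_VARIANT = "a"       # show Googlebot the majority variant

def pick_variant(user_id: str, user_agent: str) -> str:
    """Assign an A/B variant: stable per visitor, and never random for Googlebot."""
    if "Googlebot" in user_agent:
        return GOOGLEBOT_VARIANT
    # Hash the visitor ID so the same visitor always lands in the same bucket.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "b" if bucket < TRAFFIC_SHARE_B * 100 else "a"
```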

22 Aug 2017
