Crawling
Before a page can be indexed (and therefore appear in search results), it must first be crawled by a search engine crawler such as Googlebot. There is a lot to consider in getting pages crawled and ensuring they adhere to search engine guidelines. These topics are covered in our SEO Office Hours notes below, along with further research and recommendations.
For more SEO knowledge on crawling, and to optimize your site’s crawlability, check out Lumar’s additional resources.
URL Removal Tool Hides Pages But Doesn’t Impact Crawling or Indexing
The URL Removal Tool only hides a page from the search results; it doesn’t change anything about how that page is crawled or indexed.
Using International IP Redirects Will Prevent Google From Finding Other Versions of A Site
If you redirect users based on international IP addresses, Google is likely to see only the redirect to the English version and drop all of the other versions. This is because Googlebot crawls predominantly from US IP addresses, so it keeps being sent to the US/English version of the site.
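To illustrate the problem, here is a minimal sketch of this kind of IP-based redirect (the geoip-lite lookup and the locale paths are hypothetical, not a recommendation). Because Googlebot's requests come from US IPs, the branch it hits always resolves to the English version:

```typescript
import express from "express";
import geoip from "geoip-lite"; // assumed IP-to-country lookup library

const app = express();

app.get("/", (req, res) => {
  const country = geoip.lookup(req.ip ?? "")?.country ?? "US";
  const locale = country === "DE" ? "de" : country === "FR" ? "fr" : "en";
  // Googlebot crawls almost entirely from US IPs, so this branch
  // always sends it to /en/, and /de/ and /fr/ are never discovered.
  res.redirect(302, `/${locale}/`);
});

app.listen(3000);
```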
External Links Are More Critical for Initial Content Discovery & Crawling
External links are useful for helping Google find and crawl new websites, but they become less important to Google once it has already discovered the site in question.
Images Implemented Via Lazy Loading Can be Used Like Any Other Image on a Page
Images implemented via a lazy loading script can be added to structured data and sitemaps without any issues, as long as they are embedded in a way that Googlebot is able to pick up.
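One embedding Googlebot is known to pick up is the native `loading="lazy"` attribute on the `<img>` tag. For script-driven lazy loading, a minimal sketch using IntersectionObserver (which Googlebot's evergreen renderer supports) looks like this, with the real URL kept in a `data-src` attribute until the image nears the viewport:

```typescript
// Promote data-src to src when an image approaches the viewport.
const observer = new IntersectionObserver((entries, obs) => {
  for (const entry of entries) {
    if (!entry.isIntersecting) continue;
    const img = entry.target as HTMLImageElement;
    img.src = img.dataset.src ?? ""; // swap in the real image URL
    obs.unobserve(img);
  }
});

document
  .querySelectorAll<HTMLImageElement>("img[data-src]")
  .forEach((img) => observer.observe(img));
```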
Google Doesn’t Need To Be Able To Crawl The Add to Cart Pages of A Site
It is not essential for Google to be able to crawl the add-to-cart pages on e-commerce sites, so leaving them uncrawlable shouldn’t affect a site’s performance in search for purchase-intent queries.
Googlebot Does Crawl From a Handful of Regional IPs
Googlebot does crawl from a small number of regional IPs, particularly in countries where Google knows it is difficult to crawl from the US.
An Updated User Agent is Expected to Reflect The New Modern Rendering Infrastructure
Google has been experimenting with its current user agent settings and is rethinking the setup. John expects some changes to be announced in the future around an updated user agent that reflects the new, modern rendering infrastructure.
The Site Diversity Update Won’t Affect How Subdomains Are Crawled
The new change that was launched to show more diversity of sites in the search results won’t impact the way subdomains are crawled and processed; it will only impact how they are shown in the search results.
News Sites Shown in Forum Snippets Can Reformat Comment Sections or Block Comments From Crawling
If a news site is being shown in forum snippets and this is problematic for you, either reformat the comment sections in a way that demotes the importance of that content or block the comments from being crawled by Google.
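As a sketch of the second option (the endpoint path and article ID are hypothetical), comments can be fetched client-side from a URL that robots.txt disallows. Googlebot respects robots.txt for fetches made during rendering, so the comment text never becomes part of the crawled page content:

```typescript
// Assumes robots.txt contains:
//   User-agent: *
//   Disallow: /api/comments/
async function loadComments(articleId: string): Promise<void> {
  const res = await fetch(`/api/comments/${articleId}`); // blocked for Googlebot
  const comments: { author: string; body: string }[] = await res.json();
  const container = document.querySelector("#comments");
  if (!container) return;
  for (const c of comments) {
    const p = document.createElement("p");
    p.textContent = `${c.author}: ${c.body}`;
    container.appendChild(p);
  }
}

loadComments("example-article");
```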
Google Has an Upper Limit of Around 5,000 Internal Links Per Page For Crawling
Sites don’t normally exceed Google’s upper crawl limit for links on a page, as it is quite high at around 5,000 links per page. However, John recommends including only necessary internal links so that PageRank isn’t diluted too much.
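As a quick audit against that limit, a browser-console snippet along these lines counts the unique same-origin link targets in the rendered DOM (a rough sketch; it makes no distinction for nofollow or fragment-only links):

```typescript
// Count unique internal link targets in the rendered DOM.
const internalLinks = new Set(
  Array.from(document.querySelectorAll<HTMLAnchorElement>("a[href]"))
    .map((a) => a.href) // .href resolves to an absolute URL
    .filter((href) => href.startsWith(location.origin))
);
console.log(`${internalLinks.size} unique internal links on this page`);
```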