Crawling
Before a page can be indexed (and therefore appear in search results), it must first be crawled by a search engine crawler such as Googlebot. There are many factors to consider in getting pages crawled and ensuring they adhere to search engine guidelines. These are covered in our SEO Office Hours notes, along with further research and recommendations.
For more SEO knowledge on crawling and to optimize your site’s crawlability, check out Lumar’s additional resources:
Google Establishes URL Patterns on Larger Sites to Focus Crawling
Google tries to establish URL patterns on larger sites so that it can focus crawling on important pages and decide which URLs to ignore. This is done on a per-site basis; Google doesn't apply platform-specific rules, because any platform can be customised to behave differently.
URL Parameters Make it Easier for Search Engines to Crawl your Site
URL parameters make it easier for Google to understand a site's URL structure and to identify which URLs can be ignored (for example, in a URL like example.com/list?sort=price the sorting parameter is explicit). Placing the same values in the URL path, such as example.com/list/sort/price, can make it harder for Google to interpret them.
Quality Algorithms are Used to Influence Crawling and Indexing Speed
Quality algorithms are used to influence other algorithms such as those which control crawling and indexing speed.
Google Periodically Recrawls Pages with Crawl Errors
Google will sometimes retry pages that have previously returned crawl errors, even over a number of years, to make sure it isn't missing anything new. If you see old URLs showing up as crawl errors, it's not something you need to resolve.
Google Doesn’t Always Crawl Lazy-loaded Images
Google will not always crawl images which are implemented using lazy loading.
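For instance, images whose real URL only appears after JavaScript runs may never be fetched if the crawler doesn't trigger the load. A minimal sketch of the difference, using hypothetical file paths, assuming native lazy loading is acceptable for the site:

```html
<!-- JavaScript-dependent lazy loading: the real URL sits in a data-
     attribute, so a crawler that doesn't execute the script may never
     discover the image URL. -->
<img data-src="/images/product.jpg" alt="Product photo" class="lazyload">

<!-- Native lazy loading: the src is present directly in the markup, so
     the image URL is discoverable even without running JavaScript. -->
<img src="/images/product.jpg" alt="Product photo" loading="lazy">
```

If a JavaScript-based approach is required, serving the image inside a noscript fallback or via an IntersectionObserver that Googlebot can render gives the crawler a better chance of finding it.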
Google's Crawl Budget is Limited per Server
Google limits the crawl rate for sites hosted on the same server so that crawling them doesn't overload that server.
Add Images to Sitemap to Provide Google With More Information
Images can be listed for each page of your site within your sitemap, along with metadata such as captions, using the image sitemap extension.
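A minimal sketch of what such an entry can look like, using Google's image sitemap extension namespace (the URLs and caption here are hypothetical placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.example.com/products/widget</loc>
    <!-- Each image:image block describes one image found on this page -->
    <image:image>
      <image:loc>https://www.example.com/images/widget-front.jpg</image:loc>
      <image:caption>Front view of the widget</image:caption>
    </image:image>
  </url>
</urlset>
```

Multiple image:image blocks can be added per URL, which helps Google discover images it might not otherwise find during a normal crawl.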
Google Detects if a Site Completely Changes Content
If you buy an old domain and change the content, Google will treat it like a new website.
Googlebot Might Use Cookies When Required
Google tries to crawl in a stateless way, but on rare occasions Googlebot might use a cookie if the content doesn't work without one.