Indexing
In order for web pages to be included within search results, they must be in Google’s index. Search engine indexing is a complex topic and is dependent on a number of different factors. Our SEO Office Hours Notes on indexing cover a range of best practices and compile indexability advice Google has released in their Office Hours sessions to help ensure your website’s important pages are indexed by search engines.
Mixed Migrations May Cause Google to Index HTTP or HTTPS URLs
Forgetting to update your sitemap files following an HTTPS migration could cause some pages to be indexed with the HTTP URL and others with the HTTPS URL.
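One way to catch leftover HTTP entries after a migration is to scan the sitemap for them. The following is a minimal sketch, not an official tool: the function name `find_http_urls` and the sample sitemap are made up for illustration, and it assumes a standard `urlset` sitemap using the sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def find_http_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value in the sitemap that still uses http://."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    return [url for url in locs if url.lower().startswith("http://")]


# Hypothetical sitemap content, for illustration only.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>http://www.example.com/old-page</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in find_http_urls(EXAMPLE_SITEMAP):
        print("Still HTTP:", url)
```

Any URL this reports is a candidate for updating to its HTTPS equivalent before the sitemap is resubmitted.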
Privacy Policy and Terms of Service Pages Should be Indexable
Privacy policy and terms of service are normal content that people might want to find in search, so they should be indexable.
Geo-targeting Doesn’t Restrict Pages to a Specific Country
Geo-targeting in Search Console tells Google that a page is more relevant for a specific country, so it may rank higher for local search queries there. It does not mean the page will be removed from results in other countries.
Old Pages Can Still Rank If the Content Is Useful
Old sites can still be useful and rank in search even if they haven’t been updated in years, as long as they are still relevant. Pages can still appear in search even if they aren’t mobile friendly.
Show Paywalled Content to Googlebot Based on User Agent & IP Lookup
It’s OK to show Googlebot paywalled pages, with the appropriate class names and schema markup, based on its user agent. You can also combine that with an IP lookup to recognise when Googlebot, as opposed to another crawler, is looking at a page.
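The user agent plus IP lookup check can be sketched roughly as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function name `is_verified_googlebot` is hypothetical, the hostname suffixes reflect the reverse DNS domains Google has published for Googlebot, and the default resolvers only handle IPv4. The resolvers are passed in as parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable

# Assumption: reverse DNS for genuine Googlebot resolves under these domains.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse_dns(ip: str) -> str:
    """Resolve an IP address to its primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward_dns(hostname: str) -> list[str]:
    """Resolve a hostname to its IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(
    user_agent: str,
    ip: str,
    reverse_lookup: Callable[[str], str] = _reverse_dns,
    forward_lookup: Callable[[str], list[str]] = _forward_dns,
) -> bool:
    """Check the user agent, then confirm the IP with reverse and forward DNS."""
    if "googlebot" not in user_agent.lower():
        return False
    try:
        hostname = reverse_lookup(ip).lower().rstrip(".")
        if not hostname.endswith(GOOGLE_CRAWLER_SUFFIXES):
            return False
        # The hostname must resolve back to the same IP to rule out spoofing.
        return ip in forward_lookup(hostname)
    except OSError:
        # Covers socket.herror and socket.gaierror: treat as unverified.
        return False
```

A request that passes only the user agent check could be any crawler borrowing Googlebot’s name, which is why the DNS round trip is included.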
May Take Time to Index Content for Single Page App Setup While Google Picks up JS Rendered Version
Google indexes the HTML version of a page first, then the rendered version. John says that in future these two steps will happen at more or less the same time. The difference may be more noticeable with a single page app setup, where the same HTML file with no content is served for every page and the content is only picked up later through JavaScript rendering.
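To get a rough sense of what is available before rendering, you can look at how much visible text the raw HTML response contains. The sketch below is only an approximation of that idea: the helper `visible_text` and the list of skipped tags are assumptions made for illustration, and it says nothing about how Google itself processes a page.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text that is present in the HTML without running JavaScript."""

    # Tags whose text is not page body content (an illustrative choice).
    NON_CONTENT_TAGS = {"script", "style", "noscript", "template", "title"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.NON_CONTENT_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.NON_CONTENT_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(raw_html: str) -> str:
    """Return the text content found in the un-rendered HTML."""
    parser = _VisibleText()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical responses: an empty single page app shell and a server-rendered page.
SPA_SHELL = (
    "<!doctype html><html><head><title>App</title></head>"
    '<body><div id="root"></div><script src="/bundle.js"></script></body></html>'
)
SERVER_RENDERED = "<html><body><h1>Pricing</h1><p>Plans start at $10.</p></body></html>"
```

An empty result for the shell suggests the page’s content depends entirely on JavaScript rendering, so it may take longer to be picked up.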
Tabbed Content Loaded On-Click Won’t Be Indexed
Content in tabs is fine for mobile as long as it is loaded when the page is loaded and not when the tab is clicked on. Otherwise it won’t be indexed.
Google Mainly Uses GET Request For Normal Crawling & Indexing
Google pretty much only uses GET requests for normal crawling and indexing. That doesn’t mean you’ll never see POST and HEAD requests in your server logs, but they’re probably a lot rarer.
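If you want to see this split in your own logs, a tally of request methods for Googlebot hits is a simple starting point. This is a minimal sketch that assumes access logs in the common combined format, with the request line in quotes; the function name `googlebot_method_counts` and the sample lines are invented for illustration, and matching on the user agent string alone does not prove a request came from Google.

```python
import re
from collections import Counter

# Matches the quoted request line, e.g. "GET /page HTTP/1.1".
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) \S+ HTTP/[\d.]+"')


def googlebot_method_counts(log_lines) -> Counter:
    """Count HTTP methods on log lines whose user agent mentions Googlebot."""
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group("method")] += 1
    return counts


GOOGLEBOT_UA = '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'

# Made-up log lines in combined log format.
SAMPLE_LOG = [
    f'66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /page HTTP/1.1" 200 512 "-" {GOOGLEBOT_UA}',
    f'66.249.66.1 - - [10/Oct/2023:13:56:01 +0000] "GET /other HTTP/1.1" 200 734 "-" {GOOGLEBOT_UA}',
    f'66.249.66.1 - - [10/Oct/2023:13:57:12 +0000] "HEAD /file.pdf HTTP/1.1" 200 0 "-" {GOOGLEBOT_UA}',
    '203.0.113.9 - - [10/Oct/2023:13:58:40 +0000] "POST /login HTTP/1.1" 302 0 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for method, count in googlebot_method_counts(SAMPLE_LOG).most_common():
        print(method, count)
```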
Prevent Infinite Scroll Content Being Indexed by Blocking Onscroll Script with Robots.txt
If you need to prevent content loaded on scroll from being indexed, as with pages using infinite scroll, put the script that’s executed by the onscroll event behind a robots.txt block.
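Before relying on a rule like this, it is worth checking that it actually blocks the script and nothing else. The sketch below uses Python’s standard `urllib.robotparser` for that. The script path `/assets/js/infinite-scroll.js` and the helper `is_blocked` are hypothetical, and the standard library parser follows the basic robots.txt rules rather than every matching extension Google supports, so it is best used with exact paths like this one.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the script run on scroll.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/js/infinite-scroll.js
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules disallow the URL for this agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/assets/js/infinite-scroll.js",
        "https://www.example.com/articles/",
    ):
        print(url, "blocked" if is_blocked(ROBOTS_TXT, "Googlebot", url) else "allowed")
```

Keeping the rule scoped to the one script means the rest of the page’s resources stay crawlable.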
Most HTTPS Migrations Take a Day to Change in Index
An HTTPS migration is easier for Google to process than most other types of migration because it keeps the same domain and the same URLs. If a site is restructured with changes to internal linking or the domain name, Google has a lot more to think about. HTTPS is still a big change that takes time for Google to process, though: most migrations take a day or so to switch over in Google’s index.