Indexing
In order for web pages to be included within search results, they must be in Google’s index. Search engine indexing is a complex topic and is dependent on a number of different factors. Our SEO Office Hours Notes on indexing cover a range of best practices and compile indexability advice Google has released in their Office Hours sessions to help ensure your website’s important pages are indexed by search engines.
Large Numbers of Noindex Pages are OK
Having a large number of noindexed pages is fine, provided you don’t want those pages to appear in search results.
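When a site carries many noindex pages, it can help to audit which pages actually declare the directive. The following is a minimal sketch, not an official tool: the function name `has_noindex` and the parser class are hypothetical, and it only handles the plain `robots`/`googlebot` meta tag and an unprefixed `X-Robots-Tag` header value.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            content = attr_map.get("content", "")
            # Directives are comma separated, e.g. "noindex, follow".
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html, headers=None):
    """Return True if the page HTML or its response headers declare noindex.

    Simplification: header values prefixed with a user agent
    (e.g. "googlebot: noindex") are not interpreted by this sketch.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = set(parser.directives)
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            directives.update(
                part.strip().lower() for part in value.split(",") if part.strip()
            )
    return "noindex" in directives
```

Running this over a crawl export gives a list of noindexed URLs that can be checked against the pages you expect to rank.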
Duplicate Content Filtering is Query Dependent
Duplicate content may still be indexed but filtered out of search results for queries where it would result in an identical snippet.
Noindex Pages are Dropped Immediately After they are Processed
Noindex pages are dropped from the index immediately after they are processed; however, processing can take some time to complete due to technical issues with Google.
Videos Hosted on Google Drive ‘should’ be Indexable
John thinks that Google will index videos hosted on Google Drive and show a video snippet for the page that links to them.
Iframes Can be Used to Block Content
Iframes can be used to block content from being crawled.
RSS with Pubsubhubbub to get URLs Indexed
You can use an RSS feed with Pubsubhubbub to ping Google with new URLs as an alternative to Sitemaps.
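As a rough illustration of what such a ping looks like, the sketch below builds the publish notification described in the PubSubHubbub specification: a form-encoded POST with `hub.mode=publish` and `hub.url` set to the feed URL. The hub address (Google's public hub) and the feed URL are assumptions for illustration, and the function only constructs the request rather than sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Google's public hub. Use whichever hub your feed declares.
HUB_URL = "https://pubsubhubbub.appspot.com/"


def build_publish_ping(feed_url, hub_url=HUB_URL):
    """Build (but do not send) a PubSubHubbub publish notification for a feed."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical feed URL; pass the request to urllib.request.urlopen() to send it.
ping = build_publish_ping("https://example.com/feed.xml")
```

The hub then fetches the feed and notifies subscribers, so the feed itself still needs to list the new URLs.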
Google Learns Which URL Parameters Return Irrelevant Pages
Google learns which parameters are returning irrelevant pages partly based on canonicalised URLs.
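Google's actual method is not described here, but the kind of signal involved can be approximated on your own data. The sketch below is purely an illustration under that assumption: given pairs of crawled URL and declared canonical (a hypothetical input format), it counts which query parameters are present on the crawled URL but dropped in the canonical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def dropped_parameters(pairs):
    """Count parameters that appear in a URL but not in its canonical URL.

    `pairs` is an iterable of (url, canonical_url) tuples.
    """
    counts = Counter()
    for url, canonical in pairs:
        url_params = set(parse_qs(urlsplit(url).query, keep_blank_values=True))
        canonical_params = set(
            parse_qs(urlsplit(canonical).query, keep_blank_values=True)
        )
        counts.update(url_params - canonical_params)
    return counts
```

Parameters with a high count are consistently canonicalised away, which suggests they do not change the page content in a meaningful way.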
Use Sitemap Index Count to Measure Indexing
A site: query is not an accurate measure of indexed pages, and could be off by ‘orders of magnitude’. The Sitemap Index count in Search Console is accurate for the specific URLs submitted.
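To put the Search Console figure in context, you can count the URLs you actually submitted and compare the two numbers. This is a minimal sketch assuming a standard sitemaps.org `urlset` file; the function names are hypothetical, and the indexed count has to be read from Search Console by hand.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def count_sitemap_urls(sitemap_xml):
    """Count <url><loc> entries in a sitemap document given as a string."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall("sm:url/sm:loc", SITEMAP_NS))


def indexed_ratio(submitted, indexed):
    """Share of submitted URLs reported as indexed (0.0 if nothing was submitted)."""
    return indexed / submitted if submitted else 0.0
```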
There Is No Way to See Which AMP Pages Are Indexed in Google
AMP pages don’t appear in the index, so site: queries won’t show them. Search Console will only show you a total count and a list of pages with problems. John suggests using analytics data to find pages with visits from Google and infer that those pages are indexed.
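The analytics approach can be sketched as a simple filter over an export. The column names (`page`, `source`, `medium`, `sessions`) are assumptions about a hypothetical CSV export and will differ between analytics tools.

```python
import csv
import io


def pages_with_google_visits(csv_text, min_sessions=1):
    """Return pages that received organic visits from Google.

    Expects CSV text with the assumed columns: page, source, medium, sessions.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    pages = set()
    for row in reader:
        from_google = row["source"].strip().lower() == "google"
        organic = row["medium"].strip().lower() == "organic"
        if from_google and organic and int(row["sessions"]) >= min_sessions:
            pages.add(row["page"])
    return sorted(pages)
```

A page in this list has very likely been indexed; a page missing from it may still be indexed but simply have received no organic visits in the period.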
Limit Indexing Internal Search Pages
You can index internal search results pages, provided they are generally useful to users and are limited to specific terms.
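One way to limit indexing to specific terms is an allowlist that decides which robots meta tag a search results page emits. The sketch below assumes a hand-curated set of terms; `INDEXABLE_TERMS` and `robots_meta_for_search` are hypothetical names, not part of any framework.

```python
# Hypothetical allowlist of search terms judged useful enough to index.
INDEXABLE_TERMS = {"running shoes", "trail shoes"}


def robots_meta_for_search(query):
    """Return the robots meta tag for an internal search results page."""
    # Normalise case and whitespace so "  Running   Shoes " matches the allowlist.
    normalised = " ".join(query.lower().split())
    if normalised in INDEXABLE_TERMS:
        return '<meta name="robots" content="index, follow">'
    return '<meta name="robots" content="noindex, follow">'
```

Defaulting to noindex keeps arbitrary user queries out of the index while still letting the curated pages be indexed.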