Indexing
In order for web pages to be included within search results, they must be in Google’s index. Search engine indexing is a complex topic and is dependent on a number of different factors. Our SEO Office Hours Notes on indexing cover a range of best practices and compile indexability advice Google has released in their Office Hours sessions to help ensure your website’s important pages are indexed by search engines.
Use Info: Query to See If a URL is Indexed
Use the info: search operator with a URL to check whether a specific page has been indexed.
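As a rough illustration, the query can be built programmatically. This is a minimal sketch: the helper name build_info_query and the example.com URL are hypothetical, and it only constructs the search URL rather than fetching or interpreting results.

```python
from urllib.parse import urlencode


def build_info_query(page_url: str) -> str:
    """Return a Google search URL that runs an info: query for page_url."""
    # The operator is prefixed directly to the URL, with no space in between.
    query = f"info:{page_url}"
    return "https://www.google.com/search?" + urlencode({"q": query})


# example.com is a placeholder domain used for illustration only.
print(build_info_query("https://example.com/some-page"))
```

Opening the printed URL in a browser runs the query for that page.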
Content in Iframes May be Indexed on the Embedding Page
Pages embedded within an iframe on another page may be indexed as content on the embedding page, as the iframe content will be seen when the page is rendered. You can use the X-Frame-Options header to prevent browsers from embedding a page, and Google will respect it.
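A minimal sketch of setting the header on a response follows. The helper with_frame_protection and the plain dict of headers are assumptions made for illustration; a real site would set the header through its web server or framework configuration.

```python
# The two standard X-Frame-Options values: DENY blocks all framing,
# SAMEORIGIN allows framing only by pages on the same origin.
ALLOWED_POLICIES = {"DENY", "SAMEORIGIN"}


def with_frame_protection(headers: dict, policy: str = "SAMEORIGIN") -> dict:
    """Return a copy of headers with an X-Frame-Options header added."""
    policy = policy.upper()
    if policy not in ALLOWED_POLICIES:
        raise ValueError(f"unsupported X-Frame-Options value: {policy}")
    protected = dict(headers)  # copy so the caller's dict is not mutated
    protected["X-Frame-Options"] = policy
    return protected


print(with_frame_protection({"Content-Type": "text/html"}, "DENY"))
```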
Internal Search Pages Should Not Be Indexable
Google recommends that you block internal search pages from being indexed, as they will likely increase the number of pages indexed for that site and can be inefficient for crawling and indexing.
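One common way to keep crawlers away from internal search results is a robots.txt Disallow rule. The sketch below assumes, hypothetically, that internal search lives under /search on example.com, and uses Python's standard urllib.robotparser to check that the rule behaves as intended. It illustrates blocking crawling only; your own site's paths will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: the /search path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Check whether the rules in ROBOTS_TXT allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable("https://example.com/search?q=shoes"))
print(is_crawlable("https://example.com/products/shoes"))
```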
Index Status in Search Console is Updated a Few Times a Week
The Index Status data in Search Console is updated 2-3 times a week.
Split up Sitemaps to Identify Pages Indexed by Google
There is no direct way to get information on which specific URLs are indexed in Google. If you want to narrow down which URLs have been indexed by Google, you can split the sitemap up into smaller parts. However, you shouldn’t focus on getting high numbers of URLs indexed, but more on the relevance of indexed pages and content.
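Splitting a sitemap can be sketched as follows. The function name split_sitemap, the chunk size and the example.com URLs are illustrative assumptions; the output follows the standard sitemaps.org urlset format with only the required loc element.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def split_sitemap(urls: list, chunk_size: int) -> list:
    """Return a list of sitemap XML strings with at most chunk_size URLs each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    sitemaps = []
    for start in range(0, len(urls), chunk_size):
        chunk = urls[start:start + chunk_size]
        # Escape each URL so characters such as & stay valid inside XML.
        entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in chunk)
        sitemaps.append(
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<urlset xmlns="{SITEMAP_NS}">\n{entries}\n</urlset>\n'
        )
    return sitemaps


# Placeholder URLs for illustration only.
pages = [f"https://example.com/page-{n}" for n in range(1, 6)]
for number, xml in enumerate(split_sitemap(pages, 2), start=1):
    print(f"sitemap-{number}.xml contains {xml.count('<loc>')} URLs")
```

Grouping URLs by site section rather than by a fixed size may make the per-sitemap figures easier to interpret.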
The unavailable_after Meta Tag Tells Google when to Drop URLs from the Index
If you know when a page will expire, you can use the unavailable_after meta tag to tell Google when it should remove the URL from the index, without the page having to be recrawled.
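A minimal sketch of generating the tag for a known expiry date follows. The helper name unavailable_after_tag and the date used are hypothetical, and the ISO 8601 date format is an assumption about what suits your pages.

```python
from datetime import date


def unavailable_after_tag(expiry: date) -> str:
    """Build a robots meta tag carrying an unavailable_after directive."""
    # isoformat() yields an ISO 8601 date such as 2030-01-31.
    return f'<meta name="robots" content="unavailable_after: {expiry.isoformat()}">'


# The expiry date here is an arbitrary example.
print(unavailable_after_tag(date(2030, 1, 31)))
```

The resulting tag would be placed in the head of the expiring page.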
The URL Removal Tool Blocks Pages Appearing in SERPs but Doesn’t Prevent Indexing
The URL removal tool in Search Console can be used to hide individual pages from appearing in search results, but it doesn’t stop them being indexed. You can also use the tool to remove all URLs under a shared path. You shouldn’t use the tool for general maintenance, only for something critical that you want removed as quickly as possible.
Quality Algorithms are Used to Influence Crawling and Indexing Speed
Quality algorithms are used to influence other algorithms such as those which control crawling and indexing speed.
Google Filters Identical Duplicates During Indexing, and Near Duplicates From Search Results Pages
When Google recognises identical pages, it will choose one version to index, and when pages are similar, only one may show up in search results. Google looks at factors such as rel canonicals, redirects and internal and external linking when identical pages are crawled to decide which one to index.
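To audit your own site for identical pages, one simple approach is to group pages by a hash of their content. This is only an illustrative sketch and is not a description of how Google detects duplicates; the function name group_identical_pages, the whitespace normalisation and the sample pages are all assumptions.

```python
import hashlib
from collections import defaultdict


def group_identical_pages(pages: dict) -> list:
    """Group URLs whose body text is identical after collapsing whitespace."""
    groups = defaultdict(list)
    for url, body in pages.items():
        normalized = " ".join(body.split())  # collapse runs of whitespace
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Keep only groups that contain more than one URL.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl output: URL mapped to extracted body text.
sample = {
    "https://example.com/shoes": "Red running shoes",
    "https://example.com/shoes?ref=nav": "Red  running shoes",
    "https://example.com/boots": "Brown leather boots",
}
print(group_identical_pages(sample))
```

Each group is a candidate for a rel canonical or redirect pointing at the preferred version.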
Near Identical Pages with HREFLANG may be Rolled Together
If you have near-identical pages which differ only by a very small amount, such as a currency, Google may roll the pages together, but will use HREFLANG to decide which one to show in search results.
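The hreflang annotations for a set of near-identical variants can be generated as in the sketch below. The helper name hreflang_links, the language codes and the example.com URLs are hypothetical; the output uses the standard link rel="alternate" form.

```python
from html import escape


def hreflang_links(variants: dict) -> list:
    """Build one <link rel="alternate"> tag per language or region variant."""
    return [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in sorted(variants.items())
    ]


# Hypothetical variants that differ only in currency.
variants = {
    "en-gb": "https://example.com/uk/product",
    "en-us": "https://example.com/us/product",
}
for tag in hreflang_links(variants):
    print(tag)
```

Each variant page would carry the full set of tags, including one that references itself.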