Indexing
For web pages to be included in search results, they must be in Google’s index. Search engine indexing is a complex topic that depends on a number of different factors. Our SEO Office Hours Notes on indexing cover a range of best practices and compile indexability advice Google has shared in its Office Hours sessions, to help ensure your website’s important pages are indexed by search engines.
Google Does Not Index 404 Pages
If a page returns a 404 status code, Google will not index the page’s content. However, if the page has only recently started returning a 404 and Google has not yet recrawled it, the page may still appear in search results. A related problem is the soft 404: the server displays an error page to users but still returns a 200 status code to crawlers.
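One way to spot soft 404s of this kind is to compare the status code the server actually returns with what the page displays. A minimal sketch using Python’s requests library; the URL and the “not found” phrasing are placeholders for your own site:

```python
import requests

# Hypothetical URL; substitute a page you suspect is a soft 404.
url = "https://example.com/some-page"

response = requests.get(url, allow_redirects=False, timeout=10)

# A real 404 returns status 404; a soft 404 shows an error page
# to users but still returns 200 to crawlers.
if response.status_code == 200 and "page not found" in response.text.lower():
    print(f"Possible soft 404: {url} returns 200 but looks like an error page")
else:
    print(f"{url} returned {response.status_code}")
```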
Signals are Kept for 4xx or 5xx Error Pages Previously Dropped from the Index When They Are Re-added
If your pages returned a 4xx or 5xx error for a while and were dropped from the index, but then become available again after a month or so, Google can return them to the search results in the same state they were in before. They won’t have to start trying to rank from nothing.
Likely That Content Won’t Be Indexed if Not Showing Up in Google Testing Tools
If Google’s testing tools can fetch all of a page’s resources but some content is missing from the rendered output, it is likely that this content won’t be indexed.
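You can run a similar check yourself by comparing the raw HTML response with the DOM after JavaScript has executed. A rough sketch using requests and Playwright; the URL and expected text are placeholders, and Google’s own renderer may behave differently from a local headless browser:

```python
import requests
from playwright.sync_api import sync_playwright

# Hypothetical page and a snippet of content you expect to be indexed.
url = "https://example.com/article"
expected_text = "key product description"

# Raw HTML, as a non-rendering crawler would see it.
raw_html = requests.get(url, timeout=10).text

# Rendered DOM after JavaScript runs, via a headless browser.
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(url, wait_until="networkidle")
    rendered_html = page.content()
    browser.close()

print("In raw HTML:     ", expected_text in raw_html)
print("In rendered DOM: ", expected_text in rendered_html)
# If the text is missing from the rendered output, it is unlikely
# to be indexed even though the resources could be fetched.
```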
More or Less Every New Website is Rendered When Google Crawls it For the First Time
Nearly every new website goes through the two waves of indexing when Google crawls it for the first time, meaning it isn’t indexed until it has been rendered.
URL Removal Tool Hides Pages But Doesn’t Impact Crawling or Indexing
The URL Removal Tool only hides a page from the search results. Nothing changes with regard to the crawling and indexing of that page.
There Isn’t a Separate Index for Mobile and Desktop Indexing
Google has one main index containing either the mobile or the desktop version of a site, and this is the version shown in search results. However, if you have a separate mobile site, Google will always show that version to users on a mobile device.
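For sites that do serve a separate mobile version, Google’s documented pattern for separate URLs is to connect the two versions with rel="alternate" and rel="canonical" annotations. A sketch, with example.com placeholders standing in for your own domains:

```html
<!-- On the desktop page (https://www.example.com/page): -->
<link rel="alternate"
      media="only screen and (max-width: 640px)"
      href="https://m.example.com/page">

<!-- On the mobile page (https://m.example.com/page): -->
<link rel="canonical" href="https://www.example.com/page">
```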
Disallowed Pages With Backlinks Can be Indexed by Google
Pages blocked by robots.txt cannot be crawled by Googlebot. However, if a disallowed page has links pointing to it, Google may decide it is worth indexing despite being unable to crawl the page.
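This is why robots.txt is the wrong tool for keeping a page out of the index: a disallow rule only blocks crawling. A sketch, with a placeholder path:

```
# robots.txt: blocks crawling, but the URL can still be
# indexed if other pages link to it.
User-agent: *
Disallow: /private/
```

To keep a page out of the index instead, it has to remain crawlable so Google can see a noindex directive on the page itself:

```html
<meta name="robots" content="noindex">
```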
Google May Index Redirected URLs if Served in Sitemap Files
Redirects and sitemaps are both signals that Google uses to select preferred URLs. If a source URL redirects to a destination URL but the source URL is still listed in a sitemap, you are sending Google conflicting signals about which URL you want to be shown in search.
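In practice, this means sitemaps should list only final destination URLs. A sketch of the consistent setup, assuming a hypothetical /old-page that 301-redirects to /new-page:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- List the redirect destination, not the redirecting source. -->
  <url>
    <loc>https://www.example.com/new-page</loc>
  </url>
  <!-- Omit https://www.example.com/old-page, which redirects here. -->
</urlset>
```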
Internal Search Results Pages Should be Blocked Unless They Provide Unique Value
Internal search results pages should generally be blocked from crawling, both because crawling them can overload the site’s server and because they tend to be low quality. However, there may be instances where it makes sense to have these pages indexed if they provide unique value.
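A common way to block them is a robots.txt rule covering the search URL pattern. A sketch, assuming the results live under /search or use a ?q= parameter (both placeholders for your site’s actual pattern):

```
User-agent: *
# Block internal search results pages from crawling.
Disallow: /search
Disallow: /*?q=
```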
Ensure all Key Content is Available if You Are Streaming Content
If a site streams content to a page progressively, John Mueller recommends ensuring all key content is available immediately, because of the method Google uses to render content. Additional content that is useful for users but not critical for indexing can then be streamed progressively.
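A rough sketch of that split in plain HTML, where the critical content ships in the initial payload and supplemental content is fetched afterwards; the endpoint and element names are hypothetical:

```html
<article>
  <!-- Key content is in the initial HTML, so it is available
       as soon as the page is rendered. -->
  <h1>Product name</h1>
  <p>Primary description that must be indexed.</p>

  <!-- Supplemental content (e.g. related items) is streamed in
       later; useful for users, not relied on for indexing. -->
  <div id="related"></div>
  <script>
    fetch("/api/related-items")  // hypothetical endpoint
      .then(function (response) { return response.text(); })
      .then(function (html) {
        document.getElementById("related").innerHTML = html;
      });
  </script>
</article>
```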