Notes from the Google Webmaster Hangout on 8th July 2016.
Googlebot Requires a Consistent Response for Different User Agents
You need to serve the same response to all US-based users, including Googlebot (which crawls from the US), or the content might not be indexed.
Hreflang Requires Multiple Crawls
Google needs to crawl pages with hreflang markup multiple times to confirm that the markup is correct, so it can take up to a month for the annotations to be recognised, for example after you have migrated URLs.
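As a hypothetical illustration (the URLs are placeholders), the hreflang annotations in the head of each language or regional variant might look like this:

```html
<!-- Each variant lists every alternate, including itself -->
<link rel="alternate" hreflang="en-us" href="https://example.com/en-us/page" />
<link rel="alternate" hreflang="en-gb" href="https://example.com/en-gb/page" />
<link rel="alternate" hreflang="x-default" href="https://example.com/page" />
```

Google confirms the return tags by crawling each of the referenced URLs, which is part of why it takes several crawls before the markup is recognised as correct.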
Don’t Disavow Links to Old Content
You don’t need to disavow links pointing to old content which is no longer on the site; you should only disavow links from low-quality sites.
404s Are Recrawled Periodically
Google will remember your 404 URLs for a long time and periodically recrawl them to see whether they still return a 404. These will be reported in Search Console, but they are perfectly fine.
Last Modified In Sitemaps Aids Crawling
Google finds the Last Modified date in an XML Sitemap very useful to help them recrawl URLs, and RSS and Atom feeds are also supported.
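For illustration, a minimal Sitemap entry with a lastmod date (the URL and date below are placeholders) follows the standard sitemaps.org format:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/post-1</loc>
    <lastmod>2016-07-08</lastmod>
  </url>
</urlset>
```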
Increased Crawl Rates Can Be Caused by Authority, Server Performance or Duplicate Content
A temporarily high crawl rate might be caused by an increase in authority, by Google deciding that the server can handle an increased load, or by Google finding a lot of duplicate content created by things like URL parameters.
Consistent Traffic Patterns Are Probably User Behaviour
If you see spikes in traffic referred from Google at certain times of the day or week, they are probably caused by user behaviour rather than by anything Google is doing.
URLs Are Only Used to Identify Pages
Google uses URLs mainly to identify pages, so grouping pages by path structure doesn’t make a difference.
Google May Use Contact Information
Google doesn’t look at contact information on a page for rankings, but John says it can be useful if Google needs to contact you directly to notify you of a serious issue, which they occasionally do.
Google Ignores Irrelevant Sitemap Content
Google will ignore any information in Sitemaps which it doesn’t recognise, so you can include additional information for other purposes.
Sites Inaccessible to the US Won’t Be Indexed
If a website can only be viewed from countries outside the US, Googlebot (which crawls from the US) will not be able to crawl and index the site, so it won’t rank anywhere.
PageRank to 404 Pages is Lost
Any PageRank accumulated by pages which return a 404 is lost.
Changing Servers Resets Crawl Rate
If you move to a new ‘server infrastructure’, Google may reduce the crawl rate until it has confidence that it can crawl at a faster rate.
Limit Links On a Page to 3000
Google’s webmaster guidelines recommend a maximum of 3,000 links on a page, and anything over that is likely to be ignored.
Copied Content Can Outrank the Original Source
If someone copies content from your site, they might rank above you, for example when the original page is missing a good title tag and the copy provides more context.
Use a Single Language per Page
Multiple languages on the same page make it harder for Google to work out when a page is relevant to a particular audience, so John recommends using a single language per page and connecting the language versions with hreflang.
Redirect Deprecated Mobile Sites
If you want to remove a mobile site because you now have a responsive site, you should ideally redirect the mobile URLs so that users’ bookmarks keep working, but Google doesn’t really mind either way: it will drop the mobile URLs once it recrawls the desktop URLs and finds the mobile rel tag has been removed.
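As a hypothetical sketch, a deprecated m.example.com subdomain could be 301-redirected to the matching paths on the responsive site with a rule like this (nginx shown; the domains are placeholders):

```nginx
# Redirect every URL on the old mobile subdomain to the same path
# on the responsive www site, preserving users' bookmarks.
server {
    listen 80;
    server_name m.example.com;
    return 301 https://www.example.com$request_uri;
}
```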
Check Hosted Executables for Malware
If you are hosting executable files on your website, you need to be careful to ensure they are not harmful and don’t include malware; tools like www.virustotal.com can help with this.
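As a minimal sketch of one way to do this, the script below hashes a hosted file and looks up an existing report via VirusTotal's public API v3; the file path and API key placeholder are hypothetical, and the requests library is assumed to be installed:

```python
import hashlib
import requests

VT_API_KEY = "YOUR_API_KEY"  # placeholder; issued by virustotal.com


def sha256_of(path):
    """Compute the SHA-256 hash of a local file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def virustotal_report(path):
    """Fetch the existing VirusTotal analysis stats for a file, if any."""
    response = requests.get(
        f"https://www.virustotal.com/api/v3/files/{sha256_of(path)}",
        headers={"x-apikey": VT_API_KEY},
        timeout=30,
    )
    if response.status_code == 404:
        return None  # no report yet; the file would need to be uploaded for scanning
    response.raise_for_status()
    return response.json()["data"]["attributes"]["last_analysis_stats"]


if __name__ == "__main__":
    # Hypothetical path to an executable hosted on the site
    print(virustotal_report("downloads/installer.exe"))
```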