Sitemaps
A sitemap is a list of the live URLs on a site. It tells search engine crawlers which pages are most important and therefore which ones should be crawled and indexed.
There are several things to consider when creating sitemaps, and it helps to understand how search engines view them. We cover a range of these topics within our SEO Office Hours Notes below, along with best practice recommendations and Google’s advice on sitemaps.
For more on sitemaps and SEO, check out our article: How to Improve Website Crawlability with Sitemaps.
Use Sitemaps With Last Modified for Expired Content
Use a last modified date with a regularly updated Sitemap to help get expired pages picked up more quickly.
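As a rough illustration of the note above, the following Python sketch writes a sitemap in which every URL carries a last modified date. The function name build_sitemap, the example.com URL and the date are assumptions made for the example; only the urlset, url, loc and lastmod elements come from the sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (absolute_url, last_modified) pairs.

    `pages` is a hypothetical input format chosen for this sketch;
    dates are expected as W3C date strings such as "2024-05-01".
    """
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Example: an expired page whose entry was updated on the day it expired.
xml_text = build_sitemap([("https://example.com/expired-offer", "2024-05-01")])
```

Regenerating the file whenever a page expires keeps the dates meaningful, which is the point of the recommendation.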
Split Sitemaps up to Identify Pages Indexed by Google
Google doesn’t provide a list of which specific URLs it has indexed. If you want to narrow down which URLs have been indexed, you can split the sitemap up into smaller parts and compare the indexed counts for each part. However, you shouldn’t focus on getting high numbers of URLs indexed, but more on the relevance of indexed pages and content.
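One possible way to split a sitemap is by site section, so that each smaller sitemap maps to a recognisable part of the site. The sketch below groups URLs by their first path segment; the function name group_by_section, the "root" label and the example URLs are assumptions for illustration, not a method prescribed by Google.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(urls):
    """Group absolute URLs by their first path segment.

    Each group could then be written out as its own sitemap file.
    URLs with no path segment (such as the homepage) go into "root".
    """
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        section = segments[0] if segments else "root"
        groups[section].append(url)
    return dict(groups)


sections = group_by_section([
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/blog/post-2",
    "https://example.com/shop/item-9",
])
# sections has the keys "root", "blog" and "shop"
```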
Mobile URLs Won’t Need to be Included in Sitemaps for Mobile-first Indexing
It won’t be necessary to include mobile pages in sitemaps for mobile-first indexing. You should continue to include your canonical desktop pages. Google’s goal is to keep everything as similar as possible while indexing content on mobile pages.
Sitemap Errors don’t Impact Rankings but can Slow Down Indexing
Sitemaps help Google improve crawling and indexing of sites. If a sitemap can’t be properly processed, Google may take longer to index pages, as it has to rely on normal crawling and indexing to find those pages.
Add Images to Sitemap to Provide Google With More Information
Images can be added for each page of your site within your sitemap along with captions and alt-tags using special sitemap extensions.
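A minimal sketch of an image sitemap entry, assuming Google’s image sitemap extension namespace and listing only image locations (optional fields such as captions are left out here). The function name build_image_sitemap and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Namespace of Google's image sitemap extension.
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """Build sitemap XML from a {page_url: [image_url, ...]} mapping."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            # One image:image element per image that appears on the page.
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_image_sitemap({
    "https://example.com/gallery": [
        "https://example.com/img/one.jpg",
        "https://example.com/img/two.jpg",
    ],
})
```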
Sitemap Index Counts Report the Exact Submitted URLs
Sitemap index counts report the exact URL you submit, including trailing slashes. If Google chooses to index a different copy of the same page, the submitted URL wouldn’t be reported as indexed.
Hreflang Data in Sitemaps May be Shown as ‘No Return’ Error Until the URLs are Recrawled
Until pages have been recrawled, you might see ‘no return hreflang’ errors in Search Console for hreflang data added to a sitemap.
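Because hreflang annotations are expected to be reciprocal, it can help to confirm that the sitemap data itself contains the return links before attributing an error to recrawl delay. The sketch below assumes the annotations have already been extracted into a dictionary of the form {page_url: {language: alternate_url}}; that structure and the function name missing_return_links are hypothetical.

```python
def missing_return_links(annotations):
    """Return (page, alternate) pairs where the alternate does not link back.

    `annotations` maps each page URL to its {language: alternate_url}
    entries, as they would appear in the sitemap's hreflang data.
    """
    missing = []
    for page, alternates in annotations.items():
        for alt_url in alternates.values():
            if alt_url == page:
                continue  # a self-reference needs no return link
            if page not in annotations.get(alt_url, {}).values():
                missing.append((page, alt_url))
    return missing


problems = missing_return_links({
    "https://example.com/en/": {
        "en": "https://example.com/en/",
        "de": "https://example.com/de/",
    },
    # The German page lists only itself, so the English page gets no return link.
    "https://example.com/de/": {"de": "https://example.com/de/"},
})
```

If a check like this comes back empty and errors are still reported, waiting for the URLs to be recrawled is the more likely explanation.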
Identical Last Modified Dates in Sitemaps for all Pages Will be Ignored
If all the last modified dates in Sitemaps are identical, Google will assume they are incorrect.
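A quick way to spot this problem in an existing sitemap is to check whether every lastmod value is the same. The following is a minimal sketch; the function name all_lastmod_identical is an assumption, and the input is expected to be sitemap XML passed as bytes, or as a string without an encoding declaration.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def all_lastmod_identical(sitemap_xml):
    """Return True if the sitemap has more than one lastmod value and all are equal."""
    root = ET.fromstring(sitemap_xml)
    dates = [
        el.text.strip()
        for el in root.findall("sm:url/sm:lastmod", NS)
        if el.text
    ]
    return len(dates) > 1 and len(set(dates)) == 1
```

A True result usually means the dates are being stamped at generation time rather than taken from each page’s actual modification time.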
Google Automatically Chooses a Preferred Domain Based on Multiple Signals
Google uses many factors to determine the canonical domain including redirects, canonicals, internal links, sitemaps and backlinks, so choosing a preferred domain in Search Console is not critical.
Google Requires Absolute URLs for Sitemaps and Recommends Them for Internal Links
Relative URLs are fine for internal linking, but Google requires absolute URLs for Sitemaps. Google now recommends using absolute URLs for internal linking to avoid any uncertainty about the preferred protocol.
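If a site stores its links as relative paths, they can be resolved to absolute URLs before being written to a sitemap. This sketch uses Python’s standard urllib.parse module; the helper names to_absolute and is_absolute and the example.com URLs are assumptions for illustration.

```python
from urllib.parse import urljoin, urlparse


def to_absolute(base_url, href):
    """Resolve a possibly relative link against the page it appears on."""
    return urljoin(base_url, href)


def is_absolute(url):
    """True only if the URL states both a protocol and a host."""
    parts = urlparse(url)
    return bool(parts.scheme and parts.netloc)


absolute = to_absolute("https://example.com/blog/", "/about")
# absolute == "https://example.com/about"
```

Note that is_absolute treats protocol-relative links such as //example.com/page as not absolute, since they leave the preferred protocol unstated.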