Getting URLs Crawled
So you already have a website and some of its pages are ranking on Google, great!
But what about when your website pages or URLs change? What about new, redirected or expired URLs?
We could sit back and wait for Google to crawl these updated URLs, or we could try and proactively help Google find and crawl them so the changes are reflected in Google’s SERPs as soon as possible.
Here are a few different ways to achieve this:
1. Link from key indexed pages
If you link to new URLs from existing pages, Google will discover those pages automatically. How well this works depends on your website's architecture. For example, it works well for blogs, where the latest content appears at the top of the page just waiting to be discovered, but it is less efficient for e-commerce or advert-orientated websites, where large numbers of new links may only appear far down the page.
If Google finds a new link to an old page, it is likely to crawl that page more frequently, so adding new links to re-activated pages should get them discovered more quickly.
As a side note: even if a URL is not linked but simply appears as plain text, Google may still discover it.
2. Redirect from another URL
In a similar way to being linked, if an existing URL is crawled again and it redirects to a new URL, this new URL will be crawled.
When you redevelop, migrate or change your website platform, you should also redirect all your images, JavaScript and CSS files so the new URLs can be discovered more quickly.
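If your platform or server configuration doesn't handle this for you, here is a minimal sketch of the idea using only Python's standard library; the old-to-new mappings are hypothetical examples, and in practice you would normally set these rules up in your web server (Apache, nginx, etc.) or CMS rather than in application code.

```python
# Minimal sketch of permanent (301) redirects, including static assets.
# The URL mappings below are purely illustrative.
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical old-to-new URL mapping, covering pages, images, JS and CSS.
REDIRECTS = {
    "/old-page": "/new-page",
    "/assets/old-logo.png": "/static/img/logo.png",
    "/js/old-app.js": "/static/js/app.js",
}

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = REDIRECTS.get(self.path)
        if target:
            # A 301 tells crawlers the move is permanent, so the new URL
            # gets crawled and the old one is eventually dropped.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8000), RedirectHandler).serve_forever()
```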
3. Sitemaps
Sitemaps were invented to help websites get their pages discovered if they were not crawlable, a common problem in the earlier days of the web.
To ensure optimal crawl frequency, break down the sitemaps as much as possible and put new or updated content into separate sitemaps.
If you want Google to see your redirected URLs, such as after a URL change, you can submit the old URLs in a sitemap to help Google re-crawl them more quickly.
You can also submit an XML sitemap containing expired pages to help get them removed from Google's index more quickly. It's best to put them into a separate sitemap file so you can monitor them separately from other indexable URLs.
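As a rough illustration, this Python sketch writes new, redirected and expired URLs into separate sitemap files so each group's crawl status can be watched on its own; the URLs, dates and filenames are hypothetical, and any sitemap generator that produces the same split will do.

```python
# Sketch: write new/updated, redirected and expired URLs into separate
# sitemap files. All URLs and filenames here are hypothetical examples.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def write_sitemap(filename, urls):
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    ET.ElementTree(urlset).write(filename, encoding="utf-8", xml_declaration=True)

write_sitemap("sitemap-new.xml", [
    ("https://www.example.com/new-product", "2016-05-01"),
])
write_sitemap("sitemap-redirected.xml", [
    ("https://www.example.com/old-product", "2016-05-01"),  # now 301s to /new-product
])
write_sitemap("sitemap-expired.xml", [
    ("https://www.example.com/discontinued-product", "2016-05-01"),  # returns 404/410
])
```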
4. RSS
An RSS feed is effectively the same as a sitemap and can be submitted as one in Search Console. RSS feeds will not, however, be discovered automatically, so they have to be submitted manually.
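For reference, a minimal feed might look something like the one written out by this short Python sketch; the URLs, titles and dates are hypothetical.

```python
# Sketch: a minimal RSS feed that could be submitted in Search Console in
# the same way as an XML sitemap. URLs, titles and dates are hypothetical.
RSS_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Site - Latest URLs</title>
    <link>https://www.example.com/</link>
    <description>Recently added and updated pages</description>
    <item>
      <title>New product page</title>
      <link>https://www.example.com/new-product</link>
      <pubDate>Sun, 01 May 2016 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

with open("feed.xml", "w", encoding="utf-8") as f:
    f.write(RSS_FEED)
```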
5. Pubsubhubbub
This is the fastest way to get content discovered, and can be used for any content type.
You add a hub link to your RSS feed, which Google discovers on its next crawl. Google then subscribes to the feed through the hub, providing a callback (ping) URL, and stops polling it. Every time a new item is added to the feed, the hub is pinged and in turn notifies Google, inviting it to crawl the feed again.
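As a rough sketch of the publisher side, assuming the feed already declares a hub via a <link rel="hub" href="https://pubsubhubbub.appspot.com/"/> element and using the commonly used Google-run hub at pubsubhubbub.appspot.com, the publish ping could look something like this in Python; the feed URL is hypothetical.

```python
# Sketch: notify a PubSubHubbub hub that the feed has new content, so
# subscribers (such as Google) are pushed the update rather than polling.
from urllib.parse import urlencode
from urllib.request import urlopen

HUB_URL = "https://pubsubhubbub.appspot.com/"   # assumed Google-run hub
FEED_URL = "https://www.example.com/feed.xml"   # hypothetical feed URL

def ping_hub():
    # A publish ping is a form-encoded POST with hub.mode and hub.url.
    data = urlencode({"hub.mode": "publish", "hub.url": FEED_URL}).encode("utf-8")
    with urlopen(HUB_URL, data=data) as response:
        # Hubs typically answer 204 No Content when the ping is accepted.
        print(response.status)

if __name__ == "__main__":
    ping_hub()
```

Call the ping from whatever publishes new items (your CMS hook, build script or cron job) so the hub is notified as soon as the feed changes.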
6. Submit URL
Google have a ‘Submit URL’ tool for submitting individual URLs to Google’s index, although this doesn’t scale well, so it is only really useful for small websites with few pages.
https://www.google.com/webmasters/tools/submit-url?hl=en_uk
7. Fetch as Google
After using the ‘Fetch as Google’ tool, you get the option to Submit to Index.
Select ‘Crawl only this URL’ to submit one individual URL to Google for re-crawling. You can submit up to 500 individual URLs per month in this way.
Select ‘Crawl this URL and its direct links’ to submit the URL and all the other pages the URL links to, for re-crawling. You can submit up to 10 requests of this kind per month.
This is also an effective way to get an updated robots.txt file discovered more quickly.
8. App Indexing API
If you have a mobile app, you can push content to Google using their App Indexing API.
https://developers.google.com/app-indexing/android/publish
What NOT to do
Many people have speculated that Google uses emails to discover new URLs. After much testing, however, it has become clear that they do not.
https://magicseoball.com/does-google-use-gmail-for-url-discovery/
Google Analytics
Google do not use Google Analytics data for URL discovery, but they have historically discovered URLs which were hard-coded into JavaScript.
JavaScript Links
Google is now able to render JavaScript and can therefore discover new JavaScript links. One downside, though, is that JavaScript processing takes longer than pure HTML crawling (‘it takes another cycle or two longer to process’, says John).
In conclusion, you shouldn’t rely on JavaScript generated content to get URLs indexed quickly.
So there you have it, a few different ways to get URLs crawled. Don’t feel tied down to one method; you can mix and match to get the results you want.