We’re back with our monthly round-up of the key takeaways from Google Search Central’s SEO Office Hours. As always, you can browse our full library of SEO Office Hours takeaways, or read on below for advice from the most recent sessions.
APIs & Crawl Budget: Don’t block API requests if they load important content
An attendee asked whether a website should disallow subdomains that are sending API requests, as they seemed to be taking up a lot of crawl budget. They also asked how API endpoints are discovered or used by Google.
John first clarified that API endpoints are normally used by JavaScript on a website. When Google renders the page, it will try to load the content served by the API and use it when rendering the page. Depending on your API and JavaScript set-up, it might be hard for Google to cache the API results, which means Google may end up crawling a large number of API URLs just to get a rendered version of your page for indexing.
You can help avoid crawl budget issues here by making sure the API results are cached well and that the API URLs don't contain timestamps. If you don't need Google to see the content the API returns, you could block the API subdomains from being crawled, but you should test this first to make sure it doesn't stop critical content from being rendered.
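As a rough sketch (the subdomain and paths here are hypothetical), blocking an API subdomain is done with a robots.txt file served on that subdomain, while keeping the API URLs themselves stable and cache-friendly:

    # Served at https://api.example.com/robots.txt (hypothetical subdomain)
    # Only do this if Google doesn't need to see the content the API returns.
    User-agent: *
    Disallow: /

    # Cache-friendly API URL (the same URL always returns the same content):
    #   https://api.example.com/products?page=2
    # Harder to cache (a timestamp makes every request look unique):
    #   https://api.example.com/products?page=2&t=1656003042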
John suggested making a test page that doesn't call the API, or that uses a broken URL for it, and then checking how the page renders in the browser (and for Google).
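One way to run that test, sketched below with made-up names and URLs, is to take a copy of the page, point its script at a deliberately unreachable API host, and then check the result in a browser and with the URL Inspection tool in Search Console:

    <!-- Hypothetical test page: the real API host is swapped for one that will fail -->
    <div id="product-list">Loading products...</div>
    <script>
      // Stand-in for the real endpoint, e.g. https://api.example.com
      const API_BASE = "https://api-blocked.invalid";
      fetch(API_BASE + "/products.json")
        .then((response) => response.json())
        .then((products) => {
          document.getElementById("product-list").textContent =
            products.map((product) => product.name).join(", ");
        })
        .catch(() => {
          // If important content is missing at this point, the API is critical
          // and its URLs should not be blocked from crawling.
          document.getElementById("product-list").textContent = "API unavailable";
        });
    </script>

If the page still makes sense with the API unreachable, blocking those endpoints is relatively low-risk; if key content disappears, keep them crawlable and focus on caching instead.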
Get more key takeaways on crawl budget in our SEO Office Hours Library.
Use rel="canonical" or robots.txt instead of nofollow for internal linking
A question came in about whether it's appropriate to use the nofollow attribute on internal links to avoid unnecessary crawl requests for URLs you don't want crawled or indexed.
John replied that it’s an option, but it doesn’t make much sense to do this for internal links. In most cases, it’s recommended to use the rel=canonical tag to point at the URLs you want to be indexed instead, or use the disallow directive in robots.txt for URLs you really don’t want to be crawled.
He suggested figuring out whether there is a page you would prefer to have indexed and, in that case, using the canonical; or, if the URLs are causing crawling problems, considering robots.txt instead. He clarified that with the canonical, Google would first need to crawl the page, but over time it would focus on the canonical URL and begin to use that primarily for crawling and indexing.
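As a simple illustration (all URLs are placeholders), the two options John mentioned look like this: a rel=canonical link element on the page you don't want indexed, or a robots.txt rule for URLs you don't want crawled at all.

    <!-- On a duplicate or parameterised URL, pointing at the version you want indexed -->
    <link rel="canonical" href="https://www.example.com/category/shoes/">

    # robots.txt: reserve this for URLs you really don't want crawled
    User-agent: *
    Disallow: /internal-search/

Bear in mind that a canonical still requires the duplicate URL to be crawled occasionally, whereas a robots.txt disallow stops crawling but doesn't by itself guarantee the URL stays out of the index.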
Get more key takeaways on canonicalization in our SEO Office Hours Library.
Mismatch in the number of indexed URLs shown in a site: query vs. GSC
One interesting question was why the number of results returned by a site: query in Google Search doesn't match what Search Console shows for the same website. John responded that site: queries are optimized slightly differently.
When you use a site: query in Google Search to gauge how many URLs are indexed, Google just wants to return a number as quickly as possible, and that number can be a very rough approximation. If you need an accurate count of indexed URLs, he clarified that you should use Search Console, which is where Google provides the numbers as directly and clearly as possible. These figures can fluctuate, but the number shown in Search Console's indexing report is the number of URLs indexed for your website, and it is likely to be more accurate than the site: query results shown in the SERPs.
The site: query result is only a rough approximation of the number of pages indexed
Get more key takeaways on Google Search Console in our SEO Office Hours Library.
What is the difference between JavaScript and HTTP redirects?
John explained that, in general, Google strongly prefers server-side redirects (301 or 302 redirects, for example) to JavaScript redirects.
If you use JavaScript to generate the redirect, Google first has to render the JavaScript to discover the redirect, and only then can it follow it. If you can't do a server-side redirect, you can still use JavaScript, but it simply takes longer for Google to process. Using a meta refresh redirect is another option, but again this takes longer, as Google needs to work it out from the page.
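For illustration (the example.com URLs are placeholders), the three kinds of redirect John mentioned look roughly like this:

    # Server-side redirect (preferred): returned directly in the HTTP response
    HTTP/1.1 301 Moved Permanently
    Location: https://www.example.com/new-page/

    <!-- Meta refresh: an option, but slower for Google to work out -->
    <meta http-equiv="refresh" content="0; url=https://www.example.com/new-page/">

    <!-- JavaScript redirect: Google has to render the page before it sees this -->
    <script>
      window.location.replace("https://www.example.com/new-page/");
    </script>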
Get more key takeaways on redirects in our SEO Office Hours Library.
You can’t force Google to show a specific URL as a sitelink in the SERPs
Sitelinks are additional results that are sometimes shown below a search result in Google.
John clarified that there are no meta tags or structured data that would force or recommend a specific URL to appear as a sitelink in the SERPs. Google's systems try to figure out what is related or relevant when looking at a web page. He recommended having a good website structure, clear internal links, and clear page titles to support sitelinks. There is no guarantee that this will yield a sitelink in the search results, but it helps Google figure out which content is related and choose a sitelink based on that information.
Get more key takeaways on Google SERP features in our SEO Office Hours Library or in our article, “State of the SERPs: 7 Enhanced SERP Features & Rich Result Types to Know”.