Duplicate Content
What is duplicate content? Duplicate content occurs when the exact same (or very similar) content appears in multiple places on a website.
Duplicate content can cause several SEO issues, including crawl budget issues, search engine indexing issues, index bloat, keyword cannibalization, and canonical tag issues.
Our SEO Office Hours recaps below compile best practices Google has recommended for websites dealing with duplicate content issues.
(See Lumar’s full guide to duplicate content for even more actionable tips on how SEOs can address duplicate content issues.)
For even more on website content best practices for SEO, read our Guide to Optimizing Website Content for Search — or explore our Website Intelligence Academy resources on SEO & Content.
Avoid Having Domains Additionally Accessible as CDN Subdomains
If the same content exists on a main domain and on a subdomain of a CDN, the two versions can be indexed separately. This also means Google will have to crawl more URLs to see the same amount of content. Use redirects, canonical tags, internal linking and sitemaps to set a preferred version.
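As a rough illustration of setting a preferred version, the sketch below maps a URL served from a CDN hostname back to the main domain and builds the matching canonical tag. The hostnames and function names are hypothetical placeholders, not taken from any real setup; the same mapping could also feed a redirect rule.

```python
import html
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hostnames, for illustration only.
PREFERRED_HOST = "www.example.com"
ALTERNATE_HOSTS = {"example.cdn-provider.net"}


def preferred_url(url: str) -> str:
    """Return the equivalent URL on the preferred host.

    URLs that are not on a known alternate host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname in ALTERNATE_HOSTS:
        # Assumption: the preferred version is always served over HTTPS.
        parts = parts._replace(scheme="https", netloc=PREFERRED_HOST)
    return urlunsplit(parts)


def canonical_link_tag(url: str) -> str:
    """Build a rel=canonical tag pointing at the preferred version of `url`."""
    href = html.escape(preferred_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    print(canonical_link_tag("http://example.cdn-provider.net/shoes?color=red"))
```

Keeping the mapping in one function means the redirect target, the canonical tag and the sitemap entries can all be derived from the same rule, so the signals sent to Google stay consistent.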
Copyright Violations & Duplicate Content Affect How Google Assesses the Rest of Your Site
If the majority of your content is flagged for something like DMCA copyright violations, Google may decide that the rest of your content isn’t high enough quality to show to users either.
Combine Duplicate Pages Across Owned Sites Into One Page
If you have duplicate pages across different sites, try grouping them into one page and listing the different locations where that service or product is available so you have one strong page to rank with.
GSC May Not Show Data For Your Other Same Language Sites if Content is Identical
Hreflang data may only appear for one of your sites in Search Console if the content is identical across a collection of same-language sites (e.g. UK and US). Use the ‘Inspect URL’ tool to check for issues like this.
Google Folds Together Different Country Versions in Search Unless Content is Unique
When different country versions of a site sit on different ccTLDs, Google will fold them together in search unless they have unique content. Google's John Mueller recommends providing localised content on these ccTLDs to make them as relevant as possible to users. He also recommends consulting with experts in this area.
Canonicalise Duplicate Pages Between Your Sites so They’re Not Seen as Doorway Pages
Use the canonical tag if you are offering the same products on lots of different sites so Google doesn’t suspect that these are doorway pages.
Make Same Language, Different Country Page Versions Unique to Avoid Being Folded Together
International sites with different country versions in the same language can be problematic if Google folds them together in the index, e.g. German and Austrian sites with the same content. John recommends making the content on these versions as different as possible, although this isn’t always feasible, for example with product pages. Webmasters can check whether pages are being folded together by using an info: query to check the canonical version.
Use Canonicalization Instead of Noindex for Duplicate Content
John recommends using rel=canonical rather than noindex to deal with duplicate content. This way, the signals from both page versions can be combined, rather than all the signals from the noindexed page being dropped.
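To find duplicate pages that currently rely on noindex, one option is to read the robots meta tag and the canonical link out of each page's HTML. The following is a minimal sketch using only the Python standard library; the class and function names are made up for this example, and it assumes the HTML has already been fetched. It does not look at the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect the rel=canonical target and any noindex directive from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None for valueless attributes.
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonical = attributes.get("href")
        elif tag == "meta" and attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "").lower()
            directives = {part.strip() for part in content.split(",")}
            if "noindex" in directives:
                self.noindex = True


def head_signals(html_text: str) -> dict:
    """Return the canonical URL (or None) and whether the page is noindexed."""
    parser = HeadSignals()
    parser.feed(html_text)
    return {"canonical": parser.canonical, "noindex": parser.noindex}


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex, follow"></head>'
    print(head_signals(page))
```

A duplicate page that reports `noindex` as true and no canonical is a candidate for switching to rel=canonical, so that its signals can be combined with the preferred version.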
Duplicate PDFs Are Seen as Duplicate Content
Google treats duplicate PDFs the same way as any other duplicate content, picking one version to show in the search results.
Google Can Proactively Assume Duplicate Pages Before Crawling Them
Google will sometimes assume that pages are duplicates before crawling them. This can happen when your URLs carry multiple parameters that don’t actually change the content being served.
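One way to see where this applies on your own site is to strip the parameters that don't affect content and group the URLs that collapse to the same address. This is only a sketch: the parameter list below is a hypothetical example and needs to be replaced with the parameters that are actually content-neutral on your site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed not to change the content being served.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "ref"}


def normalize(url: str) -> str:
    """Drop ignored parameters, sort the rest, and remove any fragment."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit(parts._replace(query=urlencode(kept), fragment=""))


def group_duplicates(urls):
    """Map each normalized URL to the crawled URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    # Only keep groups where more than one URL serves the same content.
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    crawled = [
        "https://www.example.com/shoes?color=red&utm_source=news",
        "https://www.example.com/shoes?sessionid=abc&color=red",
        "https://www.example.com/shoes?color=blue",
    ]
    for target, members in group_duplicates(crawled).items():
        print(target, "<-", members)
```

Each group's key is a reasonable candidate for the canonical URL of its members.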