Duplicate Content
What is duplicate content? Duplicate content occurs when the same (or very similar) content appears in multiple places on a website.
Duplicate content can cause several SEO issues, including wasted crawl budget, indexing problems, index bloat, keyword cannibalization, and canonical tag conflicts.
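To make "very similar" more concrete, here is a minimal Python sketch of one common way site-audit tools approximate near-duplicate detection. It compares the sets of overlapping three-word sequences ("shingles") on two pages using Jaccard similarity. The shingle size of 3 and the 0.8 threshold are illustrative assumptions, not values Google has published, and this is not a description of how Google itself detects duplicates. The function names are hypothetical.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and keep only word characters, so punctuation
    and whitespace differences do not affect the comparison."""
    return re.findall(r"\w+", text.lower())


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences of length `size`."""
    words = normalize(text)
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(page_a: str, page_b: str, threshold: float = 0.8) -> str:
    """Label two page texts as exact duplicates, near duplicates, or distinct.
    The default threshold is a hypothetical example value."""
    if normalize(page_a) == normalize(page_b):
        return "exact duplicate"
    score = jaccard(shingles(page_a), shingles(page_b))
    return "near duplicate" if score >= threshold else "distinct"


if __name__ == "__main__":
    a = "Blue widgets ship free in the UK. Order today for next-day delivery."
    b = "Blue widgets ship free in the UK! Order today for next-day delivery."
    print(classify(a, b))
```

In this sketch, two pages that differ only in punctuation are reported as exact duplicates, while a page that changes a word or two out of a long block of text scores high enough to be flagged as a near duplicate.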
Our SEO Office Hours recaps below compile best practices Google has recommended for websites dealing with duplicate content issues.
(See Lumar’s full guide to duplicate content for even more actionable tips on how SEOs can address duplicate content issues.)
For even more on website content best practices for SEO, read our Guide to Optimizing Website Content for Search — or explore our Website Intelligence Academy resources on SEO & Content.
HTML & AMP Pages Containing the Same Content Are Not Seen Negatively as Duplicate Content
Google does not treat the same content appearing on an HTML page and its AMP version as problematic duplicate content. However, the two pages can end up competing with each other in search results. To avoid this, Google's John Mueller recommends consolidating the value of both pages by linking them correctly: a rel="amphtml" link on the HTML page pointing to the AMP version, and a canonical tag on the AMP page pointing back to the HTML version.
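As a rough illustration, the HTML/AMP pairing can be checked with a short Python sketch using only the standard library's html.parser. It assumes the pairing Google documents for AMP: the HTML page carries a rel="amphtml" link to the AMP URL, and the AMP page carries a rel="canonical" link back to the HTML URL. The function names and example.com URLs are hypothetical, and the sketch compares URLs as exact strings without any normalization.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (rel, href) pairs from every <link> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        # Self-closing <link ... /> tags also reach this method via the
        # default handle_startendtag implementation.
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel = (attr_map.get("rel") or "").lower()
        href = attr_map.get("href") or ""
        # A rel attribute may hold several space-separated values.
        for value in rel.split():
            self.links.append((value, href))


def links_of(page: str) -> list[tuple[str, str]]:
    """Parse a page and return all of its (rel, href) link pairs."""
    parser = LinkCollector()
    parser.feed(page)
    parser.close()
    return parser.links


def amp_pair_is_consolidated(html_url: str, html_page: str,
                             amp_url: str, amp_page: str) -> bool:
    """True if the HTML page points to its AMP version via rel="amphtml"
    and the AMP page canonicalizes back to the HTML URL."""
    points_to_amp = ("amphtml", amp_url) in links_of(html_page)
    canonical_back = ("canonical", html_url) in links_of(amp_page)
    return points_to_amp and canonical_back


if __name__ == "__main__":
    html_page = '<head><link rel="amphtml" href="https://example.com/page/amp/"></head>'
    amp_page = '<head><link rel="canonical" href="https://example.com/page/"></head>'
    print(amp_pair_is_consolidated("https://example.com/page/", html_page,
                                   "https://example.com/page/amp/", amp_page))
```

If either annotation is missing or points at the wrong URL, the check returns False, which is the situation in which the two versions are more likely to compete with each other in search results.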
International Websites on Separate Subdomains Will Not Be Penalized for Duplicate Content
Google does not penalize international websites hosted on separate subdomains for duplicating each other's content. Instead, it recognises that the pages are identical, indexes both in most cases, and picks only one URL to show in search results.
Pages with Internally Duplicated Content Are Indexed Separately but Folded Together in Search
Google indexes pages that share duplicate blocks of text separately, then works out which of those pages is most relevant for each query and shows just one of them in search results.
Directory Sites Should Have Unique, Valuable Content to Perform in Search
To rank in search, directory sites should provide unique information that gives users a reason to visit the directory instead of going straight to the website of the business whose contact details they want.
Increase in Duplicate Content Could Be Due to Changes in GSC Reporting
An increase in the number of duplicate pages reported in Search Console may simply reflect that the tool only began reporting this data in the later part of last year, rather than Google suddenly treating these pages differently.
Syndicated Content May Rank Higher Than Original Source
When content is syndicated, webmasters need to accept that other websites republishing it may rank higher than the original. A canonical tag can help indicate to Google which page should be indexed, but Google may override it.
Merging Internal or External Pages on One Topic Will Result in Higher Rankings
If two separate sites that already rank well and focus on the same topic or service are merged, the combined site will see an increase in rankings, because Google sees an even stronger page than before.
Sites Borrowing Content From Other Sites Will Outrank Them if They Provide Additional Value
If another website uses an image from your site but surrounds it with additional engaging, descriptive content, its page would rank above your original page because of the added value it provides, even though you are the original source.
Google May Fold Together Similar or Duplicate Hreflang Versions
Google may fold together hreflang versions of a page if their content is the same or very similar, as it doesn’t make sense to index both versions.
Internally Duplicated Content Isn’t Penalised But Can Waste Crawl Budget
Google doesn’t penalise sites for duplicating content internally, but crawling the duplicate URLs can waste crawl budget.