Duplicate Content
What is duplicate content? Duplicate content occurs when the same (or very similar) content appears in multiple places on a website.
There are several SEO issues that can occur when a website has duplicate content, including crawl budget issues, search engine indexing issues, index bloat, keyword cannibalization, and canonical tag issues.
Our SEO Office Hours recaps below compile best practices Google has recommended for websites dealing with duplicate content issues.
(See Lumar’s full guide to duplicate content for even more actionable tips on how SEOs can address duplicate content issues.)
For even more on website content best practices for SEO, read our Guide to Optimizing Website Content for Search — or explore our Website Intelligence Academy resources on SEO & Content.
Google Recognises Unique Content
Google tries to recognise unique content on a page that does not appear on other pages on the web, and highlights it in search. Content that is duplicated on other pages of the same website is not considered the primary content of the page.
Similar Country Pages May Be Treated as Duplicates
Country-specific pages that are near duplicates of each other may still be folded together, even if they include hreflang tags. Including more unique content on each version makes this less likely to happen.
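As a rough way to gauge how close two country versions are, the sketch below compares their text using word shingles and Jaccard similarity. This is an illustrative heuristic only, not how Google detects near duplicates. The function names and the sample copy are hypothetical, and any threshold for "too similar" would be your own judgment call.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Very short text: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Shingle overlap of two texts, from 0.0 (disjoint) to 1.0 (identical)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical copy from two country versions of the same page.
uk_copy = "Free delivery on all orders over 50 pounds in the UK"
au_copy = "Free delivery on all orders over 50 dollars in Australia"

score = jaccard_similarity(uk_copy, au_copy)
print(f"Similarity between country versions: {score:.2f}")
```

A score close to 1.0 suggests the versions differ only by a few swapped words, which is the situation where adding more unique content per country is most worthwhile.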
Google Boosts Original Content Sources but Ranks the Most Relevant Sources
Google only gives a minor boost to the originator of content, but will rank the sources which it thinks are the most relevant.
Duplicate Blocks of Content Won’t Affect Individual Pages
Google detects individual chunks of content that are duplicated across many pages and will only show a single result in search for them, but this won’t affect any individual page.
Use Info: Queries to Find Duplicate Pages
If you search for a URL with an info: query, and Google shows an alternative URL, it shows that Google thinks the pages are equivalent.
Re-Writing Content into Different Languages Is OK
Translating content into different languages, or adding additional information, is OK. Auto-translation, or just swapping out individual words, is not OK.
Google Distinguishes Primary Content from Boilerplate Content
Google detects boilerplate content that appears across the site in the page header, side navigation, or footer, and treats it separately from the primary content.
Canonicals on Shared Content to the Original Source Consolidate Authority
A canonical tag on shared content that points to the original source should consolidate backlink authority on the original.
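To confirm that a syndicated or shared copy really points back to the original, you can read the canonical link element out of its HTML. The following is a minimal sketch using only Python's standard library. The class name, function name, and example URL are hypothetical, and it only looks at the `<link rel="canonical">` element, not at canonicals sent via HTTP headers.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    parser = CanonicalFinder()
    parser.feed(html)
    return parser.canonical


# Hypothetical markup for a syndicated copy of an article.
syndicated_page = (
    '<html><head>'
    '<link rel="canonical" href="https://example.com/original-article">'
    '</head><body>Shared copy of the article.</body></html>'
)

print(find_canonical(syndicated_page))
```

If the function returns `None`, or a URL other than the original source, the shared copy is not passing a canonical signal to the original.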
Allow Googlebot to See A/B Tests
If you are running an A/B test, treat Googlebot the same way as any other visitor. If the test uses multiple URLs, ensure that both versions are crawlable but canonicalise to the main version.
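For a test that runs on separate URLs, each variant page would carry a canonical link element pointing at the main version. Below is a small sketch of a helper that renders that element. The function name and URLs are hypothetical, and how the tag gets into your page templates depends on your own stack.

```python
from html import escape


def canonical_link_tag(main_url: str) -> str:
    """Render a canonical link element pointing at the main version."""
    # Escape the URL so characters such as & are valid inside the attribute.
    return f'<link rel="canonical" href="{escape(main_url, quote=True)}">'


# Hypothetical test setup: both variants stay crawlable,
# and each one outputs the same canonical tag in its <head>.
main_version = "https://example.com/landing"
variant_urls = [
    "https://example.com/landing",
    "https://example.com/landing-b",
]

for url in variant_urls:
    print(url, "->", canonical_link_tag(main_version))
```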
Solve Duplication with Redirects, Canonical and Linking
John recommends using redirects, canonical tags, and consistent internal linking to the primary page to solve duplication. He says Google is against using robots.txt to prevent content duplication, because Google can’t recognise that pages are duplicates if it cannot crawl them.
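One way to spot the robots.txt problem described above is to test your known duplicate URLs against your robots.txt rules and flag any that are blocked from crawling. This sketch uses Python's standard `urllib.robotparser`. The function name, rules, and URLs are hypothetical, and the standard-library parser uses simple prefix matching, so it may not mirror Googlebot's handling of every rule.

```python
from urllib.robotparser import RobotFileParser


def blocked_duplicates(
    robots_txt: str,
    duplicate_urls: list[str],
    user_agent: str = "Googlebot",
) -> list[str]:
    """Return the duplicate URLs that robots.txt prevents user_agent from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in duplicate_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt that blocks printer-friendly duplicates.
robots_txt = """\
User-agent: *
Disallow: /print/
"""

# Hypothetical duplicate URLs of a primary page.
duplicates = [
    "https://example.com/print/widget",
    "https://example.com/widget?ref=nav",
]

for url in blocked_duplicates(robots_txt, duplicates):
    print("Blocked from crawling, so it cannot be recognised as a duplicate:", url)
```

Any URL this reports is a candidate for unblocking and handling with a redirect or canonical tag instead.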