Duplicate Content
What is duplicate content? Duplicate content occurs when identical (or very similar) content appears in multiple places on a website.
There are several SEO issues that can occur when a website has duplicate content, including crawl budget issues, search engine indexing issues, index bloat, keyword cannibalization, and canonical tag issues.
Our SEO Office Hours recaps below compile best practices Google has recommended for websites dealing with duplicate content issues.
(See Lumar’s full guide to duplicate content for even more actionable tips on how SEOs can address duplicate content issues.)
For even more on website content best practices for SEO, read our Guide to Optimizing Website Content for Search — or explore our Website Intelligence Academy resources on SEO & Content.
Different Country Versions of Content Can Dilute Value of Original Content
Creating different country versions of pages in the same language can dilute the value of the original content, meaning other sites may rank for the term instead. John recommends working with people who have experience with large international sites, so you can strike the right balance between having pages that are useful to visitors in different countries and not diluting the value of those pages.
Google Considers Duplicate Text Snippets as Natural
Repeating text snippets across a site is perfectly natural. When someone searches for text included in such a snippet, Google will try to return the most relevant page. In these cases, it is important to consider which page you want to rank in search. You may also want to consider A/B testing different category pages.
Incorrect Google Caching Suggests a Different Canonical Has Been Chosen
If Google's cache for a page displays content from a different page, this could be because Google has determined that the two pages are duplicates and has canonicalised one to the other.
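One way to signal your preferred version to Google is a rel="canonical" link element in the head of the duplicate page. A minimal illustration (the URLs are hypothetical):

```html
<!-- On the duplicate page, e.g. a parameterised product listing -->
<!-- (URLs are hypothetical, for illustration only) -->
<link rel="canonical" href="https://example.com/shoes" />
```

Note that Google treats canonical tags as a hint rather than a directive, so it may still pick a different canonical — which is when cache mismatches like the one described above can appear.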
Manual Actions Are Given for Sites Made Up Entirely of Duplicate Content
Website rankings are only harmed by duplicate content when a site is made up purely of duplicate content. Google's algorithms and web spam team will only remove sites from the index in this circumstance.
Google Indexes Duplicate Pages But Only Shows the Most Relevant One
Where a page is duplicated across two websites or brands that don't want to canonicalise, Google will index both but show only one, depending on overall relevancy and personalisation factors such as location.
Add Unique Content So International Duplicate Pages Can Be Indexed Separately
Including some unique content on international duplicate pages, such as local addresses and currencies, will prevent them from being folded together and allow them to be indexed separately.
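Alongside unique local content, hreflang annotations tell Google which country version to serve. A minimal sketch, assuming hypothetical US and UK versions of a page:

```html
<!-- Placed on every country version of the page (URLs are hypothetical) -->
<link rel="alternate" hreflang="en-us" href="https://example.com/us/pricing" />
<link rel="alternate" hreflang="en-gb" href="https://example.com/uk/pricing" />
<link rel="alternate" hreflang="x-default" href="https://example.com/pricing" />
```

hreflang alone won't stop identical pages being folded together; the unique local details mentioned above give Google a reason to keep each version indexed separately.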
Location Name High in URL Path Raises Suspicion About Doorway Pages
Location names high up in the URL path raise suspicion that there will be a lot of doorway pages on the site, where the full content is duplicated across a large number of cities. The web spam team might take action if there is nothing unique across these city pages.
Google Folds Together Sites With Same Server, Content & URL Paths
If Google finds sites hosted on the same server with the same content and URL paths, they will likely be seen as identical and folded together in search.
Aggregator Content Can Outrank Original Source if the Latter Is Lower Quality
The original source of content can be outranked by aggregator sites carrying duplicate content if Google sees the original as lower in quality. In this case, John recommends working on improving the overall quality of the site to prevent this from happening.
URL Duplication Issue on Larger Sites as Google More Likely to Miss New Content
URL duplication is more of an issue on larger sites than on small or medium ones. This is because Google is less likely to be able to crawl the whole site, and new content is more likely to be missed if there is a lot of duplication.