Duplicate content is one of the biggest barriers to a website’s success.
Whether the duplicate content sits within one domain or spans multiple domains, and whether you copied it from another source or another source copied it from your website, if your website gets flagged for having duplicate content, search engines’ trust in it will drop significantly. Search engines, especially Google, do not like websites with too much duplicate content, since it creates a very bad user experience and is a violation of Google’s webmaster quality guidelines. If your website ends up with a lot of pages of duplicate content, your business is in serious trouble: your website will lose search rankings, traffic will decline, your website will be classified as a low-quality content farm and, in the worst case, it will eventually be hit by an algorithmic penalty and delisted from search engines.
What is Duplicate Content?
Before we discuss other aspects of duplicate content, let’s first understand what duplicate content is and how it affects SEO.
“Duplicate content” is content that is available at more than one URL. If two pages on your website contain exactly the same content, both pages are said to have duplicate content, since the content of the first page is available on the second and vice versa. Likewise, if a page on your website has the same content as a page on another website, both pages are said to host duplicate content.
Duplicate content is not limited to text; it also applies to rich media such as images, videos, documents and presentations. If two pages have the same text and rich media elements arranged in exactly the same way, that is a strong signal that one page has copied the content of the other and republished it.
Now you may wonder: how does Google figure out which page is the original source and which one is the duplicate?
Google (and other search engines) have several ways to figure out which site is the original source and which is the imposter. They use a variety of signals to determine which site is most likely the original creator of a piece of content and which is likely the duplicate, including website authority, trust, relevance, popularity and overall history. Typically, low-quality sites, sites that publish hundreds of thousands of pages in a single day and sites that create multiple versions of the same website every other month are the ones with low trust scores with Google. These are the sites that often duplicate content from other “legitimate” editorial sites, and that scrape or syndicate content without proper attribution or permission.
Google knows the patterns these sites follow and is smart enough to figure out which page is the original source and which is the duplicate. It is also worth noting that when a site copies content from a legitimate editorial site, it usually copies the links in that content too. The copied page then links to several pages on the source site, which is another strong signal that it has duplicated content from a “legitimate” source.
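To make the definition concrete, here is a minimal Python sketch that groups URLs serving identical text. It assumes you already have each page’s extracted text in a dictionary (the URLs and texts below are made-up examples), and it only catches exact matches after lowercasing and collapsing whitespace. It is an illustration of the definition, not how search engines actually detect duplicates.

```python
import hashlib
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_exact_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose normalized page text is identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only groups containing more than one URL represent duplicate content.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


pages = {
    "https://www.example.com/page/": "Blue widgets are great.",
    "https://www.example.com/page/index.html": "Blue  widgets are GREAT.",
    "https://www.example.com/other/": "Red widgets are fine too.",
}
print(find_exact_duplicates(pages))
# [['https://www.example.com/page/', 'https://www.example.com/page/index.html']]
```

The two `/page/` URLs count as duplicates because the same content is reachable at both addresses, which is exactly the situation described above.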
How Does Google Handle Duplicate Content?
In this video, former Google employee Matt Cutts discusses how Google handles websites that have too many pages with duplicate content. He describes the negative effects of duplicate content and how it can hurt a website’s SEO.
Here is what Matt suggests:
If Google finds two pages that are essentially identical, rather than showing both in the search results, it shows the one it thinks is the original source. Broadly speaking, duplicate content is not treated as spam. Instead, Google treats identical pages as a cluster: only one of them deserves to rank, and the others are filtered out because they offer no additional value beyond the page that is already ranking. However, if you simply take an RSS feed and auto-blog all of its posts onto your own blog, you are not adding any value; you are reproducing content that already exists, and this behavior is treated as spam. In such cases both the content and the pattern in which you produce it look spammy, so your website is very likely to be flagged by Google’s webspam team or by an algorithmic filter.
As long as you are not reusing boilerplate content from other sources and mass-producing pages or blog posts that add no value in an attempt to rank for every possible phrase, you need not worry about duplicate content issues.
Here are Google’s official guidelines on how to deal with duplicate content on your website.
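The clustering behavior described above can be illustrated with a toy model. This Python sketch is not Google’s algorithm. It assumes each page already has a content fingerprint and a single hypothetical `signal_score` standing in for the authority, trust and history signals mentioned earlier, and it simply shows the best-scoring page in each cluster and filters out the rest.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    content_hash: str    # fingerprint of the page text, e.g. a SHA-256 digest
    signal_score: float  # hypothetical stand-in for authority/trust/history signals


def pick_representatives(pages: list[Page]) -> tuple[list[str], list[str]]:
    """Cluster pages by fingerprint; keep the best-scoring page per cluster."""
    clusters: dict[str, list[Page]] = {}
    for page in pages:
        clusters.setdefault(page.content_hash, []).append(page)

    shown, filtered = [], []
    for members in clusters.values():
        best = max(members, key=lambda p: p.signal_score)
        shown.append(best.url)
        # The other copies are not penalized as spam; they are just not shown.
        filtered.extend(p.url for p in members if p is not best)
    return shown, filtered


pages = [
    Page("https://original.example/post", "h1", 0.9),
    Page("https://scraper.example/copy", "h1", 0.2),
    Page("https://original.example/about", "h2", 0.7),
]
shown, filtered = pick_representatives(pages)
print(shown)     # ['https://original.example/post', 'https://original.example/about']
print(filtered)  # ['https://scraper.example/copy']
```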
How Much Duplicate Content is Acceptable?
There is no precise answer to the question of how much duplicate content is acceptable. The best answer is to aim for no duplicate content at all and to minimize it as much as possible, because it is a strong negative SEO signal. That said, there are situations where you have no choice but to have some duplicate content on your website. This is quite common on e-commerce websites and sites with product catalogs, where adding new products to a category can create unavoidable duplicate content.
These are corner cases, and Google understands that you are not deliberately creating duplicate content to increase your website’s search traffic. As a rough rule of thumb, if duplicate content amounts to less than 4% of the unique content on your site, you are fine. If it rises above 10%, however, you may be in serious trouble: when a tenth of your website is duplicate, it is very likely to be flagged for violating Google’s webmaster quality guidelines.
Also note that duplicating content across multiple domains or subdomains is not acceptable. Whether or not you own those domains, if the same content is accessible from multiple locations, Google and other search engines won’t be happy about it. If you must have duplicate content on your website for user experience or other reasons unrelated to search traffic, use a rel="canonical" link element to point search engines to the preferred version, or use the robots meta tag with “noindex, nofollow” to keep the duplicate pages out of the index.
If you have a forum on your website that hosts user-generated content, you have to moderate it and ensure that what people post as comments or forum threads is original and not copy-pasted from other websites. It is okay to quote a paragraph from a credible source such as Wikipedia, but if every other post in your portal or forum turns out to contain content that is available elsewhere, these pages will be flagged as low-quality, spammy pages with duplicate content. This will affect the health of your entire website, so be very careful with user-generated content and keep it under editorial moderation.
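To put the 4% and 10% thresholds above into numbers, here is a small Python calculation. It assumes you have already counted your pages and how many of them duplicate another page (for example with a site audit tool). The page counts are made up, and the script measures duplicates as a share of all pages, which is a simplifying assumption. The cut-offs are this article’s guidance, not a limit published by Google.

```python
def duplicate_share(total_pages: int, duplicate_pages: int) -> float:
    """Percentage of a site's pages that duplicate another page on the site."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return 100.0 * duplicate_pages / total_pages


def assess(share_percent: float) -> str:
    # Cut-offs mirror this article's guidance, not an official Google limit.
    if share_percent < 4:
        return "fine"
    if share_percent <= 10:
        return "grey zone"  # the article gives no verdict between 4% and 10%
    return "serious risk"


for duplicates in (30, 150):
    share = duplicate_share(total_pages=1200, duplicate_pages=duplicates)
    print(f"{duplicates} of 1200 pages: {share:.1f}% -> {assess(share)}")
# 30 of 1200 pages: 2.5% -> fine
# 150 of 1200 pages: 12.5% -> serious risk
```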
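In HTML, the two options mentioned above are written inside a duplicate page’s `<head>` as `<link rel="canonical" href="...">`, pointing at the preferred URL, or as `<meta name="robots" content="noindex, nofollow">`. The Python sketch below uses the standard library’s `html.parser` to check which of these a page declares. The URLs are placeholders, and `HeadTagAudit` and `audit` are hypothetical helper names for a quick check, not a full crawler.

```python
from html.parser import HTMLParser
from typing import Optional


class HeadTagAudit(HTMLParser):
    """Records the canonical URL and robots directives a page declares."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None
        self.robots: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        attributes = dict(attrs)
        if tag == "link" and "canonical" in (attributes.get("rel") or "").lower().split():
            self.canonical = attributes.get("href")
        elif tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.robots = [d.strip().lower() for d in content.split(",") if d.strip()]


def audit(html: str) -> HeadTagAudit:
    parser = HeadTagAudit()
    parser.feed(html)
    parser.close()
    return parser


# Option 1: the duplicate page points search engines at the preferred URL.
duplicate_with_canonical = '<link rel="canonical" href="https://www.example.com/page/">'
# Option 2: the duplicate page asks search engines not to index it.
duplicate_with_noindex = '<meta name="robots" content="noindex, nofollow">'

print(audit(duplicate_with_canonical).canonical)  # https://www.example.com/page/
print(audit(duplicate_with_noindex).robots)       # ['noindex', 'nofollow']
```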
How to Prevent or Fix Duplicate Content Issues on Your Website?
Here are a few tips to prevent duplicate content on your website and stay compliant with Google’s webmaster quality guidelines:
- Don’t hire cheap, low-quality writers and content creators. If your website is fairly new and you are short on capital, it is tempting to hire cheap writers online who produce fly-by-night content just for the sake of having content, with no added value. Such writers often plagiarize content from other sources or rehash content from different sites. Avoid cheap low-quality writers, article banks and outsourcing companies for your website’s content; instead, hire qualified writers with a background in your industry, who know the subject well, who are genuine and whom you can trust.
- Don’t try to achieve success overnight. Many new business owners want results right away. In this pursuit, they try to have as many pages as possible on the website, hoping that more pages will bring more traffic and grow the business. In the process, they create tons of low-quality landing pages and thin blog posts, and often end up publishing duplicate content in a hurry to rank for profitable keywords as fast as possible.
- Don’t try to fool search engines or assume you are the only smart one. It is human nature to look for shortcuts and achieve the same result with as little effort as possible. But remember that you are not the only smart person in the room. Google and other search engines are smart too, and they can detect patterns in your content. If you try deceptive ways of publishing content that looks original but is actually duplicate, you will get caught sooner rather than later. Don’t try to fool search engines; play fair.
- Don’t outsource content in the early days. Say NO to content outsourcing. It just doesn’t work, and no matter how hard you try, you will end up with people who try to cheat you by publishing rehashed or duplicate content. Instead, put your money, time and effort into building a strong editorial team from the ground up, retain that team and stick to the core principle of creating high-quality, engaging content for your audience. Outsourcing content to individuals or companies usually doesn’t work in the long run; an in-house team is more likely to help you succeed than an offshore content team.
Once your business has grown and has healthy cash flow, you may consider hiring an offshore content team with high-quality writers. In the early days, though, stick to people in your region and build an in-house content team you can supervise.
- Always proofread all content and check for plagiarism before publishing. No matter how credible your writers are, proofread each article and check it for plagiarism before it goes live on the website. An author may copy content by mistake and forget to credit the original source. So always proofread and check for plagiarism, no matter how credible the author is or how much you trust them. Make it a routine and stick with it.
- Minimize pages with similar content and consolidate them into a single page. There is practically no point in having lots of similar pages with the same boilerplate content just because you want to rank for a specific keyword phrase or a location-based search query. Don’t create pages with little original content and a lot of boilerplate text: if a page offers little value to users and contains blocks of text found elsewhere on the website, it is very close to being a duplicate page and is likely to be flagged as one.
If you have too many pages with similar content, get rid of them or consolidate them into a single page. Ask yourself: why does this page exist on my website, and what additional value does it provide to my customers?
- Use a fixed linking pattern. If you link to pages on your website in the form www.example.com/page/, use this format for all internal links. Don’t link to the same pages with different URL patterns such as www.example.com/page/index.html; these variants can cause duplicate content issues on your website.
- Understand how your website’s content management system works. Whether you use WordPress, Joomla, Drupal, Magento or any other content management system, invest time in understanding how each of its features works. Learn what happens behind the scenes when you publish a new page, study the most common scenarios in which the CMS produces duplicate content, and investigate issues that create mirror pages. If you are not sure, post a question in forums and let other people inspect your website for potential duplicate content issues.
- Use duplicate content checker tools once in a while. There are lots of free tools you can use to scan your site for duplicate content; SEO Review, Copyscape and Siteliner are some good online options.
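The fixed linking pattern tip above can be enforced mechanically before links are published. This Python sketch assumes your preferred URL form is https, a lowercase host, directory-style paths with a trailing slash and no index.html-style filename. The `INDEX_FILES` list and `normalize_internal_url` function are hypothetical; adjust the rules to match your own site.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed directory-index filenames that should never appear in internal links.
INDEX_FILES = ("index.html", "index.htm", "index.php")


def normalize_internal_url(url: str) -> str:
    """Rewrite a URL into one house style: https, lowercase host, no index file,
    and a trailing slash on directory-style paths."""
    parts = urlsplit(url if "://" in url else "https://" + url)
    path = parts.path or "/"
    for index_file in INDEX_FILES:
        if path.endswith("/" + index_file):
            path = path[: -len(index_file)]  # keep the slash before the filename
            break
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment and "." not in last_segment:
        path += "/"  # "/page" -> "/page/", but leave "/logo.png" alone
    # Query strings are kept; fragments are dropped because they never reach the server.
    return urlunsplit(("https", parts.netloc.lower(), path, parts.query, ""))


variants = [
    "www.example.com/page/",
    "http://WWW.example.com/page/index.html",
    "https://www.example.com/page",
]
print({normalize_internal_url(u) for u in variants})  # {'https://www.example.com/page/'}
```

All three variants collapse to a single URL, which is the form every internal link should use.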
All in all, it is very important to avoid creating duplicate content on your website in the first place and to monitor your website routinely for duplicate content issues. Duplicate content is a serious SEO threat, and ignoring it can sometimes get an entire website removed from Google’s search results.
Be sure to read our SEO Guide, which covers key SEO concepts in detail with examples.
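For routine monitoring, a common technique for spotting near-duplicate pages (such as location pages built from the same boilerplate) is to compare overlapping word sequences, or shingles, using Jaccard similarity. The Python sketch below illustrates that technique; it is not how Copyscape, Siteliner or Google work internally. The page paths and texts are made up, and the 0.8 threshold and shingle size are assumptions you would tune for your own site.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences ("shingles") in a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i : i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles two pages have in common (0.0 = none, 1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(pages: dict[str, str], threshold: float = 0.8, k: int = 5):
    """Yield pairs of URLs whose shingle overlap meets the (assumed) threshold."""
    fingerprints = {url: shingles(text, k) for url, text in pages.items()}
    for (url_a, sh_a), (url_b, sh_b) in combinations(fingerprints.items(), 2):
        score = jaccard(sh_a, sh_b)
        if score >= threshold:
            yield url_a, url_b, round(score, 2)


pages = {
    "/plumber-london/": "Our certified plumbers fix leaks, boilers and blocked drains fast in London.",
    "/plumber-leeds/": "Our certified plumbers fix leaks, boilers and blocked drains fast in Leeds.",
    "/about/": "We are a family business founded to provide honest plumbing advice.",
}
# k=3 because these sample texts are very short.
print(list(near_duplicates(pages, threshold=0.8, k=3)))
# [('/plumber-london/', '/plumber-leeds/', 0.82)]
```

The two location pages differ by a single word, so they share 9 of 11 distinct shingles and get flagged; they are good candidates for consolidation into one page.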