

Blogging Basics Come here to learn the basics of blogging.

  #1 (permalink)  
Old 08-03-2007, 12:18 PM
jshaffstall
Moderator
 
Join Date: Apr 2007
Location: Columbus, OH
Posts: 339
iTrader: (0)
Duplicate Content 101

Duplicate content has been mentioned a few times, so I thought I'd write up my current understanding of it all to help anyone who isn't aware of the issue. And I hope that those who are more aware of it than I am chime in to correct any misunderstanding I have!

What is duplicate content?

It's the exact same content appearing on more than one web page.

What's the issue?

Google will pick one of the pages to be the primary source, and stick the other pages in the supplemental index. When people search on the topic, any page in the primary index gets used, and the pages in the supplemental index won't show up in the search results.

Is this a problem if I write all my own content?

Yes. Let's say you write a blog post and put it into three categories. Depending on how you have your blog set up, the full text of that blog post can appear in as many as six places: your main page, the post page, the three category pages, and your feed.

Google will pick one of those places to be the primary page, and the rest will go into the supplemental index.
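To make the count of six concrete, here is a rough Python sketch that lists every place the full text of one post could turn up. The URL layout (`/category/<name>/`, `/feed/`) and the `example.com` domain are assumptions for illustration only; your blog's permalink structure may differ.

```python
def places_post_appears(slug, categories, base="http://example.com"):
    """List every URL that could carry the full text of one post.

    The URL layout here is hypothetical, not taken from any particular blog.
    """
    urls = [
        base + "/",               # main page
        base + "/" + slug + "/",  # the post page
        base + "/feed/",          # the feed
    ]
    # One more copy for each category page the post is filed under.
    urls += [base + "/category/" + name + "/" for name in categories]
    return urls


copies = places_post_appears(
    "duplicate-content-101", ["seo", "blogging", "wordpress"]
)
print(len(copies))  # 6
```

Every extra category adds one more copy, which is why the number grows so quickly.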

How does Google choose which is primary?

You've got me. It'd be nice if their algorithm gave priority to the post page, but it doesn't. I've seen feeds show up in Google search results, which is a shame. When you click on a feed link in Google search results, you don't get taken to the web page with the content, but to a feed subscription page. Chances are good very few people will actually subscribe just to read a page they think they might be interested in.

It's especially sad when those feeds rank highly in Google's results. That's a lot of traffic that isn't going to the blog.

What can I do?

To start, pick a theme that shows only excerpts in your category pages. This means the full post content will now only show up on your main page, the post page, and the feed.

If the theme you want only has full post text on category pages, Vandelay shows how to modify it to show excerpts in this post: SEO Basics for Blogs
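If you're wondering what "showing an excerpt" amounts to, the idea is just to cut the post off after a fixed number of words. Here's a minimal Python sketch of that idea. It is not how any particular theme does it, and both the 55-word limit and the `[...]` marker are assumptions picked for illustration.

```python
def make_excerpt(text, max_words=55):
    """Return the first max_words words of text, marking it if it was cut."""
    words = text.split()
    if len(words) <= max_words:
        # Short posts are returned whole (with whitespace normalized).
        return " ".join(words)
    return " ".join(words[:max_words]) + " [...]"


print(make_excerpt("Duplicate content is the same text on many pages", 3))
# Duplicate content is [...]
```

A category page that prints only this shortened text no longer carries a full copy of the post.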

You could change your feed to only show excerpts, but I don't recommend that. Many readers feel that's a sign of disrespect, and some will unsubscribe from feeds that only show excerpts.

Use robots.txt to prevent indexing of feeds

There's no reason to have a feed indexed in search results. You can use a robots.txt file to prevent feeds from being indexed. This is a plain text file you upload to the root (top-level) directory of your domain.

Here's a portion of a robots.txt you could start with:

Code:
Sitemap: http://www.onlineopportunity.org/sitemap.xml

# This group applies to every user-agent that has no more specific group
User-agent: *

# Disallow these directories and all files within them
Disallow: /cgi-bin/
Disallow: /contact/
Disallow: /wp-admin/
Disallow: /wp-includes/

# Googlebot is the main search bot for Google.
# Note: a crawler follows only the most specific group that matches it,
# so Googlebot uses the rules below and ignores the * group above.
User-agent: Googlebot

# Disallow all files ending with these extensions
# (* matches any run of characters, $ anchors the end of the URL)
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.tar$
Disallow: /*.tgz$
Disallow: /*.cgi$
Disallow: /*.xhtml$

# Disallow Google from parsing individual post feeds and trackbacks
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
I'm not a robots.txt expert, and the above was taken from another site's example. If you're interested in what it all means, there's a tutorial here: Clockwatchers Web Hosting - robots.txt Tutorial

The basic goal is to prevent Google from indexing feeds of any sort.
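To see which URLs the feed rules at the end of that file would catch, here's a small Python sketch that mimics Google-style wildcard matching (`*` for any run of characters, `$` for end of URL). It's a simplified stand-in, not Google's actual matcher: it ignores Allow rules and rule precedence, and the example paths are made up.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern using * and $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with "any characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """True if any Disallow pattern matches the start of the URL path."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


FEED_RULES = ["/*/feed/$", "/*/feed/rss/$", "/*/trackback/$"]

print(is_blocked("/my-post/feed/", FEED_RULES))       # True
print(is_blocked("/my-post/trackback/", FEED_RULES))  # True
print(is_blocked("/my-post/", FEED_RULES))            # False
print(is_blocked("/feed/", FEED_RULES))               # False
```

Going by this sketch, the per-post feeds are covered but a top-level feed at `/feed/` is not, since the pattern needs something between the first two slashes. If your main feed lives at that address (an assumption about your setup), a plain `Disallow: /feed/` line might be worth adding.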

Isn't there an easier way?

There's a WordPress plugin that will do much the same thing, at: Duplicate Content Cure Plugin for Wordpress - SEOlogs.com

What's left?

You now have two sources of duplicate content. The full text of the post appears on your main page and your post page. An excerpt of the post appears on each category page (three in our example above, but more if you assigned the post to more categories).

The full text duplication on the main page and the post page doesn't seem to be much of an issue. The post will fall off your main page in short order if you're posting frequently, at which point the post page becomes the only copy of the full text.

The duplication between categories is more of an issue. The plugin linked to above solves this by not allowing category pages to be indexed.

In my opinion that's a mistake, because category pages make nice landing pages for the keyword the category uses. The only other solution, which I use, is to limit each post to one category. This means you'll have no duplication of excerpts between category pages.
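If you already have a lot of posts, a quick audit can show which ones break the one-category rule. This Python sketch assumes you can get your posts into a plain dictionary of title to category list (for example from an export); the sample data is made up.

```python
def posts_in_multiple_categories(post_categories):
    """Return the titles filed under more than one category, sorted."""
    return sorted(
        title
        for title, categories in post_categories.items()
        if len(set(categories)) > 1
    )


# Hypothetical sample data: post title -> categories it is filed under.
posts = {
    "Duplicate Content 101": ["seo"],
    "Choosing a Theme": ["design", "wordpress"],
}
print(posts_in_multiple_categories(posts))  # ['Choosing a Theme']
```

Anything the function returns is a post whose excerpt is still showing up on more than one category page.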

If you're going with the one-category-per-post technique and using the plugin, you do need to modify it a bit to allow category pages to be indexed. The plugin page itself gives details on this.

If you don't think you can limit posts to one category each, use the plugin with no changes.
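For the curious, one common way to keep a page out of the index is a robots meta tag in the page's head. Whether the plugin works exactly this way is an assumption on my part. The Python sketch below only illustrates the decision being made, and the page type names and the `index_categories` switch are made up for the example.

```python
def robots_meta(page_type, index_categories=True):
    """Pick a robots meta tag for a page type.

    Page types are hypothetical labels: "home", "post", "category",
    "archive", and so on. Anything not listed as indexable gets noindex.
    """
    indexable = {"home", "post"}
    if index_categories:
        # The one-category-per-post approach: category pages stay indexed.
        indexable.add("category")
    content = "index,follow" if page_type in indexable else "noindex,follow"
    return '<meta name="robots" content="%s">' % content


print(robots_meta("category"))
# <meta name="robots" content="index,follow">
print(robots_meta("category", index_categories=False))
# <meta name="robots" content="noindex,follow">
```

Leaving `follow` in place means the links on a noindexed page can still be followed through to the posts themselves.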

What now?

By this point you should have only one primary copy of the full text of a post on your blog, and in Google's search results. It will take some time for Google to fully expunge old supplemental pages from the index, but eventually you'll be able to use a tool like the SEO for Firefox extension (SEO for Firefox: Free Search Engine Optimization Software Extension for Firefox) to see that you have very few pages in the supplemental index.

What have I missed?

Those of you who have done more of this than me, what have I missed or gotten wrong?

Jay
  #2 (permalink)  
Old 08-03-2007, 03:54 PM
Bloggeries
Administrator
 
Join Date: Dec 2006
Location: Blogosphere at Large!
Posts: 5,908
iTrader: (1)
Re: Duplicate Content 101

Thanks for this quality, in-depth post on the issue of duplicate content. It's opened my eyes. Bloggeries has 15,800 indexed pages on Google and ~14,000 in supplemental; I'm not totally sure what exactly this means. Now I have to determine how many of the indexed ones are actually the primary pages I intended. This could take a while.
  #3 (permalink)  
Old 08-03-2007, 04:07 PM
jshaffstall
Re: Duplicate Content 101

Quote:
Originally Posted by Bloggeries
Thanks for this quality, in-depth post on the issue of duplicate content. It's opened my eyes. Bloggeries has 15,800 indexed pages on Google and ~14,000 in supplemental; I'm not totally sure what exactly this means. Now I have to determine how many of the indexed ones are actually the primary pages I intended. This could take a while.
I never actually bothered too much trying to figure out whether the right ones were indexed or not. I just removed the secondary pages from the index by telling Google not to index them.

It took about three or four weeks for the index to purge itself, but now what comes up is always the primary page.

I do still have four pages in the supplemental index. One of these days I'll have to track down which four those are.

Jay
  #4 (permalink)  
Old 08-05-2007, 02:50 PM
Prettylady
Established
 
Join Date: Aug 2007
Location: Canada
Posts: 29
iTrader: (0)
Re: Duplicate Content 101

Google announced that they have done away with the "supplemental results."
  #5 (permalink)  
Old 08-05-2007, 03:47 PM
jshaffstall
Re: Duplicate Content 101

For those interested in reading the announcement, it's at: Official Google Webmaster Central Blog: Supplemental goes mainstream

Based on my reading, Google has not done away with supplemental results. They have stopped labeling results as supplemental.

The supplemental index still exists, and still works much the same as it always has. Pages will still end up there, and those pages will not turn up in search results for competitive keywords.

According to this page: How to Find if a Page is in Google’s Secret Supplemental Results. - Jim Boykin’s Internet Marketing Blog, pages in the supplemental index may not be completely indexed, so Google may not use all the keywords on the page, even for long-tail keywords.

Still seems like a good idea to keep pages primary until they really do get rid of supplemental results entirely.

Jay
  #6 (permalink)  
Old 08-05-2007, 03:58 PM
Prettylady
Re: Duplicate Content 101

Correct, but you will have no way of knowing which pages are supplemental and which are not. However, as you said, it is still good to apply the fixes to your blog. Perhaps I should have said that Google has done away with the "supplemental results" label in its search results.
  #7 (permalink)  
Old 08-05-2007, 04:25 PM
jshaffstall
Re: Duplicate Content 101

Quote:
Originally Posted by Prettylady
Correct, but you will have no way of knowing which pages are supplemental and which are not. However, as you said, it is still good to apply the fixes to your blog. Perhaps I should have said that Google has done away with the "supplemental results" label in its search results.
I think you're right, so tools like the SEO plugin for Firefox will no longer be able to tell you how many of your pages are supplemental.

Probably not a big issue, if you approach it from the perspective of eliminating duplicate content from the ground up.

Jay
  #8 (permalink)  
Old 08-05-2007, 04:35 PM
Prettylady
Re: Duplicate Content 101

It makes it harder, because we will not know how many pages have fallen into Google hell. Some have suggested that Supplemental Results be moved to the Webmaster Center. See: Google Hides Supplemental Results Label in Google.com: Webmaster Reaction
  #9 (permalink)  
Old 08-05-2007, 04:50 PM
jshaffstall
Re: Duplicate Content 101

Quote:
Originally Posted by Prettylady
It makes it harder, because we will not know how many pages have fallen into Google hell.
I guess I don't see that as a problem. Now that I know how to keep pages out of supplemental (except for those pesky four), I can apply that process to a site or blog without worrying about how many pages are in supplemental, and have confidence that I've removed the major cause of supplemental pages.

At least until Google changes their algorithm. ;-)

Jay
  #10 (permalink)  
Old 08-05-2007, 05:07 PM
Prettylady
Re: Duplicate Content 101

Well, judging from the buzz on blogs, forums, etc., many still see it as a big issue.