I was prompted to dig into this topic when I read John Chow's post (again? :)) about losing his #1 ranking in Google for the keyword ‘make money online’. For those who have never heard of John Chow (I’m not promoting him, OK? :)), type ‘make money online’ into the Google search box and press Enter. There you go: John Chow is on the front page. Such an evil genius.
A few weeks back, he lost his #1 rank for this keyword. I’m not going to give the full sequence of what happened to his blog (you can read it there), but I’m more interested in the finding by SEORefuge, who correctly predicted what had happened on John Chow’s blog.
SEORefuge predicted earlier on their blog that he might have made some changes to his blog’s robots.txt file, and that this change resulted in his favorite keyword no longer ranking 1st in Google. SEORefuge’s prediction turned out to be true when they compared the cached version of John Chow’s robots.txt file from when the rank was dropping with what the robots.txt file looks like now. It’s amazing how powerful a robots.txt file is in determining your rank for a certain keyword in search engines!
Robots.txt file
Frankly speaking, I have never used a robots.txt file before, except for my XML sitemap (for auto-discovery purposes) and the Google AdSense bot from Google Webmaster Tools. To learn more about robots.txt, I suggest browsing over to Wikipedia, since the content of the official robots.txt website is not very up to date.
Google Supplemental Index
Other than the robots.txt file, another factor that might affect your ranking in search engines is the Google Supplemental Index. This is not something new, but rather something I had ignored before. I never really cared about it (poor me). Basically, the Supplemental Index is where the unworthy pages end up. Some SEO experts (and bloggers) point out that the more of your indexed pages fall into the Supplemental Index, the fewer visitors search engines will bring to your website, because the Supplemental Index is not updated as frequently as the main index.
There is an interesting article by Nathan from Not So Boring Life about how to get rid of the Google Supplemental Index. In his article, he teaches how to identify how many of your pages are in the Supplemental Index and how to get them out of it. I summarize here what is in his article.
To identify which pages are in Supplemental Index
- Run this query on Google (site:www.yoursite.com *** -view)
- Use Aaron Wall’s SEO toolbar
Get rid of Supplemental Index
To get rid of it, you have to exclude the unnecessary content of your website from being indexed by search engines. The unnecessary content or files would be your images, profiles, plugins, etc.
If your website is a blog, you can read his article further, since it was written specifically for that. My website is not a blog, but I basically got the idea from his robots.txt file.
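To make the idea concrete, here is a minimal sketch in Python that checks a folder-excluding robots.txt using the standard library's urllib.robotparser. The folder names (/images/, /profile/, /plugins/) and the www.yoursite.com URLs are only my assumptions for illustration; they are not taken from Nathan's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the folder names are placeholders, adjust them to your site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /profile/
Disallow: /plugins/
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Check one URL inside an excluded folder and one ordinary page.
for url in (
    "http://www.yoursite.com/images/logo.png",
    "http://www.yoursite.com/articles/robots.html",
):
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(url, "->", verdict)
```

Testing the rules locally like this is cheaper than waiting for a search engine to re-crawl the site, although it only tells you what a well-behaved bot is supposed to do.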
How to exclude dynamic pages from getting indexed.
One thing I noticed is that roughly half of my indexed pages (use the query site:www.yoursite.com *** -view) are dynamic pages, such as the print function, comments, and RSS. I dug around again and found a few articles about how to exclude them. Hmm, even the official robots.txt website and Wikipedia didn’t mention it.
This is how I exclude the dynamic pages. The reason I list both ways is that I’m not sure which one is correct. According to the sources below, it works as expected.
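If you want to put a number on it, here is a rough sketch of how the share of dynamic pages could be counted once you have copied the indexed URLs into a list. I am assuming here that "dynamic" simply means the URL carries a query string (something after a ?); the sample URLs and the function names are made up for illustration.

```python
from urllib.parse import urlsplit


def is_dynamic(url):
    """Treat a URL as dynamic when it has a non-empty query string (assumption)."""
    return bool(urlsplit(url).query)


def dynamic_share(urls):
    """Return the fraction of URLs that are dynamic, or 0.0 for an empty list."""
    if not urls:
        return 0.0
    return sum(is_dynamic(u) for u in urls) / len(urls)


# Hypothetical sample of indexed URLs, e.g. copied from the search results.
indexed = [
    "http://www.yoursite.com/articles/robots.html",
    "http://www.yoursite.com/articles/robots.html?print=1",
    "http://www.yoursite.com/about.html",
    "http://www.yoursite.com/index.php?option=rss",
]

print("dynamic share:", dynamic_share(indexed))
```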
User-agent: *
Disallow: /*?
Disallow: /?
UPDATE 16-6-2007: I have confirmed that the correct way is the first one (Disallow: /*?).
http://www.google.com/support/webmasters/bin/answer.py?answer=35303&hl=en
http://www.webmasterworld.com/forum93/534.htm
http://forums.digitalpoint.com/showthread.php?t=106
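To see why the first rule is the one that works, here is a small hand-rolled sketch in Python of Google-style wildcard matching, where * stands for any run of characters and a trailing $ anchors the end of the URL. This is only my approximation of the behaviour described in the Google help page above, not Google's actual code, and the function names and sample paths are hypothetical. Keep in mind that the original robots.txt standard has no wildcards, so other bots may ignore the rule.

```python
import re


def rule_to_regex(rule):
    """Translate a Disallow path with Google-style wildcards into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with 'any characters'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, rules):
    """Return True when any Disallow rule matches the start of the path."""
    return any(rule_to_regex(rule).search(path) for rule in rules)


# Hypothetical paths: one dynamic page, one query on the home page, one static page.
for path in ("/article.php?print=1", "/?p=3", "/article.html"):
    print(path, "| /*? blocks:", is_blocked(path, ["/*?"]),
          "| /? blocks:", is_blocked(path, ["/?"]))
```

In this sketch, /*? catches any URL with a question mark in it, while /? only catches a query string attached directly to the home page, which matches what I confirmed in the update.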
More reading on Google Supplemental Index
There is a lot of discussion about the Supplemental Index. I grabbed a few quotes and posts made by folks from Google. The way they try to explain it to webmasters is that the Google Supplemental Index is not bad. Your thoughts?
Post from Adam Lasnik of Google
Pages are in the supplemental results because we still wanted to be able to show them to users, but the pages didn’t have enough PageRank to make it into our main index (which is more extensive and updated with greater frequency).
Quote from Matt Cutts of Google (quoted by another forum member)
having supplemental results these days is not such a bad thing. In your case, I think it just reflects a lack of PageRank/links. We’ve got your home page in the main index, but if you look at your site … you’ll see not a ton of links … So I think your site is fine … it’s just a matter of we have to select a smaller number of documents for the web index. If more people were linking to your site, for example, I’d expect more of your pages to be in the main web index.
(Post by Matt Cutts of Google on his blog)
That statement still holds. It’s perfectly normal for a website to have pages in our main web index and our supplemental index. If a page doesn’t have enough PageRank to be included in our main web index, the supplemental results represent an additional chance for users to find that page, as opposed to Google not indexing the page
Getting more *quality* backlinks is generally a good way to get more of your pages in the main index.
Note: after implementing robots.txt to exclude unnecessary folders from being indexed, please wait 2-3 weeks before you see the result. By the way, I’m not implementing what I wrote above on this blog, but on my other website.