Google, Yahoo and Microsoft have agreed on a new sitemap protocol feature called cross-domain sitemaps, which lets you host the sitemaps for multiple domains on a single domain. I love seeing them collaborate on something that is beneficial to webmasters. This is very helpful to everyone, especially those who have many websites to manage.
Robots.txt and sitemaps are an essential part of any blog or website nowadays. They make a website friendlier to search engine crawlers and help maximize its exposure by telling the crawlers which pages they should index and which they shouldn't. I'm not going into detail here, but you can search Google or Wikipedia to find out more about these 2 files.
OK, back to the main topic: how does the new protocol help webmasters? The reason is quite obvious. All in 1!! You control everything from 1 place, saving your precious time and energy.
How does it work?
The sample below from sitemaps.org should give you a better idea of it.
Let's say you have 3 websites, and each of them has its own sitemap hosted on its own domain:
www.host1.com with Sitemap file sitemap-host1.xml
www.host2.com with Sitemap file sitemap-host2.xml
www.host3.com with Sitemap file sitemap-host3.xml
Now, with the new protocol, you can host all 3 sitemaps on a single host, e.g. sitemaphost.com, for better control. This is how it looks in sitemaphost.com's robots.txt file:
Sitemap: http://www.sitemaphost.com/sitemap-host1.xml
Sitemap: http://www.sitemaphost.com/sitemap-host2.xml
Sitemap: http://www.sitemaphost.com/sitemap-host3.xml
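To make it clearer what each of those files holds, here is a minimal Python sketch of how sitemap-host1.xml could be generated. The file lives on sitemaphost.com, but the URLs inside it belong to www.host1.com. The page URLs and the build_sitemap helper are my own assumptions for illustration only; the namespace is the one defined by the sitemaps.org schema.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org 0.9 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls):
    """Return a sitemap XML string listing the given page URLs (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in page_urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages on www.host1.com; the resulting file would be
# uploaded to sitemaphost.com as sitemap-host1.xml.
host1_pages = ["http://www.host1.com/", "http://www.host1.com/about.html"]
sitemap_host1 = build_sitemap(host1_pages)
print(sitemap_host1)
```

You would repeat the same step for host2 and host3, each with its own list of pages.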
Finally, update host1.com's robots.txt at http://www.host1.com/robots.txt to include the line below:
Sitemap: http://www.sitemaphost.com/sitemap-host1.xml
Why must you update this part? By modifying the robots.txt file on www.host1.com and having it point to the Sitemap on www.sitemaphost.com, you have implicitly proven that you own www.host1.com. In other words, whoever controls the robots.txt file on www.host1.com trusts the Sitemap at http://www.sitemaphost.com/sitemap-host1.xml to contain URLs for www.host1.com. The same process can be repeated for the other two hosts.
Don't forget to validate your robots.txt and sitemap.
How do you ping the search engines once you have updated your website's sitemap?
Each search engine has a different method. Check out the information below:
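The trust check described above can be sketched in a few lines of Python. This is only my rough illustration of the idea, not how any search engine actually implements it: the function names are made up, and the robots.txt content is the example from this post.

```python
def sitemap_urls(robots_txt):
    """Return the URLs named in 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own colons are kept.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def trusts_sitemap(robots_txt, sitemap_url):
    """True if this robots.txt points at the given sitemap URL."""
    return sitemap_url in sitemap_urls(robots_txt)


# Example robots.txt for www.host1.com, as described in this post.
host1_robots = """User-agent: *
Disallow:
Sitemap: http://www.sitemaphost.com/sitemap-host1.xml
"""

print(trusts_sitemap(host1_robots, "http://www.sitemaphost.com/sitemap-host1.xml"))
print(trusts_sitemap(host1_robots, "http://www.sitemaphost.com/sitemap-host2.xml"))
```

The first call finds the sitemap listed in host1's robots.txt, while the second does not, because host1 never vouched for sitemap-host2.xml.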
Microsoft - http://webmaster.live.com/ping.aspx?siteMap=[Your sitemap URL]
Google - Follow this guide on how to resubmit your sitemap to Google
www.google.com/webmasters/tools/ping?sitemap=sitemap_url
Yahoo - http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=[your website URL] . This service applies only if you have submitted your website to Yahoo Site Explorer
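If you have many sitemaps, building these ping URLs by hand gets tedious, so here is a small Python sketch that builds them for you. It only constructs the URLs and does not send any request. The endpoints are copied from the list above as they stood when this post was written, so they may have changed since; the http:// prefix on the Google one and the build_ping_urls name are my own assumptions.

```python
from urllib.parse import quote

# Ping endpoints taken from the list in this post (they may change over time).
PING_ENDPOINTS = {
    "microsoft": "http://webmaster.live.com/ping.aspx?siteMap=",
    "google": "http://www.google.com/webmasters/tools/ping?sitemap=",
    "yahoo": "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=",
}


def build_ping_urls(sitemap_url):
    """Return a dict of search engine name -> ping URL for one sitemap."""
    # The sitemap URL is passed as a query value, so percent-encode all of it.
    encoded = quote(sitemap_url, safe="")
    return {name: prefix + encoded for name, prefix in PING_ENDPOINTS.items()}


ping_urls = build_ping_urls("http://www.sitemaphost.com/sitemap-host1.xml")
for engine, url in ping_urls.items():
    print(engine, url)
```

Each URL can then be opened in a browser or fetched by a script whenever the sitemap changes.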
Update 5-May-2008: corrected the missing 'Sitemap' in the robots.txt file on sitemaphost



this sounds like jargon to me. hehe, i don't get it, uncle
at the moment, i prefer to maintain everything separately. hmm, talking about sitemaps and robots.txt, i haven't got a robots.txt on any of my sites yet.
They should just agree on a common platform. That would make things easier.
uhu… SEO… those three letters bother me, having to re-check your every move to make sure it follows 'the right' SEO.
anyway, since this is more on the technical side, i guess i'll have to do it.
thanks for the info.
huhu.. i don't understand it either.. erm, why can't i comment here sometimes?
Let me know if you're unable to do so. I check my spam folder every day
You're right, I've tested several sites and the robots.txt and XML sitemap can make a big difference in how deep and how often bots crawl.
It’s so easy to put the robots.txt files in the folder where they go that I don’t see much advantage in putting them all on one domain.
The advantage only shows if you have many websites, where you would otherwise have to FTP or log in to cPanel manually for each one
good info
Too deep for a newbie like me.
great information, thank you! so now i am able to add it to live, google and yahoo. but for ask.com you need the reference in robots.txt. when i had the following in my file: http://www.rokdd.de with Sitemap file alle-seiten.xml, a server error occurred.. any hints? thank you
What is the error message?
Btw, the declaration of your XML sitemap in your robots.txt is incorrect. It should be this way
Sitemap: http://rokdd.de/alle-seiten.xml <- correct
rokdd.de with Sitemap file http://rokdd.de/alle-seiten.xml <- wrong
And you seem to disallow your whole site from being indexed as well
User-agent: *
Disallow: /
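If you want to check this yourself, here is a small Python sketch using the standard library's urllib.robotparser. The can_crawl helper and the second robots.txt (with an empty Disallow) are just my assumptions for comparison; the first one is the rule pair shown above.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, agent="*"):
    """Parse a robots.txt body and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The rules shown above: every path is blocked for every crawler.
blocked_rules = "User-agent: *\nDisallow: /\n"
# An empty Disallow value blocks nothing.
open_rules = "User-agent: *\nDisallow:\n"

page = "http://rokdd.de/alle-seiten.xml"
print(can_crawl(blocked_rules, page))
print(can_crawl(open_rules, page))
```

With "Disallow: /" the check fails for every URL on the site, so the crawler would not even fetch the sitemap itself.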
okay wow, fast response
the first syntax i already know, but i want to use the robots.txt for more than one domain. i guess that the following syntax should be correct:
rokdd.de with Sitemap file alle-seiten.xml
however, google said that it did not understand that syntax
thanks for your help
It took me a while to successfully create a cross-domain sitemap and I had to ask for help with it, but it's very interesting to see this working so well. Has anybody else managed to do this on their own? :)