cross domain sitemap via robots.txt

Google, Yahoo and Microsoft have agreed on a new addition to the sitemap protocol called cross-domain sitemaps, which lets you host the sitemaps for multiple domains on a single domain. I love seeing them collaborate on something that benefits webmasters. This is very helpful to everyone, especially those who have many websites to manage.

Robots.txt and a sitemap are essential parts of any blog or website nowadays. They make your site friendlier to search engine crawlers and improve its exposure by telling the crawlers which pages they should index and which they shouldn’t. I’m not going into detail here, but you can search Google or Wikipedia to find out more about these 2 files.

OK, back to the main topic: how does the new protocol help webmasters? The reason is quite obvious. All in 1!! You control everything from 1 place, which saves your precious time and energy.

How does it work?
The sample below from sitemaps.org should give you a better idea.

Let’s say you have 3 websites, and each of them has its own sitemap hosted on its own domain:

www.host1.com with Sitemap file sitemap-host1.xml
www.host2.com with Sitemap file sitemap-host2.xml
www.host3.com with Sitemap file sitemap-host3.xml

Now, with the new protocol, you can host all 3 sitemaps on a single host, e.g. sitemaphost.com, for better control. This is how it looks in sitemaphost.com’s robots.txt file:

Sitemap: http://www.sitemaphost.com/sitemap-host1.xml
Sitemap: http://www.sitemaphost.com/sitemap-host2.xml
Sitemap: http://www.sitemaphost.com/sitemap-host3.xml

Finally, update host1.com’s robots.txt at http://www.host1.com/robots.txt to include the line below:

Sitemap: http://www.sitemaphost.com/sitemap-host1.xml

Why must you update this part? By modifying the robots.txt file on www.host1.com and having it point to the Sitemap on www.sitemaphost.com, you have implicitly proven that you own www.host1.com. In other words, whoever controls the robots.txt file on www.host1.com trusts the Sitemap at http://www.sitemaphost.com/sitemap-host1.xml to contain URLs for www.host1.com. The same process can be repeated for the other two hosts.
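
To make the trust idea a bit more concrete, here is a minimal Python sketch of how you might check that a host’s robots.txt really points at a given Sitemap. The function names sitemap_urls and delegates_to and the sample robots.txt body are my own illustration, not part of the protocol, and this is certainly not how the search engines implement the check.

```python
def sitemap_urls(robots_txt: str) -> list[str]:
    """Return the URLs named by Sitemap: lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop comments, then split "field: value" on the first colon only,
        # because the URL itself contains colons.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def delegates_to(robots_txt: str, sitemap_url: str) -> bool:
    """True if this robots.txt points at the given Sitemap URL."""
    return sitemap_url in sitemap_urls(robots_txt)


# Hypothetical robots.txt body for www.host1.com (illustration only).
host1_robots = """\
User-agent: *
Disallow:

Sitemap: http://www.sitemaphost.com/sitemap-host1.xml
"""

print(delegates_to(host1_robots, "http://www.sitemaphost.com/sitemap-host1.xml"))  # True
print(delegates_to(host1_robots, "http://www.sitemaphost.com/sitemap-host2.xml"))  # False
```

Only the Sitemap that host1.com’s own robots.txt names is trusted for host1.com; the Sitemap for host2 is not, until host2.com’s robots.txt points at it.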

Don’t forget to validate your robots.txt and sitemap.
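
If you want a quick sanity check before using a proper validator, something like the Python sketch below could help. It only checks that the sitemap is well-formed XML and that every URL in it belongs to the host you expect, which is the part that matters most for cross-domain sitemaps. The function name foreign_urls and the sample sitemap are assumptions made for this example, and it is no substitute for full schema validation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by sitemap files that follow the sitemaps.org 0.9 schema.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def foreign_urls(sitemap_xml: str, expected_host: str) -> list[str]:
    """Return the <loc> URLs whose host is not expected_host.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(sitemap_xml)
    bad = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        if urlparse(url).hostname != expected_host:
            bad.append(url)
    return bad


# Hypothetical contents of sitemap-host1.xml with one stray host2 URL.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.host1.com/</loc></url>
  <url><loc>http://www.host1.com/about.html</loc></url>
  <url><loc>http://www.host2.com/oops.html</loc></url>
</urlset>
"""

print(foreign_urls(sample, "www.host1.com"))  # ['http://www.host2.com/oops.html']
```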

How do you ping the search engines once you have updated your website’s sitemap?
Each search engine has a different method. Check out the information below.

Microsoft – http://webmaster.live.com/ping.aspx?siteMap=[Your sitemap URL]

Google – Follow this guide on how to resubmit your sitemap to Google
www.google.com/webmasters/tools/ping?sitemap=[Your sitemap URL]

Yahoo – http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=[your website URL] . This service applies only if you have submitted your website to Yahoo Site Explorer
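
Since the sitemap URL goes inside another URL as a query value, it is safer to URL-encode it first. Here is a rough Python sketch that builds the three ping URLs from the endpoints listed above. The helper name ping_urls is my own, the http:// prefix on the Google address is an assumption, and the endpoints themselves may change or be retired, so treat this as an illustration only. It just builds the URLs and does not send any request.

```python
from urllib.parse import quote

# Endpoints as listed in this post; they may have changed or been retired.
# The "http://" prefix on the Google entry is an assumption.
PING_ENDPOINTS = {
    "microsoft": "http://webmaster.live.com/ping.aspx?siteMap=",
    "google": "http://www.google.com/webmasters/tools/ping?sitemap=",
    "yahoo": "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=",
}


def ping_urls(sitemap_url: str) -> dict[str, str]:
    """Build one ping URL per search engine for the given sitemap URL."""
    # safe="" also encodes ":" and "/" so the sitemap URL is a clean query value.
    encoded = quote(sitemap_url, safe="")
    return {name: base + encoded for name, base in PING_ENDPOINTS.items()}


for name, url in ping_urls("http://www.sitemaphost.com/sitemap-host1.xml").items():
    print(name, url)
```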

Update 5-May-2008: corrected the missing ‘sitemap’ in sitemaphost’s robots.txt file


16 Responses to “cross domain sitemap via robots.txt”

  1. KNizam says:

    this sounds like jargon to me. heheh, this uncle doesn’t get it

  2. azwanhadzree says:

    at the moment, i prefer to maintain everything separately. hmm, talking about sitemap and robots.txt, i haven’t got a robots.txt on any of my sites yet.

  3. seo lad says:

    They should just agree on a common platform. That would make things easier.

  4. MK says:

    uhu… SEO… those three letters bother me, having to re-check your every move to make sure it follows ‘the right’ SEO.
    anyway, since this is more on the technical side, i guess i’ll have to do it.
    thanks for the info.

  5. nUUr says:

    huhu.. i don’t get it either.. erm, why is it that sometimes i can’t comment here?

  6. Josef says:

    You’re right, I’ve tested several sites and the robots.txt and XML sitemap can make a big difference in how deep and how often bots crawl.
    It’s so easy to put the robots.txt files in the folder where they go that I don’t see much advantage in putting them all on one domain.

  7. zaki says:

    Let me know if you’re unable to do so. I check my spam folder every day

  8. zaki says:

    The advantage only shows if you have many websites, where you would otherwise have to FTP or log in to cPanel manually for each one

  9. Raymond Chua says:

    Too deep for a newbie like me. :)

  10. kreauter says:

    great information thank you! so now i am able to add to live google and yahoo. but for ask.com you need the reference in robots.txt. when i had in my file the following: http://www.rokdd.de with Sitemap file alle-seiten.xml a server error occurs.. any hints? thank you :)

  11. zaki says:

    What is the error message?

    Btw, the declaration of your XML sitemap in your robots.txt is incorrect. It should be this way

    sitemap: http://rokdd.de/alle-seiten.xml <- correct
    rokdd.de with Sitemap file http://rokdd.de/alle-seiten.xml <- wrong

    And you also seem to disallow your whole site from being indexed

    User-agent: *
    Disallow: /

  12. kreauter says:

    okay wow fast response :)

    the first syntax i already know but i want to use the robots.txt for more than one domain. i guess that the following syntax should be correct:

    rokdd.de with Sitemap file alle-seiten.xml

    however google said that it did not understand that syntax :(

    thanks for your help

  13. Took me a while to successfully create a cross-domain sitemap and I had to ask for help for it, but it’s very interesting to see this working so fine. Anybody else managed to do this by himself?:)
