By now, you've probably seen the news splashed all over Techmeme about the decision by Google, Microsoft and Yahoo! to support SiteMaps. In case you haven't, you can check out these articles:
- Microsoft's Live Search Blog
- Yahoo!'s Search Blog
- Search Engine Watch
- Official Google Webmaster Central Blog
- Google Blogoscoped
The news of the agreement is everywhere. What you won't find easily is a layperson-friendly explanation of what SiteMaps are. That is, unless you happen to be a Global Nerdy reader.
What is a SiteMap, Anyway?
A SiteMap is an XML file that contains information about one or more URLs on a site. This information is for the benefit of search engines and other applications that crawl or index the site.
The minimum information that a SiteMap must provide is a list of URLs. Normally, a search engine has to follow all the links on your site to build a map of it; a SiteMap listing all the public URLs on your site helps the search engine index your site more quickly and makes it less likely that a URL gets missed.
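For the curious, here is a minimal sketch in Python of what building that bare-bones list of URLs might look like. The element names (urlset, url, loc) and the namespace come from the SiteMaps protocol; the function name build_minimal_sitemap and the example.com URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the SiteMaps protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_minimal_sitemap(urls):
    """Return a SiteMap XML string that lists only the given URLs.

    Hypothetical helper: the name and signature are illustrative.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # <loc> holds the address of the page; it is the only required field.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs, not taken from any real site.
print(build_minimal_sitemap([
    "http://www.example.com/",
    "http://www.example.com/about.html",
]))
```

A real SiteMap file would normally also start with an XML declaration, which this sketch leaves out to stay short.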
If you're so inclined, you can provide the following additional information for any URL in your SiteMap:
- “Last Modified” date and time. This is the date and time on which the page corresponding to the URL was last updated.
- Change Frequency. This describes how often the page corresponding to the URL is likely to change. This value can be one of:
  - always (used to describe pages that are different every time they are accessed)
  - hourly
  - daily
  - weekly
  - monthly
  - yearly
  - never (used to describe pages that are archived)
  Note that these are hints to the search engine and not commands: a search engine may be programmed to very occasionally crawl a page whose change frequency is declared as “never”.
- Priority. The priority of the page corresponding to the URL relative to other pages in the site, on a scale of 0.0 (lowest) to 1.0 (highest), with the default being 0.5. Note that this is a relative scale; giving all your site's URLs a priority rating of 1.0 simply tells the search engine that no page on your site is more important than any other.
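The three optional fields above can be sketched in Python as follows. The tag names (lastmod, changefreq, priority) are the ones the SiteMaps protocol uses; the url_entry helper, its parameters, and the sample URL and date are hypothetical, and the validation simply mirrors the allowed values described in the list.

```python
import xml.etree.ElementTree as ET

# The change frequency values the protocol allows, as listed above.
CHANGE_FREQUENCIES = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Build one <url> element; every field except loc is optional.

    Hypothetical helper: the name and signature are illustrative.
    """
    entry = ET.Element("url")
    ET.SubElement(entry, "loc").text = loc
    if lastmod is not None:
        # Expected as a date string such as "2006-11-16"; not validated here.
        ET.SubElement(entry, "lastmod").text = lastmod
    if changefreq is not None:
        if changefreq not in CHANGE_FREQUENCIES:
            raise ValueError(f"unknown change frequency: {changefreq!r}")
        ET.SubElement(entry, "changefreq").text = changefreq
    if priority is not None:
        if not 0.0 <= priority <= 1.0:
            raise ValueError("priority must be between 0.0 and 1.0")
        ET.SubElement(entry, "priority").text = f"{priority:.1f}"
    return entry


# Placeholder values for illustration only.
entry = url_entry(
    "http://www.example.com/news.html",
    lastmod="2006-11-16",
    changefreq="daily",
    priority=0.8,
)
print(ET.tostring(entry, encoding="unicode"))
```

Leaving a field out just omits its tag, so a URL with no stated priority falls back to the default of 0.5 on the search engine's side.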
Keep in mind that SiteMaps are simply a way of giving search engines a listing of URLs and a few bits of information about them. They're not directives to be followed by the search engines or their spiders. What each search engine does with the information in your site's SiteMap will vary from engine to engine.
That's really all there is to SiteMaps if you choose to ignore the XML gobbledygook (and if you really want that gobbledygook, see the SiteMaps Protocol page). Like the last agreement between the major search engine players (the agreement to support the “nofollow” attribute for links), the technical component is incredibly simple; the notable thing is the cooperation between the major players.