I have heard several people say they think your sitemap controls which pages from your website appear on Google. In reality, XML sitemaps have little to do with which pages Google chooses to index. It is very common for Google to index pages that are not included in an XML sitemap.
XML sitemaps can help Google discover all the URLs on your site. Googlebot will usually crawl all the pages submitted in an XML sitemap. However, sitemaps are not the only way that Googlebot discovers pages. The most common way for Googlebot to discover pages is by following links from other pages on your site. Even if a URL is not in a sitemap, Google can discover it through the pages that link to it.
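For reference, a sitemap is just an XML file listing the URLs you want crawled. Here is a minimal example following the sitemaps.org protocol (the example.com addresses and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/about/</loc>
  </url>
</urlset>
```

Listing a URL here helps Googlebot find it, but as explained below, it doesn't guarantee indexing.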
Once Googlebot crawls a page, Google must decide whether or not to index the page. XML sitemaps have little to do with that decision. Google bases its decision of whether to index a page on:
- Whether or not the page appears to be legitimate, as opposed to spam or nonsense
- How much reputation the page has
- Whether or not the content is original
Sitemaps don't affect these much. Including a page in a sitemap doesn't give any reputation to the page. Only linking to the page from other pages and sites can do that.
Sitemaps can slightly impact what gets indexed when Googlebot finds duplicate content. If Google finds two URLs on your site with the same content it will prefer to index the one included in the XML sitemap.
A corollary to this SEO myth is the belief that removing a page from the XML sitemap will deindex it from Google. It won't. If you want to remove a page from Google, you need to either:
- noindex the page, or
- disallow it in robots.txt
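As a sketch of what those two options look like in practice (the paths below are placeholders), a noindex directive goes in the page's `<head>`:

```html
<!-- Tells search engines not to index this page -->
<meta name="robots" content="noindex">
```

while a disallow rule lives in your site's robots.txt file:

```
# robots.txt — blocks crawling of everything under /private/
User-agent: *
Disallow: /private/
```

Note that the noindex tag is generally the more reliable option for getting a page out of the index, since a robots.txt disallow only blocks crawling.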
Similarly, changing the URLs on your site involves more than XML sitemaps. It isn't sufficient to regenerate the sitemap with the new URLs. Changing URLs while preserving your search visibility requires implementing redirects from the old URLs to the new ones.
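As an illustration, here is how a permanent (301) redirect could be set up in an Apache .htaccess file (the paths and domain are placeholders):

```
# .htaccess — permanently redirect the old URL to the new one
Redirect 301 /old-page/ https://www.example.com/new-page/
```

A 301 redirect tells Google that the page has moved permanently, so the new URL inherits the old one's place in the index.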
This article was written as part of a series about SEO myths.