A sitemap is a collection of all the URLs on a website, just as WAHF's adoption page is a list of all the dogs available for adoption.
What is a sitemap?
A sitemap is essentially a list of all of the URLs found on a website. It acts as a map for search engines to tell them which content is available on the site and gives them a clear path to access it.
Sitemaps also help search engines index pages faster. They are particularly valuable for websites that have a large number of pages and a very deep architecture, as well as for sites that frequently add or change content.
A single sitemap can contain a maximum of 50,000 URLs (and must be no larger than 50 MB uncompressed). When this limit is exceeded, the URLs have to be split across multiple sitemaps. These sitemaps are then listed in a sitemap index, which is essentially a sitemap for sitemaps.
A sitemap index is particularly useful for large websites with multiple sections and categories, which can be broken down into smaller, logical sitemaps.
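As a rough illustration of the splitting described above, the following Python sketch divides a long list of URLs into groups of at most 50,000 and builds a sitemap index that points at each resulting file. The function names, the example.com domain and the sitemap-N.xml file names are assumptions made for the example, and the XML declaration line is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive groups of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap(urls):
    """Return the XML text of one sitemap listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_sitemap_index(sitemap_urls):
    """Return the XML text of a sitemap index pointing at each sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = sitemap_url
    return ET.tostring(index, encoding="unicode")


# Hypothetical site with 120,000 URLs, which needs three sitemaps.
all_urls = [f"https://www.example.com/dogs/{n}" for n in range(120_000)]
chunks = chunk_urls(all_urls)
sitemaps = [build_sitemap(chunk) for chunk in chunks]
index_xml = build_sitemap_index(
    [f"https://www.example.com/sitemap-{i + 1}.xml" for i in range(len(chunks))]
)
```

In practice the groups would more likely follow the site's sections and categories than a simple count, but the 50,000 limit applies to each file either way.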
What you should include in a sitemap
It's important to include only SEO-relevant pages in a sitemap, because these are the pages you want search engines to crawl, and that may not be every page on a website. On a large website, including only relevant pages can also help conserve crawl budget.
Including only SEO-relevant pages helps search engines crawl more efficiently and intelligently, and supports better indexation of your key pages.
Examples of pages you would not want to include in a sitemap are non-canonical pages, duplicated pages, paginated pages, parameter pages and site search result pages.
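The exclusions listed above could be sketched as a simple filter. This is only an illustration: the page records (a dict with a "url" and a "canonical" key), the /search path and the /page/N/ pagination pattern are assumptions, and a real site would need rules that match its own URL structure.

```python
import re
from urllib.parse import urlsplit


def is_sitemap_candidate(page):
    """Return True if a page looks suitable for the sitemap.

    `page` is a hypothetical record: {"url": ..., "canonical": ...}.
    """
    parts = urlsplit(page["url"])
    if page["canonical"] != page["url"]:
        return False  # non-canonical or duplicated page
    if parts.path.startswith("/search"):
        return False  # site search results (assumed to live under /search)
    if parts.query:
        return False  # parameter pages, including ?page=N pagination
    if re.search(r"/page/\d+/?$", parts.path):
        return False  # path-based pagination (assumed /page/N/ pattern)
    return True


pages = [
    {"url": "https://www.example.com/adopt/",
     "canonical": "https://www.example.com/adopt/"},
    {"url": "https://www.example.com/adopt/?sort=age",
     "canonical": "https://www.example.com/adopt/"},
    {"url": "https://www.example.com/adopt/page/2/",
     "canonical": "https://www.example.com/adopt/page/2/"},
    {"url": "https://www.example.com/search?q=labrador",
     "canonical": "https://www.example.com/search?q=labrador"},
]
sitemap_urls = [p["url"] for p in pages if is_sitemap_candidate(p)]
```

Only the first of the four example pages passes the filter here; the others are a parameter page, a paginated page and a search result page.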
The elements of a sitemap
A sitemap is made up of several elements. Some of these are compulsory, while others are optional.
The first compulsory tag is the loc tag, which should contain the canonical version of a URL. It should also reflect the protocol and hostname the site actually uses, for example http or https, and www or non-www.
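One way to keep loc values consistent with the site's protocol and hostname is to normalise every URL before it is written to the sitemap. The sketch below assumes the site's preferred version is https with www on the hypothetical domain www.example.com, and that every URL passed in already belongs to that site.

```python
from urllib.parse import urlsplit, urlunsplit


def to_site_loc(url, scheme="https", host="www.example.com"):
    """Rewrite a URL so it uses the site's preferred scheme and hostname.

    The scheme and host defaults are assumptions for this example.
    Fragments are dropped and an empty path becomes "/".
    """
    parts = urlsplit(url)
    return urlunsplit((scheme, host, parts.path or "/", parts.query, ""))


loc = to_site_loc("http://example.com/adopt/")
```

Here loc ends up as the https and www version of the adoption page URL, which is the form that should appear in the loc tag if that is the canonical one.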
lastmod tag
This tag is optional but recommended, as it tells search engines when a page was last modified. Most search engines use this data to detect when a page has changed and whether they need to re-crawl it. However, it's important to update the tag only when real changes have been made, rather than using it to trick search engines into re-crawling.
While no longer widely used by search engines, the change frequency (changefreq) tag was intended as a hint about how often a page is likely to change.
Another optional tag, the priority tag, is a hint about how important a page is compared with other URLs on the same site. It runs on a scale of 0.0 to 1.0, where a larger number indicates a more important page. This tag is widely ignored by search engines.
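To show how these elements fit together, here is a minimal Python sketch that builds a single url entry with the compulsory loc tag and the three optional tags. The function name, the example URL and the date are assumptions for illustration; the changefreq values are those listed in the sitemap protocol.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Values the sitemap protocol allows for changefreq.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}


def build_url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return the XML text for one <url> entry; only loc is required."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if lastmod is not None:
        # A date object serialises as YYYY-MM-DD.
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    if changefreq is not None:
        if changefreq not in VALID_CHANGEFREQ:
            raise ValueError(f"invalid changefreq: {changefreq!r}")
        ET.SubElement(url, "changefreq").text = changefreq
    if priority is not None:
        if not 0.0 <= priority <= 1.0:
            raise ValueError("priority must be between 0.0 and 1.0")
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(url, encoding="unicode")


entry = build_url_entry(
    "https://www.example.com/adopt/",  # hypothetical canonical URL
    lastmod=date(2024, 1, 15),         # hypothetical modification date
    changefreq="weekly",
    priority=0.8,
)
```

Since changefreq and priority are largely ignored, a leaner entry with just loc and an accurate lastmod is often all that is needed.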