Add discover valid sitemaps utility (port from JS)
## Summary

Port the `discoverValidSitemaps()` utility from Crawlee JS to Python.
JS source: `packages/utils/src/internals/sitemap.ts` (#3392)
## How it works in JS
```ts
async function* discoverValidSitemaps(
  urls: string[],
  options?: { proxyUrl?: string; httpClient?: BaseHttpClient },
): AsyncIterable<string>
```
- Group input URLs by hostname
- For each domain, discover sitemaps from (in order):
  - `Sitemap:` entries in robots.txt
  - Input URLs that match `/sitemap\.(xml|txt)(\.gz)?$/i`
  - HEAD-request probing of `/sitemap.xml`, `/sitemap.txt`, and `/sitemap_index.xml` (fallback)
- Deduplicate and process domains concurrently
Returns an async iterable yielding sitemap URLs as discovered.
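The grouping and direct-URL detection steps above are easy to express with the standard library alone. A minimal sketch (the function names `group_urls_by_hostname` and `direct_sitemap_urls` are illustrative, not part of Crawlee):

```python
import re
from urllib.parse import urlsplit

# Python equivalent of the JS /sitemap\.(xml|txt)(\.gz)?$/i check
# for input URLs that are themselves sitemaps.
SITEMAP_URL_RE = re.compile(r'/sitemap\.(xml|txt)(\.gz)?$', re.IGNORECASE)


def group_urls_by_hostname(urls: list[str]) -> dict[str, list[str]]:
    """Group input URLs by hostname so each domain is processed once."""
    groups: dict[str, list[str]] = {}
    for url in urls:
        hostname = urlsplit(url).hostname
        if hostname is not None:
            groups.setdefault(hostname, []).append(url)
    return groups


def direct_sitemap_urls(urls: list[str]) -> list[str]:
    """Return the input URLs whose path already looks like a sitemap."""
    return [url for url in urls if SITEMAP_URL_RE.search(urlsplit(url).path)]
```

Anchoring the regex on the path (not the full URL) avoids false positives from query strings or fragments.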
## What Python already has
- `Sitemap.try_common_names()` probes `/sitemap.xml` and `/sitemap.txt` for a single URL (missing: `/sitemap_index.xml`)
- `RobotsTxtFile.find()` + `get_sitemaps()` fetches robots.txt and extracts its `Sitemap:` entries
## What's missing

The orchestrating function that combines these steps: group input URLs by hostname, detect direct sitemap URLs in the input, validate candidates via HEAD requests, and process domains concurrently.
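As a rough shape for that orchestrator, here is a hedged sketch. It takes an injected `probe` callable in place of Crawlee's HTTP client (it should HEAD-request a URL and report whether it looks like a valid sitemap) and omits the robots.txt step, which the real port would delegate to `RobotsTxtFile`. All names are illustrative, not the final API:

```python
import asyncio
import re
from collections.abc import AsyncIterator, Awaitable, Callable
from urllib.parse import urlsplit

_SITEMAP_URL_RE = re.compile(r'/sitemap\.(xml|txt)(\.gz)?$', re.IGNORECASE)
_FALLBACK_PATHS = ('/sitemap.xml', '/sitemap.txt', '/sitemap_index.xml')


async def discover_valid_sitemaps(
    urls: list[str],
    probe: Callable[[str], Awaitable[bool]],
) -> AsyncIterator[str]:
    """Yield sitemap URLs for the given input URLs as they are discovered."""
    # Group input URLs by hostname so each domain is handled once.
    by_host: dict[str, list[str]] = {}
    for url in urls:
        hostname = urlsplit(url).hostname
        if hostname is not None:
            by_host.setdefault(hostname, []).append(url)

    async def check_domain(host: str, domain_urls: list[str]) -> list[str]:
        found: list[str] = []
        # 1. Input URLs that already look like sitemaps.
        for url in domain_urls:
            if _SITEMAP_URL_RE.search(urlsplit(url).path):
                found.append(url)
        # 2. Fall back to probing well-known sitemap locations.
        if not found:
            scheme = urlsplit(domain_urls[0]).scheme or 'https'
            for path in _FALLBACK_PATHS:
                candidate = f'{scheme}://{host}{path}'
                if await probe(candidate):
                    found.append(candidate)
        return found

    # Process domains concurrently, yielding deduplicated results as
    # each domain's check finishes.
    tasks = [asyncio.create_task(check_domain(h, u)) for h, u in by_host.items()]
    seen: set[str] = set()
    for task in asyncio.as_completed(tasks):
        for sitemap_url in await task:
            if sitemap_url not in seen:
                seen.add(sitemap_url)
                yield sitemap_url
```

Injecting `probe` keeps the sketch testable without a network; in the real port the same seam would accept Crawlee's `HttpClient`/`proxy_url` options, mirroring the JS signature.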