Sitemap Crawler

A web app that crawls the URLs listed in a sitemap and searches each page for a specific string.

Features

  • Accepts a main sitemap URL (e.g., https://example.com/sitemap.xml)
  • Automatically finds and crawls the "posts" sitemap if one is available
  • Falls back to the main sitemap if no posts sitemap is found
  • Searches each page for the string "extendedrecipe"
  • Crawls up to 50 URLs from the posts sitemap
  • Shows real-time crawl progress and whether a posts sitemap was detected
  • Reports the first match found, or a summary of all crawled pages
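
The posts-sitemap detection and fallback described above could be sketched roughly as follows. This is an illustration only, not the app's actual code: the function names (extractLocs, findPostsSitemap, chooseSitemap) are hypothetical, and the rule that a posts sitemap is recognized by "post" in its file name is an assumption.

```typescript
// Collect every <loc> value from sitemap XML. A regex keeps this sketch
// dependency-free; a real XML parser would be more robust (e.g. for CDATA).
function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    const loc = match[1];
    if (loc) locs.push(loc);
  }
  return locs;
}

// Assumption: a posts sitemap has "post" in its file name,
// e.g. post-sitemap.xml. The real detection rule may differ.
function findPostsSitemap(locs: string[]): string | undefined {
  return locs.find((loc) => /post[^/]*\.xml$/i.test(loc));
}

// Prefer the posts sitemap; fall back to the main sitemap URL otherwise.
function chooseSitemap(mainUrl: string, mainXml: string): string {
  const posts = findPostsSitemap(extractLocs(mainXml));
  return posts !== undefined ? posts : mainUrl;
}
```

Keeping the selection logic in pure functions like these makes the fallback easy to test without any network access.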

Structure

  • backend/ - Hono API server
  • frontend/ - React frontend
  • shared/ - Shared types and utilities

API Endpoints

  • POST /api/crawl - Start crawling a sitemap
  • GET / - Serve the frontend
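
As a rough sketch of how a client might call POST /api/crawl: the request body shape is not documented here, so the sitemapUrl field and the buildCrawlRequest helper below are assumptions for illustration, not the actual contract.

```typescript
// Hypothetical shape of the options passed to fetch().
interface CrawlRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the request options for POST /api/crawl.
// Assumption: the server expects JSON of the form { "sitemapUrl": "..." }.
function buildCrawlRequest(sitemapUrl: string): CrawlRequestInit {
  const trimmed = sitemapUrl.trim();
  // Reject anything that is not an absolute http(s) URL before sending it.
  if (!/^https?:\/\//i.test(trimmed)) {
    throw new Error("Sitemap URL must start with http:// or https://");
  }
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ sitemapUrl: trimmed }),
  };
}

// Possible usage from the frontend:
//   const res = await fetch("/api/crawl",
//     buildCrawlRequest("https://example.com/sitemap.xml"));
```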

Usage

  1. Enter a main sitemap URL (e.g., sitemap.xml or sitemap_index.xml)
  2. Click "Start Crawling"
  3. The app looks for a posts sitemap within the main sitemap
  4. Watch the real-time progress and posts sitemap detection
  5. View the results when the crawl completes
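
The crawl that runs behind these steps might look something like the sketch below. It assumes a case-sensitive substring search, stops at the first match, and takes the page fetcher as a parameter so it can be exercised without a network; crawlForString, PageFetcher, and CrawlResult are hypothetical names, not the app's own.

```typescript
const SEARCH_STRING = "extendedrecipe";
const MAX_URLS = 50;

interface CrawlResult {
  crawled: number;         // number of pages actually fetched
  matchUrl: string | null; // first URL containing the string, if any
}

// Returns the page body for a URL; injected so tests can stub it.
type PageFetcher = (url: string) => Promise<string>;

async function crawlForString(
  urls: string[],
  fetchPage: PageFetcher,
  onProgress: (done: number, total: number) => void = () => {},
): Promise<CrawlResult> {
  // Never crawl more than MAX_URLS pages.
  const batch = urls.slice(0, MAX_URLS);
  let crawled = 0;
  for (const url of batch) {
    let html = "";
    try {
      html = await fetchPage(url);
    } catch {
      // Assumption: a page that fails to load counts as "no match".
    }
    crawled += 1;
    onProgress(crawled, batch.length);
    // Stop as soon as the first match is found.
    if (html.includes(SEARCH_STRING)) {
      return { crawled, matchUrl: url };
    }
  }
  return { crawled, matchUrl: null };
}
```

The onProgress callback is one way the real-time progress display could be fed; the actual app may stream updates differently.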