Backend API

Hono-based API server for the Sitemap Crawler application.

Endpoints

POST /api/crawl

Crawls a main sitemap, looks for a posts sitemap within it, and searches the listed pages for the string "extendedrecipe".

Request Body:

{ "sitemapUrl": "https://example.com/sitemap.xml" }

Response:

{
  "found": true,
  "foundUrl": "https://example.com/page-with-string",
  "totalCrawled": 25,
  "errors": [],
  "crawledUrls": ["url1", "url2", ...],
  "postsSitemapUrl": "https://example.com/post-sitemap.xml",
  "postsSitemapFound": true
}
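The request and response examples can be written down as TypeScript types, which helps a client check what it gets back. This is a minimal sketch inferred from the example values only: the nullability of foundUrl and postsSitemapUrl, the assumption that errors holds plain strings, and the requestCrawl helper (with its baseUrl parameter) are all hypothetical, not confirmed by the server code.

```typescript
// Shapes of the POST /api/crawl request and response, inferred from the
// examples above. Optional/nullable fields are assumptions.
interface CrawlRequest {
  sitemapUrl: string;
}

interface CrawlResponse {
  found: boolean;
  foundUrl?: string | null; // assumption: absent or null when found is false
  totalCrawled: number;
  errors: string[]; // assumption: errors are plain strings
  crawledUrls: string[];
  postsSitemapUrl?: string | null; // assumption: absent or null when no posts sitemap
  postsSitemapFound: boolean;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

function isOptionalString(value: unknown): boolean {
  return value === undefined || value === null || typeof value === "string";
}

// Runtime guard a client could apply to parsed JSON before trusting it.
function isCrawlResponse(value: unknown): value is CrawlResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.found === "boolean" &&
    isOptionalString(v.foundUrl) &&
    typeof v.totalCrawled === "number" &&
    isStringArray(v.errors) &&
    isStringArray(v.crawledUrls) &&
    isOptionalString(v.postsSitemapUrl) &&
    typeof v.postsSitemapFound === "boolean"
  );
}

// Hypothetical client helper; baseUrl is wherever the server is deployed.
async function requestCrawl(baseUrl: string, body: CrawlRequest): Promise<CrawlResponse> {
  const res = await fetch(new URL("/api/crawl", baseUrl), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  const data: unknown = await res.json();
  if (!isCrawlResponse(data)) {
    throw new Error(`Unexpected response shape (HTTP ${res.status})`);
  }
  return data;
}
```

Validating at runtime rather than casting keeps a malformed or error response from silently flowing into UI code as if it were a successful crawl.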

Features:

  • First looks for a posts sitemap listed in the main sitemap
  • Falls back to the main sitemap if no posts sitemap is found
  • Crawls at most 50 URLs
  • Stops as soon as the target string is found
  • Returns detailed error information in errors
  • Lists every crawled URL in crawledUrls
  • Reports which sitemap was actually crawled (postsSitemapFound, postsSitemapUrl)
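The crawl behavior in the list above (a 50-URL cap, early exit on a match, collected errors, a record of crawled URLs) can be sketched as a single loop. This is an illustrative sketch, not the server's implementation: crawlPages, the injected Fetcher type, the error-message format, and counting failed fetches in totalCrawled are all assumptions.

```typescript
// Hypothetical crawl loop mirroring the feature list.
const MAX_URLS = 50;

// Injected page fetcher (assumption) so the loop can be tested without network.
type Fetcher = (url: string) => Promise<string>;

interface CrawlResult {
  found: boolean;
  foundUrl: string | null;
  totalCrawled: number;
  errors: string[];
  crawledUrls: string[];
}

async function crawlPages(urls: string[], target: string, fetchText: Fetcher): Promise<CrawlResult> {
  const needle = target.toLowerCase();
  const result: CrawlResult = {
    found: false,
    foundUrl: null,
    totalCrawled: 0,
    errors: [],
    crawledUrls: [],
  };
  // Only the first MAX_URLS entries are ever visited.
  for (const url of urls.slice(0, MAX_URLS)) {
    result.crawledUrls.push(url);
    result.totalCrawled++; // assumption: failed fetches still count as crawled
    try {
      const body = await fetchText(url);
      // Case-insensitive match on the page body.
      if (body.toLowerCase().includes(needle)) {
        result.found = true;
        result.foundUrl = url;
        break; // stop as soon as the target string is found
      }
    } catch (err) {
      // Record the failure and keep crawling the remaining URLs.
      const message = err instanceof Error ? err.message : String(err);
      result.errors.push(`${url}: ${message}`);
    }
  }
  return result;
}
```

Visiting pages one at a time keeps the "stop immediately" guarantee simple; a concurrent version would need to cancel in-flight requests once a match arrives.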

GET /

Serves the frontend application.

Static File Serving

  • /frontend/* - Frontend assets
  • /shared/* - Shared utilities

Implementation Details

  • First attempts to find a posts sitemap in the main sitemap index
  • Treats a URL as the posts sitemap if it contains "post" and ends with .xml
  • Falls back to crawling the main sitemap if no posts sitemap is found
  • Parses sitemap XML with simple regular expressions
  • Sends a User-Agent header with requests for better compatibility
  • Matches the target string case-insensitively
  • Comprehensive error handling, with failures reported in errors
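The parsing and matching rules above can be sketched as three small helpers plus a fetch wrapper. Everything here is illustrative: the names extractLocs, findPostsSitemap, containsTarget and fetchText, the exact regex, and the USER_AGENT value are hypothetical, not taken from the server code.

```typescript
// Hypothetical User-Agent value; the real server's string is not documented.
const USER_AGENT = "Mozilla/5.0 (compatible; SitemapCrawler/1.0)";

// Extract every <loc>...</loc> value with a regex. This is fragile by design:
// CDATA sections and namespaced tags are skipped, and only &amp; is decoded.
function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const re = /<loc>\s*([^<]+?)\s*<\/loc>/gi; // fresh regex per call, so lastIndex starts at 0
  let match: RegExpExecArray | null;
  while ((match = re.exec(xml)) !== null) {
    locs.push(match[1].replace(/&amp;/g, "&"));
  }
  return locs;
}

// A posts sitemap is a URL that contains "post" and ends with .xml.
function findPostsSitemap(locs: string[]): string | null {
  const hit = locs.find((url) => url.includes("post") && url.endsWith(".xml"));
  return hit ?? null;
}

// Case-insensitive substring match for the target string.
function containsTarget(html: string, target: string): boolean {
  return html.toLowerCase().includes(target.toLowerCase());
}

// Fetch a URL as text, sending the User-Agent header with the request.
async function fetchText(url: string): Promise<string> {
  const res = await fetch(url, { headers: { "User-Agent": USER_AGENT } });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.text();
}
```

Usage: call extractLocs on the main sitemap, pass the result to findPostsSitemap, and crawl either the posts sitemap's locs or, if it returns null, the main sitemap's own locs. A real XML parser would be more robust than the regex, at the cost of a dependency.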