Hono-based API server for the Sitemap Crawler application.
Crawls a main sitemap, looks for a posts sitemap in it, and searches the listed pages for the string "extendedrecipe".
Request Body:
{
  "sitemapUrl": "https://example.com/sitemap.xml"
}
Response:
{
  "found": true,
  "foundUrl": "https://example.com/page-with-string",
  "totalCrawled": 25,
  "errors": [],
  "crawledUrls": ["url1", "url2", ...],
  "postsSitemapUrl": "https://example.com/post-sitemap.xml",
  "postsSitemapFound": true
}
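The request and response shapes above could be typed roughly as follows. This is a minimal sketch: the names `CrawlRequest`, `CrawlResponse`, and `parseCrawlRequest` are hypothetical, and treating `errors` as an array of strings and `foundUrl` / `postsSitemapUrl` as optional are assumptions, since the example only shows the success case with an empty `errors` array.

```typescript
// Hypothetical type names; the field names come from the documented JSON.
interface CrawlRequest {
  sitemapUrl: string;
}

interface CrawlResponse {
  found: boolean;
  foundUrl?: string; // assumption: absent when nothing was found
  totalCrawled: number;
  errors: string[]; // assumption: errors are reported as strings
  crawledUrls: string[];
  postsSitemapUrl?: string; // assumption: absent when no posts sitemap exists
  postsSitemapFound: boolean;
}

// Validates an untrusted JSON body before crawling.
// Returns null when the body is not an object with an http(s) sitemapUrl.
function parseCrawlRequest(body: unknown): CrawlRequest | null {
  if (typeof body !== "object" || body === null) return null;
  const { sitemapUrl } = body as Record<string, unknown>;
  if (typeof sitemapUrl !== "string") return null;
  try {
    const url = new URL(sitemapUrl);
    const isHttp = url.protocol === "http:" || url.protocol === "https:";
    return isHttp ? { sitemapUrl } : null;
  } catch {
    return null; // new URL() throws on malformed input
  }
}
```

Rejecting non-HTTP schemes up front is a design choice of this sketch, not something the documentation above specifies.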
Features:
- First looks for a posts sitemap in the main sitemap
- Falls back to the main sitemap if no posts sitemap is found
- Limits crawling to a maximum of 50 URLs
- Stops immediately when the target string is found
- Returns detailed error information in "errors"
- Includes the list of all crawled URLs in "crawledUrls"
- Shows which sitemap was actually crawled ("postsSitemapUrl", "postsSitemapFound")
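The crawl limits and early exit listed above can be sketched as a single loop. This is an illustration under stated assumptions, not the server's actual code: `crawlForTarget`, `FetchText`, and `CrawlResult` are hypothetical names, the page fetcher is injected so the loop can be exercised without network access, and recording a failed URL as crawled is a choice made here.

```typescript
const MAX_URLS = 50; // documented crawl limit
const TARGET = "extendedrecipe"; // documented target string

// Hypothetical fetcher signature: resolves to the page body, rejects on failure.
type FetchText = (url: string) => Promise<string>;

interface CrawlResult {
  found: boolean;
  foundUrl?: string;
  totalCrawled: number;
  errors: string[];
  crawledUrls: string[];
}

async function crawlForTarget(urls: string[], fetchText: FetchText): Promise<CrawlResult> {
  const errors: string[] = [];
  const crawledUrls: string[] = [];

  // Never visit more than MAX_URLS pages.
  for (const url of urls.slice(0, MAX_URLS)) {
    crawledUrls.push(url);
    try {
      const body = await fetchText(url);
      // Case-insensitive match; return as soon as the target is seen.
      if (body.toLowerCase().includes(TARGET)) {
        return { found: true, foundUrl: url, totalCrawled: crawledUrls.length, errors, crawledUrls };
      }
    } catch (err) {
      // Record the failure and keep crawling the remaining URLs.
      errors.push(`${url}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return { found: false, totalCrawled: crawledUrls.length, errors, crawledUrls };
}
```

In a real deployment the injected `fetchText` would presumably wrap an HTTP client call that sends a User-Agent header; in tests it can be a stub that returns canned HTML.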
Serves the frontend application.
/frontend/*
- Frontend assets
/shared/*
- Shared utilities
- First attempts to find a posts sitemap in the main sitemap index
- Looks for URLs containing "post" and ending with .xml
- Falls back to crawling the main sitemap if no posts sitemap is found
- Uses simple regex parsing for sitemap XML
- Sends a User-Agent header with requests for better compatibility
- Matches the target string case-insensitively
- Comprehensive error handling
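The sitemap handling described above might look like the following. It is a sketch under assumptions: `extractLocs` and `findPostsSitemap` are hypothetical helper names, the regex is one plausible "simple" pattern rather than the one the server uses, and it does not handle `<loc>` values wrapped in CDATA.

```typescript
// Pulls every <loc> value out of a sitemap or sitemap index using a regex.
// Trims whitespace around the URL; CDATA-wrapped values are not matched.
function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    locs.push(match[1]);
  }
  return locs;
}

// Returns the first URL that contains "post" and ends with ".xml",
// compared case-insensitively, or undefined if there is none.
function findPostsSitemap(locs: string[]): string | undefined {
  return locs.find((loc) => {
    const lower = loc.toLowerCase();
    return lower.includes("post") && lower.endsWith(".xml");
  });
}

// Usage: prefer the posts sitemap, otherwise fall back to the main sitemap.
const mainSitemapUrl = "https://example.com/sitemap.xml";
const indexXml = `
  <sitemapindex>
    <sitemap><loc>https://example.com/page-sitemap.xml</loc></sitemap>
    <sitemap><loc>https://example.com/post-sitemap.xml</loc></sitemap>
  </sitemapindex>`;
const postsSitemapUrl = findPostsSitemap(extractLocs(indexXml));
const sitemapToCrawl = postsSitemapUrl ?? mainSitemapUrl;
```

A regex keeps the dependency footprint small, at the cost of missing unusual but valid XML; a real XML parser would be the more robust option if that matters.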