A TypeScript scraper designed to extract property listing URLs from webpages, specifically optimized for Zillow-style property listings.
- scraper.ts - Main scraper functions and utilities
- scraper-api.ts - HTTP API endpoint for the scraper
- test-scraper.ts - Test file with sample usage
- README.md - This documentation

import { scrapeListingUrls, getListingUrls } from "./scraper.ts";
// Get full listing data (URL + address)
const listings = await scrapeListingUrls("https://www.zillow.com/san-francisco-ca/");
console.log(listings);
// Output: [{ url: "https://...", address: "123 Main St..." }, ...]
// Get just the URLs
const urls = await getListingUrls("https://www.zillow.com/san-francisco-ca/");
console.log(urls);
// Output: ["https://www.zillow.com/homedetails/...", ...]
The scraper is also available as an HTTP endpoint:
GET Request (Test with sample data):
curl https://your-val-town-url.web.val.run
POST Request (Scrape a real URL):
curl -X POST https://your-val-town-url.web.val.run \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.zillow.com/san-francisco-ca/"}'
The scraper looks for HTML elements matching this pattern:
<a href="[URL]" data-test="property-card-link" [other-attributes]>
  <address>[ADDRESS]</address>
</a>
It uses flexible regex patterns to handle variations in:
scrapeListingUrls: Main function that fetches a webpage and extracts all listing URLs and addresses.
Parses HTML content to extract listing data without making HTTP requests.
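A minimal sketch of what such an offline parser might look like, assuming the anchor pattern described earlier in this README. The function name `parseListingsFromHtml` and the `Listing` interface are hypothetical and are not taken from scraper.ts; the regexes are one possible way to tolerate attribute order and quoting differences, not necessarily the ones the scraper uses.

```typescript
// Hypothetical shape of one extracted listing (mirrors the documented output).
interface Listing {
  url: string;
  address: string;
}

// Assumed name; extracts listings from an HTML string without any network access.
function parseListingsFromHtml(html: string): Listing[] {
  // Capture each anchor's attribute string and inner HTML separately,
  // so the order of href and data-test does not matter.
  const anchorPattern = /<a\b([^>]*)>([\s\S]*?)<\/a>/gi;
  const listings: Listing[] = [];

  let match: RegExpExecArray | null;
  while ((match = anchorPattern.exec(html)) !== null) {
    const attributes = match[1];
    const inner = match[2];

    // Skip anchors that are not marked as property cards.
    if (!/data-test\s*=\s*["']property-card-link["']/i.test(attributes)) {
      continue;
    }

    const href = /href\s*=\s*["']([^"']+)["']/i.exec(attributes);
    const address = /<address[^>]*>([\s\S]*?)<\/address>/i.exec(inner);
    if (href === null || address === null) {
      continue;
    }

    listings.push({ url: href[1], address: address[1].trim() });
  }

  return listings;
}
```

Because the function takes a string, it can be exercised against saved HTML fixtures without touching the live site. Regex-based parsing is fragile against markup changes, so a real HTML parser may be a better fit if the page structure varies widely.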
getListingUrls: Convenience function that returns just the URLs as an array of strings.
Utility function to filter out invalid URLs.
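The README does not say which URLs count as invalid, so the following is only a sketch under one assumption: a URL is valid when it parses as an absolute `http:` or `https:` URL. The names `isValidListingUrl` and `filterValidUrls` are hypothetical and may differ from what scraper.ts exports.

```typescript
// Returns true when `candidate` parses as an absolute http(s) URL.
// Assumption: this is the validity rule; the real utility may be stricter.
function isValidListingUrl(candidate: string): boolean {
  try {
    const parsed = new URL(candidate);
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    // The URL constructor throws on relative or malformed input.
    return false;
  }
}

// Keeps only the entries that pass the check above, preserving order.
function filterValidUrls(urls: string[]): string[] {
  return urls.filter(isValidListingUrl);
}
```

Under this rule, relative paths such as `/homedetails/...` are rejected, so they would need to be resolved against the page URL before filtering.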
The scraper includes comprehensive error handling for:
When scraping websites:
Example response from the HTTP endpoint:

{
  "success": true,
  "count": 2,
  "listings": [
    {
      "url": "https://www.zillow.com/homedetails/1020-Pierce-St-A-San-Francisco-CA-94115/2113064552_zpid/",
      "address": "1020 Pierce St #A, San Francisco, CA 94115"
    },
    {
      "url": "https://www.zillow.com/homedetails/456-Oak-St-San-Francisco-CA-94102/123456789_zpid/",
      "address": "456 Oak St, San Francisco, CA 94102"
    }
  ]
}
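A client consuming this JSON may want to check its shape before using it. The sketch below assumes the three fields shown in the example (`success`, `count`, `listings`) are always present on a successful response; the type names `ApiListing` and `ScrapeResponse` and the guard `isScrapeResponse` are hypothetical and are not part of the scraper's code.

```typescript
// Hypothetical types inferred from the example response in this README.
interface ApiListing {
  url: string;
  address: string;
}

interface ScrapeResponse {
  success: boolean;
  count: number;
  listings: ApiListing[];
}

// Checks that one array element has string `url` and `address` fields.
function isApiListing(value: unknown): value is ApiListing {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.url === "string" && typeof record.address === "string";
}

// Runtime type guard for a parsed JSON body.
function isScrapeResponse(value: unknown): value is ScrapeResponse {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.success === "boolean" &&
    typeof record.count === "number" &&
    Array.isArray(record.listings) &&
    record.listings.every(isApiListing)
  );
}
```

A caller would typically run the guard on the parsed body of the POST request and only read `listings` when it returns true.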