houseSearchSF
https://shapedlines--9cf06f844d5d11f0a67276b3cceeab13.web.val.run
A TypeScript scraper that extracts property listing URLs and addresses from webpages, optimized for Zillow-style property listings.
- scraper.ts - Main scraper functions and utilities
- scraper-api.ts - HTTP API endpoint for the scraper
- test-scraper.ts - Test file with sample usage
- README.md - This documentation
- Extracts listing URLs and addresses from HTML content
- Handles both absolute and relative URLs
- Includes proper error handling and validation
- Provides both programmatic API and HTTP endpoint
- User-Agent spoofing to avoid bot detection
import { scrapeListingUrls, getListingUrls } from "./scraper.ts";
// Get full listing data (URL + address)
const listings = await scrapeListingUrls("https://www.zillow.com/san-francisco-ca/");
console.log(listings);
// Output: [{ url: "https://...", address: "123 Main St..." }, ...]
// Get just the URLs
const urls = await getListingUrls("https://www.zillow.com/san-francisco-ca/");
console.log(urls);
// Output: ["https://www.zillow.com/homedetails/...", ...]
The scraper is also available as an HTTP endpoint:
GET Request (Test with sample data):
curl https://your-val-town-url.web.val.run
POST Request (Scrape a real URL):
curl -X POST https://your-val-town-url.web.val.run \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.zillow.com/san-francisco-ca/"}'
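For orientation, a minimal sketch of what a Val Town-style HTTP handler for these two request shapes could look like. This is an assumption about the structure of scraper-api.ts, not its actual code; the handler name and the commented-out scraper call are illustrative.

```typescript
// Hypothetical sketch of an HTTP handler like scraper-api.ts.
// The deployed implementation may differ.
export default async function handler(req: Request): Promise<Response> {
  if (req.method === "POST") {
    const { url } = await req.json();
    if (typeof url !== "string") {
      return Response.json(
        { success: false, error: "Missing 'url' in request body" },
        { status: 400 },
      );
    }
    // In the real val this would call the scraper, e.g.:
    // const listings = await scrapeListingUrls(url);
    const listings: unknown[] = []; // placeholder for the scrape result
    return Response.json({ success: true, count: listings.length, listings });
  }
  // GET: respond with sample data for quick testing.
  return Response.json({ success: true, count: 0, listings: [] });
}
```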
The scraper looks for HTML elements matching this pattern:
<a href="[URL]" data-test="property-card-link" [other-attributes]>
  <address>[ADDRESS]</address>
</a>
It uses flexible regex patterns to handle variations in:
- Attribute order
- CSS classes
- Whitespace
- Relative vs absolute URLs
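The extraction approach described above can be sketched as follows. The regex and the function name are illustrative assumptions, not the actual contents of scraper.ts:

```typescript
// Hypothetical sketch of the regex-based extraction described above.
// The real pattern in scraper.ts may be more permissive.
interface Listing {
  url: string;
  address: string;
}

function extractListings(html: string, baseUrl: string): Listing[] {
  // Match anchors carrying data-test="property-card-link", tolerating
  // extra attributes and whitespace, and capture the href plus the
  // text of the nested <address> element.
  const pattern =
    /<a\s+[^>]*href="([^"]+)"[^>]*data-test="property-card-link"[^>]*>[\s\S]*?<address[^>]*>([\s\S]*?)<\/address>[\s\S]*?<\/a>/gi;
  const listings: Listing[] = [];
  for (const match of html.matchAll(pattern)) {
    const [, href, address] = match;
    // Resolve relative hrefs against the page URL.
    const url = new URL(href, baseUrl).toString();
    listings.push({ url, address: address.trim() });
  }
  return listings;
}
```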
- scrapeListingUrls(url) - Main function that fetches a webpage and extracts all listing URLs and addresses.
- An HTML parsing function - Parses HTML content to extract listing data without making HTTP requests.
- getListingUrls(url) - Convenience function that returns just the URLs as an array of strings.
- A URL-validation utility - Filters out invalid URLs.
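The URL-validation step might look like the sketch below. The function name and the exact checks are assumptions for illustration:

```typescript
// Hypothetical URL-validation helper; scraper.ts may use different checks.
function isValidListingUrl(url: string): boolean {
  try {
    const parsed = new URL(url);
    // Accept only http(s) links; reject anything that fails to parse.
    return parsed.protocol === "https:" || parsed.protocol === "http:";
  } catch {
    return false;
  }
}
```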
The scraper includes comprehensive error handling for:
- Network failures
- Invalid HTML
- Missing elements
- Malformed URLs
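As a sketch of how the first and last of these cases might be handled around the fetch step (the function name and User-Agent string are assumptions, not the actual implementation):

```typescript
// Illustrative only; the real error handling in scraper.ts may differ.
async function safeFetchHtml(url: string): Promise<string> {
  // Reject malformed URLs before making any request.
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    throw new Error(`Invalid URL: ${url}`);
  }
  const response = await fetch(parsed.toString(), {
    headers: { "User-Agent": "Mozilla/5.0 (compatible; example-scraper)" },
  });
  // Surface HTTP-level failures (403, 404, 5xx) as errors.
  if (!response.ok) {
    throw new Error(`Fetch failed: ${response.status} ${response.statusText}`);
  }
  return await response.text();
}
```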
When scraping websites:
- Be respectful of rate limits
- Check robots.txt
- Consider the website's terms of service
- Add delays between requests for large-scale scraping
- Use appropriate User-Agent headers
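One way to pace large-scale scraping, as the list above suggests, is a simple delay between sequential requests. This helper is a generic sketch and not part of the project's code:

```typescript
// Hypothetical pacing helper for scraping many pages politely.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function scrapeMany(urls: string[], delayMs = 2000): Promise<string[]> {
  const results: string[] = [];
  for (const url of urls) {
    // In real use, call the scraper here, e.g.:
    // results.push(...(await getListingUrls(url)));
    results.push(url); // placeholder so the sketch is self-contained
    // Wait between requests to respect rate limits.
    await sleep(delayMs);
  }
  return results;
}
```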
{
  "success": true,
  "count": 2,
  "listings": [
    {
      "url": "https://www.zillow.com/homedetails/1020-Pierce-St-A-San-Francisco-CA-94115/2113064552_zpid/",
      "address": "1020 Pierce St #A, San Francisco, CA 94115"
    },
    {
      "url": "https://www.zillow.com/homedetails/456-Oak-St-San-Francisco-CA-94102/123456789_zpid/",
      "address": "456 Oak St, San Francisco, CA 94102"
    }
  ]
}