# scrape-hws
A Val Town application that scrapes posts from Reddit's /r/hardwareswap subreddit and stores them in a Supabase database.
## Features

- Automated scraping of /r/hardwareswap posts using Reddit's official OAuth API
- Stores posts in a Supabase PostgreSQL database
- Duplicate detection to avoid storing the same post twice
- Detailed logging and statistics
- Runs on a cron schedule (configurable in the Val Town UI)
- Secure OAuth authentication with automatic token refresh
## Setup

### 1. Create a Reddit App

- Go to [Reddit App Preferences](https://www.reddit.com/prefs/apps)
- Click "Create App" or "Create Another App"
- Fill out the form:
  - Name: your app name (e.g., "Val Town Scraper")
  - App type: select "script"
  - Description: optional
  - About URL: leave blank or add your website
  - Redirect URI: use `http://localhost:8080` (required but not used)
- Click "Create app"
- Note down your Client ID (shown under the app name) and Client Secret
### 2. Set Up Supabase

- Create a new project in Supabase
- Go to the SQL Editor in your Supabase dashboard
- Copy and paste the contents of `database-schema.sql` and run it
- Go to Settings > API to get your project URL and anon key
### 3. Environment Variables

Set these environment variables in your Val Town settings (a sketch of how the scraper reads them follows the list):

Supabase:

- `SUPABASE_URL`: your Supabase project URL (e.g., `https://your-project.supabase.co`)
- `SUPABASE_ANON_KEY`: your Supabase anon/public key

Reddit API:

- `REDDIT_CLIENT_ID`: your Reddit app's client ID
- `REDDIT_CLIENT_SECRET`: your Reddit app's client secret
- `REDDIT_USER_AGENT`: optional custom user agent (defaults to "Val Town Reddit Scraper 1.0")
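A minimal sketch of how a val might read these variables in Val Town's Deno runtime; the `requireEnv` helper is illustrative and not taken from `reddit-scraper.ts`:

```typescript
// Fail fast if a required variable is missing (mirrors the "Missing environment
// variables" issue listed under Troubleshooting).
function requireEnv(name: string): string {
  const value = Deno.env.get(name);
  if (!value) throw new Error(`Missing environment variable: ${name}`);
  return value;
}

const SUPABASE_URL = requireEnv("SUPABASE_URL");
const SUPABASE_ANON_KEY = requireEnv("SUPABASE_ANON_KEY");
const REDDIT_CLIENT_ID = requireEnv("REDDIT_CLIENT_ID");
const REDDIT_CLIENT_SECRET = requireEnv("REDDIT_CLIENT_SECRET");
// Optional, with the documented default.
const REDDIT_USER_AGENT = Deno.env.get("REDDIT_USER_AGENT") ?? "Val Town Reddit Scraper 1.0";
```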
### 4. Configure the Cron Trigger

- Set `reddit-scraper.ts` as a cron trigger in Val Town
- Configure the schedule in the Val Town web UI (recommended: every 30 minutes)
- Example cron expressions:
  - Every 30 minutes: `*/30 * * * *`
  - Every hour: `0 * * * *`
  - Every 15 minutes: `*/15 * * * *`
## Database Schema

The `posts` table contains:

- `id`: primary key (auto-increment)
- `reddit_id`: unique Reddit post ID
- `reddit_original`: full Reddit post data as JSON
- `title`: post title
- `created_at`: when the post was created on Reddit
- `updated_at`: when the record was last updated in our database
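For illustration, a row can be modeled with a TypeScript interface like the one below; the interface name and exact types are assumptions, not necessarily what `reddit-scraper.ts` uses:

```typescript
// Hypothetical shape of one row in the posts table.
interface PostRow {
  id: number;                               // primary key (auto-increment)
  reddit_id: string;                        // unique Reddit post ID
  reddit_original: Record<string, unknown>; // full Reddit post data as JSON
  title: string;                            // post title
  created_at: string;                       // when the post was created on Reddit
  updated_at: string;                       // when the record was last updated in our database
}
```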
## Usage

You can manually trigger the scraper by running the `reddit-scraper.ts` val. Once configured as a cron job, it will automatically (see the sketch after this list):
- Authenticate with Reddit using OAuth client credentials
- Fetch the latest 25 posts from /r/hardwareswap
- Check for duplicates in the database
- Save new posts to Supabase
- Log statistics about the scraping session
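A rough sketch of one such pass, assuming the table and column names described above; the function name, control flow, and error handling are illustrative rather than a copy of `reddit-scraper.ts`:

```typescript
import { createClient } from "https://esm.sh/@supabase/supabase-js@2";

// Supabase client built from the environment variables described in Setup.
const supabase = createClient(
  Deno.env.get("SUPABASE_URL")!,
  Deno.env.get("SUPABASE_ANON_KEY")!,
);

// One scraping pass: fetch the newest posts, skip duplicates, store the rest.
// `accessToken` is an OAuth token obtained as described under "How It Works".
async function scrapeOnce(accessToken: string) {
  const res = await fetch("https://oauth.reddit.com/r/hardwareswap/new?limit=25", {
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "User-Agent": Deno.env.get("REDDIT_USER_AGENT") ?? "Val Town Reddit Scraper 1.0",
    },
  });
  if (!res.ok) throw new Error(`Reddit API error: ${res.status}`);
  const listing = await res.json();

  let saved = 0;
  let skipped = 0;
  for (const child of listing.data.children) {
    const post = child.data;

    // Duplicate detection: skip posts whose reddit_id is already stored.
    const { data: existing } = await supabase
      .from("posts")
      .select("id")
      .eq("reddit_id", post.id)
      .maybeSingle();
    if (existing) {
      skipped++;
      continue;
    }

    const { error } = await supabase.from("posts").insert({
      reddit_id: post.id,
      reddit_original: post,
      title: post.title,
      created_at: new Date(post.created_utc * 1000).toISOString(),
    });
    if (error) console.error("Error saving post:", error.message);
    else saved++;
  }

  console.log(`Saved ${saved} new posts, skipped ${skipped} duplicates`);
}
```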
## How It Works

The scraper uses Reddit's Client Credentials OAuth flow:
- Authenticates using your app's client ID and secret
- Receives an access token from Reddit
- Uses the token to make authenticated API requests
- Automatically refreshes the token if it expires
This approach is more reliable than using Reddit's public JSON endpoints and respects Reddit's rate limits.
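A minimal sketch of the token request under this flow, assuming the environment variables from Setup; the helper name is illustrative:

```typescript
// Exchange the app's client ID and secret for an OAuth access token.
async function getRedditAccessToken(): Promise<string> {
  const clientId = Deno.env.get("REDDIT_CLIENT_ID")!;
  const clientSecret = Deno.env.get("REDDIT_CLIENT_SECRET")!;
  const userAgent = Deno.env.get("REDDIT_USER_AGENT") ?? "Val Town Reddit Scraper 1.0";

  const res = await fetch("https://www.reddit.com/api/v1/access_token", {
    method: "POST",
    headers: {
      // Client credentials are sent via HTTP Basic auth.
      Authorization: "Basic " + btoa(`${clientId}:${clientSecret}`),
      "Content-Type": "application/x-www-form-urlencoded",
      "User-Agent": userAgent,
    },
    body: "grant_type=client_credentials",
  });
  if (!res.ok) throw new Error(`OAuth error: ${res.status}`);

  const { access_token } = await res.json();
  return access_token as string;
}
```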
## Rate Limits

- Reddit allows 60 requests per minute for OAuth applications (see the backoff sketch after this list for handling 429 responses)
- The scraper fetches 25 posts per run, well within limits
- Recommended cron schedule: every 30 minutes or longer
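The current scraper stays comfortably inside these limits, but if Reddit ever returns HTTP 429, one way to cope is a small retry wrapper like the sketch below. This is not part of `reddit-scraper.ts`; the function name and delay values are arbitrary:

```typescript
// Retry a fetch a few times when Reddit responds with 429 (rate limited),
// waiting longer before each attempt.
async function fetchWithBackoff(
  url: string,
  init: RequestInit,
  retries = 3,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    if (res.status !== 429 || attempt >= retries) return res;
    await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 1000));
  }
}
```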
## Monitoring

Check the Val Town logs to monitor:
- Number of new posts scraped
- Number of duplicates skipped
- Any errors during scraping
- Performance metrics
## Troubleshooting

- Missing environment variables: Ensure all required Reddit and Supabase credentials are set
- Database connection errors: Verify your Supabase credentials and that the table exists
- Reddit OAuth errors: Check your Reddit app credentials and ensure the app type is "script"
- Rate limiting: Reddit may temporarily block requests if rate limits are exceeded
- Duplicate key errors: The scraper checks for duplicates, but race conditions might occur
Common error messages:

- Missing Supabase credentials: Set `SUPABASE_URL` and `SUPABASE_ANON_KEY`
- Missing Reddit credentials: Set `REDDIT_CLIENT_ID` and `REDDIT_CLIENT_SECRET`
- OAuth error: Check your Reddit app credentials and app type
- Reddit API error: May indicate rate limiting or API issues
- Error saving post: Check Supabase connection and table schema
## Querying Your Data

You can query your scraped data directly in Supabase:

```sql
-- Get recent posts
SELECT title, created_at, reddit_original->>'score' as score
FROM posts
ORDER BY created_at DESC
LIMIT 10;

-- Search posts by title
SELECT title, created_at
FROM posts
WHERE title ILIKE '%gpu%'
ORDER BY created_at DESC;

-- Get posts by author
SELECT title, created_at
FROM posts
WHERE reddit_original->>'author' = 'username'
ORDER BY created_at DESC;
```
## Customization

Feel free to modify the scraper to:
- Add more subreddits
- Include additional post metadata
- Add data processing or analysis features
- Integrate with other services