scrape-hws
A Val Town application that scrapes posts from Reddit's /r/hardwareswap subreddit and stores them in a Supabase database.
- Automated scraping of /r/hardwareswap posts using Reddit's official OAuth API
- Stores posts in a Supabase PostgreSQL database
- Duplicate detection to avoid storing the same post twice
- Detailed logging and statistics
- Runs on a cron schedule (configurable in the Val Town UI)
- Secure OAuth authentication with automatic token refresh
- Go to Reddit App Preferences
- Click "Create App" or "Create Another App"
- Fill out the form:
  - Name: Your app name (e.g., "Val Town Scraper")
  - App type: Select "script"
  - Description: Optional description
  - About URL: Leave blank or add your website
  - Redirect URI: Use `http://localhost:8080` (required but not used)
- Click "Create app"
- Note down your Client ID (under the app name) and Client Secret
- Create a new project in Supabase
- Go to the SQL Editor in your Supabase dashboard
- Copy and paste the contents of `database-schema.sql` and run it
- Go to Settings > API to get your project URL and anon key
Set these environment variables in your Val Town settings:
Supabase:
- `SUPABASE_URL`: Your Supabase project URL (e.g., https://your-project.supabase.co)
- `SUPABASE_ANON_KEY`: Your Supabase anon/public key

Reddit API:
- `REDDIT_CLIENT_ID`: Your Reddit app's client ID
- `REDDIT_CLIENT_SECRET`: Your Reddit app's client secret
- `REDDIT_USER_AGENT`: Optional custom user agent (defaults to "Val Town Reddit Scraper 1.0")
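As a rough illustration of how these values are consumed, a Val Town val (which runs on Deno) could read them and build the Supabase client like this; the import URL and exact wiring are assumptions, since the real setup lives in `reddit-scraper.ts`:

```ts
// Sketch: reading the required environment variables in a Val Town (Deno) runtime.
// The esm.sh import and variable names are illustrative, not copied from reddit-scraper.ts.
import { createClient } from "https://esm.sh/@supabase/supabase-js@2";

const SUPABASE_URL = Deno.env.get("SUPABASE_URL");
const SUPABASE_ANON_KEY = Deno.env.get("SUPABASE_ANON_KEY");
const REDDIT_CLIENT_ID = Deno.env.get("REDDIT_CLIENT_ID");
const REDDIT_CLIENT_SECRET = Deno.env.get("REDDIT_CLIENT_SECRET");
const REDDIT_USER_AGENT = Deno.env.get("REDDIT_USER_AGENT") ?? "Val Town Reddit Scraper 1.0";

if (!SUPABASE_URL || !SUPABASE_ANON_KEY) throw new Error("Missing Supabase credentials");
if (!REDDIT_CLIENT_ID || !REDDIT_CLIENT_SECRET) throw new Error("Missing Reddit credentials");

// Client used for all reads and writes against the posts table.
const supabase = createClient(SUPABASE_URL, SUPABASE_ANON_KEY);
```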
- Set `reddit-scraper.ts` as a cron trigger in Val Town
- Configure the schedule in the Val Town web UI (recommended: every 30 minutes)
- Example cron expressions:
  - Every 30 minutes: `*/30 * * * *`
  - Every hour: `0 * * * *`
  - Every 15 minutes: `*/15 * * * *`
The `posts` table contains:
- `id`: Primary key (auto-increment)
- `reddit_id`: Unique Reddit post ID
- `reddit_original`: Full Reddit post data as JSON
- `title`: Post title
- `created_at`: When the post was created on Reddit
- `updated_at`: When the record was last updated in our database
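For reference, a row from this table can be modeled in TypeScript roughly as follows; the field types are inferred from the descriptions above, not taken from `database-schema.sql`:

```ts
// Sketch of a `posts` row as the scraper sees it; types are inferred, not authoritative.
interface PostRow {
  id: number;                               // primary key (auto-increment)
  reddit_id: string;                        // unique Reddit post ID
  reddit_original: Record<string, unknown>; // full Reddit post object stored as JSON
  title: string;                            // post title
  created_at: string;                       // when the post was created on Reddit
  updated_at: string;                       // when the record was last updated in our database
}
```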
You can manually trigger the scraper by running the `reddit-scraper.ts` val.
Once configured as a cron job, it will automatically:
- Authenticate with Reddit using OAuth client credentials
- Fetch the latest 25 posts from /r/hardwareswap
- Check for duplicates in the database
- Save new posts to Supabase
- Log statistics about the scraping session
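Put together, one scraping run follows roughly the shape below. The helper names (`getRedditToken`, `fetchNewPosts`) and the exact Supabase calls are illustrative, not the actual implementation in `reddit-scraper.ts`:

```ts
// Illustrative outline of one cron run; helper names are hypothetical.
export default async function scrape() {
  const token = await getRedditToken();          // 1. authenticate via OAuth client credentials
  const posts = await fetchNewPosts(token, 25);  // 2. fetch the latest 25 posts from /r/hardwareswap
  let saved = 0, skipped = 0;

  for (const post of posts) {
    const { data: existing } = await supabase    // 3. check for duplicates by reddit_id
      .from("posts")
      .select("id")
      .eq("reddit_id", post.id)
      .maybeSingle();
    if (existing) { skipped++; continue; }

    await supabase.from("posts").insert({        // 4. save new posts to Supabase
      reddit_id: post.id,
      title: post.title,
      reddit_original: post,
      created_at: new Date(post.created_utc * 1000).toISOString(),
    });
    saved++;
  }

  console.log(`Scrape complete: ${saved} new, ${skipped} duplicates`); // 5. log statistics
}
```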
The scraper uses Reddit's Client Credentials OAuth flow:
- Authenticates using your app's client ID and secret
- Receives an access token from Reddit
- Uses the token to make authenticated API requests
- Automatically refreshes the token if it expires
This approach is more reliable than using Reddit's public JSON endpoints and respects Reddit's rate limits.
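In sketch form, the client-credentials exchange against Reddit's token endpoint looks something like this; the endpoint and grant type are Reddit's documented OAuth flow, while the surrounding code is an assumption about how the scraper wires it up:

```ts
// Sketch: obtain an application-only access token from Reddit (client credentials grant).
async function getRedditToken(): Promise<string> {
  const credentials = btoa(`${REDDIT_CLIENT_ID}:${REDDIT_CLIENT_SECRET}`);
  const res = await fetch("https://www.reddit.com/api/v1/access_token", {
    method: "POST",
    headers: {
      "Authorization": `Basic ${credentials}`,
      "Content-Type": "application/x-www-form-urlencoded",
      "User-Agent": REDDIT_USER_AGENT,
    },
    body: "grant_type=client_credentials",
  });
  if (!res.ok) throw new Error(`OAuth error: ${res.status}`);
  const { access_token } = await res.json();
  return access_token; // sent as "Authorization: Bearer <token>" to https://oauth.reddit.com/...
}
```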
- Reddit allows 60 requests per minute for OAuth applications
- The scraper fetches 25 posts per run, well within limits
- Recommended cron schedule: every 30 minutes or longer
Check the Val Town logs to monitor:
- Number of new posts scraped
- Number of duplicates skipped
- Any errors during scraping
- Performance metrics
- Missing environment variables: Ensure all required Reddit and Supabase credentials are set
- Database connection errors: Verify your Supabase credentials and that the table exists
- Reddit OAuth errors: Check your Reddit app credentials and ensure the app type is "script"
- Rate limiting: Reddit may temporarily block requests if rate limits are exceeded
- Duplicate key errors: The scraper checks for duplicates, but race conditions might occur (see the sketch below)
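One way to make saves robust against that race is to rely on the unique constraint on `reddit_id` and upsert instead of insert. This is a suggestion under the assumption that the constraint exists, not necessarily what `reddit-scraper.ts` does today:

```ts
// Sketch: let Postgres enforce uniqueness on reddit_id instead of racing a select-then-insert.
// `supabase` is the client from the setup sketch above; `post` is one Reddit post object.
async function savePostIdempotently(post: { id: string; title: string }) {
  const { error } = await supabase
    .from("posts")
    .upsert(
      { reddit_id: post.id, title: post.title, reddit_original: post },
      { onConflict: "reddit_id", ignoreDuplicates: true }, // silently skip rows that already exist
    );
  if (error) console.error("Error saving post", error);
}
```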
Error messages you may see in the logs:
- `Missing Supabase credentials`: Set SUPABASE_URL and SUPABASE_ANON_KEY
- `Missing Reddit credentials`: Set REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET
- `OAuth error`: Check your Reddit app credentials and app type
- `Reddit API error`: May indicate rate limiting or API issues
- `Error saving post`: Check Supabase connection and table schema
You can query your scraped data directly in Supabase:
```sql
-- Get recent posts
SELECT title, created_at, reddit_original->>'score' AS score
FROM posts
ORDER BY created_at DESC
LIMIT 10;

-- Search posts by title
SELECT title, created_at
FROM posts
WHERE title ILIKE '%gpu%'
ORDER BY created_at DESC;

-- Get posts by author
SELECT title, created_at
FROM posts
WHERE reddit_original->>'author' = 'username'
ORDER BY created_at DESC;
```
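The same data can also be read from another val with the Supabase client; a small sketch, assuming the `supabase` client from the setup section above:

```ts
// Sketch: read the 10 most recent posts from another val, reusing the Supabase client.
const { data: recent, error } = await supabase
  .from("posts")
  .select("title, created_at, reddit_original->>score")
  .order("created_at", { ascending: false })
  .limit(10);
if (error) throw error;
console.table(recent);
```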
Feel free to modify the scraper to:
- Add more subreddits
- Include additional post metadata
- Add data processing or analysis features
- Integrate with other services