# scrape-hws
A Val Town application that scrapes posts from Reddit's /r/hardwareswap subreddit and stores them in a Supabase database.
## Features

- Automated scraping of /r/hardwareswap posts
- Stores posts in a Supabase PostgreSQL database
- Duplicate detection to avoid storing the same post twice
- Detailed logging and statistics
- Runs on a cron schedule (configurable in the Val Town UI)
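The duplicate detection described above boils down to filtering fetched posts against the set of Reddit IDs already stored. A minimal sketch (the function and type names are illustrative, not the actual scraper code):

```typescript
// Minimal sketch of duplicate detection; names and shapes are illustrative.
interface RedditPost {
  id: string; // Reddit's post ID, stored as reddit_id in the database
  title: string;
}

// Keep only posts whose ID is not already present in the database.
function filterNewPosts(fetched: RedditPost[], existingIds: Set<string>): RedditPost[] {
  return fetched.filter((post) => !existingIds.has(post.id));
}

const existing = new Set(["abc12", "def34"]);
const fetched: RedditPost[] = [
  { id: "abc12", title: "[USA-CA] [H] GPU [W] PayPal" },
  { id: "xyz99", title: "[USA-NY] [H] CPU [W] Local cash" },
];
console.log(filterNewPosts(fetched, existing).map((p) => p.id)); // only "xyz99" survives
```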
## Setup

### 1. Supabase

1. Create a new project in Supabase
2. Go to the SQL Editor in your Supabase dashboard
3. Copy and paste the contents of `database-schema.sql` and run it
4. Go to Settings > API to get your project URL and anon key
### 2. Environment variables

Set these environment variables in your Val Town settings:

- `SUPABASE_URL`: Your Supabase project URL (e.g., `https://your-project.supabase.co`)
- `SUPABASE_ANON_KEY`: Your Supabase anon/public key
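A sketch of how a val might validate these settings before connecting. The helper takes an env-like record so it works the same under any runtime (`getRequiredEnv` and `fakeEnv` are illustrative names, not part of the actual scraper):

```typescript
// Illustrative helper: read a required setting from an env-like record and
// fail fast with a clear message when it is missing.
function getRequiredEnv(env: Record<string, string | undefined>, key: string): string {
  const value = env[key];
  if (!value) throw new Error(`Missing required environment variable: ${key}`);
  return value;
}

// In a Val Town val you would pass the real environment, e.g. Deno.env.toObject().
const fakeEnv = {
  SUPABASE_URL: "https://your-project.supabase.co",
  SUPABASE_ANON_KEY: "anon-key",
};
const supabaseUrl = getRequiredEnv(fakeEnv, "SUPABASE_URL");
console.log(supabaseUrl); // https://your-project.supabase.co
```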
### 3. Cron schedule

1. Set `reddit-scraper.ts` as a cron trigger in Val Town
2. Configure the schedule in the Val Town web UI (recommended: every 30 minutes)

Example cron expressions:

- Every 30 minutes: `*/30 * * * *`
- Every hour: `0 * * * *`
- Every 15 minutes: `*/15 * * * *`
## Database Schema

The `posts` table contains:

- `id`: Primary key (auto-increment)
- `reddit_id`: Unique Reddit post ID
- `reddit_original`: Full Reddit post data as JSON
- `title`: Post title
- `created_at`: When the post was created on Reddit
- `updated_at`: When the record was last updated in our database
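The authoritative DDL lives in `database-schema.sql` (not reproduced here); a hypothetical schema matching the columns above might look like:

```sql
-- Hypothetical sketch matching the columns described above;
-- the authoritative definition is in database-schema.sql.
CREATE TABLE IF NOT EXISTS posts (
  id BIGSERIAL PRIMARY KEY,            -- auto-incrementing primary key
  reddit_id TEXT UNIQUE NOT NULL,      -- unique Reddit post ID
  reddit_original JSONB NOT NULL,      -- full Reddit post data as JSON
  title TEXT NOT NULL,                 -- post title
  created_at TIMESTAMPTZ,              -- when the post was created on Reddit
  updated_at TIMESTAMPTZ DEFAULT now() -- last update in our database
);
```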
## Usage

You can manually trigger the scraper by running the `reddit-scraper.ts` val.
Once configured as a cron job, it will automatically:
- Fetch the latest 25 posts from /r/hardwareswap
- Check for duplicates in the database
- Save new posts to Supabase
- Log statistics about the scraping session
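Reddit's public JSON endpoint returns a listing object wrapping the posts. A sketch of mapping that listing into rows for the `posts` table, assuming the usual listing shape (the type and function names here are illustrative, not the scraper's actual code):

```typescript
// Subset of Reddit's listing response used here (illustrative shape).
interface RedditListing {
  data: { children: Array<{ data: { id: string; title: string; created_utc: number } }> };
}

// Map a listing into rows ready for insertion into the posts table.
function listingToRows(listing: RedditListing) {
  return listing.data.children.map(({ data }) => ({
    reddit_id: data.id,
    title: data.title,
    reddit_original: data,
    created_at: new Date(data.created_utc * 1000).toISOString(), // Reddit uses epoch seconds
  }));
}

// Sample listing (abbreviated) standing in for a real fetch:
const sample: RedditListing = {
  data: {
    children: [
      { data: { id: "xyz99", title: "[USA-NY] [H] CPU [W] PayPal", created_utc: 1700000000 } },
    ],
  },
};
console.log(listingToRows(sample)[0].reddit_id); // xyz99
```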
## Using the Official Reddit API (Optional)

If you prefer to use Reddit's official API instead of the JSON endpoint:
- Create a Reddit app at https://www.reddit.com/prefs/apps
- Add these environment variables:
  - `REDDIT_CLIENT_ID`
  - `REDDIT_CLIENT_SECRET`
  - `REDDIT_USER_AGENT`
- Modify the scraper to use the official Reddit API
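Reddit's official API authenticates token requests with HTTP Basic auth built from the app's client ID and secret. A sketch of just the header construction (the surrounding token request is described in comments, not executed; the function name is illustrative):

```typescript
// Illustrative: build the Basic auth header the official Reddit API expects
// on token requests, from REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET.
function basicAuthHeader(clientId: string, clientSecret: string): string {
  // btoa is available in Deno and modern Node; older Node would use Buffer.
  return "Basic " + btoa(`${clientId}:${clientSecret}`);
}

// The token request itself (not executed here) would POST to
// https://www.reddit.com/api/v1/access_token with this header plus a
// User-Agent taken from REDDIT_USER_AGENT.
console.log(basicAuthHeader("my-client-id", "my-secret"));
```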
## Monitoring

Check the Val Town logs to monitor:
- Number of new posts scraped
- Number of duplicates skipped
- Any errors during scraping
- Performance metrics
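The statistics above can be collected into a single summary log line per run. A sketch of one possible shape (the interface and format are illustrative, not the scraper's actual log format):

```typescript
// Illustrative per-run statistics matching the items logged above.
interface ScrapeStats {
  fetched: number;    // posts returned by Reddit
  saved: number;      // new posts written to Supabase
  duplicates: number; // posts skipped as already stored
  errors: number;     // failures during this run
  durationMs: number; // wall-clock time for the run
}

// Format one summary line for the Val Town logs.
function formatSummary(s: ScrapeStats): string {
  return `Scrape complete: ${s.saved} new, ${s.duplicates} duplicates skipped, ` +
         `${s.errors} errors (${s.fetched} fetched in ${s.durationMs}ms)`;
}

console.log(formatSummary({ fetched: 25, saved: 3, duplicates: 22, errors: 0, durationMs: 850 }));
// Scrape complete: 3 new, 22 duplicates skipped, 0 errors (25 fetched in 850ms)
```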
## Troubleshooting

### Common Issues

- Missing environment variables: Ensure `SUPABASE_URL` and `SUPABASE_ANON_KEY` are set
- Database connection errors: Verify your Supabase credentials and that the table exists
- Reddit rate limiting: The scraper uses a 25-post limit and a respectful user agent
- Duplicate key errors: The scraper checks for duplicates, but race conditions might occur

### Error Messages

- `Missing Supabase credentials`: Set the required environment variables
- `Reddit API error`: Check if Reddit is accessible and not rate limiting
- `Error saving post`: Check the Supabase connection and table schema
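The duplicate-key race condition mentioned above can be made harmless at the database level by letting Postgres ignore rows that slip past the duplicate check. A hypothetical example relying on the unique `reddit_id` constraint (the actual insert path lives in `reddit-scraper.ts`):

```sql
-- Hypothetical guard: ignore a row that slipped past the duplicate check,
-- relying on the unique constraint on reddit_id.
INSERT INTO posts (reddit_id, reddit_original, title, created_at)
VALUES ('xyz99', '{"id": "xyz99"}', 'Example title', now())
ON CONFLICT (reddit_id) DO NOTHING;
```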
## Querying Data

You can query your scraped data directly in Supabase:

```sql
-- Get recent posts
SELECT title, created_at, reddit_original->>'score' AS score
FROM posts
ORDER BY created_at DESC
LIMIT 10;

-- Search posts by title
SELECT title, created_at
FROM posts
WHERE title ILIKE '%gpu%'
ORDER BY created_at DESC;

-- Get posts by author
SELECT title, created_at
FROM posts
WHERE reddit_original->>'author' = 'username'
ORDER BY created_at DESC;
```
## Customization

Feel free to modify the scraper to:
- Add more subreddits
- Include additional post metadata
- Add data processing or analysis features
- Integrate with other services