Automatically ingest GitHub accounts of everyone who interacts with your repo into Clay.
This val tracks anyone who:
- Created an issue
- Reacted to an issue
- Commented on an issue
- Reacted to an issue's comments
- Starred a repo
- Forked a repo
1. Click **Remix** on the top-right to get a copy of it.
2. Set up a Clay workbook with a **Webhook** column.
3. Copy your Clay workbook's **Webhook** URL.
4. Set that as `CLAY_WEBHOOK_URL` in this val's **Environment variables** on the left sidebar.
5. In `config.ts`, configure the following settings:

   **Required:**

   - `GITHUB_REPO` - The repository to track (format: `"owner/repo"`)

   **Optional (for testing or limiting API usage):**

   - `ISSUE_LIMIT` - Maximum number of issues to fetch (`undefined` to fetch all issues)
   - `MAX_STARGAZERS` - Maximum number of stargazers to fetch (`undefined` to fetch all)
   - `MAX_FORKS` - Maximum number of forks to fetch (`undefined` to fetch all)
   - `MAX_COMMENTS_PER_ISSUE` - Maximum comments to fetch per issue (`undefined` to fetch all)
   - `FETCH_REACTIONS` - Set to `true` to track reactions on issues and comments (note: this dramatically slows down the val; default: `false`)

   **Optional (deduplication & data management):**

   - `ENABLE_DEDUPLICATION` - Prevents sending duplicate users to Clay by tracking them in a database (default: `true`)
   - `MAX_ISSUE_NUMBERS_PER_INTERACTION` - Limits the number of issue numbers stored per interaction type to keep Clay payloads manageable (default: `20`)
6. To test it out immediately, navigate to `main.ts` and click **Run**.
7. That's it! The cron will run on your repo every 30 minutes from now on.
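Once filled in, the settings from the configuration step above might look like this (values are illustrative, and whether `config.ts` uses plain `export const` declarations is an assumption — check the actual file):

```typescript
// Required: the repository to track, in "owner/repo" form.
export const GITHUB_REPO = "owner/repo";

// Optional limits — leave as undefined to fetch everything.
export const ISSUE_LIMIT: number | undefined = 50;
export const MAX_STARGAZERS: number | undefined = undefined;
export const MAX_FORKS: number | undefined = undefined;
export const MAX_COMMENTS_PER_ISSUE: number | undefined = undefined;

// Reactions require many extra API calls, so this is off by default.
export const FETCH_REACTIONS = false;

// Deduplication & data management.
export const ENABLE_DEDUPLICATION = true;
export const MAX_ISSUE_NUMBERS_PER_INTERACTION = 20;
```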
- `config.ts` - Configuration settings for the scraper
- `database.ts` - SQLite database for tracking sent users
- `clay.ts` - Sends data to Clay
- `github.ts` - Collects engaged users and fetches usernames
- `main.ts` - Cron trigger that orchestrates the scraping and Clay integration
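As a rough sketch of what `clay.ts` does, the val needs to turn each engaged user into a JSON row and POST it to the webhook URL. The field names below are purely illustrative (Clay webhooks accept arbitrary JSON; the real payload shape lives in `clay.ts`):

```typescript
// Build one row per engaged user; field names here are hypothetical.
function buildClayPayload(user: {
  username: string;
  repo: string;
  interactions: string[];
}) {
  return {
    github_username: user.username,
    github_url: `https://github.com/${user.username}`,
    source_repo: user.repo,
    interactions: user.interactions.join(", "),
  };
}

// POST a single row to the Clay webhook URL (from CLAY_WEBHOOK_URL).
async function sendToClay(webhookUrl: string, payload: object): Promise<void> {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`Clay webhook returned ${res.status}`);
}
```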
This scraper automatically tracks which users have been sent to Clay to prevent
duplicates. The feature is controlled by the `ENABLE_DEDUPLICATION` flag in
`config.ts` (enabled by default).
How it works:
- GitHub usernames are stored in a SQLite database after being successfully sent to Clay
- On subsequent runs, the scraper checks the database and only sends new users
- The database persists across all cron runs, so users are only sent once
- Each user is tracked by their GitHub username and source repository
Database details:
- Users are tracked in the `tracked_users` table
- Stores: username, source repo, and timestamp of first encounter
- The database automatically initializes on the first run
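A minimal sketch of the pieces described above — the exact schema and helper names in `database.ts` may differ; this just mirrors the stored fields (username, source repo, first-seen timestamp) and the filtering step:

```typescript
// Hypothetical DDL matching the description: one row per (username, repo),
// created automatically on first run via CREATE TABLE IF NOT EXISTS.
const CREATE_TRACKED_USERS = `
  CREATE TABLE IF NOT EXISTS tracked_users (
    username    TEXT NOT NULL,
    source_repo TEXT NOT NULL,
    first_seen  TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (username, source_repo)
  )
`;

// Pure helper: given the usernames already recorded for this repo,
// keep only users who have not yet been sent to Clay.
function filterNewUsers(engaged: string[], alreadySent: Set<string>): string[] {
  return engaged.filter((user) => !alreadySent.has(user));
}
```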
To disable deduplication: set `ENABLE_DEDUPLICATION = false` in
`config.ts` to send all engaged users on every run (useful for
testing, or if you want to manage deduplication in Clay instead).
On larger repos, you may get rate-limited by GitHub. To mitigate this, Val Town
uses a proxied fetch that reroutes requests
through a proxy vendor so that requests obtain different IP addresses. It also
automatically retries failed requests several times. Note that using
`std/fetch` will be significantly slower than directly calling the JavaScript
Fetch API due to extra network hops.
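The retry behavior described above can be sketched as a small wrapper. This is not Val Town's actual `std/fetch` implementation — just an illustration of retrying rate-limited requests with exponential backoff, with the fetch call injected so it can run without a network:

```typescript
// Retry a fetch-like call on rate-limit responses, with exponential backoff.
// `fetchFn` is injected (instead of calling global fetch) for testability.
async function fetchWithRetry(
  fetchFn: () => Promise<{ ok: boolean; status: number }>,
  retries = 3,
  baseDelayMs = 500,
): Promise<{ ok: boolean; status: number }> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const res = await fetchFn();
      // 403/429 are GitHub's rate-limit statuses; anything else returns as-is.
      if (res.status !== 403 && res.status !== 429) return res;
      lastError = new Error(`rate limited (HTTP ${res.status})`);
    } catch (err) {
      lastError = err;
    }
    // Wait longer after each failed attempt: base, 2x base, 4x base, ...
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
  throw lastError;
}
```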