Automatically ingest GitHub accounts of everyone who interacts with your repo into Clay.
This val tracks anyone who:
- Created an issue
- Reacted to an issue
- Commented on an issue
- Reacted to an issue's comments
- Starred a repo
- Forked a repo
- Click `Remix` on the top-right to get a copy of it
- Set up a Clay workbook with a `Webhook` column
- Copy your Clay workbook's `Webhook` URL
- Set that as `CLAY_WEBHOOK_URL` in this val's `Environment variables` on the left sidebar
- In `config.ts`, configure the following settings:

  Required:
  - `GITHUB_REPO` - The repository to track (format: `"owner/repo"`)

  Optional (for testing or limiting API usage):
  - `ISSUE_LIMIT` - Maximum number of issues to fetch (`undefined` to fetch all issues)
  - `MAX_STARGAZERS` - Maximum number of stargazers to fetch (`undefined` to fetch all)
  - `MAX_FORKS` - Maximum number of forks to fetch (`undefined` to fetch all)
  - `MAX_COMMENTS_PER_ISSUE` - Maximum comments to fetch per issue (`undefined` to fetch all)
  - `FETCH_REACTIONS` - Set to `true` to track reactions on issues and comments (note: this will dramatically slow down the val; default: `false`)

  Optional (deduplication & data management):
  - `ENABLE_DEDUPLICATION` - Prevents sending duplicate users to Clay by tracking them in a database (default: `true`)
  - `MAX_ISSUE_NUMBERS_PER_INTERACTION` - Limits the number of issue numbers stored per interaction type to keep Clay payloads manageable (default: `20`)
- To test it out immediately, navigate to `main.ts` and click `Run`.
That's it! The cron will run on your repo every 30 minutes from now on.
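The settings listed above might look like this in `config.ts` (a sketch with illustrative values — adjust them for your own repo; only the setting names come from the list above):

```typescript
// config.ts - illustrative values only; adjust for your repo.

// Required: the repository to track, in "owner/repo" format.
export const GITHUB_REPO = "owner/repo";

// Optional limits (undefined means "fetch everything").
export const ISSUE_LIMIT: number | undefined = 100;
export const MAX_STARGAZERS: number | undefined = undefined;
export const MAX_FORKS: number | undefined = undefined;
export const MAX_COMMENTS_PER_ISSUE: number | undefined = 50;

// Reaction tracking is off by default because it dramatically slows the val.
export const FETCH_REACTIONS = false;

// Deduplication settings.
export const ENABLE_DEDUPLICATION = true;
export const MAX_ISSUE_NUMBERS_PER_INTERACTION = 20;
```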
- `config.ts` - Configuration settings for the scraper
- `database.ts` - SQLite database for tracking sent users
- `clay.ts` - Sends data to Clay
- `github.ts` - Collects engaged users and fetches usernames
- `main.ts` - Cron trigger that orchestrates the scraping and Clay integration
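As a rough sketch of what `clay.ts` does: each engaged user becomes a JSON payload that is POSTed to the Clay webhook. The function names and payload field names below are assumptions for illustration, not the val's actual code — the fields Clay receives depend on your workbook's webhook configuration:

```typescript
// Hypothetical shape for a user collected by github.ts.
interface EngagedUser {
  username: string;
  repo: string;
  interactions: string[]; // e.g. ["issue_author", "stargazer"]
}

// Hypothetical payload builder - field names are illustrative.
function buildClayPayload(user: EngagedUser) {
  return {
    github_username: user.username,
    github_url: `https://github.com/${user.username}`,
    source_repo: user.repo,
    interactions: user.interactions.join(", "),
  };
}

// Sending a row to Clay is a plain JSON POST to the webhook URL.
async function sendToClay(webhookUrl: string, user: EngagedUser) {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildClayPayload(user)),
  });
  if (!res.ok) throw new Error(`Clay webhook returned ${res.status}`);
}
```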
This scraper automatically tracks which users have been sent to Clay to prevent
duplicates. The feature is controlled by the `ENABLE_DEDUPLICATION` flag in
`config.ts` (enabled by default).
How it works:
- GitHub usernames are stored in a SQLite database after being successfully sent to Clay
- On subsequent runs, the scraper checks the database and only sends new users
- The database persists across all cron runs, so users are only sent once
- Each user is tracked by their GitHub username and source repository
Database details:
- Users are tracked in the `tracked_users` table
- Stores: username, source repo, and timestamp of first encounter
- The database automatically initializes on the first run
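The dedup check can be sketched like this. The actual val uses Val Town's SQLite; here an in-memory set stands in so the logic is self-contained, and the schema is an assumption reconstructed from the fields listed above:

```typescript
// Assumed schema for tracked_users, based on the fields described above.
const CREATE_TABLE = `
  CREATE TABLE IF NOT EXISTS tracked_users (
    username    TEXT NOT NULL,
    source_repo TEXT NOT NULL,
    first_seen  TEXT NOT NULL,
    PRIMARY KEY (username, source_repo)
  )
`;

// In-memory stand-in for the database, keyed the same way
// (each user is tracked per source repository).
const seen = new Set<string>();
const key = (username: string, repo: string) => `${repo}:${username}`;

// Returns only users that have not been sent to Clay before,
// and records them so subsequent runs skip them.
function filterNewUsers(usernames: string[], repo: string): string[] {
  const fresh = usernames.filter((u) => !seen.has(key(u, repo)));
  for (const u of fresh) seen.add(key(u, repo));
  return fresh;
}
```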
To disable deduplication: set `ENABLE_DEDUPLICATION = false` in
`config.ts` to send all engaged users on every run (useful for
testing, or if you want to manage deduplication in Clay instead).
On larger repos, you may get rate-limited by GitHub. To mitigate this, Val Town
uses a proxied fetch that reroutes requests through a proxy vendor so that they
come from different IP addresses. It also automatically retries failed requests
several times. Note that `std/fetch` will be significantly slower than calling
the JavaScript Fetch API directly, due to the extra network hops.
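The retry behavior described above can be approximated with a small wrapper. This is a sketch, not Val Town's actual `std/fetch` implementation; the fetch function is injectable here so the retry logic can be exercised without a network:

```typescript
// Retry a fetch-like call a few times before giving up.
// `fetchFn` is injectable so the logic is testable without a network.
async function fetchWithRetry(
  url: string,
  fetchFn: (url: string) => Promise<{ ok: boolean; status: number }>,
  retries = 3,
): Promise<{ ok: boolean; status: number }> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const res = await fetchFn(url);
      // Retry on rate limiting (GitHub returns 403 or 429) and server errors;
      // return anything else, including other client errors, as-is.
      if (res.ok || (res.status !== 403 && res.status !== 429 && res.status < 500)) {
        return res;
      }
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```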