github-leads
Find potential customers from GitHub stargazer activity. Monitor any repository for new stars, scrape historic stargazers on the first run, and use AI to qualify leads automatically.
- Historic Scraping: On first run, scrapes all existing stargazers from configured repositories
- Ongoing Monitoring: A cron job monitors for new stars on your configured repositories
- AI Lead Qualification: An AI agent researches each stargazer and scores them as a lead
- Dashboard & Alerts: Qualified leads appear in your dashboard with periodic digest emails
- Click Remix
- Add environment variables:
  - `OPENAI_API_KEY`: for AI lead qualification
  - `GITHUB_TOKEN`: for accessing the GitHub API (create one here)
- Configure your repositories:
  - Edit the `GITHUB_REPOS` array in `github.cron.ts`
  - Add repositories in "owner/repo" format (e.g., "facebook/react")
- Configure email settings:
  - Edit `RECIPIENTS` in `digest.ts`
  - Customize `PROMPT.txt` with your ICP criteria
- Open `main.ts` to view your dashboard
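The environment variables are read at runtime. A minimal sketch of how code on Val Town (which runs on Deno) can access them, using the variable names from the steps above:

```ts
// Read the required secrets from environment variables (Deno runtime on Val Town).
// Failing early gives a clear error instead of failing partway through a scrape.
const GITHUB_TOKEN = Deno.env.get("GITHUB_TOKEN");
const OPENAI_API_KEY = Deno.env.get("OPENAI_API_KEY");

if (!GITHUB_TOKEN) throw new Error("GITHUB_TOKEN is required");
if (!OPENAI_API_KEY) throw new Error("OPENAI_API_KEY is required");
```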
Note: The first run will scrape ALL historic stargazers, which may take time for popular repositories.
Repository Monitoring (github.cron.ts)
- Small Repositories (≤1,000 stars): Scrapes all stargazers in one run
- Medium Repositories (1,001-5,000 stars): Scrapes in chunks of 500 stars per run
- Large Repositories (5,001-10,000 stars): Gradual scraping with 200 stars per run
- Very Large Repositories (>10,000 stars): Skips historic scraping, monitors new stars only
- Ongoing Monitoring: After historic scraping completes, monitors for new stars since last check
- Tracks repository state and progress to resume scraping across multiple runs
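As a rough illustration of the size tiers above, strategy selection might look like the following sketch. The thresholds come from this list; the function itself is hypothetical, not the actual implementation:

```ts
// Hypothetical helper: pick a per-run chunk size based on the repository's star count.
// The tiers mirror the list above.
function chooseScrapeStrategy(totalStars: number): { chunkSize: number; skipHistoric: boolean } {
  if (totalStars > 10_000) return { chunkSize: 0, skipHistoric: true };   // very large: new stars only
  if (totalStars > 5_000) return { chunkSize: 200, skipHistoric: false }; // large: gradual scraping
  if (totalStars > 1_000) return { chunkSize: 500, skipHistoric: false }; // medium: chunked
  return { chunkSize: totalStars, skipHistoric: false };                  // small: all in one run
}
```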
Gradual Scraping Process
- Assessment: Checks repository size and determines scraping strategy
- Chunked Processing: Large repositories are processed over multiple cron runs
- Progress Tracking: Maintains state between runs to resume where it left off
- Automatic Completion: Transitions to new-star monitoring when historic scraping finishes
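One way the resume step could be modeled, as a pure sketch against the repository state fields described in the Storage section below (names and logic are illustrative, not the actual code):

```ts
// Illustrative shape of the per-repository state kept between cron runs
// (field names from the Storage section below).
interface RepoState {
  repo_name: string;
  last_checked: string;
  historic_scrape_complete: boolean;
  historic_scrape_progress: number; // current page for gradual scraping
  total_stars: number;
}

// One "chunk" step: given the saved state and how many stargazers were fetched
// this run, compute the state to persist so the next run resumes where this one left off.
function advanceHistoricScrape(state: RepoState, fetchedCount: number, chunkSize: number): RepoState {
  const done = fetchedCount < chunkSize; // a short page means the last stargazer was reached
  return {
    ...state,
    historic_scrape_progress: state.historic_scrape_progress + 1,
    historic_scrape_complete: done,
    last_checked: new Date().toISOString(),
  };
}
```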
AI Agent (agent.ts)
- Researches each GitHub user's profile, repos, and linked sites
- Uses web search to learn about their company and role
- Scores them against your ICP defined in `PROMPT.txt`
- Returns `{name, match, score, leadTypes, reasoning}`
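The result shape listed above could be typed roughly like this; the field meanings are inferred from the dashboard and digest sections and are assumptions:

```ts
// Assumed shape of the agent's qualification result, based on the fields listed above.
interface LeadQualification {
  name: string;        // the stargazer's display name
  match: boolean;      // whether they fit the ICP defined in PROMPT.txt
  score: number;       // how strong a lead they are (higher is better)
  leadTypes: string[]; // e.g. ["customer"] or ["hire"], shown as tags in the dashboard
  reasoning: string;   // the model's short explanation for the verdict
}
```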
Storage (db.ts)
- Leads Table: Every lead is stored in SQLite with columns:
  - `id`: auto-incremented
  - `timestamp`: when first seen
  - `input_data`: the GitHub stargazer event(s) that triggered it
  - `output_data`: AI result
- Repository State Table: Tracks scraping progress per repository:
  - `repo_name`: repository identifier
  - `last_checked`: timestamp of last check
  - `historic_scrape_complete`: whether the initial scrape is done
  - `historic_scrape_progress`: current page for gradual scraping
  - `total_stars`: total stars when scraping started
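A rough sketch of what the two tables could look like using Val Town's `std/sqlite`. The column names come from the lists above; the table names (`leads`, `repo_state`), types, and defaults are assumptions:

```ts
import { sqlite } from "https://esm.town/v/std/sqlite"; // Val Town's hosted SQLite

// Leads: one row per stargazer, with the raw GitHub event and the AI verdict stored as JSON.
await sqlite.execute(`CREATE TABLE IF NOT EXISTS leads (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp TEXT NOT NULL,
  input_data TEXT NOT NULL,   -- GitHub stargazer event(s), JSON
  output_data TEXT            -- AI qualification result, JSON
)`);

// Per-repository scraping state, used to resume historic scraping across runs.
await sqlite.execute(`CREATE TABLE IF NOT EXISTS repo_state (
  repo_name TEXT PRIMARY KEY,
  last_checked TEXT,
  historic_scrape_complete INTEGER DEFAULT 0,  -- boolean flag
  historic_scrape_progress INTEGER DEFAULT 0,  -- current page for gradual scraping
  total_stars INTEGER
)`);
```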
Dashboard (main.ts)
- Qualified leads (match=true) appear at the top
- Shows score and lead type tags (customer/hire)
- Click a lead to see full details and GitHub activity
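For example, the "qualified first, highest score first" ordering could come from a query along these lines, assuming the AI result is stored as JSON in `output_data` and the table is named `leads` as sketched above (the exact query is not from the source):

```ts
import { sqlite } from "https://esm.town/v/std/sqlite";

// Qualified leads (match = true) first, then highest score first.
const leads = await sqlite.execute(`
  SELECT id, timestamp, output_data
  FROM leads
  ORDER BY json_extract(output_data, '$.match') DESC,
           json_extract(output_data, '$.score') DESC
`);
```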
Email Digest (digest.ts)
- Sends daily emails (1pm UTC) with new qualified leads
- Edit the `RECIPIENTS` array to configure who receives them
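The recipient list is just an exported constant; a minimal sketch (addresses are placeholders):

```ts
// Who receives the daily digest of new qualified leads.
export const RECIPIENTS = [
  "you@example.com",
  "teammate@example.com",
];
```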
- Repositories: Edit the `GITHUB_REPOS` array in `github.cron.ts` to add/remove repositories
- Lead Criteria: Edit `PROMPT.txt` to define your ideal customer profile
- Monitoring Frequency: Adjust the cron schedule in `github.cron.ts` (default: hourly)
- Email Schedule: Adjust the email digest schedule in `digest.ts` (default: daily at 1pm UTC)
Add repositories to monitor in the `GITHUB_REPOS` array:

```ts
export const GITHUB_REPOS = [
  "your-org/your-repo",
  "facebook/react",
  "microsoft/vscode",
  "octocat/Hello-World", // Good for testing
  // Add more repositories here
];
```
- Gradual Scraping: Large repositories are processed over multiple cron runs to avoid timeouts
- Rate Limits: The system includes delays and batching to handle GitHub API limits
- Incremental Updates: After historic scraping, only new stars are processed
- Error Handling: Individual lead processing failures won't stop the entire job
- Resume Capability: System automatically resumes scraping after interruptions
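To illustrate the rate-limit handling above, here is a hedged sketch of fetching one page of stargazers with the token and a short pause between requests. The `application/vnd.github.star+json` media type is what exposes `starred_at` timestamps; the pacing value and function name are assumptions:

```ts
// Fetch one page of stargazers for "owner/repo", pausing briefly to stay
// well under GitHub's API rate limits.
async function fetchStargazersPage(repo: string, page: number, perPage = 100) {
  const res = await fetch(
    `https://api.github.com/repos/${repo}/stargazers?per_page=${perPage}&page=${page}`,
    {
      headers: {
        Authorization: `Bearer ${Deno.env.get("GITHUB_TOKEN")}`,
        // This media type includes starred_at, useful for "new stars since last check".
        Accept: "application/vnd.github.star+json",
      },
    },
  );
  if (!res.ok) throw new Error(`GitHub API error ${res.status} for ${repo}`);
  await new Promise((r) => setTimeout(r, 500)); // assumed pacing between requests
  return await res.json() as { starred_at: string; user: { login: string } }[];
}
```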
- Small Repos (≤1,000 stars): Complete historic scraping in one run
- Medium Repos (1,001-5,000 stars): Scrape 500 stars per cron run
- Large Repos (5,001-10,000 stars): Scrape 200 stars per cron run (gradual)
- Very Large Repos (>10,000 stars): Skip historic scraping, monitor new stars only
- Batch Processing: Leads are processed in batches of 5 to ensure reliability
- Progress Tracking: System resumes scraping where it left off across multiple runs
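A sketch of how batched processing with per-lead error isolation might look. The batch size comes from the list above; `qualifyLead` stands in for the AI agent and is hypothetical:

```ts
// Process stargazers in batches of 5; a single failing lead is logged
// but does not abort the rest of the job.
async function processLeads(
  stargazers: { login: string }[],
  qualifyLead: (login: string) => Promise<unknown>,
  batchSize = 5,
) {
  for (let i = 0; i < stargazers.length; i += batchSize) {
    const batch = stargazers.slice(i, i + batchSize);
    const results = await Promise.allSettled(batch.map((s) => qualifyLead(s.login)));
    for (const [j, result] of results.entries()) {
      if (result.status === "rejected") {
        console.error(`Failed to process ${batch[j].login}:`, result.reason);
      }
    }
  }
}
```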
Check scraping progress at /status:
- View status of all configured repositories
- See progress percentage for ongoing historic scrapes
- Monitor completion status and last check times
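A rough idea of the data behind /status, computed from the repository state table described earlier. The table name, route wiring, page size, and row handling (Val Town's `std/sqlite` returning rows as value arrays) are assumptions:

```ts
import { sqlite } from "https://esm.town/v/std/sqlite";

// Summarize scraping progress for every configured repository.
export async function getStatus() {
  const result = await sqlite.execute(
    "SELECT repo_name, last_checked, historic_scrape_complete, historic_scrape_progress, total_stars FROM repo_state",
  );
  return result.rows.map((row) => {
    const [repo, lastChecked, complete, page, totalStars] = row as unknown[];
    const scraped = Number(page) * 100; // assumes ~100 stargazers per page
    const pct = complete
      ? 100
      : Math.min(100, Math.round((scraped / Math.max(1, Number(totalStars))) * 100));
    return { repo, lastChecked, historicScrapeComplete: Boolean(complete), progressPercent: pct };
  });
}
```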
Use the test bar in the dashboard to evaluate any GitHub username instantly.
- "GITHUB_TOKEN is required": Add your GitHub token to environment variables
- Timeouts: Use smaller repositories; for very large ones the system automatically skips historic scraping
- Rate Limits: The system includes delays and batching to handle GitHub API limits
- Repository Not Found: Ensure repository names are in "owner/repo" format and are public