GitHub Stargazer Lead Tracker

Find potential customers from GitHub stargazer activity. Monitor any repository for new stars, scrape historic stargazers on the first run, and use AI to qualify leads automatically.


What this does

  1. Historic Scraping: On first run, scrapes all existing stargazers from configured repositories
  2. Ongoing Monitoring: A cron job monitors for new stars on your configured repositories
  3. AI Lead Qualification: An AI agent researches each stargazer and scores them as a lead
  4. Dashboard & Alerts: Qualified leads appear in your dashboard with periodic digest emails

Getting started

  1. Click Remix
  2. Add environment variables (both are read at startup; see the sketch after this list):
    • OPENAI_API_KEY — for AI lead qualification
    • GITHUB_TOKEN — for accessing the GitHub API (create one at https://github.com/settings/tokens)
  3. Configure your repositories:
    • Edit GITHUB_REPOS array in github.cron.ts
    • Add repositories in "owner/repo" format (e.g., "facebook/react")
  4. Configure email settings: edit the RECIPIENTS array in digest.ts (see Email Digest below)
  5. Open main.ts to view your dashboard
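
Both variables are read when the cron job starts. A minimal sketch of the startup check, assuming a Deno-style runtime (use process.env on Node):

// Fail fast when secrets are missing; this produces the
// "GITHUB_TOKEN is required" error described under Troubleshooting.
const GITHUB_TOKEN = Deno.env.get("GITHUB_TOKEN");
const OPENAI_API_KEY = Deno.env.get("OPENAI_API_KEY");
if (!GITHUB_TOKEN) throw new Error("GITHUB_TOKEN is required");
if (!OPENAI_API_KEY) throw new Error("OPENAI_API_KEY is required");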

Note: The first run will scrape ALL historic stargazers, which may take time for popular repositories.

How it works

Repository Monitoring (github.cron.ts)

  • Small Repositories (≤1,000 stars): Scrapes all stargazers in one run
  • Medium Repositories (1,001-5,000 stars): Scrapes in chunks of 500 stars per run
  • Large Repositories (5,001-10,000 stars): Gradual scraping with 200 stars per run
  • Very Large Repositories (>10,000 stars): Skips historic scraping, monitors new stars only (the tier logic is sketched after this list)
  • Ongoing Monitoring: After historic scraping completes, monitors for new stars since last check
  • Tracks repository state and progress to resume scraping across multiple runs
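
The size tiers might be selected along these lines; the type and function names here are illustrative, not the actual github.cron.ts internals:

type ScrapeStrategy =
  | { kind: "full" }                    // ≤1,000 stars: everything in one run
  | { kind: "chunked"; perRun: number } // medium/large: gradual scraping
  | { kind: "new-stars-only" };         // >10,000 stars: skip history

function pickStrategy(totalStars: number): ScrapeStrategy {
  if (totalStars <= 1_000) return { kind: "full" };
  if (totalStars <= 5_000) return { kind: "chunked", perRun: 500 };
  if (totalStars <= 10_000) return { kind: "chunked", perRun: 200 };
  return { kind: "new-stars-only" };
}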

Gradual Scraping Process

  1. Assessment: Checks repository size and determines scraping strategy
  2. Chunked Processing: Large repositories are processed over multiple cron runs
  3. Progress Tracking: Maintains state between runs to resume where it left off
  4. Automatic Completion: Transitions to new-star monitoring when historic scraping finishes
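
Under those rules, a single cron run of gradual scraping might look like the sketch below. The helpers (getRepoState, fetchStargazersPage, enqueueLeads, saveRepoState) are stand-ins for whatever db.ts and github.cron.ts actually expose:

// Assumed helper signatures:
declare function getRepoState(repo: string): Promise<{
  historic_scrape_complete: boolean;
  historic_scrape_progress: number; // page cursor
}>;
declare function saveRepoState(
  repo: string,
  patch: { historic_scrape_complete?: boolean; historic_scrape_progress?: number },
): Promise<void>;
declare function fetchStargazersPage(repo: string, page: number, perPage: number): Promise<string[]>;
declare function enqueueLeads(usernames: string[]): Promise<void>;

async function scrapeChunk(repo: string, perRun: number): Promise<void> {
  const state = await getRepoState(repo);
  if (state.historic_scrape_complete) return;

  const perPage = 100; // GitHub's maximum page size for the stargazers endpoint
  let page = state.historic_scrape_progress;

  for (let i = 0; i < Math.ceil(perRun / perPage); i++, page++) {
    const stargazers = await fetchStargazersPage(repo, page, perPage);
    if (stargazers.length === 0) {
      // No more pages: historic scraping is done, switch to new-star monitoring
      await saveRepoState(repo, { historic_scrape_complete: true });
      return;
    }
    await enqueueLeads(stargazers); // hand off to AI qualification
  }
  // Persist the cursor so the next cron run resumes where this one stopped
  await saveRepoState(repo, { historic_scrape_progress: page });
}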

AI Agent (agent.ts)

  • Researches each GitHub user's profile, repos, and linked sites
  • Uses web search to learn about their company and role
  • Scores them against your ICP defined in PROMPT.txt
  • Returns {name, match, score, leadTypes, reasoning}
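
Written out as a TypeScript interface (field meanings inferred from the dashboard and digest behavior; the score scale is an assumption):

interface LeadResult {
  name: string;        // display name or GitHub login
  match: boolean;      // true when the user fits the ICP in PROMPT.txt
  score: number;       // qualification score (numeric scale assumed)
  leadTypes: string[]; // e.g. ["customer"], ["hire"]
  reasoning: string;   // the agent's research summary and justification
}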

Storage (db.ts)

  • Leads Table: Every lead is stored in SQLite with columns:
    • id — auto-incremented
    • timestamp — when first seen
    • input_data — the GitHub stargazer event(s) that triggered it
    • output_data — AI result
  • Repository State Table: Tracks scraping progress per repository:
    • repo_name — repository identifier
    • last_checked — timestamp of last check
    • historic_scrape_complete — whether initial scrape is done
    • historic_scrape_progress — current page for gradual scraping
    • total_stars — total stars when scraping started
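
As SQLite DDL, the two tables might look like this sketch; the table names and column types are assumptions, while the columns mirror the lists above:

const SCHEMA = `
  CREATE TABLE IF NOT EXISTS leads (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp   TEXT NOT NULL,   -- when first seen
    input_data  TEXT NOT NULL,   -- JSON: triggering stargazer event(s)
    output_data TEXT             -- JSON: AI result
  );

  CREATE TABLE IF NOT EXISTS repo_state (
    repo_name                TEXT PRIMARY KEY,
    last_checked             TEXT,
    historic_scrape_complete INTEGER NOT NULL DEFAULT 0, -- boolean flag
    historic_scrape_progress INTEGER NOT NULL DEFAULT 1, -- current page
    total_stars              INTEGER                     -- stars when scraping started
  );
`;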

Dashboard (main.ts)

  • Qualified leads (match=true) appear at the top
  • Shows score and lead type tags (customer/hire)
  • Click a lead to see full details and GitHub activity

Email Digest (digest.ts)

  • Sends daily emails (1pm UTC) with new qualified leads
  • Edit RECIPIENTS array to configure who receives them
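
For example, a hypothetical RECIPIENTS list:

export const RECIPIENTS = [
  "founder@example.com", // replace with real addresses
  "sales@example.com",
];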

Customization

  • Repositories: Edit the GITHUB_REPOS array in github.cron.ts to add/remove repositories
  • Lead Criteria: Edit PROMPT.txt to define your ideal customer profile
  • Monitoring Frequency: Adjust the cron schedule in github.cron.ts (default: hourly)
  • Email Schedule: Adjust the email digest schedule in digest.ts (default: daily at 1pm UTC)
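
Assuming standard five-field cron syntax (the platform's scheduler format may differ), the two defaults would be:

// github.cron.ts — hourly monitoring (constant name is illustrative)
export const CRON_SCHEDULE = "0 * * * *";

// digest.ts — daily digest at 1pm UTC
export const DIGEST_SCHEDULE = "0 13 * * *";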

Repository Configuration

Add repositories to monitor in the GITHUB_REPOS array:

export const GITHUB_REPOS = [
  "your-org/your-repo",
  "facebook/react",
  "microsoft/vscode",
  "octocat/Hello-World", // Good for testing
  // Add more repositories here
];

Performance Notes

  • Gradual Scraping: Large repositories are processed over multiple cron runs to avoid timeouts
  • Rate Limits: The system includes delays and batching to handle GitHub API limits
  • Incremental Updates: After historic scraping, only new stars are processed
  • Error Handling: Individual lead processing failures won't stop the entire job
  • Resume Capability: System automatically resumes scraping after interruptions

Important Limits & Scraping Strategy

  • Small Repos (≤1,000 stars): Complete historic scraping in one run
  • Medium Repos (1,001-5,000 stars): Scrape 500 stars per cron run
  • Large Repos (5,001-10,000 stars): Scrape 200 stars per cron run (gradual)
  • Very Large Repos (>10,000 stars): Skip historic scraping, monitor new stars only
  • Batch Processing: Leads are processed in batches of 5 to ensure reliability (see the sketch after this list)
  • Progress Tracking: System resumes scraping where it left off across multiple runs
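
A sketch of that batching, with an assumed pause between batches to respect rate limits; qualifyLead stands in for the agent.ts entry point:

declare function qualifyLead(username: string): Promise<void>; // assumed signature

const BATCH_SIZE = 5;         // as stated above
const BATCH_DELAY_MS = 2_000; // assumed delay between batches

async function processLeads(usernames: string[]): Promise<void> {
  for (let i = 0; i < usernames.length; i += BATCH_SIZE) {
    const batch = usernames.slice(i, i + BATCH_SIZE);
    // Promise.allSettled isolates failures: one bad profile logs an
    // error instead of aborting the whole job.
    const results = await Promise.allSettled(batch.map(qualifyLead));
    for (const r of results) {
      if (r.status === "rejected") console.error("lead failed:", r.reason);
    }
    await new Promise((resolve) => setTimeout(resolve, BATCH_DELAY_MS));
  }
}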

Monitoring Progress

Check scraping progress at /status:

  • View status of all configured repositories
  • See progress percentage for ongoing historic scrapes
  • Monitor completion status and last check times
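
The response shape isn't documented here; one plausible TypeScript view of it, derived from the repository state columns:

type StatusResponse = Array<{
  repo_name: string;
  total_stars: number;
  historic_scrape_complete: boolean;
  progress_percent: number; // derived from historic_scrape_progress vs. total_stars
  last_checked: string;     // ISO-8601 timestamp
}>;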

Testing

Use the test bar in the dashboard to evaluate any GitHub username instantly.

Troubleshooting

  • "GITHUB_TOKEN is required": Add your GitHub token to environment variables
  • Timeouts: Prefer smaller repositories; for very large ones (>10,000 stars) the system automatically skips historic scraping
  • Rate Limits: The system includes delays and batching to handle GitHub API limits
  • Repository Not Found: Ensure repository names are in "owner/repo" format and are public