A high-performance GitHub data pipeline that extracts developer profiles and their public repositories into clean CSV files, built for researchers, recruiters, and data engineers.
GitHub Scraper exists to remove the friction between raw API data and actionable intelligence. Whether you're mapping open-source ecosystems, identifying talent, or researching contribution trends, we give you clean, structured data without the boilerplate.
◆ Purpose-built for clarity
Queries the GitHub Search API with your target city or country and a minimum follower count, then paginates through the results automatically (the Search API returns at most 1,000 results per query).
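A minimal Python sketch of the search step. It assumes the public `GET /search/users` REST endpoint with the `location:` and `followers:>=` search qualifiers. The helper names (`build_query`, `fetch_search_page`, `search_logins`) are hypothetical and are not the project's own code. The page fetcher is passed in as a function so the pagination logic can be exercised without network access.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

SEARCH_URL = "https://api.github.com/search/users"
PER_PAGE = 100      # largest page size the Search API accepts
MAX_RESULTS = 1000  # the Search API serves at most 1,000 results per query


def build_query(location: str, min_followers: int) -> str:
    """Combine the location and follower qualifiers into one search string."""
    return f'location:"{location}" followers:>={min_followers}'


def fetch_search_page(query: str, page: int, token: Optional[str] = None) -> dict:
    """Fetch one page of user search results from the live API."""
    params = urllib.parse.urlencode({"q": query, "per_page": PER_PAGE, "page": page})
    request = urllib.request.Request(f"{SEARCH_URL}?{params}")
    request.add_header("Accept", "application/vnd.github+json")
    if token:  # authenticated requests get a higher rate limit
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def search_logins(query: str, fetch_page: Callable[[str, int], dict]) -> list[str]:
    """Follow pages until a short page or the 1,000-result cap is reached."""
    logins: list[str] = []
    page = 1
    while len(logins) < MAX_RESULTS:
        items = fetch_page(query, page).get("items", [])
        logins.extend(item["login"] for item in items)
        if len(items) < PER_PAGE:  # a partial page means there is nothing left
            break
        page += 1
    return logins
```

With a real request, usage would look like `search_logins(build_query("Berlin", 200), fetch_search_page)`. Stopping at the first partial page, or at the 1,000-result cap, keeps the loop from asking for pages past the end of the result set.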
All user logins are collected first, then detailed profiles (email, company, bio, stats) are fetched concurrently via a thread pool — up to 10x faster than sequential requests.
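The two-phase approach (collect logins first, then fetch profiles concurrently) can be sketched with the standard library's `ThreadPoolExecutor`. This is an illustration under assumptions. `fetch_user` stands for any function that returns the JSON of `GET /users/{login}`. The `PROFILE_FIELDS` column list is hypothetical, and `max_workers=10` is an arbitrary default.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical column selection from the GET /users/{login} response
PROFILE_FIELDS = ("login", "name", "company", "location", "email", "bio",
                  "public_repos", "followers", "following", "created_at")


def fetch_profiles(logins: list[str], fetch_user: Callable[[str], dict],
                   max_workers: int = 10) -> list[dict]:
    """Fetch every profile concurrently and keep only the selected columns."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() spreads fetch_user across worker threads but yields results in input order
        profiles = list(pool.map(fetch_user, logins))
    # Missing keys become None so every row has the same columns
    return [{field: user.get(field) for field in PROFILE_FIELDS} for user in profiles]
```

Threads suit this step because each worker spends most of its time waiting on HTTP responses. Because `pool.map` returns results in input order, the profile rows line up with the original login list.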
For each user, all public repositories are fetched in parallel — capturing language, stars, license, watchers, and project settings across paginated results.
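Here is a sketch of the repository step, assuming `GET /users/{login}/repos` returns pages of up to 100 items. The helpers `fetch_page`, `fetch_all_repos`, `repo_row`, and `fetch_repo_rows`, along with the chosen column set, are illustrative assumptions. The field names themselves (`full_name`, `language`, `stargazers_count`, `watchers_count`, `license`, `has_issues`, `has_projects`, `has_wiki`) come from the GitHub REST repository object. In this sketch, each user's pages are read sequentially, and users are processed in parallel.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable

PER_PAGE = 100  # page size assumed for GET /users/{login}/repos

PageFetcher = Callable[[str, int], list]  # (login, page) -> list of repo dicts


def fetch_all_repos(login: str, fetch_page: PageFetcher) -> list[dict]:
    """Collect every page of one user's repositories, stopping at the first short page."""
    repos: list[dict] = []
    page = 1
    while True:
        batch = fetch_page(login, page)
        repos.extend(batch)
        if len(batch) < PER_PAGE:
            return repos
        page += 1


def repo_row(login: str, repo: dict) -> dict:
    """Flatten one repository object into a CSV-friendly row."""
    license_info = repo.get("license") or {}  # "license" is null when none is detected
    return {
        "login": login,
        "full_name": repo.get("full_name"),
        "language": repo.get("language"),
        "stargazers_count": repo.get("stargazers_count"),
        "watchers_count": repo.get("watchers_count"),
        "license_key": license_info.get("key"),
        "has_issues": repo.get("has_issues"),
        "has_projects": repo.get("has_projects"),
        "has_wiki": repo.get("has_wiki"),
    }


def fetch_repo_rows(logins: list[str], fetch_page: PageFetcher,
                    max_workers: int = 10) -> list[dict]:
    """Fetch repositories for many users in parallel and return one flat list of rows."""
    def rows_for(login: str) -> list[dict]:
        return [repo_row(login, repo) for repo in fetch_all_repos(login, fetch_page)]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return [row for rows in pool.map(rows_for, logins) for row in rows]
```

The `or {}` fallback matters because repositories without a detected license carry `"license": null`. Without that fallback, calling `.get` on `None` would raise.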
Data is structured into two tidy CSVs — users.csv and repositories.csv — saved to a data/ directory, ready for analysis or import.
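Writing the two files needs only the standard `csv` module. The sketch below assumes that each table is a list of flat `dict`s sharing the same keys, and it takes the header from the first row. The names `write_csv` and `save_outputs` are hypothetical.

```python
from __future__ import annotations

import csv
from pathlib import Path


def write_csv(rows: list[dict], path: Path) -> None:
    """Write flat dicts to a CSV file, taking the header from the first row."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = list(rows[0].keys()) if rows else []
    # newline="" lets the csv module control line endings and quote multi-line fields
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # None values are written as empty strings


def save_outputs(users: list[dict], repos: list[dict], out_dir: str = "data") -> None:
    """Save both tables as users.csv and repositories.csv under out_dir."""
    out = Path(out_dir)
    write_csv(users, out / "users.csv")
    write_csv(repos, out / "repositories.csv")
```

Opening the file with `newline=""` follows the `csv` module's own guidance. Without it, a multi-line field such as a bio would not round-trip cleanly.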
Building tools at the intersection of machine learning and data science. Passionate about clean APIs, fast pipelines, and making raw data immediately useful. GitHub Scraper is a personal project designed to solve a real research friction point.