PoolinGH
fast, efficient, and robust GitHub repository mining

Additional information

Authors
André M., Raglianti M., Serbout S., Cleve A., Lanza M.
Type
Article in conference proceedings
Year
2026
Language
English
Abstract
Researchers in Mining (open-source) Software Repositories (MSR) often create datasets that should survive the single paper and support long-term investigation of specific phenomena. Although popular, these studies recurrently deal with similar technical limitations. For instance, public collaborative development platforms, such as GitHub, impose hourly rate limits on their API requests. Furthermore, depending on network and API conditions, queries can fail and disrupt the process. These unexpected events can slow down or even invalidate the mining. Nevertheless, there are ways to minimize the undesirable effects in a reusable way while still complying with such limitations. However, best practices are often (re-)implemented on an ad hoc basis. Whatever works. We propose PoolinGH, a lightweight, open-source, easy-to-use library, aimed at supporting researchers. It is designed to accelerate and ensure efficient and robust mining on the GitHub REST API while taking full advantage of its capabilities. PoolinGH enables automatic pooling of multiple access tokens and parallelizes queries. It optimizes queues and regulates network and API usage for respecting GitHub's limits and best practices. Error management and recovery or pruning in case of deadlocks are ensured. Search coverage maximization and progress monitoring are among the most useful features to avoid reinventing the wheel. We also provide solution templates that meet common needs for specific extensions of PoolinGH. A preliminary evaluation of these examples, involving tens of thousands of requests, demonstrates tangible gains.
Keywords
GitHub REST API, Library, Mining software repositories, PoolinGH
Conference proceedings
ACM International Conference on Mining Software Repositories (MSR 2026)
Meeting name
MSR 2026
Meeting place
Rio de Janeiro, Brazil
Meeting date
13-14 Apr 2026
Pages (or article number)
in press
Edition
23rd

Diffusion

License
CC BY
Visibility
Public
Status open access
Gold