PoolinGH
fast, efficient, and robust GitHub repository mining
Additional information
Authors
André M.,
Raglianti M.,
Serbout S.,
Cleve A.,
Lanza M.
Type
Contribution in conference proceedings
Year
2026
Language
English
Abstract
Researchers in Mining (open-source) Software Repositories (MSR) often create datasets that should survive the single paper and support long-term investigation of specific phenomena. Although popular, these studies recurrently deal with similar technical limitations. For instance, public collaborative development platforms, such as GitHub, impose hourly rate limits on their API requests. Furthermore, depending on network and API conditions, queries can fail and disrupt the process. These unexpected events can slow down or even invalidate the mining. Nevertheless, there are ways to minimize the undesirable effects in a reusable way while still complying with such limitations. However, best practices are often (re-)implemented on an ad hoc basis. Whatever works.
We propose PoolinGH, a lightweight, open-source, easy-to-use library aimed at supporting researchers. It is designed to accelerate mining on the GitHub REST API and make it efficient and robust, while taking full advantage of the API's capabilities. PoolinGH automatically pools multiple access tokens and parallelizes queries. It optimizes queues and regulates network and API usage to respect GitHub's limits and best practices. It also handles errors, with recovery, or pruning in case of deadlocks. Search coverage maximization and progress monitoring are among the most useful features that spare researchers from reinventing the wheel. We also provide solution templates that meet common needs for specific extensions of PoolinGH. A preliminary evaluation of these examples, involving tens of thousands of requests, demonstrates tangible gains.
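The abstract does not show PoolinGH's actual interface. As a rough illustration of the token-pooling idea it describes, the following Python sketch (every name here, including `TokenPool`, `fetch`, `fetch_all`, and the injected `send(url, token)` transport, is hypothetical and not PoolinGH's API) picks the token with the most remaining quota, waits for a reset when all tokens are exhausted, retries transient failures with exponential backoff, and issues requests from a thread pool. It relies on GitHub's documented `X-RateLimit-Remaining`, `X-RateLimit-Reset`, and `Retry-After` response headers; the retry count, backoff schedule, and handling of 403/429 responses are assumptions chosen for brevity.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical transport interface: send(url, token) -> (status, headers, body).
# Injecting it keeps the sketch runnable offline; a real client would wrap an
# HTTP library here. Header names are assumed to arrive in canonical casing.
Response = Tuple[int, Dict[str, str], object]
Transport = Callable[[str, str], Response]


@dataclass
class TokenState:
    token: str
    remaining: int = 5000  # GitHub's documented primary limit for a user token
    reset_at: float = 0.0  # epoch seconds at which the quota refills


class TokenPool:
    """Thread-safe pool that hands out the token with the most quota left."""

    def __init__(self, tokens: List[str],
                 clock: Callable[[], float] = time.time,
                 sleep: Callable[[float], None] = time.sleep):
        self._states = [TokenState(t) for t in tokens]
        self._lock = threading.Lock()
        self._clock, self._sleep = clock, sleep

    def acquire(self) -> TokenState:
        while True:
            with self._lock:
                now = self._clock()
                for s in self._states:
                    if s.remaining == 0 and now >= s.reset_at:
                        s.remaining = 1  # assume refilled; next headers correct it
                best = max(self._states, key=lambda s: s.remaining)
                if best.remaining > 0:
                    best.remaining -= 1  # reserve one request on this token
                    return best
                wait = min(s.reset_at for s in self._states) - now
            self._sleep(max(wait, 0.0))  # every token exhausted: wait for a reset

    def update(self, state: TokenState, headers: Dict[str, str]) -> None:
        """Sync local bookkeeping with GitHub's rate-limit response headers."""
        with self._lock:
            if "X-RateLimit-Remaining" in headers:
                state.remaining = int(headers["X-RateLimit-Remaining"])
            if "X-RateLimit-Reset" in headers:
                state.reset_at = float(headers["X-RateLimit-Reset"])


def fetch(pool: TokenPool, send: Transport, url: str, max_retries: int = 3,
          sleep: Callable[[float], None] = time.sleep) -> object:
    """GET one URL, rotating tokens and retrying transient failures."""
    for attempt in range(max_retries + 1):
        state = pool.acquire()
        try:
            status, headers, body = send(url, state.token)
        except OSError:  # network error: exponential backoff, then retry
            sleep(2 ** attempt)
            continue
        pool.update(state, headers)
        if status == 200:
            return body
        if status in (403, 429):
            if headers.get("X-RateLimit-Remaining") == "0":
                continue  # primary limit hit: the pool switches token or waits
            # Otherwise treated as a secondary limit (a simplification).
            sleep(float(headers.get("Retry-After", 2 ** attempt)))
            continue
        if status >= 500:  # server-side hiccup: back off and retry
            sleep(2 ** attempt)
            continue
        raise RuntimeError(f"{url}: HTTP {status}")  # other client errors: no retry
    raise RuntimeError(f"{url}: giving up after {max_retries + 1} attempts")


def fetch_all(pool: TokenPool, send: Transport, urls: List[str],
              workers: int = 8) -> List[object]:
    """Issue requests in parallel threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(lambda u: fetch(pool, send, u), urls))
```

Preferring the token with the most remaining quota spreads load across the pool; a round-robin choice would also work. GitHub's guidance for secondary limits without a `Retry-After` header is to wait at least one minute, which the short exponential backoff in this sketch only approximates.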
Keywords
GitHub REST API, Library, Mining software repositories, PoolinGH
Proceedings title
ACM International Conference on Mining Software Repositories (MSR 2026)
Conference name
MSR 2026
Conference location
Rio de Janeiro, Brazil
Conference date
13-14 Apr 2026
Pages (or article number)
in press
Edition
23rd
Dissemination
License
CC BY
Visibility
Public
Open access status
Gold