The price of precision: the cost of preprocessing for automated code revision in code review

Authors

Pirouzkhah S., Rani P., Sovrano F., Hellendoorn V., Bacchelli A.

Type

Article published in a scientific journal

Year

2025

Language

English

Abstract

Code review is a widespread practice in software engineering during which developers examine each other’s source code changes to identify potential issues and improve code quality. Among the automated techniques proposed by researchers to reduce the manual workload of code review, Automated Code Revision (ACR) aims to automatically address reviewers’ feedback by producing a revised version of the code. Transformer-based language models have demonstrated state-of-the-art results in ACR. The performance of these models, however, is significantly influenced by the quality and preparation of the training and evaluation data. We present several systematic analyses of prevalent preprocessing steps, examined both cumulatively and in isolation, across three established preprocessing pipelines and two dataset splitting strategies (time-level vs. project-level). Our study spans models of different scales: OpenNMT (small), T5 and CodeReviewer (mid-sized), LoRA-tuned CodeLLaMA-7B (large), and GPT-3.5-Turbo (large, black-box). Using datasets of up to 496k training records, we evaluate and statistically compare the models’ performance using exact match ratio (EXM), CodeBLEU, and Levenshtein ratio. Our findings show that preprocessing may be a significant component in the success of the different techniques: OpenNMT relies on heavy preprocessing; T5 benefits from light filtering (selective removal of records); CodeReviewer performs best when trained on larger, less aggressively filtered data; CodeLLaMA-7B and GPT-3.5-Turbo are largely indifferent to preprocessing. Overall, the effectiveness of ACR tools depends on aligning preprocessing with model scale and training setup. In general, small models need abstraction, mid-sized ones benefit from light filtering, and large-scale models perform best when trained on the original, unprocessed form of the code.
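
Two of the metrics named in the abstract, exact match ratio (EXM) and Levenshtein ratio, can be illustrated with a minimal Python sketch. The function names, the whitespace stripping before comparison, and the normalization of the edit distance by the longer string are assumptions made here for illustration; the paper may define and compute these metrics differently.

```python
from typing import Sequence


def levenshtein_distance(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute; each costs 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]; assumed normalization: 1 - distance / longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein_distance(a, b) / longest


def exact_match_ratio(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference (after stripping)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)
```

In this sketch, a revised snippet that differs from the reference by a single character counts as a miss for EXM but still receives a Levenshtein ratio close to 1, which is why the two metrics are usually read together.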

Journal

Empirical Software Engineering

Volume

31

Issue (Month)

2

Pages (or article number)

47

ISSN

1382-3256, 1573-7616

DOI

10.1007/s10664-025-10781-4
