PARSED - Personalized and contextuAlized Recommendations for SoftwarE Developers
Software systems are among the most complex artifacts ever created by humans, featuring millions of interacting components. To support developers in handling such complexity, researchers have proposed recommender systems for software developers, defined by Robillard as "software tools that can assist developers with a wide range of activities". Code recommender systems are part of this family: they are tools that recommend source code to developers with the aim of speeding up the task at hand (e.g., the implementation of a new feature).
While research in this field is very active and tools such as GitHub Copilot are nowadays used by thousands of developers, these tools suffer from two major limitations. First, they do not adapt the generated recommendations to the consumer (i.e., the developer who receives them): two developers working on the same task but having different technical backgrounds, coding histories, and skills will receive the same code recommendation, even though they may benefit from different coding solutions. For example, expert developers working on real-time software are likely to appreciate a multi-threaded solution for a given task, while newcomers working on non-performance-critical software may just be confused by it. Second, the recommended code is usually derived by looking at a very limited coding context. Even GitHub Copilot only considers the files on which the developer is currently working as the context to generate the recommendation. However, much more contextual information can be exploited to further boost the quality of the recommendations, such as information coming from coupled code components and technical documentation.
Our goal is to overcome these limitations by developing techniques and tools aimed at bringing personalized and contextualized recommendations to source code recommender systems.
The PARSED project aims at answering the following research questions:
- Q1 How can we profile software developers to model their coding style and expertise?
- Q2 How can we exploit the developers' profile to customize code recommendations?
- Q3 How can we enrich the contextual information to generate better code recommendations?
- Q4 How can we reliably assess the usefulness of the generated recommendations?
For the PARSED project we propose to investigate the following research tracks:
- RT1: Profiling software developers. The goal is to develop techniques to infer (i) the programming style of software developers, and (ii) their expertise, namely the specific programming languages, libraries, notions (e.g., design patterns), etc. they are at ease with. To build such a profile we plan to mine developers' past activities from software repositories (e.g., versioning systems, issue trackers).
- RT2: Personalizing code recommendations. We want to integrate the model(s) produced by RT1 into code recommenders and study their impact on the usefulness of the generated recommendations.
- RT3: Contextualizing code recommendations. The goal is to investigate how to model the coding context in such a way as to maximize the usefulness of the generated recommendations. We will mostly focus on "internal" sources of information (e.g., the software system itself, the official documentation, open issues) to provide a comprehensive but succinct representation of the coding context.
- RT4: Developing a framework for assessing the usefulness of code recommendations. Code recommenders are mostly evaluated using quantitative metrics. For example, code statements are removed from code files and the recommender is asked to predict the removed code. With such a mechanism, the percentage of correct predictions can be used as an evaluation metric. However, if the predicted code differs from the removed code but is semantically equivalent to it, the prediction is still considered wrong. We aim at building a strong evaluation framework for code recommender systems, something currently missing in the literature.
- RT5: Prototyping & Validation. The techniques developed in the project will be integrated into prototypes implementing our vision of personalized and contextualized code recommenders, which will then be released and evaluated.
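The limitation of the evaluation mechanism described in RT4 can be illustrated with a minimal sketch. It is not the evaluation framework the project intends to build: the function names (exact_match_accuracy, same_syntax_tree) and the sample statements are hypothetical, and comparing Python syntax trees is only an assumed stand-in for a slightly more tolerant check. The sketch shows that textual exact match rejects predictions that differ only in formatting, and that even a syntax-tree comparison still rejects a semantically equivalent rewrite.

```python
import ast


def exact_match_accuracy(references, predictions):
    """Fraction of predictions textually identical to the removed code."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    if not references:
        return 0.0
    hits = sum(ref.strip() == pred.strip()
               for ref, pred in zip(references, predictions))
    return hits / len(references)


def same_syntax_tree(reference, prediction):
    """True if both snippets parse to the same Python AST.

    This ignores formatting differences (spacing, redundant parentheses)
    but does NOT detect semantic equivalence in general.
    """
    try:
        return ast.dump(ast.parse(reference)) == ast.dump(ast.parse(prediction))
    except SyntaxError:
        return False


# Hypothetical removed statements and recommender predictions
# (illustrative data only, not results from any real tool).
removed = ["total = total + price", "names = sorted(items)", "x = (1 + 2)"]
predicted = ["total += price", "names = sorted(items)", "x = 1 + 2"]

exact = exact_match_accuracy(removed, predicted)
tree_hits = sum(same_syntax_tree(r, p) for r, p in zip(removed, predicted))

print(f"exact match: {exact:.2f}")
print(f"same syntax tree: {tree_hits}/{len(removed)}")
```

In this toy data all three predictions behave like the removed code, yet exact match accepts only one of them and the syntax-tree check only two. The remaining case (`total += price`) is the kind of semantically equivalent prediction that purely quantitative metrics count as wrong.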