CCQR - Crowdsourced Code Quality Review
Code review is the process of analyzing code written by a teammate to judge whether it is of sufficient quality to be integrated into the main code trunk. Recent studies provide evidence that reviewed code is less likely to be buggy and exhibits higher internal quality, likely making it easier to comprehend and maintain. Given these benefits, code review is widely adopted in both industrial and open source projects, with the main goals of finding defects, improving code quality, and identifying alternative solutions. Clearly, these benefits do not come for free: code review adds to the standard development cost the expense of allocating one or more reviewers responsible for verifying the correctness, quality, and soundness of the newly developed code.

Our goal is to develop models and techniques serving as the basis for a new generation of recommender systems able to support the code review process by exploiting the knowledge embedded in various sources available on the Web (e.g., Stack Overflow discussions, presentations on SlideShare, development mailing lists). We refer to this novel perspective on code review as crowdsourced code quality review. The term crowdsourcing was coined by Jeff Howe as "the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people". We want to outsource (at least in part) the code review process by automatically performing opinion mining on the vast amount of information available on the Web.
In particular, given a newly developed piece of code, the recommender system we envision should be able to review it by (i) assessing its code quality, looking for opinions mined from the Internet about the design decisions it embeds, e.g., verifying whether the use of a specific library is recommended in online forums; and (ii) identifying alternative solutions considered better suited on the basis of the crowdsourced information, e.g., an alternative library to use. We do not aim at automatically identifying bugs in code, since this is the goal of bug detection/prediction techniques, which are outside the scope of this project. Nor is our goal to completely replace humans in the code review process; rather, we want to support them in performing the two code review sub-tasks listed above (i.e., assessing code quality and identifying alternative solutions).

The CCQR project aims at answering the following research questions:

Q1 How can we establish fine-grained traceability links between a code component and sources of documentation available on the Web (e.g., Stack Overflow discussions, tutorials)?

Q2 How can we perform opinion mining to capture crowdsourced opinions about a code component?

Q3 How can we automatically identify alternative implementations for a given code component? Can we explain the pros and cons of the identified alternative implementations?

For the CCQR project we propose to investigate the following research tracks:

Fine-grained linking between code and documentation. The goal is to develop approaches for mining informal documentation related to a given code component (a snippet, a class, or even a set of classes) from various sources. We want to provide fine-grained links between the code and the identified documentation (e.g., indicating which paragraph in a discussion refers to a specific library used in the code).

Opinion mining. We want to exploit the fine-grained links defined in the first track to design opinion mining techniques able to provide a crowdsourced quality assessment of a specific code component.

Automatic identification of alternative implementations. The goal is to develop methods and tools able to mine alternative implementations for a given code component.

Prototyping & Validation. The techniques developed in the three tracks above will be integrated to periodically release prototypes of a recommender system implementing our vision of crowdsourced code quality review. The prototypes will be evaluated through controlled experiments and case studies.
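
To make the first two tracks more concrete, the following is a minimal illustrative sketch in Python, not the techniques the project will develop. It assumes that linking can be approximated by matching imported library names against discussion paragraphs, and that opinions can be approximated by counting words from a tiny hand-picked lexicon. All names (POSITIVE, NEGATIVE, imported_libraries, link_paragraphs, opinion_score, review) and the sample paragraphs are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, hand-picked opinion lexicon (an assumption for illustration only).
POSITIVE = {"recommended", "fast", "reliable", "great", "stable"}
NEGATIVE = {"deprecated", "slow", "buggy", "avoid", "unmaintained"}


def imported_libraries(code):
    """Return the top-level module names imported by a Python snippet."""
    pattern = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_]\w*)", re.MULTILINE)
    return set(pattern.findall(code))


def link_paragraphs(code, paragraphs):
    """Map each imported library to the indices of paragraphs mentioning it."""
    links = defaultdict(list)
    for lib in imported_libraries(code):
        mention = re.compile(rf"\b{re.escape(lib)}\b", re.IGNORECASE)
        for index, text in enumerate(paragraphs):
            if mention.search(text):
                links[lib].append(index)
    return dict(links)


def opinion_score(text):
    """Count positive minus negative lexicon words in a paragraph."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def review(code, paragraphs):
    """Aggregate a naive opinion score per library over its linked paragraphs."""
    links = link_paragraphs(code, paragraphs)
    return {lib: sum(opinion_score(paragraphs[i]) for i in idxs)
            for lib, idxs in links.items()}


# Made-up inputs standing in for a code component and mined discussion text.
code = "import urllib2\nfrom requests import get\n"
paragraphs = [
    "urllib2 is deprecated and slow; avoid it in new code.",
    "I find requests reliable and recommended for HTTP calls.",
    "This paragraph is about something else entirely.",
]
scores = review(code, paragraphs)
```

With these made-up inputs, the paragraph-level links point the first paragraph at urllib2 and the second at requests, and the scores come out negative for the former and positive for the latter. A realistic approach would have to handle ambiguous names, code elements other than imports, negation, and sarcasm, none of which this sketch attempts.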