Machine Learning for Software Engineering

People

Tonella P.

Course director

Description

This course covers how to extract information and knowledge from data using unsupervised and supervised learning and natural language processing algorithms, and how to apply that knowledge to software engineering tasks. The course will cover the following topics:

  • Data pre-processing
  • Unsupervised learning
    • Hierarchical clustering, k-means, feature maps, graph-based clustering, density-based clustering, anomaly detection
  • Supervised learning
    • Classifiers (e.g., nearest neighbour, decision trees, naive Bayes, SVM), regression models, deep neural networks
  • Evaluation methods and metrics, statistical tests
  • Latent semantic indexing and latent Dirichlet allocation
  • Text embedding for text search
  • Part-of-speech tagging
  • Constituency and dependency parsing
  • Semantic role labelling
  • Text summarization
  • Sentiment analysis
  • Language models

Objectives

The main objective of this course is to equip students with unsupervised and supervised learning algorithms and natural language processing algorithms that can be used to address software engineering tasks such as defect prediction, bug triaging, code refactoring and modularization, code search, code completion, vulnerability detection, test case generation, documentation extraction, and architecture recovery. Students will learn:

  • How unsupervised and supervised machine learning algorithms work, including their assumptions and limitations
  • How to select the training corpus and how to evaluate the outcome of training 
  • How to embed text into numerical vectors to support search
  • How to process text to extract structural and semantic information
  • How to support text and code completion using language models
  • How to apply the theory to a set of use cases in Software Engineering

Teaching mode

In person

Learning methods

The course consists of theory lectures and lab lectures. Theory lectures cover each course topic in terms of the algorithms behind the presented techniques, their application scenarios, and empirical results. Lab lectures help students complete the two course projects.

Examination information

Optional written mid-term exam; final oral exam; optional homework; two mandatory projects.
