Knowledge Search & Extraction

People

Tonella P.

Course director

Description

The first part of the course deals with text search and extraction. The course will present techniques for representing natural language text as numerical embeddings, to support search over large corpora. Then, we will consider various algorithms for text analysis at the syntactic and semantic levels, including part-of-speech tagging, constituency and dependency parsing, semantic role labelling, text summarization, sentiment analysis and language models. We will also cover large language models and prompt engineering. In this part of the course, students will develop a search engine that can query a large Python code repository.
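As an illustration of embedding-based search, the sketch below represents each document as a sparse TF-IDF vector and ranks documents by cosine similarity to the query. It is only a minimal example under stated assumptions: the three-document corpus, the tokenizer, and the function names (`tokenize`, `embed`, `build_index`, `search`) are hypothetical and are not taken from the course material, which may use different embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(tokens, idf):
    """Map tokens to an L2-normalised sparse TF-IDF vector (a dict)."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    """Compute smoothed IDF weights and one vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return idf, [embed(toks, idf) for toks in tokenized]


def search(query, docs, idf, vectors, top_k=3):
    """Rank documents by cosine similarity to the query vector."""
    q = embed(tokenize(query), idf)
    scored = [
        (sum(w * vec.get(t, 0.0) for t, w in q.items()), i)
        for i, vec in enumerate(vectors)
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(docs[i], score) for score, i in scored[:top_k] if score > 0]


# Hypothetical toy corpus standing in for a code repository.
docs = [
    "def read_file(path): open a file and return its text",
    "def parse_json(text): decode a JSON string into a dict",
    "def send_request(url): perform an HTTP GET request",
]
idf, vectors = build_index(docs)
results = search("parse json string", docs, idf, vectors)
```

Because the vectors are normalised, the dot product in `search` is the cosine similarity. Only documents that share at least one term with the query receive a non-zero score.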

The second part of the course deals with code search and extraction. We will consider techniques to automatically fuzz code, and we will see applications of search-based algorithms to automated test case generation. Then, we will consider information extraction for testing in different domains, including grammar-based software, web applications, and GUI-based systems. In this part of the course, students will develop a search-based test case generator for Python.
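As an illustration of search-based test generation, the sketch below uses hill climbing to find inputs that cover a branch unlikely to be hit by random inputs. The fitness function is a hand-written branch distance. Everything here is an assumption made for the example: `under_test`, `branch_distance`, and `hill_climb` are hypothetical names, and a real generator would typically obtain branch distances by instrumenting the code rather than writing them by hand.

```python
import random


def under_test(x, y):
    """Hypothetical function whose 'hit' branch is rarely reached by chance."""
    if x == 2 * y + 7:
        return "hit"
    return "miss"


def branch_distance(x, y):
    """Fitness to minimise: 0 exactly when x == 2 * y + 7 holds."""
    return abs(x - (2 * y + 7))


def hill_climb(seed=0, max_steps=10_000):
    """Move to the best unit-step neighbour until the distance reaches 0."""
    rng = random.Random(seed)
    x, y = rng.randint(-1000, 1000), rng.randint(-1000, 1000)
    best = branch_distance(x, y)
    for _ in range(max_steps):
        if best == 0:
            break  # target branch covered
        neighbours = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        candidate = min(neighbours, key=lambda p: branch_distance(*p))
        distance = branch_distance(*candidate)
        if distance >= best:
            break  # no improving neighbour: local optimum
        (x, y), best = candidate, distance
    return x, y, best


x, y, distance = hill_climb()
```

For this particular condition the fitness landscape has no local optima, so the climb always reaches distance 0. Conditions with plateaus or local optima generally need restarts or a different search strategy.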

Objectives

This course deals with the search and extraction of knowledge from text and code, using natural language processing and search-based algorithms. 

Teaching mode

In person

Learning methods

Students will be involved in practical exercises and will experiment with the presented techniques by applying them to the course projects.

Examination information

Optional written mid-term exam; final oral exam; optional homework; two mandatory projects.
