More and more information is now available in unstructured or poorly structured form. Examples include text documents, web pages, videos, photos, music, and blogs. The goal of this course is to enable the student to understand the foundations of managing unstructured or poorly structured information.
The course aims to help students understand techniques for the indexing, retrieval, filtering, clustering, and presentation of textual and multimedia information held in digital archives, the web, and/or multimedia systems. In this respect the course complements the previous course on Data Management, which deals only with structured information.
The course consists of theoretical lectures and practical sessions. The practical sessions deal with the design, implementation, and evaluation of an information retrieval system for a small to medium-sized collection of documents.
Examination will consist of 3 theoretical tests and 1 project (no final exam). The tests will check the student's knowledge of the theoretical notions taught, while the project will test the student's ability to put them into practice by implementing a system to index and retrieve a collection of documents.
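As a rough illustration of what indexing and retrieving a collection of documents can involve, the following minimal sketch builds an inverted index and answers Boolean AND queries. It is not the project specification: the function names `build_index` and `search`, the whitespace tokenization, and the three sample documents are assumptions chosen for brevity.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive whitespace tokenization, no stemming or stop words.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query term (Boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    # index.get avoids creating empty entries for unseen terms.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical toy collection, for illustration only.
docs = {
    1: "information retrieval with inverted indexes",
    2: "managing structured information in databases",
    3: "text retrieval and text mining",
}
index = build_index(docs)
print(search(index, "information retrieval"))  # prints [1]
```

A real system would typically add better tokenization, term weighting and ranking, and an evaluation against relevance judgments.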
- Data Management
- Required: C. Zhai and S. Massung. Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining. ACM Books, 2016.
- Suggested: W.B. Croft, D. Metzler, and T. Strohman. Search Engines: Information Retrieval in Practice, Pearson, 2009.