Unlike other data mining courses, this course deals with datasets that, because of their large size, high update rate, and variety of content (all characteristics of Big Data), cannot be mined with standard techniques. The course therefore covers topics such as similarity measures for very large datasets, mining fast data streams, link analysis, clustering, and recommender systems.
The course covers the mining of very large datasets: analysing them to produce descriptive summaries of their content, test hypotheses, and extract valuable knowledge.
Teaching methods
The course consists of lectures, where the methods are analysed from a theoretical perspective, and a practical part, where theory is put into practice using statistical packages in Python. The aim of the course is to teach, mostly by example, how to perform practical analysis of large datasets and how to interpret, visualise, and diagnose results and potential problems. The course assumes a decent knowledge of Python.
Students are assessed during the course by means of 2 theoretical tests and 1-2 practical tests (the number to be decided with the students). The theoretical tests cover the material taught in the lectures, while the practical tests consist of the analysis of an individually assigned large dataset provided during the course. There will be no final exam.