Data Analytics

People

Crestani F.

Course director

Ríssola E. A.

Assistant

Description

The course deals with mining very large datasets: analysing them to produce descriptive summaries of their content, test hypotheses, and extract valuable knowledge. Unlike other data mining courses, this one deals with datasets that, because of their size, update speed, or variety of content, cannot be mined with standard techniques. Topics include similarity measures for very large datasets and data streams, link analysis, clustering, recommender systems, and MapReduce, among others. As part of the course we also learn how to use the R statistical programming language to perform analyses, interpret and visualise their results, and diagnose potential problems.


REFERENCES

  • Required: Jure Leskovec, Anand Rajaraman and Jeffrey David Ullman. Mining of Massive Datasets (2nd edition). Cambridge University Press, 2014.
  • Suggested: Paul Teetor. R Cookbook. O'Reilly, 2011.

Education