Data Design & Modeling
Data design and modeling provides the foundation for representing, storing and managing structured, semi-structured and unstructured data. Data can be persistent or volatile, processed in batches or in continuous streams. Students will learn how to select appropriate data management solutions to deal with scalability, availability, consistency, performance, and expressiveness requirements.
- big data dimensions: volume, velocity, variety, and veracity
- CRUD primitives (create, read, update, delete) implemented at scale
- ACID/BASE transactional properties of existing SQL/NoSQL data management technologies
- NoSQL data models and technologies
- sharding and replication strategies
- data analysis pipeline: Acquisition, Integration, Exploration, Mining, Analytics, Interpretation and Visualization
- data quality, provenance, wrangling, and cleansing to ensure data is worthy of trust
In addition to the introductory classes, students will experiment with big data technologies through hands-on use cases and practical work on cloud big data platforms.
The exam consists of a written session in which students answer theory questions and solve exercises on paper. The written exam accounts for 70% of the mark. Throughout the course, students carry out project work in groups, which counts for the remaining 30% of the mark.
- Martin J. Fowler, Pramodkumar J. Sadalage. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Addison-Wesley, 2012. Traditional foundational book on NoSQL practices, mainly from a software engineering perspective.
- Ted Hills. NoSQL and SQL Data Modeling: Bringing Together Data, Semantics, and Software. Technics Publications, 2016. Good overview of database diversity.
- Aaron Ploetz, Devram Kandhare, Sudarshan Kadambi, Xun (Brian) Wu. Seven NoSQL Databases in a Week. 2018. Basic introduction to the most popular NoSQL solutions, from a simple, empirical perspective.
- Andreas Meier and Michael Kaufmann. SQL & NoSQL Databases: Models, Languages, Consistency Options and Architectures for Big Data Management. Springer, mid-2019. Book showing how strategic it is to get a good positioning in the field with an appropriate book on NoSQL.
- Martin Kleppmann. Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O’Reilly, 2017. Best-selling book on modern, large-scale data processing systems. It takes a technology- and system-centric perspective, with simple coverage of data models and their impact on system performance.