Data design and modeling provides the foundation for representing, storing and managing structured, semi-structured and unstructured data. Data can be persistent or volatile, processed in batches or in continuous streams. Students will learn:
- how to select appropriate data management solutions to deal with scalability, availability, consistency, performance and expressiveness requirements
- big data dimensions: volume, velocity, variety and veracity
- CRUD primitives (create, read, update, delete) implemented at scale
- ACID/BASE transactional properties of existing SQL/NoSQL data management technologies
- NoSQL data models and technologies
- sharding and replication strategies
- data analysis pipeline: acquisition, integration, exploration, mining, analytics, interpretation and visualization
- data quality, provenance, wrangling and cleansing to ensure data is worthy of trust
Students will experiment with big data technologies through hands-on use cases and practical work on cloud big data platforms.
- Pramod J. Sadalage, Martin Fowler. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Addison-Wesley, 2012. Traditional foundational book on NoSQL practices, written mainly from a software engineering perspective.
- Ted Hills. NoSQL and SQL Data Modeling: Bringing Together Data, Semantics, and Software. Technics Publications, 2016. Good, recent overview of database diversity. Despite the title, it does not really focus on modeling, and it does not provide a holistic view of the field.
- Aaron Ploetz, Devram Kandhare, Sudarshan Kadambi, Xun (Brian) Wu. Seven NoSQL Databases in a Week. 2018. Basic introductory reading on the most popular NoSQL solutions. It takes a simple, empirical view and completely misses the big picture of design and decision making.
- Andreas Meier and Michael Kaufmann. SQL & NoSQL Databases: Models, Languages, Consistency Options and Architectures for Big Data Management. Springer, mid-2019. A yet-to-be-published book, showing how strategic it is to secure a good position in the field with an appropriate book on NoSQL. It possibly takes a perspective similar to ours.
- Martin Kleppmann. Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O’Reilly, 2017. Best-selling book on modern, large-scale data processing systems. It takes a technology- and system-centric perspective that only superficially touches on data models and their impact on system performance.