INSTINCT - Improving Database Interactions in NoSQL Applications
NoSQL technologies have co-existed with relational databases since the first Database Management Systems appeared in the 1960s. The term “NoSQL” has only recently gained popularity, as modern Big Data and Web 2.0 technologies triggered the need for alternative database solutions. In particular, these solutions offer greater scalability, satisfy a widespread preference for free and open source software, support special query operations that relational databases handle poorly, and overcome the restrictiveness of strict schemas. Many database systems are now considered NoSQL databases, which has led to a retroactive reinterpretation of the term as “Not Only SQL”. Examples of NoSQL databases are document stores (e.g., MongoDB, CouchDB), graph databases (e.g., Neo4j), and column family databases (e.g., HBase, Cassandra). According to the DB-Engines ranking, at the time of writing our proposal, half of the Top-10 most popular database management systems are NoSQL technologies, and their popularity is increasing. Many of the world’s largest tech companies, such as LinkedIn, Amazon, and eBay, are known to use these technologies. The same holds for big metropolises, Swiss banks, and insurance companies. Despite its clear benefits, NoSQL poses new and unique challenges for both developers and researchers. For example, a prominent feature of such databases is that they are “schema-less”, offering greater flexibility to handle data without the limitations of a strict data model. This freedom often backfires when it comes to maintaining an evolving data-intensive application. An advantage of relational databases is that they are an established technology, so many tools are readily available for maintenance tasks. This is not (yet) the case for NoSQL.
The main research goal of our project is to fill this gap by examining how developers interact with NoSQL databases from the application code and by developing techniques and tools to help developers improve these interactions. Database interactions play a crucial role in data-centric applications, as they determine how the system communicates with its database(s). When the application sends a query to its database, it is the database’s responsibility to execute the query as efficiently as possible, and the developer has very limited control over this. However, if the query is not well-formed or not handled correctly in the program code, it generates extra load on the database side, which degrades the performance of the application. In the worst case, it can lead to errors, bugs, or even security vulnerabilities such as code injection. This is exactly what we target with our research. We address our goal from multiple directions: (i) we develop a method to identify the interaction points through which an application communicates with its underlying NoSQL database, and we extract/recover the dynamically generated NoSQL queries at these locations; (ii) we analyze the extracted/recovered queries to infer (non-unique) database schemas; (iii) we identify frequent/critical antipatterns that can lead to potential vulnerabilities, bugs, or performance issues; and (iv) we develop analytics solutions (e.g., visualization techniques) to ease maintenance and evolution tasks for NoSQL database applications. We aim to achieve scalable, fully automatic analysis of an application that interacts intensively with a NoSQL database, and to provide developers with different ways to improve the code of the application and easily perform related maintenance tasks.
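To make direction (ii) more concrete, the following is a minimal sketch of how a schema might be inferred from recovered queries. It is an illustration only and not the project's actual method: the function name infer_schema, the (collection, document) input format, and the sample data are all assumptions introduced here. The sketch records, for each collection, which fields were observed and which value types they carried.

```python
from collections import defaultdict


def infer_schema(queries):
    """Map each collection to its observed fields and their value types.

    `queries` is a hypothetical list of (collection, document) pairs, standing
    in for insert/update documents recovered from application code.
    """
    schema = defaultdict(lambda: defaultdict(set))
    for collection, document in queries:
        for field, value in document.items():
            schema[collection][field].add(type(value).__name__)
    # Convert to plain dicts with sorted type lists for stable output.
    return {
        coll: {field: sorted(types) for field, types in fields.items()}
        for coll, fields in schema.items()
    }


# Hypothetical recovered queries (invented sample data).
recovered = [
    ("users", {"name": "Ada", "age": 36}),
    ("users", {"name": "Bob", "age": "unknown", "email": "bob@example.org"}),
    ("orders", {"user": "Ada", "total": 19.5}),
]

schema = infer_schema(recovered)
print(schema["users"])
```

In this sample, the field "age" is seen with both an integer and a string value, and "email" appears in only one document. Ambiguities of this kind are one plausible reason why a schema inferred from application code need not be unique.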