LoadOpt - Workload and Optimization for Multicore Systems
Multicore architectures have become the standard both for general-purpose PCs and for high-end servers. Such platforms provide an environment that is ideally suited for consolidating multiple applications within the same machine to fully take advantage of the available parallelism. As multicores become the basic building blocks of data centers and Cloud computing platforms, the problem of effective workload management becomes very challenging because the interference between collocated workloads may significantly affect their performance. Moreover, workloads are often observed to be executed with different processing rates for different system resources (e.g., CPU and disk) depending on the overall system load.
In this project we will first explore novel methods to characterize the dynamic behavior of applications running on modern multicores. We will develop practical methods to profile the spatial and temporal characteristics of applications. We will vertically profile load-dependent application behavior across the entire system stack, also paying attention to overheads due to virtualization. For the temporal characterization, we will devise sampling approaches that automatically adjust the sampling frequency to capture the desired statistical properties and stochastic processes. We will leverage and refine existing dimension reduction and clustering methods to identify predominant load-dependent resources and representative workload classes. Our goal is to provide the statistics necessary to capture the temporal and spatial workload characteristics as well as to build suitable performance models.
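As an illustration of what "automatically adjusting the sampling frequency" could look like, the following is a minimal sketch and not the project's actual method. It assumes a simple rule: sample faster when recent measurements vary a lot (high coefficient of variation) and slower when they are stable. The function name `next_interval`, the thresholds `cv_high` and `cv_low`, and the interval bounds are all hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def next_interval(samples, interval, lo=0.01, hi=1.0, cv_high=0.5, cv_low=0.1):
    """Return the next sampling interval in seconds (hypothetical rule).

    samples  -- recent measurements of one metric (e.g. CPU utilization)
    interval -- the sampling interval currently in use
    lo, hi   -- assumed lower and upper bounds for the interval
    """
    if len(samples) < 2:
        return interval  # not enough data to judge variability

    m = mean(samples)
    sd = pstdev(samples)
    if m == 0:
        # All-zero samples are perfectly stable; otherwise treat as highly variable.
        cv = 0.0 if sd == 0 else float("inf")
    else:
        cv = sd / abs(m)  # coefficient of variation

    if cv > cv_high:
        interval /= 2  # bursty behavior: sample twice as often
    elif cv < cv_low:
        interval *= 2  # stable behavior: sample half as often

    # Keep the interval within the assumed bounds.
    return min(hi, max(lo, interval))
```

A profiler loop would call `next_interval` after each window of measurements and sleep for the returned duration before taking the next sample. Real approaches would likely target specific statistics (for example autocorrelation) rather than this single variability measure.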
Second, we will investigate novel performance models of load-dependent systems executing multiple classes of workloads. Based on our performance analysis, we will explore new load optimization algorithms that aim to collocate application instances optimally, taking various performance objectives and system capacities into account. Our aim is to provide a set of optimal rules for collocating application instances in a wide range of system scenarios, applying rigorous mathematical optimization. To manage time-varying workloads, we will extend the developed workload consolidation algorithm into a dynamic framework, focusing on admission control and migration of workloads. Furthermore, we will explore virtual resource provisioning to optimize the system for a given workload at runtime.
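To make the collocation problem concrete, here is a minimal sketch that treats it as one-dimensional bin packing and applies the classic first-fit-decreasing heuristic. This is a stand-in for illustration only: it is a heuristic rather than the optimal rules the project aims for, it considers a single resource with a fixed demand per instance, and it ignores interference and load-dependent processing rates. The function name, the `demands` mapping, and the uniform `capacity` are assumptions made for this example.

```python
def first_fit_decreasing(demands, capacity):
    """Assign application instances to machines of equal capacity.

    demands  -- dict mapping instance name to its resource demand (assumed fixed)
    capacity -- resource capacity of every machine (assumed identical)
    Returns a list of machines, each a list of instance names.
    """
    machines = []  # each entry: {"free": remaining capacity, "apps": [names]}

    # Place the most demanding instances first.
    for name, demand in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if demand > capacity:
            raise ValueError(f"{name} does not fit on any machine")
        for machine in machines:
            if machine["free"] >= demand:
                machine["apps"].append(name)
                machine["free"] -= demand
                break
        else:
            # No existing machine has room: open a new one.
            machines.append({"free": capacity - demand, "apps": [name]})

    return [machine["apps"] for machine in machines]


if __name__ == "__main__":
    placement = first_fit_decreasing({"a": 6, "b": 5, "c": 4, "d": 3, "e": 2}, 10)
    print(placement)  # [['a', 'c'], ['b', 'd', 'e']]
```

A dynamic variant, as described above, would additionally decide whether to admit a newly arriving instance and when to migrate instances between machines as demands change.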
The results of this project will improve capacity planning and performance management for various kinds of modern multicore systems. They will help avoid expensive over-provisioning of hardware resources and achieve better utilization of the available system resources, which in turn helps reduce energy consumption.
- Rosà A., Chen L. Y., Binder W. (2017) Actor Profiling in Virtual Execution Environments, ACM SIGPLAN Notices 52(3):36-46
- Rosà A., Chen L. Y., Binder W. (2017) Failure Analysis and Prediction for Big-Data Systems, IEEE Transactions on Services Computing 10(6):984-998
- Rosà A., Chen L. Y., Binder W. (2016) Actor Profiling in Virtual Execution Environments. GPCE. Amsterdam, The Netherlands. 2016
- Rosà A., Zheng Y., Sun H., Javed O., Binder W. (2016) Adaptable Runtime Monitoring for the Java Virtual Machine. ISoLA. Corfu, Greece. 2016
- Rosà A., Chen L. Y., Binder W. (2016) AkkaProf: a Profiler for Akka Actors in Parallel and Distributed Applications. APLAS. Hanoi, Vietnam. 2016
- Rosà A., Chen L. Y., Binder W. (2016) An Endpoint Communication Profiling Tool for Distributed Computing Frameworks. ICDCS. Nara, Japan. 2016
- Zheng Y., Rosà A., Salucci L., Li Y., Sun H., Javed O., Bulej L., Chen L. Y., Qi Z., Binder W. (2016) AutoBench: Finding Workloads That You Need Using Pluggable Hybrid Analyses. SANER. Osaka, Japan. 2016
- Rosà A., Chen L. Y., Binder W. (2016) Efficient Profiling of Actor-based Applications in Parallel and Distributed Systems. ICOOOLPS. Rome, Italy. 2016
- Javed O., Zheng Y., Rosà A., Sun H., Binder W. (2016) Extended Code Coverage for AspectJ-based Runtime Verification Tools. RV. Madrid, Spain. 2016
- Rosà A., Chen L. Y., Binder W. (2016) Profiling Actor Utilization and Communication in Akka. Erlang. Nara, Japan. 2016
- Rosà A., Chen L. Y., Binder W. (2015) Catching Failures of Failures at Big-Data Clusters: a Two-Level Neural Network Approach. IWQoS. Portland, OR, USA. 2015
- Rosà A., Chen L. Y., Binder W. (2015) Demystifying Casualties of Evictions in Big Data Priority Scheduling, SIGMETRICS Perform. Eval. Rev. 42(4):12-21
- Rosà A., Chen L. Y., Binder W. (2015) Predicting and Mitigating Jobs Failures in Big Data Clusters. CCGrid. Shenzhen, China. 2015
- Rosà A., Chen L. Y., Binder W. (2015) Understanding the Dark Side of Big Data Clusters: an Analysis beyond Failures. DSN. Rio de Janeiro, Brazil. 2015
- Rosà A., Chen L. Y., Binder W. (2015) Understanding Unsuccessful Executions in Big-Data Systems. CCGrid. Shenzhen, China. 2015
- Rosà A., Binder W., Chen L. Y., Gribaudo M., Serazzi G. (2014) ParSim: a Tool for Workload Modeling and Reproduction of Parallel Applications. MASCOTS. Paris, France. 2014
- Çavdar D., Rosà A., Chen L. Y., Binder W. (2014) Quantifying the Brown Side of Priority Schedulers: Lessons from Big Clusters. Greenmetrics. Austin, TX, USA. 2014
- Çavdar D., Rosà A., Chen L. Y., Binder W., Alagöz F. (2014) Quantifying the Brown Side of the Priority Scheduler: Lessons from Big Clusters, SIGMETRICS Perform. Eval. Rev. 42(3):76-81
- Rosà A., Chen L. Y., Binder W. (2014) When Things Turn Sour at Big Data Clusters: Understanding Unsuccessful Executions