FARM: Comprehensive Data Center Network Monitoring and Management
Additional information
Authors
Graf J. P. R.,
Chuprikov P.,
Eugster P. T.
Type
Conference proceedings contribution
Year
2024
Language
English
Abstract
Modern data centers face growing workloads, putting increasing pressure on the network monitoring solutions needed to ensure correct and efficient operation. Meanwhile, advances in network programmability make it straightforward to collect ever more monitoring data from switches, exacerbating the bottlenecks of collection-centric approaches. This limits scalability and responsiveness, especially when several monitoring tasks are deployed side by side, as is common in network management. We present FARM, a novel and comprehensive selection-centric solution for network monitoring and management (M&M) that significantly simplifies the development and deployment of network M&M tasks while remaining effective and scalable. FARM's main novelty lies in its comprehensive design. Instead of focusing solely on individual parts of network monitoring, FARM takes a global perspective on the problem and aligns all of its components accordingly: a strongly decentralized software architecture, a purpose-built programming model, and an integrated performance optimization framework. In short, FARM performs monitoring (re)actions locally on switches to the extent possible, uses centralized components only if and when needed, and globally optimizes placement, taking into account both the placement constraints expressed intrinsically through its programming model and the commonalities among tasks. Deployed in a production data center, FARM shows significant gains in responsiveness (up to 3427× faster than recent generic approaches and 4× faster than highly specialized solutions), as well as savings in network bandwidth (10000×) and computational effort. Placement optimization shows excellent scalability, up to 10200 seeds across 1040 switches.
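The abstract contrasts collection-centric monitoring, in which switches ship raw measurements to a central collector, with FARM's selection-centric approach of deciding locally on the switch and involving central components only when needed. The toy Python sketch below illustrates only that general idea and how it reduces the number of messages leaving a switch. The class `LocalMonitor`, the functions `collection_centric` and `selection_centric`, the byte threshold, and the sample flows are all hypothetical; they are not FARM's programming model or API, which this record does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class LocalMonitor:
    """Hypothetical per-switch task (not FARM's API): evaluates a
    threshold locally and records a report only when it is crossed."""
    threshold: int
    reports: list = field(default_factory=list)

    def observe(self, flow: str, bytes_seen: int) -> None:
        # The decision is made on the switch; only selected events
        # would be forwarded to a central component.
        if bytes_seen > self.threshold:
            self.reports.append((flow, bytes_seen))


def collection_centric(samples):
    # Every sample is shipped to the collector for central analysis.
    return list(samples)


def selection_centric(samples, threshold):
    # Samples are filtered locally; only threshold violations leave the switch.
    monitor = LocalMonitor(threshold)
    for flow, bytes_seen in samples:
        monitor.observe(flow, bytes_seen)
    return monitor.reports


# Hypothetical per-flow byte counts observed on one switch.
samples = [("f1", 100), ("f2", 5000), ("f3", 120), ("f4", 9000), ("f5", 80)]
print(len(collection_centric(samples)))                 # 5 messages sent
print(len(selection_centric(samples, threshold=1000)))  # 2 messages sent
```

The saving in this toy grows with the ratio of uninteresting to interesting samples; the bandwidth and responsiveness figures reported in the abstract come from the authors' production deployment, not from anything this sketch models.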
Keywords
Data Center Networking, Domain-specific Language, Network Monitoring and Management
Proceedings title
2024 IEEE 44th International Conference on Distributed Computing Systems
Publisher
IEEE
Conference name
2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS)
Conference location
Jersey City, NJ, USA
Conference dates
July 23-26, 2024
Pages (or article number)
520-530