From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Tue, 22 Sep 2015 20:42:48 +0000 (UTC)
From: Mathieu Desnoyers
Message-ID: <378118656.448.1442954568188.JavaMail.zimbra@efficios.com>
In-Reply-To:
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Subject: Re: [diamon-discuss] Interested in integration with cluster management software
List-Id: DiaMon diagnostic and monitoring workgroup general discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Connor Doyle
Cc: diamon-discuss@lists.linuxfoundation.org

----- On Sep 21, 2015, at 12:09 PM, Connor Doyle connor.p.d@gmail.com wrote:

> Hello,
>
> I heard about this workgroup through the LF newsletter this morning.
> At Mesosphere we contribute to Apache Mesos, a popular open source
> cluster resource manager and related software. Much of our work would
> benefit from more standardization (even de-facto standardization)
> around application level tracing and monitoring. For example, Mesos
> recently added support for modular oversubscription policies for slack
> estimation and QoS control. We've started discussions about wiring up
> something bespoke for use in Mesos, but a standard format for expressing
> SLI/SLO could be better.
>
> Anyway, just want to express interest in the outcomes, volunteer to
> discuss and help where possible, and say "kudos" for bootstrapping
> this workgroup.

Hi Connor,

Thanks for your interest in the DiaMon Workgroup! Indeed, integrating
tracing/monitoring solutions into a CI resource manager feedback loop
would be an interesting area to tackle. We could then do fine-grained
resource monitoring based on a wide set of metrics, e.g.:

- I/O throughput and max latency,
- Network throughput and max latency,
- CPU utilization,
- Preemption latency,
- Memory usage.

One aspect that pure sampling approaches (profiling) usually don't
handle well is latency-related metrics.
Doing aggregation on tracing data can be a good way to achieve this.
You could then express constraints on the resources in different ways.
Instead of just reserving "capacity", you could also reserve "latency
guarantees".

Thoughts?

Thanks,

Mathieu

>
> Best,
> --
> Connor Doyle
> _______________________________________________
> diamon-discuss mailing list
> diamon-discuss@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/diamon-discuss

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com