Date: Mon, 8 Feb 2010 16:25:21 -0200
From: Luiz Capitulino
Subject: Re: [Qemu-devel] Re: Two QMP events issues
Message-ID: <20100208162521.788f9c02@doriath>
In-Reply-To: <4B702A21.1070808@codemonkey.ws>
References: <20100208114145.4bd64349@doriath> <20100208141218.GG17328@redhat.com> <4B702470.5080401@codemonkey.ws> <20100208145653.GA25256@redhat.com> <4B702A21.1070808@codemonkey.ws>
List-Id: qemu-devel.nongnu.org
To: Anthony Liguori
Cc: qemu-devel@nongnu.org, armbru@redhat.com

On Mon, 08 Feb 2010 09:13:37 -0600
Anthony Liguori wrote:

> On 02/08/2010 08:56 AM, Daniel P. Berrange wrote:
> > On Mon, Feb 08, 2010 at 08:49:20AM -0600, Anthony Liguori wrote:
> >
> >> On 02/08/2010 08:12 AM, Daniel P. Berrange wrote:
> >>
> >>> For further background, the key end goal here is that in a QMP client, upon
> >>> receipt of the 'RESET' event, we need to reliably & immediately determine
> >>> why it occurred. eg, triggered by watchdog, or by guest OS request. There
> >>> are actually 3 possible sequences
> >>>
> >>> - WATCHDOG + action=reset, followed by RESET.
> >>>   Assuming no intervening event can occur, the client can merely
> >>>   record 'WATCHDOG' and interpret it when it gets the immediately
> >>>   following 'RESET' event
> >>>
> >>> - RESET, followed by WATCHDOG + action=reset. The client doesn't know
> >>>   the reason for the RESET and can't wait arbitrarily for WATCHDOG since
> >>>   there might never be one arriving.
> >>>
> >>> - RESET + source=watchdog. Client directly sees the reason
> >>>
> >>> The second scenario is the one I'd like us to avoid at all costs, since it
> >>> will require the client to introduce arbitrary delays in processing events
> >>> to determine cause. The first is slightly inconvenient, but doable if we
> >>> can assume no intervening events will occur, between WATCHDOG and the
> >>> RESET events. The last is obviously simplest for the clients.
> >>>
> >>>
> >> I really prefer the third option but I'm a little concerned that we're
> >> throwing events around somewhat haphazardly.
> >>
> >> So let me ask, why does a client need to determine when a guest reset
> >> and why it reset?
> >>
> > If a guest OS is repeatedly hanging/crashing resulting in the watchdog
> > device firing, management software for the host really wants to know about
> > that (so that appropriate alerts/action can be taken) and thus needs to
> > be able to distinguish this from a "normal" guest OS initiated reboot.
> >
>
> I think that's an argument for having the watchdog events independent of
> the reset events.
>
> The watchdog condition happening is not directly related to the action
> the watchdog takes. The watchdog event really belongs in a class of events
> that are closely associated with a particular device emulation.
>
> In fact, I think what we're really missing in events today is a notion
> of a context. A RESET event is really a CPU event. A watchdog
> expiration event is a watchdog event. A connect event is a VNC event
> (Spice and chardevs will also generate connect events).

 This could be done by adding a 'context' member to every event; an
event would then be identified by the pair event_name:context.

 This way we can have the same event_name for events in different
contexts. For example:

{ 'event': DISCONNECT, 'context': 'spice', [...] }

{ 'event': DISCONNECT, 'context': 'vnc', [...] }

 Note that today we have VNC_DISCONNECT and will probably have
SPICE_DISCONNECT too.
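
 To make the event_name:context idea concrete, here is a rough sketch of
how a client might dispatch on that pair. It is only an illustration: the
'context' member is the proposal above and not something QMP emits today,
and the names on(), dispatch() and the handler functions are hypothetical,
as is the use of a 'data' member for the event payload.

```python
import json

# Handlers are keyed by the (event_name, context) pair, so the same
# event name can be reused across contexts. All names are hypothetical.
handlers = {}


def on(event, context):
    """Register a handler for one (event, context) pair."""
    def register(func):
        handlers[(event, context)] = func
        return func
    return register


@on("DISCONNECT", "vnc")
def vnc_disconnect(data):
    return "vnc client gone"


@on("DISCONNECT", "spice")
def spice_disconnect(data):
    return "spice client gone"


def dispatch(line):
    """Decode one JSON event and route it by (event, context).

    Returns the handler's result, or None when no handler is registered
    for that pair.
    """
    msg = json.loads(line)
    key = (msg["event"], msg.get("context"))
    handler = handlers.get(key)
    if handler is None:
        return None
    # 'data' is assumed to carry the event payload; default to empty.
    return handler(msg.get("data", {}))
```

 With this layout a client that only cares about VNC registers for
("DISCONNECT", "vnc") and never sees the Spice variant, without needing a
separate VNC_DISCONNECT / SPICE_DISCONNECT name per subsystem.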