From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1NeYIj-0003o3-7b
	for qemu-devel@nongnu.org; Mon, 08 Feb 2010 13:25:37 -0500
Received: from [199.232.76.173] (port=45614 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1NeYIi-0003nt-Oq
	for qemu-devel@nongnu.org; Mon, 08 Feb 2010 13:25:36 -0500
Received: from Debian-exim by monty-python.gnu.org with spam-scanned (Exim
	4.60) (envelope-from <lcapitulino@redhat.com>) id 1NeYIf-00062B-3u
	for qemu-devel@nongnu.org; Mon, 08 Feb 2010 13:25:35 -0500
Received: from mx1.redhat.com ([209.132.183.28]:62976)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <lcapitulino@redhat.com>) id 1NeYId-00061W-Rl
	for qemu-devel@nongnu.org; Mon, 08 Feb 2010 13:25:32 -0500
Date: Mon, 8 Feb 2010 16:25:21 -0200
From: Luiz Capitulino <lcapitulino@redhat.com>
Subject: Re: [Qemu-devel] Re: Two QMP events issues
Message-ID: <20100208162521.788f9c02@doriath>
In-Reply-To: <4B702A21.1070808@codemonkey.ws>
References: <20100208114145.4bd64349@doriath>
	<20100208141218.GG17328@redhat.com>
	<4B702470.5080401@codemonkey.ws>
	<20100208145653.GA25256@redhat.com>
	<4B702A21.1070808@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: qemu-devel@nongnu.org, armbru@redhat.com

On Mon, 08 Feb 2010 09:13:37 -0600
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 02/08/2010 08:56 AM, Daniel P. Berrange wrote:
> > On Mon, Feb 08, 2010 at 08:49:20AM -0600, Anthony Liguori wrote:
> >    
> >> On 02/08/2010 08:12 AM, Daniel P. Berrange wrote:
> >>      
> >>> For further background, the key end goal here is that in a QMP client, upon
> >>> receipt of the 'RESET' event, we need to reliably & immediately determine
> >>> why it occurred, e.g. triggered by watchdog, or by guest OS request. There
> >>> are actually 3 possible sequences
> >>>
> >>>   - WATCHDOG + action=reset, followed by RESET.  Assuming no intervening
> >>>     event can occur, the client can merely record 'WATCHDOG' and interpret
> >>>     it when it gets the immediately following 'RESET' event
> >>>
> >>>   - RESET, followed by WATCHDOG + action=reset. The client doesn't know
> >>>     the reason for the RESET and can't wait arbitrarily for WATCHDOG since
> >>>     there might never be one arriving.
> >>>
> >>>   - RESET + source=watchdog. Client directly sees the reason
> >>>
> >>> The second scenario is the one I'd like us to avoid at all costs, since it
> >>> will require the client to introduce arbitrary delays in processing events
> >>> to determine cause. The first is slightly inconvenient, but doable if we
> >>> can assume no intervening events will occur, between WATCHDOG and the
> >>> RESET events. The last is obviously simplest for the clients.
> >>>
> >>>        
> >> I really prefer the third option but I'm a little concerned that we're
> >> throwing events around somewhat haphazardly.
> >>
> >> So let me ask, why does a client need to determine when a guest reset
> >> and why it reset?
> >>      
> > If a guest OS is repeatedly hanging/crashing resulting in the watchdog
> > device firing, management software for the host really wants to know about
> > that (so that appropriate alerts/action can be taken) and thus needs to
> > be able to distinguish this from a "normal"  guest OS initiated reboot.
> >    
> 
> I think that's an argument for having the watchdog events independent of 
> the reset events.
> 
> The watchdog condition happening is not directly related to the action 
> the watchdog takes.  The watchdog event really belongs in a class of events 
> that are closely associated with a particular device emulation.
> 
> In fact, I think what we're really missing in events today is a notion 
> of a context.  A RESET event is really a CPU event.  A watchdog 
> expiration event is a watchdog event.  A connect event is a VNC event 
> (Spice and chardevs will also generate connect events).

 This could be done by adding a 'context' member to every event, so that
an event would be identified by the pair event_name:context.

 This way we can have the same event_name for events in different
contexts. For example:

{ 'event': 'DISCONNECT', 'context': 'spice', [...] }

{ 'event': 'DISCONNECT', 'context': 'vnc', [...] }

 Note that today we have VNC_DISCONNECT and will probably have
SPICE_DISCONNECT too.
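
 To make the client side concrete, here is a rough sketch of how a
client might dispatch on the event_name:context pair. It assumes the
proposed 'context' member exists on the wire, which is not the case
today; the HANDLERS table, the on() decorator and dispatch() are
made-up names for illustration only, not part of any existing QMP
client.

```python
import json

# Hypothetical handler table keyed by the (event_name, context) pair.
# The 'context' member is only a proposal in this thread.
HANDLERS = {}


def on(event, context):
    """Register a handler for one (event, context) pair."""
    def register(func):
        HANDLERS[(event, context)] = func
        return func
    return register


@on("DISCONNECT", "vnc")
def vnc_disconnect(msg):
    return "vnc client disconnected"


@on("DISCONNECT", "spice")
def spice_disconnect(msg):
    return "spice client disconnected"


def dispatch(line):
    """Decode one JSON event and call the matching handler, if any."""
    msg = json.loads(line)
    key = (msg.get("event"), msg.get("context"))
    handler = HANDLERS.get(key)
    if handler is None:
        # Unknown pair: ignore it rather than fail.
        return None
    return handler(msg)
```

 With this, the same event name is routed differently depending on the
context, and a client that does not know a given pair simply ignores it.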