From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1Nevhl-0005yl-4U
	for qemu-devel@nongnu.org; Tue, 09 Feb 2010 14:25:01 -0500
Received: from [199.232.76.173] (port=46903 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Nevhk-0005yX-Ph
	for qemu-devel@nongnu.org; Tue, 09 Feb 2010 14:25:00 -0500
Received: from Debian-exim by monty-python.gnu.org with spam-scanned (Exim
	4.60) (envelope-from <jamie@shareable.org>) id 1Nevhi-000219-OU
	for qemu-devel@nongnu.org; Tue, 09 Feb 2010 14:25:00 -0500
Received: from mail2.shareable.org ([80.68.89.115]:37556)
	by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32)
	(Exim 4.60) (envelope-from <jamie@shareable.org>) id 1Nevhg-0001z7-Tr
	for qemu-devel@nongnu.org; Tue, 09 Feb 2010 14:24:57 -0500
Date: Tue, 9 Feb 2010 19:24:35 +0000
From: Jamie Lokier <jamie@shareable.org>
Subject: Re: [Qemu-devel] Re: Two QMP events issues
Message-ID: <20100209192435.GB946@shareable.org>
References: <20100208114145.4bd64349@doriath>
	<20100208141218.GG17328@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20100208141218.GG17328@redhat.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: armbru@redhat.com, aliguori@us.ibm.com, qemu-devel@nongnu.org, Luiz Capitulino <lcapitulino@redhat.com>

Daniel P. Berrange wrote:
> For further background, the key end goal here is that in a QMP client, upon
> receipt of the 'RESET' event, we need to reliably & immediately determine
> why it occurred, e.g. triggered by watchdog, or by guest OS request. There
> are actually 3 possible sequences:
> 
>  - WATCHDOG + action=reset, followed by RESET. Assuming no intervening
>    event can occur, the client can merely record 'WATCHDOG' and interpret
>    it when it gets the immediately following 'RESET' event.

WATCHDOG is useful in its own right.  For example, a manager may
decide itself what action to take - such as resetting on the first
three watchdog triggers and then stopping the vm without reset - so
there wouldn't be any other event from qemu about the watchdog.

Because WATCHDOG is useful in some circumstances, I think for
consistency it should always be emitted.
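To make the manager-side policy concrete, here is a minimal Python sketch. It assumes qemu only reports WATCHDOG (for instance with its watchdog action set to "none") and the manager then issues the QMP "system_reset" or "stop" command itself. The class name and the three-reset limit are illustrative, not an existing API.

```python
# Hypothetical manager-side policy: qemu only reports WATCHDOG, and the
# manager decides what to do.  Returns the QMP command the manager would send.
class WatchdogPolicy:
    def __init__(self, max_resets=3):
        self.max_resets = max_resets  # resets allowed before stopping the VM
        self.triggers = 0             # WATCHDOG events seen so far

    def on_watchdog(self):
        self.triggers += 1
        if self.triggers <= self.max_resets:
            return "system_reset"     # reset on the first few triggers
        return "stop"                 # afterwards, stop the VM without reset


policy = WatchdogPolicy()
actions = [policy.on_watchdog() for _ in range(4)]
print(actions)  # ['system_reset', 'system_reset', 'system_reset', 'stop']
```

Without a WATCHDOG event from qemu, a manager like this would have nothing to count.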

>  - RESET, followed by WATCHDOG + action=reset. The client doesn't know
>    the reason for the RESET and can't wait arbitrarily for WATCHDOG since
>    there might never be one arriving.

Bad.  Avoid :-)

Actually, if there is a problem maintaining event order, this order would
be acceptable as long as RESET includes the reason - then the listener
knows to wait for the WATCHDOG event.
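A rough Python sketch of such a listener follows. It assumes a hypothetical "reason" field on RESET (the proposal under discussion, not an existing field) and matches RESET reason=watchdog with WATCHDOG action=reset in whichever order they arrive. Only a watchdog-caused RESET is ever left waiting. Every other RESET is classified at once, so no arbitrary delay is needed.

```python
# Sketch: pair RESET reason=watchdog with WATCHDOG action=reset regardless of
# which arrives first.  "reason" on RESET is hypothetical.
def classify_resets(events):
    resolved = []        # causes of RESETs that are fully explained
    resets_waiting = 0   # RESET reason=watchdog seen, its WATCHDOG not yet
    watchdogs_spare = 0  # WATCHDOG action=reset seen, its RESET not yet
    for ev in events:
        name = ev["event"]
        data = ev.get("data", {})
        if name == "RESET":
            reason = data.get("reason", "unknown")
            if reason != "watchdog":
                resolved.append(reason)      # cause known immediately
            elif watchdogs_spare:
                watchdogs_spare -= 1
                resolved.append("watchdog")
            else:
                resets_waiting += 1          # the reason says: wait for WATCHDOG
        elif name == "WATCHDOG" and data.get("action") == "reset":
            if resets_waiting:
                resets_waiting -= 1
                resolved.append("watchdog")
            else:
                watchdogs_spare += 1
    return resolved, resets_waiting


events = [
    {"event": "RESET", "data": {"reason": "guest"}},
    {"event": "RESET", "data": {"reason": "watchdog"}},
    {"event": "WATCHDOG", "data": {"action": "reset"}},
]
print(classify_resets(events))  # (['guest', 'watchdog'], 0)
```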

>  - RESET + source=watchdog. Client directly sees the reason

I think this is good, but it should be preceded by the WATCHDOG event as well.

So:

    WATCHDOG action=reset
    RESET reason=watchdog
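On the wire, that pair might look like the two JSON lines below. This is a Python sketch in which the "reason" field on RESET is the proposal, not an existing field, and timestamps are omitted. A client can count every watchdog trigger and read the reset cause directly, with no state carried between events.

```python
import json

# Hypothetical wire form of the proposed pair (timestamps omitted).
stream = [
    '{"event": "WATCHDOG", "data": {"action": "reset"}}',
    '{"event": "RESET", "data": {"reason": "watchdog"}}',
]

watchdog_triggers = 0
reset_causes = []
for line in stream:
    ev = json.loads(line)
    if ev["event"] == "WATCHDOG":
        watchdog_triggers += 1  # counted for every trigger, whatever the action
    elif ev["event"] == "RESET":
        # The cause is read from the event itself.
        reset_causes.append(ev["data"].get("reason", "unknown"))

print(watchdog_triggers, reset_causes)  # 1 ['watchdog']
```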

By the way, if a listener attaches to qemu in the middle of this
operation, is it possible for it to receive one event but not the
other due to timing?

It might make sense to add the concept of a "group of events" if this
could be a problem.
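One way a "group of events" could work is sketched below in Python. It is purely hypothetical: nothing like the "group" field exists in QMP. Each grouped event carries {"group": {"id": G, "index": i, "count": n}}. A listener that attached mid-group sees a first member whose index is not 0, so it knows the group is partial and skips it instead of acting on half of it. The sketch assumes events within a group arrive in order.

```python
# Hypothetical grouping: events tagged with {"group": {"id", "index", "count"}}
# are delivered together; a group whose start was missed is skipped.
def collect_groups(events):
    pending = {}     # group id -> members received so far
    skipped = set()  # ids of groups whose first member was missed
    complete = []    # lists of events ready to process together
    for ev in events:
        g = ev.get("group")
        if g is None:
            complete.append([ev])  # ungrouped events stand alone
            continue
        gid = g["id"]
        if gid in skipped:
            continue
        if gid not in pending:
            if g["index"] != 0:
                skipped.add(gid)   # attached mid-group
                continue
            pending[gid] = []
        pending[gid].append(ev)
        if len(pending[gid]) == g["count"]:
            complete.append(pending.pop(gid))
    return complete, skipped


events = [
    # Listener attached after group 1's WATCHDOG, so only its RESET is seen.
    {"event": "RESET", "data": {"reason": "watchdog"},
     "group": {"id": 1, "index": 1, "count": 2}},
    {"event": "WATCHDOG", "data": {"action": "reset"},
     "group": {"id": 2, "index": 0, "count": 2}},
    {"event": "RESET", "data": {"reason": "watchdog"},
     "group": {"id": 2, "index": 1, "count": 2}},
]
complete, skipped = collect_groups(events)
```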

> The second scenario is the one I'd like us to avoid at all costs, since it
> will require the client to introduce arbitrary delays in processing events
> to determine the cause. The first is slightly inconvenient, but doable if we
> can assume no intervening events will occur between the WATCHDOG and
> RESET events. The last is obviously simplest for the clients.
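The first scenario amounts to something like this Python sketch. The client remembers a WATCHDOG with action=reset and attributes only the immediately following RESET to it. Any intervening event (STOP is used as an example) clears the note, which shows how much the approach depends on adjacency. The function name and the "other" label are illustrative.

```python
# Sketch of the "record WATCHDOG, interpret at the next RESET" approach.
def reset_causes_adjacent(events):
    causes = []
    armed = False  # True only right after WATCHDOG action=reset
    for ev in events:
        name = ev["event"]
        if name == "RESET":
            causes.append("watchdog" if armed else "other")
        # Any event, including RESET itself, re-evaluates the note.
        armed = (name == "WATCHDOG"
                 and ev.get("data", {}).get("action") == "reset")
    return causes


adjacent = [
    {"event": "WATCHDOG", "data": {"action": "reset"}},
    {"event": "RESET"},
    {"event": "RESET"},
]
print(reset_causes_adjacent(adjacent))  # ['watchdog', 'other']
```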

The last isn't simple for clients that want to know when the watchdog
triggers, independent of reason.  They would have to look for
different kinds of events, depending on how the watchdog is configured.

And, perhaps more importantly, they wouldn't work if more
action options were added to the watchdog device.

-- Jamie