From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1NeVJ2-00067V-JA for qemu-devel@nongnu.org; Mon, 08 Feb 2010 10:13:44 -0500
Received: from [199.232.76.173] (port=44904 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1NeVJ2-00067K-69 for qemu-devel@nongnu.org; Mon, 08 Feb 2010 10:13:44 -0500
Received: from Debian-exim by monty-python.gnu.org with spam-scanned (Exim 4.60) (envelope-from ) id 1NeVIx-0000SE-Mz for qemu-devel@nongnu.org; Mon, 08 Feb 2010 10:13:44 -0500
Received: from mail-yx0-f183.google.com ([209.85.210.183]:50119) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1NeVIx-0000S0-Ch for qemu-devel@nongnu.org; Mon, 08 Feb 2010 10:13:39 -0500
Received: by yxe13 with SMTP id 13so4323705yxe.18 for ; Mon, 08 Feb 2010 07:13:38 -0800 (PST)
Message-ID: <4B702A21.1070808@codemonkey.ws>
Date: Mon, 08 Feb 2010 09:13:37 -0600
From: Anthony Liguori
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: Two QMP events issues
References: <20100208114145.4bd64349@doriath> <20100208141218.GG17328@redhat.com> <4B702470.5080401@codemonkey.ws> <20100208145653.GA25256@redhat.com>
In-Reply-To: <20100208145653.GA25256@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
To: "Daniel P. Berrange"
Cc: armbru@redhat.com, qemu-devel@nongnu.org, Luiz Capitulino

On 02/08/2010 08:56 AM, Daniel P. Berrange wrote:
> On Mon, Feb 08, 2010 at 08:49:20AM -0600, Anthony Liguori wrote:
>
>> On 02/08/2010 08:12 AM, Daniel P. Berrange wrote:
>>
>>> For further background, the key end goal here is that in a QMP client, upon
>>> receipt of the 'RESET' event, we need to reliably & immediately determine
>>> why it occurred, e.g. triggered by watchdog, or by guest OS request.
>>> There are actually 3 possible sequences:
>>>
>>> - WATCHDOG + action=reset, followed by RESET. Assuming no intervening
>>> event can occur, the client can merely record 'WATCHDOG' and interpret
>>> it when it gets the immediately following 'RESET' event.
>>>
>>> - RESET, followed by WATCHDOG + action=reset. The client doesn't know
>>> the reason for the RESET and can't wait arbitrarily for WATCHDOG since
>>> there might never be one arriving.
>>>
>>> - RESET + source=watchdog. Client directly sees the reason.
>>>
>>> The second scenario is the one I'd like us to avoid at all costs, since it
>>> will require the client to introduce arbitrary delays in processing events
>>> to determine cause. The first is slightly inconvenient, but doable if we
>>> can assume no intervening events will occur between the WATCHDOG and the
>>> RESET events. The last is obviously simplest for the clients.
>>>
>>>
>> I really prefer the third option but I'm a little concerned that we're
>> throwing events around somewhat haphazardly.
>>
>> So let me ask, why does a client need to determine when a guest reset
>> and why it reset?
>>
> If a guest OS is repeatedly hanging/crashing resulting in the watchdog
> device firing, management software for the host really wants to know about
> that (so that appropriate alerts/action can be taken) and thus needs to
> be able to distinguish this from a "normal" guest OS initiated reboot.
>

I think that's an argument for having the watchdog events independent of
the reset events. The watchdog condition happening is not directly
related to the action the watchdog takes.

The watchdog event really belongs in a class of events that are closely
associated with a particular device emulation.

In fact, I think what we're really missing in events today is a notion
of a context. A RESET event is really a CPU event. A watchdog
expiration event is a watchdog event. A connect event is a VNC event
(Spice and chardevs will also generate connect events).

Including what the current action is in the watchdog expiration event is
certainly reasonable although not strictly necessary.

Regards,

Anthony Liguori

> Regards,
> Daniel
>
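
For illustration only, the client-side logic discussed in this thread could
be sketched roughly as follows. This is a minimal sketch under stated
assumptions, not how any existing client works: it assumes events arrive as
one JSON object per line with an "event" name and an optional "data"
dictionary, that a WATCHDOG event carries an "action" field, and that a
RESET event may carry a "source" field as in the third sequence above. The
function name classify_resets and the "watchdog"/"other" labels are
hypothetical.

```python
import json


def classify_resets(event_lines):
    """Return one reason per RESET event seen in a stream of JSON event lines.

    Assumed (hypothetical) event shapes:
      {"event": "WATCHDOG", "data": {"action": "reset"}}
      {"event": "RESET"}                                  # first sequence
      {"event": "RESET", "data": {"source": "watchdog"}}  # third sequence
    """
    reasons = []
    pending_watchdog = False  # True only right after WATCHDOG + action=reset

    for line in event_lines:
        msg = json.loads(line)
        name = msg.get("event")
        data = msg.get("data", {})

        if name == "WATCHDOG":
            # First sequence: remember the watchdog until the next event.
            pending_watchdog = data.get("action") == "reset"
        elif name == "RESET":
            # Third sequence: the event itself names its source.
            if data.get("source") == "watchdog" or pending_watchdog:
                reasons.append("watchdog")
            else:
                # Covers guest-initiated resets and the second sequence,
                # where the cause cannot be known without waiting.
                reasons.append("other")
            pending_watchdog = False
        else:
            # Any intervening event breaks the "immediately following"
            # assumption that the first sequence relies on.
            pending_watchdog = False

    return reasons
```

Note that the second sequence (RESET arriving before WATCHDOG) falls into
"other" here, which is exactly the ambiguity Daniel wants to avoid: the
client has no way to attribute the reset without delaying its processing.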