From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4B706290.7020104@codemonkey.ws>
Date: Mon, 08 Feb 2010 13:14:24 -0600
From: Anthony Liguori
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: Two QMP events issues
References: <20100208114145.4bd64349@doriath> <20100208141218.GG17328@redhat.com> <4B702470.5080401@codemonkey.ws> <20100208145653.GA25256@redhat.com> <4B702A21.1070808@codemonkey.ws> <20100208162521.788f9c02@doriath>
In-Reply-To: <20100208162521.788f9c02@doriath>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
To: Luiz Capitulino
Cc: qemu-devel@nongnu.org, armbru@redhat.com

On 02/08/2010 12:25 PM, Luiz Capitulino wrote:
> On Mon, 08 Feb 2010 09:13:37 -0600
> Anthony Liguori wrote:
>
>> On 02/08/2010 08:56 AM, Daniel P. Berrange wrote:
>>
>>> On Mon, Feb 08, 2010 at 08:49:20AM -0600, Anthony Liguori wrote:
>>>
>>>> On 02/08/2010 08:12 AM, Daniel P.
>>>> Berrange wrote:
>>>>
>>>>> For further background, the key end goal here is that in a QMP client, upon
>>>>> receipt of the 'RESET' event, we need to reliably & immediately determine
>>>>> why it occurred, e.g., triggered by watchdog, or by guest OS request. There
>>>>> are actually 3 possible sequences
>>>>>
>>>>>  - WATCHDOG + action=reset, followed by RESET. Assuming no intervening
>>>>>    event can occur, the client can merely record 'WATCHDOG' and interpret
>>>>>    it when it gets the immediately following 'RESET' event
>>>>>
>>>>>  - RESET, followed by WATCHDOG + action=reset. The client doesn't know
>>>>>    the reason for the RESET and can't wait arbitrarily for WATCHDOG since
>>>>>    there might never be one arriving.
>>>>>
>>>>>  - RESET + source=watchdog. Client directly sees the reason
>>>>>
>>>>> The second scenario is the one I'd like us to avoid at all costs, since it
>>>>> will require the client to introduce arbitrary delays in processing events
>>>>> to determine cause. The first is slightly inconvenient, but doable if we
>>>>> can assume no intervening events will occur between the WATCHDOG and
>>>>> RESET events. The last is obviously simplest for the clients.
>>>>>
>>>> I really prefer the third option but I'm a little concerned that we're
>>>> throwing events around somewhat haphazardly.
>>>>
>>>> So let me ask, why does a client need to determine when a guest reset
>>>> and why it reset?
>>>>
>>> If a guest OS is repeatedly hanging/crashing resulting in the watchdog
>>> device firing, management software for the host really wants to know about
>>> that (so that appropriate alerts/action can be taken) and thus needs to
>>> be able to distinguish this from a "normal" guest OS initiated reboot.
>>>
>> I think that's an argument for having the watchdog events independent of
>> the reset events.
>>
>> The watchdog condition happening is not directly related to the action
>> the watchdog takes.
>> The watchdog event really belongs in a class of events
>> that are closely associated with a particular device emulation.
>>
>> In fact, I think what we're really missing in events today is a notion
>> of a context. A RESET event is really a CPU event. A watchdog
>> expiration event is a watchdog event. A connect event is a VNC event
>> (Spice and chardevs will also generate connect events).
>>
> This could be done by adding a 'context' member to all the events and
> then an event would have to be identified by the pair event_name:context.
>
> This way we can have the same event_name for events in different
> contexts. For example:
>
> { 'event': DISCONNECT, 'context': 'spice', [...] }
>
> { 'event': DISCONNECT, 'context': 'vnc', [...] }
>
> Note that today we have VNC_DISCONNECT and will probably have
> SPICE_DISCONNECT too.
>

Which is why we gave ourselves until 0.13 to straighten out the protocol.

N.B. in this model, you'd have:

{ 'event' : 'EXPIRED', 'context': 'watchdog', 'action': 'reset' }
/* some arbitrary number of events */
{ 'event' : 'RESET', 'context': 'cpu' }

And the only reason RESET follows EXPIRED is because action=reset. If
action was different, a RESET might not occur.

A client needs to see the EXPIRED event, determine whether to expect a
RESET event, and if so, wait for the next RESET event to happen.

Regards,

Anthony Liguori
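
The client-side handling described in the last paragraph could be sketched roughly as follows. This is only an illustration, assuming events reach the client as already-parsed dicts in the proposed 'event'/'context' form; the ResetCauseTracker class, its handle() method and the "watchdog"/"other" labels are hypothetical names, not part of QEMU or of any QMP client library.

```python
class ResetCauseTracker:
    """Correlate a watchdog EXPIRED event with the cpu RESET that follows it."""

    def __init__(self):
        # True between an EXPIRED event with action=reset and the next cpu RESET.
        self.watchdog_reset_pending = False

    def handle(self, event):
        """Return the cause of a RESET event, or None for any other event."""
        name = event.get("event")
        context = event.get("context")

        if name == "EXPIRED" and context == "watchdog":
            # Only action=reset means a RESET should be expected afterwards.
            if event.get("action") == "reset":
                self.watchdog_reset_pending = True
            return None

        if name == "RESET" and context == "cpu":
            cause = "watchdog" if self.watchdog_reset_pending else "other"
            self.watchdog_reset_pending = False
            return cause

        # An arbitrary number of unrelated events may arrive in between.
        return None


tracker = ResetCauseTracker()
events = [
    {"event": "EXPIRED", "context": "watchdog", "action": "reset"},
    {"event": "DISCONNECT", "context": "vnc"},
    {"event": "RESET", "context": "cpu"},
    {"event": "RESET", "context": "cpu"},
]
causes = [tracker.handle(e) for e in events]
# causes == [None, None, "watchdog", "other"]
```

The tracker keeps a single flag rather than a timer: because EXPIRED always precedes the RESET it causes in this model, the client never has to delay processing a RESET to learn why it happened.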