Re: [Qemu-devel] Re: Two QMP events issues

From: Anthony Liguori <anthony@codemonkey.ws>
To: Luiz Capitulino <lcapitulino@redhat.com>
Cc: qemu-devel@nongnu.org, armbru@redhat.com
Subject: Re: [Qemu-devel] Re: Two QMP events issues
Date: Mon, 08 Feb 2010 13:14:24 -0600
Message-ID: <4B706290.7020104@codemonkey.ws>
In-Reply-To: <20100208162521.788f9c02@doriath>

On 02/08/2010 12:25 PM, Luiz Capitulino wrote:
> On Mon, 08 Feb 2010 09:13:37 -0600
> Anthony Liguori <anthony@codemonkey.ws> wrote:
>
>    
>> On 02/08/2010 08:56 AM, Daniel P. Berrange wrote:
>>      
>>> On Mon, Feb 08, 2010 at 08:49:20AM -0600, Anthony Liguori wrote:
>>>
>>>        
>>>> On 02/08/2010 08:12 AM, Daniel P. Berrange wrote:
>>>>
>>>>          
>>>>> For further background, the key end goal here is that in a QMP client, upon
>>>>> receipt of the 'RESET' event, we need to reliably & immediately determine
>>>>> why it occurred, e.g. triggered by watchdog, or by guest OS request. There
>>>>> are actually 3 possible sequences
>>>>>
>>>>>    - WATCHDOG + action=reset, followed by RESET.  Assuming no intervening
>>>>>      event can occur, the client can merely record 'WATCHDOG' and interpret
>>>>>      it when it gets the immediately following 'RESET' event
>>>>>
>>>>>    - RESET, followed by WATCHDOG + action=reset. The client doesn't know
>>>>>      the reason for the RESET and can't wait arbitrarily for WATCHDOG since
>>>>>      there might never be one arriving.
>>>>>
>>>>>    - RESET + source=watchdog. Client directly sees the reason
>>>>>
>>>>> The second scenario is the one I'd like us to avoid at all costs, since it
>>>>> will require the client to introduce arbitrary delays in processing events
>>>>> to determine cause. The first is slightly inconvenient, but doable if we
>>>>> can assume no intervening events will occur, between WATCHDOG and the
>>>>> RESET events. The last is obviously simplest for the clients.
>>>>>
>>>>>
>>>>>            
>>>> I really prefer the third option but I'm a little concerned that we're
>>>> throwing events around somewhat haphazardly.
>>>>
>>>> So let me ask, why does a client need to determine when a guest reset
>>>> and why it reset?
>>>>
>>>>          
>>> If a guest OS is repeatedly hanging/crashing resulting in the watchdog
>>> device firing, management software for the host really wants to know about
>>> that (so that appropriate alerts/action can be taken) and thus needs to
>>> be able to distinguish this from a "normal"  guest OS initiated reboot.
>>>
>>>        
>> I think that's an argument for having the watchdog events independent of
>> the reset events.
>>
>> The watchdog condition happening is not directly related to the action
>> the watchdog takes.  The watchdog event really belongs in a class of events
>> that are closely associated with a particular device emulation.
>>
>> In fact, I think what we're really missing in events today is a notion
>> of a context.  A RESET event is really a CPU event.  A watchdog
>> expiration event is a watchdog event.  A connect event is a VNC event
>> (Spice and chardevs will also generate connect events).
>>      
>   This could be done by adding a 'context' member to all the events and
> then an event would have to be identified by the pair event_name:context.
>
>   This way we can have the same event_name for events in different
> contexts. For example:
>
> { 'event': DISCONNECT, 'context': 'spice', [...] }
>
> { 'event': DISCONNECT, 'context': 'vnc', [...] }
>
>   Note that today we have VNC_DISCONNECT and will probably have
> SPICE_DISCONNECT too.
>    

Which is why we gave ourselves until 0.13 to straighten out the protocol.

N.B. in this model, you'd have:

{ 'event' : 'EXPIRED', 'context': 'watchdog', 'action': 'reset' }
/* some arbitrary number of events */
{ 'event' : 'RESET', 'context': 'cpu' }

And the only reason RESET follows EXPIRED is because action=reset.  If 
action was different, a RESET might not occur.

A client needs to see the EXPIRED event, determine whether to expect a 
RESET event, and if so, wait for the next RESET event to happen.
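
As a rough illustration of that client-side logic, here is a minimal Python sketch. It assumes the event shape proposed in this thread (an 'event' name plus a 'context' member), which is only a proposal and not necessarily what QMP ships. The function name classify_resets and the cause labels 'watchdog' and 'other' are hypothetical, chosen only for the example.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Event = Dict[str, str]


def classify_resets(events: Iterable[Event]) -> Iterator[Tuple[Event, str]]:
    """Yield (reset_event, cause) for every RESET seen in the stream.

    Hypothetical helper: it relies on the event_name:context pairs
    proposed in this thread, not on a confirmed QMP schema.
    """
    # The most recent watchdog expiry that announced action=reset, if any.
    pending_expiry: Optional[Event] = None

    for ev in events:
        key = (ev.get("event"), ev.get("context"))

        if key == ("EXPIRED", "watchdog"):
            # Only expect a RESET when the watchdog action says so.
            if ev.get("action") == "reset":
                pending_expiry = ev
        elif key == ("RESET", "cpu"):
            # Any number of unrelated events may have arrived in between.
            cause = "watchdog" if pending_expiry is not None else "other"
            pending_expiry = None
            yield ev, cause


# Example stream: one watchdog-triggered reset, then an unexplained one.
stream = [
    {"event": "EXPIRED", "context": "watchdog", "action": "reset"},
    {"event": "DISCONNECT", "context": "vnc"},
    {"event": "RESET", "context": "cpu"},
    {"event": "RESET", "context": "cpu"},
]
causes = [cause for _, cause in classify_resets(stream)]
```

The sketch never has to delay processing a RESET: the cause is already known from the earlier EXPIRED event, or it is reported as something other than the watchdog.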

Regards,

Anthony Liguori


Thread overview: 12+ messages
2010-02-08 13:41 [Qemu-devel] Two QMP events issues Luiz Capitulino
2010-02-08 14:12 ` [Qemu-devel] " Daniel P. Berrange
2010-02-08 14:49   ` Anthony Liguori
2010-02-08 14:56     ` Daniel P. Berrange
2010-02-08 15:13       ` Anthony Liguori
2010-02-08 18:25         ` Luiz Capitulino
2010-02-08 19:14           ` Anthony Liguori [this message]
2010-02-08 19:59             ` Luiz Capitulino
2010-02-08 20:22               ` Anthony Liguori
2010-02-08 18:19   ` Luiz Capitulino
2010-02-09 19:24   ` Jamie Lokier
2010-02-09 19:32   ` Jamie Lokier
