qemu-devel.nongnu.org archive mirror
* [Qemu-devel] Two QMP events issues
@ 2010-02-08 13:41 Luiz Capitulino
  2010-02-08 14:12 ` [Qemu-devel] " Daniel P. Berrange
  0 siblings, 1 reply; 12+ messages in thread
From: Luiz Capitulino @ 2010-02-08 13:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aliguori, armbru


 Hi there,

 I have two not-so-related QMP events issues to discuss, but I will talk about
them in the same email to avoid starting two threads.

 The first problem is wrt the STOP event. Right now it's only emitted if it's
triggered through qemu_system_vmstop_request(), which afaik will only be
called if CONFIG_IOTHREAD is enabled (nonsense, yes).

 The best fix I can think of is to move the STOP event down to do_vm_stop().
We could even have a 'reason' data member with the string representation of
the EXCP_ macros. Looks like this is the right thing to do.
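
 For illustration, the emitted event could then look something like this
(the field names and the 'reason' value are tentative, just to show the
shape):

{ 'event': 'STOP', 'data': { 'reason': 'EXCP_DEBUG' } }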

 There's a problem, though. Migration and block subsystems also do vm_stop(0).
The former's reason seems to be 'stop to be loaded' and the latter is 'can't
continue' on disk errors. Note that the block subsystem already has its own
event for disk errors.

 So, my solution is to not generate the STOP event on vm_stop(0). If any
vm_stop(0) user (e.g. migration) wants to generate events, it should create
the appropriate EXCP_ macro for that.
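
 In code the idea would be roughly the following (an untested sketch
against vl.c's do_vm_stop(), not a final patch):

static void do_vm_stop(int reason)
{
    if (vm_running) {
        cpu_disable_ticks();
        vm_running = 0;
        pause_all_vcpus();
        vm_state_notify(0, reason);
        /* vm_stop(0) callers get no STOP event */
        if (reason) {
            monitor_protocol_event(QEVENT_STOP, NULL);
        }
    }
}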

 Does this look good?

 The second problem is about the watchdog device. I have been asked to
add events for the watchdog device's actions (see
hw/watchdog.c:watchdog_perform_action()).

 Issue is: most of those events directly map to QEMU's events already
generated by QMP, such as RESET, SHUTDOWN, POWEROFF etc.

 We have two solutions:

1. Introduce watchdog's own events. This is easy to do, but will
generate two QMP events for most actions. E.g. the watchdog's WDT_RESET
action will generate a QMP event for WDT_RESET and will generate
another RESET event when this action takes place in QEMU.

2. Add a 'source' data member to all events requested via the
qemu_system_* functions, so that we can have a 'watchdog' source and
only one event is triggered. This will require a more complex change
and maybe some hacks will be needed (e.g. for vm_stop()); both options
are sketched below.
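
 For illustration, a watchdog-triggered reset would then look roughly like
this on the wire (event and field names illustrative only):

Option 1, two events:

{ 'event': 'WDT_RESET' }
{ 'event': 'RESET' }

Option 2, one event:

{ 'event': 'RESET', 'source': 'watchdog' }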

 Opinions?


* [Qemu-devel] Re: Two QMP events issues
  2010-02-08 13:41 [Qemu-devel] Two QMP events issues Luiz Capitulino
@ 2010-02-08 14:12 ` Daniel P. Berrange
  2010-02-08 14:49   ` Anthony Liguori
                     ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Daniel P. Berrange @ 2010-02-08 14:12 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: aliguori, qemu-devel, armbru

On Mon, Feb 08, 2010 at 11:41:45AM -0200, Luiz Capitulino wrote:
> 
>  Hi there,
> 
>  I have two not-so-related QMP events issues to discuss, but I will talk about
> them in the same email to avoid starting two threads.
> 
>  The first problem is wrt the STOP event. Right now it's only emitted if it's
> triggered through qemu_system_vmstop_request(), which afaik will only be
> called if CONFIG_IOTHREAD is enabled (nonsense, yes).
> 
>  The best fix I can think of is to move the STOP event down to do_vm_stop().
> We could even have a 'reason' data member with the string representation of
> the EXCP_ macros. Looks like this is the right thing to do.
> 
>  There's a problem, though. Migration and block subsystems also do vm_stop(0).
> The former's reason seems to be 'stop to be loaded' and the latter is 'can't
> continue' on disk errors. Note that the block subsystem already has its own
> event for disk errors.
> 
>  So, my solution is to not generate the STOP event on vm_stop(0). If any
> vm_stop(0) user (e.g. migration) wants to generate events, it should create
> the appropriate EXCP_ macro for that.
> 
>  Does this look good?
> 
>  The second problem is about the watchdog device. I have been asked to
> add events for the watchdog device's actions (see
> hw/watchdog.c:watchdog_perform_action()).
> 
>  Issue is: most of those events directly map to QEMU's events already
> generated by QMP, such as RESET, SHUTDOWN, POWEROFF etc.
> 
>  We have two solutions:
> 
> 1. Introduce watchdog's own events. This is easy to do, but will
> generate two QMP events for most actions. E.g. the watchdog's WDT_RESET
> action will generate a QMP event for WDT_RESET and will generate
> another RESET event when this action takes place in QEMU.
> 
> 2. Add a 'source' data member to all events requested via the
> qemu_system_* functions, so that we can have a 'watchdog' source and
> only one event is triggered. This will require a more complex change
> and maybe some hacks will be needed (e.g. for vm_stop()); both options
> are sketched below.
> 
>  Opinions?

For further background, the key end goal here is that in a QMP client, upon
receipt of the 'RESET' event, we need to reliably & immediately determine
why it occurred, e.g. triggered by watchdog, or by guest OS request. There
are actually 3 possible sequences:

 - WATCHDOG + action=reset, followed by RESET.  Assuming no intervening 
   event can occur, the client can merely record 'WATCHDOG' and interpret
   it when it gets the immediately following 'RESET' event

 - RESET, followed by WATCHDOG + action=reset. The client doesn't know
   the reason for the RESET and can't wait arbitrarily for WATCHDOG since
   there might never be one arriving.

 - RESET + source=watchdog. Client directly sees the reason

The second scenario is the one I'd like us to avoid at all costs, since it
will require the client to introduce arbitrary delays in processing events
to determine cause. The first is slightly inconvenient, but doable if we 
can assume no intervening events will occur between WATCHDOG and the
RESET events. The last is obviously simplest for the clients.
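
In wire terms the three cases are, roughly (illustrative notation):

 1) { 'event': 'WATCHDOG', 'action': 'reset' }  then  { 'event': 'RESET' }
 2) { 'event': 'RESET' }  then  { 'event': 'WATCHDOG', 'action': 'reset' }
 3) { 'event': 'RESET', 'source': 'watchdog' }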

This question is also pretty relevant for Luiz's previous posting on disk
block I/O errors, since one of those actions can result in a PAUSE event.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


* Re: [Qemu-devel] Re: Two QMP events issues
  2010-02-08 14:12 ` [Qemu-devel] " Daniel P. Berrange
@ 2010-02-08 14:49   ` Anthony Liguori
  2010-02-08 14:56     ` Daniel P. Berrange
  2010-02-08 18:19   ` Luiz Capitulino
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Anthony Liguori @ 2010-02-08 14:49 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: armbru, qemu-devel, Luiz Capitulino

On 02/08/2010 08:12 AM, Daniel P. Berrange wrote:
>
> For further background, the key end goal here is that in a QMP client, upon
> receipt of the 'RESET' event, we need to reliably & immediately determine
> why it occurred, e.g. triggered by watchdog, or by guest OS request. There
> are actually 3 possible sequences:
>
>   - WATCHDOG + action=reset, followed by RESET.  Assuming no intervening
>     event can occur, the client can merely record 'WATCHDOG' and interpret
>     it when it gets the immediately following 'RESET' event
>
>   - RESET, followed by WATCHDOG + action=reset. The client doesn't know
>     the reason for the RESET and can't wait arbitrarily for WATCHDOG since
>     there might never be one arriving.
>
>   - RESET + source=watchdog. Client directly sees the reason
>
> The second scenario is the one I'd like us to avoid at all costs, since it
> will require the client to introduce arbitrary delays in processing events
> to determine cause. The first is slightly inconvenient, but doable if we
> can assume no intervening events will occur between WATCHDOG and the
> RESET events. The last is obviously simplest for the clients.
>    

I really prefer the third option but I'm a little concerned that we're 
throwing events around somewhat haphazardly.

So let me ask, why does a client need to determine when a guest reset 
and why it reset?

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: Two QMP events issues
  2010-02-08 14:49   ` Anthony Liguori
@ 2010-02-08 14:56     ` Daniel P. Berrange
  2010-02-08 15:13       ` Anthony Liguori
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel P. Berrange @ 2010-02-08 14:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: armbru, qemu-devel, Luiz Capitulino

On Mon, Feb 08, 2010 at 08:49:20AM -0600, Anthony Liguori wrote:
> On 02/08/2010 08:12 AM, Daniel P. Berrange wrote:
> >
> >For further background, the key end goal here is that in a QMP client, upon
> >receipt of the 'RESET' event, we need to reliably & immediately determine
> >why it occurred, e.g. triggered by watchdog, or by guest OS request. There
> >are actually 3 possible sequences:
> >
> >  - WATCHDOG + action=reset, followed by RESET.  Assuming no intervening
> >    event can occur, the client can merely record 'WATCHDOG' and interpret
> >    it when it gets the immediately following 'RESET' event
> >
> >  - RESET, followed by WATCHDOG + action=reset. The client doesn't know
> >    the reason for the RESET and can't wait arbitrarily for WATCHDOG since
> >    there might never be one arriving.
> >
> >  - RESET + source=watchdog. Client directly sees the reason
> >
> >The second scenario is the one I'd like us to avoid at all costs, since it
> >will require the client to introduce arbitrary delays in processing events
> >to determine cause. The first is slightly inconvenient, but doable if we
> >can assume no intervening events will occur between WATCHDOG and the
> >RESET events. The last is obviously simplest for the clients.
> >   
> 
> I really prefer the third option but I'm a little concerned that we're 
> throwing events around somewhat haphazardly.
> 
> So let me ask, why does a client need to determine when a guest reset 
> and why it reset?

If a guest OS is repeatedly hanging/crashing resulting in the watchdog 
device firing, management software for the host really wants to know about
that (so that appropriate alerts/action can be taken) and thus needs to 
be able to distinguish this from a "normal"  guest OS initiated reboot.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


* Re: [Qemu-devel] Re: Two QMP events issues
  2010-02-08 14:56     ` Daniel P. Berrange
@ 2010-02-08 15:13       ` Anthony Liguori
  2010-02-08 18:25         ` Luiz Capitulino
  0 siblings, 1 reply; 12+ messages in thread
From: Anthony Liguori @ 2010-02-08 15:13 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: armbru, qemu-devel, Luiz Capitulino

On 02/08/2010 08:56 AM, Daniel P. Berrange wrote:
> On Mon, Feb 08, 2010 at 08:49:20AM -0600, Anthony Liguori wrote:
>    
>> On 02/08/2010 08:12 AM, Daniel P. Berrange wrote:
>>      
>>> For further background, the key end goal here is that in a QMP client, upon
>>> receipt of the 'RESET' event, we need to reliably & immediately determine
>>> why it occurred, e.g. triggered by watchdog, or by guest OS request. There
>>> are actually 3 possible sequences:
>>>
>>>   - WATCHDOG + action=reset, followed by RESET.  Assuming no intervening
>>>     event can occur, the client can merely record 'WATCHDOG' and interpret
>>>     it when it gets the immediately following 'RESET' event
>>>
>>>   - RESET, followed by WATCHDOG + action=reset. The client doesn't know
>>>     the reason for the RESET and can't wait arbitrarily for WATCHDOG since
>>>     there might never be one arriving.
>>>
>>>   - RESET + source=watchdog. Client directly sees the reason
>>>
>>> The second scenario is the one I'd like us to avoid at all costs, since it
>>> will require the client to introduce arbitrary delays in processing events
>>> to determine cause. The first is slightly inconvenient, but doable if we
>>> can assume no intervening events will occur between WATCHDOG and the
>>> RESET events. The last is obviously simplest for the clients.
>>>
>>>        
>> I really prefer the third option but I'm a little concerned that we're
>> throwing events around somewhat haphazardly.
>>
>> So let me ask, why does a client need to determine when a guest reset
>> and why it reset?
>>      
> If a guest OS is repeatedly hanging/crashing resulting in the watchdog
> device firing, management software for the host really wants to know about
> that (so that appropriate alerts/action can be taken) and thus needs to
> be able to distinguish this from a "normal"  guest OS initiated reboot.
>    

I think that's an argument for having the watchdog events independent of 
the reset events.

The watchdog condition happening is not directly related to the action 
the watchdog takes.  The watchdog event really belongs in a class of events 
that are closely associated with a particular device emulation.

In fact, I think what we're really missing in events today is a notion 
of a context.  A RESET event is really a CPU event.  A watchdog 
expiration event is a watchdog event.  A connect event is a VNC event 
(Spice and chardevs will also generate connect events).

Including what the current action is in the watchdog expiration event is 
certainly reasonable, although not strictly necessary.

Regards,

Anthony Liguori

> Regards,
> Daniel
>    


* [Qemu-devel] Re: Two QMP events issues
  2010-02-08 14:12 ` [Qemu-devel] " Daniel P. Berrange
  2010-02-08 14:49   ` Anthony Liguori
@ 2010-02-08 18:19   ` Luiz Capitulino
  2010-02-09 19:24   ` Jamie Lokier
  2010-02-09 19:32   ` Jamie Lokier
  3 siblings, 0 replies; 12+ messages in thread
From: Luiz Capitulino @ 2010-02-08 18:19 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: aliguori, qemu-devel, armbru

On Mon, 8 Feb 2010 14:12:20 +0000
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Mon, Feb 08, 2010 at 11:41:45AM -0200, Luiz Capitulino wrote:
> > 
> >  Hi there,
> > 
> >  I have two not-so-related QMP events issues to discuss, but I will talk about
> > them in the same email to avoid starting two threads.
> > 
> >  The first problem is wrt the STOP event. Right now it's only emitted if it's
> > triggered through qemu_system_vmstop_request(), which afaik will only be
> > called if CONFIG_IOTHREAD is enabled (nonsense, yes).
> > 
> >  The best fix I can think of is to move the STOP event down to do_vm_stop().
> > We could even have a 'reason' data member with the string representation of
> > the EXCP_ macros. Looks like this is the right thing to do.
> > 
> >  There's a problem, though. Migration and block subsystems also do vm_stop(0).
> > The former's reason seems to be 'stop to be loaded' and the latter is 'can't
> > continue' on disk errors. Note that the block subsystem already has its own
> > event for disk errors.
> > 
> >  So, my solution is to not generate the STOP event on vm_stop(0). If any
> > vm_stop(0) user (e.g. migration) wants to generate events, it should create
> > the appropriate EXCP_ macro for that.
> > 
> >  Does this look good?
> > 
> >  The second problem is about the watchdog device. I have been asked to
> > add events for the watchdog device's actions (see
> > hw/watchdog.c:watchdog_perform_action()).
> > 
> >  Issue is: most of those events directly map to QEMU's events already
> > generated by QMP, such as RESET, SHUTDOWN, POWEROFF etc.
> > 
> >  We have two solutions:
> > 
> > 1. Introduce watchdog's own events. This is easy to do, but will
> > generate two QMP events for most actions. E.g. the watchdog's WDT_RESET
> > action will generate a QMP event for WDT_RESET and will generate
> > another RESET event when this action takes place in QEMU.
> > 
> > 2. Add a 'source' data member to all events requested via the
> > qemu_system_* functions, so that we can have a 'watchdog' source and
> > only one event is triggered. This will require a more complex change
> > and maybe some hacks will be needed (e.g. for vm_stop()); both options
> > are sketched below.
> > 
> >  Opinions?
> 
> For further background, the key end goal here is that in a QMP client, upon
> receipt of the 'RESET' event, we need to reliably & immediately determine
> why it occurred, e.g. triggered by watchdog, or by guest OS request. There
> are actually 3 possible sequences:
> 
>  - WATCHDOG + action=reset, followed by RESET.  Assuming no intervening 
>    event can occur, the client can merely record 'WATCHDOG' and interpret
>    it when it gets the immediately following 'RESET' event

 I don't think we can guarantee that, as there are a number of events that
can happen between the two (e.g. VNC events, disk errors etc.).

>  - RESET, followed by WATCHDOG + action=reset. The client doesn't know
>    the reason for the RESET and can't wait arbitrarily for WATCHDOG since
>    there might never be one arriving.

 This is possible :(

>  - RESET + source=watchdog. Client directly sees the reason
> 
> The second scenario is the one I'd like us to avoid at all costs, since it
> will require the client to introduce arbitrary delays in processing events
> to determine cause. The first is slightly inconvenient, but doable if we 
> can assume no intervening events will occur between WATCHDOG and the
> RESET events. The last is obviously simplest for the clients.
> 
> This question is also pretty relevant for Luiz's previous posting on disk
> block I/O errors, since one of those actions can result in a PAUSE event.

 Not exactly. The first part of my original email describes the low-level
part of this problem.

 First, the block I/O error event may not stop the VM, so I think even if
we implement the third scenario, we should keep the block's event separate.

 Second, the block layer uses vm_stop(0) to stop the VM; as described
earlier in this email, I plan not to generate the STOP event when
vm_stop()'s argument is 0.


* Re: [Qemu-devel] Re: Two QMP events issues
  2010-02-08 15:13       ` Anthony Liguori
@ 2010-02-08 18:25         ` Luiz Capitulino
  2010-02-08 19:14           ` Anthony Liguori
  0 siblings, 1 reply; 12+ messages in thread
From: Luiz Capitulino @ 2010-02-08 18:25 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, armbru

On Mon, 08 Feb 2010 09:13:37 -0600
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 02/08/2010 08:56 AM, Daniel P. Berrange wrote:
> > On Mon, Feb 08, 2010 at 08:49:20AM -0600, Anthony Liguori wrote:
> >    
> >> On 02/08/2010 08:12 AM, Daniel P. Berrange wrote:
> >>      
> >>> For further background, the key end goal here is that in a QMP client, upon
> >>> receipt of the 'RESET' event, we need to reliably & immediately determine
> >>> why it occurred, e.g. triggered by watchdog, or by guest OS request. There
> >>> are actually 3 possible sequences:
> >>>
> >>>   - WATCHDOG + action=reset, followed by RESET.  Assuming no intervening
> >>>     event can occur, the client can merely record 'WATCHDOG' and interpret
> >>>     it when it gets the immediately following 'RESET' event
> >>>
> >>>   - RESET, followed by WATCHDOG + action=reset. The client doesn't know
> >>>     the reason for the RESET and can't wait arbitrarily for WATCHDOG since
> >>>     there might never be one arriving.
> >>>
> >>>   - RESET + source=watchdog. Client directly sees the reason
> >>>
> >>> The second scenario is the one I'd like us to avoid at all costs, since it
> >>> will require the client to introduce arbitrary delays in processing events
> >>> to determine cause. The first is slightly inconvenient, but doable if we
> >>> can assume no intervening events will occur between WATCHDOG and the
> >>> RESET events. The last is obviously simplest for the clients.
> >>>
> >>>        
> >> I really prefer the third option but I'm a little concerned that we're
> >> throwing events around somewhat haphazardly.
> >>
> >> So let me ask, why does a client need to determine when a guest reset
> >> and why it reset?
> >>      
> > If a guest OS is repeatedly hanging/crashing resulting in the watchdog
> > device firing, management software for the host really wants to know about
> > that (so that appropriate alerts/action can be taken) and thus needs to
> > be able to distinguish this from a "normal"  guest OS initiated reboot.
> >    
> 
> I think that's an argument for having the watchdog events independent of 
> the reset events.
> 
> The watchdog condition happening is not directly related to the action 
> the watchdog takes.  The watchdog event really belongs in a class of events 
> that are closely associated with a particular device emulation.
> 
> In fact, I think what we're really missing in events today is a notion 
> of a context.  A RESET event is really a CPU event.  A watchdog 
> expiration event is a watchdog event.  A connect event is a VNC event 
> (Spice and chardevs will also generate connect events).

 This could be done by adding a 'context' member to all events; an event
would then be identified by the pair event_name:context.

 This way we can have the same event_name for events in different
contexts. For example:

{ 'event': DISCONNECT, 'context': 'spice', [...] }

{ 'event': DISCONNECT, 'context': 'vnc', [...] }

 Note that today we have VNC_DISCONNECT and will probably have
SPICE_DISCONNECT too.


* Re: [Qemu-devel] Re: Two QMP events issues
  2010-02-08 18:25         ` Luiz Capitulino
@ 2010-02-08 19:14           ` Anthony Liguori
  2010-02-08 19:59             ` Luiz Capitulino
  0 siblings, 1 reply; 12+ messages in thread
From: Anthony Liguori @ 2010-02-08 19:14 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: qemu-devel, armbru

On 02/08/2010 12:25 PM, Luiz Capitulino wrote:
> On Mon, 08 Feb 2010 09:13:37 -0600
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>    
>> On 02/08/2010 08:56 AM, Daniel P. Berrange wrote:
>>      
>>> On Mon, Feb 08, 2010 at 08:49:20AM -0600, Anthony Liguori wrote:
>>>
>>>        
>>>> On 02/08/2010 08:12 AM, Daniel P. Berrange wrote:
>>>>
>>>>          
>>>>> For further background, the key end goal here is that in a QMP client, upon
>>>>> receipt of the 'RESET' event, we need to reliably & immediately determine
>>>>> why it occurred, e.g. triggered by watchdog, or by guest OS request. There
>>>>> are actually 3 possible sequences:
>>>>>
>>>>>    - WATCHDOG + action=reset, followed by RESET.  Assuming no intervening
>>>>>      event can occur, the client can merely record 'WATCHDOG' and interpret
>>>>>      it when it gets the immediately following 'RESET' event
>>>>>
>>>>>    - RESET, followed by WATCHDOG + action=reset. The client doesn't know
>>>>>      the reason for the RESET and can't wait arbitrarily for WATCHDOG since
>>>>>      there might never be one arriving.
>>>>>
>>>>>    - RESET + source=watchdog. Client directly sees the reason
>>>>>
>>>>> The second scenario is the one I'd like us to avoid at all costs, since it
>>>>> will require the client to introduce arbitrary delays in processing events
>>>>> to determine cause. The first is slightly inconvenient, but doable if we
>>>>> can assume no intervening events will occur between WATCHDOG and the
>>>>> RESET events. The last is obviously simplest for the clients.
>>>>>
>>>>>
>>>>>            
>>>> I really prefer the third option but I'm a little concerned that we're
>>>> throwing events around somewhat haphazardly.
>>>>
>>>> So let me ask, why does a client need to determine when a guest reset
>>>> and why it reset?
>>>>
>>>>          
>>> If a guest OS is repeatedly hanging/crashing resulting in the watchdog
>>> device firing, management software for the host really wants to know about
>>> that (so that appropriate alerts/action can be taken) and thus needs to
>>> be able to distinguish this from a "normal"  guest OS initiated reboot.
>>>
>>>        
>> I think that's an argument for having the watchdog events independent of
>> the reset events.
>>
>> The watchdog condition happening is not directly related to the action
>> the watchdog takes.  The watchdog event really belongs in a class of events
>> that are closely associated with a particular device emulation.
>>
>> In fact, I think what we're really missing in events today is a notion
>> of a context.  A RESET event is really a CPU event.  A watchdog
>> expiration event is a watchdog event.  A connect event is a VNC event
>> (Spice and chardevs will also generate connect events).
>>      
>   This could be done by adding a 'context' member to all events; an event
> would then be identified by the pair event_name:context.
>
>   This way we can have the same event_name for events in different
> contexts. For example:
>
> { 'event': DISCONNECT, 'context': 'spice', [...] }
>
> { 'event': DISCONNECT, 'context': 'vnc', [...] }
>
>   Note that today we have VNC_DISCONNECT and will probably have
> SPICE_DISCONNECT too.
>    

Which is why we gave ourselves until 0.13 to straighten out the protocol.

N.B. in this model, you'd have:

{ 'event' : 'EXPIRED', 'context': 'watchdog', 'action': 'reset' }
/* some arbitrary number of events */
{ 'event' : 'RESET', 'context': 'cpu' }

And the only reason RESET follows EXPIRED is because action=reset.  If 
action was different, a RESET might not occur.

A client needs to see the EXPIRED event, determine whether to expect a 
RESET event, and if so, wait for the next RESET event to happen.

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: Two QMP events issues
  2010-02-08 19:14           ` Anthony Liguori
@ 2010-02-08 19:59             ` Luiz Capitulino
  2010-02-08 20:22               ` Anthony Liguori
  0 siblings, 1 reply; 12+ messages in thread
From: Luiz Capitulino @ 2010-02-08 19:59 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, armbru

On Mon, 08 Feb 2010 13:14:24 -0600
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 02/08/2010 12:25 PM, Luiz Capitulino wrote:
> > On Mon, 08 Feb 2010 09:13:37 -0600
> > Anthony Liguori<anthony@codemonkey.ws>  wrote:
> >
> >    
> >> On 02/08/2010 08:56 AM, Daniel P. Berrange wrote:
> >>      
> >>> On Mon, Feb 08, 2010 at 08:49:20AM -0600, Anthony Liguori wrote:
> >>>
> >>>        
> >>>> On 02/08/2010 08:12 AM, Daniel P. Berrange wrote:
> >>>>
> >>>>          
> >>>>> For further background, the key end goal here is that in a QMP client, upon
> >>>>> receipt of the 'RESET' event, we need to reliably & immediately determine
> >>>>> why it occurred, e.g. triggered by watchdog, or by guest OS request. There
> >>>>> are actually 3 possible sequences:
> >>>>>
> >>>>>    - WATCHDOG + action=reset, followed by RESET.  Assuming no intervening
> >>>>>      event can occur, the client can merely record 'WATCHDOG' and interpret
> >>>>>      it when it gets the immediately following 'RESET' event
> >>>>>
> >>>>>    - RESET, followed by WATCHDOG + action=reset. The client doesn't know
> >>>>>      the reason for the RESET and can't wait arbitrarily for WATCHDOG since
> >>>>>      there might never be one arriving.
> >>>>>
> >>>>>    - RESET + source=watchdog. Client directly sees the reason
> >>>>>
> >>>>> The second scenario is the one I'd like us to avoid at all costs, since it
> >>>>> will require the client to introduce arbitrary delays in processing events
> >>>>> to determine cause. The first is slightly inconvenient, but doable if we
> >>>>> can assume no intervening events will occur between WATCHDOG and the
> >>>>> RESET events. The last is obviously simplest for the clients.
> >>>>>
> >>>>>
> >>>>>            
> >>>> I really prefer the third option but I'm a little concerned that we're
> >>>> throwing events around somewhat haphazardly.
> >>>>
> >>>> So let me ask, why does a client need to determine when a guest reset
> >>>> and why it reset?
> >>>>
> >>>>          
> >>> If a guest OS is repeatedly hanging/crashing resulting in the watchdog
> >>> device firing, management software for the host really wants to know about
> >>> that (so that appropriate alerts/action can be taken) and thus needs to
> >>> be able to distinguish this from a "normal"  guest OS initiated reboot.
> >>>
> >>>        
> >> I think that's an argument for having the watchdog events independent of
> >> the reset events.
> >>
> >> The watchdog condition happening is not directly related to the action
> >> the watchdog takes.  The watchdog event really belongs in a class of events
> >> that are closely associated with a particular device emulation.
> >>
> >> In fact, I think what we're really missing in events today is a notion
> >> of a context.  A RESET event is really a CPU event.  A watchdog
> >> expiration event is a watchdog event.  A connect event is a VNC event
> >> (Spice and chardevs will also generate connect events).
> >>      
> >   This could be done by adding a 'context' member to all events; an event
> > would then be identified by the pair event_name:context.
> >
> >   This way we can have the same event_name for events in different
> > contexts. For example:
> >
> > { 'event': DISCONNECT, 'context': 'spice', [...] }
> >
> > { 'event': DISCONNECT, 'context': 'vnc', [...] }
> >
> >   Note that today we have VNC_DISCONNECT and will probably have
> > SPICE_DISCONNECT too.
> >    
> 
> Which is why we gave ourselves until 0.13 to straighten out the protocol.

 Yeah.

> N.B. in this model, you'd have:
> 
> { 'event' : 'EXPIRED', 'context': 'watchdog', 'action': 'reset' }
> /* some arbitrary number of events */
> { 'event' : 'RESET', 'context': 'cpu' }
> 
> And the only reason RESET follows EXPIRED is because action=reset.  If 
> action was different, a RESET might not occur.
> 
> A client needs to see the EXPIRED event, determine whether to expect a 
> RESET event, and if so, wait for the next RESET event to happen.

 Looks reasonable to me. What do you think, Daniel?

 Note that if we agree on the 'context design', I'll have to change
VNC's event names.


* Re: [Qemu-devel] Re: Two QMP events issues
  2010-02-08 19:59             ` Luiz Capitulino
@ 2010-02-08 20:22               ` Anthony Liguori
  0 siblings, 0 replies; 12+ messages in thread
From: Anthony Liguori @ 2010-02-08 20:22 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: qemu-devel, armbru

On 02/08/2010 01:59 PM, Luiz Capitulino wrote:
>
>   Looks reasonable to me. What do you think, Daniel?
>
>   Note that if we agree on the 'context design', I'll have to change
> VNC's event names.
>    

Let me give you a few suggestions before diving into it.  'context' might
not be the best name.

For events generated by devices, the event should be raised with
something like qdev_event(&s->dev, QMPEV_WD_EXPIRED, ...).

The context argument should allow a client to determine which device 
raised the event.  So it could be a combination of the device's qdev 
name and its id.

For events generated by non-qdev mechanisms, we should try our best to
associate context with that event.  For instance, a DISCONNECT event 
happens to a particular session.  We don't quite have VNC session ids 
yet, but if we did, it would make sense to include that in the context info.

So my thinking is that we don't just want context to serve as a 
classification mechanism, but we want it to indicate what 
subsystem/device/session generated the event.
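
As a very rough sketch (qdev_event() and the 'context' layout here are
hypothetical; monitor_protocol_event() and qobject_from_jsonf() are the
existing helpers):

static void qdev_event(DeviceState *dev, MonitorEvent event, QDict *data)
{
    QObject *ctx;

    /* tag the event with the emitting device's qdev name and id */
    ctx = qobject_from_jsonf("{ 'device': %s, 'id': %s }",
                             dev->info->name,
                             dev->id ? dev->id : "");
    qdict_put_obj(data, "context", ctx);
    monitor_protocol_event(event, QOBJECT(data));
}

The watchdog would then do something like qdev_event(&s->dev,
QMPEV_WD_EXPIRED, data), and a client could tell which device fired.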

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: Two QMP events issues
  2010-02-08 14:12 ` [Qemu-devel] " Daniel P. Berrange
  2010-02-08 14:49   ` Anthony Liguori
  2010-02-08 18:19   ` Luiz Capitulino
@ 2010-02-09 19:24   ` Jamie Lokier
  2010-02-09 19:32   ` Jamie Lokier
  3 siblings, 0 replies; 12+ messages in thread
From: Jamie Lokier @ 2010-02-09 19:24 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: armbru, aliguori, qemu-devel, Luiz Capitulino

Daniel P. Berrange wrote:
> For further background, the key end goal here is that in a QMP client, upon
> receipt of the 'RESET' event, we need to reliably & immediately determine
> why it occurred, e.g. triggered by watchdog, or by guest OS request. There
> are actually 3 possible sequences:
> 
>  - WATCHDOG + action=reset, followed by RESET.  Assuming no intervening 
>    event can occur, the client can merely record 'WATCHDOG' and interpret
>    it when it gets the immediately following 'RESET' event

WATCHDOG is useful in its own right.  For example, a manager may
decide itself what action to take - such as resetting on the first
three watchdog triggers and then stopping the vm without reset - so
there wouldn't be any other event from qemu about the watchdog.

Because WATCHDOG is useful in some circumstances, I think for
consistency it should always be emitted.

>  - RESET, followed by WATCHDOG + action=reset. The client doesn't know
>    the reason for the RESET and can't wait arbitrarily for WATCHDOG since
>    there might never be one arriving.

Bad.  Avoid :-)

Actually, if there is a problem maintaining event order, this would be
ok as long as RESET includes the reason - then the listener knows to
wait for the WATCHDOG event.

>  - RESET + source=watchdog. Client directly sees the reason

I think this is good, but it should be preceded by the WATCHDOG event as well.

So:

    WATCHDOG action=reset
    RESET reason=watchdog

By the way, if a listener attaches to qemu in the middle of this
operation, is it possible for it to receive one event but not the
other due to timing?

It might make sense to add the concept of "group of events" if this
could be a problem.
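
For example (purely illustrative), related events could carry a shared tag:

    WATCHDOG action=reset group=42
    RESET reason=watchdog group=42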

> The second scenario is the one I'd like us to avoid at all costs, since it
> will require the client to introduce arbitrary delays in processing events
> to determine cause. The first is slightly inconvenient, but doable if we 
> can assume no intervening events will occur between WATCHDOG and the
> RESET events. The last is obviously simplest for the clients.

The last isn't simple for clients that want to know when the watchdog
triggers, independent of reason.  They would have to look for
different kinds of events, depending on how the watchdog is configured.

And, perhaps more importantly, they wouldn't work if more
action-options were added to the watchdog device.

-- Jamie


* Re: [Qemu-devel] Re: Two QMP events issues
  2010-02-08 14:12 ` [Qemu-devel] " Daniel P. Berrange
                     ` (2 preceding siblings ...)
  2010-02-09 19:24   ` Jamie Lokier
@ 2010-02-09 19:32   ` Jamie Lokier
  3 siblings, 0 replies; 12+ messages in thread
From: Jamie Lokier @ 2010-02-09 19:32 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: armbru, aliguori, qemu-devel, Luiz Capitulino

Daniel P. Berrange wrote:
> For further background, the key end goal here is that in a QMP client, upon
> receipt of the 'RESET' event, we need to reliably & immediately determine
> why it occurred, e.g. triggered by watchdog, or by guest OS request. There
> are actually 3 possible sequences:

Note that on some hardware, the OS requests a normal reset by setting
the watchdog to a short timeout and waiting :-)

-- Jamie

