* [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
@ 2013-10-28 15:26 Luwei Cheng
  2013-10-28 15:51 ` Roger Pau Monné
  2013-10-29 15:21 ` David Vrabel
  0 siblings, 2 replies; 26+ messages in thread
From: Luwei Cheng @ 2013-10-28 15:26 UTC (permalink / raw)
  To: xen-devel; +Cc: George.Dunlap, wei.liu2, david.vrabel



The following idea was first discussed with George Dunlap, David Vrabel
and Wei Liu in XenDevSummit13. Many thanks for their encouragement to
post this idea to the community for a wider discussion.

[Current Design]
Each event channel is associated with only “one” notified vCPU: one-to-one.

[Problem]
Some events are per-vCPU (such as local timer interrupts) while some others
are per-OS (such as I/O interrupts: network and disk).
For SMP-VMs, it is possible that while one vCPU is waiting in the scheduling
queue, another vCPU is running. So, if I/O events can be dynamically
routed to the running vCPU, they can be processed quickly, without
suffering from VM scheduling delays (tens of milliseconds). At the same
time, no extra reschedule operations are introduced.

Though users can set IRQ affinity in the guest OS, the current
implementation forces the IRQ to be bound to the first vCPU of the
affinity mask [events.c: set_affinity_irq].
If the hypervisor delivers the event to a different vCPU, the event
will get lost, because the guest OS has masked out this event on all
non-notified vCPUs [events.c: bind_evtchn_to_cpu].
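
To make the current behaviour concrete, here is a toy C model of the effect
described above (an illustration only, not the actual events.c code): the
user-supplied affinity mask is collapsed to its first vCPU, so the hypervisor
always has to notify that one vCPU.

#include <stdint.h>
#include <stdio.h>

/* Toy model only -- not the actual events.c implementation. */
static unsigned int first_vcpu_in_mask(uint64_t affinity_mask)
{
    for (unsigned int v = 0; v < 64; v++)
        if (affinity_mask & (1ULL << v))
            return v;
    return 0; /* fall back to vCPU0 for an empty mask */
}

int main(void)
{
    uint64_t smp_affinity = 0xf; /* user allows vCPU0-3 via /proc/irq/#/smp_affinity */

    /* The event channel nevertheless ends up bound to vCPU0 only. */
    printf("event bound to vCPU%u\n", first_vcpu_in_mask(smp_affinity));
    return 0;
}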

[New Design]
For per-OS event channels, add “vCPU affinity” support: one-to-many.
The “affinity” should be consistent with ‘/proc/irq/#/smp_affinity’ in the
guest OS, and users can change the mapping at runtime. By default,
all vCPUs should be enabled to serve I/O.

When such flexibility is enabled, I/O balancing among vCPUs can be
offloaded to the hypervisor. “irqbalance” is designed for physical
SMP systems, not virtual SMP systems.
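
As a sketch of the routing policy this would allow, here is a toy C model (an
illustration of the idea only, not an existing Xen interface): when an event
becomes pending, the hypervisor picks a notified vCPU from the event's
affinity mask, preferring one that is currently running.

#include <stdint.h>
#include <stdio.h>

/* Toy model of the proposed one-to-many routing -- not actual Xen code. */
static int pick_notify_vcpu(uint64_t affinity, uint64_t running)
{
    uint64_t best = affinity & running;     /* running vCPUs allowed to serve I/O */
    uint64_t pool = best ? best : affinity; /* otherwise any allowed vCPU */

    for (int v = 0; v < 64; v++)
        if (pool & (1ULL << v))
            return v;
    return -1;                              /* empty affinity mask */
}

int main(void)
{
    uint64_t affinity = 0xf; /* default: all four vCPUs may serve I/O */
    uint64_t running  = 0x4; /* only vCPU2 is on a physical CPU right now */

    printf("deliver event to vCPU%d\n", pick_notify_vcpu(affinity, running));
    return 0;
}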

Any comments are welcome!

Thanks,
Luwei
--
Mr. CHENG Luwei, PhD Candidate
Department of Computer Science
The University of Hong Kong


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-28 15:26 [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS? Luwei Cheng
@ 2013-10-28 15:51 ` Roger Pau Monné
  2013-10-29  2:56   ` Luwei Cheng
  2013-10-29 15:21 ` David Vrabel
  1 sibling, 1 reply; 26+ messages in thread
From: Roger Pau Monné @ 2013-10-28 15:51 UTC (permalink / raw)
  To: Luwei Cheng, xen-devel; +Cc: George.Dunlap, wei.liu2, david.vrabel

On 28/10/13 16:26, Luwei Cheng wrote:
> This following idea was first discussed with George Dunlap, David Vrabel 
> and Wei Liu in XenDevSummit13. Many thanks for their encouragement to 
> post this idea to the community for a wider discussion.
> 
> [Current Design]
> Each event channel is associated with only “one” notified vCPU: one-to-one.
> 
> [Problem]
> Some events are per-vCPU (such as local timer interrupts) while some others 
> are per-OS (such as I/O interrupts: network and disk). 
> For SMP-VMs, it is possible that when one vCPU is waiting in the scheduling 
> queue, another vCPU is running. So, if the I/O events can be dynamically 
> routed to the running vCPU, the events can be processed quickly, without 
> suffering from VM scheduling delays (tens of milliseconds). On the other 
> hand, no reschedule operations are introduced.
> 
> Though users can set IRQ affinity in the guest OS, the current 
> implementation forces to bind the IRQ to the first vCPU of the 
> affinity mask [events.c: set_affinity_irq].
> If the hypervisor delivers the event to a different vCPU, the event 
> will get lost because the guest OS has masked out this event in all 
> non-notified vCPUs [events.c: bind_evtchn_to_cpu].
> 
> [New Design]
> For per-OS event channel, add “vCPU affinity” support: one-to-many.
> The “affinity” should be consistent with the ‘/proc/irq/#/smp_affinity’
> of the 
> guest OS and users can change the mapping at runtime. But by default, 
> all vCPUs should be enabled to serve I/O.
> 
> When such flexibility is enabled, I/O balancing among vCPUs can be 
> offloaded to the hypervisor. “irqbalance” is designed for physical 
> SMP systems, not virtual SMP systems.
> 
> Any comments are welcome!

Hello,

Looks interesting, but if IO events can indeed fire on any CPU, isn't
this going to introduce locking (and contention) on the event channel
upcall handler in order to prevent two (or more) CPUs from firing the
same event concurrently?

Roger.


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-28 15:51 ` Roger Pau Monné
@ 2013-10-29  2:56   ` Luwei Cheng
  2013-10-29  8:19     ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Luwei Cheng @ 2013-10-29  2:56 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: George Dunlap, xen-devel, wei.liu2, david.vrabel



On Mon, Oct 28, 2013 at 11:51 PM, Roger Pau Monné <roger.pau@citrix.com> wrote:

> On 28/10/13 16:26, Luwei Cheng wrote:
> > This following idea was first discussed with George Dunlap, David Vrabel
> > and Wei Liu in XenDevSummit13. Many thanks for their encouragement to
> > post this idea to the community for a wider discussion.
> >
> > [Current Design]
> > Each event channel is associated with only “one” notified vCPU:
> one-to-one.
> >
> > [Problem]
> > Some events are per-vCPU (such as local timer interrupts) while some
> others
> > are per-OS (such as I/O interrupts: network and disk).
> > For SMP-VMs, it is possible that when one vCPU is waiting in the
> scheduling
> > queue, another vCPU is running. So, if the I/O events can be dynamically
> > routed to the running vCPU, the events can be processed quickly, without
> > suffering from VM scheduling delays (tens of milliseconds). On the other
> > hand, no reschedule operations are introduced.
> >
> > Though users can set IRQ affinity in the guest OS, the current
> > implementation forces to bind the IRQ to the first vCPU of the
> > affinity mask [events.c: set_affinity_irq].
> > If the hypervisor delivers the event to a different vCPU, the event
> > will get lost because the guest OS has masked out this event in all
> > non-notified vCPUs [events.c: bind_evtchn_to_cpu].
> >
> > [New Design]
> > For per-OS event channel, add “vCPU affinity” support: one-to-many.
> > The “affinity” should be consistent with the ‘/proc/irq/#/smp_affinity’
> > of the
> > guest OS and users can change the mapping at runtime. But by default,
> > all vCPUs should be enabled to serve I/O.
> >
> > When such flexibility is enabled, I/O balancing among vCPUs can be
> > offloaded to the hypervisor. “irqbalance” is designed for physical
> > SMP systems, not virtual SMP systems.
> >
> > Any comments are welcome!
>
> Hello,
>
> Looks interesting, but if IO events can indeed fire on any CPU, isn't
> this going to introduce locking (and contention) on the event channel
> upcall handler in order to prevent two (or more) CPUs from firing the
> same event concurrently?
>
> Roger.
>

Hmm.. though all vCPUs can serve the events, the hypervisor delivers the
event to only "one" vCPU at a time, so only that vCPU can see this event.
Analytically, no race condition will be introduced.

Thanks,
Luwei


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29  2:56   ` Luwei Cheng
@ 2013-10-29  8:19     ` Jan Beulich
  2013-10-29  9:02       ` Luwei Cheng
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2013-10-29  8:19 UTC (permalink / raw)
  To: Luwei Cheng; +Cc: George Dunlap, xen-devel, wei.liu2, david.vrabel, roger.pau

>>> On 29.10.13 at 03:56, Luwei Cheng <chengluwei@gmail.com> wrote:
> Hmm.. though all vCPUs can serve the events, the hypervisor delivers the
> event to only "one" vCPU at at time, so only that vCPU can see this event.
> Analytically no race condition will be introduced.

No - an event is globally pending (at least in the old model; the
situation is better with the new FIFO model), i.e. if more than
one of the guest's vCPU-s allowed to service it were looking
at it simultaneously, they'd still need to arbitrate which one
ought to handle it.

So your proposed extension might need to be limited to the
FIFO model.

Jan
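
To make the arbitration point concrete, here is a toy C model (an illustration
only, not Xen or guest code): with a domain-wide pending bitmap, every vCPU
allowed to service the port has to claim the pending bit atomically, and only
the winner runs the handler.

#include <stdatomic.h>
#include <stdio.h>

/* Toy illustration only -- not Xen or Linux code. */
static _Atomic unsigned long pending_word; /* stands in for the shared bitmap */

static int try_claim_event(int port)
{
    unsigned long bit = 1UL << port;

    /* Returns 1 only for the vCPU that actually cleared the pending bit. */
    return (atomic_fetch_and(&pending_word, ~bit) & bit) != 0;
}

int main(void)
{
    atomic_fetch_or(&pending_word, 1UL << 3); /* Xen marks port 3 pending */

    /* Two vCPUs notice the same bit; only one wins the arbitration. */
    printf("vCPU0 handles event: %d\n", try_claim_event(3));
    printf("vCPU1 handles event: %d\n", try_claim_event(3));
    return 0;
}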


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29  8:19     ` Jan Beulich
@ 2013-10-29  9:02       ` Luwei Cheng
  2013-10-29  9:34         ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Luwei Cheng @ 2013-10-29  9:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, xen-devel, wei.liu2, david.vrabel,
	Roger Pau Monné



On Tue, Oct 29, 2013 at 4:19 PM, Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 29.10.13 at 03:56, Luwei Cheng <chengluwei@gmail.com> wrote:
> > Hmm.. though all vCPUs can serve the events, the hypervisor delivers the
> > event to only "one" vCPU at at time, so only that vCPU can see this
> event.
> > Analytically no race condition will be introduced.
>
> No - an event is globally pending (at least in the old model, the
> situation is better with the new FIFO model), i.e. if more than
> one of the guest's vCPU-s allowed to service it would be looking
> at it simultaneously, they'd still need to arbitrate which one
> ought to handle it.
>
> So your proposed extension might need to be limited to the
> FIFO model.
>
> Jan
>
Thanks for your reply. Yes, you are right. My prior description was
incorrect.
If more than one vCPU picks up the event, even without arbitration,
will it cause a "correctness" problem? After the event is served by
the first vCPU to enter the handler, the remaining vCPUs simply have
nothing to do in the event handler (not much harm).

Luwei


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29  9:02       ` Luwei Cheng
@ 2013-10-29  9:34         ` Jan Beulich
  2013-10-29  9:49           ` Luwei Cheng
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2013-10-29  9:34 UTC (permalink / raw)
  To: Luwei Cheng; +Cc: George Dunlap, xen-devel, wei.liu2, david.vrabel, roger.pau

>>> On 29.10.13 at 10:02, Luwei Cheng <chengluwei@gmail.com> wrote:
> On Tue, Oct 29, 2013 at 4:19 PM, Jan Beulich <JBeulich@suse.com> wrote:
>> >>> On 29.10.13 at 03:56, Luwei Cheng <chengluwei@gmail.com> wrote:
>> > Hmm.. though all vCPUs can serve the events, the hypervisor delivers the
>> > event to only "one" vCPU at at time, so only that vCPU can see this
>> event.
>> > Analytically no race condition will be introduced.
>>
>> No - an event is globally pending (at least in the old model, the
>> situation is better with the new FIFO model), i.e. if more than
>> one of the guest's vCPU-s allowed to service it would be looking
>> at it simultaneously, they'd still need to arbitrate which one
>> ought to handle it.
>>
>> So your proposed extension might need to be limited to the
>> FIFO model.
> 
> Thanks for your reply. Yes, you are right. My prior description was
> incorrect.
> When there are more than one vCPUs picking the event, even without
> arbitrary, will it cause "correctness" problem? After the event is served by
> the first entered vCPU, and the rest vCPUs just have noting to do in the
> event handler (no much harm).

That really depends on the handler. Plus it might be a performance
and/or latency issue to run handlers that don't need to be run.

Jan


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29  9:34         ` Jan Beulich
@ 2013-10-29  9:49           ` Luwei Cheng
  2013-10-29  9:57             ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Luwei Cheng @ 2013-10-29  9:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, xen-devel, wei.liu2, david.vrabel,
	Roger Pau Monné



On Tue, Oct 29, 2013 at 5:34 PM, Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 29.10.13 at 10:02, Luwei Cheng <chengluwei@gmail.com> wrote:
> > On Tue, Oct 29, 2013 at 4:19 PM, Jan Beulich <JBeulich@suse.com> wrote:
> >> >>> On 29.10.13 at 03:56, Luwei Cheng <chengluwei@gmail.com> wrote:
> >> > Hmm.. though all vCPUs can serve the events, the hypervisor delivers
> the
> >> > event to only "one" vCPU at at time, so only that vCPU can see this
> >> event.
> >> > Analytically no race condition will be introduced.
> >>
> >> No - an event is globally pending (at least in the old model, the
> >> situation is better with the new FIFO model), i.e. if more than
> >> one of the guest's vCPU-s allowed to service it would be looking
> >> at it simultaneously, they'd still need to arbitrate which one
> >> ought to handle it.
> >>
> >> So your proposed extension might need to be limited to the
> >> FIFO model.
> >
> > Thanks for your reply. Yes, you are right. My prior description was
> > incorrect.
> > When there are more than one vCPUs picking the event, even without
> > arbitrary, will it cause "correctness" problem? After the event is
> served by
> > the first entered vCPU, and the rest vCPUs just have noting to do in the
> > event handler (no much harm).
>
> That really depends on the handler. Plus it might be a performance
> and/or latency issue to run handlers that don't need to be run.
>
> Jan
>
I think the situation is much like IO-APIC routing in physical SMP systems:
in logical destination mode, all processors can serve I/O interrupts.
The current IRQ handlers seem able to deal with this gracefully.
Compared with the potential latency issue, I think the gain of this
approach is bigger: avoiding vCPU scheduling delays (tens of milliseconds).

Thanks,
Luwei


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29  9:49           ` Luwei Cheng
@ 2013-10-29  9:57             ` Jan Beulich
  2013-10-29 10:52               ` George Dunlap
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2013-10-29  9:57 UTC (permalink / raw)
  To: Luwei Cheng; +Cc: George Dunlap, xen-devel, wei.liu2, david.vrabel, roger.pau

>>> On 29.10.13 at 10:49, Luwei Cheng <chengluwei@gmail.com> wrote:
> On Tue, Oct 29, 2013 at 5:34 PM, Jan Beulich <JBeulich@suse.com> wrote:
>> >>> On 29.10.13 at 10:02, Luwei Cheng <chengluwei@gmail.com> wrote:
>> > On Tue, Oct 29, 2013 at 4:19 PM, Jan Beulich <JBeulich@suse.com> wrote:
>> >> >>> On 29.10.13 at 03:56, Luwei Cheng <chengluwei@gmail.com> wrote:
>> >> > Hmm.. though all vCPUs can serve the events, the hypervisor delivers the
>> >> > event to only "one" vCPU at at time, so only that vCPU can see this event.
>> >> > Analytically no race condition will be introduced.
>> >>
>> >> No - an event is globally pending (at least in the old model, the
>> >> situation is better with the new FIFO model), i.e. if more than
>> >> one of the guest's vCPU-s allowed to service it would be looking
>> >> at it simultaneously, they'd still need to arbitrate which one
>> >> ought to handle it.
>> >>
>> >> So your proposed extension might need to be limited to the
>> >> FIFO model.
>> >
>> > Thanks for your reply. Yes, you are right. My prior description was
>> > incorrect.
>> > When there are more than one vCPUs picking the event, even without
>> > arbitrary, will it cause "correctness" problem? After the event is
>> served by
>> > the first entered vCPU, and the rest vCPUs just have noting to do in the
>> > event handler (no much harm).
>>
>> That really depends on the handler. Plus it might be a performance
>> and/or latency issue to run handlers that don't need to be run.
> 
> I think the situation is much like IO-APIC routing in physical SMP
> systems:

Indeed, yet you draw the wrong conclusion.

> in logical destination mode, all processors can serve I/O interrupts.

But any individual instance gets delivered to only one of them - there is
arbitration being done in hardware.

Jan


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29  9:57             ` Jan Beulich
@ 2013-10-29 10:52               ` George Dunlap
  2013-10-29 11:00                 ` Roger Pau Monné
  2013-10-29 11:22                 ` Jan Beulich
  0 siblings, 2 replies; 26+ messages in thread
From: George Dunlap @ 2013-10-29 10:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Luwei Cheng, xen-devel, wei.liu2, david.vrabel, roger.pau

On 10/29/2013 09:57 AM, Jan Beulich wrote:
>>>> On 29.10.13 at 10:49, Luwei Cheng <chengluwei@gmail.com> wrote:
>> On Tue, Oct 29, 2013 at 5:34 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 29.10.13 at 10:02, Luwei Cheng <chengluwei@gmail.com> wrote:
>>>> On Tue, Oct 29, 2013 at 4:19 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>> On 29.10.13 at 03:56, Luwei Cheng <chengluwei@gmail.com> wrote:
>>>>>> Hmm.. though all vCPUs can serve the events, the hypervisor delivers the
>>>>>> event to only "one" vCPU at at time, so only that vCPU can see this event.
>>>>>> Analytically no race condition will be introduced.
>>>>>
>>>>> No - an event is globally pending (at least in the old model, the
>>>>> situation is better with the new FIFO model), i.e. if more than
>>>>> one of the guest's vCPU-s allowed to service it would be looking
>>>>> at it simultaneously, they'd still need to arbitrate which one
>>>>> ought to handle it.
>>>>>
>>>>> So your proposed extension might need to be limited to the
>>>>> FIFO model.
>>>>
>>>> Thanks for your reply. Yes, you are right. My prior description was
>>>> incorrect.
>>>> When there are more than one vCPUs picking the event, even without
>>>> arbitrary, will it cause "correctness" problem? After the event is
>>> served by
>>>> the first entered vCPU, and the rest vCPUs just have noting to do in the
>>>> event handler (no much harm).
>>>
>>> That really depends on the handler. Plus it might be a performance
>>> and/or latency issue to run handlers that don't need to be run.
>>
>> I think the situation is much like IO-APIC routing in physical SMP
>> systems:
>
> Indeed, yet you draw the wrong conclusion.
>
>> in logical destination mode, all processors can serve I/O interrupts.
>
> But only one gets delivered any individual instance - there is
> arbitration being done in hardware.

Xen should be able to arbitrate which one gets the actual event 
delivery, right?  So the only risk would be that another vcpu would 
notice the pending interrupt and handle it itself.

  -George


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29 10:52               ` George Dunlap
@ 2013-10-29 11:00                 ` Roger Pau Monné
  2013-10-29 14:20                   ` Luwei Cheng
  2013-10-29 11:22                 ` Jan Beulich
  1 sibling, 1 reply; 26+ messages in thread
From: Roger Pau Monné @ 2013-10-29 11:00 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich; +Cc: Luwei Cheng, xen-devel, wei.liu2, david.vrabel

On 29/10/13 11:52, George Dunlap wrote:
> On 10/29/2013 09:57 AM, Jan Beulich wrote:
>>>>> On 29.10.13 at 10:49, Luwei Cheng <chengluwei@gmail.com> wrote:
>>> On Tue, Oct 29, 2013 at 5:34 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 29.10.13 at 10:02, Luwei Cheng <chengluwei@gmail.com> wrote:
>>>>> On Tue, Oct 29, 2013 at 4:19 PM, Jan Beulich <JBeulich@suse.com>
>>>>> wrote:
>>>>>>>>> On 29.10.13 at 03:56, Luwei Cheng <chengluwei@gmail.com> wrote:
>>>>>>> Hmm.. though all vCPUs can serve the events, the hypervisor
>>>>>>> delivers the
>>>>>>> event to only "one" vCPU at at time, so only that vCPU can see
>>>>>>> this event.
>>>>>>> Analytically no race condition will be introduced.
>>>>>>
>>>>>> No - an event is globally pending (at least in the old model, the
>>>>>> situation is better with the new FIFO model), i.e. if more than
>>>>>> one of the guest's vCPU-s allowed to service it would be looking
>>>>>> at it simultaneously, they'd still need to arbitrate which one
>>>>>> ought to handle it.
>>>>>>
>>>>>> So your proposed extension might need to be limited to the
>>>>>> FIFO model.
>>>>>
>>>>> Thanks for your reply. Yes, you are right. My prior description was
>>>>> incorrect.
>>>>> When there are more than one vCPUs picking the event, even without
>>>>> arbitrary, will it cause "correctness" problem? After the event is
>>>> served by
>>>>> the first entered vCPU, and the rest vCPUs just have noting to do
>>>>> in the
>>>>> event handler (no much harm).
>>>>
>>>> That really depends on the handler. Plus it might be a performance
>>>> and/or latency issue to run handlers that don't need to be run.
>>>
>>> I think the situation is much like IO-APIC routing in physical SMP
>>> systems:
>>
>> Indeed, yet you draw the wrong conclusion.
>>
>>> in logical destination mode, all processors can serve I/O interrupts.
>>
>> But only one gets delivered any individual instance - there is
>> arbitration being done in hardware.
> 
> Xen should be able to arbitrate which one gets the actual event
> delivery, right?  So the only risk would be that another vcpu would
> notice the pending interrupt and handle it itself.

If events are no longer assigned to a single CPU there's no guarantee
that the CPU you deliver the event to is the one that's actually going
to handle it: another CPU might already be in the event channel upcall
and steal it from under your feet (or even worse, the event could be
fired on several CPUs at the same time, at least with the current
implementation).


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29 10:52               ` George Dunlap
  2013-10-29 11:00                 ` Roger Pau Monné
@ 2013-10-29 11:22                 ` Jan Beulich
  2013-10-29 14:28                   ` Luwei Cheng
  1 sibling, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2013-10-29 11:22 UTC (permalink / raw)
  To: George Dunlap; +Cc: Luwei Cheng, xen-devel, wei.liu2, david.vrabel, roger.pau

>>> On 29.10.13 at 11:52, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> Xen should be able to arbitrate which one gets the actual event 
> delivery, right?  So the only risk would be that another vcpu would 
> notice the pending interrupt and handle it itself.

As said before - for the FIFO model Xen's arbitration would be
sufficient (as long as affinity changes get carried out with
sufficient care), but for the legacy model several vCPU-s might
end up trying to service the event (since the pending bitmap is
per-domain, not per-vCPU).

Jan


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29 11:00                 ` Roger Pau Monné
@ 2013-10-29 14:20                   ` Luwei Cheng
  2013-10-29 14:30                     ` Wei Liu
  0 siblings, 1 reply; 26+ messages in thread
From: Luwei Cheng @ 2013-10-29 14:20 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: George Dunlap, xen-devel, wei.liu2, david.vrabel, Jan Beulich



On Tue, Oct 29, 2013 at 7:00 PM, Roger Pau Monné <roger.pau@citrix.com> wrote:

> On 29/10/13 11:52, George Dunlap wrote:
> > On 10/29/2013 09:57 AM, Jan Beulich wrote:
> >>>>> On 29.10.13 at 10:49, Luwei Cheng <chengluwei@gmail.com> wrote:
> >>> On Tue, Oct 29, 2013 at 5:34 PM, Jan Beulich <JBeulich@suse.com>
> wrote:
> >>>>>>> On 29.10.13 at 10:02, Luwei Cheng <chengluwei@gmail.com> wrote:
> >>>>> On Tue, Oct 29, 2013 at 4:19 PM, Jan Beulich <JBeulich@suse.com>
> >>>>> wrote:
> >>>>>>>>> On 29.10.13 at 03:56, Luwei Cheng <chengluwei@gmail.com> wrote:
> >>>>>>> Hmm.. though all vCPUs can serve the events, the hypervisor
> >>>>>>> delivers the
> >>>>>>> event to only "one" vCPU at at time, so only that vCPU can see
> >>>>>>> this event.
> >>>>>>> Analytically no race condition will be introduced.
> >>>>>>
> >>>>>> No - an event is globally pending (at least in the old model, the
> >>>>>> situation is better with the new FIFO model), i.e. if more than
> >>>>>> one of the guest's vCPU-s allowed to service it would be looking
> >>>>>> at it simultaneously, they'd still need to arbitrate which one
> >>>>>> ought to handle it.
> >>>>>>
> >>>>>> So your proposed extension might need to be limited to the
> >>>>>> FIFO model.
> >>>>>
> >>>>> Thanks for your reply. Yes, you are right. My prior description was
> >>>>> incorrect.
> >>>>> When there are more than one vCPUs picking the event, even without
> >>>>> arbitrary, will it cause "correctness" problem? After the event is
> >>>> served by
> >>>>> the first entered vCPU, and the rest vCPUs just have noting to do
> >>>>> in the
> >>>>> event handler (no much harm).
> >>>>
> >>>> That really depends on the handler. Plus it might be a performance
> >>>> and/or latency issue to run handlers that don't need to be run.
> >>>
> >>> I think the situation is much like IO-APIC routing in physical SMP
> >>> systems:
> >>
> >> Indeed, yet you draw the wrong conclusion.
> >>
> >>> in logical destination mode, all processors can serve I/O interrupts.
> >>
> >> But only one gets delivered any individual instance - there is
> >> arbitration being done in hardware.
> >
> > Xen should be able to arbitrate which one gets the actual event
> > delivery, right?  So the only risk would be that another vcpu would
> > notice the pending interrupt and handle it itself.
>
> If events are no longer assigned to a single CPU there's no guarantee
> that the CPU you deliver the event to is the one that's actually going
> to handle it, another CPU might be already in the event channel upcall
> and stole it from under your feet (or event worse, the event could be
> fired on several CPUs at the same time, at least with the current
> implementation).
>
The goal is to process the event asap. So, if the event is indeed stolen by
another vCPU, we should be happy about it, because it means that the event
can be processed "faster", before the targeted vCPU picks it up :)

With the current implementation, the upcall only happens when the processor
switches from the hypervisor world to the guest world. It seems that the
likelihood that such a "switch" happens on multiple CPUs at the same time is
very small.
Even if the event fires on several vCPUs, what is the negative effect?
Is the guest OS able to tolerate it (reentrant IRQ handler)?

Thanks,
Luwei


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29 11:22                 ` Jan Beulich
@ 2013-10-29 14:28                   ` Luwei Cheng
  2013-10-29 14:42                     ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Luwei Cheng @ 2013-10-29 14:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, xen-devel, wei.liu2, david.vrabel,
	Roger Pau Monné



On Tue, Oct 29, 2013 at 7:22 PM, Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 29.10.13 at 11:52, George Dunlap <george.dunlap@eu.citrix.com>
> wrote:
> > Xen should be able to arbitrate which one gets the actual event
> > delivery, right?  So the only risk would be that another vcpu would
> > notice the pending interrupt and handle it itself.
>
> As said before - for the FIFO model Xen's arbitration would be
> sufficient (as long as affinity changes get carried out with
> sufficient care), but for the legacy model several vCPU-s might
> end up trying to service the event (since the pending bitmap is
> per-domain, not per-vCPU)..
>
> Jan
>
As long as the event can be served quickly, and meanwhile there is no
correctness problem (hopefully), do we really care which vCPU serves it? :D

Thanks,
Luwei


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29 14:20                   ` Luwei Cheng
@ 2013-10-29 14:30                     ` Wei Liu
  2013-10-29 14:43                       ` Luwei Cheng
  0 siblings, 1 reply; 26+ messages in thread
From: Wei Liu @ 2013-10-29 14:30 UTC (permalink / raw)
  To: Luwei Cheng
  Cc: wei.liu2, George Dunlap, david.vrabel, Jan Beulich, xen-devel,
	Roger Pau Monné

On Tue, Oct 29, 2013 at 10:20:46PM +0800, Luwei Cheng wrote:
[...]
> >
> > If events are no longer assigned to a single CPU there's no guarantee
> > that the CPU you deliver the event to is the one that's actually going
> > to handle it, another CPU might be already in the event channel upcall
> > and stole it from under your feet (or event worse, the event could be
> > fired on several CPUs at the same time, at least with the current
> > implementation).
> >
> > The goal is: to process the event asap. So, if the event is indeed stolen
> by
> another vCPU, we should be happy about it because it means that the event
> can be processed "faster”, before the targeted vCPU picks it:)
> 
> With current implementation, the upcall only happens when the processor
> switches from the hypervisor world to the guest world. It seems that the
> likelihood that, such"switch" happens on multiple CPUs at the same time, is
> very small.
> Even if the event fires on several vCPUs, what is the negative effect..?
> Is the guest OS able to tolerate it (reentrant IRQ handler)?
> 

As Jan said, it depends. It is certain that unnecessary calls to handlers
introduce overhead (however small).

Furthermore, with your proposed scheme, it looks like you would need to
introduce locks to protect critical regions, if there are any. This can
introduce overhead as well.

Wei.

> Thanks,
> Luwei


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29 14:28                   ` Luwei Cheng
@ 2013-10-29 14:42                     ` Jan Beulich
  2013-10-29 15:20                       ` Luwei Cheng
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2013-10-29 14:42 UTC (permalink / raw)
  To: Luwei Cheng; +Cc: George Dunlap, xen-devel, wei.liu2, david.vrabel, roger.pau

>>> On 29.10.13 at 15:28, Luwei Cheng <chengluwei@gmail.com> wrote:
> On Tue, Oct 29, 2013 at 7:22 PM, Jan Beulich <JBeulich@suse.com> wrote:
>> >>> On 29.10.13 at 11:52, George Dunlap <george.dunlap@eu.citrix.com>
>> wrote:
>> > Xen should be able to arbitrate which one gets the actual event
>> > delivery, right?  So the only risk would be that another vcpu would
>> > notice the pending interrupt and handle it itself.
>>
>> As said before - for the FIFO model Xen's arbitration would be
>> sufficient (as long as affinity changes get carried out with
>> sufficient care), but for the legacy model several vCPU-s might
>> end up trying to service the event (since the pending bitmap is
>> per-domain, not per-vCPU)..
>>
>> As long as the event can be served quickly, and meanwhile there is no
> correctness
> problem (hopefully), do we really care which vCPU serves it..:d ?

No, we don't care. But we do care that it is exactly one that
does.

Jan


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29 14:30                     ` Wei Liu
@ 2013-10-29 14:43                       ` Luwei Cheng
  2013-10-29 15:25                         ` Wei Liu
  0 siblings, 1 reply; 26+ messages in thread
From: Luwei Cheng @ 2013-10-29 14:43 UTC (permalink / raw)
  To: Wei Liu
  Cc: George Dunlap, xen-devel, david.vrabel, Jan Beulich,
	Roger Pau Monné



On Tue, Oct 29, 2013 at 10:30 PM, Wei Liu <wei.liu2@citrix.com> wrote:

> On Tue, Oct 29, 2013 at 10:20:46PM +0800, Luwei Cheng wrote:
> [...]
> > >
> > > If events are no longer assigned to a single CPU there's no guarantee
> > > that the CPU you deliver the event to is the one that's actually going
> > > to handle it, another CPU might be already in the event channel upcall
> > > and stole it from under your feet (or event worse, the event could be
> > > fired on several CPUs at the same time, at least with the current
> > > implementation).
> > >
> > > The goal is: to process the event asap. So, if the event is indeed
> stolen
> > by
> > another vCPU, we should be happy about it because it means that the event
> > can be processed "faster”, before the targeted vCPU picks it:)
> >
> > With current implementation, the upcall only happens when the processor
> > switches from the hypervisor world to the guest world. It seems that the
> > likelihood that, such"switch" happens on multiple CPUs at the same time,
> is
> > very small.
> > Even if the event fires on several vCPUs, what is the negative effect..?
> > Is the guest OS able to tolerate it (reentrant IRQ handler)?
> >
>
> As Jan said, it depends. It is sure that unnecessary call to handlers
> introduce overhead (however small).
>
> Furthurmore, with your proposed scheme, it looks like you would need to
> introduce locks to protect critical regions if there's any. This can
> introduce overhead as well.
>
> Wei.
>
>
Thanks, Wei, for your comment. Let's compare the pros with the cons:

[Benefit]:
avoid long vCPU scheduling delays (tens of milliseconds), without introducing
additional reschedule operations

[Possible negative effect]:
the latency due to unnecessary calls to handlers on other vCPUs
(microseconds or nanoseconds?)

So... which side should we prefer?

Thanks,
Luwei


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29 14:42                     ` Jan Beulich
@ 2013-10-29 15:20                       ` Luwei Cheng
  2013-10-29 16:37                         ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Luwei Cheng @ 2013-10-29 15:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, xen-devel, Wei Liu, david.vrabel,
	Roger Pau Monné



On Tue, Oct 29, 2013 at 10:42 PM, Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 29.10.13 at 15:28, Luwei Cheng <chengluwei@gmail.com> wrote:
> > On Tue, Oct 29, 2013 at 7:22 PM, Jan Beulich <JBeulich@suse.com> wrote:
> >> >>> On 29.10.13 at 11:52, George Dunlap <george.dunlap@eu.citrix.com>
> >> wrote:
> >> > Xen should be able to arbitrate which one gets the actual event
> >> > delivery, right?  So the only risk would be that another vcpu would
> >> > notice the pending interrupt and handle it itself.
> >>
> >> As said before - for the FIFO model Xen's arbitration would be
> >> sufficient (as long as affinity changes get carried out with
> >> sufficient care), but for the legacy model several vCPU-s might
> >> end up trying to service the event (since the pending bitmap is
> >> per-domain, not per-vCPU)..
> >>
> >> As long as the event can be served quickly, and meanwhile there is no
> > correctness
> > problem (hopefully), do we really care which vCPU serves it..:d ?
>
> No, we don't care. But we do care that it is exactly one that
> does.
>
> Jan
>
The I/O event is marked as pending for "the whole guest OS", not
pending for a specific vCPU. Tickling a vCPU does not necessarily mean
that the event is exactly for that vCPU; it can mean that we give the
tickled vCPU a chance to serve it, while not refusing other vCPUs the
chance to get in early.
Not sure whether my argument is sensible or not :)

Thanks,
Luwei


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-28 15:26 [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS? Luwei Cheng
  2013-10-28 15:51 ` Roger Pau Monné
@ 2013-10-29 15:21 ` David Vrabel
  2013-10-30  7:35   ` Luwei Cheng
  1 sibling, 1 reply; 26+ messages in thread
From: David Vrabel @ 2013-10-29 15:21 UTC (permalink / raw)
  To: Luwei Cheng, xen-devel; +Cc: George.Dunlap, wei.liu2, david.vrabel

On 28/10/2013 15:26, Luwei Cheng wrote:
> This following idea was first discussed with George Dunlap, David Vrabel 
> and Wei Liu in XenDevSummit13. Many thanks for their encouragement to 
> post this idea to the community for a wider discussion.
> 
> [Current Design]
> Each event channel is associated with only “one” notified vCPU: one-to-one.
> 
> [Problem]
> Some events are per-vCPU (such as local timer interrupts) while some others 
> are per-OS (such as I/O interrupts: network and disk). 
> For SMP-VMs, it is possible that when one vCPU is waiting in the scheduling 
> queue, another vCPU is running. So, if the I/O events can be dynamically 
> routed to the running vCPU, the events can be processed quickly, without 
> suffering from VM scheduling delays (tens of milliseconds). On the other 
> hand, no reschedule operations are introduced.
> 
> Though users can set IRQ affinity in the guest OS, the current 
> implementation forces to bind the IRQ to the first vCPU of the 
> affinity mask [events.c: set_affinity_irq].
> If the hypervisor delivers the event to a different vCPU, the event 
> will get lost because the guest OS has masked out this event in all 
> non-notified vCPUs [events.c: bind_evtchn_to_cpu].
> 
> [New Design]
> For per-OS event channel, add “vCPU affinity” support: one-to-many.
> The “affinity” should be consistent with the ‘/proc/irq/#/smp_affinity’
> of the 
> guest OS and users can change the mapping at runtime. But by default, 
> all vCPUs should be enabled to serve I/O.
> 
> When such flexibility is enabled, I/O balancing among vCPUs can be 
> offloaded to the hypervisor. “irqbalance” is designed for physical 
> SMP systems, not virtual SMP systems.

It's an interesting idea but I'm not sure how useful it will be in
practise as often work is deferred to threads in the guest rather than
done directly in the interrupt handler.

I don't see any way this could be implemented using the 2-level ABI.

With the FIFO ABI, queues cannot move between VCPUs without some
additional locking (dequeuing an event is only safe with a single
consumer) but it may be possible (when an event is set pending) for Xen
to pick a queue from a set of queues, instead of always using the same
queue.

I don't think this would result in balanced I/O between VCPUs, but the
opposite -- events would crowd onto the few VCPUs that are currently
running.

David
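
A toy C sketch of that queue-selection idea, under the assumption of one
single-consumer event queue per vCPU (an illustration only, not the real FIFO
ABI implementation): Xen chooses which queue to append a newly pending event
to, preferring a running vCPU from the event's affinity set.

#include <stdint.h>
#include <stdio.h>

/* Toy model only -- not the real FIFO event channel code. */
#define NR_VCPUS 4

/* Each vCPU owns one queue with a single consumer (that vCPU itself). */
static int queue_depth[NR_VCPUS];

static int pick_queue(uint32_t affinity, uint32_t running)
{
    /* Prefer a running vCPU from the affinity set; else any allowed vCPU. */
    uint32_t pool = (affinity & running) ? (affinity & running) : affinity;

    for (int v = 0; v < NR_VCPUS; v++)
        if (pool & (1u << v))
            return v;
    return 0;
}

int main(void)
{
    uint32_t affinity = 0xf; /* all vCPUs may serve this event */
    uint32_t running  = 0x2; /* only vCPU1 is running right now */

    int q = pick_queue(affinity, running);
    queue_depth[q]++;        /* "link" the newly pending event into that queue */
    printf("event appended to vCPU%d's queue (depth %d)\n", q, queue_depth[q]);
    return 0;
}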


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29 14:43                       ` Luwei Cheng
@ 2013-10-29 15:25                         ` Wei Liu
  2013-10-30  7:40                           ` Luwei Cheng
  0 siblings, 1 reply; 26+ messages in thread
From: Wei Liu @ 2013-10-29 15:25 UTC (permalink / raw)
  To: Luwei Cheng
  Cc: Wei Liu, George Dunlap, david.vrabel, Jan Beulich, xen-devel,
	Roger Pau Monné

On Tue, Oct 29, 2013 at 10:43:34PM +0800, Luwei Cheng wrote:
> On Tue, Oct 29, 2013 at 10:30 PM, Wei Liu <wei.liu2@citrix.com> wrote:
> 
> > On Tue, Oct 29, 2013 at 10:20:46PM +0800, Luwei Cheng wrote:
> > [...]
> > > >
> > > > If events are no longer assigned to a single CPU there's no guarantee
> > > > that the CPU you deliver the event to is the one that's actually going
> > > > to handle it, another CPU might be already in the event channel upcall
> > > > and stole it from under your feet (or event worse, the event could be
> > > > fired on several CPUs at the same time, at least with the current
> > > > implementation).
> > > >
> > > > The goal is: to process the event asap. So, if the event is indeed
> > stolen
> > > by
> > > another vCPU, we should be happy about it because it means that the event
> > > can be processed "faster”, before the targeted vCPU picks it:)
> > >
> > > With current implementation, the upcall only happens when the processor
> > > switches from the hypervisor world to the guest world. It seems that the
> > > likelihood that, such"switch" happens on multiple CPUs at the same time,
> > is
> > > very small.
> > > Even if the event fires on several vCPUs, what is the negative effect..?
> > > Is the guest OS able to tolerate it (reentrant IRQ handler)?
> > >
> >
> > As Jan said, it depends. It is sure that unnecessary call to handlers
> > introduce overhead (however small).
> >
> > Furthurmore, with your proposed scheme, it looks like you would need to
> > introduce locks to protect critical regions if there's any. This can
> > introduce overhead as well.
> >
> > Wei.
> >
> >
> Thanks Wei for your comment. Let's compare the cons with pros:
> 
> [Benefit]:
> avoid long vCPU scheduling delays (10x ms), without introducing additional
> reschedule operations
> 
> [Negative effect, possible]:
> the latency due to unnecessary call to handlers on other vCPUs
> (micro-second or nano-second?)
> 

What I mean is that you will introduce a latency / performance penalty
from the locks needed to protect critical sections. Say, if several CPUs
contend for the same event, overall performance might degrade.

> So, ... which side we should prefer?
> 
> Thanks,
> Luwei


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29 15:20                       ` Luwei Cheng
@ 2013-10-29 16:37                         ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2013-10-29 16:37 UTC (permalink / raw)
  To: Luwei Cheng; +Cc: George Dunlap, xen-devel, Wei Liu, david.vrabel, roger.pau

>>> On 29.10.13 at 16:20, Luwei Cheng <chengluwei@gmail.com> wrote:
> Since the I/O event is marked as pending for "the whole guest OS", not
> pending for a specific vCPU. To tickle a vCPU does not necessarily mean
> that the event is exactly for that vCPU, but can mean that we give the
> tickled vCPU a chance to serve it. But we do not refuse other vCPUs to
> get in early.
> Not sure whether my argument is sensible or not..:)

It's sensible, but it's only covering one side of the whole thing:
Avoiding the tickling. You continue to ignore the added
complexities in the code handling the event.

Jan


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29 15:21 ` David Vrabel
@ 2013-10-30  7:35   ` Luwei Cheng
  2013-10-30  8:45     ` Roger Pau Monné
  2013-10-30  8:45     ` Roger Pau Monné
  0 siblings, 2 replies; 26+ messages in thread
From: Luwei Cheng @ 2013-10-30  7:35 UTC (permalink / raw)
  To: David Vrabel; +Cc: George Dunlap, xen-devel, Wei Liu, david.vrabel



On Tue, Oct 29, 2013 at 11:21 PM, David Vrabel <dvrabel@cantab.net> wrote:

> On 28/10/2013 15:26, Luwei Cheng wrote:
> > This following idea was first discussed with George Dunlap, David Vrabel
> > and Wei Liu in XenDevSummit13. Many thanks for their encouragement to
> > post this idea to the community for a wider discussion.
> >
> > [Current Design]
> > Each event channel is associated with only “one” notified vCPU:
> one-to-one.
> >
> > [Problem]
> > Some events are per-vCPU (such as local timer interrupts) while some
> others
> > are per-OS (such as I/O interrupts: network and disk).
> > For SMP-VMs, it is possible that when one vCPU is waiting in the
> scheduling
> > queue, another vCPU is running. So, if the I/O events can be dynamically
> > routed to the running vCPU, the events can be processed quickly, without
> > suffering from VM scheduling delays (tens of milliseconds). On the other
> > hand, no reschedule operations are introduced.
> >
> > Though users can set IRQ affinity in the guest OS, the current
> > implementation forces to bind the IRQ to the first vCPU of the
> > affinity mask [events.c: set_affinity_irq].
> > If the hypervisor delivers the event to a different vCPU, the event
> > will get lost because the guest OS has masked out this event in all
> > non-notified vCPUs [events.c: bind_evtchn_to_cpu].
> >
> > [New Design]
> > For per-OS event channel, add “vCPU affinity” support: one-to-many.
> > The “affinity” should be consistent with the ‘/proc/irq/#/smp_affinity’
> > of the
> > guest OS and users can change the mapping at runtime. But by default,
> > all vCPUs should be enabled to serve I/O.
> >
> > When such flexibility is enabled, I/O balancing among vCPUs can be
> > offloaded to the hypervisor. “irqbalance” is designed for physical
> > SMP systems, not virtual SMP systems.
>
Thanks for your response, David.

>  It's an interesting idea but I'm not sure how useful it will be in
> practise as often work is deferred to threads in the guest rather than
> done directly in the interrupt handler.

Sure, but if the interrupt handler is not called in a timely manner, no IRQ
threads will be created either.


> I don't see any way this could be implemented using the 2-level ABI.

Probably the implementation does not need to touch the 2-level ABI.

> With the FIFO ABI, queues cannot move between VCPUs without some
> additional locking (dequeuing an event is only safe with a single
> consumer) but it may be possible (when an event is set pending) for Xen
> to pick a queue from a set of queues, instead of always using the same
> queue.
>
> I don't think this would result in balanced I/O between VCPUs, but the
> opposite -- events would crowd onto the few VCPUs that are currently
> running.
>
I think it is the hypervisor that plays the role of deciding which vCPU
should be kicked to serve I/O. Different routing policies result in
different outcomes.
Since all vCPUs are symmetrically scheduled, the events can therefore be
evenly distributed across them. At one moment vCPUx is running, while at
another moment vCPUy is running. So, the events will not always crowd onto
a very few of them.

Currently, all I/O events are bound to vCPU0, which is just like what you
said: events would crowd onto that vCPU. As a result, vCPU0 consumes many
more CPU cycles than the other ones, leading to unfairness. If some of this
workload can be dynamically migrated to other vCPUs, I believe we can get
at least some benefit.

Thanks,
Luwei


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-29 15:25                         ` Wei Liu
@ 2013-10-30  7:40                           ` Luwei Cheng
  2013-10-30 10:27                             ` Wei Liu
  0 siblings, 1 reply; 26+ messages in thread
From: Luwei Cheng @ 2013-10-30  7:40 UTC (permalink / raw)
  To: Wei Liu
  Cc: George Dunlap, xen-devel, david.vrabel, Jan Beulich,
	Roger Pau Monné



On Tue, Oct 29, 2013 at 11:25 PM, Wei Liu <wei.liu2@citrix.com> wrote:

> On Tue, Oct 29, 2013 at 10:43:34PM +0800, Luwei Cheng wrote:
> > On Tue, Oct 29, 2013 at 10:30 PM, Wei Liu <wei.liu2@citrix.com> wrote:
> >
> > > On Tue, Oct 29, 2013 at 10:20:46PM +0800, Luwei Cheng wrote:
> > > [...]
> > > > >
> > > > > If events are no longer assigned to a single CPU there's no
> guarantee
> > > > > that the CPU you deliver the event to is the one that's actually
> going
> > > > > to handle it, another CPU might be already in the event channel
> upcall
> > > > > and stole it from under your feet (or event worse, the event could
> be
> > > > > fired on several CPUs at the same time, at least with the current
> > > > > implementation).
> > > > >
> > > > > The goal is: to process the event asap. So, if the event is indeed
> > > stolen
> > > > by
> > > > another vCPU, we should be happy about it because it means that the
> event
> > > > can be processed "faster”, before the targeted vCPU picks it:)
> > > >
> > > > With current implementation, the upcall only happens when the
> processor
> > > > switches from the hypervisor world to the guest world. It seems that
> the
> > > > likelihood that, such"switch" happens on multiple CPUs at the same
> time,
> > > is
> > > > very small.
> > > > Even if the event fires on several vCPUs, what is the negative
> effect..?
> > > > Is the guest OS able to tolerate it (reentrant IRQ handler)?
> > > >
> > >
> > > As Jan said, it depends. It is sure that unnecessary call to handlers
> > > introduce overhead (however small).
> > >
> > > Furthurmore, with your proposed scheme, it looks like you would need to
> > > introduce locks to protect critical regions if there's any. This can
> > > introduce overhead as well.
> > >
> > > Wei.
> > >
> > >
> > Thanks Wei for your comment. Let's compare the cons with pros:
> >
> > [Benefit]:
> > avoid long vCPU scheduling delays (10x ms), without introducing
> additional
> > reschedule operations
> >
> > [Negative effect, possible]:
> > the latency due to unnecessary call to handlers on other vCPUs
> > (micro-second or nano-second?)
> >
>
> What I mean is that you will introduce latency / performance penalty
> from locks to protect critical sections. Say, if several CPUs contents
> for same event, overall performance might downgrade.



I agree with you to some extent. But the question is: how frequently will
such "contention" happen? As explained, the upcall handler is called only
when the processor switches from the hypervisor to the guest OS, and traps
into the hypervisor are mostly caused by things like hypercalls, IPIs, etc.
The probability that multiple switches happen "exactly" at the same time is,
I guess, very small.

Thanks,
Luwei


* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-30  7:35   ` Luwei Cheng
@ 2013-10-30  8:45     ` Roger Pau Monné
  2013-10-30  8:45     ` Roger Pau Monné
  1 sibling, 0 replies; 26+ messages in thread
From: Roger Pau Monné @ 2013-10-30  8:45 UTC (permalink / raw)
  To: Luwei Cheng, David Vrabel; +Cc: George Dunlap, xen-devel, Wei Liu, david.vrabel

On 30/10/13 08:35, Luwei Cheng wrote:
> 
> 
> 
> On Tue, Oct 29, 2013 at 11:21 PM, David Vrabel <dvrabel@cantab.net
> <mailto:dvrabel@cantab.net>> wrote:
> 
>     On 28/10/2013 15:26, Luwei Cheng wrote:
>     > This following idea was first discussed with George Dunlap, David
>     Vrabel
>     > and Wei Liu in XenDevSummit13. Many thanks for their encouragement to
>     > post this idea to the community for a wider discussion.
>     >
>     > [Current Design]
>     > Each event channel is associated with only “one” notified vCPU:
>     one-to-one.
>     >
>     > [Problem]
>     > Some events are per-vCPU (such as local timer interrupts) while
>     some others
>     > are per-OS (such as I/O interrupts: network and disk).
>     > For SMP-VMs, it is possible that when one vCPU is waiting in the
>     scheduling
>     > queue, another vCPU is running. So, if the I/O events can be
>     dynamically
>     > routed to the running vCPU, the events can be processed quickly,
>     without
>     > suffering from VM scheduling delays (tens of milliseconds). On the
>     other
>     > hand, no reschedule operations are introduced.
>     >
>     > Though users can set IRQ affinity in the guest OS, the current
>     > implementation forces to bind the IRQ to the first vCPU of the
>     > affinity mask [events.c: set_affinity_irq].
>     > If the hypervisor delivers the event to a different vCPU, the event
>     > will get lost because the guest OS has masked out this event in all
>     > non-notified vCPUs [events.c: bind_evtchn_to_cpu].
>     >
>     > [New Design]
>     > For per-OS event channel, add “vCPU affinity” support: one-to-many.
>     > The “affinity” should be consistent with the
>     ‘/proc/irq/#/smp_affinity’
>     > of the
>     > guest OS and users can change the mapping at runtime. But by default,
>     > all vCPUs should be enabled to serve I/O.
>     >
>     > When such flexibility is enabled, I/O balancing among vCPUs can be
>     > offloaded to the hypervisor. “irqbalance” is designed for physical
>     > SMP systems, not virtual SMP systems.
> 
> Thanks for your echoing, David. 
> 
>     It's an interesting idea but I'm not sure how useful it will be in
>     practise as often work is deferred to threads in the guest rather than
>     done directly in the interrupt handler.
> 
> Sure, but if the interrupt handler is not called timely, no irq threads will
> be created.
>  
> 
>     I don't see any way this could be implemented using the 2-level ABI.
> 
> Probably the implementation does not need to bother 2-level ABI.
> 
>     With the FIFO ABI, queues cannot move between VCPUs without some
>     additional locking (dequeuing an event is only safe with a single
>     consumer) but it may be possible (when an event is set pending) for Xen
>     to pick a queue from a set of queues, instead of always using the same
>     queue.
> 
>     I don't think this would result in balanced I/O between VCPUs, but the
>     opposite -- events would crowd onto the few VCPUs that are currently
>     running.
> 
> I think it is the hypervisor who plays the role of deciding which vCPU
> should
> be kicked to serve I/O. Different routing policies results in different
> results.
> Since all vCPUs are symmetrically scheduled, the events can therefore be 
> evenly distributed onto them. At one moment, vCPUx is running, while at 
> another moment, vCPUy is running. So, the events will not always crowd to
> very few ones.

So you will end up delivering an event to only one vCPU; you are not
going to deliver the event to all vCPUs in a domain?

If that's the case, I'm not sure there's any way you can ensure that it's
going to be faster than what we currently do. For example, if the online
vCPU you are delivering the event to is scheduled out before actually
processing the event, it might actually be worse than what we currently do.

> 
> Currently, all I/O events are bound to vCPU0, which is just like what
> you said: events would crowd onto that vCPU. As a result, vCPU0 consumes
> many more CPU cycles than the other vCPUs, leading to unfairness. If some
> of this workload can be dynamically migrated to other vCPUs, I believe we
> can get at least some benefit.

Are you sure about this? I'm not that familiar with the Linux event
code, but at least on FreeBSD all interrupts get automatically balanced
across all available CPUs by the OS itself.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-30  7:40                           ` Luwei Cheng
@ 2013-10-30 10:27                             ` Wei Liu
  0 siblings, 0 replies; 26+ messages in thread
From: Wei Liu @ 2013-10-30 10:27 UTC (permalink / raw)
  To: Luwei Cheng
  Cc: Wei Liu, George Dunlap, david.vrabel, Jan Beulich, xen-devel,
	Roger Pau Monné

On Wed, Oct 30, 2013 at 03:40:25PM +0800, Luwei Cheng wrote:
[...]
> >
> > What I mean is that you will introduce latency / a performance penalty
> > from the locks needed to protect critical sections. Say, if several CPUs
> > contend for the same event, overall performance might degrade.
> 
> 
> 
> I agree with you to some extent. But the question is: how frequently will
> such "contention" happen? As explained, the upcall handler is called only
> when the processor switches from the hypervisor to the guest OS, and traps
> into the hypervisor are mostly caused by things like hypercalls, IPIs, etc.

A PV guest traps into the hypervisor every time it enables interrupts.
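
For reference, the mechanism behind that is roughly as follows. This is only
a toy model built around the two per-vCPU flags in the public ABI's struct
vcpu_info (evtchn_upcall_pending and evtchn_upcall_mask); it is not the
actual guest code, whose details vary by kernel version:

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins for the two flags in struct vcpu_info (public Xen ABI). */
struct toy_vcpu_info {
    bool evtchn_upcall_pending;   /* Xen sets this when an event arrives    */
    bool evtchn_upcall_mask;      /* the guest sets this instead of "cli"   */
};

/* Stand-in for the hypercall that asks Xen to deliver the pending upcall. */
static void force_evtchn_callback(void)
{
    printf("trap into the hypervisor: deliver pending upcall\n");
}

/* Rough shape of the PV "enable interrupts" path. */
static void guest_irq_enable(struct toy_vcpu_info *v)
{
    v->evtchn_upcall_mask = false;   /* the "sti" equivalent                */
    __sync_synchronize();            /* don't reorder the pending check     */
    if (v->evtchn_upcall_pending)    /* events arrived while we were masked */
        force_evtchn_callback();     /* this is where the trap happens      */
}

int main(void)
{
    struct toy_vcpu_info v = { .evtchn_upcall_pending = true,
                               .evtchn_upcall_mask    = true };
    guest_irq_enable(&v);
    return 0;
}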

> The probability that multiple switches happen "exactly" at the same time
> is, I guess, very small.
> 

It's not about "exactly at the same time"; it's that we need to ensure
the handler runs only once (takes effect only once).
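
To make the "takes effect only once" point concrete: when more than one vCPU
may observe the same pending bit, an atomic test-and-clear (or an equivalent
lock) is what guarantees a single consumer. A self-contained illustration
using plain pthreads, not Xen code:

#include <pthread.h>
#include <stdio.h>

static unsigned long pending = 1UL;   /* one shared "pending event" bit     */
static int handled;                   /* how many threads ran the handler   */

static void *vcpu_upcall(void *arg)
{
    (void)arg;
    /* Atomically read-and-clear the bit: only one caller sees it set. */
    if (__sync_fetch_and_and(&pending, 0UL) & 1UL)
        __sync_fetch_and_add(&handled, 1);   /* the single winner           */
    return NULL;
}

int main(void)
{
    pthread_t t[4];

    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, vcpu_upcall, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);

    printf("handler took effect %d time(s)\n", handled);   /* always 1 */
    return 0;
}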

> Thanks,
> Luwei

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
  2013-10-30  8:45     ` Roger Pau Monné
@ 2013-10-30 13:11       ` Luwei Cheng
  0 siblings, 0 replies; 26+ messages in thread
From: Luwei Cheng @ 2013-10-30 13:11 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: David Vrabel, George Dunlap, Wei Liu, david.vrabel, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 5638 bytes --]

On Wed, Oct 30, 2013 at 8:45 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:

> On 30/10/13 08:35, Luwei Cheng wrote:
> >
> >
> >
> > On Tue, Oct 29, 2013 at 11:21 PM, David Vrabel <dvrabel@cantab.net
> > <mailto:dvrabel@cantab.net>> wrote:
> >
> >     On 28/10/2013 15:26, Luwei Cheng wrote:
> >     > This following idea was first discussed with George Dunlap, David
> >     Vrabel
> >     > and Wei Liu in XenDevSummit13. Many thanks for their encouragement
> to
> >     > post this idea to the community for a wider discussion.
> >     >
> >     > [Current Design]
> >     > Each event channel is associated with only “one” notified vCPU:
> >     one-to-one.
> >     >
> >     > [Problem]
> >     > Some events are per-vCPU (such as local timer interrupts) while
> >     some others
> >     > are per-OS (such as I/O interrupts: network and disk).
> >     > For SMP-VMs, it is possible that when one vCPU is waiting in the
> >     scheduling
> >     > queue, another vCPU is running. So, if the I/O events can be
> >     dynamically
> >     > routed to the running vCPU, the events can be processed quickly,
> >     without
> >     > suffering from VM scheduling delays (tens of milliseconds). On the
> >     other
> >     > hand, no reschedule operations are introduced.
> >     >
> >     > Though users can set IRQ affinity in the guest OS, the current
> >     > implementation forces to bind the IRQ to the first vCPU of the
> >     > affinity mask [events.c: set_affinity_irq].
> >     > If the hypervisor delivers the event to a different vCPU, the event
> >     > will get lost because the guest OS has masked out this event in all
> >     > non-notified vCPUs [events.c: bind_evtchn_to_cpu].
> >     >
> >     > [New Design]
> >     > For per-OS event channel, add “vCPU affinity” support: one-to-many.
> >     > The “affinity” should be consistent with the
> >     ‘/proc/irq/#/smp_affinity’
> >     > of the
> >     > guest OS and users can change the mapping at runtime. But by
> default,
> >     > all vCPUs should be enabled to serve I/O.
> >     >
> >     > When such flexibility is enabled, I/O balancing among vCPUs can be
> >     > offloaded to the hypervisor. “irqbalance” is designed for physical
> >     > SMP systems, not virtual SMP systems.
> >
> > Thanks for your echoing, David.
> >
> >     It's an interesting idea but I'm not sure how useful it will be in
> >     practise as often work is deferred to threads in the guest rather
> than
> >     done directly in the interrupt handler.
> >
> > Sure, but if the interrupt handler is not called timely, no irq threads
> will
> > be created.
> >
> >
> >     I don't see any way this could be implemented using the 2-level ABI.
> >
> > Probably the implementation does not need to bother 2-level ABI.
> >
> >     With the FIFO ABI, queues cannot move between VCPUs without some
> >     additional locking (dequeuing an event is only safe with a single
> >     consumer) but it may be possible (when an event is set pending) for
> Xen
> >     to pick a queue from a set of queues, instead of always using the
> same
> >     queue.
> >
> >     I don't think this would result in balanced I/O between VCPUs, but
> the
> >     opposite -- events would crowd onto the few VCPUs that are currently
> >     running.
> >
> > I think it is the hypervisor who plays the role of deciding which vCPU
> > should
> > be kicked to serve I/O. Different routing policies results in different
> > results.
> > Since all vCPUs are symmetrically scheduled, the events can therefore be
> > evenly distributed onto them. At one moment, vCPUx is running, while at
> > another moment, vCPUy is running. So, the events will not always crowd to
> > very few ones.
>
> So you will end up delivering an event to only one vCPU; you are not
> going to deliver the event to all vCPUs in the domain?
>
In the current implementation, the event pending bitmap is global to all
vCPUs, but only one vCPU is enabled to "see" each event's status (vCPU0 by
default).
My idea is to let every vCPU see such an event: the I/O event is logically
delivered to the whole OS, and the hypervisor selects the best vCPU to
tickle.
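
To sketch what I mean by picking the best vCPU to tickle, here is one
possible policy, written as a stand-alone illustration; the function name
and the policy itself are mine, not an existing Xen interface. The idea:
prefer a currently-running vCPU that the guest's affinity mask allows, and
otherwise fall back to the first vCPU in the mask, which is effectively
today's behaviour:

#include <stdio.h>

/*
 * Pick the vCPU to notify for a per-OS event, given the guest's affinity
 * mask and the set of currently-running vCPUs (bit i == vCPU i).  The
 * affinity mask is assumed to be non-empty.
 */
static int pick_notify_vcpu(unsigned long affinity, unsigned long running)
{
    unsigned long candidates = affinity & running;

    if (candidates)
        return __builtin_ctzl(candidates);  /* lowest allowed, running vCPU */
    return __builtin_ctzl(affinity);        /* none running: first allowed  */
}

int main(void)
{
    /* 4-vCPU guest, all vCPUs allowed, only vCPU2 currently running. */
    printf("notify vCPU%d\n", pick_notify_vcpu(0xfUL, 1UL << 2));   /* 2 */
    /* No vCPU of the domain is running: degrade to vCPU0, as today. */
    printf("notify vCPU%d\n", pick_notify_vcpu(0xfUL, 0UL));        /* 0 */
    return 0;
}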


>
> If that's the case, I'm not sure there's any way you can guarantee that it
> will be faster than what we currently do. For example, if the online vCPU
> you deliver the event to is scheduled out before it actually processes the
> event, the result might be worse than what we currently do.
>

Hmm, yes, it can happen, but it seems to be the corner case, not the
common case.


>  >
> > Currently, all I/O events are bound to vCPU0, which is just like what
> > you said:
> > events would crowd onto that vCPU. As a result, vCPU0 consumes much more
> > CPU cycles than other ones, leading to unfairness. If some workload can
> be
> > dynamically migrated to other vCPUs, I believe more or less we can get
> > some benefit.
>
> Are you sure about this? I'm not that familiar with the Linux event
> code, but at least on FreeBSD all interrupts get automatically balanced
> across all available CPUs by the OS itself.
>
I am trying to compare how interrupts are handled in physical SMP systems
and then think about how we can do better in virtual SMP systems.
In physical SMP systems, Linux commonly relies on the "irqbalance" daemon,
which remaps interrupts every few seconds. That is too coarse for SMP-VMs,
because the hypervisor schedules vCPUs at the granularity of tens of
milliseconds.
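
(What irqbalance effectively does is periodically rewrite each IRQ's
/proc/irq/<N>/smp_affinity with a hex CPU mask. A minimal sketch of that
single operation is below; the IRQ number is made up, and writing the file
needs root inside the guest.)

#include <stdio.h>

int main(void)
{
    const int irq = 24;          /* made-up IRQ number, adjust as needed    */
    const unsigned mask = 0xf;   /* hex CPU bitmask: allow vCPU0..vCPU3     */
    char path[64];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
    f = fopen(path, "w");        /* needs root inside the guest             */
    if (!f) {
        perror(path);
        return 1;
    }
    fprintf(f, "%x\n", mask);    /* write the CPU mask in hex               */
    fclose(f);
    return 0;
}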

I am not familiar with FreeBSD; does it have functionality similar to
irqbalance?

Thanks,
Luwei

[-- Attachment #1.2: Type: text/html, Size: 7905 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2013-10-30 13:11 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-28 15:26 [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS? Luwei Cheng
2013-10-28 15:51 ` Roger Pau Monné
2013-10-29  2:56   ` Luwei Cheng
2013-10-29  8:19     ` Jan Beulich
2013-10-29  9:02       ` Luwei Cheng
2013-10-29  9:34         ` Jan Beulich
2013-10-29  9:49           ` Luwei Cheng
2013-10-29  9:57             ` Jan Beulich
2013-10-29 10:52               ` George Dunlap
2013-10-29 11:00                 ` Roger Pau Monné
2013-10-29 14:20                   ` Luwei Cheng
2013-10-29 14:30                     ` Wei Liu
2013-10-29 14:43                       ` Luwei Cheng
2013-10-29 15:25                         ` Wei Liu
2013-10-30  7:40                           ` Luwei Cheng
2013-10-30 10:27                             ` Wei Liu
2013-10-29 11:22                 ` Jan Beulich
2013-10-29 14:28                   ` Luwei Cheng
2013-10-29 14:42                     ` Jan Beulich
2013-10-29 15:20                       ` Luwei Cheng
2013-10-29 16:37                         ` Jan Beulich
2013-10-29 15:21 ` David Vrabel
2013-10-30  7:35   ` Luwei Cheng
2013-10-30  8:45     ` Roger Pau Monné
2013-10-30 13:11       ` Luwei Cheng
