From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Luwei Cheng <chengluwei@gmail.com>, David Vrabel <dvrabel@cantab.net>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
	xen-devel@lists.xenproject.org, Wei Liu <wei.liu2@citrix.com>,
	david.vrabel@citrix.com
Subject: Re: [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS?
Date: Wed, 30 Oct 2013 09:45:31 +0100	[thread overview]
Message-ID: <5270C72B.1020004@citrix.com> (raw)
In-Reply-To: <CA+1E0hQGx_ns6ELZOzNnDHdZFFiadXM1cCNU9nrmBhtOKgyu3Q@mail.gmail.com>

On 30/10/13 08:35, Luwei Cheng wrote:
> 
> On Tue, Oct 29, 2013 at 11:21 PM, David Vrabel <dvrabel@cantab.net> wrote:
> 
>     On 28/10/2013 15:26, Luwei Cheng wrote:
>     > The following idea was first discussed with George Dunlap, David Vrabel
>     > and Wei Liu at XenDevSummit13. Many thanks for their encouragement to
>     > post this idea to the community for a wider discussion.
>     >
>     > [Current Design]
>     > Each event channel is associated with only “one” notified vCPU: one-to-one.
>     >
>     > [Problem]
>     > Some events are per-vCPU (such as local timer interrupts) while others
>     > are per-OS (such as I/O interrupts: network and disk).
>     > For SMP VMs, it is possible that while one vCPU is waiting in the
>     > scheduling queue, another vCPU is running. So, if I/O events can be
>     > dynamically routed to the running vCPU, they can be processed quickly,
>     > without suffering VM scheduling delays (tens of milliseconds), and no
>     > extra reschedule operations are introduced.
>     >
>     > Though users can set IRQ affinity in the guest OS, the current
>     > implementation forcibly binds the IRQ to the first vCPU of the
>     > affinity mask [events.c: set_affinity_irq] (see the sketch just after
>     > this quoted proposal).
>     > If the hypervisor delivers the event to a different vCPU, the event
>     > will get lost, because the guest OS has masked out this event on all
>     > non-notified vCPUs [events.c: bind_evtchn_to_cpu].
>     >
>     > [New Design]
>     > For per-OS event channels, add “vCPU affinity” support: one-to-many.
>     > The “affinity” should be consistent with the guest OS's
>     > ‘/proc/irq/#/smp_affinity’, and users can change the mapping at runtime.
>     > By default, all vCPUs should be enabled to serve I/O.
>     >
>     > When such flexibility is enabled, I/O balancing among vCPUs can be
>     > offloaded to the hypervisor. “irqbalance” is designed for physical
>     > SMP systems, not virtual SMP systems.
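
For reference, the set_affinity_irq behaviour mentioned in the quoted
proposal is roughly the following. This is a simplified sketch of the Linux
drivers/xen/events.c code, not a verbatim copy: the affinity mask requested
by the guest is collapsed to its first vCPU, and the event channel is
rebound to that single vCPU only.

/*
 * Simplified sketch (not verbatim) of the drivers/xen/events.c behaviour
 * described above: the requested affinity mask is collapsed to its first
 * vCPU, and the event channel is rebound to that vCPU alone.
 */
static int rebind_irq_to_cpu(unsigned int irq, unsigned int tcpu)
{
        struct evtchn_bind_vcpu bind_vcpu;
        int evtchn = evtchn_from_irq(irq);

        if (!VALID_EVTCHN(evtchn))
                return -1;

        /* Ask Xen to notify vCPU tcpu for this event channel from now on. */
        bind_vcpu.port = evtchn;
        bind_vcpu.vcpu = tcpu;

        /*
         * On success, record the single notified vCPU; the event stays
         * masked on every other vCPU (see bind_evtchn_to_cpu).
         */
        if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_vcpu, &bind_vcpu) >= 0)
                bind_evtchn_to_cpu(evtchn, tcpu);

        return 0;
}

static int set_affinity_irq(struct irq_data *data, const struct cpumask *dest,
                            bool force)
{
        /* Only the first CPU of the requested affinity mask is ever used. */
        unsigned int tcpu = cpumask_first(dest);

        return rebind_irq_to_cpu(data->irq, tcpu);
}

So an event raised while its single notified vCPU is preempted is only
handled once that vCPU runs again, which is the scheduling delay the
proposal is trying to avoid.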
> 
> Thanks for your reply, David.
> 
>     It's an interesting idea, but I'm not sure how useful it will be in
>     practice, as work is often deferred to threads in the guest rather than
>     done directly in the interrupt handler.
> 
> Sure, but if the interrupt handler is not invoked in a timely manner, no
> IRQ threads will be created.
>  
> 
>     I don't see any way this could be implemented using the 2-level ABI.
> 
> Probably the implementation does not need to touch the 2-level ABI.
> 
>     With the FIFO ABI, queues cannot move between VCPUs without some
>     additional locking (dequeuing an event is only safe with a single
>     consumer) but it may be possible (when an event is set pending) for Xen
>     to pick a queue from a set of queues, instead of always using the same
>     queue.
> 
>     I don't think this would result in balanced I/O between VCPUs, but the
>     opposite -- events would crowd onto the few VCPUs that are currently
>     running.
> 
> I think it is the hypervisor that decides which vCPU should be kicked to
> serve I/O, and different routing policies give different results.
> Since all vCPUs are scheduled symmetrically, the events can be distributed
> evenly across them: at one moment vCPUx is running, at another moment vCPUy
> is running, so the events will not always crowd onto just a few vCPUs.

So you will end up delivering each event to only one vCPU; you are not
going to deliver the event to all vCPUs in a domain?

If that's the case, I'm not sure there's any way you can guarantee it will
be faster than what we currently do. For example, if the online vCPU you
deliver the event to is scheduled out before it actually processes the
event, the result might be worse than what we currently do.
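
To make this concrete, below is a purely hypothetical sketch of the
hypervisor-side routing the proposal implies; evtchn_pick_notify_vcpu() and
its vcpu_mask parameter do not exist in Xen and are only illustrative. The
point where an event is set pending would prefer a currently running vCPU
from the allowed mask and fall back to the statically bound vCPU. If every
vCPU in the mask is scheduled out, or the chosen vCPU is preempted before
it handles the event, this can do no better than the current behaviour.

/*
 * Hypothetical sketch only: this helper and its vcpu_mask parameter are
 * not existing Xen code; they illustrate the "route the event to a
 * currently running vCPU" policy being discussed.
 */
static struct vcpu *evtchn_pick_notify_vcpu(struct domain *d,
                                            const unsigned long *vcpu_mask,
                                            unsigned int bound_vcpu)
{
    unsigned int i;

    /* Prefer a vCPU in the allowed mask that is running right now. */
    for ( i = 0; i < d->max_vcpus; i++ )
    {
        struct vcpu *v = d->vcpu[i];

        if ( v == NULL || !test_bit(i, vcpu_mask) )
            continue;
        if ( v->is_running )
            return v;
    }

    /* Nobody in the mask is running: fall back to the bound vCPU. */
    return d->vcpu[bound_vcpu];
}

With the FIFO ABI such a helper would have to be consulted where the event
is set pending, and the chosen vCPU's queue taken under the additional
locking David mentions above, since dequeueing is only safe with a single
consumer.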

> 
> Currently, all I/O events are bound to vCPU0, which is exactly what you
> described: events crowd onto that vCPU. As a result, vCPU0 consumes many
> more CPU cycles than the other vCPUs, leading to unfairness. If some of
> this workload can be migrated dynamically to other vCPUs, I believe we can
> get at least some benefit.

Are you sure about this? I'm not that familiar with the Linux event
code, but at least on FreeBSD all interrupts get automatically balanced
across all available CPUs by the OS itself.

Thread overview: 26+ messages
2013-10-28 15:26 [PROPOSAL] Event channel for SMP-VMs: per-vCPU or per-OS? Luwei Cheng
2013-10-28 15:51 ` Roger Pau Monné
2013-10-29  2:56   ` Luwei Cheng
2013-10-29  8:19     ` Jan Beulich
2013-10-29  9:02       ` Luwei Cheng
2013-10-29  9:34         ` Jan Beulich
2013-10-29  9:49           ` Luwei Cheng
2013-10-29  9:57             ` Jan Beulich
2013-10-29 10:52               ` George Dunlap
2013-10-29 11:00                 ` Roger Pau Monné
2013-10-29 14:20                   ` Luwei Cheng
2013-10-29 14:30                     ` Wei Liu
2013-10-29 14:43                       ` Luwei Cheng
2013-10-29 15:25                         ` Wei Liu
2013-10-30  7:40                           ` Luwei Cheng
2013-10-30 10:27                             ` Wei Liu
2013-10-29 11:22                 ` Jan Beulich
2013-10-29 14:28                   ` Luwei Cheng
2013-10-29 14:42                     ` Jan Beulich
2013-10-29 15:20                       ` Luwei Cheng
2013-10-29 16:37                         ` Jan Beulich
2013-10-29 15:21 ` David Vrabel
2013-10-30  7:35   ` Luwei Cheng
2013-10-30  8:45     ` Roger Pau Monné [this message]
2013-10-30 13:11       ` Luwei Cheng
