From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: George Dunlap <dunlapg@umich.edu>
Cc: "xen-users@lists.xen.org" <xen-users@lists.xen.org>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Request for input: Extended event channel support
Date: Wed, 27 Mar 2013 21:56:49 -0400
Message-ID: <20130328015649.GA12810@phenom.dumpdata.com>
In-Reply-To: <CAFLBxZa98gw3gTOHBHUWkiyBBzSq9UZNZ1f7BJ895SJPYszTJw@mail.gmail.com>
On Wed, Mar 27, 2013 at 11:23:23AM +0000, George Dunlap wrote:
> * Executive summary
>
> The number of event channels available for dom0 is currently one of
> the biggest limitations on scaling up the number of VMs which can be
> created on a single system. There are two alternative implementations
> we could choose, one of which is ready now, the other of which is
> potentially technically superior, but will not be ready for the 4.3
> release.
>
> The core question we need to ask the community: How important is
> lifting the event channel scalability limit to 4.3? Will waiting
> until 4.4 limit the uptake of the Xen platform?
>
> * The issue
>
> The existing event channel implementation for PV guests is a 2-level
> bit array. This limits the total number of event channels
> to word_size ^ 2, which is 1024 for 32-bit guests and 4096 for 64-bit
> guests.
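
The 2-level layout described above can be pictured with a toy model. This is only a sketch under my own assumptions, not Xen's actual shared-info layout: the class name `TwoLevelEvents` and its methods are hypothetical, and one selector word is assumed to index `word_size` pending words of `word_size` bits each, which is where the word_size ^ 2 limit comes from.

```python
class TwoLevelEvents:
    """Toy 2-level pending bitmap (illustrative only, not Xen's real ABI)."""

    def __init__(self, word_size):
        self.word_size = word_size
        self.selector = 0               # level 1: one bit per level-2 word
        self.pending = [0] * word_size  # level 2: one bit per event channel

    @property
    def capacity(self):
        # One selector word indexing word_size words of word_size bits each.
        return self.word_size * self.word_size

    def set_pending(self, port):
        if not 0 <= port < self.capacity:
            raise ValueError(f"port {port} exceeds the 2-level limit")
        word, bit = divmod(port, self.word_size)
        self.pending[word] |= 1 << bit
        self.selector |= 1 << word

    def pop_pending(self):
        """Clear and return the lowest-numbered pending port, or None."""
        if self.selector == 0:
            return None
        # x & -x isolates the lowest set bit of a positive integer.
        word = (self.selector & -self.selector).bit_length() - 1
        bits = self.pending[word]
        bit = (bits & -bits).bit_length() - 1
        self.pending[word] &= ~(1 << bit)
        if self.pending[word] == 0:
            self.selector &= ~(1 << word)
        return word * self.word_size + bit
```

With `word_size` set to 32 the model holds ports 0 through 1023, and with 64 it holds ports 0 through 4095, matching the figures quoted above.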
>
> This sounds like a lot, until you consider that in a typical system,
> each VM needs 4 or more event channels in domain 0. This means that
> for a 32-bit dom0, there is a theoretical maximum of 256 guests -- and
> in practice it's more like 180 or so, because of event channels
> required for other things. XenServer already has customers using VDI
> that require more VMs than this.
>
> * The dilemma
>
> When we began the 4.3 release cycle, this was one of the items we
> identified as a key feature we needed to get for 4.3. Wei Liu started
> work on an extension of the existing implementation, allowing 3 levels
> of event channels. The draft of this is ready, and just needs the
> last bit of polishing and bug-chasing before it can be accepted.
>
> However, several months ago, David Vrabel came up with an alternate
> design which in theory was more scalable, based on queues of linked
> lists (which we have internally been calling "FIFO" for short). David
> has been working on the implementation since, and has a draft
> prototype; but it's in no shape to be included in 4.3.
>
> There are some things that are attractive about the second solution,
> including the flexible assignment of interrupt priorities, ease of
> scalability, and potentially even the FIFO nature of the interrupt
> delivery.
>
> The question at hand, then, is whether to take what we have in the
> 3-level implementation for 4.3, or wait to see how the FIFO
> implementation turns out (taking either it or the 3-level
> implementation in 4.4).
>
> * The solution in hand: 3-level event channels
>
> The basic idea behind 3-level event channels is to extend the existing
> 2-level implementation to 3 levels. Going to 3 levels would give us
> 32k event channels for 32-bit, and 256k for 64-bit.
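
The arithmetic behind these figures can be checked with a short sketch. The function names are hypothetical, and the guest bound assumes exactly 4 dom0 event channels per guest (the quoted text says "4 or more", so real limits would be lower).

```python
def max_event_channels(word_size, levels):
    """Each bitmap level multiplies the addressable channels by word_size."""
    return word_size ** levels


def max_guests(word_size, levels, channels_per_guest=4):
    """Upper bound on guests, assuming a fixed per-guest channel cost in dom0."""
    return max_event_channels(word_size, levels) // channels_per_guest


for word_size in (32, 64):
    for levels in (2, 3):
        print(word_size, levels,
              max_event_channels(word_size, levels),
              max_guests(word_size, levels))
```

For a 32-bit dom0 with 2 levels this gives 1024 channels and the theoretical 256 guests mentioned earlier; 3 levels give 32768 (32k) and 262144 (256k) channels for 32-bit and 64-bit respectively.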
>
> One of the advantages of this method is that since it is similar to
> the existing method, the general concepts and race conditions are
> fairly well understood and tested.
>
> One of the disadvantages that this method inherits from the 2-level
> event channels is the lack of priority. In the initial implementation
> of event channels, priority was handled by event channel order: scans
> for events always started at 0 and went upwards. However, this was
> not very scalable, as lower-numbered events could easily completely
> lock out higher-numbered events; and frequently "lower-numbered"
> simply meant "created earlier". Event channels were forced into a
> priority even if one was not wanted.
>
> So the implementation was tweaked, so that scans don't start at 0, but
> continue where the last event left off. This made it so that earlier
> events were not prioritized and removed the starvation issue, but at
> the cost of removing all event priorities. Certain events, like the
> timer event, are special-cased to be always checked, but this is
> something of a hack and not very scalable or flexible.
Hm, I actually think that is not in the upstream kernel at all. That
would explain why, on a very busy guest, the "hrtimer: interrupt
took XXxXXXXxx ns" message is printed.
Is this patch available somewhere?
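
To make the two scan orders described in the quoted text concrete, here is a minimal simulation. It is a sketch under my own assumptions, not kernel or hypervisor code: the function names are hypothetical, one event is delivered per round, and the timer special-casing is not modelled.

```python
def scan_from_zero(pending, last, nr_ports):
    """Original policy: always pick the lowest-numbered pending port."""
    return min(pending) if pending else None


def scan_round_robin(pending, last, nr_ports):
    """Tweaked policy: resume the scan just after the last delivered port."""
    for offset in range(1, nr_ports + 1):
        port = (last + offset) % nr_ports
        if port in pending:
            return port
    return None


def simulate(policy, nr_ports, busy_ports, rounds):
    """Each round, every busy port is re-raised and one event is delivered."""
    pending = set()
    delivered = []
    last = nr_ports - 1  # so the first round-robin scan begins at port 0
    for _ in range(rounds):
        pending |= set(busy_ports)
        port = policy(pending, last, nr_ports)
        if port is not None:
            pending.discard(port)
            delivered.append(port)
            last = port
    return delivered


print(simulate(scan_from_zero, 8, (0, 5), 4))    # port 5 is never served
print(simulate(scan_round_robin, 8, (0, 5), 4))  # ports 0 and 5 alternate
```

With the scan starting at 0, a continuously busy port 0 locks out port 5. Resuming after the last delivered port removes the starvation, but no port can be preferred over another, which is the loss of priority described above.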