From: George Dunlap <george.dunlap@eu.citrix.com>
To: Tim Deegan <tim@xen.org>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Subject: Re: [Hackathon minutes] PV frontends/backends and NUMA machines
Date: Tue, 21 May 2013 10:45:37 +0100
Message-ID: <519B4241.2040006@eu.citrix.com>
In-Reply-To: <20130521092003.GE9626@ocelot.phlegethon.org>

On 05/21/2013 10:20 AM, Tim Deegan wrote:
> At 09:47 +0100 on 21 May (1369129629), George Dunlap wrote:
>> On Tue, May 21, 2013 at 9:32 AM, Tim Deegan <tim@xen.org> wrote:
>>> At 14:48 +0100 on 20 May (1369061330), George Dunlap wrote:
>>>> So the work items I remember are as follows:
>>>> 1. Implement NUMA affinity for vcpus
>>>> 2. Implement Guest NUMA support for PV guests
>>>> 3. Teach Xen how to make a sensible NUMA allocation layout for dom0
>>>
>>> Does Xen need to do this?  Or could dom0 sort that out for itself after
>>> boot?
>>
>> There are two aspects of this.  First would be, if dom0.nvcpus <
>> host.npcpus, to place the vcpus reasonably on the various numa nodes.
>
> Well, that part at least seems like it can be managed quite nicely from
> dom0 userspace, in a Xen init script.  But...
>
>> The second is to make the pfn -> NUMA node layout reasonable.  At the
>> moment, as I understand it, pfns will be striped across nodes.  In
>> theory dom0 could deal with this, but it seems like in practice it's
>> going to be nasty trying to sort that stuff out.  It would be much
>> better, if you have (say) 4 nodes and 4GiB of memory assigned to dom0,
>> to have pfn 0-1G on node 0, 1-2G on node 2, &c.
>
> Yeah, I can see that fixing that post-hoc would be a PITA.  I guess if
> you figure out the vcpu assignments at dom0-build time, the normal NUMA
> memory allocation code will just DTRT (since that's what you'd want for
> a comparable domU)?

I'm not sure why you think so.  For one -- and please correct me if
I'm wrong -- NUMA affinity is a domain construct, not a vcpu
construct.  Memory is allocated on behalf of a domain, not a vcpu,
and it is allocated a batch at a time.  So how is the memory
allocator supposed to know that the current allocation request is in
the middle of the second gigabyte of a 4G total, and thus that it
should allocate from node 1?
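
To make this concrete: if memory serves, the domheap allocator entry
point looks roughly like the below (hand-copied from memory, so treat
it as a sketch).  Note that there is no pfn argument at all:

    /* Roughly the interface in xen/common/page_alloc.c.  A node can
     * be requested via MEMF_node() in memflags, but nothing here
     * tells the allocator which part of the guest's pfn space this
     * batch of pages will end up backing. */
    struct page_info *alloc_domheap_pages(struct domain *d,
                                          unsigned int order,
                                          unsigned int memflags);

So something above the allocator has to decide, per batch, which node
to ask for.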

What we would want for a comparable domU -- a domU that was
NUMA-aware -- would be to have the pfn space laid out in contiguous
batches across the nodes to which it will be pinned.  E.g., if a domU
has its NUMA affinity set to nodes 2-3, then you'd want the first
half of the pfns to come from node 2 and the second half from node 3.
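
The policy itself is just a bit of arithmetic; something like the
following (pfn_to_node() is a made-up helper, purely illustrative):

    /* Hypothetical helper: split the pfn space evenly across the
     * domain's affinity nodes.  With nr_pfns = 0x100000 (4GiB of
     * 4KiB pages) and nodes[] = { 2, 3 }, pfns 0x0-0x7ffff map to
     * node 2 and pfns 0x80000-0xfffff to node 3. */
    static unsigned int pfn_to_node(unsigned long pfn,
                                    unsigned long nr_pfns,
                                    const unsigned int *nodes,
                                    unsigned int nr_nodes)
    {
        unsigned long chunk = (nr_pfns + nr_nodes - 1) / nr_nodes;

        return nodes[pfn / chunk];
    }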

In both cases, the domain builder will need to call the allocator
with specific NUMA nodes for specific regions of the PFN space.
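
From the builder side I'd imagine something like the sketch below.
xc_domain_populate_physmap_exact() and XENMEMF_exact_node() already
exist; the chunk size and the pfn_to_node() policy above are the
made-up parts:

    #include <xenctrl.h>

    /* Sketch: populate the domain's pfn space chunk by chunk, each
     * chunk from an explicitly requested node. */
    static int populate_by_node(xc_interface *xch, uint32_t domid,
                                xen_pfn_t *pfns, unsigned long nr_pfns,
                                const unsigned int *nodes,
                                unsigned int nr_nodes)
    {
        const unsigned long chunk = 1024; /* 4MiB of 4KiB pages */
        unsigned long i;

        for ( i = 0; i < nr_pfns; i += chunk )
        {
            unsigned long n = nr_pfns - i < chunk ? nr_pfns - i
                                                  : chunk;
            unsigned int node = pfn_to_node(i, nr_pfns,
                                            nodes, nr_nodes);
            int rc = xc_domain_populate_physmap_exact(
                xch, domid, n, 0 /* 4KiB extents */,
                XENMEMF_exact_node(node), &pfns[i]);

            if ( rc )
                return rc;
        }

        return 0;
    }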

  -George
