From: Andre Przywara <andre.przywara@amd.com>
To: Dario Faggioli <raistlin@linux.it>
Cc: Anil Madhavapeddy <anil@recoil.org>,
	George Dunlap <dunlapg@gmail.com>,
	xen-devel <xen-devel@lists.xen.org>,
	Jan Beulich <JBeulich@suse.com>,
	Andrew Cooper <Andrew.Cooper3@citrix.com>,
	"Zhang, Yang Z" <yang.z.zhang@intel.com>
Subject: Re: NUMA TODO-list for xen-devel
Date: Fri, 3 Aug 2012 12:02:40 +0200	[thread overview]
Message-ID: <501BA1C0.7040100@amd.com> (raw)
In-Reply-To: <1343837796.4958.32.camel@Solace>

On 08/01/2012 06:16 PM, Dario Faggioli wrote:
> Hi everyone,
>
> With automatic placement finally landing into xen-unstable, I started
> thinking about what I could work on next, still in the field of
> improving Xen's NUMA support. Well, it turned out that running out of
> things to do is not an option! :-O
>
> In fact, I can think of quite a few open issues in that area, which I'm
> just braindumping here.

> ...
>
>         * automatic placement of Dom0, if possible (my current series is
>           only affecting DomU)

I think Dom0 NUMA awareness should be one of the top priorities. If I 
boot my 8-node box with Xen, I end up with a NUMA-clueless Dom0 which 
actually has memory from all 8 nodes but thinks its memory is flat.
There are some tricks to confine it to node 0 (dom0_mem=<memory of 
node0> dom0_max_vcpus=<cores in node0> dom0_vcpus_pin), but this 
requires intimate knowledge of the system's parameters and is 
error-prone (an example follows below). It also does not work well 
with ballooning.
Actually, we could even use ballooning to improve NUMA placement, by 
explicitly asking Dom0 to release memory from a certain node.
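
Just to illustrate how fragile the manual confinement is, the Xen part
of the boot entry ends up looking something like this (the numbers are
invented for a hypothetical node with 16 GB of RAM and 8 cores, and the
option names should be double-checked against the hypervisor version in
use):

  xen.gz dom0_mem=16384M dom0_max_vcpus=8 dom0_vcpus_pin

Those values have to be worked out again for every box and redone
whenever the hardware changes, which is exactly the fragile part.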

>         * having internal xen data structures honour the placement (e.g.,
>           I've been told that right now vcpu stacks are always allocated
>           on node 0... Andrew?).
>
> [D] - NUMA aware scheduling in Xen. Don't pin vcpus on nodes' pcpus,
>        just have them _prefer_ running on the nodes where their memory
>        is.

This would be really cool. I once thought about something like a 
home node: we start with placement to allocate memory from one node, 
then relax the VCPU pinning but mark this node as special for this 
guest, so that its VCPUs preferably get run there. In times of CPU 
pressure, though, we are happy to let them run on other nodes: CPU 
starvation is much worse than the NUMA penalty.
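
Very roughly, I picture the per-vcpu decision like the toy model below.
This is only a sketch of the policy, not Xen code: the node loads, the
pressure threshold and all the helpers are made up for illustration.

#include <stdio.h>

/*
 * Toy model of the "home node" idea (not Xen code; all numbers and
 * helpers are invented).  A vcpu prefers the node its memory lives on,
 * but under CPU pressure it falls back to the least loaded node,
 * because CPU starvation hurts more than the NUMA penalty.
 */

#define NR_NODES  8
#define PRESSURE 90   /* %-load above which we give up on the home node */

static int node_load[NR_NODES];   /* pretend the scheduler keeps this */

static int pick_node(int home_node)
{
    if (node_load[home_node] < PRESSURE)
        return home_node;                    /* stickiness: stay at home */

    int best = 0;                            /* otherwise take whatever  */
    for (int i = 1; i < NR_NODES; i++)       /* node is least loaded     */
        if (node_load[i] < node_load[best])
            best = i;
    return best;
}

int main(void)
{
    for (int i = 0; i < NR_NODES; i++)
        node_load[i] = 60;                   /* moderately busy box      */
    node_load[3] = 95;                       /* the home node is slammed */
    node_load[5] = 10;                       /* node 5 is almost idle    */

    /* prints "node 5": home is too busy, so the vcpu runs elsewhere */
    printf("vcpu with home node 3 runs on node %d\n", pick_node(3));
    return 0;
}

In the real scheduler the pressure check would of course look at
runqueue lengths or credits rather than a per-node percentage, but the
idea is the same: stickiness to the home node, with a cheap escape
hatch when things get busy.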

>
> [D] - Dynamic memory migration between different nodes of the host. As
>        the counter-part of the NUMA-aware scheduler.

I once read about a VMware feature: bandwidth-limited migration in the 
background, hot pages first. That way we get flexibility and avoid CPU 
starvation, but still don't hog the system with memory copying.
Sounds quite ambitious, though.
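
To picture what I mean, such a background migration thread could behave
like the toy loop below. Again just a sketch with invented numbers: per
time slice it spends at most a fixed budget of pages, hottest first, so
the copying never monopolises the memory bus. How hotness is actually
measured (A/D bit scanning or the like) is the hard part and is not
shown here.

#include <stdio.h>

/*
 * Toy sketch of bandwidth-limited background migration, hottest pages
 * first (not Xen code; page count, budget and hotness values are
 * invented).  Per time slice at most BUDGET pages get copied, so the
 * migration never hogs the memory bus.
 */

#define NR_PAGES 10
#define BUDGET    3                /* pages we allow ourselves per slice */

struct guest_page { int gfn; int hotness; int migrated; };
static struct guest_page pages[NR_PAGES];

static struct guest_page *pick_hottest(void)
{
    struct guest_page *best = NULL;
    for (int i = 0; i < NR_PAGES; i++)
        if (!pages[i].migrated && (!best || pages[i].hotness > best->hotness))
            best = &pages[i];
    return best;
}

int main(void)
{
    for (int i = 0; i < NR_PAGES; i++)
        pages[i] = (struct guest_page){ .gfn = i, .hotness = (i * 7) % 10 };

    for (int slice = 0; slice < 2; slice++) {       /* two time slices */
        for (int moved = 0; moved < BUDGET; moved++) {
            struct guest_page *p = pick_hottest();
            if (!p)
                break;
            p->migrated = 1;            /* the actual copy would go here */
            printf("slice %d: migrating gfn %d (hotness %d)\n",
                   slice, p->gfn, p->hotness);
        }
        /* real code would sleep here and re-sample hotness */
    }
    return 0;
}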

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany

