From: Andre Przywara
Subject: Re: NUMA TODO-list for xen-devel
Date: Fri, 3 Aug 2012 12:02:40 +0200
To: Dario Faggioli
Cc: Anil Madhavapeddy, George Dunlap, xen-devel, Jan Beulich, Andrew Cooper, "Zhang, Yang Z"

On 08/01/2012 06:16 PM, Dario Faggioli wrote:
> Hi everyone,
>
> With automatic placement finally landing in xen-unstable, I started
> thinking about what I could work on next, still in the field of
> improving Xen's NUMA support. Well, it turned out that running out of
> things to do is not an option! :-O
>
> In fact, I can think of quite a few open issues in that area, which I'm
> just braindumping here.
> ...
>
> * automatic placement of Dom0, if possible (my current series
>   only affects DomU)

I think Dom0 NUMA awareness should be one of the top priorities. If I
boot my 8-node box with Xen, I end up with a NUMA-clueless Dom0 which
actually has memory from all 8 nodes but thinks its memory is flat.
There are some tricks to confine it to node 0 (dom0_mem=, dom0_vcpus=,
dom0_vcpus_pin), but this requires intimate knowledge of the system's
parameters and is error-prone (an example boot line is sketched at the
end of this mail). Also, this does not work well with ballooning.
Actually, we could use ballooning to improve NUMA placement: by asking
Dom0 explicitly for memory from a certain node.

> * having internal Xen data structures honour the placement (e.g.,
>   I've been told that right now vcpu stacks are always allocated
>   on node 0... Andrew?).
>
> [D] - NUMA-aware scheduling in Xen. Don't pin vcpus on nodes' pcpus,
>       just have them _prefer_ running on the nodes where their memory
>       is.

This would be really cool. I once thought about something like a
home node: we start with placement to allocate memory from one node,
then we relax the VCPU pinning but mark that node as special for this
guest, so that it preferably gets run there. In times of CPU pressure,
however, we are happy to let it run on other nodes: CPU starvation is
much worse than the NUMA penalty. (A small sketch of this heuristic is
at the end of this mail.)

> [D] - Dynamic memory migration between different nodes of the host, as
>       the counterpart of the NUMA-aware scheduler.

I once read about a VMware feature: bandwidth-limited migration in the
background, hot pages first. That way we get flexibility and avoid CPU
starvation, but still don't hog the system with memory copying. Sounds
quite ambitious, though.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
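
Example boot line referenced above. The memory size and vCPU count are
made-up values for a box whose node 0 has 4 GB and 8 cores;
dom0_max_vcpus is the spelled-out form of the dom0_vcpus= shorthand:

  xen.gz dom0_mem=4096M dom0_max_vcpus=8 dom0_vcpus_pin

One has to know beforehand how much memory and how many cores node 0
has, which is exactly the "intimate knowledge" problem mentioned above.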
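
Sketch of the home-node heuristic referenced above. This is not Xen
code, just a standalone toy model: the node count, CPU count and helper
names are made up, and a real scheduler would of course also have to
consider priorities, fairness and migration cost.

/* Toy model of the "home node" idea: a guest remembers the node its
 * memory was allocated on; the scheduler prefers an idle pcpu on that
 * node, but falls back to any idle pcpu rather than letting the guest
 * starve. */
#include <stdio.h>
#include <stdbool.h>

#define NR_NODES 2
#define NR_CPUS  8               /* 4 pcpus per node in this toy setup */

static bool cpu_idle[NR_CPUS];

static int node_of_cpu(int cpu)
{
    return cpu / (NR_CPUS / NR_NODES);
}

/* Prefer an idle pcpu on the guest's home node, else any idle pcpu. */
static int pick_cpu(int home_node)
{
    int fallback = -1;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!cpu_idle[cpu])
            continue;
        if (node_of_cpu(cpu) == home_node)
            return cpu;          /* best case: memory is local */
        if (fallback < 0)
            fallback = cpu;      /* running remotely beats starving */
    }
    return fallback;             /* -1: nothing idle, stay queued */
}

int main(void)
{
    /* Only pcpus 5 and 6 (node 1) are idle; the guest's memory is on
     * node 0, so it runs remotely rather than waiting. */
    cpu_idle[5] = cpu_idle[6] = true;
    printf("home node 0 guest runs on pcpu %d\n", pick_cpu(0));

    /* As soon as a node-0 pcpu becomes idle, the preference kicks in. */
    cpu_idle[2] = true;
    printf("now it runs on pcpu %d\n", pick_cpu(0));

    return 0;
}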