From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Vrabel
Subject: Re: RFC: vNUMA project
Date: Tue, 11 Nov 2014 18:03:22 +0000
Message-ID: <54624F6A.40002@citrix.com>
References: <20141111173606.GC21312@zion.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20141111173606.GC21312@zion.uk.xensource.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Wei Liu , xen-devel@lists.xen.org
Cc: Dario Faggioli , David Vrabel , Jan Beulich
List-Id: xen-devel@lists.xenproject.org

On 11/11/14 17:36, Wei Liu wrote:
> # What's already implemented?
>
> PV vNUMA support in libxl/xl and Linux kernel.

Linux doesn't have vnuma yet, although the last set of patches I saw
looked fine and were waiting for acks from x86 maintainers I think.

> # NUMA-aware ballooning
>
> It's agreed that NUMA-aware ballooning should be achieved solely in
> hypervisor. Everything should happen under the hood without guest
> knowing vnode to pnode mapping.
>
> As far as I can tell, existing guests (Linux and FreeBSD) use
> XENMEM_populate_physmap to balloon up. There's a hypercall
> called XENMEM_increase_reservation but it's not used
> by Linux and FreeBSD.
>
> I can think of two options to implement NUMA-aware ballooning:
>
> 1. Modify XENMEM_populate_physmap to take into account vNUMA hint
> when it tries to allocate a page for guest.
[...]
> Option #1 requires less modification to guest, because guest won't
> need to switch to new hypercall. It's unclear at this point if a guest
> asks to populate a gpfn that doesn't belong to any vnode, what Xen
> should do about it. Should it be permissive or strict?

There are XENMEMF flags to request exact node or not -- leave it up to
the balloon driver. The Linux balloon driver could try exact on all
nodes before falling back to permissive or just always try inexact.
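
To make the exact-then-permissive idea concrete, here is a toy model of
it. This is only a sketch under assumptions: ToyXen, populate() and
balloon_up_one() are made-up names, not Xen or Linux interfaces, and the
"exact" boolean merely stands in for whatever XENMEMF flag the real
hypercall would carry. The vnode to pnode mapping lives entirely on the
hypervisor side, so the guest never sees it.

```python
# Toy model only: ToyXen, populate() and balloon_up_one() are invented
# names for illustration, not Xen or Linux interfaces.


class ToyXen:
    """Hypervisor side: owns the vnode-to-pnode map and free page counts."""

    def __init__(self, free_pages, vnode_to_pnode):
        self.free_pages = dict(free_pages)          # pnode -> free pages
        self.vnode_to_pnode = dict(vnode_to_pnode)  # hidden from the guest

    def _take(self, pnode):
        """Take one free page from `pnode` if it has any."""
        if self.free_pages.get(pnode, 0) > 0:
            self.free_pages[pnode] -= 1
            return True
        return False

    def populate(self, vnode, exact):
        """Allocate one page for a gpfn that belongs to `vnode`.

        Returns the pnode used, or None if the request cannot be met.
        The pnode is returned only so the example can show placement;
        a real guest would not learn it.
        """
        wanted = self.vnode_to_pnode[vnode]
        if self._take(wanted):
            return wanted
        if exact:
            return None  # exact request: fail instead of spilling over
        for pnode in sorted(self.free_pages):
            if self._take(pnode):
                return pnode
        return None


def balloon_up_one(xen, ballooned):
    """Guest side: repopulate one ballooned-out page.

    `ballooned` maps vnode -> number of ballooned-out gpfns in that vnode.
    Returns (vnode, pnode) on success, None if nothing could be populated.
    """
    candidates = [v for v in sorted(ballooned) if ballooned[v] > 0]
    # Pass 1: exact request on every vnode that has pages to give back.
    for vnode in candidates:
        pnode = xen.populate(vnode, exact=True)
        if pnode is not None:
            ballooned[vnode] -= 1
            return vnode, pnode
    # Pass 2: permissive request; any pnode with free memory will do.
    for vnode in candidates:
        pnode = xen.populate(vnode, exact=False)
        if pnode is not None:
            ballooned[vnode] -= 1
            return vnode, pnode
    return None
```

With free pages {0: 1, 1: 0, 2: 5} and vnodes 0 and 1 mapped to pnodes 0
and 1, the first call is satisfied exactly from pnode 0; later calls only
succeed in the permissive pass and land on pnode 2. The "always try
inexact" variant would simply skip pass 1.
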

Perhaps a XENMEMF_vnode bit to indicate the node is virtual?

>
> # HVM vNUMA
>
> HVM vNUMA is implemented as followed:
>
> 1. Libxl generates vNUMA information and passes it to hvmloader.
> 2. Hvmloader build SRAT table.
>
> Note that hvmloader is capable of relocating memory. This means
> toolstack and guest can have different ideas of the memory layout.

Why can't hvmloader update the vnuma tables after it has relocated memory?

David