From: Chris Wright
Subject: Re: [RFC PATCH] Exporting Guest RAM information for NUMA binding
Date: Tue, 8 Nov 2011 09:33:04 -0800
Message-ID: <20111108173304.GA14486@sequoia.sous-sol.org>
References: <20111029184502.GH11038@in.ibm.com> <7816C401-9BE5-48A9-8BA9-4CDAD1B39FC8@suse.de>
In-Reply-To: <7816C401-9BE5-48A9-8BA9-4CDAD1B39FC8@suse.de>
To: Alexander Graf
Cc: Andrea Arcangeli, Peter Zijlstra, kvm list, bharata@linux.vnet.ibm.com, qemu-devel Developers, dipankar@in.ibm.com, Vaidyanathan S
List-Id: kvm.vger.kernel.org

* Alexander Graf (agraf@suse.de) wrote:
> On 29.10.2011, at 20:45, Bharata B Rao wrote:
> > As guests become NUMA aware, it becomes important for them to have
> > correct NUMA policies when they run on NUMA-aware hosts. Currently,
> > limited support for NUMA binding is available via libvirt, where it
> > is possible to apply a NUMA policy to the guest as a whole. However,
> > multinode guests would benefit if guest memory belonging to different
> > guest nodes were mapped appropriately to different host NUMA nodes.
> >
> > To achieve this we would need QEMU to expose information about guest
> > RAM ranges (Guest Physical Address - GPA) and their host virtual
> > address mappings (Host Virtual Address - HVA). Using the GPA and HVA,
> > an external tool like libvirt would be able to divide the guest RAM
> > as per the guest NUMA node geometry and bind guest memory nodes to
> > the corresponding host memory nodes using the HVA. This needs both
> > QEMU (and libvirt) changes as well as changes in the kernel.
>
> Ok, let's take a step back here. You are basically growing libvirt into
> a memory resource manager that knows how much memory is available on
> which nodes and how these nodes would possibly fit into the host's
> memory layout.
>
> Shouldn't that be the kernel's job? It seems to me that architecturally
> the kernel is the place I would want my memory resource controls to be.

I think that both Peter and Andrea are looking at this. Before we commit
to an API in QEMU that has different semantics than a possible new kernel
interface (one that QEMU could perhaps use directly to inform the kernel
of the binding/relationship between a vcpu thread and its memory at VM
startup), it would be useful to see what these guys are working on...

thanks,
-chris
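
P.S. For concreteness, here's a minimal sketch (not the patch under
discussion) of what the binding step looks like from inside the process
that owns the guest RAM mapping, i.e. QEMU itself, using the existing
mbind(2) interface. The hva, size, and host_node values are hypothetical
stand-ins for the GPA->HVA ranges described above. Note that mbind()
only operates on the caller's own address space, which is exactly why an
external tool like libvirt can't do this per-range today without new
kernel support.

#include <numaif.h>   /* mbind(2), MPOL_*; link with -lnuma */
#include <stdio.h>

/* Bind one guest node's RAM range [hva, hva + size) to host_node.
 * hva/size would come from the GPA->HVA ranges QEMU exposes;
 * host_node from whatever policy the management layer chose. */
static int bind_guest_node_ram(void *hva, unsigned long size,
                               int host_node)
{
    unsigned long nodemask = 1UL << host_node;

    /* MPOL_BIND restricts the range's allocations to host_node;
     * MPOL_MF_MOVE also migrates pages already faulted in. */
    if (mbind(hva, size, MPOL_BIND, &nodemask,
              sizeof(nodemask) * 8, MPOL_MF_MOVE) != 0) {
        perror("mbind");
        return -1;
    }
    return 0;
}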