Message-ID: <4ECB0175.8080504@codemonkey.ws>
Date: Mon, 21 Nov 2011 19:57:09 -0600
From: Anthony Liguori
In-Reply-To: <20111121225010.GE3344@sequoia.sous-sol.org>
Subject: Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding
To: Chris Wright
Cc: Andrea Arcangeli, Peter Zijlstra, kvm list, bharata@linux.vnet.ibm.com, Alexander Graf, qemu-devel Developers, dipankar@in.ibm.com, Vaidyanathan S

On 11/21/2011 04:50 PM, Chris Wright wrote:
> * Peter Zijlstra (a.p.zijlstra@chello.nl) wrote:
>> On Mon, 2011-11-21 at 21:30 +0530, Bharata B Rao wrote:
>>>
>>> In the original post of this mail thread, I proposed a way to export
>>> guest RAM ranges (Guest Physical Address-GPA) and their corresponding
>>> host virtual mappings (Host Virtual Address-HVA) from QEMU (via QEMU
>>> monitor). The idea was to use these GPA to HVA mappings from tools like
>>> libvirt to bind specific parts of the guest RAM to different host nodes.
>>> This needed an extension to existing mbind() to allow binding memory of
>>> a process (QEMU) from a different process (libvirt). This was needed
>>> since we wanted to do all this from libvirt.
>>>
>>> Hence I was coming from that background when I asked for extending
>>> ms_mbind() to take a tid parameter. If QEMU community thinks that NUMA
>>> binding should all be done from outside of QEMU, it is needed, otherwise
>>> what you have should be sufficient.
>>
>> That's just retarded, and no you won't get such extensions. Poking at
>> another process's virtual address space is just daft. Esp. if there's no
>> actual reason for it.
>
> Need to separate the binding vs the policy mgmt. The policy mgmt could
> still be done outside, whereas the binding could still be done from w/in
> QEMU. A simple monitor interface to rebalance vcpu memory allocations
> to different nodes could very well schedule vcpu thread work in QEMU.

I really would prefer to avoid having such an interface. It's a shotgun
that will only result in many poor feet being maimed. I can't tell you the
number of times I've encountered people using CPU pinning when they have
absolutely no business doing CPU pinning.

If we really believe such an interface should exist, then the interface
should really be from the kernel. Once we have memgroups, there's no reason
to involve QEMU at all. QEMU can define the memgroups based on the NUMA
nodes and then it's up to the kernel as to whether it exposes controls to
explicitly bind memgroups within a process or not.

Regards,

Anthony Liguori

> So, I agree, even if there is some external policy mgmt, it could still
> easily work w/ QEMU to use Peter's proposed interface.
>
> thanks,
> -chris
>