From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:41420)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1RSfWF-000573-TC
	for qemu-devel@nongnu.org; Mon, 21 Nov 2011 20:51:32 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1RSfWE-0007w7-Se
	for qemu-devel@nongnu.org; Mon, 21 Nov 2011 20:51:31 -0500
Received: from mail-gy0-f173.google.com ([209.85.160.173]:43667)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1RSfWE-0007w1-Lu
	for qemu-devel@nongnu.org; Mon, 21 Nov 2011 20:51:30 -0500
Received: by ghbg19 with SMTP id g19so3092813ghb.4
	for ; Mon, 21 Nov 2011 17:51:29 -0800 (PST)
Message-ID: <4ECB0019.7020800@codemonkey.ws>
Date: Mon, 21 Nov 2011 19:51:21 -0600
From: Anthony Liguori
MIME-Version: 1.0
References: <20111029184502.GH11038@in.ibm.com>
	<7816C401-9BE5-48A9-8BA9-4CDAD1B39FC8@suse.de>
	<20111108173304.GA14486@sequoia.sous-sol.org>
	<20111121150054.GA3602@in.ibm.com>
	<1321889126.28118.5.camel@twins>
	<20111121160001.GB3602@in.ibm.com>
	<1321894980.28118.16.camel@twins>
In-Reply-To: <1321894980.28118.16.camel@twins>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding
To: Peter Zijlstra
Cc: Andrea Arcangeli, kvm list, dipankar@in.ibm.com,
	qemu-devel Developers, Alexander Graf, Chris Wright,
	bharata@linux.vnet.ibm.com, Vaidyanathan S

On 11/21/2011 11:03 AM, Peter Zijlstra wrote:
> On Mon, 2011-11-21 at 21:30 +0530, Bharata B Rao wrote:
>>
>> In the original post of this mail thread, I proposed a way to export
>> guest RAM ranges (Guest Physical Address-GPA) and their corresponding
>> host virtual mappings (Host Virtual Address-HVA) from QEMU (via QEMU monitor).
>> The idea was to use these GPA-to-HVA mappings from tools like libvirt to bind
>> specific parts of the guest RAM to different host nodes. This needed an
>> extension to the existing mbind() to allow binding memory of a process (QEMU)
>> from a different process (libvirt). This was needed since we wanted to do all
>> this from libvirt.
>>
>> Hence I was coming from that background when I asked for extending
>> ms_mbind() to take a tid parameter. If the QEMU community thinks that NUMA
>> binding should all be done from outside of QEMU, it is needed; otherwise
>> what you have should be sufficient.
>
> That's just retarded, and no you won't get such extensions. Poking at
> another process's virtual address space is just daft. Esp. if there's no
> actual reason for it.

Yes, that would be a terrible interface.

Fundamentally, the entity that should be deciding what memory should be
present and where it should be located is the kernel. I'm fundamentally
opposed to trying to make QEMU override the scheduler/mm by using cpu or
memory pinning in QEMU.

As far as I can tell, ms_mbind() just uses process knowledge to bind
specific areas of memory to a memsched group and lets the kernel decide
what to do with that knowledge. This is exactly the type of interface that
QEMU should be using.

QEMU should give the kernel enough information for the kernel to make good
decisions. QEMU should not be the one making the decisions.

It looks like ms_mbind() takes a flags argument, which I assume accepts the
same flags as mbind(). The current implementation ignores flags and just
uses MPOL_BIND. I would hope that the flags argument would only be treated
as advisory by the kernel.

Regards,

Anthony Liguori

>
> Furthermore, it would make libvirt a required part of qemu, and since I
> don't think I've ever used libvirt that's another reason to object, I
> don't need that stinking mess.
>
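
For readers less familiar with the distinction drawn above between a strict
binding and an advisory hint, here is a toy Python model. It does not call
the kernel and is not how mbind() or the proposed ms_mbind() is implemented.
The function name, the per-node page counts, and the MemoryError on failure
are all assumptions made for illustration. The only facts borrowed from
mbind(2) are that MPOL_BIND restricts allocation to the given nodes, while
MPOL_PREFERRED tries one node first and may fall back to others.

```python
# Toy model only: contrasts a strict node binding with an advisory preference.
# Nothing here talks to the kernel; names and numbers are illustrative.

def allocate_pages(free_pages, wanted_node, count, strict):
    """Place `count` pages, trying `wanted_node` first.

    free_pages: dict mapping node id -> free page count (updated on success).
    strict=True  models MPOL_BIND-like behaviour: only `wanted_node` is used.
    strict=False models an advisory preference: spill over to other nodes.
    Returns a dict node id -> pages placed; raises MemoryError if it cannot fit.
    """
    candidates = [wanted_node]
    if not strict:
        # Advisory mode: other nodes are acceptable fallbacks.
        candidates += sorted(n for n in free_pages if n != wanted_node)

    placed = {}
    remaining = count
    for node in candidates:
        take = min(remaining, free_pages[node])
        if take:
            placed[node] = take
            remaining -= take
        if remaining == 0:
            break

    if remaining:
        # Simplification: a real kernel would reclaim, swap, or invoke the
        # OOM killer rather than simply fail like this.
        raise MemoryError(f"{remaining} pages could not be placed")

    # Commit the placement only once the whole request is known to fit.
    for node, take in placed.items():
        free_pages[node] -= take
    return placed
```

With 100 free pages on node 0 and 400 on node 1, a 150-page request that
prefers node 0 is split across both nodes in advisory mode, whereas the
strict variant fails because node 0 alone cannot satisfy it. That is the
practical difference between telling the kernel what you would like and
telling it what it must do.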