From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding
Date: Mon, 21 Nov 2011 19:51:21 -0600
Message-ID: <4ECB0019.7020800@codemonkey.ws>
References: <20111029184502.GH11038@in.ibm.com>
 <7816C401-9BE5-48A9-8BA9-4CDAD1B39FC8@suse.de>
 <20111108173304.GA14486@sequoia.sous-sol.org>
 <20111121150054.GA3602@in.ibm.com>
 <1321889126.28118.5.camel@twins>
 <20111121160001.GB3602@in.ibm.com>
 <1321894980.28118.16.camel@twins>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: bharata@linux.vnet.ibm.com, Andrea Arcangeli, kvm list,
 Alexander Graf, qemu-devel Developers, Chris Wright,
 dipankar@in.ibm.com, Vaidyanathan S
To: Peter Zijlstra
Received: from mail-gx0-f174.google.com ([209.85.161.174]:35551 "EHLO
 mail-gx0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751596Ab1KVBva (ORCPT ); Mon, 21 Nov 2011 20:51:30 -0500
Received: by ggnr5 with SMTP id r5so3026255ggn.19 for ;
 Mon, 21 Nov 2011 17:51:29 -0800 (PST)
In-Reply-To: <1321894980.28118.16.camel@twins>
Sender: kvm-owner@vger.kernel.org

On 11/21/2011 11:03 AM, Peter Zijlstra wrote:
> On Mon, 2011-11-21 at 21:30 +0530, Bharata B Rao wrote:
>>
>> In the original post of this mail thread, I proposed a way to export
>> guest RAM ranges (Guest Physical Address-GPA) and their corresponding
>> host virtual mappings (Host Virtual Address-HVA) from QEMU (via QEMU monitor).
>> The idea was to use these GPA to HVA mappings from tools like libvirt to bind
>> specific parts of the guest RAM to different host nodes. This needed an
>> extension to existing mbind() to allow binding memory of a process (QEMU) from a
>> different process (libvirt). This was needed since we wanted to do all this from
>> libvirt.
>>
>> Hence I was coming from that background when I asked for extending
>> ms_mbind() to take a tid parameter.
>> If QEMU community thinks that NUMA
>> binding should all be done from outside of QEMU, it is needed, otherwise
>> what you have should be sufficient.
>
> That's just retarded, and no you won't get such extensions. Poking at
> another process's virtual address space is just daft. Esp. if there's no
> actual reason for it.

Yes, that would be a terrible interface.

Fundamentally, the entity that should be deciding what memory should be
present and where it should be located is the kernel. I'm fundamentally
opposed to trying to make QEMU override the scheduler/mm by using cpu or
memory pinning in QEMU.

From what I can tell about ms_mbind(), it just uses process knowledge to
bind specific areas of memory to a memsched group and lets the kernel
decide what to do with that knowledge. This is exactly the type of
interface that QEMU should be using.

QEMU should tell the kernel enough information such that the kernel can
make good decisions. QEMU should not be the one making the decisions.

It looks like ms_mbind() takes a flags argument which I assume is the
same flags as mbind(). The current implementation ignores flags and just
uses MPOL_BIND. I would hope that the flags argument would only be
treated as advisory by the kernel.

Regards,

Anthony Liguori

>
> Furthermore, it would make libvirt a required part of qemu, and since I
> don't think I've ever used libvirt that's another reason to object, I
> don't need that stinking mess.
>