Date: Mon, 2 Jul 2012 19:56:58 +0100
From: "Daniel P. Berrange"
Message-ID: <20120702185658.GD10310@redhat.com>
References: <1341252398-12268-1-git-send-email-ehabkost@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <1341252398-12268-1-git-send-email-ehabkost@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)
Reply-To: "Daniel P. Berrange"
To: Eduardo Habkost
Cc: Andrea Arcangeli, Andre Przywara, kvm@vger.kernel.org, qemu-devel@nongnu.org, Bharata B Rao, Bill Gray

On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
> Resending series, after fixing some coding style issues. Does anybody have any
> feedback about this proposal?
>
> Changes v1 -> v2:
> - Coding style fixes
>
> Original cover letter:
>
> I was investigating if there are any mechanisms that allow manually pinning of
> guest RAM to specific host NUMA nodes, in the case of multi-node KVM guests, and
> noticed that -mem-path could be used for that, except that it currently removes
> any files it creates (using mkstemp()) immediately, not allowing numactl to be
> used on the backing files, as a result.
> These patches add a -keep-mem-path-files
> option to make QEMU create the files inside -mem-path with more predictable
> names, and not remove them after creation.
>
> Some previous discussions about the subject, for reference:
> - Message-ID: <1281534738-8310-1-git-send-email-andre.przywara@amd.com>
>   http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
> - Message-ID: <4C7D7C2A.7000205@codemonkey.ws>
>   http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
>
> A more recent thread can be found at:
> - Message-ID: <20111029184502.GH11038@in.ibm.com>
>   http://article.gmane.org/gmane.comp.emulators.qemu/123001
>
> Note that this is just a mechanism to facilitate manual static binding using
> numactl on hugetlbfs later, for optimization. This may be especially useful for
> single large multi-node guest use cases (and, of course, has to be used with
> care).
>
> I don't know if it is a good idea to use the memory range names as a
> publicly-visible interface. Another option may be to use a single file instead,
> and mmap different regions inside the same file for each memory region. I am
> open to comments and suggestions.
>
> Example (untested) usage to manually bind each half of the RAM of a guest to a
> different NUMA node:
>
>  $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
>    -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
>    -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
>  $ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
>  $ numactl --offset=0 --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram

I'd suggest that instead of making the memory file name into a public ABI
that QEMU needs to maintain, QEMU could expose the info via a monitor
command, e.g.:

  $ qemu-system-x86_64 [...] \
     -m 2048 -smp 4 \
     -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
     -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
     -monitor stdio
  (qemu) info mem-nodes
  node0: file=/proc/self/fd/3, offset=0G, length=1G
  node1: file=/proc/self/fd/3, offset=1G, length=1G

This example takes advantage of the fact that on Linux you can still
access a deleted file via /proc/self/fd/NNN, which, AFAICT, would avoid
the need for a -keep-mem-path-files option.

By returning the info via a monitor command you also avoid hardcoding the
use of a single file for all of memory. You also avoid hardcoding the
fact that QEMU stores the nodes in contiguous order inside the file.
E.g. QEMU could just as easily return data like this:

  $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
     -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
     -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
     -monitor stdio
  (qemu) info mem-nodes
  node0: file=/proc/self/fd/3, offset=0G, length=1G
  node1: file=/proc/self/fd/4, offset=0G, length=1G

or more ingenious options.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
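
For reference, the /proc/self/fd behaviour the reply relies on can be checked with a few lines of Python. This is only a minimal sketch, assuming a Linux host with /proc mounted; the helper name read_deleted_via_proc is hypothetical and nothing here is QEMU code or the proposed "info mem-nodes" command.

```python
import os
import tempfile


def read_deleted_via_proc(payload: bytes) -> bytes:
    """Write payload to a temp file, unlink it, then re-read it via /proc/self/fd."""
    # mkstemp() is the same creation primitive the cover letter mentions.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        # Remove the name; the open descriptor keeps the file contents alive.
        os.unlink(path)
        assert not os.path.exists(path)
        # Linux-specific: /proc/self/fd/<N> can be opened like an ordinary path,
        # giving a fresh file description positioned at offset 0.
        with open(f"/proc/self/fd/{fd}", "rb") as f:
            return f.read()
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(read_deleted_via_proc(b"guest RAM backing data"))
```

Note that /proc/self resolves relative to the process doing the lookup, so a separate tool inspecting a running QEMU would presumably have to go through /proc/<pid>/fd/NNN for that QEMU's pid instead.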