Date: Tue, 3 Jul 2012 10:03:43 +0100
From: "Daniel P. Berrange"
Message-ID: <20120703090343.GB12702@redhat.com>
References: <1341252398-12268-1-git-send-email-ehabkost@redhat.com> <20120702185658.GD10310@redhat.com> <20120702195403.GA3239@otherpad.lan.raisama.net>
In-Reply-To: <20120702195403.GA3239@otherpad.lan.raisama.net>
Subject: Re: [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)
To: Eduardo Habkost
Cc: Andrea Arcangeli, Andre Przywara, kvm@vger.kernel.org, qemu-devel@nongnu.org, Bharata B Rao, Bill Gray

On Mon, Jul 02, 2012 at 04:54:03PM -0300, Eduardo Habkost wrote:
> On Mon, Jul 02, 2012 at 07:56:58PM +0100, Daniel P. Berrange wrote:
> > On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
> > > Resending series, after fixing some coding style issues. Does anybody have any
> > > feedback about this proposal?
> > >
> > > Changes v1 -> v2:
> > > - Coding style fixes
> > >
> > > Original cover letter:
> > >
> > > I was investigating if there are any mechanisms that allow manual pinning of
> > > guest RAM to specific host NUMA nodes, in the case of multi-node KVM guests, and
> > > noticed that -mem-path could be used for that, except that it currently removes
> > > any files it creates (using mkstemp()) immediately, not allowing numactl to be
> > > used on the backing files, as a result. These patches add a -keep-mem-path-files
> > > option to make QEMU create the files inside -mem-path with more predictable
> > > names, and not remove them after creation.
> > >
> > > Some previous discussions about the subject, for reference:
> > > - Message-ID: <1281534738-8310-1-git-send-email-andre.przywara@amd.com>
> > >   http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
> > > - Message-ID: <4C7D7C2A.7000205@codemonkey.ws>
> > >   http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
> > >
> > > A more recent thread can be found at:
> > > - Message-ID: <20111029184502.GH11038@in.ibm.com>
> > >   http://article.gmane.org/gmane.comp.emulators.qemu/123001
> > >
> > > Note that this is just a mechanism to facilitate manual static binding using
> > > numactl on hugetlbfs later, for optimization. This may be especially useful for
> > > single large multi-node guest use cases (and, of course, has to be used with
> > > care).
> > >
> > > I don't know if it is a good idea to use the memory range names as a
> > > publicly-visible interface. Another option may be to use a single file instead,
> > > and mmap different regions inside the same file for each memory region. I am
> > > open to comments and suggestions.
> > >
> > > Example (untested) usage to manually bind each half of the RAM of a guest to a
> > > different NUMA node:
> > >
> > >  $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> > >      -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> > >      -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
> > >  $ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
> > >  $ numactl --offset=0 --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram
> > >
> > I'd suggest that instead of making the memory file name into a
> > public ABI QEMU needs to maintain, QEMU could expose the info
> > via a monitor command, e.g.
> >
> >  $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> >      -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> >      -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
> >      -monitor stdio
> >  (qemu) info mem-nodes
> >  node0: file=/proc/self/fd/3, offset=0G, length=1G
> >  node1: file=/proc/self/fd/3, offset=1G, length=1G
> >
> > This example takes advantage of the fact that with Linux, you can
> > still access a deleted file via /proc/self/fd/NNN, which, AFAICT,
> > would avoid the need for a -keep-mem-path-files option.
>
> I like the suggestion.
>
> But other processes still need to be able to open those files if we want
> to do anything useful with them. In this case, I guess it's better to
> let QEMU itself build a "/proc/<pid>/fd/<fd>" string instead of
> using "/proc/self" and forcing the client to find out what the right
> PID is?
>
> Anyway, even if we want to avoid file-descriptor and /proc tricks, we
> can still use the interface you suggest. Then we wouldn't need to have
> any filename assumptions: the filenames could be completely random, as
> they would be reported using the new monitor command.

Oops, yes of course. I did intend that client apps could use the files,
so I should have used /proc/$PID and not /proc/self.

>
> >
> > By returning info via a monitor command you also avoid hardcoding
> > the use of 1 single file for all of memory. You also avoid hardcoding
> > the fact that QEMU stores the nodes in contiguous order inside the
> > file.
> > e.g. QEMU could easily return data like this
> >
> >
> >  $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> >      -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> >      -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
> >      -monitor stdio
> >  (qemu) info mem-nodes
> >  node0: file=/proc/self/fd/3, offset=0G, length=1G
> >  node1: file=/proc/self/fd/4, offset=0G, length=1G
> >
> > or more ingenious options
>
> Sounds good.
>
> --
> Eduardo

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
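
The claim in this thread that a deleted file stays reachable through /proc
can be checked with a small script. This is only a minimal sketch, not QEMU
code: it assumes Linux with /proc mounted, uses Python's tempfile.mkstemp()
as a stand-in for the mkstemp() call QEMU makes inside -mem-path, and the
helper name read_unlinked_via_proc is made up for illustration.

```python
import os
import tempfile


def read_unlinked_via_proc(payload: bytes) -> bytes:
    """Write payload to a temp file, unlink it, then re-read it through /proc."""
    # Stand-in for QEMU's mkstemp(); the default temp dir replaces -mem-path here.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        # Remove the directory entry. The open descriptor keeps the inode alive.
        os.unlink(path)
        assert not os.path.exists(path)
        # Build the path from the real PID rather than /proc/self, so the same
        # string stays meaningful if it is handed to a different process.
        proc_path = f"/proc/{os.getpid()}/fd/{fd}"
        # Opening the /proc entry yields a fresh descriptor positioned at offset 0.
        with open(proc_path, "rb") as reopened:
            return reopened.read()
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(read_unlinked_via_proc(b"guest RAM backing data"))
```

The path is built with the owning PID because, as noted above, /proc/self
only resolves correctly inside the process that owns the descriptor. Another
process would still need sufficient permissions to open that /proc entry.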