qemu-devel.nongnu.org archive mirror
From: "Daniel P. Berrange" <berrange@redhat.com>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Andre Przywara <andre.przywara@amd.com>,
	kvm@vger.kernel.org, qemu-devel@nongnu.org,
	Bharata B Rao <bharata@linux.vnet.ibm.com>,
	Bill Gray <bgray@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)
Date: Tue, 3 Jul 2012 10:03:43 +0100
Message-ID: <20120703090343.GB12702@redhat.com>
In-Reply-To: <20120702195403.GA3239@otherpad.lan.raisama.net>

On Mon, Jul 02, 2012 at 04:54:03PM -0300, Eduardo Habkost wrote:
> On Mon, Jul 02, 2012 at 07:56:58PM +0100, Daniel P. Berrange wrote:
> > On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
> > > Resending series, after fixing some coding style issues. Does anybody have any
> > > feedback about this proposal?
> > > 
> > > Changes v1 -> v2:
> > >  - Coding style fixes
> > > 
> > > Original cover letter:
> > > 
> > > I was investigating if there are any mechanisms that allow manually pinning of
> > > guest RAM to specific host NUMA nodes, in the case of multi-node KVM guests, and
> > > noticed that -mem-path could be used for that, except that it currently removes
> > > any files it creates (using mkstemp()) immediately, not allowing numactl to be
> > > used on the backing files, as a result. This series adds a -keep-mem-path-files
> > > option to make QEMU create the files inside -mem-path with more predictable
> > > names, and not remove them after creation.
> > > 
> > > Some previous discussions about the subject, for reference:
> > >  - Message-ID: <1281534738-8310-1-git-send-email-andre.przywara@amd.com>
> > >    http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
> > >  - Message-ID: <4C7D7C2A.7000205@codemonkey.ws>
> > >    http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
> > > 
> > > A more recent thread can be found at:
> > >  - Message-ID: <20111029184502.GH11038@in.ibm.com>
> > >    http://article.gmane.org/gmane.comp.emulators.qemu/123001
> > > 
> > > Note that this is just a mechanism to facilitate manual static binding using
> > > numactl on hugetlbfs later, for optimization. This may be especially useful for
> > > single large multi-node guest use cases (and, of course, has to be used with
> > > care).
> > > 
> > > I don't know if it is a good idea to use the memory range names as a publicly-
> > > visible interface. Another option may be to use a single file instead, and mmap
> > > different regions inside the same file for each memory region. I am open to
> > > comments and suggestions.
> > > 
> > > Example (untested) usage to manually bind each half of the RAM of a guest to a
> > > different NUMA node:
> > > 
> > >  $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> > >    -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> > >    -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
> > >  $ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
> > >  $ numactl --offset=0  --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram
> > 
> > I'd suggest that instead of making the memory file name into a
> > public ABI QEMU needs to maintain, QEMU could expose the info
> > via a monitor command. eg
> > 
> >    $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> >      -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> >      -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
> >      -monitor stdio
> >    (qemu) info mem-nodes
> >     node0: file=/proc/self/fd/3, offset=0G, length=1G
> >     node1: file=/proc/self/fd/3, offset=1G, length=1G
> > 
> > This example takes advantage of the fact that with Linux, you can
> > still access a deleted file via /proc/self/fd/NNN, which AFAICT,
> > would avoid the need for a -keep-mem-path-files option.
> 
> I like the suggestion.
> 
> But other processes still need to be able to open those files if we want
> to do anything useful with them. In this case, I guess it's better to
> let QEMU itself build a "/proc/<getpid()>/fd/<fd>" string instead of
> using "/proc/self" and forcing the client to find out what the right
> PID is?
> 
> Anyway, even if we want to avoid file-descriptor and /proc tricks, we
> can still use the interface you suggest. Then we wouldn't need to have
> any filename assumptions: the filenames could be completely random, as
> they would be reported using the new monitor command.

Oops, yes of course. I did intend that client apps could use the
files, so I should have used /proc/$PID and not /proc/self.
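
To illustrate the /proc/$PID point, here is a minimal sketch (Linux
only, not QEMU code) of a process unlinking its backing file and then
building a path that another process could open. The helper names
proc_fd_path() and demo() are made up for this example, and a plain
mkstemp() file stands in for a hugetlbfs file.

```python
import os
import tempfile


def proc_fd_path(fd, pid=None):
    """Build a /proc/<pid>/fd/<fd> path (hypothetical helper).

    Using the real PID instead of "self" makes the path meaningful
    to processes other than the one that owns the descriptor.
    """
    if pid is None:
        pid = os.getpid()
    return "/proc/%d/fd/%d" % (pid, fd)


def demo():
    # Create a backing file and unlink it straight away, similar to
    # what -mem-path does with the files it creates via mkstemp().
    fd, name = tempfile.mkstemp()
    os.unlink(name)
    os.write(fd, b"guest-ram")

    # The name is gone, but the open descriptor keeps the file alive,
    # so it can still be reached through /proc on Linux.
    path = proc_fd_path(fd)
    with open(path, "rb") as f:
        data = f.read()

    os.close(fd)
    return name, path, data


if __name__ == "__main__":
    name, path, data = demo()
    print("unlinked:", name)
    print("still readable via:", path, data)
```

A separate process opening that path still needs sufficient
permissions on the target process, typically by running as the same
user.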

> 
> > 
> > By returning info via a monitor command you also avoid hardcoding
> > the use of a single file for all of memory. You also avoid hardcoding
> > the fact that QEMU stores the nodes in contiguous order inside the
> > file. E.g. QEMU could easily return data like this:
> > 
> > 
> >    $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> >      -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> >      -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
> >      -monitor stdio
> >    (qemu) info mem-nodes
> >     node0: file=/proc/self/fd/3, offset=0G, length=1G
> >     node1: file=/proc/self/fd/4, offset=0G, length=1G
> > 
> > or more ingenious options
> 
> Sounds good.
> 
> -- 
> Eduardo

-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
