From: Dipankar Sarma <dipankar@in.ibm.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
kvm list <kvm@vger.kernel.org>,
qemu-devel Developers <qemu-devel@nongnu.org>,
Alexander Graf <agraf@suse.de>,
Chris Wright <chrisw@sous-sol.org>,
bharata@linux.vnet.ibm.com, Vaidyanathan S <svaidy@in.ibm.com>
Subject: Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding
Date: Thu, 1 Dec 2011 22:55:20 +0530
Message-ID: <20111201172520.GA26737@in.ibm.com>
In-Reply-To: <20111130174113.GM23466@redhat.com>

On Wed, Nov 30, 2011 at 06:41:13PM +0100, Andrea Arcangeli wrote:
> On Wed, Nov 30, 2011 at 09:52:37PM +0530, Dipankar Sarma wrote:
> > create the guest topology correctly and optimize for NUMA. This
> > would work for us.
>
> Even on the case of 1 guest that fits in one node, you're not going to
> max out the full bandwidth of all memory channels with this.
>
> qemu all can do with ms_mbind/tbind is to create a vtopology that
> matches the hardware topology. It has these limits:
>
> 1) requires all userland applications to be modified to scan either
> the physical topology if run on host, or the vtopology if run on
> guest to get the full benefit.
Not sure why you would need that. qemu can reflect the topology,
based on the -numa specifications and the corresponding
ms_tbind/mbind, in the FDT (in the case of Power; I guess ACPI
tables for x86), and the guest kernel would detect this virtualized
topology. So there is no need for two types of topologies, afaics.
It will all be reflected in /sys/devices/system/node in the guest.
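As a rough illustration of what the guest would see, the per-node CPU
lists under /sys/devices/system/node can be read with a few lines of
Python. This is only a sketch: the helper names parse_cpulist and
read_numa_topology are made up for this example, and it relies only on
the nodeN/cpulist files that Linux exposes in sysfs. The same code runs
unchanged on the host or in the guest, since both present the topology
they see at the same path.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as "0-3,8" into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # a memory-only node has an empty cpulist
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def read_numa_topology(root="/sys/devices/system/node"):
    """Return {node_id: [cpu, ...]} for every nodeN directory under root."""
    topology = {}
    for node_dir in Path(root).glob("node[0-9]*"):
        node_id = int(node_dir.name[len("node"):])
        topology[node_id] = parse_cpulist((node_dir / "cpulist").read_text())
    return topology


if __name__ == "__main__":
    for node, cpus in sorted(read_numa_topology().items()):
        print(f"node{node}: cpus {cpus}")
```

The root argument exists only so the function can be pointed at a copy
of the tree for testing; on a real system the default path is used.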
>
> 2) breaks across live migration if host physical topology changes
That is indeed an issue. Either VM placement software needs to
be really smart and migrate VMs only to hosts where they fit well
or, more likely, we will have to find a way to make guest kernels
aware of topology changes. But the latter has an impact on userspace
as well, for applications that might have been optimized for NUMA.
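To make the "fit well" check concrete, here is a minimal sketch of what
placement software might test before migrating: whether every guest
node can be given its own host node with enough free memory. The
function name fits_on_host and the two input dictionaries are
hypothetical, and the check deliberately ignores CPU counts and node
distances, which a real placement tool would also have to consider.

```python
def fits_on_host(vnode_mem_mb, host_free_mb):
    """Map each guest node to a distinct host node with enough free memory.

    vnode_mem_mb: {guest_node_id: memory needed in MB}   (hypothetical input)
    host_free_mb: {host_node_id: free memory in MB}      (hypothetical input)
    Returns {guest_node_id: host_node_id}, or None if no placement exists.
    """
    if len(vnode_mem_mb) > len(host_free_mb):
        return None  # not enough host nodes for a one-to-one mapping

    # Pair the largest guest node with the largest host node, and so on.
    # For a one-to-one mapping on a single resource this is sufficient:
    # if the k-th largest guest node does not fit the k-th largest host
    # node, fewer than k host nodes can hold the k largest guest nodes.
    vnodes = sorted(vnode_mem_mb, key=vnode_mem_mb.get, reverse=True)
    hosts = sorted(host_free_mb, key=host_free_mb.get, reverse=True)

    placement = {}
    for vnode, host in zip(vnodes, hosts):
        if vnode_mem_mb[vnode] > host_free_mb[host]:
            return None
        placement[vnode] = host
    return placement
```

A guest whose layout returns None on the destination is exactly the case
where the vtopology it booted with would stop matching the hardware.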
> 3) 1 small guest on a idle numa system that fits in one numa node will
> tell not enough information to the host kernel
>
> 4) if used outside of qemu and one threads allocates more memory than
> what fits in one node it won't tell enough info to the host kernel.
>
> About 3): if you've just one guest that fits in one node, each vcpu
> should be spread across all the nodes probably, and behave like
> MADV_INTERLEAVE if the guest CPU scheduler migrate guests processes in
> reverse, the global memory bandwidth will still be used full even if
> they will both access remote memory. I've just seen benchmarks where
> no pinning runs more than _twice_ as fast than pinning with just 1
> guest and only 10 vcpu threads, probably because of that.
I agree. Specifying a NUMA topology for the guest can result in
sub-optimal performance in some cases; it is a tradeoff.
> In short it's an incremental step that moves some logic to the kernel
> but I don't see it solving all situations optimally and it shares a
> lot of the limits of the hard bindings.
Agreed.
Thanks
Dipankar