From: Anthony Liguori <anthony@codemonkey.ws>
To: "André Przywara" <osp@andrep.de>
Cc: Avi Kivity <avi@redhat.com>, kvm@vger.kernel.org
Subject: Re: [PATCH 0/3] v2: KVM-userspace: add NUMA support for guests
Date: Mon, 08 Dec 2008 16:01:02 -0600
Message-ID: <493D991E.1000008@codemonkey.ws>
In-Reply-To: <493D95A9.1090307@andrep.de>
André Przywara wrote:
> I was partly wrong: the code is in BOCHS CVS, but not in qemu. It
> wasn't in the BOCHS 2.3.7 release, which qemu is currently based on.
> Could you pull the latest BIOS code from BOCHS CVS into qemu? This
> would give us the firmware interface for free, and I could port my
> patches more easily.
Working on that right now. BOCHS CVS has diverged a fair bit from what
we have, so I'm adjusting our current patches and doing regression testing.
> What's actually bothering you about the libnuma dependency? I could
> use the Linux mbind syscall directly, but I think using a library is
> saner (and probably more portable).
You're making a default policy decision (pin nodes and pin cpus).
You're assuming that Linux will do the wrong thing by default and that
the decision we'll be making is better.
That policy decision requires more validation. We need benchmarks
showing what performance looks like with pinning versus without, and we
need to understand whether the bad performance is a Linux bug that can
be fixed or whether it's something fundamental.
What I'm concerned about is that it'll make the default situation
worse. I advocated punting to management tools because that at least
gives the user the ability to make their own decisions, which means you
don't have to prove that this is the correct default decision.
I don't care about a libnuma dependency. Library dependencies are fine
as long as they're optional.
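
For reference, here is a minimal sketch of what the two routes could
look like when binding a guest RAM region to one node. The helper and
its arguments are illustrative only, not anything in the current
patches:

/* Illustrative only: bind a guest RAM region to a single NUMA node,
 * either through libnuma or through the raw mbind() syscall.
 * Link with -lnuma. */
#include <numa.h>       /* libnuma convenience API */
#include <numaif.h>     /* raw policy interface: mbind(), MPOL_BIND */

static void bind_region_to_node(void *addr, size_t len, int node)
{
    /* libnuma route: one call, the library builds the nodemask. */
    numa_tonode_memory(addr, len, node);

    /* Roughly equivalent raw-syscall route:
     *
     *   unsigned long nodemask = 1UL << node;
     *   mbind(addr, len, MPOL_BIND, &nodemask,
     *         sizeof(nodemask) * 8, 0);
     */
}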
>>> Almost right, but simply calling qemu-system-x86_64 can lead to bad
>>> situations. I recently saw that VCPU #0 was scheduled on one node and
>>> VCPU #1 on another. This leads to random (probably excessive) remote
>>> accesses from the VCPUs, since the guest assumes uniform memory
>> That seems like Linux is behaving badly, no? Can you describe the
>> situation more?
> That is just my observation. I have to do more research to get a
> decent explanation, but I think the problem is that at this early
> stage the threads barely touch any memory, so Linux tries to spread
> them out as evenly as possible. Just a quick run on a quad-node
> machine with 16 cores in total:
How does memory migration fit into all of this though? Statistically
speaking, if your NUMA guest is behaving well, it should be easy to
recognize the groupings and perform the appropriate page migration. I
would think even the most naive page migration tool would be able to do
the right thing.
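
For concreteness, here is a sketch of what the core of such a naive
migration tool could look like, using the move_pages(2) interface; the
function name and arguments are illustrative, not an existing tool:

/* Illustrative only: migrate a set of pages to one target node with
 * move_pages(2).  pid 0 means the calling process; a real tool would
 * pass the guest's pid.  Link with -lnuma. */
#include <numaif.h>
#include <stdio.h>

static void migrate_to_node(void **pages, unsigned long count, int node)
{
    int nodes[count];       /* desired node for each page */
    int status[count];      /* result per page: node, or -errno */
    unsigned long i;

    for (i = 0; i < count; i++)
        nodes[i] = node;

    if (move_pages(0, count, pages, nodes, status, MPOL_MF_MOVE) < 0)
        perror("move_pages");
}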
>> NUMA systems are expensive. If a customer cares about performance
>> (as opposed to just getting more memory), then I think tools like
>> numactl are pretty well known.
> Well, "expensive" is relative, especially if I think of your employer
> ;-) In fact every AMD dual-socket server is NUMA, and Intel will join
> the game next year.
But the NUMA characteristics of an AMD system are relatively minor. I
doubt that static pinning is what most users would want, since it could
noticeably reduce overall system performance.
Even with more traditional NUMA systems, the cost of remote memory
access is often outweighed by the opportunity cost of leaving a CPU
idle. That's what pinning does: it leaves CPUs potentially idle.
> Additionally one could use some kind of home node, so one temporarily
> could change the VCPUs affinity and later return to the optimal
> affinity (where the memory is located) without specifying it again.
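
A rough sketch of that home-node idea, assuming one thread per VCPU
and plain sched_setaffinity(2); the helper and the way home_cpus is
obtained are hypothetical:

/* Illustrative only: restore a VCPU thread's affinity to the CPUs of
 * its home node after a temporary excursion.  home_cpus would be
 * derived from the machine's node topology. */
#define _GNU_SOURCE
#include <sched.h>

static void vcpu_return_home(pid_t vcpu_tid, const int *home_cpus, int n)
{
    cpu_set_t set;
    int i;

    CPU_ZERO(&set);
    for (i = 0; i < n; i++)
        CPU_SET(home_cpus[i], &set);

    sched_setaffinity(vcpu_tid, sizeof(set), &set);
}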
Please resubmit with the first three patches at the front. I don't
think exposing NUMA attributes to a guest is at all controversial, so
that's relatively easy to apply.
I'm not saying that the last patch can't be applied, but I don't think
it's as obvious that it's going to be a win when you start doing
performance tests.
Regards,
Anthony Liguori
> Comments are welcome.
>
> Regards,
> Andre.
>