From: Anthony Liguori <anthony@codemonkey.ws>
To: "André Przywara" <osp@andrep.de>
Cc: Avi Kivity <avi@redhat.com>, kvm@vger.kernel.org
Subject: Re: [PATCH 0/3] v2: KVM-userspace: add NUMA support for guests
Date: Mon, 08 Dec 2008 16:01:02 -0600	[thread overview]
Message-ID: <493D991E.1000008@codemonkey.ws> (raw)
In-Reply-To: <493D95A9.1090307@andrep.de>

André Przywara wrote:
> I was partly wrong: the code is in BOCHS CVS, but not in qemu. It wasn't
> in the BOCHS 2.3.7 release, which qemu is currently based on. Could you
> pull the latest BIOS code from BOCHS CVS into qemu? This would give us
> the firmware interface for free, and I could more easily port my patches.

Working on that right now.  BOCHS CVS has diverged a fair bit from what 
we have, so I'm adjusting our current patches and doing regression testing.

> What's actually bothering you about the libnuma dependency? I could
> directly use the Linux mbind syscall, but I think using a library
> is more sane (and probably more portable).

You're making a default policy decision (pin nodes and pin cpus).  You're 
assuming that Linux will do the wrong thing by default and that the 
decision we'll be making is better.

That policy decision requires more validation.  We need benchmarks 
showing what performance looks like with and without pinning, and we 
need to understand whether any bad performance is a Linux bug that can 
be fixed or whether it's something fundamental.

What I'm concerned about is that it'll make the default situation 
worse.  I advocated punting to management tools because that at least 
gives the user the ability to make their own decisions, which means you 
don't have to prove that this is the correct default decision.

I don't care about a libnuma dependency.  Library dependencies are fine 
as long as they're optional.
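
Just so we're talking about the same thing, here's roughly what the two
options look like (a minimal sketch with made-up helper names, not code
from your patches): libnuma's convenience wrapper versus the raw mbind()
syscall.

#include <numa.h>      /* libnuma wrappers, link with -lnuma */
#include <numaif.h>    /* raw memory policy syscalls: mbind() */
#include <stddef.h>

/* Bind a guest RAM region to a single node via libnuma. */
static void bind_ram_libnuma(void *ram, size_t size, int node)
{
    numa_tonode_memory(ram, size, node);
}

/* The same policy via the bare mbind() syscall, no library needed. */
static void bind_ram_mbind(void *ram, size_t size, int node)
{
    unsigned long nodemask = 1UL << node;

    mbind(ram, size, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0);
}

Either one expresses the same policy; my concern is qemu applying that
policy by default, not how it gets expressed.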

>>> Almost right, but simply calling qemu-system-x86_64 can lead to bad 
>>> situations. I recently saw that VCPU #0 was scheduled on one node and 
>>> VCPU #1 on another. This leads to random (probably excessive) remote 
>>> accesses from the VCPUs, since the guest assumes uniform memory
>> That seems like Linux is behaving badly, no?  Can you describe the 
>> situation more?
> That is just my observation. I have to do more research to get a decent
> explanation, but I think the problem is that at this early stage the
> threads barely touch any memory, so Linux tries to distribute them as
> best as possible. Just a quick run on a quad-node machine with 16 cores
> in total:

How does memory migration fit into all of this though?  Statistically 
speaking, if your NUMA guest is behaving well, it should be easy to 
recognize the groupings and perform the appropriate page migration.  I 
would think even the most naive page migration tool would be able to do 
the right thing.
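
Even a blunt external tool that just points migrate_pages() at the qemu
process would cover that naive case; a sketch, with the helper name and
node handling made up for illustration:

#include <sys/types.h>
#include <numaif.h>    /* migrate_pages(), link with -lnuma */

/* Move every page the guest process currently has on 'from_node'
 * over to 'to_node'.  Returns the number of pages that could not
 * be moved, or -1 on error. */
static long migrate_guest(pid_t qemu_pid, int from_node, int to_node)
{
    unsigned long from = 1UL << from_node;
    unsigned long to   = 1UL << to_node;

    return migrate_pages(qemu_pid, sizeof(from) * 8, &from, &to);
}

A smarter tool would look at where the VCPU threads actually run and
migrate only the corresponding memory, but even the blunt version
handles the obvious groupings.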

>> NUMA systems are expensive.  If a customer cares about performance 
>> (as opposed to just getting more memory), then I think tools like 
>> numactl are pretty well known.
> Well, expensive is relative, especially if I think of your employer ;-) In
> fact every AMD dual-socket server is NUMA, and Intel will join the 
> game next year.

But the NUMA characteristics on an AMD system are relatively minor.  I 
doubt that static pinning is what most users would want, since it could 
reduce overall system performance noticeably.

Even with more traditional NUMA systems, the cost of remote memory 
access is often outweighed by the opportunity cost of leaving a CPU 
idle.  That's what pinning does: it leaves CPUs potentially idle.
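
To be concrete about what pinning means here (a sketch using libnuma,
not your patch code): restricting a VCPU thread to one node's CPUs is a
one-liner, and numactl --cpunodebind=0 --membind=0 does the equivalent
for a whole process from the command line.  In both cases those CPUs
are the only ones the scheduler may use, busy or not.

#include <numa.h>   /* libnuma, link with -lnuma */

/* Restrict the calling VCPU thread to the CPUs of 'node'.  After
 * this the scheduler can no longer move the thread to an idle CPU
 * on another node, which is the opportunity cost described above. */
static void pin_vcpu_to_node(int node)
{
    numa_run_on_node(node);
}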

> Additionally one could use some kind of home node, so one could 
> temporarily change the VCPUs' affinity and later return to the optimal 
> affinity (where the memory is located) without specifying it again.

Please resubmit with the first three patches at the front.  I don't 
think exposing NUMA attributes to a guest is at all controversial, so 
that's relatively easy to apply.

I'm not saying that the last patch can't be applied, but I don't think 
it's as obvious that it's going to be a win when you start doing 
performance tests.

Regards,

Anthony Liguori

> Comments are welcome.
>
> Regards,
> Andre.
>


Thread overview: 10+ messages
2008-12-05 13:29 [PATCH 0/3] v2: KVM-userspace: add NUMA support for guests Andre Przywara
2008-12-05 14:28 ` Anthony Liguori
2008-12-05 15:22   ` Andre Przywara
2008-12-05 15:41     ` Anthony Liguori
2008-12-08 21:46       ` André Przywara
2008-12-08 22:01         ` Anthony Liguori [this message]
2008-12-09 14:24         ` Avi Kivity
2008-12-09 14:55           ` Anthony Liguori
2008-12-05 15:27   ` Avi Kivity
2008-12-05 15:34     ` Anthony Liguori
