Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests

From: Andi Kleen <andi@firstfloor.org>
To: Avi Kivity <avi@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	Andre Przywara <andre.przywara@amd.com>,
	kvm@vger.kernel.org
Subject: Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests
Date: Sun, 30 Nov 2008 18:04:14 +0100
Message-ID: <20081130170414.GX6703@one.firstfloor.org>
In-Reply-To: <4932C176.7020102@redhat.com>

On Sun, Nov 30, 2008 at 06:38:14PM +0200, Avi Kivity wrote:
> The guest allocates when it touches the page for the first time.  This 
> means very little since all of memory may be touched during guest bootup 
> or shortly afterwards.  Even if not, it is still a one-time operation, 
> and any choices we make based on it will last the lifetime of the guest.

I was thinking more of a heuristic that checks when a page is first
mapped into user space. The only problem is that the page is zeroed
through the direct mapping beforehand, but perhaps there is a way around
that. This is one of the rare cases where 32-bit highmem actually makes
things easier. It might also be easier on OSes other than Linux that
don't use the direct mapping as aggressively.
> 
> >This is roughly equivalent of getting a fresh new demand fault page,
> >but doesn't require to unmap/free/remap.
> >  
> 
> Lost again, sorry.

free/unmap/remap normally gives you local memory. I tend to call
it the poor man's NUMA policy API.
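
A minimal user-space sketch of the free/unmap/remap trick, assuming an
anonymous mapping and Linux's default local-allocation policy. The helper
name fresh_region is made up for illustration; nothing here is from the
patch series. Dropping the mapping frees the old pages, and the new
mapping has no physical pages until each one is first written, at which
point the kernel normally allocates on the node of the touching CPU.

```python
import mmap

PAGE = mmap.PAGESIZE

def fresh_region(old, npages):
    """Drop an anonymous mapping and create a new one of npages pages."""
    old.close()                          # unmap: old backing pages are freed
    new = mmap.mmap(-1, npages * PAGE)   # remap: no physical pages yet
    for off in range(0, npages * PAGE, PAGE):
        new[off] = 0                     # first write faults each page in
    return new

region = mmap.mmap(-1, 4 * PAGE)
region[0] = 0xAB                         # page allocated wherever we ran then
region = fresh_region(region, 4)         # contents are gone, pages are new
```

The old contents are lost, which is why this only works for memory that
is known to be free or about to be reinitialized.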

The alternative is to keep your own pools and allocate from the
correct pool, but then you need either pinning or getcpu().
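
The pool variant could look roughly like the sketch below. It is a pure
simulation: CPU_TO_NODE, NodePools and the page ids are hypothetical, and
the cpu argument stands in for either the CPU a thread is pinned to or
the value a getcpu() call would return. Without pinning that value is
only a hint, since the thread may migrate right after the call.

```python
# Hypothetical topology: host CPU number -> host NUMA node.
CPU_TO_NODE = {0: 0, 1: 0, 2: 1, 3: 1}

class NodePools:
    def __init__(self, pages_per_node):
        # One free list per node; page ids are (node, index) tuples.
        self.free = {node: [(node, i) for i in range(n)]
                     for node, n in pages_per_node.items()}

    def alloc(self, cpu):
        """Take a page from the pool of the node that `cpu` belongs to."""
        node = CPU_TO_NODE[cpu]
        if not self.free[node]:
            raise MemoryError(f"node {node} pool is empty")
        return self.free[node].pop()

    def release(self, page):
        """Return a page to the pool of the node it came from."""
        self.free[page[0]].append(page)

pools = NodePools({0: 2, 1: 2})
page_a = pools.alloc(cpu=2)   # caller runs (or is pinned) on a node-1 CPU
page_b = pools.alloc(cpu=0)   # caller runs (or is pinned) on a node-0 CPU
```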

> 
> >The tricky bit is probably figuring out what is a fresh new page for
> >the guest. That might need some paravirtual help.
> >  
> 
> The guest typically recycles its own pages (exception is ballooning).  
> Also it doesn't make sense to manage this on a per page basis as the 
> guest won't do that. 

> We need to mimic real hardware.

The underlying allocation is done in pages, so NUMA affinity can
just as well be handled at page granularity.

Basic algorithm:
- If the guest touches a page on a virtual node that is the same as the
  local node of the current vcpu, assume it is a local allocation.
- On allocation, get the underlying page from the correct underlying
  node, based on a dynamic getcpu() relationship.
- Find some way to get rid of unused pages, e.g. keep track of the
  number of mappings to a page and age them, or use pv help.
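
The three steps above can be sketched as a small simulation. Everything
in it is an assumption made for illustration: the class and field names,
counting each touch as one mapping, and in particular the fallback to a
static guest_node -> host_node mapping when the touched page is not on
the vcpu's own virtual node, which the list above does not specify.

```python
class GuestNumaBacking:
    def __init__(self, guest_to_host, cpu_to_host_node):
        self.guest_to_host = guest_to_host          # static fallback mapping
        self.cpu_to_host_node = cpu_to_host_node    # host cpu -> host node
        self.backing = {}    # guest page number -> host node backing it
        self.mapcount = {}   # guest page number -> number of mappings

    def touch(self, gpn, page_guest_node, vcpu_guest_node, host_cpu):
        """Handle a guest access to page `gpn`; return its host node."""
        self.mapcount[gpn] = self.mapcount.get(gpn, 0) + 1
        if gpn in self.backing:
            return self.backing[gpn]
        if page_guest_node == vcpu_guest_node:
            # Steps 1 and 2: looks like a local allocation, so back it
            # from the host node the vcpu is running on right now.
            host_node = self.cpu_to_host_node[host_cpu]
        else:
            # Assumption: remote touches follow the static mapping.
            host_node = self.guest_to_host[page_guest_node]
        self.backing[gpn] = host_node
        return host_node

    def unmap(self, gpn):
        """Step 3: drop the backing once nothing maps the page any more."""
        self.mapcount[gpn] -= 1
        if self.mapcount[gpn] == 0:
            del self.mapcount[gpn]
            del self.backing[gpn]

b = GuestNumaBacking({0: 0, 1: 1}, {0: 0, 1: 0, 2: 1, 3: 1})
first = b.touch(gpn=7, page_guest_node=0, vcpu_guest_node=0, host_cpu=3)
remote = b.touch(gpn=8, page_guest_node=1, vcpu_guest_node=0, host_cpu=0)
b.unmap(7)                    # last mapping gone, backing is released
again = b.touch(gpn=7, page_guest_node=0, vcpu_guest_node=0, host_cpu=0)
```

Here page 7 is first backed from the node of host CPU 3 and, after being
released, from the node of host CPU 0, so the backing follows the vcpu.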

> The static case is simple.  We allocate memory from a few nodes (for 
> small guests, only one) and establish a guest_node -> host_node 
> mapping.  vcpus on guest node X are constrained to host node according 
> to this mapping.
> 
> The dynamic case is really complicated.  We can allow vcpus to wander to 
> other cpus on cpu overcommit, but need to pull them back soonish, or 
> alternatively migrate the entire node, taking into account the cost of 
> the migration, cpu availability on the target node, and memory 
> availability on the target node.  Since the cost is so huge, this needs 
> to be done on a very coarse scale.

I wrote a scheduler that did that on 2.4 (it was called homenode scheduling),
but it never worked well on small systems. It was moderately successful on
some big NUMA boxes, though. The fundamental problem is that on small
systems leaving a CPU unused is always worse than using remote memory.

Always migrating memory on CPU migration is also too costly in the general
case, but it might be possible to make it work in the special case 
of vCPU guests with some tweaks.

-Andi

-- 
ak@linux.intel.com
