From: Avi Kivity <avi@redhat.com>
To: Paul Mackerras <paulus@samba.org>
Cc: Alexander Graf <agraf@suse.de>,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH] KVM: PPC: Book3S HV: Make the guest MMU hash table size configurable
Date: Mon, 30 Apr 2012 08:31:42 +0000
Message-ID: <4F9E4DEE.60104@redhat.com>
In-Reply-To: <20120430044014.GA5428@drongo>

On 04/30/2012 07:40 AM, Paul Mackerras wrote:
> On Sun, Apr 29, 2012 at 04:37:33PM +0300, Avi Kivity wrote:
>
> > How difficult is it to have the kernel resize the HPT on demand?
>
> Quite difficult, unfortunately.  The guest kernel knows the size of
> the HPT, and the paravirt interface for updating it relies on the
> guest knowing it, since it is used in the hash function (the computed
> hash is taken modulo the HPT size).
>
> And even if it were possible to notify the guest that the size was
> changing, since it is a hash table, changing the size requires
> traversing the table to move hash entries to their new locations.
> When reducing the size one only has to traverse the part that is going
> away, but even that will be at least half of the table since the size
> is always a power of 2.

I'm no x86 fan but I'm glad we have nothing like that over there.
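
The resizing problem Paul describes can be illustrated with a toy model. This is only a sketch: it assumes a flat table indexed by "hash modulo size" and ignores the real HPT layout (PTE groups, secondary hash). The names slot_index and entries_to_move are hypothetical and exist only for this example.

```python
def slot_index(hash_value: int, n_slots: int) -> int:
    """Return the table slot for a hash, assuming n_slots is a power of 2."""
    if n_slots <= 0 or n_slots & (n_slots - 1):
        raise ValueError("table size must be a power of 2")
    # For a power-of-2 size, masking is the same as hash_value % n_slots.
    return hash_value & (n_slots - 1)


def entries_to_move(hashes, old_slots, new_slots):
    """List the hashes whose slot changes when the table is resized."""
    return [h for h in hashes
            if slot_index(h, old_slots) != slot_index(h, new_slots)]


# Toy workload: one entry per hash value 0..15, table shrinks from 8 to 4 slots.
hashes = list(range(16))
moved = entries_to_move(hashes, old_slots=8, new_slots=4)
print(moved)
```

In this model every entry that lived in the upper half of the old table has to be relocated, which matches the point that shrinking touches at least half of the table. It also shows why the guest must know the size: the slot it computes depends on it.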

>
> >  Guest
> > size is meaningless in the presence of memory hotplug, and having
> > unprivileged userspace pin down large amounts of kernel memory is
> > undesirable.
>
> I agree.  The HPT is certainly not ideal.  However, it's what we have
> to deal with on POWER hardware.
>
> One idea I had is to reserve some contiguous physical memory at boot
> time, say a couple of percent of system memory, and use that as a pool
> to allocate HPTs from.  That would limit the impact on the rest of the
> system and also make it more likely that we can find the necessary
> amount of physically contiguous memory.

Doesn't that limit the number of guests that can run?

> > On x86 we grow and shrink the mmu resources in response to guest demand
> > and host memory pressure.  We can do this because the data structures
> > are not authoritative (don't know if that's the case for ppc) and
> > because they can be grown incrementally (pretty sure that isn't the case
> > on ppc).  Still, if we can do this at KVM_SET_USER_MEMORY_REGION time
> > instead of a separate ioctl, I think it's better.
>
> It's not practical to grow the HPT after the guest has started
> booting.  It is possible to have two HPTs: one that the guest sees,
> which can be in pageable memory, and another shadow HPT that the
> hardware uses, which has to be in physically contiguous memory.  In
> this model the size of the shadow HPT can be changed at will, at the
> expense of having to reestablish the entries in it, though that can be
> done on demand.  I have avoided that approach until now because it
> uses more memory and is slower than just having a single HPT.

This is similar to x86 in the pre-NPT/EPT days; it's indeed slow.  I
guess we'll be stuck with the pv hash until you get nested lookups (at
least a nested hash lookup is just 3 accesses instead of 24).
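
One way to arrive at the 3 and 24 figures is the usual two-dimensional walk count. This is a sketch under assumptions: 4-level guest and host page tables on x86, and a hash lookup counted as a single access (no secondary-hash probe). The function name nested_walk_accesses is hypothetical.

```python
def nested_walk_accesses(guest_steps: int, host_steps: int) -> int:
    """Count memory accesses for a nested (two-dimensional) translation.

    Each guest step reads one guest-physical address, which costs
    host_steps accesses to translate plus 1 access for the read itself.
    The final guest-physical address needs host_steps more accesses.
    """
    per_guest_step = host_steps + 1
    return guest_steps * per_guest_step + host_steps


# Assumed 4-level radix tables in both guest and host.
print(nested_walk_accesses(4, 4))
# Assumed single-access hash lookup in both guest and host.
print(nested_walk_accesses(1, 1))
```

The expression simplifies to (g + 1) * (h + 1) - 1, which gives 24 for g = h = 4 and 3 for g = h = 1.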

How are limits managed?  Won't a user creating a thousand guests with a
16MB hash each bring a server to its knees?
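
For a rough sense of scale, the arithmetic behind this question and behind the boot-time pool idea can be sketched as below. The 64 GiB host and the 2% pool are assumptions picked for illustration ("a couple of percent"); only the 1000 guests and the 16MB hash come from this message. The helper names are hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def total_pinned_bytes(n_guests: int, hpt_bytes: int) -> int:
    """Memory pinned if every guest gets its own HPT of hpt_bytes."""
    return n_guests * hpt_bytes


def guests_fitting_pool(system_bytes: int, pool_percent: int,
                        hpt_bytes: int) -> int:
    """Number of HPTs that fit in a pool reserved as a percentage of RAM."""
    pool_bytes = system_bytes * pool_percent // 100
    return pool_bytes // hpt_bytes


# 1000 guests with a 16 MiB HPT each (figures from the message).
pinned = total_pinned_bytes(1000, 16 * MIB)
print(pinned / GIB)  # 15.625

# Assumed host: 64 GiB of RAM with a 2% reserved pool.
print(guests_fitting_pool(64 * GIB, 2, 16 * MIB))  # 81
```

Under these assumptions a thousand guests would pin about 15.6 GiB, while a 2% pool on a 64 GiB host would hold only 81 such tables, so a pool caps the damage at the cost of capping the guest count.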

-- 
error compiling committee.c: too many arguments to function



