[PATCH v7 22/23] KVM: x86/mmu: Extend Eager Page Splitting to nested MMUs

From: Sean Christopherson <seanjc@google.com>
To: kvm-riscv@lists.infradead.org
Subject: [PATCH v7 22/23] KVM: x86/mmu: Extend Eager Page Splitting to nested MMUs
Date: Thu, 23 Jun 2022 19:48:02 +0000
Message-ID: <YrTDcrsn0/+alpzf@google.com>
In-Reply-To: <CALzav=fH_9_LKVE0_UCftwy2KZaB3nSBoWU07aPWALag4_mcHQ@mail.gmail.com>

On Thu, Jun 23, 2022, David Matlack wrote:
> On Wed, Jun 22, 2022 at 12:27 PM Paolo Bonzini <pbonzini@redhat.com> wrote:

Please trim replies.

> > +static int topup_split_caches(struct kvm *kvm)
> > +{
> > +       int r;
> > +
> > +       lockdep_assert_held(&kvm->slots_lock);
> > +
> > +       /*
> > +        * It's common to need all SPLIT_DESC_CACHE_MIN_NR_OBJECTS (513) objects
> > +        * when splitting a page, but setting capacity == min would cause
> > +        * KVM to drop mmu_lock even if just one object was consumed from the
> > +        * cache.  So make capacity larger than min and handle two huge pages
> > +        * without having to drop the lock.
> 
> I was going to do some testing this week to confirm, but IIUC KVM will
> only allocate from split_desc_cache if the L1 hypervisor has aliased a
> huge page in multiple {E,N}PT12 page table entries. i.e. L1 is mapping
> a huge page into an L2 multiple times, or mapped into multiple L2s.
> This should be common in traditional, process-level, shadow paging,
> but I think will be quite rare for nested shadow paging.

Ooooh, right, I forgot that pte_list_add() needs to allocate if and only if
there are multiple rmap entries; otherwise rmap->val points at the one and only
rmap entry directly.
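
For readers who don't have the rmap encoding paged in, here is a rough
userspace sketch of the "allocate only on the second entry" behavior described
above.  It is not the kernel's pte_list_add(); the model_* names, the slot
count, and the use of calloc() in place of the memory cache are all
assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model; every name below is illustrative, not KVM's. */
#define MODEL_DESC_SLOTS 4
#define MODEL_MANY       ((uintptr_t)1)	/* low bit set: val points at a list */

struct model_desc {
	uint64_t *sptes[MODEL_DESC_SLOTS];
	int count;
};

struct model_rmap_head {
	uintptr_t val;
};

/* Counts how often the model had to allocate a descriptor. */
static int model_desc_allocs;

/* Add @spte to @head; return the new entry count, or -1 on failure. */
static int model_rmap_add(struct model_rmap_head *head, uint64_t *spte)
{
	struct model_desc *desc;

	if (!head->val) {
		/* 0 -> 1 entries: store the pointer directly, no allocation. */
		head->val = (uintptr_t)spte;
		return 1;
	}

	if (!(head->val & MODEL_MANY)) {
		/* 1 -> 2 entries: the gfn is aliased, a descriptor is needed. */
		desc = calloc(1, sizeof(*desc));
		if (!desc)
			return -1;
		model_desc_allocs++;
		desc->sptes[0] = (uint64_t *)head->val;
		desc->sptes[1] = spte;
		desc->count = 2;
		head->val = (uintptr_t)desc | MODEL_MANY;
		return 2;
	}

	/* Already a list: append, no new allocation while slots remain. */
	desc = (struct model_desc *)(head->val & ~MODEL_MANY);
	if (desc->count == MODEL_DESC_SLOTS)
		return -1;	/* the model does not chain descriptors */
	desc->sptes[desc->count++] = spte;
	return desc->count;
}
```

In this model a gfn with a single mapping never bumps model_desc_allocs, which
is the case David expects to dominate for nested shadow paging.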

Doubling the capacity is all but guaranteed to be pointless overhead.  What about
buffering with the default capacity?  That way KVM doesn't have to topup if it
happens to encounter an aliased gfn.  It's arbitrary, but so is the default capacity
size.
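
To put rough numbers on the two options, a small sketch follows.  The 513
comes from the comment quoted above; the value 40 for
KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE is my assumption about the x86 default and
should be checked against the tree.  The model_* names are hypothetical, and
the "slack" helper assumes a topup is only needed once the free count drops
below the minimum.

```c
/* Hypothetical model; the constants are assumptions, see the text above. */
enum {
	MODEL_SPLIT_DESC_MIN = 513,	/* value quoted in the original comment */
	MODEL_DEFAULT_OBJS   = 40,	/* assumed x86 default cache capacity */
};

/* Capacity used by the patch as posted: twice the minimum. */
static int model_capacity_doubled(void)
{
	return 2 * MODEL_SPLIT_DESC_MIN;
}

/* Capacity proposed in the fixup: minimum plus the default capacity. */
static int model_capacity_buffered(void)
{
	return MODEL_SPLIT_DESC_MIN + MODEL_DEFAULT_OBJS;
}

/* Objects that can be consumed before the free count drops below the min. */
static int model_slack(int capacity)
{
	return capacity - MODEL_SPLIT_DESC_MIN;
}
```

Under those assumptions the cache holds 553 objects instead of 1026, while
still tolerating up to 40 descriptor allocations before another topup is
needed.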

E.g. as fixup

---
 arch/x86/kvm/mmu/mmu.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 22b87007efff..90d6195edcf3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6125,19 +6125,23 @@ static bool need_topup_split_caches_or_resched(struct kvm *kvm)

 static int topup_split_caches(struct kvm *kvm)
 {
-	int r;
-
-	lockdep_assert_held(&kvm->slots_lock);
-
 	/*
-	 * It's common to need all SPLIT_DESC_CACHE_MIN_NR_OBJECTS (513) objects
-	 * when splitting a page, but setting capacity == min would cause
-	 * KVM to drop mmu_lock even if just one object was consumed from the
-	 * cache.  So make capacity larger than min and handle two huge pages
-	 * without having to drop the lock.
+	 * Allocating rmap list entries when splitting huge pages for nested
+	 * MMUs is rare as KVM needs to allocate if and only if there is more
+	 * than one rmap entry for the gfn, i.e. requires an L1 gfn to be
+	 * aliased by multiple L2 gfns, which is very atypical for VMMs.  If
+	 * there is only one rmap entry, rmap->val points directly at that one
+	 * entry and doesn't need to allocate a list.  Buffer the cache by the
+	 * default capacity so that KVM doesn't have to topup the cache if it
+	 * encounters an aliased gfn or two.
 	 */
-	r = __kvm_mmu_topup_memory_cache(&kvm->arch.split_desc_cache,
-					 2 * SPLIT_DESC_CACHE_MIN_NR_OBJECTS,
+	const int capacity = SPLIT_DESC_CACHE_MIN_NR_OBJECTS +
+			     KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE;
+	int r;
+
+	lockdep_assert_held(&kvm->slots_lock);
+
+	r = __kvm_mmu_topup_memory_cache(&kvm->arch.split_desc_cache, capacity,
 					 SPLIT_DESC_CACHE_MIN_NR_OBJECTS);
 	if (r)
 		return r;

base-commit: 436b1c29f36ed3d4385058ba6f0d6266dbd2a882
--


