From: Sean Christopherson <seanjc@google.com>
To: David Matlack <dmatlack@google.com>
Cc: Vipin Sharma <vipinsh@google.com>,
pbonzini@redhat.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] KVM: x86/mmu: Split NX hugepage recovery flow into TDP and non-TDP flow
Date: Mon, 19 Aug 2024 11:31:22 -0700 [thread overview]
Message-ID: <ZsOPepvYXoWVv-_D@google.com> (raw)
In-Reply-To: <CALzav=cFPduBR4pmgnVrgY6q+wufTn_nS-4QDF4yw8uGQkV41Q@mail.gmail.com>
On Mon, Aug 19, 2024, David Matlack wrote:
> On Mon, Aug 19, 2024 at 10:20 AM Vipin Sharma <vipinsh@google.com> wrote:
> >
> > On 2024-08-16 16:29:11, Sean Christopherson wrote:
> > > On Mon, Aug 12, 2024, Vipin Sharma wrote:
> > > > + list_for_each_entry(sp, &kvm->arch.possible_nx_huge_pages, possible_nx_huge_page_link) {
> > > > + if (i++ >= max)
> > > > + break;
> > > > + if (is_tdp_mmu_page(sp) == tdp_mmu)
> > > > + return sp;
> > > > + }
> > >
> > > This is silly and wasteful. E.g. in the (unlikely) case there's one TDP MMU
> > > page amongst hundreds/thousands of shadow MMU pages, this will walk the list
> > > until @max, and then move on to the shadow MMU.
> > >
> > > Why not just use separate lists?
> >
> > Before this patch, NX huge page recovery calculates "to_zap" and then it
> > zaps first "to_zap" pages from the common list. This series is trying to
> > maintain that invarient.
I wouldn't try to maintain any specific behavior in the existing code, AFAIK it's
100% arbitrary and wasn't written with any meaningful sophistication. E.g. FIFO
is little more than blindly zapping pages and hoping for the best.
> > If we use two separate lists then we have to decide how many pages
> > should be zapped from TDP MMU and shadow MMU list. Few options I can
> > think of:
> >
> > 1. Zap "to_zap" pages from both TDP MMU and shadow MMU list separately.
> > Effectively, this might double the work for recovery thread.
> > 2. Try zapping "to_zap" page from one list and if there are not enough
> > pages to zap then zap from the other list. This can cause starvation.
> > 3. Do half of "to_zap" from one list and another half from the other
> > list. This can lead to situations where only half work is being done
> > by the recovery worker thread.
> >
> > Option (1) above seems more reasonable to me.
>
> I vote each should zap 1/nx_huge_pages_recovery_ratio of their
> respective list. i.e. Calculate to_zap separately for each list.
Yeah, I don't have a better idea since this is effectively a quick and dirty
solution to reduce guest jitter. We can at least add a counter so that the zap
is proportional to the number of pages on each list, e.g. this, and then do the
necessary math in the recovery paths.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 94e7b5a4fafe..3ff17ff4f78b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1484,6 +1484,8 @@ struct kvm_arch {
* the code to do so.
*/
spinlock_t tdp_mmu_pages_lock;
+
+ u64 tdp_mmu_nx_page_splits;
#endif /* CONFIG_X86_64 */
/*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 928cf84778b0..b80fe5d4e741 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -870,6 +870,11 @@ void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
if (!list_empty(&sp->possible_nx_huge_page_link))
return;
+#ifdef CONFIG_X86_64
+ if (is_tdp_mmu_page(sp))
+ ++kvm->arch.tdp_mmu_nx_page_splits;
+#endif
+
++kvm->stat.nx_lpage_splits;
list_add_tail(&sp->possible_nx_huge_page_link,
&kvm->arch.possible_nx_huge_pages);
@@ -905,6 +910,10 @@ void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
if (list_empty(&sp->possible_nx_huge_page_link))
return;
+#ifdef CONFIG_X86_64
+ if (is_tdp_mmu_page(sp))
+ --kvm->arch.tdp_mmu_nx_page_splits;
+#endif
--kvm->stat.nx_lpage_splits;
list_del_init(&sp->possible_nx_huge_page_link);
next prev parent reply other threads:[~2024-08-19 18:31 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-12 17:13 [PATCH 0/2] KVM: x86/mmu: Run NX huge page recovery under MMU read lock Vipin Sharma
2024-08-12 17:13 ` [PATCH 1/2] KVM: x86/mmu: Split NX hugepage recovery flow into TDP and non-TDP flow Vipin Sharma
2024-08-16 23:29 ` Sean Christopherson
2024-08-19 17:20 ` Vipin Sharma
2024-08-19 17:28 ` David Matlack
2024-08-19 18:31 ` Sean Christopherson [this message]
2024-08-19 21:57 ` Vipin Sharma
2024-08-12 17:13 ` [PATCH 2/2] KVM: x86/mmu: Recover NX Huge pages belonging to TDP MMU under MMU read lock Vipin Sharma
2024-08-14 9:33 ` kernel test robot
2024-08-14 18:23 ` Vipin Sharma
2024-08-14 22:50 ` kernel test robot
2024-08-15 16:42 ` Vipin Sharma
2024-08-16 23:38 ` Sean Christopherson
2024-08-19 17:34 ` Vipin Sharma
2024-08-19 22:12 ` Sean Christopherson
2024-08-23 22:38 ` Vipin Sharma
2024-08-26 14:34 ` Sean Christopherson
2024-08-26 19:24 ` Vipin Sharma
2024-08-26 19:51 ` Sean Christopherson
2024-08-19 22:19 ` Sean Christopherson
2024-08-23 19:27 ` Vipin Sharma
2024-08-23 20:45 ` Sean Christopherson
2024-08-23 21:41 ` Vipin Sharma
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZsOPepvYXoWVv-_D@google.com \
--to=seanjc@google.com \
--cc=dmatlack@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=vipinsh@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.