Re: [PATCH 1/2] KVM: x86/mmu: Split NX hugepage recovery flow into TDP and non-TDP flow

kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: David Matlack <dmatlack@google.com>
Cc: Vipin Sharma <vipinsh@google.com>,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] KVM: x86/mmu: Split NX hugepage recovery flow into TDP and non-TDP flow
Date: Mon, 19 Aug 2024 11:31:22 -0700	[thread overview]
Message-ID: <ZsOPepvYXoWVv-_D@google.com> (raw)
In-Reply-To: <CALzav=cFPduBR4pmgnVrgY6q+wufTn_nS-4QDF4yw8uGQkV41Q@mail.gmail.com>

On Mon, Aug 19, 2024, David Matlack wrote:
> On Mon, Aug 19, 2024 at 10:20 AM Vipin Sharma <vipinsh@google.com> wrote:
> >
> > On 2024-08-16 16:29:11, Sean Christopherson wrote:
> > > On Mon, Aug 12, 2024, Vipin Sharma wrote:
> > > > +   list_for_each_entry(sp, &kvm->arch.possible_nx_huge_pages, possible_nx_huge_page_link) {
> > > > +           if (i++ >= max)
> > > > +                   break;
> > > > +           if (is_tdp_mmu_page(sp) == tdp_mmu)
> > > > +                   return sp;
> > > > +   }
> > >
> > > This is silly and wasteful.  E.g. in the (unlikely) case there's one TDP MMU
> > > page amongst hundreds/thousands of shadow MMU pages, this will walk the list
> > > until @max, and then move on to the shadow MMU.
> > >
> > > Why not just use separate lists?
> >
> > Before this patch, NX huge page recovery calculates "to_zap" and then it
> > zaps first "to_zap" pages from the common list. This series is trying to
> > maintain that invarient.

I wouldn't try to maintain any specific behavior in the existing code, AFAIK it's
100% arbitrary and wasn't written with any meaningful sophistication.  E.g. FIFO
is little more than blindly zapping pages and hoping for the best.

> > If we use two separate lists then we have to decide how many pages
> > should be zapped from TDP MMU and shadow MMU list. Few options I can
> > think of:
> >
> > 1. Zap "to_zap" pages from both TDP MMU and shadow MMU list separately.
> >    Effectively, this might double the work for recovery thread.
> > 2. Try zapping "to_zap" page from one list and if there are not enough
> >    pages to zap then zap from the other list. This can cause starvation.
> > 3. Do half of "to_zap" from one list and another half from the other
> >    list. This can lead to situations where only half work is being done
> >    by the recovery worker thread.
> >
> > Option (1) above seems more reasonable to me.
> 
> I vote each should zap 1/nx_huge_pages_recovery_ratio of their
> respective list. i.e. Calculate to_zap separately for each list.

Yeah, I don't have a better idea since this is effectively a quick and dirty
solution to reduce guest jitter.  We can at least add a counter so that the zap
is proportional to the number of pages on each list, e.g. this, and then do the
necessary math in the recovery paths.

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 94e7b5a4fafe..3ff17ff4f78b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1484,6 +1484,8 @@ struct kvm_arch {
         * the code to do so.
         */
        spinlock_t tdp_mmu_pages_lock;
+
+       u64 tdp_mmu_nx_page_splits;
 #endif /* CONFIG_X86_64 */
 
        /*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 928cf84778b0..b80fe5d4e741 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -870,6 +870,11 @@ void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
        if (!list_empty(&sp->possible_nx_huge_page_link))
                return;
 
+#ifdef CONFIG_X86_64
+       if (is_tdp_mmu_page(sp))
+               ++kvm->arch.tdp_mmu_nx_page_splits;
+#endif
+
        ++kvm->stat.nx_lpage_splits;
        list_add_tail(&sp->possible_nx_huge_page_link,
                      &kvm->arch.possible_nx_huge_pages);
@@ -905,6 +910,10 @@ void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
        if (list_empty(&sp->possible_nx_huge_page_link))
                return;
 
+#ifdef CONFIG_X86_64
+       if (is_tdp_mmu_page(sp))
+               --kvm->arch.tdp_mmu_nx_page_splits;
+#endif
        --kvm->stat.nx_lpage_splits;
        list_del_init(&sp->possible_nx_huge_page_link);

next prev parent reply	other threads:[~2024-08-19 18:31 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-12 17:13 [PATCH 0/2] KVM: x86/mmu: Run NX huge page recovery under MMU read lock Vipin Sharma
2024-08-12 17:13 ` [PATCH 1/2] KVM: x86/mmu: Split NX hugepage recovery flow into TDP and non-TDP flow Vipin Sharma
2024-08-16 23:29   ` Sean Christopherson
2024-08-19 17:20     ` Vipin Sharma
2024-08-19 17:28       ` David Matlack
2024-08-19 18:31         ` Sean Christopherson [this message]
2024-08-19 21:57           ` Vipin Sharma
2024-08-12 17:13 ` [PATCH 2/2] KVM: x86/mmu: Recover NX Huge pages belonging to TDP MMU under MMU read lock Vipin Sharma
2024-08-14  9:33   ` kernel test robot
2024-08-14 18:23     ` Vipin Sharma
2024-08-14 22:50   ` kernel test robot
2024-08-15 16:42     ` Vipin Sharma
2024-08-16 23:38   ` Sean Christopherson
2024-08-19 17:34     ` Vipin Sharma
2024-08-19 22:12       ` Sean Christopherson
2024-08-23 22:38         ` Vipin Sharma
2024-08-26 14:34           ` Sean Christopherson
2024-08-26 19:24             ` Vipin Sharma
2024-08-26 19:51               ` Sean Christopherson
2024-08-19 22:19   ` Sean Christopherson
2024-08-23 19:27     ` Vipin Sharma
2024-08-23 20:45       ` Sean Christopherson
2024-08-23 21:41         ` Vipin Sharma

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:94e7b5a4faf dfblob:3ff17ff4f78 dfblob:928cf84778b
dfblob:b80fe5d4e74 )
 OR (
bs:"Re: [PATCH 1/2] KVM: x86/mmu: Split NX hugepage recovery flow into TDP and non-TDP flow" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZsOPepvYXoWVv-_D@google.com \
    --to=seanjc@google.com \
    --cc=dmatlack@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=vipinsh@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).