All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: pbonzini@redhat.com, rick.p.edgecombe@intel.com,
	 linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 4/4] KVM: x86/mmu: Free obsolete roots when pre-faulting SPTEs
Date: Mon, 10 Feb 2025 14:41:40 -0800	[thread overview]
Message-ID: <Z6qApByaoCs_Y0eb@google.com> (raw)
In-Reply-To: <Z6bJF8uA9R0x3QGp@yzhao56-desk.sh.intel.com>

On Sat, Feb 08, 2025, Yan Zhao wrote:
> On Fri, Feb 07, 2025 at 07:12:04AM -0800, Sean Christopherson wrote:
> > On Fri, Feb 07, 2025, Yan Zhao wrote:
> > > Always free obsolete roots when pre-faulting SPTEs in case it's called
> > > after a root is invalidated (e.g., by memslot removal) but before any
> > > vcpu_enter_guest() processing of KVM_REQ_MMU_FREE_OBSOLETE_ROOTS.
> > > 
> > > Lack of kvm_mmu_free_obsolete_roots() in this scenario can lead to
> > > kvm_mmu_reload() failing to load a new root if the current root hpa is an
> > > obsolete root (which is not INVALID_PAGE). Consequently,
> > > kvm_arch_vcpu_pre_fault_memory() will retry infinitely due to the checking
> > > of is_page_fault_stale().
> > > 
> > > It's safe to call kvm_mmu_free_obsolete_roots() even if there are no
> > > obsolete roots or if it's called a second time when vcpu_enter_guest()
> > > later processes KVM_REQ_MMU_FREE_OBSOLETE_ROOTS. This is because
> > > kvm_mmu_free_obsolete_roots() sets an obsolete root to INVALID_PAGE and
> > > will do nothing to an INVALID_PAGE.
> > 
> > Why is userspace changing memslots while prefaulting?
> It currently only exists in the kvm selftest (written by myself...)
> Not sure if there's any real use case like this.

It's decidedly odd.  I asked, because maybe there's a way we can disallow the
scenario.  Doing that without making things more complex than simply handling
obsolete roots is probably a fool's errand though.

> > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > > ---
> > >  arch/x86/kvm/mmu/mmu.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 47fd3712afe6..72f68458049a 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -4740,7 +4740,12 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> > >  	/*
> > >  	 * reload is efficient when called repeatedly, so we can do it on
> > >  	 * every iteration.
> > > +	 * Before reload, free obsolete roots in case the prefault is called
> > > +	 * after a root is invalidated (e.g., by memslot removal) but
> > > +	 * before any vcpu_enter_guest() processing of
> > > +	 * KVM_REQ_MMU_FREE_OBSOLETE_ROOTS.
> > >  	 */
> > > +	kvm_mmu_free_obsolete_roots(vcpu);
> > >  	r = kvm_mmu_reload(vcpu);
> > >  	if (r)
> > >  		return r;
> > 
> > I would prefer to do check for obsolete roots in kvm_mmu_reload() itself, but
> Yes, it's better!
> I previously considered doing in this way, but I was afraid to introduce
> overhead (the extra compare) to kvm_mmu_reload(), which is called quite
> frequently.
> 
> But maybe we can remove the check in vcpu_enter_guest() to reduce the overhead?
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b2d9a16fd4d3..6a1f2780a094 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10731,8 +10731,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>                                 goto out;
>                         }
>                 }
> -               if (kvm_check_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
> -                       kvm_mmu_free_obsolete_roots(vcpu);
>                 if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
>                         __kvm_migrate_timers(vcpu);
>                 if (kvm_check_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu))
> 
> > keep the main kvm_check_request() so that the common case handles the resulting
> > TLB flush without having to loop back around in vcpu_enter_guest().
> Hmm, I'm a little confused.
> What's is the resulting TLB flush?

For the common case where KVM_REQ_MMU_FREE_OBSOLETE_ROOTS is pending before
vcpu_enter_guest, kvm_mmu_free_obsolete_roots() may trigger KVM_REQ_TLB_FLUSH
via kvm_mmu_commit_zap_page().  Processing KVM_REQ_MMU_FREE_OBSOLETE_ROOTS before
KVM_REQ_TLB_FLUSH means vcpu_enter_guest() doesn't have to "abort" and redo the
whole loop (the newly pending request won't be detected until kvm_vcpu_exit_request(),
which isn't that late in the entry sequence, but there is a decent amount of work
that needs to be undone).

On the other hand, the cost of kvm_check_request(), especially a check that's
guarded by kvm_request_pending(), is negligible.

That said, obsolete roots shouldn't actually require a TLB flush.  E.g. the TDP
MMU hasn't flushed invalid roots since commit fcdffe97f80e ("KVM: x86/mmu: Don't
do TLB flush when zappings SPTEs in invalid roots").  I'd have to think more about
whether or not that's safe/correct for the shadow MMU though.

For this case, I think it makes sense to just add the check in kvm_mmu_reload().

  reply	other threads:[~2025-02-10 22:41 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-07  3:06 [PATCH 0/4] Small changes related to prefetch and spurious faults Yan Zhao
2025-02-07  3:07 ` [PATCH 1/4] KVM: x86/mmu: Further check old SPTE is leaf for spurious prefetch fault Yan Zhao
2025-02-07  3:08 ` [PATCH 2/4] KVM: x86/tdp_mmu: Merge the prefetch into the is_access_allowed() check Yan Zhao
2025-02-07 15:03   ` Sean Christopherson
2025-02-08  2:29     ` Yan Zhao
2025-02-10 22:17       ` Sean Christopherson
2025-02-07  3:09 ` [PATCH 3/4] KVM: x86/mmu: Make sure pfn is not changed for spurious fault Yan Zhao
2025-02-07 15:07   ` Sean Christopherson
2025-02-08  2:37     ` Yan Zhao
2025-02-10 22:23       ` Sean Christopherson
2025-02-11  6:48         ` Yan Zhao
2025-02-07  3:09 ` [PATCH 4/4] KVM: x86/mmu: Free obsolete roots when pre-faulting SPTEs Yan Zhao
2025-02-07 15:12   ` Sean Christopherson
2025-02-08  3:01     ` Yan Zhao
2025-02-10 22:41       ` Sean Christopherson [this message]
2025-02-11  5:38         ` Yan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z6qApByaoCs_Y0eb@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.