Re: [PATCH 2/3] KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()
Date: Fri, 11 Oct 2024 14:22:32 -0700	[thread overview]
Message-ID: <ZwmXGIVysayOnj-k@google.com> (raw)
In-Reply-To: <Zwi5ogcOiu7aG5hK@yzhao56-desk.sh.intel.com>

On Fri, Oct 11, 2024, Yan Zhao wrote:
> On Thu, Oct 10, 2024 at 09:14:41AM -0700, Sean Christopherson wrote:
> > On Thu, Oct 10, 2024, Yan Zhao wrote:
> > > On Wed, Oct 09, 2024 at 12:23:44PM -0700, Sean Christopherson wrote:
> > > > Add a lockdep assertion in kvm_unmap_gfn_range() to ensure that either
> > > > mmu_invalidate_in_progress is elevated, or that the range is being zapped
> > > > due to memslot removal (loosely detected by slots_lock being held).
> > > > Zapping SPTEs without mmu_invalidate_{in_progress,seq} protection is unsafe
> > > > as KVM's page fault path snapshots state before acquiring mmu_lock, and
> > > > thus can create SPTEs with stale information if vCPUs aren't forced to
> > > > retry faults (due to seeing an in-progress or past MMU invalidation).
> > > > 
> > > > Memslot removal is a special case, as the memslot is retrieved outside of
> > > > mmu_invalidate_seq, i.e. doesn't use the "standard" protections, and
> > > > instead relies on SRCU synchronization to ensure any in-flight page faults
> > > > are fully resolved before zapping SPTEs.
> > > > 
> > > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > > ---
> > > >  arch/x86/kvm/mmu/mmu.c | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > > 
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index 09494d01c38e..c6716fd3666f 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -1556,6 +1556,16 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> > > >  {
> > > >  	bool flush = false;
> > > >  
> > > > +	/*
> > > > +	 * To prevent races with vCPUs faulting in a gfn using stale data,
> > > > +	 * zapping a gfn range must be protected by mmu_invalidate_in_progress
> > > > +	 * (and mmu_invalidate_seq).  The only exception is memslot deletion,
> > > > +	 * in which case SRCU synchronization ensures SPTEs a zapped after all
> > > > +	 * vCPUs have unlocked SRCU and are guaranteed to see the invalid slot.
> > > > +	 */
> > > > +	lockdep_assert_once(kvm->mmu_invalidate_in_progress ||
> > > > +			    lockdep_is_held(&kvm->slots_lock));
> > > > +
> > > Is the detection of slots_lock too loose?
> > 
> > Yes, but I can't think of an easy way to tighten it.  My original thought was to
> > require range->slot to be invalid, but KVM (correctly) passes in the old, valid
> > memslot to kvm_arch_flush_shadow_memslot().
> > 
> > The goal with the assert is to detect as many bugs as possible, without adding
> > too much complexity, and also to document the rules for using kvm_unmap_gfn_range().
> > 
> > Actually, we can tighten the check, by verifying that the slot being unmapped is
> > valid, but that the slot that KVM sees is invalid.  I'm not sure I love it though,
> > as it's absurdly specific.
> Right. It doesn't reflect the wait in kvm_swap_active_memslots() for the old
> slot.
> 
>   CPU 0                  CPU 1
> 1. fault on old begins
>                        2. swap to new
> 		       3. zap old
> 4. fault on old ends
> 
> Without CPU 1 waiting for 1&4 complete between 2&3, stale data is still
> possible.
> 
> So, the detection in kvm_memslot_is_being_invalidated() only indicates the
> caller is from kvm_arch_flush_shadow_memslot() with current code.

Yep, which is why I don't love it.

> Given that, how do you feel about passing in a "bool is_flush_slot" to indicate
> the caller and asserting?

I like it even less than the ugliness I proposed :-)  It'd basically be a "I pinky
swear I know what I'm doing" flag, and I think the downsides of having true/false
literals in the code would outweigh the upside of the precise assertion.

next prev parent reply	other threads:[~2024-10-11 21:22 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-09 19:23 [PATCH 0/3] KVM: x86/mmu: Don't zap "direct" non-leaf SPTEs on memslot removal Sean Christopherson
2024-10-09 19:23 ` [PATCH 1/3] KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot Sean Christopherson
2024-10-10  7:59   ` Yan Zhao
2024-10-09 19:23 ` [PATCH 2/3] KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range() Sean Christopherson
2024-10-10  7:01   ` Yan Zhao
2024-10-10 16:14     ` Sean Christopherson
2024-10-11  5:37       ` Yan Zhao
2024-10-11 21:22         ` Sean Christopherson [this message]
2024-10-09 19:23 ` [PATCH 3/3] KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZwmXGIVysayOnj-k@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.