Re: [PATCH 2/4] KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level()

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Mingwei Zhang <mizhang@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/4] KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level()
Date: Mon, 18 Jul 2022 16:50:16 +0000	[thread overview]
Message-ID: <YtWPSILmAp/0m5eC@google.com> (raw)
In-Reply-To: <YtMIvgfsgIPWMgGM@google.com>

On Sat, Jul 16, 2022, Mingwei Zhang wrote:
> On Fri, Jul 15, 2022, Sean Christopherson wrote:
> > Add a comment to document how host_pfn_mapping_level() can be used safely,
> > as the line between safe and dangerous is quite thin.  E.g. if KVM were
> > to ever support in-place promotion to create huge pages, consuming the
> > level is safe if the caller holds mmu_lock and checks that there's an
> > existing _leaf_ SPTE, but unsafe if the caller only checks that there's a
> > non-leaf SPTE.
> > 
> > Opportunistically tweak the existing comments to explicitly document why
> > KVM needs to use READ_ONCE().
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 42 +++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 35 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index bebff1d5acd4..d5b644f3e003 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2919,6 +2919,31 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
> >  	__direct_pte_prefetch(vcpu, sp, sptep);
> >  }
> >  
> > +/*
> > + * Lookup the mapping level for @gfn in the current mm.
> > + *
> > + * WARNING!  Use of host_pfn_mapping_level() requires the caller and the end
> > + * consumer to be tied into KVM's handlers for MMU notifier events!
> Since calling this function won't cause kernel crash now, I guess we can
> remove the warning sign here, but keep the remaining statement since it
> is necessary.

Calling this function won't _directly_ crash the kernel, but improper usage can
most definitely crash the host kernel, or even worse, silently corrupt host and
or guest data.  E.g. if KVM were to race with an mmu_notifier event and incorrectly
map a stale huge page into the guest.

So yes, the function itself is robust, but usage is still very subtle and delicate.

> > + *
> > + * There are several ways to safely use this helper:
> > + *
> > + * - Check mmu_notifier_retry_hva() after grabbing the mapping level, before
> > + *   consuming it.  In this case, mmu_lock doesn't need to be held during the
> > + *   lookup, but it does need to be held while checking the MMU notifier.
> 
> but it does need to be held while checking the MMU notifier and
> consuming the result.

I didn't want to include "consuming the result" because arguably the result is
being consumed while running the guest, and obviously KVM doesn't hold mmu_lock
while running the guest (though I fully acknowledge the above effectively uses
"consume" in the sense of shoving the result into SPTEs).  

> > + *
> > + * - Hold mmu_lock AND ensure there is no in-progress MMU notifier invalidation
> > + *   event for the hva.  This can be done by explicit checking the MMU notifier

s/explicit/explicitly

> > + *   or by ensuring that KVM already has a valid mapping that covers the hva.
> 
> Yes, more specifically, "mmu notifier sequence counter".

Heh, depends on what the reader interprets as "sequence counter".  If the reader
interprets that as the literal sequence counter, mmu_notifier_seq, then this phrasing
is incorrect as mmu_notifier_seq isn't bumped until the invalidation completes,
i.e. it guards against _past_ invalidations, not in-progress validations.

My preference is to intentionally not be precise in describing how to check for an
in-progress invalidation, e.g. so that this comment doesn't need to be updated if
the details change, and to also to try and force developers to do more than copy
and paste if they want to use this helper.

next prev parent reply	other threads:[~2022-07-18 16:50 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-15 23:21 [PATCH 0/4] Huge page related cleanups Sean Christopherson
2022-07-15 23:21 ` [PATCH 1/4] KVM: x86/mmu: Don't require refcounted "struct page" to create huge SPTEs Sean Christopherson
2022-07-16  5:53   ` Mingwei Zhang
2022-07-15 23:21 ` [PATCH 2/4] KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level() Sean Christopherson
2022-07-16 18:51   ` Mingwei Zhang
2022-07-18 16:50     ` Sean Christopherson [this message]
2022-07-18 20:48       ` Mingwei Zhang
2022-07-15 23:21 ` [PATCH 3/4] KVM: x86/mmu: Don't bottom out on leafs when zapping collapsible SPTEs Sean Christopherson
2022-07-15 23:21 ` [PATCH 4/4] KVM: selftests: Add an option to run vCPUs while disabling dirty logging Sean Christopherson
2022-07-19 18:02 ` [PATCH 0/4] Huge page related cleanups Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YtWPSILmAp/0m5eC@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mizhang@google.com \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox