From: Sean Christopherson <seanjc@google.com>
To: Kai Huang <kai.huang@intel.com>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Yan Y Zhao <yan.y.zhao@intel.com>
Subject: Re: [PATCH 1/2] KVM: Allow calling mmu_invalidate_retry_hva() without holding mmu_lock
Date: Fri, 8 Sep 2023 10:36:57 -0700 [thread overview]
Message-ID: <ZPtVF5KKxLhMj58n@google.com> (raw)
In-Reply-To: <01d077d89d22cce541784be25c2f5c2143f8b5da.camel@intel.com>
On Thu, Sep 07, 2023, Kai Huang wrote:
> On Thu, 2023-08-24 at 19:07 -0700, Sean Christopherson wrote:
> > Allow checking mmu_invalidate_retry_hva() without holding mmu_lock, even
> > though mmu_lock must be held to guarantee correctness, i.e. to avoid
> > false negatives. Dropping the requirement that mmu_lock be held will
> > allow pre-checking for retry before acquiring mmu_lock, e.g. to avoid
> > contending mmu_lock when the guest is accessing a range that is being
> > invalidated by the host.
> >
> > Contending mmu_lock can have severe negative side effects for x86's TDP
> > MMU when running on preemptible kernels, as KVM will yield from the
> > zapping task (holds mmu_lock for write) when there is lock contention,
> > and yielding after any SPTEs have been zapped requires a VM-scoped TLB
> > flush.
> >
> > Wrap mmu_invalidate_in_progress in READ_ONCE() to ensure that calling
> > mmu_invalidate_retry_hva() in a loop won't put KVM into an infinite loop,
> > e.g. due to caching the in-progress flag and never seeing it go to '0'.
> >
> > Force a load of mmu_invalidate_seq as well, even though it isn't strictly
> > necessary to avoid an infinite loop, as doing so improves the probability
> > that KVM will detect an invalidation that already completed before
> > acquiring mmu_lock and bailing anyways.
>
> Without the READ_ONCE() on mmu_invalidate_seq, with patch 2 and
> mmu_invalidate_retry_hva() inlined IIUC the kvm_faultin_pfn() can look like
> this:
>
> fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq; <-- (1)
> smp_rmb();
>
> ...
> READ_ONCE(vcpu->kvm->mmu_invalidate_in_progress);
> ...
>
> if (vcpu->kvm->mmu_invalidate_seq != fault->mmu_seq) <-- (2)
> ...
>
> Perhaps stupid question :-) Will compiler even believes both vcpu->kvm-
> >mmu_invaludate_seq and fault->mmu_seq are never changed thus eliminates the
> code in 1) and 2)? Or all the barriers between are enough to prevent compiler
Practically speaking, no, there's far too much going on in __kvm_faultin_pfn().
But, KVM _could_ do the freshness check before __kvm_faultin_pfn() since KVM
has the memslot and thus the host virtual addess at that point. I highly doubt
we'll ever do that, but it's possible. At that point, there'd be no spinlocks
or other barries to ensure the load+check wouldn't get elided. That's still
extremely theoretical though.
1) can't be eliminated because acquiring mmu_lock provides enough barries to
prevent the compiler from omitting the load, i.e. the compiler can't omit the
comparison that done inside the critical section. (2) can theoretically be optimized
away by the compiler (when called before acquiring mmu_lock), though it's extremely
unlikely since there's sooo much going on between the load and the check.
> > Note, adding READ_ONCE() isn't entirely free, e.g. on x86, the READ_ONCE()
> > may generate a load into a register instead of doing a direct comparison
> > (MOV+TEST+Jcc instead of CMP+Jcc), but practically speaking the added cost
> > is a few bytes of code and maaaaybe a cycle or three.
...
> > - if (kvm->mmu_invalidate_seq != mmu_seq)
> > +
> > + if (READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq)
> > return 1;
> > return 0;
> > }
>
> I am not sure how mmu_invalidate_retry_hva() can be called in a loop so looks
> all those should be theoretical thing, but the extra cost should be nearly empty
> as you said.
It's not currently called in a loop, but it wouldn't be wrong for KVM to do
something like the below instead of fully re-entering the guest, in which case
mmu_invalidate_retry_hva() would be called in a loop, just a big loop :-)
And if KVM checked for freshness before resolving the PFN, it'd be possible for
the loop to read and check the sequence in the loop without any barriers that would
guarantee a reload (again, very, very theoretically).
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1e340098d034..c7617991e290 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5725,11 +5725,13 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
}
if (r == RET_PF_INVALID) {
- r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
- lower_32_bits(error_code), false,
- &emulation_type);
- if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
- return -EIO;
+ do {
+ r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
+ lower_32_bits(error_code),
+ false, &emulation_type);
+ if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
+ return -EIO;
+ while (r == RET_PF_RETRY);
}
if (r < 0)
next prev parent reply other threads:[~2023-09-08 17:37 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-25 2:07 [PATCH 0/2] KVM: Pre-check mmu_notifier retry on x86 Sean Christopherson
2023-08-25 2:07 ` [PATCH 1/2] KVM: Allow calling mmu_invalidate_retry_hva() without holding mmu_lock Sean Christopherson
2023-09-07 23:30 ` Huang, Kai
2023-09-08 17:36 ` Sean Christopherson [this message]
2023-08-25 2:07 ` [PATCH 2/2] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing Sean Christopherson
2023-09-06 23:03 ` Huang, Kai
2023-09-07 14:45 ` Sean Christopherson
2023-09-07 22:34 ` Huang, Kai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZPtVF5KKxLhMj58n@google.com \
--to=seanjc@google.com \
--cc=kai.huang@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox