From: Sean Christopherson <seanjc@google.com>
To: James Houghton <jthoughton@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Oliver Upton <oliver.upton@linux.dev>,
Marc Zyngier <maz@kernel.org>, Peter Xu <peterx@redhat.com>
Subject: Re: [PATCH 19/22] KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of mmu_lock
Date: Mon, 9 Sep 2024 13:02:23 -0700
Message-ID: <Zt9UT74XkezVpTuK@google.com>
In-Reply-To: <CADrL8HWACwbzraG=MbDoORJ8ramDxb-h9yb0p4nx9-wq4o3c6A@mail.gmail.com>
On Mon, Sep 09, 2024, James Houghton wrote:
> On Fri, Aug 9, 2024 at 12:44 PM Sean Christopherson <seanjc@google.com> wrote:
> > +/*
> > + * rmaps and PTE lists are mostly protected by mmu_lock (the shadow MMU always
> > + * operates with mmu_lock held for write), but rmaps can be walked without
> > + * holding mmu_lock so long as the caller can tolerate SPTEs in the rmap chain
> > + * being zapped/dropped _while the rmap is locked_.
> > + *
> > + * Other than the KVM_RMAP_LOCKED flag, modifications to rmap entries must be
> > + * done while holding mmu_lock for write. This allows a task walking rmaps
> > + * without holding mmu_lock to concurrently walk the same entries as a task
> > + * that is holding mmu_lock but _not_ the rmap lock. Neither task will modify
> > + * the rmaps, thus the walks are stable.
> > + *
> > + * As alluded to above, SPTEs in rmaps are _not_ protected by KVM_RMAP_LOCKED,
> > + * only the rmap chains themselves are protected. E.g. holding an rmap's lock
> > + * ensures all "struct pte_list_desc" fields are stable.
>
> This last sentence makes me think we need to be careful about memory ordering.
>
> > + */
> > +#define KVM_RMAP_LOCKED BIT(1)
> > +
> > +static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head)
> > +{
> > + unsigned long old_val, new_val;
> > +
> > + old_val = READ_ONCE(rmap_head->val);
> > + if (!old_val)
> > + return 0;
> > +
> > + do {
> > + /*
> > + * If the rmap is locked, wait for it to be unlocked before
> > + * trying to acquire the lock, e.g. to avoid bouncing the cache line.
> > + */
> > + while (old_val & KVM_RMAP_LOCKED) {
> > + cpu_relax();
> > + old_val = READ_ONCE(rmap_head->val);
> > + }
> > +
> > + /*
> > + * Recheck for an empty rmap, it may have been purged by the
> > + * task that held the lock.
> > + */
> > + if (!old_val)
> > + return 0;
> > +
> > + new_val = old_val | KVM_RMAP_LOCKED;
> > + } while (!try_cmpxchg(&rmap_head->val, &old_val, new_val));
>
> I think we (technically) need an smp_rmb() here. I think cmpxchg
> implicitly has that on x86 (and this code is x86-only), but should we
> nonetheless document that we need smp_rmb() (if it is indeed required)?
> Perhaps we could/should condition the smp_rmb() on `if (old_val)`.
Hmm, no, not smp_rmb(). If anything, the appropriate barrier here would be
smp_mb__after_spinlock(), but I'm pretty sure even that is misleading, and arguably
even wrong.
For the !old_val case, there is an address/data dependency that can't be broken by
the CPU without violating the x86 memory model (all future actions with relevant
memory loads depend on rmap_head->val being non-zero). And AIUI, in the Linux
kernel memory model, READ_ONCE() is responsible for ensuring that the address
dependency can't be morphed into a control dependency by the compiler and
subsequently reordered by the CPU.
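To make the dependency concrete, here is a reader-side sketch (hypothetical
code, not from this patch; the KVM_RMAP_MANY flag and the pte_list_desc
layout are my assumptions about the rest of the series):

        /*
         * 'val' comes from READ_ONCE(rmap_head->val) inside kvm_rmap_lock().
         * The pte_list_desc pointer below is computed _from_ 'val', so every
         * load through 'desc' carries an address dependency on the head load
         * and cannot be reordered ahead of it.
         */
        unsigned long val = kvm_rmap_lock(rmap_head);

        if (val & KVM_RMAP_MANY) {
                struct pte_list_desc *desc = (void *)(val & ~KVM_RMAP_MANY);

                /* Dependent load, ordered after the load of rmap_head->val. */
                u64 *sptep = desc->sptes[0];
        }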
I.e. even if this were arm64, ignoring the LOCK CMPXCHG path for the moment, I
don't _think_ an smp_{r,w}mb() pair would be needed, as arm64's definition of
__READ_ONCE() promotes the operation to an acquire.
Back to the LOCK CMPXCHG path, KVM_RMAP_LOCKED implements a rudimentary spinlock,
hence my smp_mb__after_spinlock() suggestion. Though _because_ it's a spinlock,
the rmaps are fully protected by the critical section. And for the SPTEs, there
is no required ordering. The reader (aging thread) can observe a !PRESENT or a
PRESENT SPTE, and must be prepared for either. I.e. there is no requirement that
the reader observe a PRESENT SPTE if there is a valid rmap.
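Concretely, the aging walk only needs the chain itself to be stable, e.g.
(sketch only; kvm_rmap_lock_readonly()/kvm_rmap_unlock_readonly() are from a
later patch in this series, and the exact iterator and signatures here are
assumptions):

        struct rmap_iterator iter;
        unsigned long rmap_val;
        bool young = false;
        u64 *sptep;

        rmap_val = kvm_rmap_lock_readonly(rmap_head);

        /*
         * A writer may zap any SPTE at any time; observing a !PRESENT SPTE
         * here is fine, the walker simply skips it.  No ordering beyond the
         * stability of the pte_list_desc chain is required.
         */
        for_each_rmap_spte(rmap_head, &iter, sptep) {
                if (is_accessed_spte(mmu_spte_get_lockless(sptep)))
                        young = true;
        }

        kvm_rmap_unlock_readonly(rmap_head, rmap_val);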
So, unless I'm missing something, I would prefer to not add an smp_mb__after_spinlock(),
even though it's a nop on x86 (unless KCSAN_WEAK_MEMORY=y), because it suggests
an ordering requirement that doesn't exist.
> kvm_rmap_lock_readonly() should have an smp_rmb(), but it seems like
> adding it here will do the right thing for the read-only lock side.
>
> > +
> > + /* Return the old value, i.e. _without_ the LOCKED bit set. */
> > + return old_val;
> > +}
> > +
> > +static void kvm_rmap_unlock(struct kvm_rmap_head *rmap_head,
> > + unsigned long new_val)
> > +{
> > + WARN_ON_ONCE(new_val & KVM_RMAP_LOCKED);
>
> Same goes with having an smp_wmb() here. Is it necessary? If so,
> should it at least be documented?
>
> And this is *not* necessary for kvm_rmap_unlock_readonly(), IIUC.
>
> > + WRITE_ONCE(rmap_head->val, new_val);
> > +}
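For completeness, the writer side is just a regular critical section nested
inside mmu_lock; the final WRITE_ONCE() both clears KVM_RMAP_LOCKED and
publishes the new head in a single store (sketch, with the chain
manipulation elided):

        /* mmu_lock is held for write; take the per-rmap lock as well. */
        unsigned long rmap_val = kvm_rmap_lock(rmap_head);

        /* ... add/remove pte_list_desc entries, recomputing rmap_val ... */

        /*
         * Releasing the lock and publishing the updated chain are the same
         * store, so lockless readers see either the old chain or the new
         * one, never a locked or partial value.
         */
        kvm_rmap_unlock(rmap_head, rmap_val);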