From: Sean Christopherson <seanjc@google.com>
To: James Houghton <jthoughton@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Oliver Upton <oliver.upton@linux.dev>,
Marc Zyngier <maz@kernel.org>, Peter Xu <peterx@redhat.com>
Subject: Re: [PATCH 19/22] KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of mmu_lock
Date: Mon, 9 Sep 2024 13:02:23 -0700 [thread overview]
Message-ID: <Zt9UT74XkezVpTuK@google.com> (raw)
In-Reply-To: <CADrL8HWACwbzraG=MbDoORJ8ramDxb-h9yb0p4nx9-wq4o3c6A@mail.gmail.com>
On Mon, Sep 09, 2024, James Houghton wrote:
> On Fri, Aug 9, 2024 at 12:44 PM Sean Christopherson <seanjc@google.com> wrote:
> > +/*
> > + * rmaps and PTE lists are mostly protected by mmu_lock (the shadow MMU always
> > + * operates with mmu_lock held for write), but rmaps can be walked without
> > + * holding mmu_lock so long as the caller can tolerate SPTEs in the rmap chain
> > + * being zapped/dropped _while the rmap is locked_.
> > + *
> > + * Other than the KVM_RMAP_LOCKED flag, modifications to rmap entries must be
> > + * done while holding mmu_lock for write. This allows a task walking rmaps
> > + * without holding mmu_lock to concurrently walk the same entries as a task
> > + * that is holding mmu_lock but _not_ the rmap lock. Neither task will modify
> > + * the rmaps, thus the walks are stable.
> > + *
> > + * As alluded to above, SPTEs in rmaps are _not_ protected by KVM_RMAP_LOCKED,
> > + * only the rmap chains themselves are protected. E.g. holding an rmap's lock
> > + * ensures all "struct pte_list_desc" fields are stable.
>
> This last sentence makes me think we need to be careful about memory ordering.
>
> > + */
> > +#define KVM_RMAP_LOCKED BIT(1)
> > +
> > +static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head)
> > +{
> > + unsigned long old_val, new_val;
> > +
> > + old_val = READ_ONCE(rmap_head->val);
> > + if (!old_val)
> > + return 0;
> > +
> > + do {
> > + /*
> > + * If the rmap is locked, wait for it to be unlocked before
> > + * trying acquire the lock, e.g. to bounce the cache line.
> > + */
> > + while (old_val & KVM_RMAP_LOCKED) {
> > + old_val = READ_ONCE(rmap_head->val);
> > + cpu_relax();
> > + }
> > +
> > + /*
> > + * Recheck for an empty rmap, it may have been purged by the
> > + * task that held the lock.
> > + */
> > + if (!old_val)
> > + return 0;
> > +
> > + new_val = old_val | KVM_RMAP_LOCKED;
> > + } while (!try_cmpxchg(&rmap_head->val, &old_val, new_val));
>
> I think we (technically) need an smp_rmb() here. I think cmpxchg
> implicitly has that on x86 (and this code is x86-only), but should we
> nonetheless document that we need smp_rmb() (if it indeed required)?
> Perhaps we could/should condition the smp_rmb() on `if (old_val)`.
Hmm, no, not smp_rmb(). If anything, the appropriate barrier here would be
smp_mb__after_spinlock(), but I'm pretty sure even that is misleading, and arguably
even wrong.
For the !old_val case, there is a address/data dependency that can't be broken by
the CPU without violating the x86 memory model (all future actions with relevant
memory loads depend on rmap_head->val being non-zero). And AIUI, in the Linux
kernel memory model, READ_ONCE() is responsible for ensuring that the address
dependency can't be morphed into a control dependency by the compiler and
subsequently reordered by the CPU.
I.e. even if this were arm64, ignoring the LOCK CMPXCHG path for the moment, I
don't _think_ an smp_{r,w}mb() pair would be needed, as arm64's definition of
__READ_ONCE() promotes the operation to an acquire.
Back to the LOCK CMPXCHG path, KVM_RMAP_LOCKED implements a rudimentary spinlock,
hence my smp_mb__after_spinlock() suggestion. Though _because_ it's a spinlock,
the rmaps are fully protected by the critical section. And for the SPTEs, there
is no required ordering. The reader (aging thread) can observe a !PRESENT or a
PRESENT SPTE, and must be prepared for either. I.e. there is no requirement that
the reader observe a PRESENT SPTE if there is a valid rmap.
So, unless I'm missing something, I would prefer to not add a smp_mb__after_spinlock(),
even though it's a nop on x86 (unless KCSAN_WEAK_MEMORY=y), because it suggests
an ordering requirement that doesn't exist.
> kvm_rmap_lock_readonly() should have an smb_rmb(), but it seems like
> adding it here will do the right thing for the read-only lock side.
>
> > +
> > + /* Return the old value, i.e. _without_ the LOCKED bit set. */
> > + return old_val;
> > +}
> > +
> > +static void kvm_rmap_unlock(struct kvm_rmap_head *rmap_head,
> > + unsigned long new_val)
> > +{
> > + WARN_ON_ONCE(new_val & KVM_RMAP_LOCKED);
>
> Same goes with having an smp_wmb() here. Is it necessary? If so,
> should it at least be documented?
>
> And this is *not* necessary for kvm_rmap_unlock_readonly(), IIUC.
>
> > + WRITE_ONCE(rmap_head->val, new_val);
> > +}
next prev parent reply other threads:[~2024-09-09 20:02 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-09 19:43 [PATCH 00/22] KVM: x86/mmu: Allow yielding on mmu_notifier zap Sean Christopherson
2024-08-09 19:43 ` [PATCH 01/22] KVM: selftests: Check for a potential unhandled exception iff KVM_RUN succeeded Sean Christopherson
2024-08-09 19:43 ` [PATCH 02/22] KVM: selftests: Rename max_guest_memory_test to mmu_stress_test Sean Christopherson
2024-08-09 19:43 ` [PATCH 03/22] KVM: selftests: Only muck with SREGS on x86 in mmu_stress_test Sean Christopherson
2024-08-09 19:43 ` [PATCH 04/22] KVM: selftests: Compute number of extra pages needed " Sean Christopherson
2024-09-06 0:03 ` James Houghton
2024-09-06 4:42 ` Sean Christopherson
2024-09-06 18:25 ` James Houghton
2024-08-09 19:43 ` [PATCH 05/22] KVM: selftests: Enable mmu_stress_test on arm64 Sean Christopherson
2024-08-09 19:43 ` [PATCH 06/22] KVM: selftests: Use vcpu_arch_put_guest() in mmu_stress_test Sean Christopherson
2024-08-09 19:43 ` [PATCH 07/22] KVM: selftests: Precisely limit the number of guest loops " Sean Christopherson
2024-08-09 19:43 ` [PATCH 08/22] KVM: selftests: Add a read-only mprotect() phase to mmu_stress_test Sean Christopherson
2024-08-09 19:43 ` [PATCH 09/22] KVM: selftests: Verify KVM correctly handles mprotect(PROT_READ) Sean Christopherson
[not found] ` <CADrL8HXcD--jn1iLeCJycCd3Btv4_rBPxz6NMnTREXfeh0vRZA@mail.gmail.com>
2024-09-07 0:53 ` Sean Christopherson
2024-09-07 2:26 ` James Houghton
2024-09-09 15:49 ` Sean Christopherson
2024-08-09 19:43 ` [PATCH 10/22] KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range() Sean Christopherson
2024-08-28 18:20 ` Sean Christopherson
2024-08-09 19:43 ` [PATCH 11/22] KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps() Sean Christopherson
2024-08-09 19:43 ` [PATCH 12/22] KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot Sean Christopherson
2024-08-09 19:43 ` [PATCH 13/22] KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed Sean Christopherson
2024-08-09 19:43 ` [PATCH 14/22] KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper Sean Christopherson
2024-08-12 21:53 ` David Matlack
2024-08-12 21:58 ` David Matlack
2024-08-09 19:43 ` [PATCH 15/22] KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range() Sean Christopherson
2024-08-09 19:43 ` [PATCH 16/22] KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals Sean Christopherson
2024-08-09 19:43 ` [PATCH 17/22] KVM: x86/mmu: Refactor low level rmap helpers to prep for walking w/o mmu_lock Sean Christopherson
2024-08-09 19:43 ` [PATCH 18/22] KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent Sean Christopherson
2024-08-09 19:43 ` [PATCH 19/22] KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of mmu_lock Sean Christopherson
2024-08-12 8:39 ` Lai Jiangshan
2024-08-12 15:22 ` Sean Christopherson
2024-08-13 6:35 ` Lai Jiangshan
2024-08-13 15:19 ` Sean Christopherson
2024-09-03 20:17 ` James Houghton
2024-09-03 21:27 ` Sean Christopherson
2024-09-03 21:40 ` James Houghton
2024-09-09 19:00 ` James Houghton
2024-09-09 20:02 ` Sean Christopherson [this message]
2024-09-09 20:46 ` James Houghton
2024-09-09 22:02 ` Sean Christopherson
2024-09-10 0:28 ` James Houghton
2024-09-10 1:42 ` Sean Christopherson
2024-09-10 21:11 ` James Houghton
2024-09-11 15:33 ` Sean Christopherson
2024-08-09 19:43 ` [PATCH 20/22] KVM: x86/mmu: Add support for lockless walks of rmap SPTEs Sean Christopherson
2024-08-09 19:43 ` [PATCH 21/22] KVM: x86/mmu: Support rmap walks without holding mmu_lock when aging gfns Sean Christopherson
2024-09-03 20:36 ` James Houghton
2024-09-04 15:48 ` Sean Christopherson
2024-08-09 19:43 ` [PATCH 22/22] ***HACK*** KVM: x86: Don't take " Sean Christopherson
2024-09-10 4:56 ` [PATCH 00/22] KVM: x86/mmu: Allow yielding on mmu_notifier zap Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zt9UT74XkezVpTuK@google.com \
--to=seanjc@google.com \
--cc=jthoughton@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maz@kernel.org \
--cc=oliver.upton@linux.dev \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.