From: Sean Christopherson <seanjc@google.com>
To: James Houghton <jthoughton@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Oliver Upton <oliver.upton@linux.dev>,
Marc Zyngier <maz@kernel.org>, Peter Xu <peterx@redhat.com>
Subject: Re: [PATCH 19/22] KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of mmu_lock
Date: Wed, 11 Sep 2024 08:33:53 -0700 [thread overview]
Message-ID: <ZuG4YYzozOddPRCm@google.com> (raw)
In-Reply-To: <CADrL8HV1Erpg-D4LzuRHUk7dg6mvex8oQz5pBzwO7A3OjB8Uvw@mail.gmail.com>
On Tue, Sep 10, 2024, James Houghton wrote:
> On Mon, Sep 9, 2024 at 6:42 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Mon, Sep 09, 2024, James Houghton wrote:
> > > I take back what I said about this working on x86. I think it's
> > > possible for there to be a race.
> > >
> > > Say...
> > >
> > > 1. T1 modifies pte_list_desc then unlocks kvm_rmap_unlock().
> > > 2. T2 then locks kvm_rmap_lock_readonly().
> > >
> > > The modifications that T1 has made are not guaranteed to be visible to
> > > T2 unless T1 has an smp_wmb() (or equivalent) after the modfication
> > > and T2 has an smp_rmb() before reading the data.
> > >
> > > Now the way you had it, T2, because it uses try_cmpxchg() with full
> > > ordering, will effectively do a smp_rmb(). But T1 only does an
> > > smp_wmb() *after dropping the mmu_lock*, so there is a race. While T1
> > > still holds the mmu_lock but after releasing the kvm_rmap_lock(), T2
> > > may enter its critical section and then *later* observe the changes
> > > that T1 made.
> > >
> > > Now this is impossible on x86 (IIUC) if, in the compiled list of
> > > instructions, T1's writes occur in the same order that we have written
> > > them in C. I'm not sure if WRITE_ONCE guarantees that this reordering
> > > at compile time is forbidden.
> > >
> > > So what I'm saying is:
> > >
> > > 1. kvm_rmap_unlock() must have an smp_wmb().
> >
> > No, because beating a dead horse, this is not generic code, this is x86.
>
> What prevents the compiler from reordering (non-atomic, non-volatile)
> stores that happen before WRITE_ONCE() in kvm_rmap_unlock() to after
> the WRITE_ONCE()?
Oof, right, nothing. Which is why __smp_store_release() has an explicit
barrier() before its WRITE_ONCE().
> IMV, such a reordering is currently permitted[1] (i.e., a barrier() is
> missing), and should the compiler choose to do this, the lock will not
> function correctly.
>
> > If kvm_rmap_head.val were an int, i.e. could be unionized with an atomic_t, then
> > I wouldn't be opposed to doing this in the locking code to document things:
> >
> > s/READ_ONCE/atomic_read_acquire
> > s/WRITE_ONCE/atomic_set_release
> > s/try_cmpxchg/atomic_cmpxchg_acquire
>
> I think we can use atomic_long_t.
Duh. That's a /facepalm moment.
> It would be really great if we did a substitution like this. That
> would address my above concern about barrier() (atomic_set_release,
> for example, implies a barrier() that we otherwise need to include).
Ya, agreed, not just for warm fuzzies, but because it's necessary to prevent
the compiler from being clever.
next prev parent reply other threads:[~2024-09-11 15:33 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-09 19:43 [PATCH 00/22] KVM: x86/mmu: Allow yielding on mmu_notifier zap Sean Christopherson
2024-08-09 19:43 ` [PATCH 01/22] KVM: selftests: Check for a potential unhandled exception iff KVM_RUN succeeded Sean Christopherson
2024-08-09 19:43 ` [PATCH 02/22] KVM: selftests: Rename max_guest_memory_test to mmu_stress_test Sean Christopherson
2024-08-09 19:43 ` [PATCH 03/22] KVM: selftests: Only muck with SREGS on x86 in mmu_stress_test Sean Christopherson
2024-08-09 19:43 ` [PATCH 04/22] KVM: selftests: Compute number of extra pages needed " Sean Christopherson
2024-09-06 0:03 ` James Houghton
2024-09-06 4:42 ` Sean Christopherson
2024-09-06 18:25 ` James Houghton
2024-08-09 19:43 ` [PATCH 05/22] KVM: selftests: Enable mmu_stress_test on arm64 Sean Christopherson
2024-08-09 19:43 ` [PATCH 06/22] KVM: selftests: Use vcpu_arch_put_guest() in mmu_stress_test Sean Christopherson
2024-08-09 19:43 ` [PATCH 07/22] KVM: selftests: Precisely limit the number of guest loops " Sean Christopherson
2024-08-09 19:43 ` [PATCH 08/22] KVM: selftests: Add a read-only mprotect() phase to mmu_stress_test Sean Christopherson
2024-08-09 19:43 ` [PATCH 09/22] KVM: selftests: Verify KVM correctly handles mprotect(PROT_READ) Sean Christopherson
[not found] ` <CADrL8HXcD--jn1iLeCJycCd3Btv4_rBPxz6NMnTREXfeh0vRZA@mail.gmail.com>
2024-09-07 0:53 ` Sean Christopherson
2024-09-07 2:26 ` James Houghton
2024-09-09 15:49 ` Sean Christopherson
2024-08-09 19:43 ` [PATCH 10/22] KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range() Sean Christopherson
2024-08-28 18:20 ` Sean Christopherson
2024-08-09 19:43 ` [PATCH 11/22] KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps() Sean Christopherson
2024-08-09 19:43 ` [PATCH 12/22] KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot Sean Christopherson
2024-08-09 19:43 ` [PATCH 13/22] KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed Sean Christopherson
2024-08-09 19:43 ` [PATCH 14/22] KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper Sean Christopherson
2024-08-12 21:53 ` David Matlack
2024-08-12 21:58 ` David Matlack
2024-08-09 19:43 ` [PATCH 15/22] KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range() Sean Christopherson
2024-08-09 19:43 ` [PATCH 16/22] KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals Sean Christopherson
2024-08-09 19:43 ` [PATCH 17/22] KVM: x86/mmu: Refactor low level rmap helpers to prep for walking w/o mmu_lock Sean Christopherson
2024-08-09 19:43 ` [PATCH 18/22] KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent Sean Christopherson
2024-08-09 19:43 ` [PATCH 19/22] KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of mmu_lock Sean Christopherson
2024-08-12 8:39 ` Lai Jiangshan
2024-08-12 15:22 ` Sean Christopherson
2024-08-13 6:35 ` Lai Jiangshan
2024-08-13 15:19 ` Sean Christopherson
2024-09-03 20:17 ` James Houghton
2024-09-03 21:27 ` Sean Christopherson
2024-09-03 21:40 ` James Houghton
2024-09-09 19:00 ` James Houghton
2024-09-09 20:02 ` Sean Christopherson
2024-09-09 20:46 ` James Houghton
2024-09-09 22:02 ` Sean Christopherson
2024-09-10 0:28 ` James Houghton
2024-09-10 1:42 ` Sean Christopherson
2024-09-10 21:11 ` James Houghton
2024-09-11 15:33 ` Sean Christopherson [this message]
2024-08-09 19:43 ` [PATCH 20/22] KVM: x86/mmu: Add support for lockless walks of rmap SPTEs Sean Christopherson
2024-08-09 19:43 ` [PATCH 21/22] KVM: x86/mmu: Support rmap walks without holding mmu_lock when aging gfns Sean Christopherson
2024-09-03 20:36 ` James Houghton
2024-09-04 15:48 ` Sean Christopherson
2024-08-09 19:43 ` [PATCH 22/22] ***HACK*** KVM: x86: Don't take " Sean Christopherson
2024-09-10 4:56 ` [PATCH 00/22] KVM: x86/mmu: Allow yielding on mmu_notifier zap Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZuG4YYzozOddPRCm@google.com \
--to=seanjc@google.com \
--cc=jthoughton@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maz@kernel.org \
--cc=oliver.upton@linux.dev \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox