linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>, Gavin Shan <gshan@redhat.com>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, corbet@lwn.net,
	james.morse@arm.com, alexandru.elisei@arm.com,
	suzuki.poulose@arm.com, oliver.upton@linux.dev,
	catalin.marinas@arm.com, will@kernel.org, shuah@kernel.org,
	seanjc@google.com, dmatlack@google.com, bgardon@google.com,
	ricarkol@google.com, zhenyzha@redhat.com, shan.gavin@gmail.com
Subject: Re: [PATCH v1 1/5] KVM: arm64: Enable ring-based dirty memory tracking
Date: Fri, 26 Aug 2022 16:49:41 +0100	[thread overview]
Message-ID: <8735djvwbu.wl-maz@kernel.org> (raw)
In-Reply-To: <9e7cb09c-82c5-9492-bccd-5511f5bede26@redhat.com>

On Fri, 26 Aug 2022 11:50:24 +0100,
Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 8/24/22 00:47, Marc Zyngier wrote:
> >> I definitely don't think I 100% understand all the ordering things since
> >> they're complicated.. but my understanding is that the reset procedure
> >> didn't need memory barrier (unlike pushing, where we have explicit wmb),
> >> because we assumed the userapp is not hostile so logically it should only
> >> modify the flags which is a 32bit field, assuming atomicity guaranteed.
> > Atomicity doesn't guarantee ordering, unfortunately. Take the
> > following example: CPU0 is changing a bunch of flags for GFNs A, B, C,
> > D that exist in the ring in that order, and CPU1 performs an ioctl to
> > reset the page state.
> > 
> > CPU0:
> >      write_flag(A, KVM_DIRTY_GFN_F_RESET)
> >      write_flag(B, KVM_DIRTY_GFN_F_RESET)
> >      write_flag(C, KVM_DIRTY_GFN_F_RESET)
> >      write_flag(D, KVM_DIRTY_GFN_F_RESET)
> >      [...]
> > 
> > CPU1:
> >     ioctl(KVM_RESET_DIRTY_RINGS)
> > 
> > Since CPU0 writes do not have any ordering, CPU1 can observe the
> > writes in a sequence that have nothing to do with program order, and
> > could for example observe that GFN A and D have been reset, but not B
> > and C. This in turn breaks the logic in the reset code (B, C, and D
> > don't get reset), despite userspace having followed the spec to the
> > letter. If each was a store-release (which is the case on x86), it
> > wouldn't be a problem, but nothing calls it in the documentation.
> > 
> > Maybe that's not a big deal if it is expected that each CPU will issue
> > a KVM_RESET_DIRTY_RINGS itself, ensuring that it observe its own
> > writes. But expecting this to work across CPUs without any barrier is
> > wishful thinking.
> 
> Agreed, but that's a problem for userspace to solve.  If userspace
> wants to reset the fields in different CPUs, it has to synchronize
> with its own invoking of the ioctl.

userspace has no choice. It cannot order on its own the reads that the
kernel will do to *other* rings.

> That is, CPU0 must ensure that a ioctl(KVM_RESET_DIRTY_RINGS) is done
> after (in the memory-ordering sense) its last write_flag(D,
> KVM_DIRTY_GFN_F_RESET).  If there's no such ordering, there's no
> guarantee that the write_flag will have any effect.

The problem isn't on CPU0 The problem is that CPU1 does observe
inconsistent data on arm64, and I don't think this difference in
behaviour is acceptable. Nothing documents this, and there is a baked
in assumption that there is a strong ordering between writes as well
as between writes and read.

> The main reason why I preferred a global KVM_RESET_DIRTY_RINGS ioctl
> was because it takes kvm->slots_lock so the execution would be
> serialized anyway.  Turning slots_lock into an rwsem would be even
> worse because it also takes kvm->mmu_lock (since slots_lock is a
> mutex, at least two concurrent invocations won't clash with each other
> on the mmu_lock).

Whatever the reason, the behaviour should be identical on all
architectures. As is is, it only really works on x86, and I contend
this is a bug that needs fixing.

Thankfully, this can be done at zero cost for x86, and at that of a
set of load-acquires on other architectures.

	M.

-- 
Without deviation from the norm, progress is not possible.

  reply	other threads:[~2022-08-26 15:50 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-19  0:55 [PATCH v1 0/5] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-08-19  0:55 ` [PATCH v1 1/5] " Gavin Shan
2022-08-19  8:00   ` Marc Zyngier
2022-08-22  1:58     ` Gavin Shan
2022-08-22 18:55       ` Peter Xu
2022-08-23  3:19         ` Gavin Shan
2022-08-22 21:42       ` Marc Zyngier
2022-08-23  5:22         ` Gavin Shan
2022-08-23 13:58           ` Peter Xu
2022-08-23 19:17             ` Marc Zyngier
2022-08-23 21:20               ` Peter Xu
2022-08-23 22:47                 ` Marc Zyngier
2022-08-23 23:19                   ` Peter Xu
2022-08-24 14:45                     ` Marc Zyngier
2022-08-24 16:21                       ` Peter Xu
2022-08-24 20:57                         ` Marc Zyngier
2022-08-26  6:05                           ` Gavin Shan
2022-08-26 10:50                   ` Paolo Bonzini
2022-08-26 15:49                     ` Marc Zyngier [this message]
2022-08-27  8:27                       ` Paolo Bonzini
2022-08-23 14:44         ` Oliver Upton
2022-08-23 20:35           ` Marc Zyngier
2022-08-26 10:58             ` Paolo Bonzini
2022-08-26 15:28               ` Marc Zyngier
2022-08-30 14:42                 ` Peter Xu
2022-09-02  0:19                   ` Paolo Bonzini
2022-08-19  0:55 ` [PATCH v1 2/5] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
2022-08-19  0:55 ` [PATCH v1 3/5] KVM: selftests: Dirty host pages " Gavin Shan
2022-08-19  5:28   ` Andrew Jones
2022-08-22  6:29     ` Gavin Shan
2022-08-23  3:09       ` Gavin Shan
2022-08-19  0:56 ` [PATCH v1 4/5] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
2022-08-19  0:56 ` [PATCH v1 5/5] KVM: selftests: Automate choosing dirty ring size " Gavin Shan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8735djvwbu.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=alexandru.elisei@arm.com \
    --cc=bgardon@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=dmatlack@google.com \
    --cc=gshan@redhat.com \
    --cc=james.morse@arm.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=ricarkol@google.com \
    --cc=seanjc@google.com \
    --cc=shan.gavin@gmail.com \
    --cc=shuah@kernel.org \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    --cc=zhenyzha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).