From: Oliver Upton <oliver.upton@linux.dev>
To: Gavin Shan <gshan@redhat.com>
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
kvmarm@lists.cs.columbia.edu, andrew.jones@linux.dev,
ajones@ventanamicro.com, maz@kernel.org, bgardon@google.com,
catalin.marinas@arm.com, dmatlack@google.com, will@kernel.org,
pbonzini@redhat.com, peterx@redhat.com, seanjc@google.com,
james.morse@arm.com, shuah@kernel.org, suzuki.poulose@arm.com,
alexandru.elisei@arm.com, zhenyzha@redhat.com,
shan.gavin@gmail.com
Subject: Re: [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap
Date: Fri, 4 Nov 2022 01:06:06 +0000 [thread overview]
Message-ID: <Y2RlfkyQMCtD6Rbh@google.com> (raw)
In-Reply-To: <d5b86a73-e030-7ce3-e5f3-301f4f505323@redhat.com>
On Fri, Nov 04, 2022 at 08:12:21AM +0800, Gavin Shan wrote:
> Hi Oliver,
>
> On 11/4/22 7:32 AM, Oliver Upton wrote:
> > On Mon, Oct 31, 2022 at 08:36:16AM +0800, Gavin Shan wrote:
> > > ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> > > enabled. It's conflicting with that ring-based dirty page tracking always
> > > requires a running VCPU context.
> > >
> > > Introduce a new flavor of dirty ring that requires the use of both VCPU
> > > dirty rings and a dirty bitmap. The expectation is that for non-VCPU
> > > sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
> > > the dirty bitmap. Userspace should scan the dirty bitmap before migrating
> > > the VM to the target.
> > >
> > > Use an additional capability to advertise this behavior. The newly added
> > > capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
> > > KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
> > > capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
> >
> > Whatever ordering requirements we settle on between these capabilities
> > needs to be documented as well.
> >
> > [...]
> >
>
> It's mentioned in 'Documentation/virt/kvm/api.rst' as below.
>
> After using the dirty rings, the userspace needs to detect the capability
> of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
> need to be backed by per-slot bitmaps. With this capability advertised
> and supported, it means the architecture can dirty guest pages without
> vcpu/ring context, so that some of the dirty information will still be
> maintained in the bitmap structure.
>
> The description may be not obvious about the ordering. For this, I can
> add the following sentence at end of the section.
>
> The capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP can't be enabled
> until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL has been enabled.
>
> > > @@ -4588,6 +4594,13 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> > > return -EINVAL;
> > > return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> > > + case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
> > > + if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
> > > + !kvm->dirty_ring_size)
> >
> > I believe this ordering requirement is problematic, as it piles on top
> > of an existing problem w.r.t. KVM_CAP_DIRTY_LOG_RING v. memslot
> > creation.
> >
> > Example:
> > - Enable KVM_CAP_DIRTY_LOG_RING
> > - Create some memslots w/ dirty logging enabled (note that the bitmap
> > is _not_ allocated)
> > - Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> > - Save ITS tables and get a NULL dereference in
> > mark_page_dirty_in_slot():
> >
> > if (vcpu && kvm->dirty_ring_size)
> > kvm_dirty_ring_push(&vcpu->dirty_ring,
> > slot, rel_gfn);
> > else
> > -------> set_bit_le(rel_gfn, memslot->dirty_bitmap);
> >
> > Similarly, KVM may unnecessarily allocate bitmaps if dirty logging is
> > enabled on memslots before KVM_CAP_DIRTY_LOG_RING is enabled.
> >
> > You could paper over this issue by disallowing DIRTY_RING_WITH_BITMAP if
> > DIRTY_LOG_RING has already been enabled, but the better approach would
> > be to explicitly check kvm_memslots_empty() such that the real
> > dependency is obvious. Peter, hadn't you mentioned something about
> > checking against memslots in an earlier revision?
> >
>
> The userspace (QEMU) needs to ensure that no dirty bitmap is created
> before the capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP is enabled.
> It's unknown by QEMU that vgic/its is used when KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> is enabled.
I'm not worried about what QEMU (or any particular VMM for that matter)
does with the UAPI. The problem is this patch provides a trivial way for
userspace to cause a NULL dereference in the kernel. Imposing ordering
between the cap and memslot creation avoids the problem altogether.
So, looking at your example:
> kvm_initialization
> enable_KVM_CAP_DIRTY_LOG_RING_ACQ_REL // Where KVM_CAP_DIRTY_LOG_RING is enabled
> board_initialization // Where QEMU knows if vgic/its is used
Is it possible that QEMU could hoist enabling RING_WITH_BITMAP here?
Based on your description QEMU has decided to use the vGIC ITS but
hasn't yet added any memslots.
> add_memory_slots
> kvm_post_initialization
> enable_KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> :
> start_migration
> enable_dirty_page_tracking
> create_dirty_bitmap // With KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP enabled
Just to make sure we're on the same page, there's two issues:
(1) If DIRTY_LOG_RING is enabled before memslot creation and
RING_WITH_BITMAP is enabled after memslots have been created w/
dirty logging enabled, memslot->dirty_bitmap == NULL and the
kernel will fault when attempting to save the ITS tables.
(2) Not your code, but a similar issue. If DIRTY_LOG_RING[_ACQ_REL] is
enabled after memslots have been created w/ dirty logging enabled,
memslot->dirty_bitmap != NULL and that memory is wasted until the
memslot is freed.
I don't expect you to fix #2, though I've mentioned it because using the
same approach to #1 and #2 would be nice.
--
Thanks,
Oliver
next prev parent reply other threads:[~2022-11-04 1:06 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-31 0:36 [PATCH v7 0/9] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-31 0:36 ` [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL Gavin Shan
2022-11-01 19:39 ` Sean Christopherson
2022-11-02 14:29 ` Peter Xu
2022-11-02 15:58 ` Marc Zyngier
2022-11-02 16:11 ` Sean Christopherson
2022-11-02 16:44 ` Marc Zyngier
2022-11-03 0:44 ` Gavin Shan
2022-11-02 16:23 ` Peter Xu
2022-11-02 16:33 ` Sean Christopherson
2022-11-02 16:43 ` Peter Xu
2022-11-02 16:48 ` Marc Zyngier
2022-11-02 14:31 ` Marc Zyngier
2022-10-31 0:36 ` [PATCH v7 2/9] KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
2022-10-31 0:36 ` [PATCH v7 3/9] KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them Gavin Shan
2022-10-31 9:18 ` Oliver Upton
2022-10-31 0:36 ` [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap Gavin Shan
2022-11-03 19:33 ` Peter Xu
2022-11-03 23:32 ` Oliver Upton
2022-11-04 0:12 ` Gavin Shan
2022-11-04 1:06 ` Oliver Upton [this message]
2022-11-04 6:57 ` Gavin Shan
2022-11-04 20:12 ` Oliver Upton
2022-11-04 21:57 ` Gavin Shan
2022-11-04 22:23 ` Oliver Upton
2022-10-31 0:36 ` [PATCH v7 5/9] KVM: arm64: Improve no-running-vcpu report for dirty ring Gavin Shan
2022-10-31 9:08 ` Oliver Upton
2022-10-31 23:08 ` Gavin Shan
2022-11-02 17:18 ` Marc Zyngier
2022-10-31 0:36 ` [PATCH v7 6/9] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-31 0:36 ` [PATCH v7 7/9] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
2022-10-31 0:36 ` [PATCH v7 8/9] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
2022-10-31 0:36 ` [PATCH v7 9/9] KVM: selftests: Automate choosing dirty ring size " Gavin Shan
2022-10-31 17:23 ` (subset) [PATCH v7 0/9] KVM: arm64: Enable ring-based dirty memory tracking Marc Zyngier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y2RlfkyQMCtD6Rbh@google.com \
--to=oliver.upton@linux.dev \
--cc=ajones@ventanamicro.com \
--cc=alexandru.elisei@arm.com \
--cc=andrew.jones@linux.dev \
--cc=bgardon@google.com \
--cc=catalin.marinas@arm.com \
--cc=dmatlack@google.com \
--cc=gshan@redhat.com \
--cc=james.morse@arm.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=kvmarm@lists.linux.dev \
--cc=maz@kernel.org \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=seanjc@google.com \
--cc=shan.gavin@gmail.com \
--cc=shuah@kernel.org \
--cc=suzuki.poulose@arm.com \
--cc=will@kernel.org \
--cc=zhenyzha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).